Originally it reported an 80.7% true positive rate on the training set, but after Oren's preprocessing script removed some noise, the true positive rate shot up to 90.8%!
These numbers are exciting, but they don't actually mean much, because we didn't do cross-validation or use a training/test split. This run was mostly to make sure the toy dataset would work with the classifier. Adding correct methodology and increasing accuracy are both high priorities now.
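Adding that methodology could look something like the following. This is just a minimal sketch in Python with scikit-learn; the feature array, labels, and the LogisticRegression stand-in are placeholders rather than our actual code, with lambda roughly corresponding to 1/C here.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

# Placeholder data shaped like the toy dataset: 20 classes x 50 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))          # stand-in features
y = np.repeat(np.arange(20), 50)         # class labels 0..19

# Hold out a test set so accuracy isn't measured on the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # stand-in classifier; lambda ~ 1/C
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# 5-fold cross-validation on the training portion gives a less noisy estimate.
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("5-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```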
There are 20 classes that have 50 samples each, for a total of 1000 samples.
Below is the confusion matrix (a sketch of how it is computed follows the legend):
Row/column legend:
1: +
2: (
3: )
4: =
5: f
6: -
7: /
8: ^
9: alpha
10: x
11: 0
12: 1
13: 2
14: 3
15: 4
16: 5
17: 6
18: 7
19: 8
20: 9
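Something like this produces a matrix in that row/column order. Again, just a sketch using scikit-learn's confusion_matrix; y_true and y_pred here are random placeholders, not our classifier's actual output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Class labels in the same order as the legend above.
LEGEND = ['+', '(', ')', '=', 'f', '-', '/', '^', 'alpha', 'x',
          '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

# Placeholder true/predicted class indices (0..19) for 1000 samples.
rng = np.random.default_rng(0)
y_true = np.repeat(np.arange(20), 50)
y_pred = np.where(rng.random(1000) < 0.9, y_true, rng.integers(0, 20, 1000))

cm = confusion_matrix(y_true, y_pred, labels=np.arange(len(LEGEND)))

# Row i, column j counts samples of class LEGEND[i] predicted as LEGEND[j];
# the diagonal holds the per-class true positives.
for symbol, row in zip(LEGEND, cm):
    print("%-5s" % symbol, " ".join("%3d" % n for n in row))
```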
Classifier TODO:
- Look into ways to increase accuracy (a rough starting point is sketched after this list):
  - Cross-validation
  - Different lambda parameters
- Run the code on a larger dataset.
- Consider alternate ways to classify (SVM?)
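For the cross-validation / lambda / SVM items, a rough starting point could look like this. Still only a sketch with scikit-learn: lambda is mapped to 1/C, LinearSVC stands in for the SVM idea, and the data is a placeholder with the toy dataset's shape.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Placeholder data shaped like the toy dataset: 20 classes x 50 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
y = np.repeat(np.arange(20), 50)

# Sweep a few regularization strengths (lambda ~ 1/C) with 5-fold CV.
for lam in [0.01, 0.1, 1.0, 10.0]:
    clf = LogisticRegression(C=1.0 / lam, max_iter=1000)
    scores = cross_val_score(clf, X, y, cv=5)
    print("lambda=%-5g  CV accuracy %.3f" % (lam, scores.mean()))

# Quick look at a linear SVM as an alternative classifier.
svm_scores = cross_val_score(LinearSVC(C=1.0, max_iter=5000), X, y, cv=5)
print("LinearSVC    CV accuracy %.3f" % svm_scores.mean())
```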