Sunday, April 8, 2012

Current Classifier results


I've written code that takes our images and builds a data_x matrix, where each row is one character sample, and a corresponding data_y vector of class labels. We fed these into code I wrote for the Stanford ml-class, which builds a one-vs-all classifier using regularized logistic regression.
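
For context, here is roughly what one-vs-all regularized logistic regression looks like. This is a Python/NumPy sketch, not our actual code (the ml-class assignments were in Octave and used a fancier optimizer than the plain gradient descent shown here), and the data_x/data_y names just follow the description above:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_logreg(X, y, lam, lr=0.1, iters=500):
        # Regularized logistic regression fit by plain gradient descent.
        # X: (m, n) matrix with a bias column already prepended; y in {0, 1}.
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(iters):
            h = sigmoid(X @ theta)
            grad = X.T @ (h - y) / m
            grad[1:] += (lam / m) * theta[1:]  # don't regularize the bias term
            theta -= lr * grad
        return theta

    def train_one_vs_all(data_x, data_y, num_classes, lam):
        # One binary classifier per class: class k vs. everything else.
        m = data_x.shape[0]
        X = np.hstack([np.ones((m, 1)), data_x])  # prepend bias column
        return np.array([train_logreg(X, (data_y == k).astype(float), lam)
                         for k in range(1, num_classes + 1)])

    def predict_one_vs_all(thetas, data_x):
        # Predict the class whose classifier is most confident (labels 1..K).
        m = data_x.shape[0]
        X = np.hstack([np.ones((m, 1)), data_x])
        return np.argmax(sigmoid(X @ thetas.T), axis=1) + 1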

Originally it reported an 80.7% true positive rate on the training set, but after Oren's preprocessing script removed some noise, true positives shot up to 90.8%!



These numbers are exciting, but they don't actually mean much yet: they were measured on the training set itself, with no cross-validation and no training/test split. This run was mostly to make sure the toy dataset would work with the classifier. Adding correct methodology and increasing accuracy are both high priorities now.
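
Even the simplest fix here is only a few lines. A minimal sketch of a random train/test split, assuming the data_x/data_y arrays above (the function name and test_frac parameter are made up for illustration):

    import numpy as np

    def train_test_split(data_x, data_y, test_frac=0.3, seed=0):
        # Shuffle the samples, then hold out a fraction for testing.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(data_y))
        cut = int(len(idx) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        return data_x[train], data_y[train], data_x[test], data_y[test]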

There are 20 classes that have 50 samples each, for a total of 1000 samples.

Below is the confusion matrix (a sketch of how such a matrix is computed follows the legend):

[confusion matrix image]

row/col legend:
1 +
2 (
3 )
4 =
5 f
6 -
7 /
8 ^
9 alpha
10 x
11 0
12 1
13 2
14 3
15 4
16 5
17 6
18 7
19 8
20 9
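
For reference, a confusion matrix like the one above takes only a few lines of NumPy to build, assuming the true and predicted labels use the 1..20 class indices from the legend:

    import numpy as np

    def confusion_matrix(y_true, y_pred, num_classes=20):
        # cm[i, j] counts samples of true class i+1 predicted as class j+1,
        # using the 1..20 class indices from the legend above.
        cm = np.zeros((num_classes, num_classes), dtype=int)
        for t, p in zip(y_true, y_pred):
            cm[t - 1, p - 1] += 1
        return cm

With 50 samples per class, each row should sum to 50, and the sum of the diagonal divided by 1000 gives the overall accuracy.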

Classifier TODO:
Look into ways to increase accuracy:
  Cross-validation
  Different lambda parameters (see the sketch after this list)
Run the code on a larger dataset.
Consider alternate ways to classify (SVM?)
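
To give a concrete picture of the lambda item above, here is a rough sketch of what that search could look like, reusing the hypothetical train_test_split / train_one_vs_all / predict_one_vs_all helpers from the earlier sketches (the lambda values are arbitrary):

    import numpy as np

    # Hold out a validation set, then try a handful of lambda values on it.
    train_x, train_y, val_x, val_y = train_test_split(data_x, data_y, test_frac=0.3)

    best_lam, best_acc = None, 0.0
    for lam in (0.01, 0.03, 0.1, 0.3, 1.0, 3.0):
        thetas = train_one_vs_all(train_x, train_y, num_classes=20, lam=lam)
        acc = np.mean(predict_one_vs_all(thetas, val_x) == val_y)
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    print("best lambda:", best_lam, "validation accuracy:", best_acc)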
