Wednesday, June 6, 2012

Closing Remarks

This is the closing blog post! The quarter has gone fast!

All the code is, and will continue to be, available here.

I'm going to write a bit about previous work (since I never wrote a post about that), list some papers I found helpful, go over each part of the pipeline, and conclude with what I learned and what I would do differently if I had a paradox-free time machine.



Previous Work:




The abstract for our project proposal was:


While handwriting provides an efficient means to write mathematical symbols quickly, it is a poor medium for rapid exchange and editing of documents. Meanwhile, advanced typesetting systems like LaTeX and MathML have provided an environment where mathematical symbols can be typeset with precision, but at the cost of typing time and a steep learning curve. In order to facilitate the exchange, preservation and ease of editing of mathematical documents, we propose a method of offline handwritten equation recognition. Our system takes a handwritten document, for example a student's calculus homework, then partitions, classifies and parses the document into LaTeX.



The following two links show similar, but online, versions of what we were trying to accomplish:

http://webdemo.visionobjects.com/equation.html
http://detexify.kirelabs.org/classify.html





In reading through papers, I found that this project was highly ambitious. Survey papers repeatedly mentioned that classification and parsing/verification were mostly tackled as separate research problems.

While there is the standard MNIST dataset for digits, there didn't seem to be a comparably standard dataset for handwritten mathematical symbols. The closest thing is the dataset from the Infty project. It was used in some of the papers we read while researching the project; unfortunately, we never got the dataset working.





Papers That Helped the Most:


Entire project:
This was the paper we read when beginning the project:
http://www.eecs.ucf.edu/courses/cap6938/fall2008/penui/readings/matsakis-MEng-99.pdf



Classification/Feature Descriptors:
http://hal.archives-ouvertes.fr/docs/00/54/85/12/PDF/hog_cvpr2005.pdf
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
http://eprints.pascal-network.org/archive/00004062/01/ShalevSiSr07.pdf
http://eprints.pascal-network.org/archive/00003046/01/bosch07a.pdf

Verification/Parsing:
http://www.springerlink.com/content/6217472316636n13/fulltext.pdf?MUD=MP


Synthetic data (something we considered but abandoned):
http://www.stanford.edu/~acoates/papers/coatesetal_icdar_2011.pdf


Meta-links (These links had links to useful papers):
http://www.inftyproject.org/en/articles_ocr.html
http://www.handwritten.net/mv/papers/ 




My Work:


Segmentation:

There were two schemes attempted for segmentation, based on a pair of Stack Overflow posts. Both are based on finding connected components.

This caused a major problem: the segmenter couldn't detect symbols composed of multiple disconnected components, most egregiously '='.

This was annoying, but in the interest of finishing on time we postponed fixing it, with a sliding-window approach to symbol detection as a possible future fix.
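To illustrate the connected-components idea (and its '=' failure mode), here is a minimal pure-Python sketch using 4-connectivity and a BFS flood fill; this is an illustration of the general technique, not the actual code we used:

```python
from collections import deque

def connected_components(img):
    """Find 4-connected foreground components in a binary image.

    img: list of rows of 0/1 values. Returns one bounding box
    (min_row, min_col, max_row, max_col) per component, in scan order.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                # BFS flood fill to collect this component's extent
                q = deque([(r, c)])
                seen[r][c] = True
                rmin = rmax = r
                cmin = cmax = c
                while q:
                    y, x = q.popleft()
                    rmin, rmax = min(rmin, y), max(rmax, y)
                    cmin, cmax = min(cmin, x), max(cmax, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((rmin, cmin, rmax, cmax))
    return boxes
```

Running this on an '='-shaped image (two horizontal bars) returns two bounding boxes instead of one, which is exactly the problem described above.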


Extraction:

Once the segmenter found the bounding boxes, we had to extract the symbols. This involved embedding each symbol in a new matrix and outputting positional information about how each character related to the previous ones. By calculating the centroid of each bounding box, we were able to output this information to help with parsing.
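The centroid bookkeeping can be sketched as follows; the quarter-height threshold for deciding superscript vs. subscript is a made-up illustration (a real system would need to tune it), and row coordinates follow the usual image convention of increasing downward:

```python
def centroid(box):
    """Centroid of a bounding box given as (min_row, min_col, max_row, max_col)."""
    rmin, cmin, rmax, cmax = box
    return ((rmin + rmax) / 2.0, (cmin + cmax) / 2.0)

def relative_position(prev_box, box):
    """Label `box` relative to `prev_box` by comparing centroid heights.

    The quarter-height threshold is arbitrary, chosen only for illustration.
    """
    prev_y, _ = centroid(prev_box)
    y, _ = centroid(box)
    threshold = (prev_box[2] - prev_box[0] + 1) / 4.0
    if y < prev_y - threshold:
        return 'SUPER'     # noticeably above the previous symbol
    if y > prev_y + threshold:
        return 'SUB'       # noticeably below it
    return 'BASELINE'
```

Positional labels like these are the kind of hint a parser can use to attach superscripts and subscripts.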


Classification:


We spent the first few weeks toying with logistic regression. After we implemented cross-validation, we found that the classifier performed abysmally.

Later, we switched to SVMs and used VLFeat. By modifying its example code for object recognition, we were able to increase accuracy tremendously.

The details of this switch are described in this earlier post.
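One of the descriptor papers linked above is the HOG paper. As a rough, dependency-free illustration of what an orientation-histogram descriptor computes, here is a toy Python sketch; real HOG (and VLFeat's implementation) divides the image into cells and normalizes over blocks, which this deliberately omits:

```python
import math

def orientation_histogram(img, bins=9):
    """Toy HOG-style descriptor: one histogram of gradient orientations
    over the whole image, weighted by gradient magnitude and L1-normalized.
    """
    rows, cols = len(img), len(img[0])
    hist = [0.0] * bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # central-difference gradients
            gy = img[r + 1][c] - img[r - 1][c]
            gx = img[r][c + 1] - img[r][c - 1]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # unsigned orientation folded into [0, pi)
            theta = math.atan2(gy, gx) % math.pi
            hist[int(theta / math.pi * bins) % bins] += mag
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

On an image containing a single vertical edge, all the gradient energy lands in the horizontal-orientation bin, which is the behavior that makes such descriptors useful for distinguishing stroke directions in symbols.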

Parsing:

Parsing/Verification ended up being an interesting project in itself. After a week or two of hacky false starts, I was able to put together something with limited functionality using PLY, a Python implementation of lex/yacc.
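The PLY grammar itself isn't shown in these posts, so as a dependency-free stand-in, here is a tiny recursive-descent sketch of the same idea: variables chained by binary operators, parsed right-associatively into nested tuples. The token set and tuple shapes are illustrative assumptions, not the project's actual grammar:

```python
BINOPS = {'\\in', '\\subset', '\\leftrightarrow'}

def parse(tokens):
    """Parse a token list like ['x', '\\in', 'y'] into a tuple syntax tree."""
    tree, rest = _exp(tokens)
    if rest:
        raise SyntaxError('trailing tokens: %r' % rest)
    return tree

def _exp(tokens):
    """EXP := VAR | VAR BINOP EXP  (right-associative chain)."""
    if not tokens:
        raise SyntaxError('unexpected end of input')
    left = ('VAR', tokens[0])
    rest = tokens[1:]
    if rest and rest[0] in BINOPS:
        op, rest = rest[0], rest[1:]
        right, rest = _exp(rest)
        return ('BINOP', op, left, right), rest
    return left, rest
```

A real lex/yacc grammar expresses the same structure declaratively as token rules and productions, with the tuple construction happening in each production's action.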



Future:


Despite this being the 'final post', there are a few things we're still working on.

Segmentation will probably not be modified this late.

Extraction isn't tested completely or integrated into the parser yet.

Classification has a few small adjustments to be tested.

Parsing can't handle all subscripts/superscripts yet; it fails in some edge cases, such as when multiple variables in a row each have both superscripts and subscripts.

What I've Learned/Would Do Differently:

I took this class at the same time as CSE 141/141L. 141L is another class with a major project, and in the first few weeks I had trouble balancing the two. Time management is very important, and I think if I had planned more of the quarter earlier I could have gotten further.

I regret not getting segmentation working for the '=' sign!

I regret never using Mechanical Turk to generate a dataset.

I've learned a lot about SVMs and about working on a large, open-ended project.

Scope and focus have been recurring themes. As the quarter progressed, I had to adjust the scope of what I wanted to accomplish while attempting to focus on the parts that were most essential.


Conclusion:

Thanks to everyone in class for their helpful comments and advice.

This post hasn't had any pictures, so please enjoy this picture of superman programming:


Pictured: Computer Vision





Monday, June 4, 2012

Parsing update

Accuracy with only 3 binary training types




All classes binary



Accuracy didn't increase.

Parsing

My parser now creates a syntax tree as a list of tuples. 

The form it outputs is like this:



('EXP', 
   [('QUANT', ' \\forall ', ' x', 

      ('BINOP', ' \\in', ('VAR', ' x'),

          ('BINOP', ' \\leftrightarrow', ('VAR', ' y'), 

              ('BINOP', ' \\subset', ('VAR', ' x'), ('VAR', ' y')
              )
           )
       )
    )]
)



This works for correctly classified examples from first-order logic.
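As a sketch of the "print out LaTeX" TODO item below, flattening such a tuple tree back into a LaTeX string could be a recursive walk like this; `to_latex` is a hypothetical helper, not part of the project code, and the parenthesization scheme is a guess:

```python
def to_latex(node):
    """Recursively flatten a tuple syntax tree into a LaTeX string.

    Node shapes follow the output above: ('EXP', [children]),
    ('QUANT', op, var, body), ('BINOP', op, left, right), ('VAR', name).
    """
    kind = node[0]
    if kind == 'EXP':
        return ' '.join(to_latex(child) for child in node[1])
    if kind == 'QUANT':      # e.g. \forall x ( ... )
        _, op, var, body = node
        return '%s %s ( %s )' % (op.strip(), var.strip(), to_latex(body))
    if kind == 'BINOP':      # infix: left op right
        _, op, left, right = node
        return '%s %s %s' % (to_latex(left), op.strip(), to_latex(right))
    if kind == 'VAR':
        return node[1].strip()
    raise ValueError('unknown node type: %r' % kind)
```

Applied to the tree printed above, this yields `\forall x ( x \in y \leftrightarrow x \subset y )`.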


TODO:

Parser:
   1) Add rules with positional information for {super,sub}scripts
   2) Figure out how to walk the tree and print out LaTeX
   3) Add dummy rules/node types for when the classifier makes mistakes
   4) Figure out how to add back-off rules to generate more possible trees,
          and figure out how to select the most probable one

Classifier:
    Add sigma and integral signs
    cross-validation!!!
    play with different settings.

Extraction:
    Output positional information