Wednesday, June 6, 2012

Closing Remarks

This is the closing blog post! The quarter has gone fast!

All the code is, and will continue to be, available here.

I'm going to write a bit about previous work (since I never wrote a post about that), list some papers I found helpful, go over each part of the pipeline, and conclude with what I learned and what I would do differently if I had a paradox-free time machine.



Previous Work:




The abstract for our project proposal was:


While handwriting provides an efficient means to write mathematical symbols quickly, it is a poor medium for rapid exchange and editing of documents. Meanwhile, advanced typesetting systems like LaTeX and MathML have provided an environment where mathematical symbols can be typeset with precision, but at the cost of typing time and a steep learning curve. In order to facilitate the exchange, preservation and ease of editing of mathematical documents, we propose a method of offline handwritten equation recognition. Our system takes a handwritten document, for example a student's calculus homework, then partitions, classifies and parses the document into LaTeX.



The following two links show similar, but online, versions of what we were trying to accomplish:

http://webdemo.visionobjects.com/equation.html
http://detexify.kirelabs.org/classify.html





In reading through papers, I found that this project was highly ambitious. Survey papers repeatedly mentioned that classification and parsing/verification were mostly tackled as separate research problems.

While there is the standard MNIST dataset for digits, there didn't seem to be a comparably standard dataset for handwritten mathematical symbols. The closest thing is the dataset from the Infty project. It was used in some of the papers we read while researching the project; unfortunately, we never got the dataset working.





Papers That Helped the Most:


Entire project:
This was the paper we read when beginning the project:
http://www.eecs.ucf.edu/courses/cap6938/fall2008/penui/readings/matsakis-MEng-99.pdf



Classification/Feature Descriptors:
http://hal.archives-ouvertes.fr/docs/00/54/85/12/PDF/hog_cvpr2005.pdf
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
http://eprints.pascal-network.org/archive/00004062/01/ShalevSiSr07.pdf
http://eprints.pascal-network.org/archive/00003046/01/bosch07a.pdf

Verification/Parsing:
http://www.springerlink.com/content/6217472316636n13/fulltext.pdf?MUD=MP


Synthetic data (something we considered but abandoned):
http://www.stanford.edu/~acoates/papers/coatesetal_icdar_2011.pdf


Meta-links (These links had links to useful papers):
http://www.inftyproject.org/en/articles_ocr.html
http://www.handwritten.net/mv/papers/ 




My Work:


Segmentation:

There were two schemes attempted for segmentation, based on a pair of Stack Overflow posts. Both are based on finding connected components.

This caused a major problem: the segmenter couldn't detect symbols composed of multiple disconnected components, most egregiously '='.

This was annoying, but in the interest of finishing on time we postponed fixing it, with a sliding-window approach to symbol detection as a possible future fix.
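To illustrate the connected-components idea (and its '=' failure mode), here is a minimal pure-Python sketch using 4-connectivity and a BFS flood fill; this is an illustration of the general technique, not the actual code we used:

```python
from collections import deque

def connected_components(img):
    """Find 4-connected foreground components in a binary image.

    img: list of rows of 0/1 values. Returns one bounding box
    (min_row, min_col, max_row, max_col) per component, in scan order.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                # BFS flood fill to collect this component's extent
                q = deque([(r, c)])
                seen[r][c] = True
                rmin = rmax = r
                cmin = cmax = c
                while q:
                    y, x = q.popleft()
                    rmin, rmax = min(rmin, y), max(rmax, y)
                    cmin, cmax = min(cmin, x), max(cmax, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((rmin, cmin, rmax, cmax))
    return boxes
```

Running this on an '='-shaped image (two horizontal bars) returns two bounding boxes instead of one, which is exactly the problem described above.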


Extraction:

Once the segmenter found the bounding boxes, we had to extract the symbols. This involved embedding each symbol in a new matrix and outputting positional information about how each character related to the previous ones. By calculating the centroid of each bounding box, we were able to output this information to help with parsing.
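The centroid bookkeeping can be sketched as follows; the quarter-height threshold for deciding superscript vs. subscript is a made-up illustration (a real system would need to tune it), and row coordinates follow the usual image convention of increasing downward:

```python
def centroid(box):
    """Centroid of a bounding box given as (min_row, min_col, max_row, max_col)."""
    rmin, cmin, rmax, cmax = box
    return ((rmin + rmax) / 2.0, (cmin + cmax) / 2.0)

def relative_position(prev_box, box):
    """Label `box` relative to `prev_box` by comparing centroid heights.

    The quarter-height threshold is arbitrary, chosen only for illustration.
    """
    prev_y, _ = centroid(prev_box)
    y, _ = centroid(box)
    threshold = (prev_box[2] - prev_box[0] + 1) / 4.0
    if y < prev_y - threshold:
        return 'SUPER'     # noticeably above the previous symbol
    if y > prev_y + threshold:
        return 'SUB'       # noticeably below it
    return 'BASELINE'
```

Positional labels like these are the kind of hint a parser can use to attach superscripts and subscripts.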


Classification:


We spent the first few weeks toying with logistic regression. After we implemented cross-validation, we found that the classifier performed abysmally.

Later, we switched to SVMs and used VLFeat. By modifying its example code for object recognition, we were able to increase accuracy tremendously.

The details of this switch are described in this earlier post.
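One of the descriptor papers linked above is the HOG paper. As a rough, dependency-free illustration of what an orientation-histogram descriptor computes, here is a toy Python sketch; real HOG (and VLFeat's implementation) divides the image into cells and normalizes over blocks, which this deliberately omits:

```python
import math

def orientation_histogram(img, bins=9):
    """Toy HOG-style descriptor: one histogram of gradient orientations
    over the whole image, weighted by gradient magnitude and L1-normalized.
    """
    rows, cols = len(img), len(img[0])
    hist = [0.0] * bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # central-difference gradients
            gy = img[r + 1][c] - img[r - 1][c]
            gx = img[r][c + 1] - img[r][c - 1]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # unsigned orientation folded into [0, pi)
            theta = math.atan2(gy, gx) % math.pi
            hist[int(theta / math.pi * bins) % bins] += mag
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

On an image containing a single vertical edge, all the gradient energy lands in the horizontal-orientation bin, which is the behavior that makes such descriptors useful for distinguishing stroke directions in symbols.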

Parsing:

Parsing/Verification ended up being an interesting project in itself. After a week or two of hacky false starts, I was able to put together something with limited functionality using PLY, a Python implementation of lex/yacc.
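The PLY grammar itself isn't shown in these posts, so as a dependency-free stand-in, here is a tiny recursive-descent sketch of the same idea: variables chained by binary operators, parsed right-associatively into nested tuples. The token set and tuple shapes are illustrative assumptions, not the project's actual grammar:

```python
BINOPS = {'\\in', '\\subset', '\\leftrightarrow'}

def parse(tokens):
    """Parse a token list like ['x', '\\in', 'y'] into a tuple syntax tree."""
    tree, rest = _exp(tokens)
    if rest:
        raise SyntaxError('trailing tokens: %r' % rest)
    return tree

def _exp(tokens):
    """EXP := VAR | VAR BINOP EXP  (right-associative chain)."""
    if not tokens:
        raise SyntaxError('unexpected end of input')
    left = ('VAR', tokens[0])
    rest = tokens[1:]
    if rest and rest[0] in BINOPS:
        op, rest = rest[0], rest[1:]
        right, rest = _exp(rest)
        return ('BINOP', op, left, right), rest
    return left, rest
```

A real lex/yacc grammar expresses the same structure declaratively as token rules and productions, with the tuple construction happening in each production's action.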



Future:


Despite this being the 'final post', there are a few things we're still working on.

Segmentation will probably not be modified this late.

Extraction isn't tested completely or integrated into the parser yet.

Classification has a few small adjustments to be tested.

Parsing can't handle all subscripts/superscripts yet; it fails in some edge cases, such as when multiple variables in a row each have both superscripts and subscripts.

What I've Learned/Would Do Differently:

I took this class at the same time as CSE 141/141L. 141L is another class with a major project, and in the first few weeks I had trouble balancing the two. Time management is very important, and I think if I had planned more of the quarter earlier I could have gotten further.

I regret not getting segmentation working for the '=' sign!

I regret never using Mechanical Turk to generate a dataset.

I've learned a lot about SVMs and about working on a large, open-ended project.

Scope and focus have been recurring themes. As the quarter progressed, I had to adjust the scope of what I wanted to accomplish while attempting to focus on the parts that were most essential.


Conclusion:

Thanks to everyone in class for their helpful comments and advice.

This post hasn't had any pictures, so please enjoy this picture of superman programming:


Pictured: Computer Vision





Monday, June 4, 2012

Parsing update

Accuracy with only 3 binary training types




All classes binary



Accuracy didn't increase.

Parsing

My parser now creates a syntax tree as a list of tuples. 

The form it outputs is like this:



('EXP', 
   [('QUANT', ' \\forall ', ' x', 

      ('BINOP', ' \\in', ('VAR', ' x'),

          ('BINOP', ' \\leftrightarrow', ('VAR', ' y'), 

              ('BINOP', ' \\subset', ('VAR', ' x'), ('VAR', ' y')
              )
           )
       )
    )]
)



This works for correctly classified examples from first-order logic.
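As a sketch of the "print out LaTeX" TODO item below, flattening such a tuple tree back into a LaTeX string could be a recursive walk like this; `to_latex` is a hypothetical helper, not part of the project code, and the parenthesization scheme is a guess:

```python
def to_latex(node):
    """Recursively flatten a tuple syntax tree into a LaTeX string.

    Node shapes follow the output above: ('EXP', [children]),
    ('QUANT', op, var, body), ('BINOP', op, left, right), ('VAR', name).
    """
    kind = node[0]
    if kind == 'EXP':
        return ' '.join(to_latex(child) for child in node[1])
    if kind == 'QUANT':      # e.g. \forall x ( ... )
        _, op, var, body = node
        return '%s %s ( %s )' % (op.strip(), var.strip(), to_latex(body))
    if kind == 'BINOP':      # infix: left op right
        _, op, left, right = node
        return '%s %s %s' % (to_latex(left), op.strip(), to_latex(right))
    if kind == 'VAR':
        return node[1].strip()
    raise ValueError('unknown node type: %r' % kind)
```

Applied to the tree printed above, this yields `\forall x ( x \in y \leftrightarrow x \subset y )`.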


TODO:

Parser:
   1) Add rules with positional information for {super,sub}scripts
   2) Figure out how to walk the tree and print out LaTeX
   3) Add dummy rules/node types for when the classifier makes mistakes
   4) Figure out how to add back-off rules to generate more possible trees,
          and figure out how to select the most probable one

Classifier:
    Add sigma and integral signs
    cross-validation!!!
    play with different settings.

Extraction:
    Output positional information