All the code is and will continue to be available here
I'm going to write a bit about previous work (because I never wrote a post about that), list some papers I found helpful, go over each part of the pipeline, and conclude with what I learned and what I would do differently if I had a paradox-free time machine.
Previous Work:
The abstract for our project proposal was:
While handwriting provides an efficient means to write mathematical symbols quickly, it is a poor medium for rapid exchange and editing of documents. Meanwhile, advanced typesetting systems like LaTeX and MathML provide an environment where mathematical symbols can be typeset with precision, but at the cost of typing time and a steep learning curve. In order to facilitate the exchange, preservation, and ease of editing of mathematical documents, we propose a method of offline handwritten equation recognition. Our system takes a handwritten document, for example a student's calculus homework, then partitions, classifies, and parses the document into LaTeX.
The following two links show similar, online versions of what we were trying to accomplish:
http://webdemo.visionobjects.com/equation.html
http://detexify.kirelabs.org/classify.html
In reading through papers, I found that this project was highly ambitious. Survey papers repeatedly mentioned that symbol classification and parsing/verification were mostly tackled as separate lines of research.
While there is the standard MNIST dataset for digits, there didn't seem to be a comparably standard dataset for handwritten mathematical symbols. The closest thing is the Infty project's dataset, which was used in some of the papers we read while researching the project; unfortunately, we never got the dataset to work.
Papers That Helped the Most:
Entire project:
This was the paper we read for beginning the project:
http://www.eecs.ucf.edu/courses/cap6938/fall2008/penui/readings/matsakis-MEng-99.pdf
Classification/Feature Descriptors
http://hal.archives-ouvertes.fr/docs/00/54/85/12/PDF/hog_cvpr2005.pdf
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
http://eprints.pascal-network.org/archive/00004062/01/ShalevSiSr07.pdf
http://eprints.pascal-network.org/archive/00003046/01/bosch07a.pdf
Verification/Parsing:
http://www.springerlink.com/content/6217472316636n13/fulltext.pdf?MUD=MP
Synthetic data (something we considered but abandoned):
http://www.stanford.edu/~acoates/papers/coatesetal_icdar_2011.pdf
Meta-links (These links had links to useful papers):
http://www.inftyproject.org/en/articles_ocr.html
http://www.handwritten.net/mv/papers/
My Work:
Segmentation:
We attempted two schemes for segmentation. The two methods were based on the following two Stack Overflow posts:
Both are based on finding connected components.
This caused a major problem: connected-component analysis can't detect symbols drawn with multiple disconnected strokes, most egregiously '='.
This was annoying, but in the interest of finishing on time we postponed fixing it, possibly by implementing a sliding-window approach to symbol detection.
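To make the connected-component approach concrete, here is a minimal sketch in Python (our actual code was not in Python; this uses `scipy.ndimage` as a stand-in, and the `overlap_frac` parameter and the merging step are my own assumptions, not part of the original pipeline). It also shows one common fix for the '=' problem: merging components whose horizontal spans overlap.

```python
import numpy as np
from scipy import ndimage

def segment_symbols(binary_img, overlap_frac=0.5):
    """Segment a binarized page into symbol bounding boxes (x0, x1, y0, y1).

    Plain connected components split multi-stroke symbols like '=', so
    components whose horizontal spans overlap are merged into one box.
    overlap_frac is an assumed tuning parameter, not from the original post.
    """
    labels, _ = ndimage.label(binary_img)
    boxes = [  # one (x0, x1, y0, y1) box per connected component
        (sl[1].start, sl[1].stop, sl[0].start, sl[0].stop)
        for sl in ndimage.find_objects(labels)
    ]
    boxes.sort()  # left to right
    merged = []
    for box in boxes:
        if merged and _x_overlap(merged[-1], box) >= overlap_frac:
            merged[-1] = _union(merged[-1], box)  # e.g. the two bars of '='
        else:
            merged.append(box)
    return merged

def _x_overlap(a, b):
    # Horizontal intersection as a fraction of the narrower box's width.
    inter = min(a[1], b[1]) - max(a[0], b[0])
    return inter / max(1, min(a[1] - a[0], b[1] - b[0]))

def _union(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))
```

With this merging step, the two bars of an '=' come back as a single box, while a vertical stroke elsewhere on the page stays separate; a sliding-window detector would instead sidestep connected components entirely.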
Extraction:
Classification:
We spent the first few weeks toying with logistic regression. After we implemented cross-validation, we found the classifier worked abysmally.
Later, we switched to SVMs and used VLFeat. By modifying its example object-recognition code, we were able to increase accuracy tremendously.
The details of this switch are described in this earlier post.
Parsing:
Future:
Despite this being the 'final post', there are a few things we're still working on.
Segmentation will probably not be modified this late.
Extraction isn't tested completely or integrated into the parser yet.
Classification has a few small adjustments to be tested.
Parsing can't handle all subscripts/superscripts yet; it fails in edge cases such as when several consecutive variables each carry both a superscript and a subscript.
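For the script-placement part of parsing, the core decision is geometric. Here is a hypothetical per-pair heuristic (not the parser we actually wrote): classify the next symbol as subscript, superscript, or inline by comparing vertical centers, scaled by the base symbol's height.

```python
def script_relation(base, nxt, thresh=0.25):
    """Classify nxt as 'sub', 'sup', or 'inline' relative to base.

    Boxes are (x0, x1, y0, y1) with y increasing downward. thresh is an
    assumed tuning parameter: how far (as a fraction of the base's height)
    the next symbol's center must sit above or below the base's center.
    """
    base_cy = (base[2] + base[3]) / 2
    nxt_cy = (nxt[2] + nxt[3]) / 2
    height = base[3] - base[2]
    offset = (nxt_cy - base_cy) / height
    if offset > thresh:
        return "sub"   # lower on the page -> subscript
    if offset < -thresh:
        return "sup"   # higher on the page -> superscript
    return "inline"
```

A pairwise rule like this is exactly where the edge case above bites: with consecutive variables each carrying both scripts, the parser must also track which symbol is the current base, or a superscript gets attached to the preceding subscript instead of the variable.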
What I've Learned/Would Do Differently:
I took this class at the same time as CSE 141/141L. 141L is another class with a major project, and in the first few weeks I had trouble balancing the two. Time management is very important, and I think if I had planned more of the quarter earlier, I could have gotten further.
I regret not getting segmentation working for the '=' sign!
I regret never using Mechanical Turk to generate a dataset.
I've learned a lot about SVMs and about working on a large, open-ended project.
Scope and focus have been recurring themes. As the quarter progressed, I had to adjust the scope of what I wanted to accomplish while attempting to focus on the parts that were most essential.
Conclusion:
Thanks to everyone in class for their helpful comments and advice.
This post hasn't had any pictures, so please enjoy this picture of superman programming:
Pictured: Computer Vision