We've completed adding digits to the dataset.
There is an inconsistency in our dataset: a few of the symbols were made binary (pure black and white) while the rest were not. We're going to test whether binarizing helps classification by making all the data binary and retraining the classifier.
If accuracy improves, we'll make binary thresholding part of pre-processing. If it doesn't help, we'll revert the data and re-add the now-binary images in non-binary form.
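A minimal sketch of what a thresholding step in pre-processing could look like. It assumes grayscale images stored as nested lists of 0 to 255 integers and a fixed threshold of 128; both are assumptions, since the post doesn't say how the existing binary symbols were produced. A per-image method such as Otsu's could choose the threshold instead.

```python
# Hypothetical fixed threshold; 128 is an assumption, not a tuned value.
THRESHOLD = 128


def binarize(image, threshold=THRESHOLD):
    """Map a grayscale image (rows of 0-255 ints) to 0/255 pixels.

    Pixels at or above the threshold become 255; all others become 0.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in image]


def is_binary(image):
    """Return True if every pixel is already 0 or 255."""
    return all(px in (0, 255) for row in image for px in row)


def preprocess(dataset, threshold=THRESHOLD):
    """Binarize every (image, label) pair so the whole dataset is consistent."""
    return [(binarize(img, threshold), label) for img, label in dataset]
```

For example, `binarize([[12, 200], [130, 90]])` gives `[[0, 255], [255, 0]]`. Because 0 and 255 map to themselves, running the step over the whole dataset leaves the already-binary symbols unchanged, so the mixed data becomes uniform without special-casing.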
_______________________________________________________________________________
I've begun work on the parser. Since I haven't taken compilers yet, it's been hard to get started, but I've been exploring the problem space.
I've started to play with PLY, a Python implementation of lex and yacc. Using it, I've made a hacky prototype that can substitute some of the characters.
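PLY builds its lexer from regular-expression token rules, but the substitution step on its own is essentially a table lookup. Here is a plain-Python sketch of that step (not PLY code), assuming the classifier emits one integer code per symbol. The table is hypothetical: only 0 -> \forall appears in this post, and 5 -> x and 6 -> y are read off the superscript example below.

```python
# Hypothetical code table: only 0 -> \forall comes from the post;
# 5 -> x and 6 -> y are inferred from the x^{y} example.
CODE_TO_LATEX = {
    0: r"\forall",
    5: "x",
    6: "y",
}


def substitute(codes):
    """Translate a sequence of classifier codes into LaTeX tokens.

    Raises KeyError on an unknown code, so gaps in the table surface early.
    """
    return [CODE_TO_LATEX[c] for c in codes]


def to_latex(codes):
    """Join tokens with spaces so control words like \\forall stay separated."""
    return " ".join(substitute(codes))
```

Joining with spaces matters: `\forallx` would be read by LaTeX as one unknown command, while `\forall x` is what we want.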
The parser has three overlapping goals:
1) Substituting the classifier's coded output with the corresponding LaTeX:
0 -> \forall
2) Using the CFG and positional data to handle subscripts and superscripts:
56 with positional data "6 is to the upper right of 5" -> x^{y}
3) When the parser encounters a syntax error, it should "back off" to the classifier's soft results (i.e., the top 3 classes the symbol could be) and pick a symbol that fits the CFG.
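The positional half of goal 2 can be sketched with bounding boxes. This is a heuristic stand-in, not a CFG: it assumes each symbol arrives with a bounding box in image coordinates (y grows downward), and the "top third / bottom third" rule, the `Symbol` type, and the code table are all hypothetical. A grammar would still need to check the result.

```python
from dataclasses import dataclass

# Hypothetical code table (0 -> \forall from the post; 5, 6 inferred).
CODE_TO_LATEX = {0: r"\forall", 5: "x", 6: "y"}


@dataclass
class Symbol:
    code: int
    x: float  # left edge
    y: float  # top edge (image coordinates: y grows downward)
    w: float
    h: float

    @property
    def center_y(self):
        return self.y + self.h / 2


def relation(base, other):
    """Hypothetical rule: `other` is a superscript if its vertical center is
    above the base's top third, a subscript if below its bottom third,
    and inline otherwise."""
    if other.center_y < base.y + base.h / 3:
        return "sup"
    if other.center_y > base.y + 2 * base.h / 3:
        return "sub"
    return "inline"


def layout_to_latex(symbols):
    """Left-to-right pass that attaches scripts to the latest baseline symbol."""
    out = []     # one LaTeX chunk per baseline symbol, scripts appended to it
    base = None  # most recent baseline symbol
    for s in sorted(symbols, key=lambda sym: sym.x):
        tex = CODE_TO_LATEX[s.code]
        rel = "inline" if base is None else relation(base, s)
        if rel == "sup":
            out[-1] += "^{" + tex + "}"
        elif rel == "sub":
            out[-1] += "_{" + tex + "}"
        else:
            out.append(tex)
            base = s
    return " ".join(out)
```

With a 10x10 box for 5 at (0, 10) and a small 6x6 box for 6 at (11, 0), the 6 sits high and to the right, so `layout_to_latex` returns `x^{y}`; moving the 6 down to y = 17 gives `x_{y}`.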
I talked to Professor Jhala, and he gave me some advice. He suggested three steps:
1) Create a parse tree based on the code/position info from the classifier.
2) Evaluate different possible trees by "backing off" some of the classifications.
3) Read off the tree to translate it into LaTeX.
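The three steps above can be sketched end to end under loud assumptions. The grammar is a hypothetical toy (`formula -> \forall VAR formula | expr`, `expr -> VAR (OP VAR)*`), the tree is built from symbol labels only (positions are ignored for brevity), and backing off is done by trying every top-k label combination in order of joint probability. That enumeration is exponential in the number of symbols; a real parser would prune with the grammar as it goes. All function names are illustrative.

```python
import itertools
import math

VARS = {"x", "y", "z"}  # hypothetical terminals for the toy grammar
OPS = {"+", "="}


def parse_formula(tokens, i=0):
    """Recursive descent for: formula -> "\\forall" VAR formula | expr.
    Returns (tree, next_index), or None if the tokens don't fit."""
    if i < len(tokens) and tokens[i] == r"\forall":
        if i + 1 < len(tokens) and tokens[i + 1] in VARS:
            rest = parse_formula(tokens, i + 2)
            if rest is not None:
                body, j = rest
                return ("forall", tokens[i + 1], body), j
        return None
    return parse_expr(tokens, i)


def parse_expr(tokens, i):
    """expr -> VAR (OP VAR)*, built as a left-leaning tree."""
    if i >= len(tokens) or tokens[i] not in VARS:
        return None
    tree = ("var", tokens[i])
    i += 1
    while i + 1 < len(tokens) and tokens[i] in OPS and tokens[i + 1] in VARS:
        tree = ("op", tokens[i], tree, ("var", tokens[i + 1]))
        i += 2
    return tree, i


def parse(tokens):
    """Step 1: build a tree, or None unless every token is consumed."""
    result = parse_formula(tokens)
    if result is None or result[1] != len(tokens):
        return None
    return result[0]


def back_off(candidates):
    """Step 2: `candidates` holds, per symbol, (latex, prob) pairs such as the
    classifier's top 3. Try combinations from most to least probable and
    return (tree, labels, log_prob) for the first one that parses."""
    combos = []
    for combo in itertools.product(*candidates):
        labels = [latex for latex, _ in combo]
        log_p = sum(math.log(p) for _, p in combo)
        combos.append((log_p, labels))
    combos.sort(key=lambda c: c[0], reverse=True)
    for log_p, labels in combos:
        tree = parse(labels)
        if tree is not None:
            return tree, labels, log_p
    return None


def read_off(tree):
    """Step 3: walk the tree and emit LaTeX."""
    kind = tree[0]
    if kind == "var":
        return tree[1]
    if kind == "op":
        _, op, left, right = tree
        return f"{read_off(left)} {op} {read_off(right)}"
    if kind == "forall":
        _, var, body = tree
        return rf"\forall {var} {read_off(body)}"
    raise ValueError(f"unknown node {kind!r}")
```

For instance, with top-3 lists `[(r"\forall", 0.6), ("A", 0.3), ("V", 0.1)]`, `[("+", 0.5), ("x", 0.4), ("t", 0.1)]`, and `[("y", 0.7), ("=", 0.2), ("g", 0.1)]`, the most probable reading `\forall + y` fails to parse, so the sketch backs off the second symbol from `+` to `x` and reads off `\forall x y`.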
He also pointed me to the following links:
An OCaml parsing homework assignment for CSE130
Happy, a parser generator for Haskell
A blog post about using Happy to write a compiler, which might be useful