Sunday, April 8, 2012

Preprocessing/Dataset progress


I am splitting up the update post into 4 posts by content:

First, this one with preprocessing / dataset stuff.
Next, classifier stuff.
Then, localization/bounding box stuff.
Finally, a brief TODO for the week and notes on version control.

This is for ease of reading/accessing posts by content in the future.

We have begun to play with our toy dataset!

The first task was removing the black grid lines so our samples wouldn’t have a bunch of extraneous noise. Oren wrote a script that did a fantastic job of erasing these lines:

Here is an example of an original dataset image:

The result of the preprocessing script:


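For readers curious how this kind of cleanup can work, here is a minimal sketch of one possible approach. It is not Oren's actual script, whose details are not shown in this post. It assumes the image is a grayscale grid of values (0 = black, 255 = white) and that a grid line is any row or column that is almost entirely dark. The function name `remove_grid_lines` and the parameters `dark_threshold`, `line_fraction`, and `fill` are made up for illustration.

```python
def remove_grid_lines(image, dark_threshold=60, line_fraction=0.8, fill=255):
    """Return a copy of `image` with near-solid dark rows and columns whitened.

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    All names and default values here are illustrative assumptions.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    def is_dark(value):
        return value <= dark_threshold

    # A row or column counts as a grid line when most of its pixels are dark.
    grid_rows = {
        y for y in range(height)
        if sum(is_dark(v) for v in image[y]) >= line_fraction * width
    }
    grid_cols = {
        x for x in range(width)
        if sum(is_dark(image[y][x]) for y in range(height)) >= line_fraction * height
    }

    # Work on a copy so the original image is left untouched.
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if y in grid_rows or x in grid_cols:
                cleaned[y][x] = fill
    return cleaned
```

One caveat with this simple version: wherever a symbol stroke crosses a grid line, those pixels get erased along with the line, so a real script would likely need to patch those crossings back in.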
We would like to thank Jeanne Wang for bringing the inftyMDB dataset to our attention. We have downloaded it, and are currently trying to figure out how to use it.
 


The inftyMDB dataset can be found here

In our proposal we planned to use Mechanical Turk to generate a dataset by the end of the first week. However, I feel we need a few more iterations of toy datasets, and some time working with the inftyMDB dataset, before we generate a final dataset.

Current dataset goals include:

Figure out how to use inftyMDB
Possibly make toydataset2
Finalize the planned range of math symbols for the dataset
