
How can I extract handwritten text from lined paper, without the noise caused by the lines, for use in a text detection algorithm?

I have been using a single test sheet of paper while learning OpenCV. So far I have taken an image like this: [image]

I identify the corners of the page and perform a perspective transform. Then I subtract a mean-shift-filtered version of the image from the original: [image]

This gives me a white page with very little shadow: [image]

This fixed the problems I was having with adaptive thresholding. The paper can be any size and in any lighting, so I think this step cleaned up a lot of the problems I was having.

The big problem I have now is that some images contain the ruled lines of the paper, which cause a lot of noise, as shown here: [image]

I think I can get rid of all the other noise easily, but when a lot of the letters on the page are very small (written in between the lines on the page), it becomes really hard to separate everything. In an ideal world I would like a blank background containing all of the written letters and symbols, with none of the noise and lines. I do not know if this is possible, but if anyone has an idea of how I can get closer to that, I would be extremely grateful. The letters will always have whitespace separating them from one another, so a complex text recognition algorithm would not be needed if I manage to remove all the noise. Thanks for your time!