Handwritten digit recognition from table
Hi.
The big picture of my project is that I have a paper with multiple tables, and I need to recognize those tables and all of their cells. I did this with no problem.
Then, on one of the tables, there are 2 choices you can make by writing a number from 1 to 4 in a box above the table. I managed to locate the box containing the digit, as shown in the attachment (I used thresholding, Canny edge detection and contours to find the boxes).
Now I need to recognize the digit in that box. I tried training some models on the MNIST dataset, but I don't know how to preprocess my images (the boxes) so that such a model can predict the digit reliably.
Any idea of how I can approach this problem?
Thanks
EDIT: For anyone wondering how I managed to do it: I took the box with the digit inside it. My goal was to make it look as much like an MNIST digit as possible, so I had to get rid of the border. For the border I used two structuring elements to build a mask, which I then applied over the initial image.
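import cv2
import numpy as np

# Binarize: paper becomes white (255), ink and box border become black (0)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)[1]
inv = 255 - thresh  # ink is now white, which is what erode/dilate act on

# Keep only long horizontal strokes (top and bottom of the box)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (100, 10))
horizontal_img = cv2.erode(inv, kernel, iterations=1)
horizontal_img = cv2.dilate(horizontal_img, kernel, iterations=2)

# Keep only long vertical strokes (left and right sides of the box)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10, 100))
vertical_img = cv2.erode(inv, kernel, iterations=1)
vertical_img = cv2.dilate(vertical_img, kernel, iterations=3)

# Combine with a bitwise OR; uint8 `+` wraps around where the lines overlap (255 + 255 = 254)
mask_img = cv2.bitwise_or(horizontal_img, vertical_img)
# Paint the masked border pixels white in the thresholded image
no_border = np.bitwise_or(thresh, mask_img)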
Then I overwrote every pixel within a fixed margin of the image edge with the background value, because a bit of noise remained there in some cases.
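# `image` must be single-channel here (e.g. the no_border result above)
rows, cols = image.shape  # NumPy shape is (height, width)
margin = 20
image[:margin, :] = 255   # top strip
image[-margin:, :] = 255  # bottom strip
image[:, :margin] = 255   # left strip
image[:, -margin:] = 255  # right strip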
The last thing I did was train a model on the MNIST dataset, keeping only the digits 1, 2, 3 and 4 (the only ones I need). The model is a simple linear SVC on HOG features of each image, and it reaches about 99% accuracy on the MNIST test data.
Unfortunately, the approach above reaches only about 87% accuracy on the 149 test papers I have. The mask is also a problem: it sometimes breaks a 1 by detecting a line where there isn't one.
btw, you probably want to retrain your MNIST model on the 4 digits only