Hi,
I have some scanned PDF documents, all A4 size, which I am converting to JPEG.
Some of these documents contain handwritten signatures, and I want to identify which ones do.
Approach 1:
I tried the following steps:
1. Applied binary thresholding.
2. Calculated an "A4 constant", since my PDFs are A4 size (I got this idea from a user's code on GitHub).
3. Used morphology with this constant to remove anything smaller than the calculated A4 constant.
4. Applied an inverse threshold with Otsu (cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU).
But my results are very poor: too many false positives, signatures only partially extracted, and other text extracted that is not handwritten.
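To make the inverse Otsu threshold in the last step concrete, here is a minimal sketch of what I understand it to compute, written in plain Python so it runs without OpenCV. The function names `otsu_threshold` and `binarize_inv` and the tiny 2x4 "image" are my own illustration, not taken from the GitHub code. The real pipeline would call cv2.threshold on the full page instead.

```python
def otsu_threshold(pixels):
    """Return the grey level t that maximises between-class variance.

    pixels: flat list of 8-bit grey values (0..255).
    Values <= t form the dark class, values > t the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of grey values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break             # bright class empty, nothing left to split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_inv(image, t):
    """Inverse binary: dark ink (<= t) becomes 1, bright paper becomes 0."""
    return [[1 if p <= t else 0 for p in row] for row in image]


# Toy grey image: values near 20-30 stand for ink, 220-230 for paper.
image = [
    [230, 230, 20, 230],
    [220, 30, 30, 220],
]
t = otsu_threshold([p for row in image for p in row])
binary = binarize_inv(image, t)
print(t, binary)
```

One thing this makes visible: Otsu only separates dark from bright, so typed text and signature ink both end up as foreground. Telling them apart has to happen in a later step.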
Approach 2:
I have not tried this yet, but it may work.
I could use Keras to train a model on images that contain signatures, but this is not preferred for now (last-resort approach).
Approach 3:
- Converted the image to binary.
- Used histograms, since as I understand it the intensity of typed characters differs from that of handwritten signatures. But I cannot work out how to confirm that something is a signature, or at least that there is a signature somewhere in the image.
Note: I think I have to use contours or some kind of bounding boxes, but I am not sure how to do this.
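For the bounding-box idea, this is roughly what I have in mind, as a sketch only. It labels connected blobs of foreground pixels in a binary image and records each blob's bounding box, pixel area, and fill ratio (area divided by box area). It is plain Python so it runs anywhere. With OpenCV I assume cv2.connectedComponentsWithStats, or cv2.findContours plus cv2.boundingRect, would give the same kind of boxes. The names `find_blobs` and `signature_candidates` and the thresholds `min_area` and `max_fill` are hypothetical, and the guess that signatures are large, sparse blobs is my assumption, not a verified rule.

```python
from collections import deque


def find_blobs(binary):
    """Return one dict per 8-connected blob of 1-pixels.

    Each dict holds the bounding box (x, y, w, h), the pixel area,
    and the fill ratio area / (w * h).
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, bottom, left, right, area = r, r, c, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            width, height = right - left + 1, bottom - top + 1
            blobs.append({"x": left, "y": top, "w": width, "h": height,
                          "area": area, "fill": area / (width * height)})
    return blobs


def signature_candidates(blobs, min_area, max_fill):
    """Keep large, sparse blobs. Both thresholds are guesses to be tuned."""
    return [b for b in blobs
            if b["area"] >= min_area and b["fill"] <= max_fill]


# Toy page: a solid 2x2 block (dense) and a thin V-shaped stroke (sparse).
page = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
]
blobs = find_blobs(page)
candidates = signature_candidates(blobs, min_area=5, max_fill=0.5)
print(candidates)
```

On a real scan the area threshold would have to scale with the page resolution, which I think is what the A4 constant is for.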
Conclusion:
All of the above approaches have failed (except approach 2, which I haven't tried yet).
I am also thinking of using KNN, but I need a starting point for it.
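The only KNN starting point I can come up with is the sketch below: describe each blob by a couple of numbers and let the k nearest labelled examples vote. Everything here is an assumption on my side. The two features (aspect ratio and fill ratio), the function name `knn_predict`, and the training values are made up purely for illustration and are not measured from my documents.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (feature_tuple, label) pairs.
    query: feature tuple of the same length.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up samples: (aspect ratio w/h, fill ratio) for each blob.
train = [
    ((0.6, 0.55), "typed"),
    ((0.8, 0.60), "typed"),
    ((0.7, 0.50), "typed"),
    ((3.5, 0.15), "signature"),
    ((4.2, 0.20), "signature"),
    ((2.8, 0.18), "signature"),
]

print(knn_predict(train, (3.0, 0.20)))
print(knn_predict(train, (0.65, 0.50)))
```

With real data the features would probably need scaling to a common range first, otherwise the one with the largest values dominates the distance.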
Any ideas or suggestions are much appreciated.
Thanks, Dinesh.