Information extraction from drug packaging
Hello guys, I have a project that needs to extract information from drug packaging using a camera. I need to determine which medicine it is, what the dosage is, and so on. The drug packaging looks like the following:
I tried using Tesseract OCR to recognize the text in the image, but it only extracts the text and cannot analyse it, so I can't get the information I want. Do you have any ideas for me? Thank you in advance!
It is a difficult task if you want to do it for any drug, and I am pretty sure Tesseract alone will not be able to extract all of the text from this box. Try detecting the text regions with a deep network first, then run Tesseract on each detected region to convert it to text, and do some text analysis at the end.
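As a rough illustration of that last "text analysis" step, here is a minimal sketch that pulls dosage-like strings out of OCR output using only Python's standard `re` module. The pattern, the list of units, and the function name `extract_dosages` are assumptions made for this example, not part of Tesseract or OpenCV, and real packaging will need a broader pattern.

```python
import re

# Hypothetical pattern: a number (with "." or "," as decimal separator)
# followed by a common dosage unit. The unit list is an assumption.
DOSAGE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mg|mcg|g|ml|iu)\b", re.IGNORECASE)


def extract_dosages(ocr_text):
    """Return a list of (value, unit) pairs found in the OCR text."""
    results = []
    for value, unit in DOSAGE_RE.findall(ocr_text):
        # Normalise a decimal comma to a dot before converting to float.
        results.append((float(value.replace(",", ".")), unit.lower()))
    return results


# Made-up OCR output, only for demonstration.
sample = "Ibuprofen 400 mg\nFilmtabletten\n20 Tabletten"
print(extract_dosages(sample))
```

This only finds numbers that are directly followed by a unit, so a pack size such as "20 Tabletten" is ignored. It does nothing to correct OCR mistakes such as "4O0 mg".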
Even if you manage to recognize the text, it's quite difficult to "understand" its meaning - especially as it's far from being something standard (like a table, an ID card or a license plate).
Often the packaging text contains very little information, so you can't rely on it to determine the dosage, or even the composition.
Maybe it's an easier task (but much more work) to create a database of all the drug boxes and run a matching algorithm to find the one in the picture.
Hi witek, I have already applied a text detector (EAST) and text recognition (Tesseract) to recognize the text in the picture, but I don't know which library is best suited to analysing the text. Do you know of a library that can analyse it?
Sorry, I know nothing about text analysis, and I would say it is outside the scope of OpenCV. I think that @kbami's idea is more practical here.
Hi witek, thank you for your answer. I will try kbami's approach, but I think it will not be practical when there are a lot of drug boxes. By the way, I have an idea: what if I store the drug information (not pictures of the boxes) in a database, then use a text matching algorithm to look up the recognized text in that database, so that I can determine the information for the scanned box? Is that doable, or would it run into problems?
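A minimal sketch of this text-matching idea, using fuzzy matching from Python's standard `difflib` module so that small OCR errors are tolerated. The database contents, the `lookup_drug` name, and the 0.8 cutoff are all hypothetical choices for illustration; a real database would be far larger and would probably need a proper search index.

```python
import difflib

# Hypothetical database: drug name -> stored information.
DRUG_DB = {
    "ibuprofen": {"dosage": "400 mg", "form": "film-coated tablet"},
    "paracetamol": {"dosage": "500 mg", "form": "tablet"},
    "amoxicillin": {"dosage": "250 mg", "form": "capsule"},
}


def lookup_drug(ocr_text, cutoff=0.8):
    """Return (name, info) for the best fuzzy match of any OCR token, or None."""
    names = list(DRUG_DB)
    best = None  # (similarity, name)
    for token in ocr_text.lower().split():
        matches = difflib.get_close_matches(token, names, n=1, cutoff=cutoff)
        if matches:
            score = difflib.SequenceMatcher(None, token, matches[0]).ratio()
            if best is None or score > best[0]:
                best = (score, matches[0])
    if best is None:
        return None
    return best[1], DRUG_DB[best[1]]


# "lbuprofen" imitates a typical OCR confusion between "I" and "l".
print(lookup_drug("lbuprofen 4O0 mg Filmtabletten"))
```

The lookup only works if the drug name itself is recognized reasonably well. If the OCR output for the name is badly garbled, no token reaches the cutoff and the function returns `None`.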
I guess you could do it, but I am certain that successful extraction of the text will be nearly impossible. What is the problem with storing a few pictures of each drug box and using mature image matching algorithms? Surely it will take more space, but that is not a problem nowadays, is it?