Object recognition on OpenCV4Android
I posted this message on Stack Overflow, and since it seems that nobody there knows, I thought I might share it here in the hope of getting some help.
This is the question I posted:
I am trying to build an app that recognizes hand gestures on Android using OpenCV. The reason is that I want to be able to click on augmented reality bubbles from Metaio (in case anyone here has worked with Metaio) using hand gestures. My approach is to track my hand and recognize the gesture. For the detector I tested ORB (which I had almost settled on, since I found nothing better), FAST, BRISK (which for some unknown reason performed poorly: 800 ms, worse than SURF at 200 ms) and SURF; of all of these, only ORB and FAST gave good results. For the descriptor I use either FREAK or ORB. The descriptor timings were good, around 80 ms for FREAK and 35 ms for ORB, so I would say that part is almost done (the final tests are not yet finished).

Considering that part closed, I am worried about the matching stage. In my tests so far with BruteForceMatcher and FlannBasedMatcher the results were not good: on a PC, matching took more than 2-3 seconds (one could argue, in the case of FlannBasedMatcher, that I only trained it with 30-40 different gestures). My questions for you are:
Is it possible to achieve matching on Android in under 100-200 ms, so that the entire process takes less than 300 ms?
If so, what would be the approach? I thought of using either BOW + SVM or an ANN (I must admit I don't have a clear picture of this yet; I have only read this article: http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/).
Any other helpful suggestion would be really appreciated.
About the FLANN timing: change the third parameter, the multi-probe level. OpenCV recommends 2, but in my tests I found that 0, while finding slightly fewer correct matches (not many, maybe 5% fewer), provides a significant speedup in the matching process.
Thanks, I will try that. My approach (this is what I am currently trying) is to use an SVM: train some gestures (make gestures in front of the camera for 5-10 minutes and save all this data into a big file), and then try to predict based on that. I am still wrapping my head around this, since I don't want to waste several hours for nothing (I am not very experienced in this; I only have theoretical experience with machine learning). A second approach is to use BOW and SVM. Meanwhile (i.e. today) I will try to see whether I can speed up the FLANN matching based on your observations. From your experience, which one provides better performance and speed?
After I finish this I will post a comment, and maybe my master's thesis (that is a bit tricky, since it is done with a company and I am not sure how much I can show), so that people who need this for Android can get some help. In any case, the results will be posted.
For LshIndexParams, I found that for my matching project the parameters 20, 15, 0 (number of hash tables, key size, multi-probe level) work best. In your case, do you use epipolar geometry to verify your matching results? Simply finding the nearest neighbours only yields good results if your images are nearly identical. You need additional tests (the ratio of the two nearest neighbours' distances, a symmetry test) and RANSAC with epipolar geometry to filter out bad matches. Code examples for these tests are in the OpenCV cookbook (which can be downloaded for free as a PDF); search for the RobustMatcher class in the book.
Hi, thanks a lot for the help. Yes, I thought about that. After finishing the machine learning part, the next step is dealing with outliers, since my approach is not to run recognition all the time but to track the keypoints in motion (you can check SurfTRAC); therefore I need a way to check the geometry. But I have no clear concept yet.
@Notas: Your configuration (20, 15, 2) proved to be good, thanks a lot. By the way, what exactly is the purpose of the multi-probe level? It seems that after training for more than 3-5 minutes, the prediction time is very fast. On the other hand, the matching is rather poor so far. Thanks again.
The multi-probe level defines how many neighbouring buckets are searched when looking for the nearest neighbours (the search is done using hash tables; the nearest neighbours are the descriptors with the lowest distance to the query descriptor). Of course the matching is poor: simply computing the nearest neighbours isn't enough to get reliable results. For that, you have to use the tests and epipolar geometry I wrote about in my other comment. But there's not much left to do; the code is all in the RobustMatcher class in the OpenCV book.
I meant (20, 15, 0). I am trying all of this code now. I managed to create a sample app to test 4-5 combinations: SURF, ORB and FREAK (plus FLANN training for these combinations). By the way, do you know whether BOW + SVM produces good results and is also very fast, compared with FLANN? I would like to give it a try very soon (like tomorrow).
Sorry, never used BOW/SVM.