
Object recognition on opencv4android

asked 2013-07-22 23:55:48 -0600 by andrei.toader

updated 2013-07-24 08:10:38 -0600

I posted this message on StackOverflow and, since nobody there seems to know the answer, I thought I would share it here in the hope of getting some help.


This is the question as posted there:

I am trying to build an app that recognizes hand gestures on Android using OpenCV. The reason is that I want to be able to click on augmented reality bubbles from Metaio (in case someone here has worked with Metaio) using hand gestures. My approach is to track my hand and recognize the gesture.

As for the detector, I tested ORB (which I have almost settled on, since I didn't find anything better), FAST, BRISK (which for some unknown reason performed poorly: 800 ms, worse than SURF at 200 ms) and SURF; of all of them, only ORB and FAST gave good results. As for the descriptor, I use either FREAK or ORB again. The descriptor timings were good, around 80 ms for FREAK and 35 ms for ORB, so I would say this part is almost done (final tests are not yet finished).

Considering that part closed, I am worried about the matching part. In my tests so far, using BruteForceMatcher and FlannBasedMatcher, the results were not so good: even on a PC, matching needed more than 2-3 seconds (one could argue, in the case of FlannBasedMatcher, that I only trained it with 30-40 different gestures). My questions for you are:

  1. Is it possible to achieve matching on Android in under 100-200 ms, i.e. with the entire process taking less than 300 ms?

  2. If it is possible, what would be the approach? I thought of using either BOW + SVM or an ANN (I must admit I do not have a clear picture yet; I have only read this article: http://www.morethantechnical.com/2011/08/25/a-simple-object-classifier-with-bag-of-words-using-opencv-2-3-w-code/).


Any other helpful suggestion would be really appreciated.


Comments


About the time for FLANN: change the third parameter, the multi-probe level. OpenCV recommends 2, but in my tests I found that 0, while finding slightly fewer correct matches (not many, maybe 5% fewer), provides a significant speedup in the matching process.

Notas ( 2013-07-24 09:12:41 -0600 )

Thanks, I will try that. My approach (this is what I am trying) is to use an SVM: train some gestures (make gestures in front of the camera for 5-10 min and save all this data into a big file), and then try to predict based on that. I am still wrapping my head around this, since I don't want to waste several hours for nothing (I am not so experienced in this; I have only theoretical experience with machine learning). A second approach is to use BOW and SVM; meanwhile, i.e. today, I will try to see if I can speed up the FLANN matching based on your observations. From your experience, which one provides better performance and speed?

andrei.toader ( 2013-07-25 02:05:09 -0600 )

After I finish this I will post a comment, and maybe I can post my master's thesis (it's a bit tricky here, since it's done with a company and I am not sure how much I can show), so that people who need this for Android can get some help. In any case, the results will be posted.

andrei.toader ( 2013-07-25 02:09:22 -0600 )

For LshIndexParams, I found that for my matching project the parameters 20, 15, 0 (number of hash tables, key size, multi-probe level) work best. For your case, do you use epipolar geometry to verify your matching results? Simply finding the nearest neighbours only yields good results if your images are nearly identical. You need some other tests (ratio of the two nearest neighbours, symmetry test) and RANSAC with epipolar geometry to filter out bad matches. Code examples for these tests are in the OpenCV cookbook (which can be downloaded for free as a PDF); search for the RobustMatcher class in the book.

Notas ( 2013-07-25 03:02:06 -0600 )

Hi, thanks a lot for the help. Yes, I thought about that; after finishing the machine learning part, the next thing is how to deal with outliers. Since my approach is not to run recognition all the time but to track the keypoints in motion (you can check SurfTRAC), I need a way to check the geometry. But I have no clear concept yet.

andrei.toader ( 2013-07-25 03:35:09 -0600 )

@Notas: your configuration (20, 15, 2) proved to be good. Thanks a lot. By the way, what exactly is the purpose of the multi-probe level? It seems that after training for more than 3-5 min the prediction time is very fast. On the other hand, the matching is rather poor so far. Thanks again.

andrei.toader ( 2013-07-25 09:17:59 -0600 )

The multi-probe level defines how many neighbouring buckets are searched when finding the nearest neighbours (which is done using hash tables; the nearest neighbours are the descriptors with the lowest distance to the query descriptor). Of course the matching is poor: simply computing the nearest neighbours isn't enough to get reliable results. For that you have to use the tests and the epipolar geometry I wrote about in my other comment. But there's not much work to be done; the code is all in the RobustMatcher class in the OpenCV book.

Notas ( 2013-07-25 09:24:25 -0600 )

I meant 20, 15, 0. I am trying all this code now. I managed to create a sample app to test 4-5 combinations: SURF, ORB and FREAK (plus FLANN training for these combinations). By the way, do you know whether BOW + SVM produces good results and is very fast as well, compared with FLANN? I would like to give it a try very soon (like tomorrow).

andrei.toader ( 2013-07-25 10:22:26 -0600 )

Sorry, never used BOW/SVM.

Notas ( 2013-07-25 10:29:59 -0600 )

1 answer


answered 2013-07-24 06:36:56 -0600 by tenta4

Most feature detection algorithms work better and faster on reduced images. You should try an image resolution of 320x240 or 480x360.

ORB gives worse feature matching than SIFT, but you can filter these matches using RANSAC, and the results will be satisfactory.

If anyone knows a better solution, it would be helpful for me too.


Comments


What is your basis for saying that they work better? If your images have a scale factor, then by using low-res images you lose scale invariance, which means that matching will be worse!

Notas ( 2013-07-24 09:09:48 -0600 )

Let's say that ORB is not the problem; I have read about it, and that can be handled. My main problem is the recognition time. As I already stated, I need speed and performance: I don't want badly recognized features, and I also don't want to see the recognition results much later than expected.

andrei.toader ( 2013-07-25 02:10:19 -0600 )
  1. The SURF and ORB feature detectors are invariant to scaling.
  2. Reducing an image reduces the effect of noise on the image. This is easy to understand when you look at this picture: http://masters.donntu.edu.ua/2012/iii/chigarev/library/images/article1_pic1.png

tenta4 ( 2013-07-25 02:24:43 -0600 )

You don't seem to know how scale invariance works. Look at David Lowe's SIFT: it is achieved via image pyramids. With your image sizes, you can only reliably cover 2 octaves before the images get too small to compute more than 5 or 10 keypoints.

Plus, the algorithms that compute the keypoint descriptors already use blurring to reduce noise, so extra blurring is not necessary at all!

Notas ( 2013-07-25 03:08:22 -0600 )

@tenta4: SURF is patented, and since the project is for a company, I would say no. Plus, my tests show that SURF needs 200 ms to compute the keypoints, and Android is a bit slower, so again I would say no (I have read about both SURF and SIFT in detail). Thanks for the help.

andrei.toader ( 2013-07-25 03:49:42 -0600 )

@Notas: OK, I am not competent on the details of the SIFT implementation; I just said what I read in the article.

tenta4 ( 2013-07-25 04:05:15 -0600 )

Which article?

Notas ( 2013-07-25 04:19:40 -0600 )

Page 3, paragraph 2, but it is in Russian: http://zhurnal.ape.relarn.ru/articles/2007/104.pdf. I'm sorry for the misinformation.

tenta4 ( 2013-07-25 04:24:28 -0600 )

Aww, too bad it's only in Russian.

Notas ( 2013-07-25 05:20:27 -0600 )
