0

my classifier using haar cascade can not detect anything

asked 2013-11-20 17:58:52 -0600

ioanna

updated 2013-11-23 16:43:23 -0600

I created my own classifier using 90 positive samples and 299 negatives to detect a doctor's tool.

I ran this command to create my samples:

createsamples.exe -info positive/info.txt -vec data/vector.vec -num 90 -w 25 -h 15

and this one to train my classifier:

haartraining.exe -data data/cascade -vec data/vector.vec -bg negative/infofile.txt -npos 90 -nneg 299 -nstages 25 -mem 1000 -mode ALL -w 25 -h 15 -nonsym

When the classifier had finished training, I took the resulting xml file and used it in my program. I noticed that it cannot detect anything...

Does anyone know what I am doing wrong? :/

Second problem

When I run my command

opencv_traincascade -data data/cascade -vec data/vector.vec -bg neg.txt -numPos 2200 -numNeg 1000 -numStages 25 -featureType LBP -mem 2000 -mode ALL -w 25 -h 15

it returns the following after the first stage:

<BEGIN
OpenCV Error: Bad argument (Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.
) in get, file /home/mcn/opencv-2.4.5/apps/traincascade/imagestorage.cpp, line 159
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/mcn/opencv-2.4.5/apps/traincascade/imagestorage.cpp:159: error: (-5) Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.
 in function get

Aborted

As I read at http://answers.opencv.org/question/776/error-in-parameter-of-traincascade/ I reduced numPos, but the problem still occurs in later stages...


Comments

1

Switch to the newer train cascade algorithm software, which works way better than the haartraining tool. Use LBP features for faster calculation.

StevenPuttemans ( 2013-11-21 04:13:51 -0600 )
1

@ioanna unless you are detecting an extremely rigid object on an invariable background, you'll need to gather more data to train a classifier. Think thousands, not hundreds.

Pedro Batista ( 2013-11-21 04:35:11 -0600 )
1

OK, thank you both, I will follow the advice :)

ioanna ( 2013-11-21 06:55:11 -0600 )

1 answer

6

answered 2013-11-25 02:17:25 -0600

The error

Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.

tells you that the algorithm isn't able to fetch a new positive sample for training. During the process, some positive training samples get rejected, especially if a positive sample is a bit of an outlier. This means that the single sample is very different from the other positive samples and would cause your model to overfit to that specific data.

Another cause can be two positives being too similar to each other, so that they do not differ enough to count as unique positive samples.

What does this mean in practice?

  1. Imagine you have 200 positive samples in your positives.txt file, which are all put into positives.vec using the createsamples utility and have a fixed w and h.
  2. In your traincascade call you pass along -numPos 200.
  3. In the first iteration of training, it finds 2 positive samples that are too different from the general positive set, so they get rejected to avoid overfitting. This means that your training pool now contains only 198 samples, since 2 cannot be used anymore. (Or it could find three samples that are nearly identical and remove 2 of them to keep only 1.)
  4. In the second iteration, when trying to fetch positive samples, you again request 200 unique positive samples, but you have only 198 left in your dataset.
  5. The result is the error saying that it cannot fetch new positives.
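The five steps above can be sketched as a toy model. This is only an illustration of the simplified description in this answer, not the actual opencv_traincascade logic; the function name `first_failing_stage` and the per-stage rejection counts are made up for the example.

```python
def first_failing_stage(pool_size, num_pos, rejections):
    """Return the index of the first stage that cannot collect num_pos
    positives, or None if every stage can be filled.

    Toy assumption: a positive rejected in one stage is lost for all
    later stages. `rejections` lists how many positives each stage drops.
    """
    available = pool_size
    for stage, rejected in enumerate(rejections):
        if available < num_pos:
            return stage  # this stage cannot fetch enough positives
        available -= rejected
    return None


# 200 samples in the vec file, -numPos 200, stage 0 rejects 2 samples:
print(first_failing_stage(200, 200, [2, 0, 0]))        # -> 1

# Same pool, but -numPos 180 leaves a reserve of 20 samples:
print(first_failing_stage(200, 180, [2, 0, 3, 0, 1]))  # -> None
```

With -numPos equal to the pool size, the second stage (index 1) already fails; with a reserve, the same rejections are absorbed.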

How to fix this error?

  1. There is a rule for calculating a positive count that is guaranteed to be safe, but it is quite difficult to explain.
  2. What I tell my students, and what works 99% of the time, is to set -numPos to 90% of your actual positive training set. This makes sure that the remaining 10% can be used for extra positive samples when needed.
  3. In the worst case, if training still stops, using 85% has worked in 100% of the cases I have seen so far.
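As a rough sketch of the advice above: the first helper applies the 90% / 85% rule, and the second applies an estimate that is often quoted for opencv_traincascade, namely vec samples >= numPos + (numStages - 1) * (1 - minHitRate) * numPos + S, where S is the number of vec samples that get classified as background right away. Treat that formula as an assumption rather than as the rule this answer refers to; S is not known in advance and is set to 0 here, and both function names are hypothetical.

```python
import math


def suggested_num_pos(total_positives, percent=90):
    """-numPos value at the given percentage of the real positive count."""
    return total_positives * percent // 100


def estimated_vec_samples_needed(num_pos, num_stages,
                                 min_hit_rate=0.995, background_like=0):
    """Assumed estimate of how many vec samples training may consume.

    min_hit_rate defaults to 0.995, the default of -minHitRate.
    background_like stands for the unknown S in the formula.
    """
    consumed = (num_stages - 1) * (1 - min_hit_rate) * num_pos
    return math.ceil(num_pos + consumed) + background_like


total = 200
for percent in (90, 85):
    num_pos = suggested_num_pos(total, percent)
    needed = estimated_vec_samples_needed(num_pos, num_stages=25)
    print(percent, num_pos, needed)
# prints:
# 90 180 202
# 85 170 191
```

Under this estimate, 90% of 200 samples with 25 stages can still come up slightly short (202 needed versus 200 available), while 85% fits, which is in line with falling back to 85% when 90% still stops training.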

What does this do to the result now?

  1. Again you start training, now with -numPos 180 (90%), which leaves 20 samples in the set unused and in reserve.
  2. If a training stage rejects positives (and that doesn't always happen), then there are still 20 samples to pick new ones from.
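To know how large the reserve really is, it helps to check how many samples the vec file actually contains. A minimal sketch, assuming the usual vec layout written by createsamples (a 12-byte header holding a 32-bit sample count, a 32-bit sample size and two unused 16-bit values, little-endian as on x86); the function names are hypothetical.

```python
import struct


def parse_vec_header(data):
    """Return (sample_count, values_per_sample) from the first 12 bytes
    of a .vec file. Assumes a little-endian int, int, short, short header."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes for a vec header")
    count, vec_size, _unused1, _unused2 = struct.unpack("<iihh", data[:12])
    return count, vec_size


def read_vec_header(path):
    """Read the header of the .vec file at `path`."""
    with open(path, "rb") as f:
        return parse_vec_header(f.read(12))
```

For example, `read_vec_header("data/vector.vec")` on a file created with -num 90 -w 25 -h 15 should report 90 samples of 375 values each (25 * 15) if the layout assumption holds. The difference between that count and -numPos is the reserve.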

When does this occur the most? It mostly occurs when people take a single sample and use the createsamples utility to transform it. This generates artificial and unnatural examples, which sometimes do not differ enough from each other to be unique samples. It is better to train your model with unique data, grabbed from real-life video/image streams.


Comments

1

Nice explanation. This should be documented somewhere easy to find, since it is such a common and unavoidable issue. I ran into it myself and it was a pain to understand what was wrong. People will naturally set numPos to the number of positive samples they managed to gather.

Pedro Batista ( 2013-11-25 04:32:32 -0600 )

Actually, I am collecting a set of remarks and suggestions from around the forums to go together with the traincascade application. I am in the process of turning this into a guideline for my students who use the app. I could just as well make it into a documentation page. I will see later how to arrange that!

StevenPuttemans ( 2013-11-25 04:43:05 -0600 )
1

Steven, thanks for such a good explanation. I have a similar issue over here; however, the solution was a drastic decrease in my numPos value. I went from 5272 to 1500. Anything higher than that and it would kick back this error. I thought you would find this interesting, and I was wondering about your opinion on it. I'm training to detect American license plates at the moment. It's one of the first objects I'm training for. Do you think that images from this link [http://www.plateshack.com/platelist.html] (just American car plates) are suitable for training? Or does the lack of varied backgrounds cause issues? Or could the features on the plates themselves be so different that the software rejects most of my data while creating samples? Because that's what I'm thinking. Training for LBP.

AbbeFaria ( 2014-06-10 08:55:50 -0600 )
1

I think what you are experiencing is the result of supplying a set of training data that doesn't contain that many different features to classify. In my experience, if you have to drop your positive count by that much, it means that 1500 positive samples are enough to classify every single example you supplied. No new positive can be found that adds extra information to the training stage, and thus the error is thrown. If the error were ignored, training would end up in an infinite loop.

I do think it is indeed mainly due to the background which is fairly similar in all cases.

StevenPuttemans ( 2014-06-11 03:13:02 -0600 )

Ah, to follow up. I ended up with a very good LBP Cascade that works with stunning accuracy. Thanks for that.

AbbeFaria ( 2014-06-11 13:24:19 -0600 )

I am glad it worked out well for you!

StevenPuttemans ( 2014-06-23 08:02:24 -0600 )

Stats

Asked: 2013-11-20 17:58:52 -0600

Seen: 6,245 times

Last updated: Nov 25 '13