The error
Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.
tells you that the algorithm is not able to fetch a new positive sample for training. During training, some positive samples get rejected, typically when a sample is an outlier: it differs a lot from the other positive samples, and keeping it would overfit your model to that specific data.
Another cause can be two positives that are too similar to each other and do not differ enough to count as unique positive samples.
What does this mean in practice?
- Imagine you have 200 positive samples in your positives.txt file, which are all packed into positives.vec by the opencv_createsamples utility with a fixed -w and -h.
- In your opencv_traincascade call you pass -numPos 200.
- In the first stage of training, suppose it finds 2 positive samples that differ too much from the general positive set, so they get rejected to avoid overfitting. Your training pool now effectively contains 198 samples, since those 2 cannot be used anymore. (Or it could find three samples that are near-duplicates of each other, keeping 1 and discarding the other 2.)
- In the second stage, when fetching positive samples, you again request 200 unique positives, but only 198 are left in your dataset.
- The result is the error above: the trainer cannot fetch new positives.
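The depletion described above can be sketched as a toy simulation. This is an illustration of the bookkeeping, not OpenCV's actual internals; the stage counts and rejection numbers are made up for the example:

```python
# Toy illustration of positive-pool depletion (not OpenCV's actual code).
def train(pool_size, num_pos, rejects_per_stage):
    """Simulate stages that each need num_pos usable positives.

    rejects_per_stage: how many positives each stage discards as
    outliers/duplicates; a discarded sample leaves the pool for good.
    """
    pool = pool_size
    for stage, rejected in enumerate(rejects_per_stage, start=1):
        if pool < num_pos:
            return f"stage {stage}: cannot get new positive sample ({pool} < {num_pos})"
        pool -= rejected
    return f"finished: {pool} samples still usable"

# -numPos equal to the vec-file count: stage 2 fails after stage 1 drops 2 samples.
print(train(200, 200, [2, 0]))
# -numPos at 90% of the vec-file count: the 20 spare samples absorb the rejections.
print(train(200, 180, [2, 0]))
```

The second call succeeds for the same reason the 90% rule below works: the unused samples act as a reserve for replacements.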
How to fix this error?
- There is a formula for calculating a safe positive-set size, but it is quite involved to explain.
- What I tell my students, and what works 99% of the time, is to set -numPos to 90% of your actual positive training set. This makes sure the remaining 10% can supply extra positive samples when needed.
- In the worst case, if training still stops, dropping to 85% has worked in 100% of the cases I have seen so far.
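The rule mentioned above is a formula often quoted on the OpenCV forums; I am paraphrasing it from memory here, so treat this sketch as an approximation rather than an official guarantee. It relates the vec-file count to -numPos, -numStages and -minHitRate, with S standing for the number of vec samples the growing cascade rejects outright (unknown in advance, so guessed as a fraction):

```python
# Rule of thumb often quoted on the OpenCV forums (an approximation):
#
#   vec_count >= numPos + (numStages - 1) * (1 - minHitRate) * numPos + S
#
# Solving for numPos, with S guessed as a fraction of the set:
def safe_num_pos(vec_count, num_stages=20, min_hit_rate=0.995, s_fraction=0.1):
    """Estimate a -numPos value unlikely to exhaust the vec file."""
    usable = vec_count * (1 - s_fraction)  # reserve s_fraction for S
    return int(usable / (1 + (num_stages - 1) * (1 - min_hit_rate)))

print(safe_num_pos(200))  # conservative -numPos for a 200-sample vec file
```

For a 200-sample vec file this lands a bit below the simple 90% rule, which is also why the 85% fallback tends to rescue trainings that the 90% value could not.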
What does this do to the result now?
- Again you start training, now with -numPos 180 (90%), which keeps 20 samples in the set unused as spares.
- If a training stage rejects positives (and it does not always happen), there are still 20 samples to pick replacements from.
When does this occur the most?
It mostly occurs when people use a single sample and run the opencv_createsamples utility to transform it. This generates artificial, unnatural examples, which sometimes do not differ enough from each other to be unique samples. It is better to train your model on unique data grabbed from real-life video/image streams.