
Preventing Over-fitting

asked 2015-11-17 05:48:24 -0600

Adi

updated 2015-11-17 06:19:06 -0600

I'm using cv::Boost to learn small image patches (Boost::DISCRETE currently gives me the best results).
I noticed that the more example images I have in my training set, the larger the model/predictor XML file becomes. It is almost as if the file were storing [some] of the images as samples.
I don't care much about the file size, but I am afraid this growth is due to overfitting: the classifier does very well on the test set (because it almost keeps an internal copy) but would not generalize well to new images.

How can I avoid over-fitting and ensure good generalization?

I currently use setWeightTrimRate(0.4) to keep the file size low.


Comments

You should use a test set that is not used in the learning step.

thdrksdfthmn (2015-11-17 05:57:17 -0600)

Of course, but that does not answer my question re: over-fitting. Is there some parameter to control over-fitting?

Adi (2015-11-17 06:20:22 -0600)

1 answer


answered 2015-11-17 07:17:57 -0600

thdrksdfthmn

As it is noted here, overfitting shows up as your predictions getting worse. So test on the same unseen data (the set should be big enough to detect this): if the model trained on more data predicts worse than the model trained on less data, then there is overfitting. Tell me more about the number of images you use (test and train).
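A minimal sketch of the comparison described above, in C++ (the language cv::Boost is used from in the question). It assumes you have already trained one model per training-set size (or per complexity setting) and measured each model's error rate on the same held-out test set; with cv::Boost that number could come, for example, from StatModel::calcError, but here the errors are simply passed in as plain numbers. The helper name firstDegradation and its tolerance parameter are hypothetical, chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV).
// heldOutError[i] is the error rate, measured on one fixed held-out test set,
// of the i-th model in order of increasing training data or complexity.
// Returns the index of the first model whose error exceeds the best error seen
// so far by more than `tolerance`, or -1 if the error never degrades that much.
int firstDegradation(const std::vector<double>& heldOutError, double tolerance) {
    if (heldOutError.empty()) return -1;
    double best = heldOutError[0];
    for (std::size_t i = 1; i < heldOutError.size(); ++i) {
        if (heldOutError[i] > best + tolerance) {
            // Held-out error went up noticeably: a sign of possible overfitting.
            return static_cast<int>(i);
        }
        if (heldOutError[i] < best) best = heldOutError[i];
    }
    return -1;
}
```

For example, held-out errors of 0.20, 0.15, 0.12, 0.13, 0.18 with a tolerance of 0.02 flag index 4, because 0.18 is more than 0.02 above the best earlier error of 0.12. The tolerance keeps small fluctuations from a finite test set from being mistaken for overfitting.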


Comments


This is actually the only way to test for overfitting in OpenCV: train several models of increasing complexity, then check their prediction performance on a separate test set and see where performance starts to go down, as said by @thdrksdfthmn. Keep in mind that you should do this in a cross-validation way and thus vary the test set. You do not want to select the best decision point on only a single test set, because then it is again overfitted to that test set!

StevenPuttemans (2015-11-19 06:53:45 -0600)
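The k-fold cross-validation procedure described in this comment could look roughly like the C++ sketch below. It is an illustration under assumptions, not OpenCV code: the caller supplies a trainAndScore callback (hypothetical) that would, for instance, train a cv::Boost model with a given complexity (such as its number of weak classifiers via setWeakCount) on the training indices and return its error on the held-out indices. The helper names makeFolds, crossValidatedError and selectComplexity are also hypothetical. Each candidate complexity is scored by its mean error over k folds, so the choice does not hinge on a single test set.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

using Indices = std::vector<std::size_t>;
// Hypothetical callback: train with `complexity` on `train`, return the error on `test`.
using TrainAndScore =
    std::function<double(int complexity, const Indices& train, const Indices& test)>;

// Shuffle 0..n-1 with a fixed seed and deal the indices into k folds whose
// sizes differ by at most one. Requires 0 < k <= n.
std::vector<Indices> makeFolds(std::size_t n, std::size_t k, unsigned seed) {
    Indices order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    std::vector<Indices> folds(k);
    for (std::size_t i = 0; i < n; ++i) folds[i % k].push_back(order[i]);
    return folds;
}

// Mean held-out error of one complexity setting: each fold serves once as the
// test set while the remaining folds together form the training set.
double crossValidatedError(int complexity, const std::vector<Indices>& folds,
                           const TrainAndScore& trainAndScore) {
    double total = 0.0;
    for (std::size_t f = 0; f < folds.size(); ++f) {
        Indices train;
        for (std::size_t g = 0; g < folds.size(); ++g)
            if (g != f) train.insert(train.end(), folds[g].begin(), folds[g].end());
        total += trainAndScore(complexity, train, folds[f]);
    }
    return total / static_cast<double>(folds.size());
}

// Return the candidate complexity with the lowest cross-validated error.
// Requires a non-empty candidate list.
int selectComplexity(const std::vector<int>& candidates, std::size_t n, std::size_t k,
                     const TrainAndScore& trainAndScore, unsigned seed = 42) {
    const std::vector<Indices> folds = makeFolds(n, k, seed);
    int best = candidates.front();
    double bestError = std::numeric_limits<double>::infinity();
    for (int c : candidates) {
        const double err = crossValidatedError(c, folds, trainAndScore);
        if (err < bestError) {
            bestError = err;
            best = c;
        }
    }
    return best;
}
```

The folds are built once and reused for every candidate, so all complexity settings are compared on exactly the same splits. A final performance estimate should still come from data that took no part in this selection.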

