Validating OpenCV algorithms
Introduction
First of all, sorry that it took so long to reply, but there was simply no spare time left. Validating algorithms is actually a very interesting topic, and it's really not that hard. In this post I'll show how to validate your algorithms (I'll take the FaceRecognizer, because you've asked for it). As always in my posts, I'll show it with a full source code example, because I think it's much easier to explain things with code.
So whenever people tell me "my algorithm performs badly", I ask them:
- What is bad actually?
- Did you rate this by looking at one sample?
- What was your image data?
- How do you split between training and test data?
- What is your metric?
- [...]
My hope is that this post will clear up some confusion and show how easy it is to validate algorithms. Because what I have learned from experimenting with computer vision and machine learning algorithms is:
- Without a proper validation it's all about chasing ghosts. You really, really need figures to talk about.
All code in this post is released under the BSD License, so feel free to use it for your projects.
Validating algorithms
One of the most important tasks of any computer vision project is to acquire image data. You need to get the same kind of image data as you expect in production, so you won't have any bad surprises when going live. A very practical example: if you want to recognize faces in the wild, then it isn't useful to validate your algorithms on images taken in a very controlled scenario. Get as much data as possible, because data is king. So much for the data.
Once you have got some data and have written your algorithm, it comes down to evaluating it. There are several strategies for validating, but I think you should start with a simple Cross Validation and go on from there.
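If Cross Validation is new to you, here is a minimal sketch of the k-fold variant in plain Python, just to show the idea: split the data into k folds, train on k-1 of them, test on the remaining one, and repeat. The names `k_fold_indices` and `cross_validate` and the toy 1-nearest-neighbor model are made up for this illustration; they are not part of OpenCV or scikit-learn.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Return one accuracy value per fold."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Everything that is not in the current fold is training data.
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(hits / len(test_idx))
    return scores


# Toy model (an assumption for this sketch): 1-nearest-neighbor on plain numbers.
def fit_1nn(X_train, y_train):
    return list(zip(X_train, y_train))


def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Two well separated groups of samples, five per class.
X = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = cross_validate(fit_1nn, predict_1nn, X, y, k=5)
print("accuracy per fold:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

The important part is that every sample is used for testing exactly once and never appears in the training data of its own fold, so the figures you get are not flattered by the model having already seen the test data.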
Instead of implementing it all by ourselves, we'll make use of scikit-learn, a great Open Source project.
It has very good documentation and tutorials for validating algorithms.
So the plan is the following:
- Write a function to read some image data.
- Wrap the cv2.FaceRecognizer into a scikit-learn estimator.
- Estimate the performance of our cv2.FaceRecognizer with a given validation and metric.
- Profit!
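To give a feeling for the wrapping step, here is a rough sketch of the adapter idea in plain Python. scikit-learn estimators follow a `fit`/`predict` convention, so the wrapper only has to translate those calls into whatever the recognizer offers. Everything here is an assumption for illustration: `NearestMeanRecognizer` is a hypothetical stand-in with a `train`/`predict` interface (its `predict` returns a label and a distance, which I assume mirrors how the OpenCV recognizer reports a prediction), and `RecognizerEstimator` does not import scikit-learn at all. To plug a wrapper like this into scikit-learn's own validation utilities, you would additionally derive it from `sklearn.base.BaseEstimator`.

```python
class NearestMeanRecognizer:
    """Hypothetical stand-in for a face recognizer, NOT an OpenCV class.

    train(images, labels) learns one mean vector per label,
    predict(image) returns a (label, distance) pair.
    """

    def train(self, images, labels):
        sums, counts = {}, {}
        for image, label in zip(images, labels):
            acc = sums.setdefault(label, [0.0] * len(image))
            for i, value in enumerate(image):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.means = {
            label: [value / counts[label] for value in acc]
            for label, acc in sums.items()
        }

    def predict(self, image):
        def distance(mean):
            return sum((a - b) ** 2 for a, b in zip(image, mean))

        label = min(self.means, key=lambda lab: distance(self.means[lab]))
        return label, distance(self.means[label])


class RecognizerEstimator:
    """Adapter exposing fit/predict/score on top of a train/predict model."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        self.model.train(X, y)
        return self  # returning self allows chaining, as estimators usually do

    def predict(self, X):
        # Keep only the label, drop the distance.
        return [self.model.predict(x)[0] for x in X]

    def score(self, X, y):
        # Accuracy: fraction of samples with a correctly predicted label.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Tiny made-up "images", given as flat lists of pixel values.
X_train = [[0, 0], [0, 1], [9, 9], [9, 8]]
y_train = ["person1", "person1", "person2", "person2"]
X_test = [[1, 0], [8, 9]]
y_test = ["person1", "person2"]

estimator = RecognizerEstimator(NearestMeanRecognizer()).fit(X_train, y_train)
predictions = estimator.predict(X_test)
accuracy = estimator.score(X_test, y_test)
print(predictions, accuracy)
```

The nice thing about this design is that the validation code never needs to know which recognizer sits behind the wrapper, so you can swap models without touching the evaluation.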
Getting the image data right
First I'd like to write some words on the image data to be read, because questions on this almost always pop up. For the sake of simplicity I have assumed in the example that the images (the faces of the persons you want to recognize) are given in folders, one folder per person. So imagine I have a folder (a dataset) called images, with the subfolders person1, person2 and so on:
philipp@mango:~/facerec/data/images$ tree -L 2 | head -n 20
.
|-- person1
| |-- 1 ...
(more)
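For a folder layout like this, collecting the image paths together with a numeric label per person could be sketched as follows. This is only an illustration under my own assumptions: the function name `list_labeled_images` is made up, and the file names in the demo (`1.jpg`, `2.jpg`) are hypothetical placeholders written into a throwaway temporary directory, not the real dataset.

```python
import os
import tempfile


def list_labeled_images(root):
    """Walk a root folder with one subfolder per person.

    Returns (paths, labels, names), where labels[i] is the numeric label
    of paths[i] and names[label] is the matching subfolder name.
    """
    paths, labels, names = [], [], []
    for name in sorted(os.listdir(root)):
        person_dir = os.path.join(root, name)
        if not os.path.isdir(person_dir):
            continue  # skip stray files next to the person folders
        label = len(names)
        names.append(name)
        for filename in sorted(os.listdir(person_dir)):
            path = os.path.join(person_dir, filename)
            if os.path.isfile(path):
                paths.append(path)
                labels.append(label)
    return paths, labels, names


# Demo on a throwaway directory with hypothetical, empty image files.
with tempfile.TemporaryDirectory() as root:
    for person, files in [("person1", ["1.jpg", "2.jpg"]), ("person2", ["1.jpg"])]:
        os.makedirs(os.path.join(root, person))
        for filename in files:
            open(os.path.join(root, person, filename), "wb").close()
    paths, labels, names = list_labeled_images(root)

print(names)
print(labels)
```

To get actual pixel data you would then load each path, for example with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`, and keep the labels in the same order as the images.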