Image retrieval through Fisher vectors: why does my implementation work so badly?
I'm trying to implement a Content-Based Image Retrieval system for a small image dataset. For now, I'm using 1k images (40 categories) from Caltech101.
This is the system workflow:
1. For each image img, compute the SIFT descriptors (using cv::SIFT, default parameters) and save the descriptor matrix in std::vector<cv::Mat> descriptors.
2. Compute a Gaussian Mixture Model from the previously computed descriptors using the VLFeat implementation (VlGMM), initialized with k-means (again, the VLFeat implementation).
3. For each img, compute the corresponding Fisher vector using the GMM obtained before, one for each dataset image.
4. Given the query q, compute its SIFT descriptors and Fisher vector (using the same GMM as before).
5. Compute the Euclidean distance between q's Fisher vector and each img's Fisher vector from the dataset.
6. Return the top k images, according to the distances obtained in step 5.
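To make the last two steps concrete, here is a minimal, self-contained sketch of the ranking as I intend it. It uses plain std::vector<float> codes instead of cv::Mat, and l2Distance and rankTopK are hypothetical helper names of mine, not OpenCV or VLFeat functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Euclidean (L2) distance between two codes of equal length.
double l2Distance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Returns the indices of the k codes closest to the query, nearest first.
// Hypothetical helper for illustration only.
std::vector<std::size_t> rankTopK(const std::vector<float>& query,
                                  const std::vector<std::vector<float>>& codes,
                                  std::size_t k) {
    std::vector<double> distances(codes.size());
    for (std::size_t i = 0; i < codes.size(); ++i)
        distances[i] = l2Distance(query, codes[i]);

    std::vector<std::size_t> order(codes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});

    k = std::min(k, order.size());
    // Only the first k positions need to end up sorted.
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(k),
                      order.end(),
                      [&distances](std::size_t lhs, std::size_t rhs) {
                          return distances[lhs] < distances[rhs];
                      });
    order.resize(k);
    return order;
}
```

std::partial_sort avoids sorting all n distances when only the best k are needed.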
This is the code for steps 2, 3 and 5, which are the most important ones:
    vl_size totalElements = totalKeypoints * dimension;
    float *data = new float[totalElements];
    size_t counter = 0;
    // save into data all descriptor matrices (tested, it works)
    for (size_t i = 0; i < descriptors.size(); i++) {
        std::memcpy(data + counter, descriptors[i].data, descriptors[i].total() * sizeof(float));
        counter += descriptors[i].total();
    }

    // k-means object used to initialize the GMM
    VlKMeans *kmeans = vl_kmeans_new(VL_TYPE_FLOAT, VlDistanceL2);
    vl_kmeans_set_algorithm(kmeans, VlKMeansElkan);
    vl_kmeans_set_initialization(kmeans, VlKMeansPlusPlus);
    vl_kmeans_set_min_energy_variation(kmeans, 0.0001);
    vl_kmeans_set_num_repetitions(kmeans, 3);

    VlGMM *gmm = vl_gmm_new(VL_TYPE_FLOAT, dimension, k); // k = 256, dimension = 128
    vl_gmm_set_initialization(gmm, VlGMMKMeans);
    vl_gmm_set_kmeans_init_object(gmm, kmeans);
    vl_gmm_cluster(gmm, data, totalKeypoints);
    delete[] data;

    int encodingSize = 2 * dimension * k;
    std::vector<cv::Mat> codes(n); // n is the dataset size
    for (size_t i = 0; i < descriptors.size(); i++) {
        float *enc = (float *)vl_malloc(sizeof(float) * encodingSize);
        vl_fisher_encode(enc, VL_TYPE_FLOAT,
                         vl_gmm_get_means(gmm), dimension, k,
                         vl_gmm_get_covariances(gmm),
                         vl_gmm_get_priors(gmm),
                         descriptors[i].data, 1,
                         VL_FISHER_FLAG_IMPROVED);
        codes[i] = cv::Mat(1, encodingSize, CV_32FC1, enc);
    }
    ...
    // here we compute the query's Fisher code in the same way
    ...
    for (int i = 0; i < n; i++)
        distances[i] = norm(queryCode, codes[i], cv::NORM_L2); // from OpenCV
I hope this can be considered an MCVE. If you want to see any other section, please let me know.
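For reference, this is the definition I am relying on, as I understand it from the VLFeat documentation. Take an image with N descriptors x_1, ..., x_N of dimension D, and a GMM with K components (priors \pi_k, means \mu_k, diagonal covariances \sigma_k^2). Let q_{ik} be the posterior probability of component k given descriptor x_i. The symbols are my notation, and N is the value that the numData argument of vl_fisher_encode is meant to carry.

```latex
\begin{aligned}
u_{jk} &= \frac{1}{N\sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik}\,
          \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}}, \\
v_{jk} &= \frac{1}{N\sqrt{2\pi_k}} \sum_{i=1}^{N} q_{ik}
          \left[ \left( \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^{2} - 1 \right],
\qquad j = 1,\dots,D,\quad k = 1,\dots,K .
\end{aligned}
```

Stacking all u_{jk} and v_{jk} gives a vector of length 2DK, which is where encodingSize = 2 * dimension * k comes from: 2 * 128 * 256 = 65536 floats per image.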
The results are simply horrible: even when the query is an image from the dataset itself, the most similar image is the image itself (though even in that case the distance is 0.0201524!), while all the other returned images are totally unrelated!
I don't know if this is normal, but it takes 263s for the GMM clustering and 0.91s to create the 1k Fisher vectors.
I'm a bit frustrated and have no idea how to improve this. Fisher vectors are already an advanced solution, so why do they work so badly here?
I also tried the cosine distance with the following code:
    static double cosdistance(const cv::Mat &testFeature, const cv::Mat &trainFeature)
    {
        double a = trainFeature.dot(testFeature);
        double b = trainFeature.dot(trainFeature);
        double c = testFeature.dot(testFeature);
        return -a / sqrt(b * c); // negated cosine similarity: smaller means more similar
    }
    ...
    for (int i = 0; i < n; i++)
        distances[i] = cosdistance(queryCode, codes[i]);
This didn't give better results either.
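Thinking about it more, this may be expected. If I read the VLFeat documentation correctly, VL_FISHER_FLAG_IMPROVED applies a signed square root and then L2-normalizes the encoding, so every code has unit norm. For unit vectors the squared Euclidean distance equals 2 + 2 * cosdistance, so both measures should rank the dataset images in the same order. Below is a minimal, self-contained sketch of that identity, using plain std::vector<double> instead of cv::Mat. The helper names are mine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dot product of two vectors of equal length.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Returns a copy of v scaled to unit L2 norm (v must not be all zeros).
std::vector<double> l2Normalized(std::vector<double> v) {
    const double n = std::sqrt(dot(v, v));
    assert(n > 0.0);
    for (double& x : v) x /= n;
    return v;
}

// Same definition as cosdistance above: negated cosine similarity.
double negCosine(const std::vector<double>& a, const std::vector<double>& b) {
    return -dot(a, b) / std::sqrt(dot(a, a) * dot(b, b));
}

// Squared Euclidean distance.
double squaredL2(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}
```

For any two unit-norm vectors a and b, squaredL2(a, b) and 2.0 + 2.0 * negCosine(a, b) agree up to rounding, so the ordering of the neighbours cannot change between the two distances.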