All of these form a dictionary with size (20,64). What does this 64 mean?
64 is the size of a single SURF descriptor (a SIFT descriptor would have 128 elements), and your dictionary has 20 of those. all correct so far! (though you will probably need more than 20 words for a good classification)
BagOfWords classification will use a signature of 20 elements (in your case): a histogram, where each bin counts how many of your image features were matched to that dictionary feature.
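To make the histogram idea concrete, here is a toy sketch in plain python (no OpenCV) of how such a signature could be built by hand. The function name `bow_signature` and the tiny 2-element "descriptors" are made up for illustration, and the normalisation step is an assumption of this sketch, not a statement about what the library does internally.

```python
# toy sketch: build a bag-of-words signature by hand (hypothetical helper, not OpenCV)

def bow_signature(descriptors, vocabulary):
    """For each vocabulary word, count the descriptors that are closest to it."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        # squared euclidean distance from this descriptor to every word
        dists = [sum((a - b) ** 2 for a, b in zip(d, w)) for w in vocabulary]
        # the nearest word gets one vote
        hist[dists.index(min(dists))] += 1.0
    # assumption: normalise, so the bins do not depend on the number of features
    n = len(descriptors)
    return [h / n for h in hist] if n else hist

# made-up data: 3 words, 4 image features (real SURF descriptors have 64 elements)
vocabulary = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [1.0, 9.0]]

signature = bow_signature(descriptors, vocabulary)
print(signature)  # one bin per vocabulary word
```

With a 20-word dictionary, the same procedure would give a 20-element signature per image, whatever the number of keypoints found.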
the next steps will be:
- make BoW signatures from your train images, using BOWImgDescriptorExtractor
- train an SVM (or Knn or ANN) on those
- also make BoW signatures for your test images
- make a prediction for your test signatures
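# 1. make BoW signatures from your train images
import cv2
import numpy as np

# note: the extractor here must be the same type that built the vocabulary.
# a 64-column vocabulary points to SURF (cv2.xfeatures2d.SURF_create());
# SIFT descriptors have 128 elements and will not match a 64-column vocabulary.
sift = cv2.xfeatures2d.SIFT_create()
flann_params = dict(algorithm = 1, trees = 5) # 1 == FLANN_INDEX_KDTREE
matcher = cv2.FlannBasedMatcher(flann_params, {})
bow_extract = cv2.BOWImgDescriptorExtractor( sift , matcher )
bow_extract.setVocabulary( voc ) # the 20x64 dictionary, you made before
traindata = []
trainlabels = []
# train_images is a placeholder for your own list of (image, class id) pairs
for img, class_id_of_img in train_images:
    # get keypoints
    siftkp = sift.detect(img)
    # let the bow extractor find descriptors, and match them to the dictionary
    bowsig = bow_extract.compute(img, siftkp)
    if bowsig is None: # no usable features in this image, skip it
        continue
    traindata.extend( bowsig ) # bowsig is a 1x20 row
    trainlabels.append( class_id_of_img ) # an integer class id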
# 2. create & train the svm
svm = cv2.ml.SVM_create()
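# samples must be float32 (one signature per row), class labels must be int32
svm.train(np.array(traindata, dtype=np.float32), cv2.ml.ROW_SAMPLE, np.array(trainlabels, dtype=np.int32))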
# 3. for each test image, you have to repeat the steps from above:
siftkp = sift.detect(img)
bowsig = bow_extract.compute(img, siftkp)
# 4. now you can predict the classid of your img
# (one of the numbers you passed in for the trainlabels):
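# predict() returns a tuple (retval, results); results has one row per input signature
_, p = svm.predict(bowsig)
classid = int(p[0][0])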
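Once every test image has a predicted class id from step 4, one simple check is the fraction of correct predictions. A minimal sketch in plain python, assuming you collected the results in two lists; `accuracy`, `predicted` and `expected` are hypothetical names, and the numbers are made up:

```python
# sketch: fraction of test images whose predicted class id matches the true one

def accuracy(predicted, expected):
    """Return the share of positions where predicted and expected class ids agree."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("need two non-empty lists of equal length")
    # the svm reports class ids as floats, so compare them as integers
    hits = sum(1 for p, e in zip(predicted, expected) if int(p) == int(e))
    return hits / len(expected)

predicted = [0.0, 1.0, 1.0, 2.0]  # made-up predictions, one per test image
expected = [0, 1, 2, 2]           # made-up true class ids

acc = accuracy(predicted, expected)
print(acc)
```

If the accuracy stays low, a larger dictionary (more than 20 words) is one of the first things to try.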