Here are a couple of tips that might help you get it working.
I'm not sure what units OpenGL uses for the focal length, but OpenCV uses pixels. A focal length of 4 pixels doesn't seem very realistic. The focal length should be on the order of 400 pixels or so. If the focal length is too large or too small, the calculated rotation will be incorrect. Page 12 of the original 5-point algorithm paper gives a good example of this.
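If the OpenGL projection was set up from a vertical field of view (as `gluPerspective` does; this is an assumption, since the original setup isn't shown), a focal length in pixels can be derived from it with the pinhole relation f = (h / 2) / tan(fovy / 2). A minimal sketch, where the function name and the example values are made up for illustration:

```python
import math

def focal_length_pixels(fov_y_degrees, image_height_pixels):
    """Focal length in pixels for a pinhole camera with the given vertical field of view."""
    half_fov = math.radians(fov_y_degrees) / 2.0
    return (image_height_pixels / 2.0) / math.tan(half_fov)

# Hypothetical example: 60 degree vertical field of view, 480 pixel tall image.
f = focal_length_pixels(60.0, 480)
print(round(f, 1))  # a few hundred pixels, not single digits
```

A value in the hundreds of pixels is what you would then put on the diagonal of the camera matrix passed to OpenCV.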
The findEssentialMat function might not be giving you the right answer. An easy way to verify this is to cross-check it against an essential matrix calculated from the true translation and rotation. The formula for the essential matrix is $E = [t]_\times R$, where $[t]_\times$ is the skew-symmetric cross-product matrix of the translation $t$. The two matrices may have a different scale (and sign), so it helps to normalize them before comparison. If you aren't getting the correct essential matrix, I would recommend tweaking the RANSAC threshold or using LMEDS. For RANSAC you might try a threshold of around 0.1 pixels. If you use LMEDS you won't need to worry about tweaking the threshold, since LMEDS minimizes the median error instead of counting inliers. I would also recommend using more than 8 points to reduce the effects of noise and to better distinguish between candidate essential matrices.
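The cross-check could look roughly like the following NumPy sketch. The helper names (`skew`, `essential_from_pose`, `essential_difference`) and the example pose are made up for illustration, and `E_estimated` is only a stand-in for whatever findEssentialMat returns in your code:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R for the point transform x2 = R x1 + t."""
    return skew(t) @ R

def essential_difference(E_a, E_b):
    """Frobenius distance after normalizing scale and allowing a sign flip."""
    a = E_a / np.linalg.norm(E_a)
    b = E_b / np.linalg.norm(E_b)
    return min(np.linalg.norm(a - b), np.linalg.norm(a + b))

# Hypothetical ground truth: 10 degree rotation about the y axis plus a translation.
angle = np.radians(10.0)
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([1.0, 0.0, 0.2])
E_true = essential_from_pose(R_true, t_true)

# Stand-in for an estimate that differs only by scale and sign.
E_estimated = -3.0 * E_true
difference = essential_difference(E_true, E_estimated)
print(difference)
```

A difference close to zero means the estimate agrees with the ground truth up to scale; a large value points at the estimation step rather than the pose recovery.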
Keep in mind that OpenCV defines the rotation and translation in the direction the points move, not the direction the camera moves. For example, the coordinates of a point in the second camera frame can be calculated from its coordinates in the first frame as $x_2 = R x_1 + t$. This is simple for transforming points, but counter-intuitive, since we often think about the direction the camera moves, not the direction the points move. If you need the camera transformation you can simply invert the matrix $\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$. See also this post.
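As a small illustration of that inversion, here is a NumPy sketch. The function name and the example values are hypothetical; the closed form used is the standard inverse of a rigid transform, with rotation R^T and translation -R^T t:

```python
import numpy as np

def camera_pose_from_point_transform(R, t):
    """Return the 4x4 point transform [R t; 0 1] and its inverse (the camera motion)."""
    T_points = np.eye(4)
    T_points[:3, :3] = R
    T_points[:3, 3] = t

    # Inverse of a rigid transform: rotation R^T, translation -R^T t.
    T_camera = np.eye(4)
    T_camera[:3, :3] = R.T
    T_camera[:3, 3] = -R.T @ t
    return T_points, T_camera

# Hypothetical example: no rotation, points shift by -1 along x between frames.
R = np.eye(3)
t = np.array([-1.0, 0.0, 0.0])
T_points, T_camera = camera_pose_from_point_transform(R, t)
print(T_camera[:3, 3])  # the camera itself moved in the opposite direction, +1 along x
```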
If you are getting the correct essential matrix, but incorrect rotation and translation, you may need to do more than just use the recoverPose function. The essential matrix has two possible rotations and a positive and negative possible translation. These are found using the decomposeEssentialMat function. The recoverPose function uses the cheirality constraint (positive depth) to determine which rotation and translation out of the 4 possible combinations is correct. However, it can sometimes give the wrong answer when there is noise. Additionally, page 5 of the original 5-point algorithm paper points out that the cheirality constraint does not resolve the ambiguity if "all visible points are closer to one [camera] than the other."