Assume I have two independent cameras looking at the same scene (features are visible from both) and that I know the intrinsic calibration parameters of each camera individually (I could also perform a stereo calibration at a known baseline, but I'm not sure whether that would be useful). One camera is fixed and stable; the other is noisy in terms of its pose (translation and rotation). As the pose of camera 2 keeps changing over time, is it possible to accurately estimate it with respect to the stationary camera using image data from both cameras (in OpenCV)?
I've been doing a little bit of reading, and this is what I've gathered so far:
- Detect SIFT features in both images and match them to obtain point correspondences.
- Estimate the fundamental matrix from the correspondences.
- Compute the essential matrix and decompose it (e.g. via SVD) to obtain R and t between the cameras (rough sketch of this pipeline below).
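
To make the question concrete, here is a rough sketch of the pipeline I have in mind (Python, assuming OpenCV >= 4.4 so SIFT lives in the main module; `K1`, `K2`, `dist1`, `dist2` are placeholders for my actual calibration results). Is this roughly the right structure?

```python
import cv2
import numpy as np

# Placeholder intrinsics/distortion -- substitute the values from your
# individual calibrations (these numbers are made up).
K1 = np.array([[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]])
K2 = np.array([[650.0, 0.0, 640.0], [0.0, 650.0, 360.0], [0.0, 0.0, 1.0]])
dist1 = np.zeros(5)
dist2 = np.zeros(5)

def relative_pose(img1, img2):
    """R, t of camera 2 w.r.t. camera 1; t is only known up to scale."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test to keep only unambiguous matches.
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

    pts1 = np.float64([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts2 = np.float64([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # Undistort into normalized image coordinates so that a single
    # essential-matrix call copes with two different sets of intrinsics.
    n1 = cv2.undistortPoints(pts1, K1, dist1)
    n2 = cv2.undistortPoints(pts2, K2, dist2)

    # Points are normalized, so the identity matrix plays the role of K
    # and the RANSAC threshold is in normalized units (hence tiny).
    E, inliers = cv2.findEssentialMat(n1, n2, np.eye(3),
                                      method=cv2.RANSAC,
                                      prob=0.999, threshold=1e-3)
    # recoverPose does the decomposition and cheirality check internally,
    # so an explicit SVD of E shouldn't be needed.
    _, R, t, _ = cv2.recoverPose(E, n1, n2, np.eye(3), mask=inliers)
    return R, t
```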
Does this approach work on a frame-by-frame basis? And how does the setup help in recovering the scale factor (my tentative idea is below)? Pointers and suggestions would be very helpful.
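
For scale specifically, my tentative idea is that a one-off cv2.stereoCalibrate at a known pose would give me a metric translation whose length I could reuse, something like:

```python
# T_stereo: metric translation from a one-off cv2.stereoCalibrate run
# (hypothetical name; R, t come from the sketch above, with t a unit vector).
baseline = np.linalg.norm(T_stereo)  # metric length of the calibrated baseline
t_metric = t * baseline              # only valid while the true camera
                                     # separation stays close to the
                                     # calibrated one
```

But I'm not sure this holds once camera 2 starts moving, hence the question.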
Thanks!