Hi all!
I need a multi-view version of the solvePnP function.
This is a question on applied use of OpenCV; I hope it's OK to ask here.

Qualitatively:
We want to resolve the pose (rotation + translation) of an object in space using projections of features of that object onto multiple image planes. Each image plane represents a calibrated camera at a fixed, known location in the world (for each camera we have a priori: cameraMatrix, distortionCoefficients, rotation, translation). In our case the cameras are a calibrated stereo pair mounted to a robot arm, and the extrinsics of the pair relative to each other are already known. The object is covered in markers (think 1x1 checkerboards, e.g. 10-20 of them) which can be seen and identified in the scene, which we can find to subpixel precision in the images, and which sit at known 3D positions in object space. Using the correspondences between 3D points in object space and 2D points in projected image space for each camera, we must reliably (and quickly) recover the rotation and translation of the object.

Quantitatively:
- Inputs
  - Set[ intrinsics, extrinsics ] views // size N
  - Set[ Set[ObjectPoints], Set[ImagePoints] ] // size N
- Outputs
  - rotation of object // 3-vector
  - translation of object // 3-vector

In other words: is there a routine to find the extrinsics of the stereo pair relative to the scene? (The extrinsics of the stereo pair relative to each other are already known.)
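To make those inputs and outputs concrete, here is a sketch of how they might be carried in plain C++ (all struct and field names are my own invention, not OpenCV types):

```cpp
#include <array>
#include <vector>

// One calibrated view: everything known a priori per camera
struct ViewCalibration {
    double cameraMatrix[3][3];                  // intrinsics
    std::vector<double> distortionCoefficients;
    double rotation[3];                         // world -> camera, Rodrigues vector
    double translation[3];                      // world -> camera
};

// The 3D<->2D correspondences observed in one view
struct ViewObservations {
    std::vector<std::array<double, 3>> objectPoints; // known marker positions in object space
    std::vector<std::array<double, 2>> imagePoints;  // detected subpixel projections
};

// Result: the pose of the object
struct ObjectPose {
    double rotation[3];    // 3-vector (Rodrigues)
    double translation[3]; // 3-vector
};
```

Both Sets are size N (one entry per view), and within a view the object and image point lists pair up index by index.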
I have some notes at:
https://paper.dropbox.com/doc/KC35-stereoSolvePnP-pseudo-code-14VMJDF9W8UhMxVOGdCEZ
And posted a freelancer job at:
https://www.upwork.com/jobs/~01b0f0c4105c0652da
With a single camera this is possible using solvePnP / solvePnPRansac (which optimise the rotation and translation so that the projections of the object points match the observed points on the image plane), but a single-camera solve wouldn't make the most of the stereo pair. Is there a routine which could simultaneously solve reprojection for 2 cameras to find the translation + rotation of the cameras relative to the scene? (I presume such a routine might be used by a stereo SLAM routine.)

A template for the function could be:
double solvePnPMultiView(vector<vector<cv::Point3f>> objectPointsPerView
, vector<vector<cv::Point2f>> imagePointProjectionsPerView
, vector<cv::Mat> cameraMatrixPerView
, vector<cv::Mat> distortionCoefficientsPerView
, vector<cv::Mat> translationPerView
, vector<cv::Mat> rotationVectorPerView
, cv::Mat & objectRotationVector
, cv::Mat & objectTranslation
, bool useExtrinsicGuess);
//same function but with different data format
double solvePnPMultiView(vector<vector<cv::Point3f>> objectPointsPerView
, vector<vector<cv::Point2f>> undistortedImagePointProjectionsPerView
, vector<cv::Mat> rectifiedProjectionMatrixPerView
, cv::Mat & objectRotationVector
, cv::Mat & objectTranslation
, bool useExtrinsicGuess);
//specific version for stereo (calls one of the functions above)
double solvePnPStereo(vector<cv::Point3f> objectPointsObservedInCamera1
, vector<cv::Point2f> projectedImagePointsObservedInCamera1
, vector<cv::Point3f> objectPointsObservedInCamera2
, vector<cv::Point2f> projectedImagePointsObservedInCamera2
, cv::Mat cameraMatrix1
, cv::Mat distortionCoefficientsCamera1
, cv::Mat cameraMatrix2
, cv::Mat distortionCoefficientsCamera2
, cv::Mat & objectRotationVector
, cv::Mat & objectTranslation
, bool useExtrinsicGuess);
double stereoSolvePnp(vector<cv::Point3f> objectPoints1
, vector<cv::Point2f> imagePoints1
, vector<cv::Point3f> objectPoints2
, vector<cv::Point2f> imagePoints2
, cv::Mat cameraMatrix1 // from calibrateCamera
, cv::Mat distortionCoefficients1 // from calibrateCamera
, cv::Mat cameraMatrix2 // from calibrateCamera
, cv::Mat distortionCoefficients2 // from calibrateCamera
, cv::Mat translationCamera1ToCamera2 // from stereoCalibrate
, cv::Mat rotationCamera1ToCamera2 // from stereoCalibrate
, cv::Mat & outputTranslation // object to camera1
, cv::Mat & outputRotation // object to camera1
);
(These functions would all call the same code internally; they just have different ways of being used.)

Notes:
- The routine should take less than 3ms on a Core i7 for 2 views with 10 object points each.
- Ideally don't use any libraries other than OpenCV (it would even be great to PR this into OpenCV).
- I think OpenCV's only numerical solver is CvLevMarq, which is C only, but we'd like to use C++ style where possible.
- Correctly calculating the derivatives for the solver is essential for reliability and speed.

Bonus (either):
- Using previous frames of extrinsics data to encourage smooth, filtered motion.
- A (non-realtime) routine for refining the scene data (the 3D positions of the features). NB: presuming I know where all these markers are in 3D space, I can use the robot arm pose to estimate where they will be seen in the image space of each camera, and then use cornerSubPix(...) to find them accurately.

This routine is for an open-source motion capture system which we will use for our artworks. Please see http://www.kimchiandchips.com for reference to those artworks, and https://github.com/elliotwoods/ofxRulr/tree/MoCap/Plugin_MoCap/src/ofxRulr/Nodes/MoCap for an example of the code so far (by me).
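On the derivatives note above: before committing to hand-derived analytic Jacobians, a central-difference numerical Jacobian makes a handy reference to validate against. A minimal sketch, assuming a simplified pinhole model with no distortion (all names here are my own, not OpenCV's):

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;
using Pose = std::array<double, 6>; // rx, ry, rz, tx, ty, tz

// Rodrigues formula: rotation vector -> rotation matrix
Mat3 rodrigues(const Vec3& r) {
    double th = std::sqrt(r[0]*r[0] + r[1]*r[1] + r[2]*r[2]);
    Mat3 R{{Vec3{1, 0, 0}, Vec3{0, 1, 0}, Vec3{0, 0, 1}}};
    if (th < 1e-12) return R;
    Vec3 k{r[0]/th, r[1]/th, r[2]/th};
    double c = std::cos(th), s = std::sin(th);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            R[i][j] = c*(i == j ? 1.0 : 0.0) + (1 - c)*k[i]*k[j];
    R[0][1] -= s*k[2]; R[0][2] += s*k[1];
    R[1][0] += s*k[2]; R[1][2] -= s*k[0];
    R[2][0] -= s*k[1]; R[2][1] += s*k[0];
    return R;
}

// Pinhole projection (no distortion): object point -> pixel coordinates
std::array<double, 2> project(const Pose& pose, double fx, double fy,
                              double cx, double cy, const Vec3& p) {
    Mat3 R = rodrigues({pose[0], pose[1], pose[2]});
    Vec3 q;
    for (int i = 0; i < 3; ++i)
        q[i] = R[i][0]*p[0] + R[i][1]*p[1] + R[i][2]*p[2] + pose[3 + i];
    return {fx*q[0]/q[2] + cx, fy*q[1]/q[2] + cy};
}

// Central-difference Jacobian of the 2D projection wrt the 6 pose
// parameters; useful as ground truth for analytic Jacobians
std::array<std::array<double, 6>, 2> numericJacobian(
        const Pose& pose, double fx, double fy, double cx, double cy,
        const Vec3& p) {
    std::array<std::array<double, 6>, 2> J{};
    const double h = 1e-6;
    for (int j = 0; j < 6; ++j) {
        Pose a = pose, b = pose;
        a[j] += h; b[j] -= h;
        auto pa = project(a, fx, fy, cx, cy, p);
        auto pb = project(b, fx, fy, cx, cy, p);
        J[0][j] = (pa[0] - pb[0]) / (2*h);
        J[1][j] = (pa[1] - pb[1]) / (2*h);
    }
    return J;
}
```

The numeric version is far too slow for the 3ms budget, but it lets the analytic derivatives be checked term by term during development.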
Your work will be credited (and of course paid too! that's the important one here :)
Thank you
Elliot
--EDIT--
I believe the pseudocode could be:

//refine extrinsic parameters using iterative algorithm
CvLevMarq solver(6); // 6 parameters: rotation (3) + translation (3)
while (solver.update(parameters, error, jacobian) != COMPLETED)
{
    rotationObjectToCamera1 = parameters[0..2];
    translationObjectToCamera1 = parameters[3..5];
    error = 0;

    cvProjectPoints2(objectPoints
        , rotationObjectToCamera1
        , translationObjectToCamera1
        , cameraMatrix1
        , distortionCoefficients1
        , calculatedImagePoints1, jacobian);
    error += distance(imagePoints1 - calculatedImagePoints1);

    rotationObjectToCamera2 = f(rotationObjectToCamera1, translationObjectToCamera1, rotationStereoPair, translationStereoPair);
    translationObjectToCamera2 = g(rotationObjectToCamera1, translationObjectToCamera1, rotationStereoPair, translationStereoPair);

    cvProjectPoints2(objectPoints
        , rotationObjectToCamera2
        , translationObjectToCamera2
        , cameraMatrix2
        , distortionCoefficients2
        , calculatedImagePoints2, jacobian);
    error += distance(imagePoints2 - calculatedImagePoints2);
}
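To sanity-check the cost being minimised before wiring up the solver, the two-view reprojection error can be evaluated in plain C++. A minimal sketch assuming an undistorted pinhole model (helper names are my own; the object-to-camera2 pose is chained through the known stereo transform):

```cpp
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;
using Pixel = std::array<double, 2>;

// Rodrigues formula: rotation vector -> rotation matrix
Mat3 rodrigues(const Vec3& r) {
    double th = std::sqrt(r[0]*r[0] + r[1]*r[1] + r[2]*r[2]);
    Mat3 R{{Vec3{1, 0, 0}, Vec3{0, 1, 0}, Vec3{0, 0, 1}}};
    if (th < 1e-12) return R;
    Vec3 k{r[0]/th, r[1]/th, r[2]/th};
    double c = std::cos(th), s = std::sin(th);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            R[i][j] = c*(i == j ? 1.0 : 0.0) + (1 - c)*k[i]*k[j];
    R[0][1] -= s*k[2]; R[0][2] += s*k[1];
    R[1][0] += s*k[2]; R[1][2] -= s*k[0];
    R[2][0] -= s*k[1]; R[2][1] += s*k[0];
    return R;
}

Mat3 matmul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                C[i][j] += A[i][k]*B[k][j];
    return C;
}

// q = R*p + t
Vec3 transform(const Mat3& R, const Vec3& t, const Vec3& p) {
    Vec3 q;
    for (int i = 0; i < 3; ++i)
        q[i] = R[i][0]*p[0] + R[i][1]*p[1] + R[i][2]*p[2] + t[i];
    return q;
}

struct Pinhole { double fx, fy, cx, cy; };

Pixel project(const Mat3& R, const Vec3& t, const Pinhole& K, const Vec3& p) {
    Vec3 q = transform(R, t, p);
    return {K.fx*q[0]/q[2] + K.cx, K.fy*q[1]/q[2] + K.cy};
}

// Sum of squared reprojection errors over both views for one candidate
// object->camera1 pose (rvec1, tvec1); (rs, ts) is the known
// camera1->camera2 stereo transform, chained as R2 = Rs*R1, t2 = Rs*t1 + ts
double twoViewError(const Vec3& rvec1, const Vec3& tvec1,
                    const Vec3& rs, const Vec3& ts,
                    const Pinhole& K1, const Pinhole& K2,
                    const std::vector<Vec3>& objectPoints,
                    const std::vector<Pixel>& imagePoints1,
                    const std::vector<Pixel>& imagePoints2) {
    Mat3 R1 = rodrigues(rvec1), Rs = rodrigues(rs);
    Mat3 R2 = matmul(Rs, R1);
    Vec3 t2 = transform(Rs, ts, tvec1); // Rs*t1 + ts
    double error = 0;
    for (size_t i = 0; i < objectPoints.size(); ++i) {
        Pixel p1 = project(R1, tvec1, K1, objectPoints[i]);
        Pixel p2 = project(R2, t2, K2, objectPoints[i]);
        error += (p1[0]-imagePoints1[i][0])*(p1[0]-imagePoints1[i][0])
               + (p1[1]-imagePoints1[i][1])*(p1[1]-imagePoints1[i][1])
               + (p2[0]-imagePoints2[i][0])*(p2[0]-imagePoints2[i][0])
               + (p2[1]-imagePoints2[i][1])*(p2[1]-imagePoints2[i][1]);
    }
    return error;
}
```

This is exactly the scalar the solver loop above drives towards zero; the real implementation would additionally apply the distortion model and accumulate the Jacobian.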
The next step is to find functions f and g (i.e. how to chain together rotations and translations). Perhaps I can put that as a separate question on here.
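For what it's worth, I believe the chaining works out as R2 = Rs * R1 and t2 = Rs * t1 + ts, where (Rs, ts) takes camera1 coordinates to camera2 coordinates (which I understand is stereoCalibrate's convention, x2 = R*x1 + T). A self-contained sketch with my own helper names, rotations given as Rodrigues vectors:

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Rodrigues formula: rotation vector -> rotation matrix
Mat3 rodrigues(const Vec3& r) {
    double th = std::sqrt(r[0]*r[0] + r[1]*r[1] + r[2]*r[2]);
    Mat3 R{{Vec3{1, 0, 0}, Vec3{0, 1, 0}, Vec3{0, 0, 1}}};
    if (th < 1e-12) return R;
    Vec3 k{r[0]/th, r[1]/th, r[2]/th};
    double c = std::cos(th), s = std::sin(th);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            R[i][j] = c*(i == j ? 1.0 : 0.0) + (1 - c)*k[i]*k[j];
    R[0][1] -= s*k[2]; R[0][2] += s*k[1];
    R[1][0] += s*k[2]; R[1][2] -= s*k[0];
    R[2][0] -= s*k[1]; R[2][1] += s*k[0];
    return R;
}

Mat3 matmul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                C[i][j] += A[i][k]*B[k][j];
    return C;
}

Vec3 matvec(const Mat3& A, const Vec3& v) {
    return {A[0][0]*v[0] + A[0][1]*v[1] + A[0][2]*v[2],
            A[1][0]*v[0] + A[1][1]*v[1] + A[1][2]*v[2],
            A[2][0]*v[0] + A[2][1]*v[1] + A[2][2]*v[2]};
}

// f: rotation part of object->camera2 (only the rotations are needed)
Mat3 f(const Vec3& rvec1, const Vec3& rotationStereoPair) {
    return matmul(rodrigues(rotationStereoPair), rodrigues(rvec1)); // R2 = Rs*R1
}

// g: translation part of object->camera2
Vec3 g(const Vec3& tvec1, const Vec3& rotationStereoPair,
       const Vec3& translationStereoPair) {
    Vec3 t = matvec(rodrigues(rotationStereoPair), tvec1); // Rs*t1
    return {t[0] + translationStereoPair[0],
            t[1] + translationStereoPair[1],
            t[2] + translationStereoPair[2]};              // + ts
}
```

Note that f turns out not to depend on the translations at all; the pseudocode's four-argument signatures can simply ignore the extra inputs. Converting R2 back to a rotation vector (for cvProjectPoints2) would need the inverse Rodrigues step, omitted here.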
(more detailed notes at https://paper.dropbox.com/doc/KC35-stereoSolvePnP-pseudo-code-14VMJDF9W8UhMxVOGdCEZ )
EDIT : paid job available to resolve this : https://www.upwork.com/jobs/~01b0f0c4105c0652da