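If you know the real-world dimensions of the block and you have detected its corners, you can compute the ratio between pixels and real-world units. Keep in mind that this ratio only holds for objects at roughly the same distance from the camera as the block.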
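
As a rough sketch of that ratio, assuming the four corners are already detected, ordered top-left, top-right, bottom-right, bottom-left, and that the block faces the camera more or less head-on. The names `pixels_per_unit` and `px_to_units` and the sample numbers are made up for illustration; they are not OpenCV functions.

```python
import math

def pixels_per_unit(corners, real_width, real_height):
    """Return (horizontal, vertical) pixels per real-world unit.

    corners: four (x, y) pixel points ordered TL, TR, BR, BL (assumed order).
    real_width, real_height: known block size, e.g. in millimetres.
    """
    tl, tr, br, bl = corners
    # Average the two opposite edges to smooth out corner-detection noise.
    width_px = (math.dist(tl, tr) + math.dist(bl, br)) / 2.0
    height_px = (math.dist(tl, bl) + math.dist(tr, br)) / 2.0
    return width_px / real_width, height_px / real_height

def px_to_units(length_px, scale):
    """Convert a pixel length to real-world units using a given scale."""
    return length_px / scale

# Hypothetical example: a 40 x 20 mm block seen as a 200 x 100 px rectangle.
corners = [(100, 50), (300, 50), (300, 150), (100, 150)]
sx, sy = pixels_per_unit(corners, real_width=40.0, real_height=20.0)
print(sx, sy)
print(px_to_units(50, sx))
```

If the two scales differ noticeably, the block is probably tilted relative to the camera, and a single ratio will not be accurate.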

One of the most popular books on this topic is "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. Since you have a maths background, you will love it.

I think SLAM-based approaches are currently the most advanced techniques for estimating a camera's position in the real world, but maybe you should start with simpler techniques.

As for Python or C++, I am not experienced with the Python version, but you can base your decision on these points:

Python is considered easier to learn and to program in than C++.
In theory, C++ runs faster than Python.
You will find that the official and unofficial OpenCV documentation is more oriented to C++, but you will also find help for Python.
OpenCV is more complete in C++, but the differences are not that meaningful in my opinion.
If you need to interact with other C++ code, your life will be easier if your code is in C++.