1 | initial version |
Well, I tested this out, and you are correct, I don't see much difference. 12.7 vs 11.2 seconds. Until that is, I call setUseOptimized(false) and turn off at least some of the AVX code. Then it becomes 20.2 vs 16.0 seconds. (Summed across 10000 runs)
I suspect a combination of hand coded and compiler optimized AVX code is allowing the float data to be processed nearly as fast as the 16 bit data. The 16bit is still faster though.
As a side note, you should consider using the function cv::initUndistortRectifyMap, as it does exactly what you are doing here, probably faster, and with tests to make sure it works.