Block matching CPU utilization - 2 processes faster than 2 threads
Hi,
I have an algorithm that uses the OpenCV block matching algorithm internally to calculate the disparity map between 2 separate stereo channels. Everything is working fine, results-wise.
However, I have not yet understood why CPU utilization is so low, because the algorithm is very parallel as far as I can see. On a modern i7 with 8 threads, I see ~30% CPU utilization.
But the specific problem I have is weirder than that. I have 2 stereo channels (4 cameras overall):
Channel 1 running on the main thread (or a worker thread, it makes no difference)
Channel 2 running on a worker thread
CPU utilization this way is ~30%.
But here is the really weird issue: if I build 2 separate EXEs, 1 EXE for each channel, and run both EXEs in parallel, I see better results per channel, and CPU utilization doubles to ~60%.
I wonder how that is even possible. Threads should be faster than processes as far as I know.
The funny thing is that if I add to the 2 EXEs a 3rd EXE with both channels, all running at the same time, CPU is then utilized to ~85%, but I see some ~35% reduction in the time it takes for the 2 channels process to run (but I guess that is expected).
Thanks for any help!
OpenCV is using massive data parallelization internally already. If not all of your cores are maxed out already, the devs there are doing something wrong.
Usually, it's not worth/feasible trying task parallelization on top of it.
I see. But honestly I don't know if I should expect 100% CPU usage with the OpenCV block matching algorithm. Even in a standalone, command-line program that does nothing but prepare an OpenCV BM instance and run 2 static stereo images through it, CPU here is at only ~15% (but performance is great). So I believe that the OpenCV BM has at least some sequential limitations. But still, that doesn't explain how 2 processes can utilize more CPU than 2 threads doing roughly the same main task.
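To put the percentages in this thread on a common footing, it may help to convert them into "equivalent fully busy logical CPUs". The sketch below is only an illustration in Python using the standard library; the helper names `busy_cores` and `measure_utilization` are made up for this example, and it assumes the i7 exposes 8 logical CPUs as described above. It does not call OpenCV at all.

```python
import os
import time


def busy_cores(utilization_percent, logical_cpus):
    """Convert a whole-machine CPU percentage into the equivalent
    number of fully busy logical CPUs."""
    return utilization_percent * logical_cpus / 100.0


def measure_utilization(work, logical_cpus=None):
    """Run work() once and return (wall_seconds, cpu_seconds, utilization_percent).

    cpu_seconds is the CPU time of this process summed over all of its
    threads, so time spent in native worker threads is included.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    utilization = 100.0 * cpu_seconds / (wall_seconds * logical_cpus)
    return wall_seconds, cpu_seconds, utilization


# Figures quoted in this thread, assuming 8 logical CPUs.
for percent in (15, 30, 60, 85):
    print(f"{percent}% of 8 logical CPUs = {busy_cores(percent, 8):.1f} busy CPUs")
```

Under that assumption, ~15% corresponds to about 1.2 busy CPUs, ~30% to about 2.4, ~60% to about 4.8 and ~85% to about 6.8. Wrapping the per-channel work in something like `measure_utilization` in each setup (2 threads versus 2 processes) would give CPU time and wall time side by side, which is easier to compare than the Task Manager percentage alone.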