TBB not being used for CMat memory copy, just IPP
I'm using OpenCV 3.4.0 with WITH_TBB enabled, but when stepping through the copyTo and clone functions, I see that they only have an IPP code path.
A single-threaded memcpy is slow for 2K and 4K frames, and I'm surprised that TBB isn't used. Should it be?
Any advice would be appreciated.
My guess is that the memcpy operation is limited by memory bandwidth (nowadays memory transfer is the limiting factor; see the different levels of CPU cache), so parallelization should not improve it much.
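One way to check the bandwidth argument on a particular machine is to time a single memcpy of a frame-sized buffer against the same copy split into contiguous chunks on several threads. This is a rough sketch, not a rigorous benchmark: chunked_memcpy, measure_gbps and kFrame4K are hypothetical names, and the buffer size assumes one 3840 x 2160 frame with 4 bytes per pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Assumed test size: one 4K frame at 4 bytes per pixel (~33 MB, larger than typical L3 caches).
constexpr std::size_t kFrame4K = 3840ull * 2160ull * 4ull;

// Copy `bytes` from src to dst, splitting the range into `nthreads` contiguous chunks,
// each copied by its own thread with std::memcpy.
void chunked_memcpy(unsigned char* dst, const unsigned char* src,
                    std::size_t bytes, unsigned nthreads) {
    if (nthreads <= 1) { std::memcpy(dst, src, bytes); return; }
    const std::size_t chunk = (bytes + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= bytes) break;
        const std::size_t len = std::min(chunk, bytes - begin);
        workers.emplace_back([=] { std::memcpy(dst + begin, src + begin, len); });
    }
    for (auto& w : workers) w.join();
}

// Copy throughput in GB/s for the best of `reps` runs (best-of reduces timing noise).
double measure_gbps(unsigned nthreads, std::size_t bytes, int reps = 5) {
    assert(reps > 0);
    std::vector<unsigned char> src(bytes, 1), dst(bytes, 0);  // pages touched up front
    double best = 1e30;
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        chunked_memcpy(dst.data(), src.data(), bytes, nthreads);
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return bytes / best / 1e9;
}
```

Comparing measure_gbps(1, kFrame4K) with measure_gbps(std::thread::hardware_concurrency(), kFrame4K) shows whether extra threads help on that machine. If the single-thread figure is already close to the platform's memory bandwidth, more threads add little; on some CPUs a single core cannot saturate the memory controllers by itself, so a moderate speedup is also possible.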
You can always implement your own copyTo function if you want.
I understand that in modern CPUs each core has its own PCIe channel, which is why, when I tested OpenCV 3.1 and TBB was used to transfer large CMats, it was much faster than single-threaded. Was TBB removed from the CMat copy code in 3.3 and later?
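A minimal sketch of the "implement your own copyTo" suggestion, assuming the goal is to copy horizontal stripes of rows on several threads. Inside OpenCV this would typically be written with cv::parallel_for_ over a cv::Range of rows, which uses TBB as its backend when OpenCV is built with it; here plain std::thread is used so the sketch compiles without OpenCV. ImageView, copyRows and parallelCopyTo are hypothetical names, and ImageView only mimics the data/step layout of a cv::Mat (rows may be padded, so each row is copied separately).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for a cv::Mat header: `step` is the byte distance between
// row starts and may exceed rowBytes when rows are padded.
struct ImageView {
    unsigned char* data;
    std::size_t rows;
    std::size_t rowBytes;  // bytes of pixel data per row (cols * element size)
    std::size_t step;      // bytes between consecutive row starts
};

// Copy rows [r0, r1) with one memcpy per row, skipping any padding between rows.
static void copyRows(const ImageView& src, const ImageView& dst,
                     std::size_t r0, std::size_t r1) {
    for (std::size_t r = r0; r < r1; ++r)
        std::memcpy(dst.data + r * dst.step, src.data + r * src.step, src.rowBytes);
}

// Hypothetical parallel copyTo: split the rows into contiguous stripes, one per thread.
void parallelCopyTo(const ImageView& src, const ImageView& dst, unsigned nthreads) {
    assert(src.rows == dst.rows && src.rowBytes == dst.rowBytes);
    nthreads = std::max(1u, nthreads);
    const std::size_t stripe = (src.rows + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (std::size_t r0 = 0; r0 < src.rows; r0 += stripe) {
        const std::size_t r1 = std::min(src.rows, r0 + stripe);
        workers.emplace_back([&src, &dst, r0, r1] { copyRows(src, dst, r0, r1); });
    }
    for (auto& w : workers) w.join();
}
```

Splitting by rows rather than by raw bytes keeps each thread's work aligned to row boundaries, which is what makes padded (non-continuous) matrices safe to copy.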
What is CMat? There is only cv::Mat and cv::UMat in OpenCV. All the changes are versioned on GitHub if you want to check.
I meant cv::Mat, sorry for the confusion.
I remember seeing parallel_for used for Mat memory copies back when I used OpenCV 3.1; I thought it was used for Mat copying, but now I'm not sure. Digging through all the source code would be tedious. Is there a quick way to find every place in OpenCV that uses parallel_for?
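On a source checkout, a recursive text search does the job (grep -rn parallel_for_ over the modules directory gives the same result). Kept in C++ for consistency, a minimal sketch using C++17 std::filesystem could look like this; findUses is a hypothetical helper name, and the set of file extensions it scans is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Return "path:line" for every line under `root` that contains `needle`,
// scanning only C/C++ source and header files.
std::vector<std::string> findUses(const fs::path& root, const std::string& needle) {
    assert(!needle.empty());
    std::vector<std::string> hits;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        const std::string ext = entry.path().extension().string();
        if (ext != ".cpp" && ext != ".hpp" && ext != ".h" && ext != ".c") continue;
        std::ifstream in(entry.path());
        std::string line;
        for (std::size_t n = 1; std::getline(in, line); ++n)
            if (line.find(needle) != std::string::npos)
                hits.push_back(entry.path().string() + ":" + std::to_string(n));
    }
    return hits;
}
```

For example, findUses("opencv/modules", "parallel_for_") lists every call site, assuming the repository was cloned into a directory named opencv.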
Given how large Mat data is for FHD and 4K images, and that CPU cores have their own memory channels, using parallel_for would boost performance in apps that process high-res frames.
I tried Intel IPP, but it was only 1% faster than a plain fast memcpy using 128-bit SSE instructions on aligned memory.
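For reference, a copy loop built from 128-bit SSE2 instructions on aligned memory might look like the sketch below. It assumes an x86 target with SSE2 and falls back to std::memcpy when either pointer is not 16-byte aligned or SSE2 is unavailable; copy128 is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Copy `bytes` with 16-byte (128-bit) SSE2 loads/stores when both pointers are
// 16-byte aligned; otherwise, or on non-SSE2 targets, use std::memcpy.
void copy128(void* dst, const void* src, std::size_t bytes) {
    assert(bytes == 0 || (dst != nullptr && src != nullptr));
#if defined(__SSE2__) || defined(_M_X64)
    auto d = static_cast<unsigned char*>(dst);
    auto s = static_cast<const unsigned char*>(src);
    const bool aligned = reinterpret_cast<std::uintptr_t>(d) % 16 == 0 &&
                         reinterpret_cast<std::uintptr_t>(s) % 16 == 0;
    if (aligned) {
        std::size_t i = 0;
        for (; i + 16 <= bytes; i += 16) {
            const __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(s + i));
            _mm_store_si128(reinterpret_cast<__m128i*>(d + i), v);
        }
        std::memcpy(d + i, s + i, bytes - i);  // remaining tail of fewer than 16 bytes
        return;
    }
#endif
    std::memcpy(dst, src, bytes);
}
```

For frames much larger than the last-level cache, a common variant worth benchmarking replaces _mm_store_si128 with the non-temporal _mm_stream_si128 followed by _mm_sfence(), which avoids polluting the cache with the destination.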
Here is the code for copyTo() in OpenCV 3.1.
Thanks, no sign of TBB there, just IPP. I must have seen the TBB parallel_for somewhere else... I think I need a personal memory upgrade! I'm reading up on fast memory transfers. Intel has a lot of options, but some people report that rep movsb is the fastest thanks to microcode improvements. I'll test it with a modified copyTo and post if the results are interesting.
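A minimal sketch for trying the rep movsb idea, assuming GCC or Clang on x86-64 (inline-asm syntax is compiler-specific; MSVC provides the __movsb intrinsic instead). CPUs that advertise ERMSB (Enhanced REP MOVSB/STOSB) are the ones where this instruction is expected to be competitive. repMovsbCopy is a hypothetical name; other targets fall back to std::memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy `bytes` from src to dst with the x86 `rep movsb` string instruction.
// GCC/Clang inline asm on x86-64 only; every other target uses std::memcpy.
void repMovsbCopy(void* dst, const void* src, std::size_t bytes) {
    assert(bytes == 0 || (dst != nullptr && src != nullptr));
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    // rep movsb copies RCX bytes from [RSI] to [RDI]. The System V ABI guarantees the
    // direction flag is clear on function entry, so the copy runs forward.
    asm volatile("rep movsb"
                 : "+D"(dst), "+S"(src), "+c"(bytes)
                 :
                 : "memory");
#else
    std::memcpy(dst, src, bytes);
#endif
}
```

Swapping this in for the memcpy inside a modified copyTo, and timing it with the same frame sizes as before, gives a like-for-like comparison against the IPP and SSE paths.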