You don't need the 30-day evaluation version; TBB is also available under the GPL.
TBB essentially lets you call the same function in parallel on each CPU core/hyperthread. It's useful when you have a function that does the same thing to different data, doesn't depend on the output of other calls to itself, and whose work can be split into 4/8/16 blocks.
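To make the "split into blocks" idea concrete, here is a minimal sketch of the pattern. It uses plain std::thread rather than TBB so it builds without extra libraries; TBB's tbb::parallel_for with a tbb::blocked_range does the chunking and scheduling for you. The names brighten and brighten_parallel are made up for illustration, they are not part of OpenCV or TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-element operation: the result depends only on its own
// input, never on the output of another call.
inline int brighten(int pixel) { return std::min(pixel + 10, 255); }

// Apply brighten() to `data` in up to `num_blocks` independent chunks,
// one thread per non-empty chunk (illustrative stand-in for tbb::parallel_for).
void brighten_parallel(std::vector<int>& data, std::size_t num_blocks) {
    assert(num_blocks > 0);
    // Round up so every element falls into some chunk.
    const std::size_t chunk = (data.size() + num_blocks - 1) / num_blocks;
    std::vector<std::thread> workers;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = std::min(b * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        if (begin == end) continue;  // nothing left for this block
        // Each worker touches only its own [begin, end) range, so no locking is needed.
        workers.emplace_back([&data, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] = brighten(data[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

You would call it with something like brighten_parallel(pixels, std::thread::hardware_concurrency()). If one call needed the result of another, this split would not work, and that is exactly the case where TBB doesn't help much either.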
CUDA (since you have an NVIDIA card) runs instructions on the GPU. It can process 100-1000 tasks in parallel, but it takes time to get the data onto and off the card, so it is useful when you want to process an entire image and have functions that operate only on a local section of the image.
Yes, the CUDA build of OpenCV takes a long time because CUDA pre-builds the GPU code for different cards ahead of time.