I'm doing some simple benchmarking to compare the cost of transferring data from host to GPU and back. Here's a paraphrased version of the snippet that's acting up:
```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
using namespace cv;

int main() {
    long lNumGpuCopyConverts = 0;                        // conversion counter
    Mat lImage( 720, 1280, CV_8UC3, Scalar( 100, 250, 30 ) );
    UMat lUImage, lUDestImage;
    lUImage = lImage.getUMat( ACCESS_READ );             /* This is fast */
    // lImage.copyTo( lUImage );                          /* This is SLOW */
    cvtColor( lUImage, lUDestImage, COLOR_BGR2YCrCb );
    lNumGpuCopyConverts++;
    // lImage = lUImage.getMat( ACCESS_READ );            /* This is SLOW */
    // lUImage.copyTo( lImage );                          /* This is SLOW */
}
```
When I say SLOW, I mean literally over a minute for the copy from GPU to CPU. This is an AMD FirePro card. For perspective, on a 720p image cvtColor runs 170k-180k times per second without the explicit copies. With just the one-way copy (host to GPU), that drops to 22k conversions per second. If I also copy back to the CPU, I don't get even one conversion per second.
I've tested this on a variety of other machines (laptops, etc.), and there the two-way copy is slow-ish but not terrible, so I'm assuming something is either odd about this card or misconfigured on my machine. Does anyone have any ideas about what I could check?
cheers,
Chris