
Transparent API performance discrepancy

asked 2018-07-24 14:22:07 -0600 by drcurry

Hello everyone. I'm using OpenCV's Transparent API (T-API) on two different computers. Laptop1 has a Radeon 8690M with OpenCL C 1.1. Laptop2 has a GTX 1050 (384.130, power setting set to prefer the GPU) with OpenCL C 1.2. OpenCV was compiled on both laptops with exactly the same settings, except for the addition of CUDA 8 on Laptop2. Both run exactly the same code on the same version of OpenCV (3.4.2-dev) on Ubuntu 16.04. The OCL module recognizes both GPUs.

The image I'm loading is a 2048x1536 color image.

Code snippet:

cv::Mat m = cv::imread("img.jpg"); // 2048x1536 color image
cv::UMat u;

while (1) {
   m.copyTo(u);   // Mat -> UMat copy, outside the timed region
   startTime();   // my own timer function
   cv::GaussianBlur(u, u, cv::Size(3, 3), 3.5);
   endTime();
}

On Laptop1 I get ~24ms; on Laptop2 I get ~34ms. I let the loop run for a couple of seconds. Now the fun begins: on Laptop2, if I change GaussianBlur to use m instead of u, my time drops to ~20ms, while Laptop1 gets worse performance with m, as expected (the complete opposite). Is there some implementation detail under the hood that could be affecting performance, or is there some other issue? Thanks.

Comments

With OpenCL, don't use the first loop iteration to estimate time: the first iteration = compiling the OpenCL kernel + GaussianBlur.

LBerger (2018-07-24 15:21:14 -0600)
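
To illustrate the point about the first iteration, here is a minimal sketch of a timing helper that runs a few untimed warm-up calls and then times the rest with std::chrono::steady_clock. The names time_after_warmup and median_ms are made up for this example and are not part of OpenCV; the operation being timed is passed in as a callable, so nothing here depends on how GaussianBlur behaves.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): call `op` `warmup` times
// without timing, then time `runs` further calls with a monotonic wall clock.
// Returns one duration in milliseconds per timed call.
template <typename Op>
std::vector<double> time_after_warmup(Op op, std::size_t warmup, std::size_t runs) {
    using clock = std::chrono::steady_clock;
    for (std::size_t i = 0; i < warmup; ++i) op();  // absorbs one-time setup cost

    std::vector<double> ms;
    ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        op();
        const auto t1 = clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}

// The median is less sensitive to occasional outliers than the mean.
double median_ms(std::vector<double> ms) {
    assert(!ms.empty());
    std::sort(ms.begin(), ms.end());
    const std::size_t n = ms.size();
    return n % 2 ? ms[n / 2] : 0.5 * (ms[n / 2 - 1] + ms[n / 2]);
}
```

Assuming the loop from the question, usage might look like median_ms(time_after_warmup([&] { cv::GaussianBlur(u, u, cv::Size(3, 3), 3.5); }, 5, 50)); the warm-up count of 5 is an arbitrary choice.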

I didn't. In my timing output the first couple of values were large, then the values became stable, so I took the average of the middle few out of 50 values.

drcurry (2018-07-24 15:34:50 -0600)

I checked GPU utilization using the NVIDIA X Server settings tool: both versions of the code showed ~40% GPU utilization and ~6% PCIe bandwidth utilization.

drcurry (2018-07-24 15:47:53 -0600)

But you are comparing OpenCL GPU processing against CPU processing. By processing a single image in a loop, you have the bottleneck of pushing the data to GPU memory every single time. So to me it makes sense that if you use m instead of u, and are thus running on the CPU with RAM access, it runs faster for that single image. GPUs are only faster if you first allocate, for example, 100 images in GPU memory, process them, and then get the results back.

StevenPuttemans (2018-07-25 03:59:07 -0600)

Sure, there is a Mat->UMat conversion, but I'm timing only the actual computation, which shouldn't be affected by memory transfers, since the transfer should be complete before the computation starts.

drcurry (2018-07-25 06:22:55 -0600)
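
One thing worth keeping in mind when timing "only the computation": work submitted to a device queue may return to the caller before it has actually run, so the clock should be read after the queue has been drained. The sketch below shows the effect with a toy queue; ToyQueue, Timings and time_enqueued_work are invented for this example and are not OpenCV types. As far as I know, cv::ocl::finish() is the call that plays the role of finish() for OpenCV's OpenCL queue, but whether it matters for the loop in the question is an assumption, not something measured in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for a device command queue (hypothetical, not an OpenCV type):
// enqueue() only records the work; finish() actually runs it.
struct ToyQueue {
    std::vector<std::function<void()>> pending;
    void enqueue(std::function<void()> task) { pending.push_back(std::move(task)); }
    void finish() {
        for (auto& task : pending) task();
        pending.clear();
    }
};

struct Timings {
    double after_enqueue_ms;  // clock read right after enqueue() returns
    double after_finish_ms;   // clock read after the queue has drained
};

Timings time_enqueued_work(int work_ms) {
    assert(work_ms >= 0);
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    ToyQueue queue;
    const auto t0 = clock::now();
    queue.enqueue([work_ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
    });
    const auto t1 = clock::now();  // the work has not run yet
    queue.finish();                // synchronize before the final clock read
    const auto t2 = clock::now();

    return {ms(t1 - t0).count(), ms(t2 - t0).count()};
}
```

Here after_enqueue_ms only covers the cost of submitting the task, while after_finish_ms includes the work itself, which is the number a benchmark usually wants.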

try with this code

LBerger (2018-07-25 06:41:44 -0600)

So I changed how I performed the Gaussian blur on Laptop2. Originally I was passing in a 3-channel 2048x1536 image. I ran the same test on Laptop1, using the same code as in the question.

            Laptop2   Laptop1
3ch Mat     ~10ms     ~6ms
3ch UMat    ~23ms     ~10ms

Next I converted the image to a single channel using inRange, but still timed only the Gaussian blur.

            Laptop2   Laptop1
1ch Mat     ~2ms      ~4ms
1ch UMat    ~15ms     ~8ms

drcurry (2018-07-25 10:28:28 -0600)

1 answer


answered 2018-07-27 10:20:23 -0600 by drcurry

The discrepancy was caused by my timer function; it had nothing to do with OpenCV. The timer function I used was not based on chrono. Sorry for the confusion.
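
The answer does not say which timer was used, so the following is only an illustration of one way a non-chrono timer can mislead. On POSIX systems such as Ubuntu, std::clock() reports processor time charged to the process, not elapsed real time, so any period in which the host thread merely waits is barely counted. The sketch compares it with std::chrono::steady_clock; Elapsed and time_idle_wait are names invented for this example.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

struct Elapsed {
    double wall_ms;  // std::chrono::steady_clock: elapsed real time
    double cpu_ms;   // std::clock: processor time charged to this process
};

// Time a period in which the calling thread only waits, as a host thread
// might while another device does the work.
Elapsed time_idle_wait(int wait_ms) {
    assert(wait_ms >= 0);
    const auto wall0 = std::chrono::steady_clock::now();
    const std::clock_t cpu0 = std::clock();

    std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));

    const std::clock_t cpu1 = std::clock();
    const auto wall1 = std::chrono::steady_clock::now();

    return {
        std::chrono::duration<double, std::milli>(wall1 - wall0).count(),
        1000.0 * static_cast<double>(cpu1 - cpu0) / CLOCKS_PER_SEC,
    };
}
```

For a wait of 100 ms, wall_ms should come out at roughly 100 or slightly more, while cpu_ms should stay far below that, which is why steady_clock is the usual choice for this kind of benchmark.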


Stats

Asked: 2018-07-24 14:22:07 -0600

Seen: 673 times

Last updated: Jul 27 '18