OpenCV 3.0, the performance of UMat

asked 2015-06-07 08:38:31 -0600

Anna Lucia

Hi, I'm trying to compare the performance of face detection with and without OpenCL, so I use Mat and UMat for the two cases. However, the UMat version is slower than the Mat version. The OpenCL runtime works on my computer. How can I make OpenCV actually use the GPU device?


1 answer


answered 2015-06-07 16:36:43 -0600

Eduardo

Hi,

To use OpenCL, in addition to using cv::UMat, I:

  • cv::ocl::setUseOpenCL(true);
  • set an environment variable to select the correct GPU device (see the documentation), because I have both an integrated GPU (Intel HD Graphics) and a dedicated GPU: variable name OPENCV_OPENCL_DEVICE, value :GPU:1
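To make the value :GPU:1 less cryptic: as far as I recall from the OpenCV documentation, OPENCV_OPENCL_DEVICE has the form <Platform>:<CPU|GPU|ACCELERATOR|nothing=GPU/CPU>:<DeviceName or ID>, so :GPU:1 means "any platform, GPU devices only, the device with index 1 among the matches" (the order depends on the installed OpenCL platforms). The helper below is hypothetical and not part of OpenCV; it only splits such a value into its three fields to show the layout. The variable itself should be set before the program starts, since OpenCV reads it when it first initializes OpenCL.

```cpp
#include <string>

// Hypothetical illustration only, NOT OpenCV's own parser.
// Splits "<Platform>:<DeviceType>:<DeviceNameOrID>" into its fields.
struct OpenCLDeviceSpec {
    std::string platform;   // empty -> any platform
    std::string deviceType; // e.g. "GPU", "CPU"; empty -> GPU or CPU
    std::string device;     // device name, or numeric index among the matches
};

inline OpenCLDeviceSpec parseOpenCLDeviceSpec(const std::string& value) {
    OpenCLDeviceSpec spec;
    const std::string::size_type first = value.find(':');
    if (first == std::string::npos) {
        spec.platform = value;  // no ':' at all: treat the whole value as platform
        return spec;
    }
    spec.platform = value.substr(0, first);
    const std::string::size_type second = value.find(':', first + 1);
    if (second == std::string::npos) {
        spec.deviceType = value.substr(first + 1);
        return spec;
    }
    spec.deviceType = value.substr(first + 1, second - first - 1);
    spec.device = value.substr(second + 1);
    return spec;
}
```

For ":GPU:1" this yields an empty platform, device type "GPU" and device "1".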

Some tests I ran for CascadeClassifier::detectMultiScale() (OpenCV 3.0.0-rc1, Windows 7 x64, VS2010, Release mode, 1280x720 images; results averaged over 1000 images):

CPU only (Intel Core i7): 12.46 FPS, CPU load 65% (baseline)
OpenCL + Intel HD Graphics: 7 FPS, CPU load 8%, GPU load 78% (x0.56)
OpenCL + dedicated GPU (NVIDIA): 13 FPS, CPU load 25%, GPU load 70% (x1.04)
CUDA + dedicated GPU (NVIDIA): 30 FPS, CPU load 12%, GPU load 60% (x2.4)
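The speed-up factors in parentheses are each configuration's FPS divided by the CPU-only FPS. The short sketch below just reproduces that arithmetic from the measured values listed above; the struct and function names are my own.

```cpp
#include <cmath>
#include <string>
#include <vector>

// One measured configuration (values taken from the results above).
struct Measurement {
    std::string config;
    double fps;
};

// Speed-up relative to the baseline: configuration FPS / baseline FPS.
inline double speedup(double fps, double baselineFps) {
    return fps / baselineFps;
}

// Rounds to two decimals, e.g. 0.5618 -> 0.56.
inline double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}

inline std::vector<Measurement> measurements() {
    return {
        {"CPU only (Intel Core i7)", 12.46},
        {"OpenCL + Intel HD Graphics", 7.0},
        {"OpenCL + NVIDIA GPU", 13.0},
        {"CUDA + NVIDIA GPU", 30.0},
    };
}
```

With 12.46 FPS as the baseline: 7 / 12.46 ≈ 0.56, 13 / 12.46 ≈ 1.04 and 30 / 12.46 ≈ 2.41 (reported above as x2.4).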

On my computer, the gain from OpenCL + dedicated GPU is negligible compared to using only the CPU, whereas CUDA + GPU gives a speed-up of about x2.4. I did not check whether the results are the same for all the overloads of detectMultiScale.

The code I used for my tests; feel free to add your results to confirm or disprove mine:

#include <cstdlib>   // atoi, system
#include <iostream>
#include <sstream>   // std::stringstream
#include <vector>

#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaobjdetect.hpp>
#include <opencv2/cudaimgproc.hpp>


int main(int argc, char**argv) {
    // CV_VERSION is a string literal, so std::hex would have no effect here.
    std::cout << "OpenCV version=" << CV_VERSION << std::endl;

    cv::Mat frame;
    cv::UMat uframe, uFrameGray;
    cv::cuda::GpuMat image_gpu, image_gpu_gray;
    cv::VideoCapture capture("path_to_the_video");

    // argv[1]: 1 to enable OpenCL for the cv::UMat (T-API) path, 0 to disable it
    bool useOpenCL = (argc >= 2) ? (atoi(argv[1]) != 0) : false;
    std::cout << "Use OpenCL=" << useOpenCL << std::endl;
    cv::ocl::setUseOpenCL(useOpenCL);

    // argv[2]: 1 to use the CUDA path (cv::cuda::GpuMat) instead of cv::UMat
    bool useCuda = (argc >= 3) ? (atoi(argv[2]) != 0) : false;
    std::cout << "Use CUDA=" << useCuda << std::endl;

    cv::Ptr<cv::CascadeClassifier> cascade = cv::makePtr<cv::CascadeClassifier>("data/lbpcascades/lbpcascade_frontalface.xml");
    cv::Ptr<cv::cuda::CascadeClassifier> cascade_gpu = cv::cuda::CascadeClassifier::create("data/lbpcascades/lbpcascade_frontalface.xml");

    double time = 0.0;
    int nb = 0;
    if(capture.isOpened()) {
        for(;;) {
            capture >> frame;
            if(frame.empty() || nb >= 1000) {
                break;
            }

            std::vector<cv::Rect> faces;
            double t = 0.0;
            if(!useCuda) {
                t = (double) cv::getTickCount();
                frame.copyTo(uframe);
                cv::cvtColor(uframe, uFrameGray, cv::COLOR_BGR2GRAY);
                cascade->detectMultiScale(uFrameGray, faces);
                t = ((double) cv::getTickCount() - t) / cv::getTickFrequency();
            } else {
                t = (double) cv::getTickCount();
                image_gpu.upload(frame);
                cv::cuda::cvtColor(image_gpu, image_gpu_gray, cv::COLOR_BGR2GRAY);
                cv::cuda::GpuMat objbuf;
                cascade_gpu->detectMultiScale(image_gpu_gray, objbuf);
                cascade_gpu->convert(objbuf, faces);
                t = ((double) cv::getTickCount() - t) / cv::getTickFrequency();
            }

            time += t;
            nb++;

            for(std::vector<cv::Rect>::const_iterator it = faces.begin(); it != faces.end(); ++it) {
                cv::rectangle(frame, *it, cv::Scalar(0,0,255));
            }
            std::stringstream ss;
            ss << "FPS=" << (nb / time);
            cv::putText(frame, ss.str(), cv::Point(30, 30), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0,0,255));

            cv::imshow("Frame", frame);
            char c = cv::waitKey(30);
            if(c == 27) {
                break;
            }
        }
    }

    std::cout << "Mean time=" << (time / nb) << " s" << " ; Mean FPS=" << (nb / time) << " ; nb=" << nb << std::endl;
    system("pause");  // Windows-only: keeps the console window open
    return 0;
}
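One thing that can skew UMat timings like the ones above: the first calls that go through OpenCL can include one-time setup such as OpenCL kernel compilation, so averaging them together with the steady-state frames penalizes the OpenCL path. Below is a minimal, generic timing helper (my own, not an OpenCV API) that runs a few untimed warm-up iterations before measuring.

```cpp
#include <chrono>
#include <cstddef>
#include <stdexcept>

// Mean wall-clock time in seconds of `iterations` calls to `work`,
// measured after `warmup` untimed calls (to exclude one-time setup costs).
template <typename Work>
double meanSecondsExcludingWarmup(Work&& work, std::size_t warmup, std::size_t iterations) {
    if (iterations == 0) {
        throw std::invalid_argument("iterations must be > 0");
    }
    for (std::size_t i = 0; i < warmup; ++i) {
        work();  // untimed warm-up call
    }
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();  // timed call
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

To use it with the test program, wrap the UMat work (copyTo, cvtColor, detectMultiScale) on a fixed frame in a lambda. Because UMat operations may run asynchronously, call cv::ocl::finish() at the end of the lambda whenever the result is not read back to the host; here detectMultiScale already returns its rectangles in a std::vector, which forces the work to complete.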

Comments

I've done some tests following http://answers.opencv.org/question/58.... For the filtering and Sobel cases I got good results, but the results of the face detection demo are still bad. The GPU seemingly does not start up: the GPU load (measured with GPU-Z) is only 1% to 2%. I have not found the reason yet.

Anna Lucia (2015-06-09 04:35:13 -0600)

Did you check that setUseOpenCL is actually set to true?

cv::ocl::setUseOpenCL(true);
Eduardo (2015-06-09 09:02:34 -0600)

Yes, I've set the flag to true, but with the Haar cascade the result is the same as with false. When I switched the cascade file to the LBP cascade, I got double the FPS.

Anna Lucia (2015-06-09 19:48:28 -0600)

Actually, with the new T-API the old cv::ocl module should be completely gone, so that should not be the problem. Using a UMat should implicitly enable OpenCL (as setUseOpenCL(true) would) once an OpenCL-capable device is detected.

StevenPuttemans (2015-06-12 04:52:05 -0600)

@Anna Lucia, keep in mind that the progress report for the latest 3.0 release states that several hundred functions have been updated to use the T-API interface, but face detection may not be among them yet. In that case using a UMat will indeed be slower, because it triggers many unnecessary checks for OpenCL support. If I were you, I would open a bug report.

StevenPuttemans (2015-06-12 04:53:31 -0600)

Hi, I have the same problem. I tested UMat with cv::ocl::goodFeaturesToTrack under three conditions: UMat with setUseOpenCL(true), UMat with setUseOpenCL(false), and plain cv::Mat only. In Debug mode the first configuration runs much faster, but in Release mode all three have almost the same runtime. I checked the source code of cv::ocl::goodFeaturesToTrack and it does have an OpenCL kernel, so it should already have been updated for the T-API. BTW, I also tested the same function with the ocl module of OpenCV 2.4.11, and it runs a little faster than with UMat. I'm considering going back to OpenCV 2.4.11 :(

Lenoir.Tan (2015-06-13 04:07:28 -0600)

Did you ever find out how to get a speed-up from goodFeaturesToTrack with OpenCL? I'm trying to use it on a Mac.

mself (2016-06-06 12:24:14 -0600)

Stats

Asked: 2015-06-07 08:38:31 -0600

Seen: 10,732 times

Last updated: Jun 11 '15