
Running 2 algorithms simultaneously

asked 2015-10-23 02:14:44 -0600

thexnightmare

updated 2020-11-30 03:33:08 -0600

Dear all, I want to capture a video and run 2 algorithms on it at the same time, where each algorithm takes approximately 2 seconds. How can I do that? Note: I've tried running the 2 algorithms in 2 different processes, but the 2nd program reports that it can't capture from the camera because the camera is locked by the first process. Thanks


Comments

What do you mean by simultaneously? If you mean to process the same frame in two different and independent ways, then it is as easy as doing this:

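#include <opencv2/opencv.hpp>
using namespace cv;

int main()
{
    VideoCapture cap(0); // whatever source you use: camera index or video file path
    Mat frame1, frame2;
    while (cap.read(frame1)) // grab each frame only once; stops when no frame is available
    {
        frame2 = frame1.clone(); // deep copy, so the 2 algorithms don't share pixel data
        //run first algorithm over frame1
        //run second algorithm over frame2
    }
    return 0;
}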

You can also run the algorithms in parallel rather than sequentially. I think the key to your problem is avoiding reading from the camera twice.

LorenaGdL ( 2015-10-23 02:42:46 -0600 )
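A minimal sketch of the parallel variant, assuming C++11 or later and using std::async rather than raw threads. The frame is grabbed once and both tasks only read it, so no copy is needed here; if an algorithm modifies its input in place, give it its own copy as clone() does above. Frame, algo1, algo2 and processFrame are hypothetical names, and a plain byte vector stands in for cv::Mat so the sketch compiles without OpenCV.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a captured frame (a cv::Mat in real code).
using Frame = std::vector<std::uint8_t>;

// Hypothetical "algorithm 1": sum of all pixel values.
long long algo1(const Frame& f)
{
    return std::accumulate(f.begin(), f.end(), 0LL);
}

// Hypothetical "algorithm 2": brightest pixel value (0 for an empty frame).
int algo2(const Frame& f)
{
    return f.empty() ? 0 : *std::max_element(f.begin(), f.end());
}

// Run both algorithms concurrently on the same, read-only frame.
std::pair<long long, int> processFrame(const Frame& frame)
{
    // std::launch::async asks for each task to run on its own thread;
    // std::cref avoids copying the frame into each task.
    auto r1 = std::async(std::launch::async, algo1, std::cref(frame));
    auto r2 = std::async(std::launch::async, algo2, std::cref(frame));
    return {r1.get(), r2.get()};  // get() waits for each task to finish
}
```

For example, processFrame(Frame{1, 2, 3, 250}) returns the pair (256, 250).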

Thanks for your comment. So it is not possible to access the USB camera from 2 different processes at the same time? If not, any idea how to run the algorithms in parallel?

thexnightmare ( 2015-10-23 05:20:13 -0600 )

I'm not sure you can't do it, but why would you want to? Accessing the camera from two different processes will probably lead to grabbing different frames, and I guess you want to compare the algorithms or something like that. Maybe if you describe your scenario in more detail we can suggest better solutions.

LorenaGdL ( 2015-10-23 05:40:49 -0600 )

2 answers


answered 2015-10-24 09:02:11 -0600

updated 2015-10-24 09:06:13 -0600

Although pklab's answer is correct, I still want to add some comments (comment length is limited, so I am posting them as an answer). First, parallel techniques require more work to reach the performance goal (high speed), so they should be applied where performance is a must and the computation is heavy. Second, your benchmark results are only for small images captured from a camera; you did not mention the image resolution, so it may not be a fair comparison. Here are my results (on a Core Duo machine 2x2.4 GHz, Windows 64-bit, OpenCV 3, VS 2013):

30% CPU used
Video width: 640
Video height: 480
Frame count: 200
Parallel:8.66ms Sequential:13.47ms Overhead:0.6ms

With a big video file:

34% CPU used
Video width: 1280
Video height: 720
Frame count: 200
Parallel:17.64ms Sequential:25.76ms Overhead:0.21ms

Finally, I rewrote @pklab's code to use Mat references (instead of Mat pointers) and so that it can run with either a video file or a camera, as follows:

#include <thread>
#include <functional> // std::ref, std::cref
#include <ctime>      // clock()
#include <iostream>
#include <string>
#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;

// here we use Canny edge detection
void Algo1(const cv::Mat &src, cv::Mat &dst)
{
    cvtColor(src, dst, COLOR_BGR2GRAY);
    GaussianBlur(dst, dst, Size(7, 7), 1.5, 1.5);
    Canny(dst, dst, 0, 30, 3);
}

// here we use morphology gradient
void Algo2(const cv::Mat &src, cv::Mat &dst)
{
    int morph_size = 1;
    cv::Size sz(2 * morph_size + 1, 2 * morph_size + 1);
    cv::Point anchor(morph_size, morph_size);
    Mat element = getStructuringElement(MORPH_RECT, sz, anchor);
    morphologyEx(src, dst, MORPH_GRADIENT, element);
}

// empty function to measure overhead
void Test()
{
    return;
}

int main(int argc, char * argv[])
{
    if (argc < 2) // expects "0" for the default camera, or a video file path
    {
        cerr << "usage: " << argv[0] << " <0 | video file>" << endl;
        return -1;
    }
    VideoCapture cap;
    if (string(argv[1]) == "0")
        cap.open(0);  // open the default camera
    else
        cap.open(argv[1]); // open video file
    if (!cap.isOpened()) // check if we succeeded
        return -1;
    cout << "Video width:" << cap.get(CAP_PROP_FRAME_WIDTH) << endl;
    cout << "Video height:" << cap.get(CAP_PROP_FRAME_HEIGHT) << endl;
    // note: clock() returns wall time with MSVC, but CPU time summed over all
    // threads on glibc, so on Linux the parallel timing is not elapsed time
    clock_t parallel = 0, sequential = 0, testParallel = 0, testSequential = 0;
    clock_t start, stop;
    int cnt = 0;
    while (cnt < 200)
    {
        Mat src, dst1, dst2;
        cap >> src; // get a new frame from camera or file
        if (src.empty()) // end of file or grab failure
            break;
        imshow("src", src);

        //Try it with sequential way
        start = clock();
        Algo1(src, dst1);
        Algo2(src, dst2);
        stop = clock();
        sequential += (stop - start);

        imshow("Sequential Algo1", dst1);
        imshow("Sequential Algo2", dst2);

        // try simple parallel processing way
        // std::thread copies its arguments: std::cref/std::ref make the
        // threads work on src, dst1 and dst2 themselves
        start = clock();
        std::thread th1(&Algo1, std::cref(src), std::ref(dst1));
        std::thread th2(&Algo2, std::cref(src), std::ref(dst2));
        th1.join();
        th2.join();
        stop = clock();
        parallel += (stop - start);

        imshow("Parallel Algo1", dst1);
        imshow("Parallel Algo2", dst2);

        // measure threading overhead (2 calls)
        start = clock();
        Test();
        Test();
        stop = clock();
        testSequential += (stop - start);

        start = clock();
        std::thread thTest1(&Test);
        std::thread thTest2(&Test);
        thTest1.join();
        thTest2.join();
        stop = clock();
        testParallel += (stop - start);

        cnt++;

        if (waitKey(30) >= 0)
            break;
    }
    if (cnt == 0) // no frame was processed: avoid dividing by zero
        return -1;

    double parTime = 1000.0*parallel / cnt / (double)CLOCKS_PER_SEC;
    double seqTime = 1000.0*sequential / cnt / (double)CLOCKS_PER_SEC;
    double overHead = 1000.0*(testParallel - testSequential) / cnt / (double)CLOCKS_PER_SEC;

    std::cout << std::endl << "Average processing time (2 ...

Comments

1- Amazing new performance! What is the reason these new benchmarks show that parallel is much faster? 2- Just a simple question: what do you mean by overhead and the empty function?

thexnightmare ( 2015-10-24 09:57:34 -0600 )

1- To show that parallel is the better approach where it is appropriate. 2- That shows the overhead (time) you have to pay when applying parallel techniques (in this case, it is just the time for initializing the threads).

tuannhtn ( 2015-10-24 10:10:59 -0600 )

@tuannhtn, I agree with you; in fact my answer focuses on this: sequential is better than "simple threading", as stated in the 2nd point of my answer. I would suggest this answer on better thread design.

Regarding Mat as a reference: in my tests, if I pass dst as a reference I sometimes get a memory exception.

@thexnightmare, threads can be faster if used well, but they quickly become a nightmare if you don't have a strong threading background.

pklab ( 2015-10-24 11:15:27 -0600 )

I'm sorry to have lost my accepted answer... anyway, just to say that threading isn't "the gold"... on my i3/Win64/OCV 2.4.10/VS2013:

Video width: 640
Video height: 480
Frame count: 200
Parallel:11.65ms Sequential:11.48ms Overhead:0.67ms

On my PC the results show no real difference in performance between the two. Performance improvement isn't guaranteed; it depends on many factors, from the algorithms to the hardware architecture. In addition, if in a real case the 2 algorithms have to share some variable, the mutex needed to protect it will introduce more delay.

pklab ( 2015-10-24 12:31:01 -0600 )
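To illustrate that extra cost, here is a minimal sketch, assuming C++11, in which two threads update one shared counter and every update is serialized with a std::mutex. SharedStats and runWithSharedCounter are hypothetical names; real code would share the algorithms' results rather than a plain counter.

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared state: both "algorithms" update one counter,
// so every access must be serialized with the mutex.
struct SharedStats {
    std::mutex m;
    long long count = 0;
};

// Each of the 2 threads performs perThread increments; taking the lock on
// every increment is the extra synchronization cost discussed above.
long long runWithSharedCounter(int perThread)
{
    SharedStats stats;
    auto work = [&stats, perThread] {
        for (int i = 0; i < perThread; ++i) {
            std::lock_guard<std::mutex> lock(stats.m);
            ++stats.count;
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return stats.count;  // both threads are joined, so reading is safe
}
```

runWithSharedCounter(1000) returns 2000; without the lock_guard the two threads would race on count, which is undefined behavior.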

@pklab: with OCV 2.4 you may not get good results, since it was not built with Intel's IPP package. Hence there is no difference between the two approaches in your test.

tuannhtn ( 2015-10-24 12:36:23 -0600 )

@tuannhtn Why should IPP improve parallel and not sequential? For info... on the same machine using OCV 3.0.0:

Video width: 640
Video height: 480
Frame count: 200
Parallel:8.67ms Sequential:8.37ms Overhead:0.69ms
pklab ( 2015-10-24 13:23:33 -0600 )

I am sorry @pklab, but I did not mean "IPP should improve parallel and not sequential". What I meant was "I think that with IPP support, the difference between the two approaches is more evident". Thanks for your new results.

tuannhtn ( 2015-10-25 02:00:38 -0600 )

Thank you guys, I really appreciate your efforts, which helped me a lot. Just a quick question: if I want to go deeper into threading, which library do you prefer: IPP, OpenMP, or Boost?

thexnightmare ( 2015-10-26 15:04:45 -0600 )

From my experience, I suggest Intel's Threading Building Blocks (TBB).

tuannhtn ( 2015-10-26 22:35:26 -0600 )

Everything worked perfectly, just one thing: to see the results of the 2 algorithms, I added imshow to both parallel functions, but then a compilation error appears. Any idea?

thexnightmare ( 2015-12-06 11:29:07 -0600 )

answered 2015-10-23 06:13:49 -0600

pklab

updated 2015-10-25 07:06:50 -0600

You could just use threads to run your algorithms, but don't expect faster performance, because:

  1. OpenCV has a lot of internal parallelization;
  2. To get full speed from threads you need a well-designed threading architecture (such as producer/consumer), and that may be out of your scope.
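A rough sketch of the producer/consumer idea, assuming C++17 (for std::optional). The worker threads are created once and then fed through a bounded queue, so there is no per-frame thread creation cost. BoundedQueue and runPipeline are hypothetical names, and ints stand in for captured frames. In this sketch each frame goes to whichever worker is free; running two different algorithms on every frame would need one queue per algorithm.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded, thread-safe queue: the capture thread pushes frames,
// worker threads pop them. close() signals that no more frames will come.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full.
    void push(T item)
    {
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(std::move(item));
        notEmpty_.notify_one();
    }

    // Blocks until an item is available; returns std::nullopt once the
    // queue has been closed and fully drained.
    std::optional<T> pop()
    {
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty())
            return std::nullopt;
        T item = std::move(q_.front());
        q_.pop();
        notFull_.notify_one();
        return item;
    }

    void close()
    {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        notEmpty_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable notEmpty_, notFull_;
    std::queue<T> q_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Hypothetical pipeline: one producer "captures" frameCount frames (plain ints
// here) and two long-lived consumers process them. The threads are created
// once for the whole stream instead of once per frame.
long long runPipeline(int frameCount)
{
    BoundedQueue<int> queue(4);
    std::mutex resultMutex;
    long long total = 0;

    auto consumer = [&] {
        while (auto frame = queue.pop()) {
            long long result = 2LL * *frame;  // stand-in for a real algorithm
            std::lock_guard<std::mutex> lock(resultMutex);
            total += result;
        }
    };

    std::thread c1(consumer), c2(consumer);
    for (int i = 1; i <= frameCount; ++i)
        queue.push(i);  // blocks while both workers are busy and the queue is full
    queue.close();      // lets the consumers leave their loop
    c1.join();
    c2.join();
    return total;
}
```

With this design, runPipeline(10) returns 110 (each of the values 1 to 10 doubled and summed), whichever worker handles each value.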

Below is a simple example comparing sequential and parallel implementations, using a webcam stream as input.

It shows how to apply 2 different algorithms to the same frame, using 2 sequential calls and simple threading. The example suffers from a poor threading implementation, because creating the threads for every frame introduces a big overhead.

On my computer, results show that the sequential way is faster than simple threading; depending on the background load on the computer, sequential can be up to 2 times faster.

EDIT: Added a measurement of the threading overhead. Here are my timings (Win7/64, Intel i3 2x2.53 GHz):

  • webcam @320x240, OCV 2.4.10:
    • Debug ver within MS VisualStudio 2013: Parallel:16.3ms Sequential:12.8ms Overhead:3.5ms
    • Release ver within MS VisualStudio 2013: Parallel:8.1ms Sequential:4.3ms Overhead:4.9ms
    • Release ver from command line: Parallel:3.6ms Sequential:2.7ms Overhead:0.6ms
  • webcam @640x480, OCV 2.4.10: Parallel:11.65ms Sequential:11.48ms Overhead:0.67ms
  • webcam @640x480, OCV 3.0.0: Parallel:8.67ms Sequential:8.37ms Overhead:0.69ms
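One way to reproduce the overhead figure portably is std::chrono::steady_clock, which measures elapsed wall time on every platform; std::clock() returns wall time with MSVC but CPU time summed over all threads on glibc, which can distort timings of multithreaded code. A minimal sketch under that assumption, where emptyTask and threadOverheadMicros are hypothetical names mirroring the empty Test() function approach:

```cpp
#include <chrono>
#include <thread>

// Empty function: timing it isolates the cost of thread creation and join.
void emptyTask() {}

// Average wall-clock microseconds needed to create and join two threads
// running emptyTask(), over `repeats` iterations (0.0 if repeats <= 0).
double threadOverheadMicros(int repeats)
{
    using Clock = std::chrono::steady_clock;
    if (repeats <= 0)
        return 0.0;
    auto start = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        std::thread t1(emptyTask);
        std::thread t2(emptyTask);
        t1.join();
        t2.join();
    }
    auto elapsed = Clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count() / repeats;
}
```

Subtracting the time of two plain emptyTask() calls, as the answer's code does, would remove the (tiny) cost of the calls themselves.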

EDIT2: Considering tuannhtn's answer, it looks interesting to investigate the different results a bit.

Certainly the advanced parallel programming in IPP improves overall performance, but on the Intel i3 I really can't see any improvement of the parallel approach over the sequential one. I suppose the difference is due to the different processor architectures.

Both the Core Duo 2x2.4 and the Intel i3 2x2.53 have 2 cores, but the Core Duo doesn't have Hyper-Threading or SmartCache.

When Hyper-Threading is available, some operations automatically share the execution resources (I/O, cache, bus interface...) in parallel across more logical processors. Hyper-Threading and SmartCache make more efficient use of the available execution resources, boosting the sequential approach.

On the Core Duo, load balancing is left to the developer, so the parallel approach gets better results.

This may explain why the parallel approach is better on the Core Duo but close to the sequential approach on the Intel i3. Looking at performance with 640x480 video:

  • CoreDuo/Ocv3.0.0/Win7/64: Parallel:8.66ms Sequential:13.47ms Overhead:0.6ms
  • i3/Ocv3.0.0/Win7/64: Parallel:8.67ms Sequential:8.37ms Overhead:0.69ms

The code:

#include <thread>
#include <opencv2/opencv.hpp>
using namespace cv;

// here we use Canny edge detection
void Algo1(const cv::Mat &src, cv::Mat *dst)
{
    cvtColor(src, *dst, COLOR_BGR2GRAY);
    GaussianBlur(*dst, *dst, Size(7, 7), 1.5, 1.5);
    Canny(*dst, *dst, 0, 30, 3);
}

// here we use morphology gradient
void Algo2(const cv::Mat &src, cv::Mat *dst)
{
    int morph_size = 1;
    cv::Size sz(2 * morph_size + 1, 2 * morph_size + 1);
    cv::Point anchor(morph_size, morph_size);
    Mat element = getStructuringElement(MORPH_RECT, sz, anchor);
    morphologyEx(src, *dst, MORPH_GRADIENT, element);
}

// empty function to measure overhead
void Test()
{
    return;
}

int main()
{
    VideoCapture cap ...

Comments

Just a simple question: what do you mean by overhead and the empty function?

thexnightmare ( 2015-10-24 07:00:10 -0600 )

The overhead is due to thread creation and joining. I use an empty function to measure just this time.

pklab ( 2015-10-24 11:17:58 -0600 )

Since @pklab updated his answer with new results, I ran a test on my PC, which has a Core i3. Here are my results:

First: Webcam input

Video width: 640
Video height: 480
Frame count: 200
Parallel:2.3ms Sequential:2.4ms Overhead:0.55ms

Second: Video file input

Video width: 1280
Video height: 720
Frame count: 200
Parallel:13.89ms Sequential:16.1ms Overhead:0.63ms
tuannhtn ( 2015-10-25 08:59:19 -0600 )

@tuannhtn Thank you for the nice discussion. Just for info: avoid passing non-const parameters by reference to a std::thread; see here

pklab ( 2015-10-26 06:54:58 -0600 )
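A minimal sketch of that point, assuming C++11: std::thread copies its arguments and passes the copies as rvalues, so a non-const reference parameter only binds to the caller's object when it is wrapped in std::ref (std::cref avoids copying a const argument). fill and runFill are hypothetical names.

```cpp
#include <functional>
#include <string>
#include <thread>

// Hypothetical worker that writes its result into an output parameter,
// like Algo1(src, dst) in the code above.
void fill(const std::string& src, std::string& dst)
{
    dst = src + " processed";
}

std::string runFill(const std::string& src)
{
    std::string dst;
    // Passing plain `dst` here does not compile with a conforming standard
    // library; std::ref makes the thread write into this very object.
    std::thread t(fill, std::cref(src), std::ref(dst));
    t.join();  // dst must outlive the thread and is safe to read only after join()
    return dst;
}
```

The caller is then responsible for keeping src and dst alive, and for not touching dst, until join() returns.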

Thanks @pklab, I am indeed aware of that. The code is OK because the two threads are independent (only src is shared, and it is accessed read-only), so there is no race condition. As you emphasized, this situation was very simple; more complicated contexts will need other tools to control access to shared data.

tuannhtn ( 2015-10-26 09:06:27 -0600 )
