How to use parallel_for?
In the release notes for version 2.4.3 I discovered that OpenCV has a built-in parallel_for, but I can't find any documentation for it. Does anyone know of documentation or usage examples?
Hi,
As shown by Vladislav, you only need to derive from the cv::ParallelLoopBody class to make your own loop body.
To complete his answer and address Q3 (the earlier questions may be related to includes; you should give more details about the errors you encountered): if you need to process local memory buffers or other data in parallel, you need a constructor that stores pointers to the buffers to process when operator() is called.
Here is some sample code I use that may help you. It is a simple loop that clips buffer values to minimum and maximum values. I use plain arrays of any type via templates. You can change this to use std::vector, cv::Mat or anything else; just keep in mind that you have to create private members that point to the beginning of the data buffer you want to manage.
In the constructor, you specify which buffer to process and, if needed, which constants to take into account. Once that is done, everything is ready to run in parallel. In the operator() method, create new local pointers to the target block range.
Hope it helps.
Regards
Alex
template <class type>
class Parallel_clipBufferValues: public cv::ParallelLoopBody
{
private:
type *bufferToClip;
type minValue, maxValue;
public:
Parallel_clipBufferValues(type* bufferToProcess, const type min, const type max)
: bufferToClip(bufferToProcess), minValue(min), maxValue(max){}
virtual void operator()( const cv::Range &r ) const {
type *inputOutputBufferPTR=bufferToClip+r.start;
for (int jf = r.start; jf != r.end; ++jf, ++inputOutputBufferPTR)
{
if (*inputOutputBufferPTR>maxValue)
*inputOutputBufferPTR=maxValue;
else if (*inputOutputBufferPTR<minValue)
*inputOutputBufferPTR=minValue;
}
}
};
Finally, how to use it :
const int SIZE=10;
int myTab[SIZE];
int minVal=0, maxVal=255;
// cv::Range is half-open [start, end), so pass SIZE to cover every element
cv::parallel_for_(cv::Range(0,SIZE), Parallel_clipBufferValues<int>(myTab, minVal, maxVal));
Now I understand it better. Thank you for detailed answer.
Gianluigi, you can run a parallel_for_ loop over the rows of a Mat, as you would with a usual 'for' loop. And of course, if the Mat's buffer is continuous in memory, all of this can be done in a single call.
@blorgggg To get a real speedup, you need "heavy" computation in each iteration of the loop. Look at cvtColor: it is now parallelized with parallel_for_ and gets a good speedup.
Yes, setNumThreads() will do the work: http://docs.opencv.org/modules/core/doc/utility_and_system_functions_and_macros.html?highlight=setnumthreads#void%20setNumThreads%28int%20nthreads%29
class Body : public cv::ParallelLoopBody
{
public:
void operator ()(const cv::Range& range) const
{
for (int i = range.start; i < range.end; ++i)
...
}
};
Body body;
cv::parallel_for_(cv::Range(0, count), body);
Thank you for your answer. Now I have more questions :). (I am still using 2.4.2, so please tell me if my questions are not relevant in 2.4.3).
Q1: My code won't compile. It expects BlockedRange instead of Range. What is this BlockedRange?
Q2: ParallelLoopBody is not found. What is it and why is it needed?
Q3: How can it access local variables if this parallel_for loop requires the use of some outside class?
I tried this out myself and got everything to compile and run, but the program never used more than one processor. Is there something special you need to do other than having OpenCV 2.4.3 compiled with TBB enabled?
Here's a sample code (my actual looping body was much more substantial, this was just a test to make sure that yes, no matter what it wasn't using more than one core)
class Parallel_Test : public cv::ParallelLoopBody
{
private:
double* const mypointer;
public:
Parallel_Test(double* pointer)
: mypointer(pointer){
}
void operator() (const Range& range) const
{
//This overload needs to be here, otherwise the class is considered abstract.
// qDebug()<<"This should never be called";
}
void operator ()(const cv::BlockedRange& range) const
{
for (int x = range.begin(); x < range.end(); ++x){
mypointer[x]=x;
}
}
};
//TODO Loop pixels in parallel
double t = (double)getTickCount();
//TEST PARALLEL LOOPING AT ALL
double data1[1000000];
cv::parallel_for(BlockedRange(0, 1000000), Parallel_Test(data1));
t = ((double)getTickCount() - t)/getTickFrequency();
qDebug() << "Parallel TEST time " << t << endl;
t = (double)getTickCount();
for(int i =0; i<1000000; i++){
data1[i]=i;
}
t = ((double)getTickCount() - t)/getTickFrequency();
qDebug() << "SERIAL Scan time " << t << endl;
This should be posted as a separate question. The short answer is in the comments on the original question.
Asked: 2012-11-04 02:02:30 -0600
Seen: 36,844 times
Last updated: Nov 30 '17
ATTENTION: If you are new to this (like me), make note that the new implementation is "parallel_for_" (with a trailing underscore!) not "parallel_for" !!! Otherwise it will just run your loop in serial! AHH!
Thanks to @Daniil Osokin and especially @Vladislav Vinogradov for pointing this out!
Thanks for the point. I missed it myself.
Hi sir, how can I use parallel_for to reduce the processing time for QR codes?
@Rthi, there is no reason to revive 5-year-old topics that have nothing to do with QR code processing...