Actually, HoughCircles is not multithreaded, as you can see in the sources (link). You are free to make a pull request.

For multithreading you should use OpenCV's parallel_for_ wrapper, so that the code works with all the multithreading libraries OpenCV supports. As far as I can see, parallelizing HoughCircles should be possible with the existing code; perhaps parts of the algorithm are also vectorizable using SIMD intrinsics.

EDIT:

I think there is no good documentation for this, so I will try to explain it with a plain example. I hope it is correct so far. Keep in mind that writing to the same memory location from several threads is problematic.

#include <opencv2/core.hpp>

class parallelClass : public cv::ParallelLoopBody {
public:
    // Constructor: here you pass in the variables the loop reads from
    // and the ones it writes its results to
    parallelClass(const cv::Mat &_in, cv::Mat &_out) :
    in(_in), out(_out)  // bind the outside variables to the member references
    {
        // Do the setup that is shared by all threads
        out.create(in.size(), in.type());
        temp.create(in.size(), in.type());
    }

    // For completeness
    ~parallelClass() {}
    parallelClass& operator=(const parallelClass&) {return *this;}

    // This overloaded () operator executes the calculation for one sub-range
    // The full range is split by parallel_for_ automatically
    void operator()(const cv::Range &boundaries) const
    {
        // This is the loop you want to parallelize.
        // boundaries.start and boundaries.end are set by parallel_for_ for every call
        for(int i = boundaries.start; i < boundaries.end; ++i) {

            // Example loop body (makes no sense)
            for(int j = 0; j < in.cols; ++j) {
                temp.at<uchar>(i, j) = in.at<uchar>(i, j) - 1;
                out.at<uchar>(i, j) = temp.at<uchar>(i, j) + 1;
            }
        }
    }

private:
    // Member variables shared by all calls
    const cv::Mat &in;
    cv::Mat &out;
    mutable cv::Mat temp;  // mutable, because it is written inside the const operator()
};

The class is used like this:

// Requires #include <opencv2/imgcodecs.hpp> for cv::imread
cv::Mat src, dst;
src = cv::imread("Path/to/image.bmp", cv::IMREAD_GRAYSCALE);
int numberOfStripes = 4;


// This is done by parallel_class_obj
/*
cv::Mat temp;
dst.create(src.size(), src.type());
temp.create(src.size(), src.type());

for(int i = 0; i < src.rows; ++i) {
    for(int j = 0; j < src.cols; ++j) {
        temp.at<uchar>(i, j) = src.at<uchar>(i, j) - 1;
        dst.at<uchar>(i, j) = temp.at<uchar>(i, j) + 1;
    }
}
*/

// Creates object and calls the constructor of parallelClass
parallelClass parallel_class_obj(src, dst);

// Executes the calculation. The third argument (nstripes) is the number of chunks
// the range is split into, not a thread count; use cv::setNumThreads() to limit threads
cv::parallel_for_(cv::Range(0, src.rows), parallel_class_obj, numberOfStripes);
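
To make the range splitting less of a black box, here is a minimal sketch of the idea in plain standard C++ with std::thread instead of OpenCV. The helpers runInStripes and processRows are hypothetical names made up for this illustration. This is not how parallel_for_ is implemented internally; it only shows a row range being cut into contiguous chunks, with each chunk writing to its own rows, so no two threads touch the same memory location.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of OpenCV): splits [begin, end) into at most
// `stripes` contiguous chunks and runs body(start, stop) on each chunk in its
// own thread, then waits for all of them.
void runInStripes(int begin, int end, int stripes,
                  const std::function<void(int, int)>& body)
{
    const int total = end - begin;
    if (total <= 0 || stripes <= 0) return;

    stripes = std::min(stripes, total);
    const int chunk = (total + stripes - 1) / stripes;  // rounded up

    std::vector<std::thread> workers;
    for (int start = begin; start < end; start += chunk) {
        const int stop = std::min(start + chunk, end);
        workers.emplace_back(body, start, stop);
    }
    for (std::thread& w : workers) w.join();
}

// Same toy computation as in the answer, on a row-major 8-bit buffer.
// Each stripe reads and writes only the rows in [rowStart, rowStop).
std::vector<unsigned char> processRows(const std::vector<unsigned char>& in,
                                       int rows, int cols, int stripes)
{
    std::vector<unsigned char> out(in.size());
    runInStripes(0, rows, stripes, [&](int rowStart, int rowStop) {
        for (int i = rowStart; i < rowStop; ++i) {
            for (int j = 0; j < cols; ++j) {
                const std::size_t idx = static_cast<std::size_t>(i) * cols + j;
                const unsigned char tmp = static_cast<unsigned char>(in[idx] - 1);
                out[idx] = static_cast<unsigned char>(tmp + 1);
            }
        }
    });
    return out;
}
```

Calling processRows(pixels, rows, cols, 4) on a buffer of rows * cols bytes would give back a copy of the input, just like the toy loop above. An algorithm like HoughCircles is harder, because votes from different rows can land in the same accumulator cell; there each stripe would need its own accumulator (merged afterwards) or some form of synchronization.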

EDIT 2:

I made a pull request for parallelizing HoughCircles here. Please test it; it would be great to get some feedback about performance and any issues. The complete fork is here.