I am attempting to use one cv::cuda::Stream per processing thread in my image processing pipeline. Each image is processed by a single thread, and I can have n threads, so n images can be processed in parallel. (In my current tests I am only using 1 or 2 threads, though.)
Every cv::cuda function call is passed its thread's CUDA stream. Everything seems to work until I reach a call to cv::cuda::meanStdDev (the overload that takes a stream), where I get a crash that appears to be related to a kernel execution failure.
If I change nothing except passing cv::cuda::Stream::Null() as the stream to those same stream-overloaded cv::cuda functions, cv::cuda::meanStdDev does not crash and gives the right answer.
This is how I am calling the function and interpreting its output:
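```cpp
double vals[2];                  // receives {mean, stdDev}
cv::cuda::HostMem mean_std_hm;   // host-side buffer for the 1x2 CV_64FC1 result
cvc::meanStdDev(out_gimg, mean_std_hm, m_GpuStream);
m_GpuStream.waitForCompletion(); // block until all work queued on the stream has finished
// Copy the 1x2 result into vals through a Mat header that wraps the array
mean_std_hm.createMatHeader().copyTo(cv::Mat(1, 2, CV_64FC1, &vals[0]));
double mean = vals[0];
double stdDev = vals[1];
```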
I checked, and both calls (the crashing one and the non-crashing one) receive the same image.
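Since the null-stream call gives the right answer on the same image that makes the stream call crash, a CPU reference value can serve as ground truth when comparing the two paths. The sketch below is a hypothetical helper (`referenceMeanStdDev` is not an OpenCV function). It assumes `out_gimg` is an 8-bit single-channel image and that cv::cuda::meanStdDev reports the population standard deviation (dividing by N rather than N - 1).

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper, not part of OpenCV: CPU reference for the mean and
// population standard deviation of an 8-bit single-channel image stored row
// by row, where stepBytes is the distance in bytes between row starts.
// Returns {mean, stdDev}.
std::pair<double, double> referenceMeanStdDev(const std::uint8_t* data,
                                              int rows, int cols,
                                              std::size_t stepBytes)
{
    double sum = 0.0;
    double sumSq = 0.0;
    for (int r = 0; r < rows; ++r) {
        // Honour the row stride so padded (pitched) rows are handled correctly
        const std::uint8_t* row = data + static_cast<std::size_t>(r) * stepBytes;
        for (int c = 0; c < cols; ++c) {
            const double v = row[c];
            sum += v;
            sumSq += v * v;
        }
    }
    const double n = static_cast<double>(rows) * cols;
    const double mean = sum / n;
    // Population variance E[x^2] - mean^2; clamp tiny negative rounding errors
    double var = sumSq / n - mean * mean;
    if (var < 0.0) var = 0.0;
    return {mean, std::sqrt(var)};
}
```

For example, after `m_GpuStream.waitForCompletion()`, downloading the image with `cv::Mat host; out_gimg.download(host);` and calling `referenceMeanStdDev(host.ptr<std::uint8_t>(), host.rows, host.cols, host.step)` gives values to compare against `vals[0]` and `vals[1]` from both the stream and the null-stream runs.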
Any ideas as to what might be going wrong?
I am using OpenCV 3.4.3 built with CUDA support enabled.
Thanks, -Ryan