Hi, can you share your code? You can use GpuMat with vector<Point2f> and HostMem in the following way:

cv::cuda::Stream stream;
cv::Point2f p(1, 2);
std::vector<cv::Point2f> vec = { p, p };
cv::cuda::HostMem h_vec_src(vec);
cv::cuda::GpuMat d_vec;
d_vec.upload(h_vec_src,stream);
cv::cuda::HostMem h_dst;
d_vec.download(h_dst,stream);
/* Sync to ensure that h_dst has been downloaded. In practice, if you sync directly after downloading you lose the benefit of using CUDA streams. */
stream.waitForCompletion();

After the sync, the contents of h_dst are equal to vec.

If the result is zero now and wasn't before you switched to HostMem, I would suspect that you are either using streams without synchronizing or synchronizing on the wrong stream.

Additionally, judging from the NVIDIA Visual Profiler output, the maximum speedup I would expect from using streams and an async pipeline is ~25%.