Hi,

Can you share your code? You can use `GpuMat` with `vector<Point2f>` and `HostMem` in the following way:
```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <vector>

int main()
{
    cv::cuda::Stream stream;
    cv::Point2f p(1, 2);
    std::vector<cv::Point2f> vec = { p, p };
    cv::cuda::HostMem h_vec_src(vec);  // page-locked copy of vec
    cv::cuda::GpuMat d_vec;
    d_vec.upload(h_vec_src, stream);   // asynchronous host -> device copy
    cv::cuda::HostMem h_dst;
    d_vec.download(h_dst, stream);     // asynchronous device -> host copy
    /* Sync to ensure h_dst has been downloaded. In practice, if you sync directly
       after downloading you lose the benefit of using CUDA streams. */
    stream.waitForCompletion();
}
```
Once `waitForCompletion()` returns, the contents of `h_dst` are equal to `vec`.
If the result is now zero and wasn't before you switched to `HostMem`, I suspect that you are using streams without synchronizing and/or synchronizing on the wrong stream.
Additionally, judging from the NVIDIA Visual Profiler output, the maximum speedup I would expect from using streams and an asynchronous pipeline is about 25%.
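For intuition about where a ceiling like this comes from, here is a rough bound with an illustrative number (not the actual profiler breakdown). If host-device copies take a fraction $f$ of the timeline and kernels take the remaining $1-f$, the best a stream pipeline can do is hide the copies completely behind the kernels, which is only possible when $f \le 1/2$:

```latex
S_{\max} = \frac{T_{\text{copy}} + T_{\text{kernel}}}{\max(T_{\text{copy}},\, T_{\text{kernel}})}
         = \frac{f\,T + (1-f)\,T}{(1-f)\,T}
         = \frac{1}{1-f}, \qquad f \le \tfrac{1}{2}
```

With $f = 0.2$, for example, $S_{\max} = 1/0.8 = 1.25$, i.e. about 25% faster. This assumes copies and kernels can actually run concurrently on the device and ignores launch and synchronization overhead, so the real gain would be somewhat lower.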