I'm trying to use the OpenCV gpu module to filter an image with Gabor kernels. To check that everything is correct, I'm comparing the result of the CUDA-accelerated filtering with the result of the regular CPU filtering. The code I'm using is available here: https://github.com/juancamilog/gpu_convolve_test.git
Since the function cv::gpu::filter2D is limited to kernels of size smaller than 16x16, I'm using cv::gpu::convolve for larger kernels. In that case, I first pad the input with cv::gpu::copyMakeBorder so that the filter response has the same size as the original image.
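A minimal sketch of the padding arithmetic behind that step, written in plain C++ rather than with the OpenCV API so it can be checked in isolation. The `Image` type and the `padZero`, `convolveValid`, and `convolveSame` helpers are hypothetical names introduced only for illustration. Two assumptions are made: the convolution returns only the positions where the kernel fits entirely inside the input (a "valid" result of size (H - h + 1) x (W - w + 1)), and the border is filled with zeros. The border mode actually used in the linked code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal image type (row-major floats); stands in for cv::Mat.
struct Image {
    int rows;
    int cols;
    std::vector<float> data;

    Image(int r, int c)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0.0f) {}

    float& at(int r, int c) { return data[static_cast<std::size_t>(r) * cols + c]; }
    float at(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
};

// Zero-pad the image (assumed border mode; other modes change the edge pixels).
Image padZero(const Image& src, int top, int bottom, int left, int right) {
    Image dst(src.rows + top + bottom, src.cols + left + right);
    for (int r = 0; r < src.rows; ++r)
        for (int c = 0; c < src.cols; ++c)
            dst.at(r + top, c + left) = src.at(r, c);
    return dst;
}

// "Valid" convolution: the kernel is flipped and only positions where it
// fits fully inside the input are computed, so the output shrinks to
// (rows - k.rows + 1) x (cols - k.cols + 1).
Image convolveValid(const Image& img, const Image& k) {
    assert(img.rows >= k.rows && img.cols >= k.cols);
    Image out(img.rows - k.rows + 1, img.cols - k.cols + 1);
    for (int r = 0; r < out.rows; ++r) {
        for (int c = 0; c < out.cols; ++c) {
            float acc = 0.0f;
            for (int i = 0; i < k.rows; ++i)
                for (int j = 0; j < k.cols; ++j)
                    acc += img.at(r + i, c + j) * k.at(k.rows - 1 - i, k.cols - 1 - j);
            out.at(r, c) = acc;
        }
    }
    return out;
}

// Pad by half the kernel size on each side so the "valid" result has the
// same size as the original image (odd kernel dimensions assumed).
Image convolveSame(const Image& img, const Image& k) {
    assert(k.rows % 2 == 1 && k.cols % 2 == 1);
    const int pr = k.rows / 2;
    const int pc = k.cols / 2;
    return convolveValid(padZero(img, pr, pr, pc, pc), k);
}
```

With a 5x7 image and a 3x3 kernel, `convolveValid` returns a 3x5 result, while `convolveSame` returns a 5x7 result. That size restoration is the only thing the sketch is meant to show.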
The problem I'm facing is that the result of cv::gpu::filter2D is different from the result of cv::filter2D. The difference is even more noticeable when using cv::gpu::convolve. What is the cause of this difference? How can I obtain a GPU filtering response that is the same as the CPU filtering response?