Revision history - OpenCV Q&A Forum

Strategy to build asynchronous subpixel registration analysis

Hi, I am analysing a set of images for subpixel image shifts. I have code which essentially loops through:

loop() {

    read binary image, upload it to a GpuMat (CUDA)

    // the next 2 steps are based on dft, mulSpectrums, magnitude (all CUDA, all "Streamable")

    convolve with smoothing/gradient kernels (CUDA)
    cross-correlate (phase-correlate) with the base image (CUDA)

    // the next steps locate the maximum of the correlation pattern with subpixel precision

    find maxLoc (CUDA, but the value is sent to Point.x/Point.y on the CPU)
    copy the 3x3 neighbourhood of maxLoc into a Mat (CPU)
    subpixel registration by quadratic fit of the 3x3 neighbourhood of the maximum (CPU)
    place the resulting (x,y) pixel shifts in shift maps (CPU)
}
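
For readers unfamiliar with the last CPU step, the quadratic fit around the correlation maximum can be sketched as below. This is only an illustration, not the code from the question: it assumes a least-squares fit of a full 2D quadratic to all nine samples, and the names `SubpixelOffset` and `quadraticPeak3x3` are hypothetical. The actual code may instead use, for example, two separate 1D parabola fits.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical result type: offset of the fitted peak relative to the
// integer maxLoc, in pixels. valid is false if the patch has no maximum.
struct SubpixelOffset {
    double dx;
    double dy;
    bool valid;
};

// Least-squares fit of z = a + bx*x + by*y + cxx*x^2 + cxy*x*y + cyy*y^2
// to a 3x3 patch centred on the integer maximum.
// patch[r][c] is the sample at x = c - 1, y = r - 1.
SubpixelOffset quadraticPeak3x3(const std::array<std::array<double, 3>, 3>& patch)
{
    double s0 = 0.0, sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            const double x = c - 1.0;
            const double y = r - 1.0;
            const double v = patch[r][c];
            s0 += v;
            sx += x * v;
            sy += y * v;
            sxx += x * x * v;
            syy += y * y * v;
            sxy += x * y * v;
        }
    }

    // Closed-form solution of the normal equations on the {-1, 0, 1}^2 grid.
    const double bx = sx / 6.0;
    const double by = sy / 6.0;
    const double cxy = sxy / 4.0;
    const double cxx = sxx / 2.0 - s0 / 3.0;
    const double cyy = syy / 2.0 - s0 / 3.0;

    // The fitted surface has a maximum only if its Hessian is negative definite.
    const double det = 4.0 * cxx * cyy - cxy * cxy;
    if (!(cxx < 0.0 && det > 0.0)) {
        return {0.0, 0.0, false};
    }
    assert(cyy < 0.0);  // follows from cxx < 0 and det > 0

    // Stationary point: solve the 2x2 system gradient = 0.
    const double dx = (cxy * by - 2.0 * cyy * bx) / det;
    const double dy = (cxy * bx - 2.0 * cxx * by) / det;
    return {dx, dy, true};
}
```

The subpixel shift is then `maxLoc.x + dx`, `maxLoc.y + dy`. Because the function only needs nine values, only those nine floats have to leave the GPU per image rather than the whole correlation surface.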

All this is computed ~65000 times and takes about 8 minutes in total (256x256 base 16-bit B&W images). The CUDA card is not even heating up (nvidia-smi shows 6% GPU-Util).

Any suggestions on how to parallelize this (the faster the better)?
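
To put these figures in perspective, the per-image time and the data rate can be worked out as in the sketch below. It assumes exactly 65000 images, 480 seconds, and 2 bytes per pixel for the 16-bit images; the helper `estimateBudget` and the struct `RunBudget` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the run described above.
struct RunBudget {
    double msPerImage;           // wall-clock time spent per image
    std::uint64_t bytesPerImage; // raw size of one input image
    std::uint64_t totalBytes;    // raw size of the whole image set
    double megabytesPerSecond;   // input data rate (1 MB = 1e6 bytes)
};

RunBudget estimateBudget(std::uint64_t imageCount, double totalSeconds,
                         std::uint64_t width, std::uint64_t height,
                         std::uint64_t bytesPerPixel)
{
    assert(imageCount > 0 && totalSeconds > 0.0);
    RunBudget b{};
    b.msPerImage = totalSeconds * 1000.0 / static_cast<double>(imageCount);
    b.bytesPerImage = width * height * bytesPerPixel;
    b.totalBytes = b.bytesPerImage * imageCount;
    b.megabytesPerSecond = static_cast<double>(b.totalBytes) / totalSeconds / 1.0e6;
    return b;
}
```

With the assumed inputs, `estimateBudget(65000, 480.0, 256, 256, 2)` gives roughly 7.4 ms per image, 128 KiB per image, and an input rate of about 18 MB/s. Together with the 6% GPU-Util reading, this suggests (as an inference, not a measurement) that the loop is limited by per-iteration waiting, such as file reads, the blocking maxLoc download, and the CPU fit, rather than by the amount of data or GPU arithmetic.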

(also thanks to L.Berger who got me this far)
