Ask Your Question
1

copy-pasted opencv code slower than precompiled code

asked 2015-11-24 07:41:26 -0600

Gabriele D gravatar image

updated 2015-11-24 07:42:34 -0600

Hello,

for my project I need to modify the openCV class cv::cuda::HOG (I need to introduce support for CV_8UC3 and cellSize = (16,16)).

At this stage I still don't wont to modify the openCV source code, so I have created my cuda HOG descriptor (namely HOGtest), starting from the openCV code.

Anyway, using the same openCV cv::cuda::HOG source code (just copied and pasted) I noticed that my code is noticeably slower. E.g.: for perform hog feature extraction on a 256x256 pixel image I have the following time measurements:

time with cv::cuda::HOG: about 1 ms
time with my HOGtest class: about 6 ms

Like I said, the code of HOGtest and cv::cuda::HOG is exactly the same. Performing code profiling with nvvp it turns out that the origin of this difference is the cuda kernels time execution.

Does anyone know the reason of that? I attach in the following the part of my .pro file (I am using the Qt framework) where I compile the cuda code with nvcc

CUDA_SOURCES += cuda_test.cu
CUDA_SDK = "/usr/local/cuda-7.5/samples/"    
CUDA_DIR = "/usr/local/cuda-7.5/"    
CUDA_ARCH = sm_52
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v 

INCLUDEPATH += $$CUDA_DIR/include
INCLUDEPATH += $$CUDA_SDK/common/inc/
INCLUDEPATH += $$CUDA_SDK/../shared/inc/

QMAKE_LIBDIR += $$CUDA_DIR/lib64 
QMAKE_LIBDIR += $$CUDA_SDK/lib
QMAKE_LIBDIR += $$CUDA_SDK/common/lib

LIBS += -L/usr/local/cuda-7.5/lib64/ \
    -lcuda \
    -lcudart

CUDA_INC = $$join(INCLUDEPATH,' -I','-I',' ')

cuda.input = CUDA_SOURCES
cuda.output = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
cuda.commands = $$CUDA_DIR/bin/nvcc -m64 -g -G -arch=$$CUDA_ARCH -c $$NVCCFLAGS $$CUDA_INC $$LIBS  ${QMAKE_FILE_NAME} -o ${QMAKE_FILE_OUT}
cuda.dependency_type = TYPE_C
cuda.depend_command = $$CUDA_DIR/bin/nvcc -g -G -M $$CUDA_INC $$NVCCFLAGS   ${QMAKE_FILE_NAME}
QMAKE_EXTRA_COMPILERS += cuda

Thanks for the help

edit retag flag offensive close merge delete

1 answer

Sort by ยป oldest newest most voted
0

answered 2015-11-26 10:49:59 -0600

Gabriele D gravatar image

Solved!

The compilation flags -g -G have to be removed in order to activate the default compilation flag -O3.

Now the time performances are roughly equal.

edit flag offensive delete link more

Question Tools

2 followers

Stats

Asked: 2015-11-24 07:41:26 -0600

Seen: 261 times

Last updated: Nov 26 '15