I tried the patch on a BeagleBoard running Debian testing hard-float (armhf) (based on OpenCV git commit 5777598).
At first I got some compile errors caused by mixing signed and unsigned vector types:
/root/src/opencv/modules/imgproc/src/thresh.cpp: In function ‘void cv::thresh_8u(const cv::Mat&, cv::Mat&, uchar, uchar, int)’:
/root/src/opencv/modules/imgproc/src/thresh.cpp:269:62: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
/root/src/opencv/modules/imgproc/src/thresh.cpp:269:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:270:61: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:294:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:295:61: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:317:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:339:69: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:339:108: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:361:69: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:361:108: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
make[2]: *** [modules/imgproc/CMakeFiles/opencv_imgproc.dir/src/thresh.cpp.o] Error 1
make[2]: Leaving directory `/root/src/opencv/build'
make[1]: *** [modules/imgproc/CMakeFiles/opencv_imgproc.dir/all] Error 2
make[1]: Leaving directory `/root/src/opencv/build'
make: *** [all] Error 2
I then replaced "vget_low_s8" with "vget_low_u8" in those lines, after which it compiled.
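To illustrate the type issue, here is a minimal sketch (not the code from the patch; the helper name threshold_binary_u8 is made up for this example). It assumes a plain binary threshold on 8-bit data and falls back to a scalar loop when NEON is not available. The point is that an unsigned vector (uint8x16_t) has to be split with the _u8 intrinsics; passing it to vget_low_s8 gives the conversion error shown in the log.

```cpp
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper, not OpenCV code:
// dst[i] = (src[i] > thresh) ? maxval : 0 for 8-bit unsigned data.
void threshold_binary_u8(const uint8_t* src, uint8_t* dst, size_t n,
                         uint8_t thresh, uint8_t maxval)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint8x16_t vthresh = vdupq_n_u8(thresh);
    const uint8x16_t vmax = vdupq_n_u8(maxval);
    for (; i + 16 <= n; i += 16)
    {
        uint8x16_t v = vld1q_u8(src + i);
        uint8x16_t mask = vcgtq_u8(v, vthresh); // all bits set where v > thresh
        uint8x16_t out = vandq_u8(mask, vmax);  // maxval where set, else 0
        // The halves of an unsigned vector need the _u8 variants;
        // vget_low_s8(out) would not compile without -flax-vector-conversions.
        uint8x8_t lo = vget_low_u8(out);
        uint8x8_t hi = vget_high_u8(out);
        vst1_u8(dst + i, lo);
        vst1_u8(dst + i + 8, hi);
    }
#endif
    // Scalar tail (and the complete loop on targets without NEON).
    for (; i < n; ++i)
        dst[i] = src[i] > thresh ? maxval : 0;
}
```

Splitting into halves is only done here to show the intrinsic in question; a single vst1q_u8 store would do the same job in this sketch.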
I then tested with a program that uses threshold for some of its work (the main work is done in other functions) and profiled it with oprofile: "opreport -l -g -D smart ../build/src/imgproc|grep -i thresh". Without the patch:
1054 3.5127 thresh.cpp:794 imgproc cv::adaptiveThreshold(cv::_InputArray const&, cv::_OutputArray const&, double, int, int, int, double)
456 1.5197 thresh.cpp:677 imgproc cv::ThresholdRunner::operator()(cv::Range const&) const
3 0.0100 thresh.cpp:712 imgproc cv::threshold(cv::_InputArray const&, cv::_OutputArray const&, double, double, int)
1 0.0033 thresh.cpp:855 imgproc cvThreshold
(Of course I ran that several times; the numbers were always in this range.)
The profile of the patched version:
1035 3.5734 thresh.cpp:936 imgproc cv::adaptiveThreshold(cv::_InputArray const&, cv::_OutputArray const&, double, int, int, int, double)
244 0.8424 thresh.cpp:819 imgproc cv::ThresholdRunner::operator()(cv::Range const&) const
2 0.0069 thresh.cpp:854 imgproc cv::threshold(cv::_InputArray const&, cv::_OutputArray const&, double, double, int)
2 0.0069 thresh.cpp:997 imgproc cvThreshold
So this hints at a speed-up of roughly a factor of 2 for the threshold code (456 versus 244 samples in ThresholdRunner::operator()). However, a function that accounts for only 1.5 % versus 0.8 % of the program's samples is probably not the best test case. Maybe I will find a better one, but I wanted to give you some feedback.
(I want to check the samples directory next. However, whatever program I use should be console-based, since I currently have no display attached to the board, and I guess running the program with X forwarded via ssh might not work so well.)
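A console-only timing loop could serve as such a test case. The following is just a sketch under assumptions: mean_threshold_ms and scalar_threshold are hypothetical names, the image is synthetic, and the scalar loop is a stand-in for the real call, which in an actual test would be cv::threshold on a cv::Mat.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the routine under test (assumption: a real benchmark
// would call cv::threshold here instead of this scalar loop).
inline void scalar_threshold(const std::vector<std::uint8_t>& src,
                             std::vector<std::uint8_t>& dst,
                             std::uint8_t thresh, std::uint8_t maxval)
{
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = src[i] > thresh ? maxval : 0;
}

// Hypothetical harness: thresholds a synthetic width x height 8-bit image
// `iterations` times and returns the mean wall-clock time per call in ms.
double mean_threshold_ms(std::size_t width, std::size_t height, int iterations)
{
    std::vector<std::uint8_t> src(width * height), dst(width * height);
    for (std::size_t i = 0; i < src.size(); ++i)
        src[i] = static_cast<std::uint8_t>(i * 31u); // deterministic pattern

    volatile std::uint8_t sink = 0; // read a result so the work stays observable
    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < iterations; ++k)
    {
        scalar_threshold(src, dst, 128, 255);
        sink = dst[static_cast<std::size_t>(k) % dst.size()];
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling mean_threshold_ms(640, 480, 200) from main and printing the result would need no display, and running the same binary against the patched and unpatched library would make threshold the dominant cost instead of 1 % of the profile.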