test NEON-optimized cv::threshold() on mobile device

asked 2012-07-31 02:11:37 -0600

sammy
3029 ●14 ●29 ●48

updated 2012-07-31 06:54:58 -0600

I have been writing some optimizations for the OpenCV's threshold function, for ARM devices (mobile phones). It should be working on both Android and iPhone.

However, I do not have a device to test it on, so I am looking for volunteers to give me a little help. If that motivates you more, I am planning to push it to the main OpenCV repo - and hope you guys will accept it

I am interested in code correctness, and if it happens to work as intended, some statistics for original/optimized performance. Do not forget to look at all scenarios.

If this patch format is a nonsense for you, the post here may help you more

Edit

I forgot to mention that you have to activate it by adding #define CV_USE_NEON somewhere at the top of the thresh.cpp function or as compile flag -DCV_USE_NEON

Patch 1/3. Update checkHardwareSupport()

From 5129091430a7423e5c07a4f3c845033adb8ccefe Mon Sep 17 00:00:00 2001
From: sammy <[email protected]>
Date: Tue, 31 Jul 2012 09:59:36 +0300
Subject: [PATCH 1/2] Update checkHardwareSupport and HWFeatures to support
 ARM NEON

---
 modules/core/include/opencv2/core/core_c.h |  3 +++
 modules/core/src/system.cpp                | 17 ++++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/modules/core/include/opencv2/core/core_c.h b/modules/core/include/opencv2/core/core_c.h
index df763ab..bdfb879 100644
--- a/modules/core/include/opencv2/core/core_c.h
+++ b/modules/core/include/opencv2/core/core_c.h
@@ -1706,6 +1706,9 @@ CVAPI(double) cvGetTickFrequency( void );
 #define CV_CPU_SSE4_2  7
 #define CV_CPU_POPCNT  8
 #define CV_CPU_AVX    10
+
+#define CV_CPU_ARM_NEON 100
+
 #define CV_HARDWARE_MAX_FEATURE 255

 CVAPI(int) cvCheckHardwareSupport(int feature);
diff --git a/modules/core/src/system.cpp b/modules/core/src/system.cpp
index b8a4661..eeb2a58 100644
--- a/modules/core/src/system.cpp
+++ b/modules/core/src/system.cpp
@@ -176,7 +176,22 @@ struct HWFeatures
             f.have[CV_CPU_POPCNT] = (cpuid_data[2] & (1<<23)) != 0;
             f.have[CV_CPU_AVX]    = (cpuid_data[2] & (1<<28)) != 0;
         }
-
+       
+       // Android check
+       #if defined ANDROID
+       if (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
+            (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0)
+        {
+            f.have[CV_CPU_ARM_NEON] = 1;
+        }
+       #endif
+       // iOS check. Automatically activated by GCC/LLVM compiler option
+       #if define TARGET_OS_IPHONE 
+       #if define __ARM_NEON__
+       f.have[CV_CPU_ARM_NEON] = 1;
+       #endif
+       #endif
+       
         return f;
     }

-- 
1.7.11

Patch 2/3. threshold() optimizations

From 9f9e6e0a382b0ec2b5ddb3eedd27bcdf95af9763 Mon Sep 17 00:00:00 2001
From: sammy <[email protected]>
Date: Tue, 31 Jul 2012 10:02:35 +0300
Subject: [PATCH 2/2] NEON-accelerated threshold()

---
 modules/imgproc/src/thresh.cpp | 140 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)

diff --git a/modules/imgproc/src/thresh.cpp b/modules/imgproc/src/thresh.cpp
index 1fb4847..fb01852 100644
--- a/modules/imgproc/src/thresh.cpp
+++ b/modules/imgproc/src/thresh.cpp
@@ -42,6 +42,10 @@

 #include "precomp.hpp"

+#if defined CV_USE_NEON
+#include <arm_neon.h>
+#endif
+
 namespace cv
 {

@@ -226,6 +230,142 @@ thresh_8u( const Mat& _src, Mat& _dst, uchar thresh, uchar maxval, int type )
     }
 #endif

+#if CV_USE_NEON
+    if( checkHardwareSupport(CV_CPU_ARM_NEON) )
+    {
+       uint8x16_t thresh_u = vdupq_n_u8(thresh);
+        uint8x16_t maxval_ = vdupq_n_u8(maxval);
+       
+        j_scalar = roi.width & -8;
+
+        for( i = 0; i < roi.height; i++ )
+        {
+            const uchar ...

(more)

edit retag flag offensive close merge delete

Comments

Hi sammy, are you able to provide some interesting docs for someone to start learning about NEON instructions?

Rui Marques ( 2012-09-29 07:43:38 -0600 )edit

I've had good results reading the http://hilbert-space.de/ blog and following ARM tutorials here http://blogs.arm.com/software-enablement/161-coding-for-neon-part-1-load-and-stores/ There are 5 posts on NEON development here, covering many interesting aspects. Also, here is a list of all NEON intrinsics you can use in GCC http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html

sammy ( 2012-10-01 01:03:57 -0600 )edit

Thank you :)

Rui Marques ( 2012-10-01 04:06:24 -0600 )edit

If you happen to have an Android/iPhone, or an ARM device which can run OpenCV code, I am still looking for someone to test this patch of code.

sammy ( 2012-10-01 06:28:16 -0600 )edit

add a comment

2 answers

Sort by » oldest newest most voted

answered 2012-10-29 16:24:23 -0600

trie
36 ●1 ●1

I have tried the patch on a beagleboard, running debian testing hardfloat (armhf) (based on opencv git commit 5777598).

First I had some errors, mixing signed and unsigned data:

/root/src/opencv/modules/imgproc/src/thresh.cpp: In function ‘void cv::thresh_8u(const cv::Mat&, cv::Mat&, uchar, uchar, int)’:
/root/src/opencv/modules/imgproc/src/thresh.cpp:269:62: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
/root/src/opencv/modules/imgproc/src/thresh.cpp:269:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:270:61: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:294:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:295:61: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:317:62: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:339:69: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:339:108: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:361:69: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
/root/src/opencv/modules/imgproc/src/thresh.cpp:361:108: error: cannot convert ‘uint8x16_t {aka __vector(16) __builtin_neon_uqi}’ to ‘int8x16_t {aka __vector(16) __builtin_neon_qi}’ for argument ‘1’ to ‘int8x8_t vget_low_s8(int8x16_t)’
make[2]: *** [modules/imgproc/CMakeFiles/opencv_imgproc.dir/src/thresh.cpp.o] Fehler 1
make[2]: Leaving directory `/root/src/opencv/build'
make[1]: *** [modules/imgproc/CMakeFiles/opencv_imgproc.dir/all] Fehler 2
make[1]: Leaving directory `/root/src/opencv/build'
make: *** [all] Fehler 2

I then replaced "vget_low_s8" in those lines with "vget_low_u8", then it did compile.

I then tested with a program, which uses threshold for some of its work (the main-work is in other functions) and used oprofile on it: "opreport -l -g -D smart ../build/src/imgproc|grep -i thresh" without the patch:

1054      3.5127  thresh.cpp:794              imgproc                  cv::adaptiveThreshold(cv::_InputArray const&, cv::_OutputArray const&, double, int, int, int, double)
456       1.5197  thresh.cpp:677              imgproc                  cv::ThresholdRunner::operator()(cv::Range const&) const
3         0.0100  thresh.cpp:712              imgproc                  cv::threshold(cv::_InputArray const&, cv::_OutputArray const&, double ...

(more)

edit flag offensive delete link

Comments

Hey trie, this is great! Not the results, but the fact that someone finally tested it :) For the performance, I have to admit that it's not impressive. I was hoping for something above 4x. Can you tell which resolution your test images were? BTW, why did you call cv::adaptiveThreshold()? cv::threshold() should be just fine.

sammy ( 2012-10-30 01:15:26 -0600 )edit

normally the program processes images from a webcam. For this test (and other test of my own) I fed a video-file with a resolution of 800x600 into it. (The file was written with the opencv-video-writer as mjpeg. To limit the actual processing to "interesting" regions, in a first step there is a square-detector, loosely based on squares.cpp from the samples, but with adaptiveThreshold instead of canny (to work with differing light-conditions). That is the steps are: pyrDown pyrUp adaptiveThreshold(gray0, gray, 255, ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY, kernel, athresh); dilate(gray, gray, Mat(), Point(-1, -1)); findContours(gray, contours...

trie ( 2012-10-30 18:07:42 -0600 )edit

That is, I happened to have a program (I'm currently trying to optimize with neon....), running on an arm, which uses threshold (at least for preprocessing). I came to this thread while searching the internet for "arm neon opencv".

Regarding gaining "only x2": On http://hilbert-space.de/?p=22 I have read, that using assembler instead of intrinsics might bring another performance boost, since the compiler didn't optimized the register-usage very well. I haven't looked at the assembler-output (yet, will probably do that in the next days...), but maybe it's a similar case here.

However I have very little knowledge of assembler (neither arm/neon, nor of the PC-world...), so that might not give much insight ;-)

trie ( 2012-10-30 18:36:54 -0600 )edit

add a comment

answered 2012-08-03 15:54:41 -0600

Kirill Kornyakov

2792 ●13 ●25 ●52

Code submissions should be done through the pull requests at github. The process is not yet documented, but here is the repo: https://github.com/Itseez/opencv. If you're familiar with github, you can do it easily.

You are right, your code is of potential interest for ARMs with NEON. But let's move this discussion to the github. NEON gurus should review your patch.

edit flag offensive delete link

Comments

Thanks, Kirill. I will push the patch to github, when I'll have a bit of time. There are some things that should be reviewed. I think I can (and I'd love to) contribute NEON code on a regular basis.

sammy ( 2012-08-04 06:53:50 -0600 )edit

Great! Just make a pull request when you're ready, we'll check it.

Kirill Kornyakov ( 2012-08-04 08:29:41 -0600 )edit

add a comment

test NEON-optimized cv::threshold() on mobile device

Comments

2 answers

Comments

Comments

Links

Question Tools

Stats

Related questions

test NEON-optimized cv::threshold() on mobile device edit

Comments

2 answers

Comments

Comments

Links

Question Tools

Stats

Related questions

test NEON-optimized cv::threshold() on mobile device