Test NEON-optimized cv::threshold() on mobile devices
I have been writing some optimizations for OpenCV's threshold function, targeting ARM devices (mobile phones). They should work on both Android and iPhone.
However, I do not have a device to test them on, so I am looking for volunteers to give me a little help. If that motivates you more, I am planning to push this to the main OpenCV repo, and I hope you guys will accept it.
I am interested in code correctness and, if it happens to work as intended, in some performance statistics comparing the original and the optimized versions. Do not forget to cover all scenarios (for instance, each threshold type).
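For anyone willing to test, a scalar reference can make the correctness check concrete. The following is only a minimal sketch, not part of the patch: the names RefThreshType, refThresholdPixel, refThreshold and firstMismatch are hypothetical, and it assumes the documented behaviour of the five basic 8-bit threshold types. The idea is to run cv::threshold() on the device and compare its output against this reference, byte by byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical enum for this sketch; it stands in for the five basic
// threshold types (binary, inverted binary, truncate, to-zero, inverted to-zero).
enum RefThreshType { REF_BINARY, REF_BINARY_INV, REF_TRUNC, REF_TOZERO, REF_TOZERO_INV };

// Scalar reference for a single 8-bit pixel.
inline std::uint8_t refThresholdPixel(std::uint8_t v, std::uint8_t thresh,
                                      std::uint8_t maxval, RefThreshType type)
{
    const std::uint8_t zero = 0;
    switch (type) {
    case REF_BINARY:     return v > thresh ? maxval : zero;
    case REF_BINARY_INV: return v > thresh ? zero : maxval;
    case REF_TRUNC:      return v > thresh ? thresh : v;
    case REF_TOZERO:     return v > thresh ? v : zero;
    case REF_TOZERO_INV: return v > thresh ? zero : v;
    }
    return zero;
}

// Applies the reference to a whole buffer of pixels.
std::vector<std::uint8_t> refThreshold(const std::vector<std::uint8_t>& src,
                                       std::uint8_t thresh, std::uint8_t maxval,
                                       RefThreshType type)
{
    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = refThresholdPixel(src[i], thresh, maxval, type);
    return dst;
}

// Returns the index of the first differing byte, or -1 if the buffers match.
long firstMismatch(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i])
            return static_cast<long>(i);
    return -1;
}
```

Running the comparison for every threshold type, and for image widths that are not a multiple of 8 or 16, should exercise both the vectorized loop and the scalar tail.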
If this patch format is nonsense to you, the post here may help you more.
Edit
I forgot to mention that you have to activate it by adding #define CV_USE_NEON somewhere at the top of the thresh.cpp file, or by passing -DCV_USE_NEON as a compile flag.
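The two ways of setting the flag are not quite identical: a bare #define CV_USE_NEON leaves the macro empty, while -DCV_USE_NEON defines it as 1. A guard written with "defined" accepts both. The sketch below illustrates this; EXAMPLE_NEON_ENABLED and exampleNeonCompiledIn are hypothetical names, and the extra check on __ARM_NEON__ / __ARM_NEON is my assumption, not something the patch does.

```cpp
// Accept the flag whether it was set with "#define CV_USE_NEON" (empty macro)
// or with "-DCV_USE_NEON" (macro defined as 1). The second condition is an
// added assumption: only include the header when the compiler targets NEON.
#if defined CV_USE_NEON && (defined __ARM_NEON__ || defined __ARM_NEON)
#include <arm_neon.h>
#define EXAMPLE_NEON_ENABLED 1  // hypothetical helper macro for this sketch
#else
#define EXAMPLE_NEON_ENABLED 0
#endif

// True only when the flag was set and the compiler targets NEON.
inline bool exampleNeonCompiledIn()
{
    return EXAMPLE_NEON_ENABLED != 0;
}
```

A plain "#if CV_USE_NEON" test, by contrast, fails to compile when the macro is defined without a value.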
Patch 1/3. Update checkHardwareSupport()
From 5129091430a7423e5c07a4f3c845033adb8ccefe Mon Sep 17 00:00:00 2001
From: sammy <[email protected]>
Date: Tue, 31 Jul 2012 09:59:36 +0300
Subject: [PATCH 1/2] Update checkHardwareSupport and HWFeatures to support
ARM NEON
---
modules/core/include/opencv2/core/core_c.h | 3 +++
modules/core/src/system.cpp | 17 ++++++++++++++++-
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/modules/core/include/opencv2/core/core_c.h b/modules/core/include/opencv2/core/core_c.h
index df763ab..bdfb879 100644
--- a/modules/core/include/opencv2/core/core_c.h
+++ b/modules/core/include/opencv2/core/core_c.h
@@ -1706,6 +1706,9 @@ CVAPI(double) cvGetTickFrequency( void );
#define CV_CPU_SSE4_2 7
#define CV_CPU_POPCNT 8
#define CV_CPU_AVX 10
+
+#define CV_CPU_ARM_NEON 100
+
#define CV_HARDWARE_MAX_FEATURE 255
CVAPI(int) cvCheckHardwareSupport(int feature);
diff --git a/modules/core/src/system.cpp b/modules/core/src/system.cpp
index b8a4661..eeb2a58 100644
--- a/modules/core/src/system.cpp
+++ b/modules/core/src/system.cpp
@@ -176,7 +176,22 @@ struct HWFeatures
f.have[CV_CPU_POPCNT] = (cpuid_data[2] & (1<<23)) != 0;
f.have[CV_CPU_AVX] = (cpuid_data[2] & (1<<28)) != 0;
}
-
+
+ // Android check
+ #if defined ANDROID
+ if (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
+ (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0)
+ {
+ f.have[CV_CPU_ARM_NEON] = 1;
+ }
+ #endif
+ // iOS check. Automatically activated by GCC/LLVM compiler option
+ #if defined TARGET_OS_IPHONE
+ #if defined __ARM_NEON__
+ f.have[CV_CPU_ARM_NEON] = 1;
+ #endif
+ #endif
+
return f;
}
--
1.7.11
Patch 2/3. threshold() optimizations
From 9f9e6e0a382b0ec2b5ddb3eedd27bcdf95af9763 Mon Sep 17 00:00:00 2001
From: sammy <[email protected]>
Date: Tue, 31 Jul 2012 10:02:35 +0300
Subject: [PATCH 2/2] NEON-accelerated threshold()
---
modules/imgproc/src/thresh.cpp | 140 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 140 insertions(+)
diff --git a/modules/imgproc/src/thresh.cpp b/modules/imgproc/src/thresh.cpp
index 1fb4847..fb01852 100644
--- a/modules/imgproc/src/thresh.cpp
+++ b/modules/imgproc/src/thresh.cpp
@@ -42,6 +42,10 @@
#include "precomp.hpp"
+#if defined CV_USE_NEON
+#include <arm_neon.h>
+#endif
+
namespace cv
{
@@ -226,6 +230,142 @@ thresh_8u( const Mat& _src, Mat& _dst, uchar thresh, uchar maxval, int type )
}
#endif
+#if defined CV_USE_NEON
+ if( checkHardwareSupport(CV_CPU_ARM_NEON) )
+ {
+ uint8x16_t thresh_u = vdupq_n_u8(thresh);
+ uint8x16_t maxval_ = vdupq_n_u8(maxval);
+
+ j_scalar = roi.width & -8;
+
+ for( i = 0; i < roi.height; i++ )
+ {
+ const uchar ...
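The rest of the patch is cut off above. For readers who want to see the general technique, here is a minimal, self-contained sketch of a NEON binary threshold over one row. It is not the code from the patch: thresholdBinaryRow and SKETCH_HAVE_NEON are hypothetical names, only the binary type is handled, and a scalar fallback is used when the compiler does not target NEON.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined __ARM_NEON__ || defined __ARM_NEON
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

// Hypothetical helper, not taken from the patch.
// Binary threshold of one row: dst[j] = src[j] > thresh ? maxval : 0
void thresholdBinaryRow(const std::uint8_t* src, std::uint8_t* dst, std::size_t width,
                        std::uint8_t thresh, std::uint8_t maxval)
{
    assert(width == 0 || (src != nullptr && dst != nullptr));
    std::size_t j = 0;

#if SKETCH_HAVE_NEON
    // Broadcast the scalar parameters to all 16 lanes.
    const uint8x16_t vthresh = vdupq_n_u8(thresh);
    const uint8x16_t vmaxval = vdupq_n_u8(maxval);

    // Process 16 pixels per iteration.
    for (; j + 16 <= width; j += 16) {
        const uint8x16_t v = vld1q_u8(src + j);
        const uint8x16_t mask = vcgtq_u8(v, vthresh);  // 0xFF where v > thresh, else 0x00
        vst1q_u8(dst + j, vandq_u8(mask, vmaxval));    // maxval where the mask is set, else 0
    }
#endif

    // Scalar tail (or the whole row when NEON is not available).
    for (; j < width; ++j)
        dst[j] = src[j] > thresh ? maxval : static_cast<std::uint8_t>(0);
}
```

The compare-then-mask pattern avoids branches inside the loop; the other threshold types can be built from the same comparison combined with min, and, or bit-select operations.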
Hi sammy, are you able to provide some interesting docs for someone to start learning about NEON instructions?
I've had good results reading the http://hilbert-space.de/ blog and following the ARM tutorials, a series of 5 posts on NEON development that covers many interesting aspects: http://blogs.arm.com/software-enablement/161-coding-for-neon-part-1-load-and-stores/ Also, here is a list of all the NEON intrinsics you can use in GCC: http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html
Thank you :)
If you happen to have an Android phone, an iPhone, or any other ARM device that can run OpenCV code, I am still looking for someone to test this patch.