Single-precision float operations are much faster than double operations on ARM architectures, and on top of that NEON (at least on ARMv7) doesn't support doubles at all. So why isn't OpenCV4Android based on float instead of double, to offer better performance?
As an example of the performance gain, take warpPerspective (imgproc). The stock version was too slow (around 100 ms), but once I wrote my own float-based version it took only 5 ms. (It was built as part of my own app/lib, not as part of a rebuilt OpenCV.)
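To illustrate what a float-only warp might look like, here is a minimal sketch. It assumes a single-channel 8-bit image, nearest-neighbor sampling, and a row-major 3x3 matrix that maps destination coordinates to source coordinates. The function name `warpPerspectiveFloat` and the buffer layout are hypothetical; this is not OpenCV's API and not the implementation the timings above refer to.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): perspective warp that uses only
// single-precision arithmetic. `m` is a row-major 3x3 matrix mapping each
// destination pixel (x, y) to a source position (inverse mapping).
// Pixels that map outside the source image are left at 0.
std::vector<std::uint8_t> warpPerspectiveFloat(
    const std::vector<std::uint8_t>& src, int srcW, int srcH,
    const float m[9], int dstW, int dstH)
{
    std::vector<std::uint8_t> dst(
        static_cast<std::size_t>(dstW) * static_cast<std::size_t>(dstH), 0);

    const float maxX = static_cast<float>(srcW);
    const float maxY = static_cast<float>(srcH);

    for (int y = 0; y < dstH; ++y) {
        const float fy = static_cast<float>(y);
        for (int x = 0; x < dstW; ++x) {
            const float fx = static_cast<float>(x);

            // Homogeneous divide, all in float.
            const float w = m[6] * fx + m[7] * fy + m[8];
            if (w == 0.0f) continue;  // point at infinity, skip
            const float inv = 1.0f / w;
            const float sx = (m[0] * fx + m[1] * fy + m[2]) * inv;
            const float sy = (m[3] * fx + m[4] * fy + m[5]) * inv;

            // Round to the nearest source pixel; the range check is done in
            // float so NaN or huge values never reach the integer cast.
            const float rx = std::floor(sx + 0.5f);
            const float ry = std::floor(sy + 0.5f);
            if (!(rx >= 0.0f && rx < maxX && ry >= 0.0f && ry < maxY)) continue;

            const std::size_t si =
                static_cast<std::size_t>(ry) * static_cast<std::size_t>(srcW) +
                static_cast<std::size_t>(rx);
            const std::size_t di =
                static_cast<std::size_t>(y) * static_cast<std::size_t>(dstW) +
                static_cast<std::size_t>(x);
            dst[di] = src[si];
        }
    }
    return dst;
}
```

The inner loop contains only float multiplies, adds, and one divide per pixel, which is the kind of code a compiler can map to NEON single-precision instructions. Real code would also need interpolation and multi-channel support.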
I cannot fork OpenCV and build my own float-based version, because I would lose some closed-source optimizations which, according to my testing, are really important. (My own builds of OpenCV, even with lots of optimization flags, took 200-300 ms for warpPerspective!)