Is it possible to speed up the overloaded matrix multiplication operator (*) in OpenCV by passing a preallocated cv::Mat instance with the correct dimensions as the destination that the result is written into?
Something like the existing function:
CV_EXPORTS_W void gemm(InputArray src1, InputArray src2, double alpha,
InputArray src3, double beta, OutputArray dst, int flags = 0);
only simpler. I would like to have something like this:
CV_EXPORTS_W void matmul(InputArray src1, InputArray src2, OutputArray dst);