I have implemented my own Sum of Squared Differences (SSD) template matching, following the TM_SQDIFF formula given in http://docs.opencv.org/2.4/modules/imgproc/doc/object_detection.html?highlight=matchtemplate#matchtemplate
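For reference, here is the TM_SQDIFF formula from that page. I is the W×H source image, T is the w×h template, and R is the result map of size (W-w+1)×(H-h+1); the size symbols W, H, w, h are my own labels:

```latex
R(x, y) = \sum_{x', y'} \bigl( T(x', y') - I(x + x', y + y') \bigr)^2,
\qquad 0 \le x' < w,\quad 0 \le y' < h
```

Evaluated directly, each of the (W-w+1)(H-h+1) entries of R costs w·h multiply-adds.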
I compare the template with the original image pixel by pixel, using four nested for loops, for every position up to (size of original image - size of template image). However, this can take up to 1000 times longer (around 10000 ms, depending on the image dimensions) than OpenCV's TM_SQDIFF, which takes about 10 ms regardless of the image dimensions.
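A minimal pure-Python sketch of the brute-force approach described above, assuming the image and template are grayscale 2D lists of numbers; the function name ssd_naive is illustrative, not from the original code. The last valid top-left position is (size of image - size of template) inclusive, which gives the same result size as OpenCV. Because of the four nested loops, the cost grows with (W-w+1)(H-h+1)·w·h, so for a fixed template, doubling both image dimensions roughly quadruples the run time.

```python
def ssd_naive(image, templ):
    """Brute-force SSD map (hypothetical helper):
    R[y][x] = sum over (ty, tx) of (T[ty][tx] - I[y + ty][x + tx])**2."""
    ih, iw = len(image), len(image[0])
    th, tw = len(templ), len(templ[0])
    rh, rw = ih - th + 1, iw - tw + 1  # number of valid template positions
    result = [[0] * rw for _ in range(rh)]
    for y in range(rh):                  # loop 1: result rows
        for x in range(rw):              # loop 2: result columns
            total = 0
            for ty in range(th):         # loop 3: template rows
                for tx in range(tw):     # loop 4: template columns
                    d = templ[ty][tx] - image[y + ty][x + tx]
                    total += d * d
            result[y][x] = total
    return result


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    templ = [[5, 6], [8, 9]]
    print(ssd_naive(image, templ))  # -> [[64, 36], [4, 0]]
```

The minimum (0) sits at x = 1, y = 1, where the template matches the image exactly.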
I then tried comparing the template to the original image block by block: first stepping in increments of the template size, then narrowing down to the area with the minimum SSD. This reduced the time significantly, so I suspect that OpenCV also computes the difference between the template and the original image block by block.
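A minimal Python sketch of one possible reading of this coarse-to-fine search. The stride (one template size), the refinement window (within one template size of the best coarse position) and the names ssd_at and coarse_to_fine_match are assumptions for illustration, not necessarily the exact scheme described above. It is faster because it evaluates far fewer positions, but it is a heuristic and can miss the true minimum. matchTemplate, by contrast, returns a value for every position in its (W-w+1)×(H-h+1) result.

```python
def ssd_at(image, templ, x, y):
    """SSD of the template placed with its top-left corner at (x, y) (hypothetical helper)."""
    th, tw = len(templ), len(templ[0])
    return sum((templ[ty][tx] - image[y + ty][x + tx]) ** 2
               for ty in range(th) for tx in range(tw))


def coarse_to_fine_match(image, templ):
    """Hypothetical two-stage search; returns (ssd, x, y) of the best position found."""
    ih, iw = len(image), len(image[0])
    th, tw = len(templ), len(templ[0])
    max_y, max_x = ih - th, iw - tw  # last valid top-left position (inclusive)

    # Coarse pass: only positions on a grid whose stride equals the template size.
    coarse = [(ssd_at(image, templ, x, y), x, y)
              for y in range(0, max_y + 1, th)
              for x in range(0, max_x + 1, tw)]
    _, cx, cy = min(coarse)

    # Fine pass: exhaustive search within one template size of the best coarse position.
    best = None
    for y in range(max(0, cy - th + 1), min(max_y, cy + th - 1) + 1):
        for x in range(max(0, cx - tw + 1), min(max_x, cx + tw - 1) + 1):
            cand = (ssd_at(image, templ, x, y), x, y)
            if best is None or cand < best:
                best = cand
    return best


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(coarse_to_fine_match(image, [[5, 6], [8, 9]]))  # -> (0, 1, 1)

    # A 4x4 ramp where the exact match is at x=2, y=1, but the coarse pass
    # picks the grid cell at x=0, y=2 and the refinement never reaches it.
    ramp = [[4 * r + c for c in range(4)] for r in range(4)]
    print(coarse_to_fine_match(ramp, [[6, 7], [10, 11]]))  # -> (4, 1, 1)
```

The second example shows the trade-off: the search is cheaper, but it returns a local best (SSD 4) instead of the exact match (SSD 0).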
I have read through the OpenCV source, templmatch.cpp, but I still don't see the trick. I find the code complicated even when I read it line by line, and I could not find where some of the called functions and variables are defined, for example Caller callers[method], dst(parameters), dft(parameters), part(parameters), sqsum[] etc.
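One plausible reading of sqsum, stated as an assumption: it is an integral image of squared pixel values, the kind that cv::integral(src, sum, sqsum) can output. Expanding the square in the SSD gives Σ T² - 2·Σ T·I + Σ I². The first term is the same for every position. With a squared-sum integral image, the last term costs four table lookups per position instead of w·h multiply-adds. The minimal pure-Python sketch below (hypothetical names integral_sq, window_sum, ssd_expanded) shows only that part. It still computes the cross term Σ T·I directly, so on its own it does not remove the w·h cost per position. For large templates, that cross-correlation is the part a DFT can accelerate, which is presumably what the dft calls are for, though this sketch does not show it.

```python
def integral_sq(image):
    """(h+1) x (w+1) table: S[y][x] = sum of image[r][c]**2 for r < y, c < x (hypothetical helper)."""
    h, w = len(image), len(image[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0  # running sum of squares along row y
        for x in range(w):
            row += image[y][x] ** 2
            S[y + 1][x + 1] = S[y][x + 1] + row
    return S


def window_sum(S, x, y, w, h):
    """Sum of squares over the w x h window with top-left corner (x, y): four lookups."""
    return S[y + h][x + w] - S[y][x + w] - S[y + h][x] + S[y][x]


def ssd_expanded(image, templ):
    """SSD map via sum(T^2) - 2*sum(T*I) + sum(I^2) (hypothetical helper)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(templ), len(templ[0])
    S = integral_sq(image)
    t_sq = sum(v * v for row in templ for v in row)  # sum of T^2: constant for every position
    result = []
    for y in range(ih - th + 1):
        out_row = []
        for x in range(iw - tw + 1):
            # Cross term sum(T*I): still computed directly in this sketch.
            cross = sum(templ[ty][tx] * image[y + ty][x + tx]
                        for ty in range(th) for tx in range(tw))
            out_row.append(t_sq - 2 * cross + window_sum(S, x, y, tw, th))
        result.append(out_row)
    return result


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(ssd_expanded(image, [[5, 6], [8, 9]]))  # -> [[64, 36], [4, 0]]
```

The output matches the brute-force definition of SSD. Only the way the Σ I² term is obtained has changed.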
I know I could simply use OpenCV's library for template matching, but finding out the reason is part of my assignment.
Anyway, my main question is: what makes OpenCV's TM_SQDIFF so much faster? There is not much explanation in the OpenCV documentation either.