Hi Nightlife,
I've been looking at the stitching_detailed.cpp example which is heavily based on the stitching class. So far I've noticed several things that could help with speed that may be relevant to you whether or not you are using that example:
The feature detection for each image is run within a for loop in the example. This could be parallelised to run on multiple cores, since extracting the features from one image is independent of the others.
There appear to be some GPU capabilities for SURF and, I think, in other places within the stitching module.
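A minimal sketch of that idea using only the standard library, not code from the example itself. The names `detectAllParallel` and `detect` are hypothetical: `detect` stands in for whatever per-image feature call you use, and it is assumed to be safe to call from several threads at once and to return its result by value. It starts one task per image, which is reasonable for a handful of images but may want a thread pool for many.

```cpp
#include <future>
#include <vector>

// Runs `detect` on every image concurrently and returns the results in
// the same order as the input. `detect` is a placeholder for the
// per-image feature detection call (assumed thread-safe).
template <typename Image, typename Detect>
auto detectAllParallel(const std::vector<Image>& images, Detect detect)
    -> std::vector<decltype(detect(images[0]))>
{
    using Features = decltype(detect(images[0]));

    // Launch one asynchronous task per image.
    std::vector<std::future<Features>> pending;
    pending.reserve(images.size());
    for (const Image& img : images) {
        pending.push_back(std::async(std::launch::async,
                                     [&img, &detect] { return detect(img); }));
    }

    // Collect the results; get() blocks until each task has finished.
    std::vector<Features> results;
    results.reserve(images.size());
    for (auto& task : pending) {
        results.push_back(task.get());
    }
    return results;
}
```

You would call it with your image vector and a lambda wrapping the feature finder, then carry on with pairwise matching as before.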
What resolution are you using for the image registration stage? Can this be lowered without hurting the quality of the registration?
What resolution are you using for the composition stage? Can this be lowered and still give you an adequate panorama? I've found that this can affect speed a lot.
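As far as I can tell, the example expresses these resolutions as megapixel budgets (the work_megapix and compose_megapix options) and turns each budget into a resize factor. Here is a rough sketch of that calculation so you can see how much a given budget shrinks your images. The function name `scaleForMegapixels` is my own, and treating a non-positive budget as "keep full resolution" is an assumption modelled on how the example handles its defaults.

```cpp
#include <algorithm>
#include <cmath>

// Linear scale factor that shrinks a width x height image to at most
// `megapixels` million pixels. Never upscales. A non-positive budget is
// treated as "keep the original resolution".
double scaleForMegapixels(int width, int height, double megapixels)
{
    if (megapixels <= 0.0 || width <= 0 || height <= 0) {
        return 1.0;
    }
    const double area = static_cast<double>(width) * height;
    // Pixel count scales with the square of the linear factor, hence sqrt.
    return std::min(1.0, std::sqrt(megapixels * 1e6 / area));
}
```

For a 4000x3000 image and a 0.6 megapixel budget this gives a factor of about 0.22, so registration would run on roughly 894x671 pixels instead of the full frame.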
Can you tell us where your bottleneck is (feature detection, pairwise matching, estimating homography, composition)? Also, how many images are you trying to stitch?
Actually, I do not understand why people think stitching should be a fast algorithm. Look at all the steps that need to be performed: several operations take seconds, and everything depends on the number of pixels and the amount of overlap between images. So do not expect to get this fast or real time. It is nearly impossible with current hardware. Even if you code everything to work multicore, you will still have bottlenecks from steps that need to happen sequentially.
OK, it will never be real time, and I don't think anybody was necessarily expecting it to be, but it certainly isn't out of the realms of possibility to speed this up substantially.
I do not want to implement it as real-time processing. It is a project that runs on the desktop. Thank you anyway guys. :)