It depends on the capture conditions and on the buses you want to recognize; some cases will be really difficult.
The simplest case is a camera in a fixed position pointing at a street, where the bus approaches facing the camera.
Preparing a recognition database:
- Obtain reference images of the buses in poses similar to the ones you expect the camera to see
- Train a classifier with their keypoints and descriptors
- Separate the planar sections of each reference image (manually), along with their respective sets of keypoints/descriptors
- (optional) Keep the real-world dimensions of those planar sections
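One way the database above could be laid out in code is sketched here. Everything in it is an assumption made for illustration: the class names `PlanarSection` and `BusReference`, the field names, the `"city_bus"` key, and the toy keypoints and descriptors. In practice the keypoints and descriptors would come from whichever detector you settle on.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class PlanarSection:
    """One manually separated planar region of a reference image."""
    name: str                        # hypothetical label, e.g. "front panel"
    keypoints: List[Point]           # pixel coordinates in the reference image
    descriptors: List[List[float]]   # one descriptor vector per keypoint
    size_m: Optional[Tuple[float, float]] = None  # optional (width, height) in metres

@dataclass
class BusReference:
    """All planar sections collected for one bus type."""
    bus_type: str
    sections: List[PlanarSection] = field(default_factory=list)

    def total_keypoints(self) -> int:
        return sum(len(s.keypoints) for s in self.sections)

# Toy entries; real values would come from the reference images.
front = PlanarSection("front panel",
                      [(10.0, 20.0), (110.0, 22.0)],
                      [[0.1, 0.9], [0.4, 0.6]],
                      size_m=(2.5, 3.0))
side = PlanarSection("side", [(5.0, 5.0)], [[0.3, 0.7]])
database = {"city_bus": BusReference("city_bus", [front, side])}
```

Keeping the sections separate matters later, because each homography is only valid for points lying on a single plane.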
A possible algorithm:
- Remove the background (street, trees)
- Segment the non-background objects (possible issues with occlusion)
- Detect keypoints and descriptors for each segmented object
- Use a classifier (SVM, vocabulary tree, etc.) to verify whether it is a bus
- Match the candidate object features with the features of the bus type determined by the classifier
- For each planar section, try to compute a homography from the matched features.
- If one or more homographies are consistent, you have detected the bus.
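The homography consistency check in the last two steps can be sketched roughly as follows. This is a minimal NumPy illustration, not a full pipeline: it assumes the matches for one planar section are already available as pixel coordinates, fits the homography with a plain direct linear transform (no outlier rejection), and uses hypothetical thresholds (`max_error_px`, `min_inlier_ratio`) that you would have to tune. A real system would normally use a robust estimator such as RANSAC instead of a single least-squares fit.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: find H with dst ~ H @ src (needs >= 4 matches)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # assumes h[2, 2] is not close to zero

def reprojection_errors(h, src, dst):
    """Pixel distance between H-projected reference points and observed points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    homog = np.hstack([src, np.ones((len(src), 1))])
    proj = homog @ h.T
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - dst, axis=1)

def is_consistent(src, dst, max_error_px=3.0, min_inlier_ratio=0.8):
    """True if enough matches agree with a single fitted homography."""
    if len(src) < 4:
        return False
    h = estimate_homography(src, dst)
    errors = reprojection_errors(h, src, dst)
    return bool(np.mean(errors < max_error_px) >= min_inlier_ratio)

# Synthetic check: points mapped through a known homography should be consistent.
true_h = np.array([[1.2, 0.1, 30.0],
                   [-0.05, 0.9, 12.0],
                   [1e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 50], [0, 50], [40, 20], [70, 35]], dtype=float)
mapped = np.hstack([src, np.ones((len(src), 1))]) @ true_h.T
dst = mapped[:, :2] / mapped[:, 2:3]
print(is_consistent(src, dst))
```

Running this per planar section and accepting the candidate when at least one section passes mirrors the "one or more homographies are consistent" rule above.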
If you wish to determine its distance (optional):
- Refine the matched features with some template matching approach
- If the refinement succeeds, use PnP with the matched points and their real-world dimensions to determine the distance (you will need a calibrated camera).
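To give a feel for how real-world dimensions and calibration turn into a distance, here is a deliberately simplified sketch. It is not PnP: it uses the pinhole similar-triangles relation Z = f * W / w, which only holds when the planar section roughly faces the camera, as in the fixed-camera case above. The function name and the numbers are hypothetical; a full implementation would more likely pass the matched 2D/3D points to a PnP solver such as OpenCV's `cv2.solvePnP`.

```python
def distance_from_width(focal_px, real_width_m, pixel_width):
    """Approximate distance (metres) to a plane facing the camera.

    focal_px     -- focal length in pixels, from camera calibration
    real_width_m -- known real-world width of the planar section, in metres
    pixel_width  -- measured width of that section in the image, in pixels
    """
    if pixel_width <= 0:
        raise ValueError("pixel_width must be positive")
    return focal_px * real_width_m / pixel_width

# Hypothetical numbers: 1000 px focal length, 2.5 m wide bus front seen 125 px wide.
print(distance_from_width(1000.0, 2.5, 125.0))  # 20.0
```

The same relation also shows why calibration is needed: without a trustworthy focal length in pixels, the pixel measurement cannot be converted into metres.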
Of course, this will need a bit of tweaking: determining the best keypoint detector/feature descriptor, etc.
Note that a mobile camera (smartphone, tablet, GoPro, etc.) makes this much more difficult: you will have to train the classifier with several poses of the buses, and things like background removal and distance determination become harder, if not impossible.