Confusion on Transfer Learning Models
Maybe this isn't the place to ask, but I'm a beginner and I feel like someone here may find this an easy question. I'll start with a little background: I am creating a real-time object detection program that can detect some US traffic signs and maybe traffic lights. I've decided that OpenCV together with Keras/TensorFlow is probably my best option for achieving this goal. Within these I figured (I'm still new to this field) that transfer learning from an ImageNet-pretrained model would be my best option for building an image classifier. I have downloaded the LISA traffic sign dataset and decided this is what I want to use for the transfer learning. The part that's hanging me up in this whole process is whether it would be better to train on street views with signs in them (which is what the LISA dataset contains) or to gather my own close-up pictures of these signs from the internet. Also, if any other part of my logic is flawed, please let me know; I am very interested in learning.
Thanks.
Does the LISA data fit your situation?
Do you know what the annotations look like? You'll probably need bounding boxes.
Did you already decide which pretrained model to use for this? Transfer learning here would mean: take a network pretrained e.g. on ImageNet, freeze the conv layers, and retrain only the box-related layers with your own data (classes and bounding boxes).
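Just to illustrate the freezing idea, here is a minimal Keras/TensorFlow sketch that reuses an ImageNet-pretrained MobileNetV2 and trains only a new classification head (for actual detection you would retrain the box-prediction layers of a detection framework instead). The class count and the data pipeline are placeholders, not part of any real setup:

```python
# Minimal transfer-learning sketch (assumes TensorFlow 2.x).
# NUM_CLASSES and the dataset loading are placeholders for your own data.
import tensorflow as tf

NUM_CLASSES = 47  # e.g. however many LISA sign categories you keep

# Conv base pretrained on ImageNet, without the original classifier head
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the conv layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # new head
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds / val_ds would come from your own labeled data
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```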
Well, if I get you right, you are asking how to build a dataset? Some hints:
Size:
Distribution:
* Images with your object in natural situations. This is important because you want your object to be detected in those situations, for example a car on a street.
* Various image and object sizes. This is important to learn a scale-invariant representation of your object, for example toy truck vs. truck.
* Various angles. This is to detect your object from various positions.
* Supply negatives. This helps during learning to separate your object from the background.
In my opinion you can achieve the same thing by putting the background images into a separate class "other". I think of negatives as a pseudo-class.
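As a sketch of that pseudo-class idea (the directory names below are only examples, not from any real dataset), the background images simply go into their own folder and get loaded like any other class:

```python
# Treating negatives/background as just another class folder.
#
# dataset/
#   stop/          <- images containing stop signs
#   yield/
#   speed_limit/
#   other/         <- background images with no sign (the pseudo-class)
import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "dataset",
    image_size=(224, 224),
    batch_size=32)

print(train_ds.class_names)  # "other" shows up as an ordinary class
```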
imho, the LISA dataset is already quite a good choice for this.
it has ~70 classes and ~100 images per class in various scales. (for multi-class detection, it does not need any "background" samples)
Here is my thought on the approach then:
To train:
* Preprocess the LISA images to get only the signs? (crop the background out)
* Put them in separate class folders.
* Train the neural network on these images.

In a real scenario (after training):
* Identify whether something is a sign by using the different contour functions?
* Feed it through the CNN
* profit?
The ?s are where I'm mostly confused
Train
Well, sampling the background and using those samples as negatives is OK; I think you could gain from this. But berak has a point with "...for multi-class detection, it does not need any "background" samples". On the other hand, I have had good experiences with negatives even in a multi-class scenario, and the YOLO framework encourages you to do so: https://github.com/AlexeyAB/darknet
"...Desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without bounded box (empty .txt files) - use as many images of negative samples as there are images with objects..."
But please don't crop out the traffic signs to use as training images. This could hurt your detection ability. Just mark the bounding box(es) and feed in the image unprocessed.
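To make that concrete: in the darknet/YOLO label format each image gets a .txt file with one line per box, `<class_id> <x_center> <y_center> <width> <height>`, all normalized to 0..1 relative to the image size, and a negative sample is just an image with an empty .txt next to it. A made-up example for an image with two signs (class ids are arbitrary here):

```
0 0.4135 0.3320 0.0620 0.0910
3 0.7012 0.2875 0.0480 0.1140
```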
Detection
This is also called inference. For object detection you usually run a model (feed the image to the network in non-train mode) and get back a matrix with the bounding boxes and probabilities. That's it. Forget about contours in that scenario.
I recommend actually running a pretrained model for testing purposes before training, so you get familiar with the detection "API".
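For example, a rough sketch with OpenCV's dnn module and a pretrained Caffe MobileNet-SSD (the file names are whatever you downloaded, and the 0.5 threshold is arbitrary): load the net, feed one image, and read the boxes straight from the output, no contours involved:

```python
# Inference sketch with cv2.dnn and a pretrained MobileNet-SSD (Caffe).
# The prototxt/caffemodel paths and the test image are placeholders.
import cv2

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

img = cv2.imread("street.jpg")
h, w = img.shape[:2]

# SSD expects a 300x300 blob; 0.007843 and 127.5 are the usual scale/mean
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)),
                             0.007843, (300, 300), 127.5)
net.setInput(blob)
detections = net.forward()  # shape (1, 1, N, 7)

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        class_id = int(detections[0, 0, i, 1])
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        x1, y1, x2, y2 = box.astype(int)
        print(class_id, confidence, (x1, y1, x2, y2))
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imshow("detections", img)
cv2.waitKey(0)
```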
Happy labeling :)