OpenCV MLP with Sigmoid Neurons, Output range
I have searched SO and Google for an answer to the following question but haven't found anything, so here is my situation:
I want to implement an MLP that learns a similarity function. I have training and test samples, and the MLP is set up and running. My problem is how to provide the teacher outputs to the net, i.e. which value range they should come from.
Here is the relevant part of my code:
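// Backprop training: stop after 1000 iterations or when the error changes by less than 1e-6
CvANN_MLP_TrainParams params(
    cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
    CvANN_MLP_TrainParams::BACKPROP,
    0.1,   // bp_dw_scale (learning rate)
    0.1);  // bp_moment_scale (momentum)
// Topology: FEAT_SIZE inputs, H_NEURONS hidden neurons, 1 output neuron
Mat layers = (Mat_<int>(3,1) << FEAT_SIZE, H_NEURONS, 1);
// Symmetric sigmoid with fparam1 = 1 (alpha) and fparam2 = 1 (beta)
CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);
int iter = net.train(X, Y, Mat(), Mat(), params);  // returns the number of iterations done
net.predict(X_test, predictions);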
The number of input and hidden neurons is set elsewhere, and the net has one output neuron. X, Y and X_test are Mats containing the training and test samples; no problem there. The problem is which value range my Y's must come from, and which value range the predictions will come from.
In the documentation I have found the following statements:
For training:
If you are using the default cvANN_MLP::SIGMOID_SYM activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results.
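If the teacher outputs do need to lie in [-1,1] as the quote suggests, but the natural targets are in [0,1], a minimal sketch of the linear rescaling (and its inverse, for reading predictions back) could look like this. The helper names are hypothetical, not OpenCV API, and plain std::vector is used instead of cv::Mat to keep it self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helpers (not OpenCV API): linear maps between a target
// range [lo, hi] and the symmetric range [-1, 1].

// Map a teacher value t from [lo, hi] onto [-1, 1].
double toSymmetric(double t, double lo = 0.0, double hi = 1.0) {
    return 2.0 * (t - lo) / (hi - lo) - 1.0;
}

// Map a network output y from [-1, 1] back onto [lo, hi].
double fromSymmetric(double y, double lo = 0.0, double hi = 1.0) {
    return (y + 1.0) * 0.5 * (hi - lo) + lo;
}

// Rescale a whole column of teacher outputs before training.
std::vector<double> rescaleTargets(const std::vector<double>& targets,
                                   double lo = 0.0, double hi = 1.0) {
    std::vector<double> out;
    out.reserve(targets.size());
    for (double t : targets) {
        out.push_back(toSymmetric(t, lo, hi));
    }
    return out;
}
```

With a cv::Mat of targets, the same affine map could be applied in one call, e.g. Y.convertTo(Y, -1, 2.0, -1.0), which computes 2*Y - 1 element-wise.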
Since I'm NOT using the default sigmoid function (the one with alpha=0 and beta=0), I'm providing my Y's from [0,1]. Is this right, or do they mean something else by 'default sigmoid function'? I'm asking because for prediction they explicitly mention alpha and beta (as fparam1 and fparam2):
If you are using the default cvANN_MLP::SIGMOID_SYM activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range from [-1.7159, 1.7159], instead of [0,1].
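To see which output range the quoted formula implies, the sketch below evaluates y = 1.7159*tanh(2/3 * x) alongside a parametric form f(x) = β(1 − e^(−αx))/(1 + e^(−αx)). That parametric form, with α = fparam1 and β = fparam2, is an assumption taken from the OpenCV 2.x ML documentation's description of SIGMOID_SYM, not something stated in the quote above. Under that assumption, net(layers, SIGMOID_SYM, 1, 1) reduces to tanh(x/2), whose magnitude stays below 1.

```cpp
#include <cassert>
#include <cmath>

// Default curve as quoted in the documentation:
// y = 1.7159 * tanh(2/3 * x), bounded by +/-1.7159.
double defaultSigmoidSym(double x) {
    return 1.7159 * std::tanh(2.0 / 3.0 * x);
}

// Assumed general SIGMOID_SYM form with alpha = fparam1, beta = fparam2:
// f(x) = beta * (1 - exp(-alpha*x)) / (1 + exp(-alpha*x)) = beta * tanh(alpha*x/2).
// Mathematically |f(x)| < beta; this direct formula overflows only for
// alpha*x below roughly -709, which the values used here never reach.
double symmetricSigmoid(double x, double alpha, double beta) {
    double e = std::exp(-alpha * x);
    return beta * (1.0 - e) / (1.0 + e);
}
```

Plugging the defaults α = 2/3, β = 1.7159 into the assumed general form gives 1.7159*tanh(x/3) rather than the 1.7159*tanh(2/3 * x) in the quote, so the two documented formulas may not agree exactly; either way, the magnitude of the activation is bounded by β.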
Again, since I'm not using the default sigmoid function, I assume the predictions will come from [0,1]. Am I right so far?
What confuses me is that I found another question about the output range of OpenCV's sigmoid function which says the range has to be [-1,1].
And now comes the real confusion: when I train the net and let it make predictions, I get values slightly larger than 1 (around 1.03), regardless of whether my Y's come from [0,1] or [-1,1]. This shouldn't happen in either case.
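To quantify how far predictions like the ~1.03 values stray outside the expected range, a small check such as the following could be run over the predicted values. It is a sketch that uses std::vector<float> in place of the predictions Mat; RangeReport and checkRange are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Summary of how predictions relate to an expected [lo, hi] range.
struct RangeReport {
    float minValue;   // smallest prediction seen
    float maxValue;   // largest prediction seen
    int outOfRange;   // number of predictions outside [lo, hi]
};

// Hypothetical helper: scan predictions and count values outside [lo, hi].
RangeReport checkRange(const std::vector<float>& preds, float lo, float hi) {
    RangeReport r{0.0f, 0.0f, 0};
    if (preds.empty()) return r;
    auto mm = std::minmax_element(preds.begin(), preds.end());
    r.minValue = *mm.first;
    r.maxValue = *mm.second;
    for (float p : preds) {
        if (p < lo || p > hi) ++r.outOfRange;
    }
    return r;
}
```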
Could somebody please enlighten me? Am I missing something here?
Thanks in advance.
EDIT:
To make things very clear, I came up with a small example that shows the problem:
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>
using namespace cv;
using namespace std;
int main() {
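    int POS = 1;    // label for positive samples
    int NEG = -1;   // label for negative samples
    int SAMPLES = 100;
    float SPLIT = 0.8f;
    float C_X = 0.5f;
    float C_Y = 0.5f;
    float R = 0.3f;
    Mat X(SAMPLES, 2, CV_32FC1);   // 2-D input points, one per row
    Mat Y(SAMPLES, 1, CV_32FC1);   // one teacher output per sample
    randu(X, 0, 1);                // uniform random coordinates in [0, 1)
    for(int i = 0; i < SAMPLES ...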