1 | initial version |
Okay, so tanh(x) can be written in sigmoid form as (1 - e^(-2x)) / (1 + e^(-2x)), so the two activation functions have the same shape. The sigmoid function used for training is defined as beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x)). The predict function is defined as 1.7159 * tanh(2/3 * x).
LBerger noticed in the source code that this can change with fparam1 and fparam2, which after some more digging turn out to be alpha and beta, respectively. By default they are set to zero, but the source code then replaces each value that is less than FLT_EPSILON (just a very small float) with 2/3 and 1.7159, so a default value of zero really means default values of 2/3 and 1.7159.
I'm not sure why they chose these numbers, but if I want my responses scaled to [0,1], then I need to ensure that all passed-in values are positive and that beta is set to 1.
I also set alpha to 2, since that gives the original tanh(x) identity. However, I am not sure how the conversion between alpha and fparam1 (2/3) is made: to maintain the identity with alpha = 2, fparam1 for tanh (the value set to 2/3 by default) should be 1, as seen from tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)).
So at some point I think there should be something like alpha = 2 * fparam1, but I do not see this anywhere in the source code. It seems fparam1 is used directly as alpha, which is not the relationship between the hyperbolic tangent and the sigmoid function. Is there a reason why?
2 | No.2 Revision |
Okay, so tanh(x) can be written in sigmoid form as (1 - e^(-2x)) / (1 + e^(-2x)), so the two activation functions have the same shape. The sigmoid function used for training is defined as beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x)). The predict function is defined as 1.7159 * tanh(2/3 * x).
LBerger noticed in the source code that this can change with fparam1 and fparam2, which after some more digging turn out to be alpha and beta, respectively. By default they are set to zero, but the source code then replaces each value that is less than FLT_EPSILON (just a very small float) with 2/3 and 1.7159, so a default value of zero really means default values of 2/3 and 1.7159.
I'm not sure why they chose these numbers, but if I want my responses scaled to [0,1], then I need to ensure that all passed-in values are positive and that beta is set to 1.
I also set alpha to 2, since that gives the original tanh(x) identity. However, I am not sure how the conversion between alpha and fparam1 (2/3) is made: to maintain the identity with alpha = 2, fparam1 for tanh (the value set to 2/3 by default) should be 1, as seen from tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)).
So at some point I think there should be something like alpha = -2 * fparam1, but I do not see this anywhere in the source code. It seems fparam1 is used directly as alpha, which is not the relationship between the hyperbolic tangent and the sigmoid function. Is there a reason why?
3 | No.3 Revision |
Okay, so tanh(x) can be written in sigmoid form as (1 - e^(-2x)) / (1 + e^(-2x)), so the two activation functions have the same shape. The sigmoid function used for training is defined as beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x)). The predict function is defined as 1.7159 * tanh(2/3 * x).
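As a quick sanity check, here is a minimal Python sketch (my own, not taken from the OpenCV source) confirming numerically that beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x)) with alpha = 2 and beta = 1 equals tanh(x). The function name sym_sigmoid is just something I made up for illustration.

```python
import math

def sym_sigmoid(x, alpha=2.0, beta=1.0):
    """Hypothetical helper: beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x))."""
    e = math.exp(-alpha * x)
    return beta * (1.0 - e) / (1.0 + e)

# With alpha = 2 and beta = 1 the expression should match tanh(x).
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert math.isclose(sym_sigmoid(x), math.tanh(x), abs_tol=1e-12)
```

More generally, this form works out to beta * tanh(alpha * x / 2), which is where the factor of 2 in my question comes from.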
LBerger noticed in the source code that this can change with fparam1 and fparam2, which after some more digging turn out to be alpha and beta, respectively. By default they are set to zero, but the source code then replaces each value that is less than FLT_EPSILON (just a very small float) with 2/3 and 1.7159, so a default value of zero really means default values of 2/3 and 1.7159.
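The default handling described above can be sketched roughly like this. This is only my paraphrase in Python, not the actual OpenCV code: resolve_params is a hypothetical name, and I am assuming the comparison is on the absolute value of each parameter.

```python
# Value of C's FLT_EPSILON for 32-bit floats.
FLT_EPSILON = 1.1920929e-07

def resolve_params(fparam1=0.0, fparam2=0.0):
    """Hypothetical sketch: near-zero parameters fall back to 2/3 and 1.7159."""
    # Assumption: the check uses the absolute value of each parameter.
    alpha = 2.0 / 3.0 if abs(fparam1) < FLT_EPSILON else fparam1
    beta = 1.7159 if abs(fparam2) < FLT_EPSILON else fparam2
    return alpha, beta

print(resolve_params())          # zeros are replaced by the defaults
print(resolve_params(2.0, 1.0))  # explicit values are kept as given
```

So passing zero for either parameter never actually gives a zero alpha or beta.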
I'm not sure why they chose these numbers, but if I want my responses scaled to [0,1], then I need to ensure that all passed-in values are positive and that beta is set to 1.
I would also like to set alpha to 2, since that gives the original tanh(x) identity. However, I am not sure how the conversion between alpha and fparam1 (2/3) is made: to maintain the identity with alpha = 2, fparam1 for tanh (the value set to 2/3 by default) should be 1, as seen from tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)).
So at some point I think there should be something like alpha = 2 * fparam1, but I do not see this anywhere in the source code. It seems fparam1 is used directly as alpha, which is not the relationship between the hyperbolic tangent and the sigmoid function. Is there a reason why?
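To make the mismatch concrete, here is a small Python sketch comparing the two readings against the documented 1.7159 * tanh(2/3 * x). It only illustrates the arithmetic behind my question; sym_sigmoid is my own hypothetical helper, and which reading the OpenCV code actually implements is exactly what I am unsure about.

```python
import math

def sym_sigmoid(x, alpha, beta):
    """Hypothetical helper: beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x))."""
    e = math.exp(-alpha * x)
    return beta * (1.0 - e) / (1.0 + e)

fparam1, fparam2 = 2.0 / 3.0, 1.7159  # the default values
x = 1.0

documented = fparam2 * math.tanh(fparam1 * x)            # 1.7159 * tanh(2/3 * x)
alpha_is_fparam1 = sym_sigmoid(x, fparam1, fparam2)      # reading 1: alpha = fparam1
alpha_doubled = sym_sigmoid(x, 2.0 * fparam1, fparam2)   # reading 2: alpha = 2 * fparam1

print("documented        :", documented)
print("alpha = fparam1   :", alpha_is_fparam1)
print("alpha = 2*fparam1 :", alpha_doubled)
```

Only the alpha = 2 * fparam1 reading reproduces the documented value; with alpha = fparam1 the result corresponds to 1.7159 * tanh(x / 3) instead.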