Hi all, I am training a boosted tree classifier on 30000 randomly generated features. Training is limited to, say, the best 100 features. After training, how do I extract from the CvBoost object the indexes of the features used by the decision trees?
My motivation is to avoid generating all 30000 features and to compute only those that will actually be used. I've included a printout of the yml file generated by the CvBoost.save function. I think what I want is the value called sample_count, which identifies the feature, as shown below for a decision tree of depth 1:
trees:
   -
      best_tree_idx: -1
      nodes:
         -
            depth: 0
            sample_count: 11556
            value: -1.8339875099775065e+00
            norm_class_idx: 0
            Tn: 0
            complexity: 0
            alpha: 0.
            node_risk: 0.
            tree_risk: 0.
            tree_error: 0.
            splits:
               - { var:497, quality:8.6223608255386353e-01,
                   le:5.3123302459716797e+00 }
         -
            depth: 1
            sample_count: 10702
            value: -1.8339875099775065e+00
            norm_class_idx: 0
            Tn: 0
            complexity: 0
            alpha: 0.
            node_risk: 0.
            tree_risk: 0.
            tree_error: 0.
         -
            depth: 1
            sample_count: 854
            value: 1.8339875099775065e+00
            norm_class_idx: 1
            Tn: 0
            complexity: 0
            alpha: 0.
            node_risk: 0.
            tree_risk: 0.
            tree_error: 0.
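
Looking at the dump again, the only integer stored per split is the var field, so that may be the feature index rather than sample_count (which could just be the number of training samples reaching the node). Here is a minimal sketch, assuming that reading is right, that pulls every var value out of the saved yml text with a regular expression. The function name split_feature_indices is my own and not part of OpenCV, and if a var_idx mask was passed at training time the values may be relative to that subset rather than to the full 30000 columns:

```python
import re

# Matches the start of a split entry such as "{ var:497, quality:..., le:... }".
# Assumption: the feature index is the integer stored under "var".
SPLIT_VAR = re.compile(r"\{\s*var\s*:\s*(\d+)")

def split_feature_indices(model_text):
    """Return the sorted, de-duplicated var values found in split entries."""
    return sorted({int(v) for v in SPLIT_VAR.findall(model_text)})

# Small excerpt in the same shape as the dump above.
sample = """
splits:
   - { var:497, quality:8.6223608255386353e-01,
       le:5.3123302459716797e+00 }
"""

print(split_feature_indices(sample))  # prints [497]
```

On a real model I would read the whole file produced by CvBoost.save into a string and pass it to the same function; the resulting list would then be the only features that need to be computed.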
Any help would be great.
Cheers,
Peter