Accessing and modifying OpenCV Decision Tree Nodes when using Adaboost

Hi all, I am learning a boosted tree from 30,000 randomly generated features, with learning limited to, say, the best 100 features. After training, how do I extract from the CvBoost object the indexes of the features actually used by the decision trees?

My motivation is to eliminate the need to generate all 30,000 features and to compute only those that will actually be used. I've included a printout of the .yml file produced by CvBoost::save, followed by a sketch of how I imagine reading the feature indices back out of it. I think what I want is the var value inside each splits entry (e.g. var:497 below), which identifies the feature used at that split, as shown below in a decision tree of depth 1:

 trees:
      -
         best_tree_idx: -1
         nodes:
            -
               depth: 0
               sample_count: 11556
               value: -1.8339875099775065e+00
               norm_class_idx: 0
               Tn: 0
               complexity: 0
               alpha: 0.
               node_risk: 0.
               tree_risk: 0.
               tree_error: 0.
               splits:
                  - { var:497, quality:8.6223608255386353e-01,
                      le:5.3123302459716797e+00 }
            -
               depth: 1
               sample_count: 10702
               value: -1.8339875099775065e+00
               norm_class_idx: 0
               Tn: 0
               complexity: 0
               alpha: 0.
               node_risk: 0.
               tree_risk: 0.
               tree_error: 0.
            -
               depth: 1
               sample_count: 854
               value: 1.8339875099775065e+00
               norm_class_idx: 1
               Tn: 0
               complexity: 0
               alpha: 0.
               node_risk: 0.
               tree_risk: 0.
               tree_error: 0.
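
For reference, here is a minimal sketch of how I imagine collecting the var indices straight from the saved .yml with cv::FileStorage, without touching the CvBoost internals (the top-level node name "my_boost" below is only a placeholder for whatever name the classifier was saved under):

#include <set>
#include <string>
#include <opencv2/core/core.hpp>

// Collect every feature index ("var") referenced by a split of any weak
// learner in the saved boosted classifier.
// NOTE: "my_boost" is a placeholder for the node name passed to CvBoost::save.
std::set<int> collectUsedFeatures(const std::string& filename)
{
    std::set<int> used;
    cv::FileStorage fs(filename, cv::FileStorage::READ);

    cv::FileNode trees = fs["my_boost"]["trees"];
    for (cv::FileNodeIterator t = trees.begin(); t != trees.end(); ++t)
    {
        cv::FileNode nodes = (*t)["nodes"];
        for (cv::FileNodeIterator n = nodes.begin(); n != nodes.end(); ++n)
        {
            cv::FileNode splits = (*n)["splits"];
            if (splits.empty())
                continue;                      // leaf node, no split entry
            for (cv::FileNodeIterator s = splits.begin(); s != splits.end(); ++s)
                used.insert((int)(*s)["var"]);
        }
    }
    return used;
}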

EDIT

Currently I have the following code for accessing the data:

//Interrogate the weak predictors. Each element is a decision tree making up the boosted classifier
    CvSeq* decisionTree = boostDevice.get_weak_predictors();

    simplifyFeatureSet(decisionTree, firstOrderROIs );

This function is:

inline void Chnftrs::simplifyFeatureSet(CvSeq* decisionTree, std::vector<boost::tuple<int, cv::Rect> >& rois)
{
    //This variable stores the index of the feature used from rois and a pointer to the split so that the variable there can
    //be updated when the rois are pruned and reordered.
    std::vector<boost::tuple<int, CvDTreeSplit* > > featureIdx;

    //Report the number of weak learners (trees) in the boosted classifier
    printf("Size of boost %d \n", decisionTree->total);

    for (int i = 0; i < decisionTree->total; i++)
    {
            //Get the i-th weak learner (a single decision tree)
            CvBoostTree* tree = (CvBoostTree*)cvGetSeqElem(decisionTree, i);

            if (tree == 0)
                printf("Tree is NULL\n");
            else
                printf("Tree Addr %p\n", (void*)tree);

            const CvDTreeNode *root = tree->get_root();

            printf("Class_idx %d, Value %f ", root->sample_count, root->value);

            featureIdx.push_back(boost::tuple<int, CvDTreeSplit*>(root->split->var_idx, root->split)); 

            //Search down the right-hand side of the tree
            depthFirstSearch(root->right, featureIdx);

            //Search down the left-hand side of the tree
            depthFirstSearch(root->left, featureIdx);
    }
}
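
The depthFirstSearch helper is not reproduced above; what I have in mind is roughly the following recursive walk (a sketch only, assuming it is another Chnftrs member):

inline void Chnftrs::depthFirstSearch(const CvDTreeNode* node,
                                      std::vector<boost::tuple<int, CvDTreeSplit*> >& featureIdx)
{
    if (node == 0)
        return;

    //Leaves carry no split; internal nodes record the feature index of their split
    if (node->split != 0)
        featureIdx.push_back(boost::tuple<int, CvDTreeSplit*>(node->split->var_idx, node->split));

    //Recurse into both children
    depthFirstSearch(node->left, featureIdx);
    depthFirstSearch(node->right, featureIdx);
}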

However, when I try to access any member of root, such as root->sample_count, I get a segmentation fault. It may be that the tree's members are inaccessible unless CvDTreeTrainData::shared is set to true (by default it is false), as indicated here.
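
One possible cause I am still checking (unconfirmed): if the CvSeq returned by get_weak_predictors() stores CvBoostTree* pointers rather than the tree objects themselves, the cast above is wrong and each element needs an extra dereference, along these lines:

//Treat each sequence element as a stored CvBoostTree* and dereference it
CvBoostTree* tree = *(CvBoostTree**)cvGetSeqElem(decisionTree, i);

if (tree != 0)
{
    const CvDTreeNode* root = tree->get_root();
    printf("Sample count %d, Value %f\n", root->sample_count, root->value);
}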

any help would be great

cheers

Peter