Saturday, 28 February 2015

Review 2.6: Modeling with Decision Trees - Deal with Numerical Outcomes

When the outcomes in a dataset are numbers, it is not a good idea to treat every distinct numerical value as a separate category. A better approach is to use variance as the criterion for finding the best split.

The formula for variance is:
S^2 = [(X1 - X)^2 + (X2 - X)^2 + ... + (Xn - X)^2] / n, where X is the mean of X1, ..., Xn and n is the number of values.
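As a quick illustration, the variance formula can be written in a few lines of Python. This is only a minimal sketch: the function name `variance` and the choice to return 0.0 for an empty list are assumptions made for this example, not something taken from a particular library.

```python
def variance(values):
    """Population variance: the mean of squared deviations from the mean."""
    if not values:
        # Assumption for this sketch: an empty group has zero variance.
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# The mean of [1, 2, 3, 4] is 2.5, and the squared deviations sum to 5.0.
print(variance([1, 2, 3, 4]))  # prints 1.25
```

A group whose outcomes are all close together gets a low score, and a group that mixes very high and very low outcomes gets a high one.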

When building a decision tree with variance as the scoring function, we look for the split that puts the higher outcome values on one side and the lower values on the other. Such a split reduces the overall variance within the branches.
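The idea of splitting on variance can be sketched as follows. This is a simplified example under stated assumptions: each row is a hypothetical `(feature, outcome)` pair with a single numeric feature, the helper names `variance` and `best_split` are made up for illustration, and the two branches are combined by weighting each branch's variance by its size, which is one common choice rather than the only one.

```python
def variance(values):
    """Population variance: the mean of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(rows):
    """Find the threshold on the feature that minimizes the size-weighted
    variance of the outcomes in the two branches.

    rows: list of (feature, outcome) pairs (hypothetical layout).
    Returns (threshold, score); threshold is None if no split improves
    on the variance of the unsplit data.
    """
    n = len(rows)
    best_threshold = None
    best_score = variance([y for _, y in rows])  # score without any split

    # Try every distinct feature value as a candidate threshold.
    for threshold in sorted({x for x, _ in rows}):
        low = [y for x, y in rows if x < threshold]
        high = [y for x, y in rows if x >= threshold]
        if not low or not high:
            continue  # a split must put rows on both sides
        score = (len(low) * variance(low) + len(high) * variance(high)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score

    return best_threshold, best_score


# Made-up data: low outcomes for small feature values, high for large ones.
rows = [(1, 10.0), (2, 12.0), (3, 11.0), (10, 50.0), (11, 52.0), (12, 51.0)]
threshold, score = best_split(rows)
print(threshold)  # prints 10
```

With this data the chosen threshold separates the outcomes near 11 from the outcomes near 51, so each branch ends up with a much lower variance than the unsplit data.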
