Pruning does not have the drawback described above. It builds the entire tree first, then works from the bottom of the tree upward, checking whether merging two branches would increase the entropy by less than a given threshold.
The code for pruning a decision tree:
def prune(tree: DTreeNode, threshold: Double): Unit = {
  if (tree.left != None) prune(tree.left.get, threshold)
  if (tree.right != None) prune(tree.right.get, threshold)
  if (tree.left != None && tree.left.get.result != None &&
      tree.right != None && tree.right.get.result != None) {
    // Expand each child's (label -> count) map back into a flat list of labels.
    val left = tree.left.get.result.get.toList.flatMap(x => List.fill(x._2)(x._1))
    val right = tree.right.get.result.get.toList.flatMap(x => List.fill(x._2)(x._1))
    // Convert to Double first to avoid integer division when computing the weights.
    val p1 = left.length.toDouble / (left.length + right.length)
    val p2 = 1.0 - p1
    println(entropy(left ++ right) + ":" + entropy(left) * p1 + ":" + entropy(right) * p2)
    val diff = entropy(left ++ right) - (entropy(left) * p1 + entropy(right) * p2)
    if (diff <= threshold) {
      // The entropy gain from the split is below the threshold:
      // merge the two leaves and turn this node into a leaf itself.
      tree.colIndex = None
      tree.split = None
      tree.left = None
      tree.right = None
      tree.result = Some(countResult(left ++ right))
    }
  }
}
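The pruning code relies on two helpers, entropy and countResult, whose definitions are not shown here. As a rough sketch of what they are assumed to do (the exact signatures are an assumption, not taken from the original), entropy computes the Shannon entropy of a list of class labels and countResult collapses such a list into the (label -> count) map stored in a leaf's result field:

```scala
// Hypothetical sketch of the helpers assumed by prune; not the original definitions.

// Shannon entropy (in bits) of a list of class labels.
def entropy(labels: List[String]): Double = {
  val n = labels.length.toDouble
  labels.groupBy(identity).values.map { group =>
    val p = group.length / n
    -p * math.log(p) / math.log(2)
  }.sum
}

// Collapse a list of labels into (label -> count) pairs,
// i.e. the Map a leaf node would hold in its `result` field.
def countResult(labels: List[String]): Map[String, Int] =
  labels.groupBy(identity).map { case (k, v) => (k, v.length) }

// Quick sanity check: a 50/50 split of labels has entropy 1 bit.
println(entropy(List("yes", "yes", "no", "no")))  // 1.0
println(countResult(List("yes", "yes", "no")))
```

With helpers of this shape, the merge step in prune simply re-counts the combined labels of the two children and stores them in the parent, which then behaves as an ordinary leaf.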