Training Dataset
This section follows a classic example from Quinlan’s ID3: the “buys_computer” training data, 14 tuples in total (9 “yes”, 5 “no”):

| age   | income | student | credit_rating | buys_computer |
|-------|--------|---------|---------------|---------------|
| <=30  | high   | no      | fair          | no            |
| <=30  | high   | no      | excellent     | no            |
| 31…40 | high   | no      | fair          | yes           |
| >40   | medium | no      | fair          | yes           |
| >40   | low    | yes     | fair          | yes           |
| >40   | low    | yes     | excellent     | no            |
| 31…40 | low    | yes     | excellent     | yes           |
| <=30  | medium | no      | fair          | no            |
| <=30  | low    | yes     | fair          | yes           |
| >40   | medium | yes     | fair          | yes           |
| <=30  | medium | yes     | excellent     | yes           |
| 31…40 | medium | no      | excellent     | yes           |
| 31…40 | high   | yes     | fair          | yes           |
| >40   | medium | no      | excellent     | no            |

Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At start, all the training examples are at the root
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
 
Conditions for stopping partitioning (a sketch of the full procedure follows this list)
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning; in this case, majority voting is employed to label the leaf
- There are no samples left
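
As referenced above, here is a minimal Python sketch of this greedy, top-down procedure. It assumes training examples are dicts mapping attribute names (plus a class label) to categorical values, and takes the attribute-selection heuristic as a parameter; the names (build_tree, majority_class, select_attribute) are illustrative, not from the original slides.

```python
from collections import Counter

def majority_class(examples, target):
    # Majority voting over the class label (used when no attributes remain).
    return Counter(ex[target] for ex in examples).most_common(1)[0][0]

def build_tree(examples, attributes, target, select_attribute):
    classes = {ex[target] for ex in examples}
    # Stop: all samples at this node belong to the same class.
    if len(classes) == 1:
        return classes.pop()
    # Stop: no remaining attributes -> leaf labeled by majority vote.
    if not attributes:
        return majority_class(examples, target)
    # Greedy step: pick the attribute the heuristic ranks best.
    best = select_attribute(examples, attributes, target)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # Branching only on observed values means the "no samples left" case
    # never arises here; classic ID3 would instead label such an empty
    # branch with the parent node's majority class.
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = build_tree(subset, remaining, target,
                                       select_attribute)
    return tree
```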
 
Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- Let S contain s_i tuples of class C_i, for i = 1, …, m, with s tuples in total
- The expected information needed to classify an arbitrary tuple is

  $$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$$

- The entropy of attribute A with values {a_1, a_2, …, a_v}, where s_{ij} is the number of class-C_i tuples in the subset S_j with A = a_j, is

  $$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s}\, I(s_{1j}, \ldots, s_{mj})$$

- The information gained by branching on attribute A is

  $$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$
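
These formulas translate directly into Python; a sketch, with function names (info, entropy_of_attribute, gain) of my own choosing rather than from the slides:

```python
from collections import Counter
from math import log2

def info(*counts):
    # I(s1, ..., sm) = -sum_i (si/s) * log2(si/s)
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c > 0)

def entropy_of_attribute(examples, attr, target):
    # E(A): weighted sum of I(...) over the subsets induced by A's values.
    s = len(examples)
    total = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == value]
        counts = Counter(ex[target] for ex in subset).values()
        total += len(subset) / s * info(*counts)
    return total

def gain(examples, attr, target):
    # Gain(A) = I(s1, ..., sm) - E(A)
    counts = Counter(ex[target] for ex in examples).values()
    return info(*counts) - entropy_of_attribute(examples, attr, target)
```

With these helpers, ID3’s selection step is simply max(attributes, key=lambda a: gain(examples, a, target)).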
 
Attribute Selection by Information Gain Computation
- Class P: buys_computer = “yes”
- Class N: buys_computer = “no”
- I(p, n) = I(9, 5) = 0.940, since the data set has p = 9 “yes” and n = 5 “no” tuples
- Compute the entropy for age:

  | age   | p_i | n_i | I(p_i, n_i) |
  |-------|-----|-----|-------------|
  | <=30  | 2   | 3   | 0.971       |
  | 31…40 | 4   | 0   | 0           |
  | >40   | 3   | 2   | 0.971       |

  $$E(\mathit{age}) = \frac{5}{14}\, I(2,3) + \frac{4}{14}\, I(4,0) + \frac{5}{14}\, I(3,2) = 0.694$$

Here $\frac{5}{14}\, I(2,3)$ means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence

$$\mathrm{Gain}(\mathit{age}) = I(p, n) - E(\mathit{age}) = 0.940 - 0.694 = 0.246$$

Similarly,

$$\mathrm{Gain}(\mathit{income}) = 0.029, \quad \mathrm{Gain}(\mathit{student}) = 0.151, \quad \mathrm{Gain}(\mathit{credit\_rating}) = 0.048$$
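These numbers are easy to verify numerically; a standalone sketch (not code from the slides):

```python
from math import log2

def info(*counts):
    # I(s1, ..., sm) = -sum_i (si/s) * log2(si/s)
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c > 0)

i_total = info(9, 5)  # expected information for 9 yes / 5 no
e_age = 5/14 * info(2, 3) + 4/14 * info(4, 0) + 5/14 * info(3, 2)

print(f"I(9,5)    = {i_total:.3f}")          # 0.940
print(f"E(age)    = {e_age:.3f}")            # 0.694
print(f"Gain(age) = {i_total - e_age:.3f}")  # 0.247; the slide's 0.246
                                             # rounds each term first
```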
Output: A Decision Tree for “buys_computer”

Since age yields the highest information gain, it becomes the root. The pure “31…40” branch becomes a “yes” leaf, while the other two branches are split further (on student and credit_rating, respectively):

age?
├── <=30  → student?
│           ├── no  → no
│           └── yes → yes
├── 31…40 → yes
└── >40   → credit_rating?
            ├── excellent → no
            └── fair      → yes
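
As a cross-check, the same data can be fed to a library learner. Note the assumption: scikit-learn’s DecisionTreeClassifier builds binary CART-style trees, so with one-hot-encoded inputs the printed tree makes the same decisions but is not identical in shape to the multiway ID3 tree above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
df = pd.DataFrame(rows, columns=["age", "income", "student",
                                 "credit_rating", "buys_computer"])

# One-hot encode the categorical attributes, then fit CART with the
# entropy criterion (the closest scikit-learn analogue of ID3).
X = pd.get_dummies(df.drop(columns="buys_computer"))
y = df["buys_computer"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
```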