Welcome Guest, you are in: Login

MUST Creative Engineering Laboratory

RSS RSS

Navigation



Technical Doc



Search the wiki
»

MUST Corp.

MUST Corp.

www.must.or.kr

 Microsoft CERTIFIED Partner Software Development, Web Development, Data Platform

 Microsoft Small Business Specialist

MCSD

Microsoft Certified IT Professional

Microsoft Certified Professional Developer

Training Dataset

This follows an example from Quinlan’s ID3

Image

Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm)

  • Tree is constructed in a top-down recursive divide-and-conquer manner
  • At start, all the training examples are at the root
  • Attributes are categorical (if continuous-valued, they are discretized in advance)
  • Examples are partitioned recursively based on selected attributes
  • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Conditions for stopping partitioning

  • All samples for a given node belong to the same class
  • There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
  • There are no samples left

Attribute Selection Measure: Information Gain (ID3/C4.5)

  • Select the attribute with the highest information gain
  • S contains si tuples of class Ci for i = {1, …, m}
  • information measures info required to classify any arbitrary tuple
    Image
  • entropy of attribute A with values {a1,a2,…,av}
    Image
  • information gained by branching on attribute A
    Image

Attribute Selection by Information Gain Computation

  • Class P: buys_computer = “yes”
  • Class N: buys_computer = “no”
  • I(p, n) = I(9, 5) =0.940
  • Compute the entropy for age:
    Image

Image
Image means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence Image

Similarly, Image

Output: A Decision Tree for “buys_computer”

Image






MUST Creative Engineering Laboratory

ImageImage Image Image

Image Image Image Image Image Image Image

Copyright © 2010 MUST Corp. All rights reserved. must@must.or.kr
This Program is released under the GNU General Public License v2. View the GNU General Public License v2 or visit the GNU website.