ID3 Algorithm Decision Tree Solved Example Machine Learning

Problem Definition:

Build a decision tree using the ID3 algorithm for the training data in the table below (Buy Computer data), and predict the class of the following new example: age<=30, income=medium, student=yes, credit-rating=fair

age	income	student	Credit rating	Buys computer
<=30	high	no	fair	no
<=30	high	no	excellent	no
31…40	high	no	fair	yes
>40	medium	no	fair	yes
>40	low	yes	fair	yes
>40	low	yes	excellent	no
31…40	low	yes	excellent	yes
<=30	medium	no	fair	no
<=30	low	yes	fair	yes
>40	medium	yes	fair	yes
<=30	medium	yes	excellent	yes
31…40	medium	no	excellent	yes
31…40	high	yes	fair	yes
>40	medium	no	excellent	no

Solution:

First, check which attribute provides the highest Information Gain in order to split the training set based on that attribute. We need to calculate the expected information to classify the set and the entropy of each attribute.

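The information gain of an attribute is the entropy of the whole set minus the weighted average entropy of the partitions produced by splitting on that attribute:

Gain(A) = Entropy(S) – Entropy(A)

The entropy of the whole set, which contains two classes (9 yes and 5 no), is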

Entropy(S)= E(9,5)= -9/14 log₂(9/14) – 5/14 log₂(5/14)=0.94
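The entropy value above can be checked with a few lines of Python. This is a minimal sketch; the helper name `entropy` is chosen here for illustration and takes the raw class counts as input.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    # A class with zero records contributes nothing (0 * log2(0) is taken as 0).
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# 9 "yes" and 5 "no" records in the full training set
print(round(entropy([9, 5]), 2))  # 0.94
```

A pure set such as `entropy([4, 0])` evaluates to zero, which is why the age_31..40 term drops out of the calculations that follow.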

Now Consider the Age attribute

For Age, we have three values age_<=30 (2 yes and 3 no), age_31..40 (4 yes and 0 no), and age_>40 (3 yes and 2 no)

Entropy(age) = 5/14 (-2/5 log₂(2/5)-3/5log₂(3/5)) + 4/14 (0) + 5/14 (-3/5log₂(3/5)-2/5log₂(2/5))

= 5/14(0.9709) + 0 + 5/14(0.9709) = 0.6935

Gain(age) = 0.94 – 0.6935 = 0.2465
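The age calculation can be reproduced from the per-value class counts alone. The sketch below assumes a hypothetical helper `gain_from_counts` that takes one (yes, no) pair per attribute value; it is an illustration of the arithmetic, not part of the original solution.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain_from_counts(partitions):
    """Information gain of a split described by (yes, no) counts per value."""
    n = sum(yes + no for yes, no in partitions)
    total_yes = sum(yes for yes, _ in partitions)
    total_no = sum(no for _, no in partitions)
    # Weighted average entropy of the partitions after the split
    weighted = sum((yes + no) / n * entropy([yes, no]) for yes, no in partitions)
    return entropy([total_yes, total_no]) - weighted

# age_<=30 (2 yes, 3 no), age_31..40 (4 yes, 0 no), age_>40 (3 yes, 2 no)
age_gain = gain_from_counts([(2, 3), (4, 0), (3, 2)])
print(round(age_gain, 3))
```

Without intermediate rounding the gain comes out at about 0.247; the hand calculation gives 0.2465 because Entropy(S) was rounded to 0.94 before subtracting.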

Next, consider Income Attribute

For Income, we have three values income_high (2 yes and 2 no), income_medium (4 yes and 2 no), and income_low (3 yes and 1 no)

Entropy(income) = 4/14 (-2/4log₂(2/4)-2/4log₂(2/4)) + 6/14 (-4/6log₂(4/6)-2/6log₂(2/6)) + 4/14 (-3/4log₂(3/4)-1/4log₂(1/4))

= 4/14 (1) + 6/14 (0.918) + 4/14 (0.811)

= 0.285714 + 0.393428 + 0.231714 = 0.9108

Gain(income) = 0.94 – 0.9108 = 0.0292

Next, consider Student Attribute

For Student, we have two values student_yes (6 yes and 1 no) and student_no (3 yes and 4 no)

Entropy(student) = 7/14(-6/7log₂(6/7)-1/7log₂(1/7)) + 7/14(-3/7log₂(3/7)-4/7log₂(4/7))

= 7/14(0.5916) + 7/14(0.9852)

= 0.2958 + 0.4926 = 0.7884

Gain (student) = 0.94 – 0.7884 = 0.1516

Finally, consider Credit_Rating Attribute

For Credit_Rating, we have two values credit_rating_fair (6 yes and 2 no) and credit_rating_excellent (3 yes and 3 no)

Entropy(credit_rating) = 8/14(-6/8log₂(6/8)-2/8log₂(2/8)) + 6/14(-3/6log₂(3/6)-3/6log₂(3/6))

= 8/14(0.8112) + 6/14(1)

= 0.4635 + 0.4286 = 0.8921

Gain(credit_rating) = 0.94 – 0.8921 = 0.0479

Since age has the highest information gain of the four attributes, we start by splitting the dataset on the age attribute.
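The comparison of the four attributes can be sketched directly from the training table. The names `ROWS`, `COLUMNS`, and `info_gain` are illustrative choices, and the middle age range is written as "31..40" in the code.

```python
from collections import Counter
from math import log2

COLUMNS = ["age", "income", "student", "credit_rating"]
# Each row: age, income, student, credit_rating, buys_computer
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    """Entropy of the class labels minus the weighted entropy after splitting on col."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: info_gain(ROWS, i) for i, name in enumerate(COLUMNS)}
best = max(gains, key=gains.get)
for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {g:.3f}")
print("split on:", best)
```

Computed without intermediate rounding, the gains are roughly 0.247 for age, 0.152 for student, 0.048 for credit_rating, and 0.029 for income, so age is selected as the root.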

All records under the branch age_31..40 belong to the class Yes, so this branch becomes a leaf node with Class=Yes.

Now build the decision tree for the left subtree (age<=30)

The same process of splitting has to happen for the two remaining branches.

For branch age<=30 we still have attributes income, student, and credit_rating. Which one should be used to split the partition?

The entropy of this partition is E(S_age<=30) = E(2,3) = -2/5 log₂(2/5) – 3/5 log₂(3/5) = 0.97

For Income, we have three values income_high (0 yes and 2 no), income_medium (1 yes and 1 no) and income_low (1 yes and 0 no)

Entropy(income) = 2/5(0) + 2/5 (-1/2log₂(1/2)-1/2log₂(1/2)) + 1/5 (0) = 2/5 (1) = 0.4

Gain(income) = 0.97 – 0.4 = 0.57

For Student, we have two values student_yes (2 yes and 0 no) and student_no (0 yes and 3 no)

Entropy(student) = 2/5(0) + 3/5(0) = 0

Gain (student) = 0.97 – 0 = 0.97

We can safely split on the student attribute without checking credit_rating, because its gain already equals the entropy of the partition (0.97), which is the largest gain possible here.

Each of these two new branches contains records of a single class, so we make them leaf nodes labeled with that class: student=yes gives Class=Yes and student=no gives Class=No.

Now build the decision tree for the right subtree (age>40)

The entropy of this partition is E(S_age>40) = E(3,2) = -3/5 log₂(3/5) – 2/5 log₂(2/5) = 0.97

For Income, we have two values income_medium (2 yes and 1 no) and income_low (1 yes and 1 no)

Entropy(income) = 3/5 (-2/3log₂(2/3)-1/3log₂(1/3)) + 2/5 (-1/2log₂(1/2)-1/2log₂(1/2))

= 3/5 (0.9182) + 2/5 (1) = 0.55 + 0.4 = 0.95

Gain(income) = 0.97 – 0.95 = 0.02

For Student, we have two values student_yes (2 yes and 1 no) and student_no (1 yes and 1 no)

Entropy(student) = 3/5(-2/3log₂(2/3)-1/3log₂(1/3)) + 2/5(-1/2log₂(1/2)-1/2log₂(1/2)) = 0.95

Gain (student) = 0.97 – 0.95 = 0.02

For Credit_Rating, we have two values credit_rating_fair (3 yes and 0 no) and credit_rating_excellent (0 yes and 2 no)

Entropy(credit_rating) = 0

Gain(credit_rating) = 0.97 – 0 = 0.97

We then split on credit_rating. Each resulting partition contains records of a single class, so we make them leaf nodes labeled with that class: credit_rating=fair gives Class=Yes and credit_rating=excellent gives Class=No.
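The whole procedure (pick the attribute with the highest gain, split, and recurse until a partition is pure) can be sketched as a short recursive function. This is a minimal illustration under stated assumptions: the names `DATA`, `ATTRS`, and `id3` are hypothetical, the tree is stored as nested dictionaries, and ties in gain are broken by attribute order.

```python
from collections import Counter
from math import log2

ATTRS = ["age", "income", "student", "credit_rating"]
# Each row: age, income, student, credit_rating, buys_computer
DATA = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def entropy(rows):
    """Shannon entropy (in bits) of the class label, stored in the last column."""
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def split(rows, col):
    """Group rows by their value in column col."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r)
    return groups

def gain(rows, col):
    """Information gain obtained by splitting rows on column col."""
    groups = split(rows, col)
    return entropy(rows) - sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def id3(rows, cols):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = Counter(r[-1] for r in rows)
    if len(labels) == 1 or not cols:
        # Pure partition, or no attributes left: use the majority class
        return labels.most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    return {ATTRS[best]: {v: id3(g, rest) for v, g in split(rows, best).items()}}

tree = id3(DATA, [0, 1, 2, 3])
print(tree)
```

On this dataset the sketch produces the same tree as the manual derivation: age at the root, student under age<=30, a Yes leaf under age 31..40, and credit_rating under age>40.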

New example: age<=30, income=medium, student=yes, credit-rating=fair

Follow the branch age<=30 and then the branch student=yes, so we predict Class=Yes.

Buys_computer = yes
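Classifying the new example amounts to walking the finished tree from the root to a leaf. The sketch below writes the tree derived above as nested dictionaries; the names `TREE` and `predict` are illustrative.

```python
# The tree derived in the solution, written as {attribute: {value: subtree or label}}
TREE = {
    "age": {
        "<=30": {"student": {"yes": "yes", "no": "no"}},
        "31..40": "yes",
        ">40": {"credit_rating": {"fair": "yes", "excellent": "no"}},
    }
}

def predict(tree, example):
    """Follow the branch matching the example's value until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))  # the attribute tested at this node
        tree = tree[attribute][example[attribute]]
    return tree

new_example = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(predict(TREE, new_example))  # yes
```

Note that income never appears in the tree, so its value does not affect the prediction.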

Source: https://vtupulse.com

Thursday, 22 February 2024
