A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.
Advantages/Features of Decision Tree
- Decision trees require less effort for data preparation as compared to other algorithms during pre-processing.
- Do not require normalization of data.
- Do not require scaling of data either (a quick sketch after this list illustrates why).
- Missing values in the data also do NOT affect the process of building a decision tree to any considerable extent.
- A Decision tree model is very intuitive and easy to explain to technical teams as well as stakeholders.
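The scaling point above is easy to verify. Below is a minimal sketch (my own illustration with made-up data, using scikit-learn; not from the original article) showing that a tree trained on raw features and a tree trained on drastically rescaled features make the same predictions, because splits depend only on the ordering of values, not on their scale.

```python
# Minimal sketch: decision-tree splits depend only on value ordering, so
# rescaling a feature by a positive constant should not change the predictions.
# The toy data below is made up purely for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 2)                          # raw features
y = (X[:, 0] + X[:, 1] > 1).astype(int)       # simple synthetic labels

X_scaled = X * np.array([1000.0, 0.001])      # wildly different feature scales

tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_scaled, y)

# Same predictions on correspondingly scaled inputs
print(np.array_equal(tree_raw.predict(X),
                     tree_scaled.predict(X_scaled)))   # expected: True
```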
Disadvantages/Shortcomings of Decision Tree
- A small change in the data can cause a large change in the structure of the decision tree causing instability.
- For a decision tree, the calculations can sometimes become far more complex than for other algorithms.
- It often takes more time to train the model.
- Training is relatively expensive because of the higher complexity and longer time involved.
- A plain decision tree is often inadequate for regression: its predictions are piecewise constant, so it struggles to predict continuous values precisely.
Types of Decision Tree
- Categorical Variable Decision Tree: A decision tree with a categorical target variable is called a categorical variable decision tree. For example, for the question "Will Bangladesh be able to beat India?" the decision parameter is YES or NO.
- Continuous Variable Decision Tree: A decision tree with a continuous target variable is called a continuous variable decision tree. For example, for the question "What is the percentage chance that India will win?" the target is a varying numeric value.
- ID3 (Iterative Dichotomiser 3)
- C4.5 (successor of ID3)
- CART (Classification And Regression Tree)
- CHAID (CHi-squared Automatic Interaction Detector)
- MARS (Multivariate Adaptive Regression Splines)
- Conditional Inference Trees.
What is the importance of a Decision Tree?
- Simple to use, understand, and explain
- They perform feature selection and variable screening
- They require very little effort to prepare and produce
- They can handle nonlinear relationships between parameters
- Can use conditional probability-based reasoning or Bayes theorem
- They provide strategic answers to uncertain situations
Regression Tree vs Classification Tree
|
Regression Tree |
Classification Tree |
Dependent Variable |
Used when the dependent variable is continuous |
Used when the dependent variable is categorical |
Unseen Data |
Makes prediction with the mean value |
Makes prediction with mode values |
Predictor Space |
divides into a distinct and non-overlapping region |
divides into a distinct and non-overlapping region |
Approach Followed |
Top-Down Greedy |
Top-Down Greedy |
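To make the mean-vs-mode row of the table concrete, here is a small sketch (my own, with made-up toy data; the scikit-learn estimators are standard but the numbers are purely illustrative):

```python
# Regression tree vs classification tree on the same toy inputs.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[1], [2], [3], [10], [11], [12]])

# Classification tree: categorical target, each leaf predicts its majority class (mode)
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = DecisionTreeClassifier(max_depth=1).fit(X, y_class)
print(clf.predict([[2.5], [10.5]]))   # -> [0 1]

# Regression tree: continuous target, each leaf predicts the mean of its training values
y_reg = np.array([1.1, 1.9, 3.2, 10.1, 11.2, 11.8])
reg = DecisionTreeRegressor(max_depth=1).fit(X, y_reg)
print(reg.predict([[2.5], [10.5]]))   # -> approximately [2.07, 11.03]
```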
Entropy and Information Gain Calculations
Entropy is very important when we have to find the Information Gain, which in turn plays a vital role in selecting the next attribute to split on.
Following are the formulas we will use throughout this article:
Entropy(S) = -p(+)*log2 p(+) - p(-)*log2 p(-)   (more generally, the sum of -pi*log2 pi over all classes)
Information Gain(S, A) = Entropy(S) - sum over values v of A of (|Sv|/|S|) * Entropy(Sv)
Example 1
Entropy([29+,35-]) = -29/64*log2 (29/64) – 35/64*log2 (35/64)
= 0.99
Entropy([21+,5-]) = -21/26*log2 (21/26) - 5/26*log2 (5/26)
= 0.71
Entropy([8+,30-]) = -8/38*log2 (8/38) - 30/38 *log2 (30/38)
= 0.74
Information Gain([29+,35-], A1) = 0.99 - (26/64)*0.71 - (38/64)*0.74
= 0.26
Example 2
Entropy([29+,35-]) = -29/64*log2 (29/64) – 35/64*log2 (35/64)
= 0.99
Entropy([18+,33-]) = -18/51*log2 (18/51) - 33/51*log2 (33/51)
= 0.94
Entropy([11+,2-]) = -11/13*log2 (11/13) - 2/13 *log2 (2/13)
= 0.62
Information Gain([29+,35-], A2) = 0.99 - (51/64)*0.94 - (13/64)*0.62
= 0.12
So we can see that the information gain is bigger in the case of Example 1 (IG ≈ 0.26) than in Example 2 (IG ≈ 0.12), so the attribute split in Example 1 would be preferred.
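If you want to double-check these numbers, the short sketch below (the entropy helper is my own, not from the article) reproduces both calculations:

```python
# Verify the two worked information-gain examples above.
import numpy as np

def entropy(pos, neg):
    total = pos + neg
    probs = [p / total for p in (pos, neg) if p > 0]
    return -sum(p * np.log2(p) for p in probs)

parent = entropy(29, 35)                                   # ~0.99

# Example 1: split into [21+,5-] (26 rows) and [8+,30-] (38 rows)
gain_1 = parent - (26/64)*entropy(21, 5) - (38/64)*entropy(8, 30)

# Example 2: split into [18+,33-] (51 rows) and [11+,2-] (13 rows)
gain_2 = parent - (51/64)*entropy(18, 33) - (13/64)*entropy(11, 2)

# ~0.27 and ~0.12 (Example 1 comes out slightly above the 0.26 computed by hand,
# because the hand calculation rounds the entropies first)
print(round(gain_1, 2), round(gain_2, 2))
```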
Decision Tree Example
So far we have studied the theory; now let's get some hands-on practice. I will take a demo dataset and construct a decision tree from it.
I am taking the following dataset, where we decide whether or not to play based on the given weather conditions.
| Day | Outlook | Temperature | Humidity | Wind | Play Tennis |
| --- | --- | --- | --- | --- | --- |
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Weak | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Strong | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |
Calculation of Gini Index for "Outlook" feature
1. We can see that Outlook has 5 instances (5/14) as "Sunny", 4 instances (4/14) as "Overcast" and 5 instances (5/14) as "Rain".
2. Outlook = Sunny, Play Tennis = Yes, we have 2 instances (2/5)
Outlook = Sunny, Play Tennis = No, we have 3 instances (3/5)
Hence, the Gini Index comes out to be:
= 1 - ((2/5)^2+(3/5)^2)
= 0.48
3. Outlook = Overcast, Play Tennis = Yes, we have 4 instances (4/4)
Outlook = Overcast, Play Tennis = No, we have 0 instances (0/4)
Hence, the Gini Index comes out to be:
= 1 - ((4/4)^2+(0/4)^2)
= 0
4. Outlook = Rain, Play Tennis = Yes, we have 3 instances (3/5)
Outlook = Rain, Play Tennis = No, we have 2 instances (2/5)
Hence, the Gini Index comes out to be:
= 1 - ((3/5)^2+(2/5)^2)
= 0.48
5. So the final Gini Index is:
= (5/14)*0.48 + (4/14)*0 + (5/14)*0.48
= 0.34
Calculations of Gini Index of "Temperature" feature
1. We can see that, Temperature has 4 instances (4/14) as "Hot", 4 instances (4/14) as "Cool", 6 instances (6/14) as "Mild"
2. Temperature = Hot, Play Tennis = Yes, we have 2 instances (2/4)
Temperature = Hot, Play Tennis = No, we have 2 instances (2/4)
Hence, the Gini Index comes out to be:
= 1 - ((2/4)^2+(2/4)^2)
= 0.5
3. Temperature = Cool, Play Tennis = Yes, we have 3 instances (3/4)
Temperature = Cool, Play Tennis = No, we have 1 instance (1/4)
Hence, the Gini Index comes out to be:
= 1 - ((3/4)^2+(1/4)^2)
= 0.375
4. Temperature = Mild, Play Tennis = Yes, we have 4 instances (4/6)
Temperature = Mild, Play Tennis = No, we have 2 instances (2/6)
Hence, the Gini Index comes out to be:
= 1 - ((4/6)^2+(2/6)^2)
= 0.44
5. So the final Gini Index is:
= (4/14)*0.5 + (4/14)*0.375 + (6/14)*0.44
= 0.44
Calculations of Gini Index of "Humidity" feature
1. We can see that, Humidity has 7 instances (7/14) as "High", 7 instances (7/14) as "Normal"
2. Humidity = High, Play Tennis = Yes, we have 3 instances (3/7)
Humidity = High, Play Tennis = No, we have 4 instances (4/7)
Hence, the Gini Index comes out to be:
= 1 - ((3/7)^2+(4/7)^2)
= 0.49
3. Humidity = Normal, Play Tennis = Yes, we have 6 instances (6/7)
Humidity = Normal, Play Tennis = No, we have 1 instance (1/7)
Hence, the Gini Index comes out to be:
= 1 - ((6/7)^2+(1/7)^2)
= 0.24
4. So the final Gini Index is:
= (7/14)*0.49 + (7/14)*0.24
= 0.365
Calculations of Gini Index of "Wind" feature
1. We can see that, Wind has 8 instances (8/14) as "Weak", 6 instances (6/14) as "Strong"
2. Wind = Strong, Play Tennis = Yes, we have 3 instances (3/6)
Wind = Strong, Play Tennis = No, we have 3 instances (3/6)
Hence, the Gini Index comes out to be:
= 1 - ((3/6)^2+(3/6)^2)
= 0.5
3. Wind = Weak, Play Tennis = Yes, we have 6 instances (6/8)
Wind = Weak, Play Tennis = No, we have 2 instances (2/8)
Hence, the Gini Index comes out to be:
= 1 - ((6/8)^2+(2/8)^2)
= 0.375
4. So the final Gini Index is:
= (8/14)*0.375 + (6/14)*0.5
= 0.43
Since the Gini Index of "Outlook" is the smallest, we choose "Outlook" as the root node.
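To verify these hand calculations, here is a short sketch (the weighted_gini helper is my own; the DataFrame simply re-enters the table above):

```python
# Reproduce the weighted Gini index of each feature on the PlayTennis data.
import pandas as pd

data = pd.DataFrame({
    'Outlook':     ['Sunny','Sunny','Overcast','Rain','Rain','Rain','Overcast',
                    'Sunny','Sunny','Rain','Sunny','Overcast','Overcast','Rain'],
    'Temperature': ['Hot','Hot','Hot','Mild','Cool','Cool','Cool',
                    'Mild','Cool','Mild','Mild','Mild','Hot','Mild'],
    'Humidity':    ['High','High','High','High','Normal','Normal','Normal',
                    'High','Normal','Normal','Normal','High','Normal','High'],
    'Wind':        ['Weak','Strong','Weak','Weak','Weak','Strong','Weak',
                    'Weak','Weak','Strong','Strong','Strong','Weak','Strong'],
    'PlayTennis':  ['No','No','Yes','Yes','Yes','No','Yes',
                    'No','Yes','Yes','Yes','Yes','Yes','No'],
})

def weighted_gini(df, feature, target='PlayTennis'):
    total = len(df)
    gini = 0.0
    for value, subset in df.groupby(feature):
        p = subset[target].value_counts(normalize=True)   # class proportions in this branch
        gini += (len(subset) / total) * (1 - (p ** 2).sum())
    return gini

for col in ['Outlook', 'Temperature', 'Humidity', 'Wind']:
    print(col, round(weighted_gini(data, col), 3))
# Outlook ~0.343, Temperature ~0.440, Humidity ~0.367, Wind ~0.429
```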
Decision Tree using Conjunction (AND Operation)
Decision Tree using Disjunction (OR Operations)
Decision Tree using XOR
Decision Tree using Disjunction of Conjunctions (OR of AND Operations)
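The figures for these four cases are not reproduced here; as a stand-in, the sketch below (my own, using scikit-learn) fits small trees to the AND, OR, and XOR truth tables to show that a decision tree can represent each of them (XOR simply has to test both inputs):

```python
# A decision tree can represent boolean functions of its inputs.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    'AND': np.array([0, 0, 0, 1]),
    'OR':  np.array([0, 1, 1, 1]),
    'XOR': np.array([0, 1, 1, 0]),
}

for name, y in targets.items():
    clf = DecisionTreeClassifier().fit(X, y)
    # Each tree reproduces its truth table exactly
    print(name, clf.predict(X), 'depth =', clf.get_depth())
```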
Let us first look at a few rough decision trees to get a sense of what a decision tree looks like.
From the above trees we aren't able to clearly infer anything, so let me show you how exactly the decision tree will look for the given dataset; then we will try to understand what that tree means and how to use it to our benefit.
You may be wondering why we took "Outlook" as the root node; it is because of the Gini Index values calculated earlier.
Following is the Decision Tree with the Weights which can be used to calculate entropy and information gain.
Now let's try to understand the tree,
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
The above statement means that if Outlook is Sunny and Humidity is High, then we do not play.
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
The above statement means that if Outlook is Sunny and Humidity is Normal, then we play.
R3: If (Outlook=Overcast) Then PlayTennis=Yes
The above statement means that if Outlook is Overcast, then we play.
R4: If (Outlook=Rain) ∨ (Wind=Strong) Then PlayTennis=Yes
The above statement means that if Outlook is Rain or Wind is Strong, then we play.
Python Implementation of Decision Tree
The dataset that I will be using here is the UCI Zoo Dataset; you can download it from the article.
1. Using NumPy
    import pandas as pd
    import numpy as np
    from pprint import pprint

    # Load the Zoo dataset and name the columns
    dataset = pd.read_csv('zoo.csv',
                          names=['animal_name','hair','feathers','eggs','milk',
                                 'airbone','aquatic','predator','toothed','backbone',
                                 'breathes','venomous','fins','legs','tail','domestic','catsize','class',])

    # The animal name is just an identifier, so drop it
    dataset = dataset.drop('animal_name', axis=1)

    def entropy(target_col):
        # Entropy of a target column: -sum(p * log2(p)) over the unique classes
        elements, counts = np.unique(target_col, return_counts=True)
        entropy = np.sum([(-counts[i]/np.sum(counts))*np.log2(counts[i]/np.sum(counts)) for i in range(len(elements))])
        return entropy

    def InfoGain(data, split_attribute_name, target_name="class"):
        # Entropy of the whole dataset before the split
        total_entropy = entropy(data[target_name])

        # Values and counts of the attribute we are splitting on
        vals, counts = np.unique(data[split_attribute_name], return_counts=True)

        # Weighted entropy of the subsets created by the split
        Weighted_Entropy = np.sum([(counts[i]/np.sum(counts))*entropy(data.where(data[split_attribute_name]==vals[i]).dropna()[target_name]) for i in range(len(vals))])

        # Information gain = entropy before the split - weighted entropy after it
        Information_Gain = total_entropy - Weighted_Entropy
        return Information_Gain

    def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
        # Stopping case 1: all remaining rows have the same class -> return that class
        if len(np.unique(data[target_attribute_name])) <= 1:
            return np.unique(data[target_attribute_name])[0]

        # Stopping case 2: no rows left -> return the majority class of the original dataset
        elif len(data) == 0:
            return np.unique(originaldata[target_attribute_name])[np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]

        # Stopping case 3: no features left -> return the parent node's majority class
        elif len(features) == 0:
            return parent_node_class

        # Otherwise: grow the tree
        else:
            # Majority class of the current node (passed down to the recursive calls)
            parent_node_class = np.unique(data[target_attribute_name])[np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]

            # Choose the feature with the highest information gain
            item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
            best_feature_index = np.argmax(item_values)
            best_feature = features[best_feature_index]

            # The tree is stored as a nested dictionary rooted at the best feature
            tree = {best_feature: {}}

            # Remove the chosen feature from the feature list
            features = [i for i in features if i != best_feature]

            # Grow one branch per value of the chosen feature
            for value in np.unique(data[best_feature]):
                # Rows where the chosen feature takes this value
                sub_data = data.where(data[best_feature] == value).dropna()

                # Recurse on the subset
                subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)

                # Attach the subtree to this branch
                tree[best_feature][value] = subtree

            return tree

    def predict(query, tree, default=1):
        # Walk down the tree following the attribute values in the query
        for key in list(query.keys()):
            if key in list(tree.keys()):
                try:
                    result = tree[key][query[key]]
                except:
                    # Unseen attribute value -> fall back to the default class
                    return default

                # If the branch is another dictionary we are not at a leaf yet
                if isinstance(result, dict):
                    return predict(query, result)
                else:
                    return result

    def train_test_split(dataset):
        # First 80 rows for training, the rest for testing
        training_data = dataset.iloc[:80].reset_index(drop=True)
        testing_data = dataset.iloc[80:].reset_index(drop=True)
        return training_data, testing_data

    training_data = train_test_split(dataset)[0]
    testing_data = train_test_split(dataset)[1]

    def test(data, tree):
        # Convert the feature columns of every row into a query dictionary
        queries = data.iloc[:, :-1].to_dict(orient="records")

        # DataFrame to hold the predictions
        predicted = pd.DataFrame(columns=["predicted"])

        # Predict every row and compare with the true class
        for i in range(len(data)):
            predicted.loc[i, "predicted"] = predict(queries[i], tree, 1.0)
        print('The prediction accuracy is: ', (np.sum(predicted["predicted"] == data["class"])/len(data))*100, '%')

    tree = ID3(training_data, training_data, training_data.columns[:-1])
    pprint(tree)
    test(testing_data, tree)
The output that I got is:
{'legs': {0: {'fins': {0.0: {'toothed': {0.0: 7.0, 1.0: 3.0}},
1.0: {'eggs': {0.0: 1.0, 1.0: 4.0}}}},
2: {'hair': {0.0: 2.0, 1.0: 1.0}},
4: {'hair': {0.0: {'toothed': {0.0: 7.0, 1.0: 5.0}}, 1.0: 1.0}},
6: {'aquatic': {0.0: 6.0, 1.0: 7.0}}, 8: 7.0}}
The prediction accuracy is: 85.71428571428571 %
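As a quick usage example (not part of the original listing), you can classify a single test row with the same predict() function; the row index chosen here is arbitrary:

```python
# Classify one held-out animal with the tree returned by ID3
query = testing_data.iloc[0, :-1].to_dict()   # feature dict for the first test row
print(predict(query, tree, 1.0))               # predicted class label
```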
2. Using SKLearn
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # Load the Zoo dataset and name the columns
    dataset = pd.read_csv('zoo.csv',
                          names=['animal_name','hair','feathers','eggs','milk',
                                 'airbone','aquatic','predator','toothed','backbone',
                                 'breathes','venomous','fins','legs','tail','domestic','catsize','class',])

    # Drop the identifier column
    dataset = dataset.drop('animal_name', axis=1)

    # First 80 rows for training, the rest for testing
    train_features = dataset.iloc[:80, :-1]
    test_features = dataset.iloc[80:, :-1]
    train_targets = dataset.iloc[:80, -1]
    test_targets = dataset.iloc[80:, -1]

    # Train a decision tree that uses entropy (information gain) as the split criterion
    tree = DecisionTreeClassifier(criterion='entropy').fit(train_features, train_targets)

    # Predict the test set and report the accuracy
    prediction = tree.predict(test_features)
    print("The prediction accuracy is: ", tree.score(test_features, test_targets)*100, "%")
The output that I got is:
The prediction accuracy is: 80.95238095238095 %
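If you want to inspect the splits that scikit-learn learned (not shown in the original article), export_text prints the tree as readable rules; this assumes a reasonably recent scikit-learn version:

```python
from sklearn.tree import export_text

# Print the learned splits as indented if/else rules, using the column names
print(export_text(tree, feature_names=list(train_features.columns)))
```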
3. Using TensorFlow
    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from functools import reduce
    import matplotlib.pyplot as plt
    %matplotlib inline

    # Note: this example uses TensorFlow 1.x APIs (placeholders, sessions, tf.contrib)
    np.random.seed(1943)
    tf.set_random_seed(1943)
    iris = [((5.1, 3.5, 1.4, 0.2), (1, 0, 0)),
            ((4.9, 3.0, 1.4, 0.2), (1, 0, 0)),
            ((4.7, 3.2, 1.3, 0.2), (1, 0, 0)),
            ((4.6, 3.1, 1.5, 0.2), (1, 0, 0)),
            ((5.0, 3.6, 1.4, 0.2), (1, 0, 0)),
            ((5.4, 3.9, 1.7, 0.4), (1, 0, 0)),
            ((4.6, 3.4, 1.4, 0.3), (1, 0, 0)),
            ((5.0, 3.4, 1.5, 0.2), (1, 0, 0)),
            ((4.4, 2.9, 1.4, 0.2), (1, 0, 0)),
            ((4.9, 3.1, 1.5, 0.1), (1, 0, 0)),
            ((5.4, 3.7, 1.5, 0.2), (1, 0, 0)),
            ((4.8, 3.4, 1.6, 0.2), (1, 0, 0)),
            ((4.8, 3.0, 1.4, 0.1), (1, 0, 0)),
            ((4.3, 3.0, 1.1, 0.1), (1, 0, 0)),
            ((5.8, 4.0, 1.2, 0.2), (1, 0, 0)),
            ((5.7, 4.4, 1.5, 0.4), (1, 0, 0)),
            ((5.4, 3.9, 1.3, 0.4), (1, 0, 0)),
            ((5.1, 3.5, 1.4, 0.3), (1, 0, 0)),
            ((5.7, 3.8, 1.7, 0.3), (1, 0, 0)),
            ((5.1, 3.8, 1.5, 0.3), (1, 0, 0)),
            ((5.4, 3.4, 1.7, 0.2), (1, 0, 0)),
            ((5.1, 3.7, 1.5, 0.4), (1, 0, 0)),
            ((4.6, 3.6, 1.0, 0.2), (1, 0, 0)),
            ((5.1, 3.3, 1.7, 0.5), (1, 0, 0)),
            ((4.8, 3.4, 1.9, 0.2), (1, 0, 0)),
            ((5.0, 3.0, 1.6, 0.2), (1, 0, 0)),
            ((5.0, 3.4, 1.6, 0.4), (1, 0, 0)),
            ((5.2, 3.5, 1.5, 0.2), (1, 0, 0)),
            ((5.2, 3.4, 1.4, 0.2), (1, 0, 0)),
            ((4.7, 3.2, 1.6, 0.2), (1, 0, 0)),
            ((4.8, 3.1, 1.6, 0.2), (1, 0, 0)),
            ((5.4, 3.4, 1.5, 0.4), (1, 0, 0)),
            ((5.2, 4.1, 1.5, 0.1), (1, 0, 0)),
            ((5.5, 4.2, 1.4, 0.2), (1, 0, 0)),
            ((4.9, 3.1, 1.5, 0.1), (1, 0, 0)),
            ((5.0, 3.2, 1.2, 0.2), (1, 0, 0)),
            ((5.5, 3.5, 1.3, 0.2), (1, 0, 0)),
            ((4.9, 3.1, 1.5, 0.1), (1, 0, 0)),
            ((4.4, 3.0, 1.3, 0.2), (1, 0, 0)),
            ((5.1, 3.4, 1.5, 0.2), (1, 0, 0)),
            ((5.0, 3.5, 1.3, 0.3), (1, 0, 0)),
            ((4.5, 2.3, 1.3, 0.3), (1, 0, 0)),
            ((4.4, 3.2, 1.3, 0.2), (1, 0, 0)),
            ((5.0, 3.5, 1.6, 0.6), (1, 0, 0)),
            ((5.1, 3.8, 1.9, 0.4), (1, 0, 0)),
            ((4.8, 3.0, 1.4, 0.3), (1, 0, 0)),
            ((5.1, 3.8, 1.6, 0.2), (1, 0, 0)),
            ((4.6, 3.2, 1.4, 0.2), (1, 0, 0)),
            ((5.3, 3.7, 1.5, 0.2), (1, 0, 0)),
            ((5.0, 3.3, 1.4, 0.2), (1, 0, 0)),
            ((7.0, 3.2, 4.7, 1.4), (0, 1, 0)),
            ((6.4, 3.2, 4.5, 1.5), (0, 1, 0)),
            ((6.9, 3.1, 4.9, 1.5), (0, 1, 0)),
            ((5.5, 2.3, 4.0, 1.3), (0, 1, 0)),
            ((6.5, 2.8, 4.6, 1.5), (0, 1, 0)),
            ((5.7, 2.8, 4.5, 1.3), (0, 1, 0)),
            ((6.3, 3.3, 4.7, 1.6), (0, 1, 0)),
            ((4.9, 2.4, 3.3, 1.0), (0, 1, 0)),
            ((6.6, 2.9, 4.6, 1.3), (0, 1, 0)),
            ((5.2, 2.7, 3.9, 1.4), (0, 1, 0)),
            ((5.0, 2.0, 3.5, 1.0), (0, 1, 0)),
            ((5.9, 3.0, 4.2, 1.5), (0, 1, 0)),
            ((6.0, 2.2, 4.0, 1.0), (0, 1, 0)),
            ((6.1, 2.9, 4.7, 1.4), (0, 1, 0)),
            ((5.6, 2.9, 3.6, 1.3), (0, 1, 0)),
            ((6.7, 3.1, 4.4, 1.4), (0, 1, 0)),
            ((5.6, 3.0, 4.5, 1.5), (0, 1, 0)),
            ((5.8, 2.7, 4.1, 1.0), (0, 1, 0)),
            ((6.2, 2.2, 4.5, 1.5), (0, 1, 0)),
            ((5.6, 2.5, 3.9, 1.1), (0, 1, 0)),
            ((5.9, 3.2, 4.8, 1.8), (0, 1, 0)),
            ((6.1, 2.8, 4.0, 1.3), (0, 1, 0)),
            ((6.3, 2.5, 4.9, 1.5), (0, 1, 0)),
            ((6.1, 2.8, 4.7, 1.2), (0, 1, 0)),
            ((6.4, 2.9, 4.3, 1.3), (0, 1, 0)),
            ((6.6, 3.0, 4.4, 1.4), (0, 1, 0)),
            ((6.8, 2.8, 4.8, 1.4), (0, 1, 0)),
            ((6.7, 3.0, 5.0, 1.7), (0, 1, 0)),
            ((6.0, 2.9, 4.5, 1.5), (0, 1, 0)),
            ((5.7, 2.6, 3.5, 1.0), (0, 1, 0)),
            ((5.5, 2.4, 3.8, 1.1), (0, 1, 0)),
            ((5.5, 2.4, 3.7, 1.0), (0, 1, 0)),
            ((5.8, 2.7, 3.9, 1.2), (0, 1, 0)),
            ((6.0, 2.7, 5.1, 1.6), (0, 1, 0)),
            ((5.4, 3.0, 4.5, 1.5), (0, 1, 0)),
            ((6.0, 3.4, 4.5, 1.6), (0, 1, 0)),
            ((6.7, 3.1, 4.7, 1.5), (0, 1, 0)),
            ((6.3, 2.3, 4.4, 1.3), (0, 1, 0)),
            ((5.6, 3.0, 4.1, 1.3), (0, 1, 0)),
            ((5.5, 2.5, 4.0, 1.3), (0, 1, 0)),
            ((5.5, 2.6, 4.4, 1.2), (0, 1, 0)),
            ((6.1, 3.0, 4.6, 1.4), (0, 1, 0)),
            ((5.8, 2.6, 4.0, 1.2), (0, 1, 0)),
            ((5.0, 2.3, 3.3, 1.0), (0, 1, 0)),
            ((5.6, 2.7, 4.2, 1.3), (0, 1, 0)),
            ((5.7, 3.0, 4.2, 1.2), (0, 1, 0)),
            ((5.7, 2.9, 4.2, 1.3), (0, 1, 0)),
            ((6.2, 2.9, 4.3, 1.3), (0, 1, 0)),
            ((5.1, 2.5, 3.0, 1.1), (0, 1, 0)),
            ((5.7, 2.8, 4.1, 1.3), (0, 1, 0)),
            ((6.3, 3.3, 6.0, 2.5), (0, 0, 1)),
            ((5.8, 2.7, 5.1, 1.9), (0, 0, 1)),
            ((7.1, 3.0, 5.9, 2.1), (0, 0, 1)),
            ((6.3, 2.9, 5.6, 1.8), (0, 0, 1)),
            ((6.5, 3.0, 5.8, 2.2), (0, 0, 1)),
            ((7.6, 3.0, 6.6, 2.1), (0, 0, 1)),
            ((4.9, 2.5, 4.5, 1.7), (0, 0, 1)),
            ((7.3, 2.9, 6.3, 1.8), (0, 0, 1)),
            ((6.7, 2.5, 5.8, 1.8), (0, 0, 1)),
            ((7.2, 3.6, 6.1, 2.5), (0, 0, 1)),
            ((6.5, 3.2, 5.1, 2.0), (0, 0, 1)),
            ((6.4, 2.7, 5.3, 1.9), (0, 0, 1)),
            ((6.8, 3.0, 5.5, 2.1), (0, 0, 1)),
            ((5.7, 2.5, 5.0, 2.0), (0, 0, 1)),
            ((5.8, 2.8, 5.1, 2.4), (0, 0, 1)),
            ((6.4, 3.2, 5.3, 2.3), (0, 0, 1)),
            ((6.5, 3.0, 5.5, 1.8), (0, 0, 1)),
            ((7.7, 3.8, 6.7, 2.2), (0, 0, 1)),
            ((7.7, 2.6, 6.9, 2.3), (0, 0, 1)),
            ((6.0, 2.2, 5.0, 1.5), (0, 0, 1)),
            ((6.9, 3.2, 5.7, 2.3), (0, 0, 1)),
            ((5.6, 2.8, 4.9, 2.0), (0, 0, 1)),
            ((7.7, 2.8, 6.7, 2.0), (0, 0, 1)),
            ((6.3, 2.7, 4.9, 1.8), (0, 0, 1)),
            ((6.7, 3.3, 5.7, 2.1), (0, 0, 1)),
            ((7.2, 3.2, 6.0, 1.8), (0, 0, 1)),
            ((6.2, 2.8, 4.8, 1.8), (0, 0, 1)),
            ((6.1, 3.0, 4.9, 1.8), (0, 0, 1)),
            ((6.4, 2.8, 5.6, 2.1), (0, 0, 1)),
            ((7.2, 3.0, 5.8, 1.6), (0, 0, 1)),
            ((7.4, 2.8, 6.1, 1.9), (0, 0, 1)),
            ((7.9, 3.8, 6.4, 2.0), (0, 0, 1)),
            ((6.4, 2.8, 5.6, 2.2), (0, 0, 1)),
            ((6.3, 2.8, 5.1, 1.5), (0, 0, 1)),
            ((6.1, 2.6, 5.6, 1.4), (0, 0, 1)),
            ((7.7, 3.0, 6.1, 2.3), (0, 0, 1)),
            ((6.3, 3.4, 5.6, 2.4), (0, 0, 1)),
            ((6.4, 3.1, 5.5, 1.8), (0, 0, 1)),
            ((6.0, 3.0, 4.8, 1.8), (0, 0, 1)),
            ((6.9, 3.1, 5.4, 2.1), (0, 0, 1)),
            ((6.7, 3.1, 5.6, 2.4), (0, 0, 1)),
            ((6.9, 3.1, 5.1, 2.3), (0, 0, 1)),
            ((5.8, 2.7, 5.1, 1.9), (0, 0, 1)),
            ((6.8, 3.2, 5.9, 2.3), (0, 0, 1)),
            ((6.7, 3.3, 5.7, 2.5), (0, 0, 1)),
            ((6.7, 3.0, 5.2, 2.3), (0, 0, 1)),
            ((6.3, 2.5, 5.0, 1.9), (0, 0, 1)),
            ((6.5, 3.0, 5.2, 2.0), (0, 0, 1)),
            ((6.2, 3.4, 5.4, 2.3), (0, 0, 1)),
            ((5.9, 3.0, 5.1, 1.8), (0, 0, 1))]
    # Stack the features and one-hot labels into NumPy arrays
    feature = np.vstack([np.array(i[0]) for i in iris])
    label = np.vstack([np.array(i[1]) for i in iris])

    # Use only petal length and petal width (columns 2 and 3)
    x = feature[:, 2:4]
    y = label
    d = x.shape[1]

    def tf_kron_prod(a, b):
        # Row-wise Kronecker (outer) product of two matrices, flattened per row
        res = tf.einsum('ij,ik->ijk', a, b)
        res = tf.reshape(res, [-1, tf.reduce_prod(res.shape[1:])])
        return res


    def tf_bin(x, cut_points, temperature=0.1):
        # Soft binning: assign each value of x to one of D+1 intervals defined by the
        # D cut points, using a softmax so the assignment is differentiable.
        # x is an N-by-1 column; cut_points is a D-dimensional vector.
        D = cut_points.get_shape().as_list()[0]
        W = tf.reshape(tf.linspace(1.0, D + 1.0, D + 1), [1, -1])
        cut_points = tf.contrib.framework.sort(cut_points)  # keep cut points sorted
        b = tf.cumsum(tf.concat([tf.constant(0.0, shape=[1]), -cut_points], 0))
        h = tf.matmul(x, W) + b
        res = tf.nn.softmax(h / temperature)
        return res


    def nn_decision_tree(x, cut_points_list, leaf_score, temperature=0.1):
        # Soft (differentiable) decision tree: bin each feature, combine the bins
        # into soft leaf memberships, then score each leaf
        leaf = reduce(tf_kron_prod,
                      map(lambda z: tf_bin(x[:, z[0]:z[0] + 1], z[1], temperature), enumerate(cut_points_list)))
        return tf.matmul(leaf, leaf_score)

    num_cut = [1, 1]          # one cut point per feature
    num_leaf = np.prod(np.array(num_cut) + 1)
    num_class = 3

    # Build the graph: placeholders, trainable cut points and leaf scores, loss and optimizer
    sess = tf.InteractiveSession()
    x_ph = tf.placeholder(tf.float32, [None, d])
    y_ph = tf.placeholder(tf.float32, [None, num_class])
    cut_points_list = [tf.Variable(tf.random_uniform([i])) for i in num_cut]
    leaf_score = tf.Variable(tf.random_uniform([num_leaf, num_class]))
    y_pred = nn_decision_tree(x_ph, cut_points_list, leaf_score, temperature=0.1)
    loss = tf.reduce_mean(tf.losses.softmax_cross_entropy(logits=y_pred, onehot_labels=y_ph))
    opt = tf.train.AdamOptimizer(0.1)
    train_step = opt.minimize(loss)
    sess.run(tf.global_variables_initializer())

    # Train for 1000 steps, printing the loss every 200 steps
    for i in range(1000):
        _, loss_e = sess.run([train_step, loss], feed_dict={x_ph: x, y_ph: y})
        if i % 200 == 0:
            print(loss_e)
    print('error rate %.2f' % (1 - np.mean(np.argmax(y_pred.eval(feed_dict={x_ph: x}), axis=1) == np.argmax(y, axis=1))))

    # Plot the training points and the learned decision regions on a grid
    sample_x0 = np.repeat(np.linspace(0, np.max(x[:,0]), 100), 100).reshape(-1,1)
    sample_x1 = np.tile(np.linspace(0, np.max(x[:,1]), 100).reshape(-1,1), [100,1])
    sample_x = np.hstack([sample_x0, sample_x1])
    sample_label = np.argmax(y_pred.eval(feed_dict={x_ph: sample_x}), axis=1)
    plt.figure(figsize=(8,8))
    plt.scatter(x[:,0],
                x[:,1],
                c=np.argmax(y, axis=1),
                marker='o',
                s=50,
                cmap='summer',
                edgecolors='black')
    plt.scatter(sample_x0.flatten(),
                sample_x1.flatten(),
                c=sample_label.flatten(),
                marker='D',
                s=20,
                cmap='summer',
                edgecolors='none',
                alpha=0.33)
The output that I got is
1.1529802
0.32481167
0.1562062
0.15097024
0.118773825
error rate 0.04
Conclusion
In this article, we studied what a decision tree is, when it is used, its assumptions, key terms, how to build one, its advantages and disadvantages, the types of decision trees, why decision trees are important, regression trees vs classification trees, entropy, information gain, and Gini index calculations, a decision tree example, and Python implementations of a decision tree using NumPy, sklearn, and TensorFlow. I hope you were able to understand everything. If you have any doubts, please comment with your query.
In the next article, we will learn about Naive Bayes.
Congratulations!!! You have climbed your next step in becoming a successful ML Engineer.