
# Decision Tree Explained

## Overview

A decision tree is a graphical representation of the possible solutions to a decision under given conditions. It is called a decision tree because it starts from a single node, which then branches out into further decisions and outcomes, just like a tree.

A tree has three main parts:

- Root Node: the first decision point, at the top of the tree
- Internal Node: a node that splits further, either into other internal nodes or into leaves
- Leaf Node: the final decision point, where the rules cannot be broken down any further
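
As a rough illustration of how these parts map onto a data structure, here is a minimal sketch (the `Node` class and its field names are hypothetical, not from the text above): a node is a leaf exactly when it has no children, and the root is simply the topmost node.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A decision-tree node; the root is the topmost Node in the tree."""
    feature: Optional[int] = None      # index of the feature tested here (None at a leaf)
    threshold: Optional[float] = None  # split threshold (None at a leaf)
    left: Optional["Node"] = None      # subtree where feature value <= threshold
    right: Optional["Node"] = None     # subtree where feature value > threshold
    prediction: Optional[str] = None   # class label stored at a leaf

    def is_leaf(self) -> bool:
        # A leaf is a node with no children.
        return self.left is None and self.right is None
```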

## Gini Impurity

The Gini index, or Gini impurity, measures the probability of a randomly chosen element being wrongly classified if it were labeled at random according to the class distribution at a node. But what is actually meant by 'impurity'? If all the elements belong to a single class, the node is called pure. The Gini index varies between 0 and 1: 0 means all elements belong to a single class (or only one class exists), while values approach 1 when elements are spread uniformly across many classes. For a binary problem, the maximum impurity is 0.5, reached when the two classes are equally represented.

Formula for Gini Index:

$$Gini = 1 - \sum_{i=1}^{k} p_i^2$$

where $p_i$ is the proportion of elements belonging to class $i$ and $k$ is the number of classes.
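
To make the formula concrete, here is a minimal Python sketch (the function name `gini_impurity` and the use of `collections.Counter` are my own choices, not from the text above):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node has impurity 0; a 50/50 binary node has the binary maximum of 0.5.
print(gini_impurity(["a", "a", "a", "a"]))  # 0.0
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5
```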

## Steps Followed by a Decision Tree for Classification

Step 1: Calculate the Gini index for every variable (feature) in the dataset.

Step 2: Make the first split on the variable with the minimum Gini index.

Step 3: For every resulting node, calculate the Gini index again, both for each candidate variable and for the node itself.

Step 4: If the node's own Gini score is the lowest, there is no need to split further and the node becomes a leaf; otherwise, split on the variable with the minimum impurity value.

Step 5: The process continues until a stopping criterion is reached, such as the minimum number of samples per leaf or the maximum depth. A sketch of this loop follows below.
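
Putting Steps 1-4 together, here is a hedged sketch of the greedy split search for one node. It reuses the `gini_impurity` helper from above; the weighted-average scoring of child nodes and the midpoint thresholds are standard choices, not details given in the text.

```python
def best_split(rows, labels, feature_indices):
    """Return (feature, threshold, score) for the lowest weighted Gini split.

    Step 4's rule is built in: the node's own impurity is the baseline,
    so feature stays None if no split improves on it (make a leaf).
    """
    best = (None, None, gini_impurity(labels))
    n = len(labels)
    for f in feature_indices:
        values = sorted(set(row[f] for row in rows))
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            # Weighted average of the child impurities (Steps 1 and 3).
            score = (len(left) / n) * gini_impurity(left) \
                  + (len(right) / n) * gini_impurity(right)
            if score < best[2]:
                best = (f, t, score)
    return best

# Usage: one perfectly separable feature yields an impurity-0 split.
rows = [(2.0,), (3.0,), (10.0,), (11.0,)]
labels = ["a", "a", "b", "b"]
print(best_split(rows, labels, feature_indices=[0]))  # (0, 6.5, 0.0)
```

In practice the recursion stops when `best_split` returns no feature or when a Step 5 criterion is hit; libraries such as scikit-learn expose these criteria as the `min_samples_leaf` and `max_depth` parameters of `DecisionTreeClassifier`.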