Tree Based Algorithms

 

Tree based algorithms are considered to be one of the best and mostly used supervised learning methods. Tree based algorithms empower predictive models with high accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear relationships quite well. They are adaptable at solving any kind of problem at hand . 

 

Methods like decision trees, random forest, gradient boosting are being popularly used in all kinds of data science problems. Hence, for every analyst , it’s important to learn these algorithms and use them for modeling.

Tree-based classification models are a type of supervised machine learning algorithm that uses a series of conditional statements to partition training data into subsets. Each successive split adds some complexity to the model, which can be used to make predictions. The end result model can be visualized as a roadmap of logical tests that describes the data set.


 The tree-based model can be drawn like below. Starting from the top node, it divides into 2 branches at every depth level. 

The last end branches where they do not split anymore are the decisions, usually called the leaves. In every depth, there are conditions questioning the feature values. 

The binary answer will decide which branch we are going to next. This process continues until we reach one of the leaves, where it does not split anymore. We can get the prediction from that final leaf.

How Decision Tree is developed ?

Here is an illustration of how the Decision Tree algorithm works in segmenting a set of data points into 2 classes: “sold out” and “not sold out”. 

First, the algorithm will divide the data into two parts using a horizontal or vertical line. In this case, the first step is done by splitting the x-axis using a vertical line separating the price above and below $600. Next, the algorithm splits the y-axis into the left and right sides. We can see that for the price above $600, the products will be sold if the quality is above 60 and not sold if it is below 60. If the price is below $600, the algorithm needs further segmentation. 

Can you continue the process by observing the following figure? This illustration only accounts for 1 feature and 1 target variable. It will get more complicated with multiple features.

 

 

 


 


Enregistrer un commentaire

Plus récente Plus ancienne