Introduction
Tree-based machine learning methods are among the most commonly used supervised learning methods. A tree is constructed from two components: branches and nodes. The model is built by recursively splitting a training sample, selecting at each node the feature that splits the data most effectively, using simple decision rules inferred from the training data.
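The recursive splitting described above can be sketched with scikit-learn; the Iris dataset and the depth limit here are illustrative choices, not part of any particular application:

```python
# A minimal sketch of recursive splitting with scikit-learn's
# DecisionTreeClassifier; the Iris dataset is an illustrative choice.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# Limit the depth so the learned decision rules stay readable.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Each indented line below is one learned if-else rule at a node.
print(export_text(clf, feature_names=iris.feature_names))
```

The printed rules make the "simple decision rules" concrete: every internal node is a threshold test on a single feature, and each path from root to leaf is one chain of such tests.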
Common Terminology

i) Root node — this represents the entire population or sample, which gets divided into two or more homogeneous subsets.
ii) Splitting — subdividing a node into two or more sub-nodes.
iii) Decision node — a sub-node that is itself divided into further sub-nodes.
iv) Leaf/Terminal node — this is the final/last node that we consider for our model output. It cannot be split further.
v) Pruning — removing unnecessary sub-nodes of a decision node to combat overfitting.
vi) Branch/Sub-tree — the sub-section of the entire tree.
vii) Parent and Child node — a node that’s subdivided into a sub-node is a parent, while the sub-node is the child node.
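Pruning (term v above) can be demonstrated with scikit-learn's cost-complexity pruning; this is a hedged sketch, and the `ccp_alpha` value and dataset are illustrative assumptions:

```python
# A sketch of pruning via scikit-learn's minimal cost-complexity pruning;
# the ccp_alpha value (0.01) is an illustrative choice.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# An unpruned tree grows until its leaves are pure (prone to overfitting).
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# A nonzero ccp_alpha removes sub-nodes whose contribution does not
# justify their complexity, leaving a smaller tree.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X, y)

print("nodes before pruning:", full.tree_.node_count)
print("nodes after pruning: ", pruned.tree_.node_count)
```

Larger `ccp_alpha` values prune more aggressively; the trade-off between tree size and training accuracy is exactly the overfitting control the terminology list describes.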
Types of tree-based models
Tree-based machine learning models are a category of algorithms that make decisions by recursively partitioning the input space into regions. Some common types of tree-based models include:
- Decision Trees:
  - Overview: Decision trees are a fundamental type of tree-based model that makes decisions based on a series of if-else conditions. Each internal node represents a decision based on a feature, and each leaf node represents the output.
  - Applications: Decision trees are versatile and can be used for both classification and regression tasks.
- Random Forest:
  - Overview: Random Forest is an ensemble learning method that constructs a multitude of decision trees during training and outputs the average prediction (for regression tasks) or the majority vote (for classification tasks) of the individual trees.
  - Applications: Random Forest is effective in reducing overfitting and improving accuracy.
- Gradient Boosting Machines (GBM):
  - Overview: GBM is another ensemble method that builds trees sequentially, with each tree compensating for the errors of the previous ones. It combines weak learners to create a strong predictive model.
  - Applications: GBM is widely used for both regression and classification tasks and is known for its high predictive power.
- XGBoost (Extreme Gradient Boosting):
  - Overview: XGBoost is an optimized and efficient implementation of gradient boosting. It incorporates regularization techniques, parallel processing, and tree pruning to enhance performance.
  - Applications: XGBoost is commonly used in various machine learning competitions and real-world applications due to its speed and accuracy.
- LightGBM:
  - Overview: LightGBM is a gradient boosting framework that uses a tree-based learning algorithm. It is designed for distributed and efficient training and can handle large datasets.
  - Applications: LightGBM is suitable for large-scale machine learning tasks and is particularly efficient in scenarios with high dimensionality.
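Two of the ensembles above can be compared side by side with scikit-learn; this is a hedged sketch in which the dataset, split, and hyperparameters are illustrative assumptions:

```python
# A sketch comparing a bagging ensemble (Random Forest) with a boosting
# ensemble (GBM) on one dataset; dataset and settings are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # Bagging: many trees trained independently, majority vote at predict time.
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Boosting: trees built sequentially, each correcting its predecessors.
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

XGBoost and LightGBM ship scikit-learn-compatible estimators (`xgboost.XGBClassifier`, `lightgbm.LGBMClassifier`), so either can be dropped into the same loop once the corresponding package is installed.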


