Ensemble Method: Bagging

Overview

Bagging = train the same model on different subsets of the data, then average the results
Typical representatives:
  • BaggingClassifier
  • Random Forest ⭐⭐⭐⭐⭐
The big idea of bagging is similar to a voting classifier, but instead of using a set of different classifiers, we use the same classifier trained on different subsets of our training data.
Now the question is: how does each classifier get a different subset of the training data?
This is normally achieved via random sampling, i.e., we randomly draw a specific number of data points from the training set as samples, and then use those samples to train the classifier.
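The idea above can be sketched with scikit-learn's BaggingClassifier on a synthetic dataset (the dataset and hyperparameters here are illustrative assumptions, not from the notes):

```python
# Minimal sketch: the same base classifier (a decision tree) trained on
# random subsets of the training data, using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each trained on a random bootstrap sample
bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=1.0,  # each sample is the same size as the training set
    bootstrap=True,   # sample with replacement
    random_state=0,
)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))
```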

How Does Bagging Work?

Step 1: Bootstrap Sampling

  • Sample from the original dataset
  • with replacement
  • to obtain multiple different training sets
📌 Each subset:
  • has the same size
  • but a different composition of samples
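Bootstrap sampling itself is just drawing n indices with replacement; a quick NumPy sketch:

```python
# Sketch of bootstrap sampling: draw len(data) points with replacement.
# The resulting subset has the same size but a different composition:
# some points repeat, others are left out entirely.
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # stand-in for a training set of 10 instances

bootstrap = rng.choice(data, size=len(data), replace=True)
print(bootstrap)             # same size as the original
print(np.unique(bootstrap))  # typically fewer distinct points than the original
```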

Step 2: Train Multiple Models of the Same Type

  • Usually Decision Trees
  • Each tree is trained on one bootstrap dataset

Step 3: Aggregation

  • Classification: Majority Vote
  • Regression: Average
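The aggregation step can be sketched in pure NumPy; the prediction arrays below are made-up values just to show the mechanics:

```python
# Sketch of the aggregation step: majority vote for classification,
# mean for regression. The predictions are hypothetical.
import numpy as np

# Predictions of 5 classifiers (rows) for 4 samples (columns)
clf_preds = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
])
# Majority vote per column (per sample)
votes = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, clf_preds)
print(votes)  # -> [0 1 1 0]

# Regression: simply average the predictions of all models
reg_preds = np.array([[2.0, 3.0], [2.5, 2.5], [1.5, 3.5]])
print(reg_preds.mean(axis=0))  # -> [2. 3.]
```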

What Problem Does Bagging Mainly Solve?

🎯 Core goal: reduce variance

  • Especially suited for:
    • Decision Trees (high-variance models)
  • Prevents:
    • deep trees from overfitting
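The variance reduction can be seen empirically by comparing a single unpruned tree against a bagged ensemble of the same trees; the noisy synthetic dataset below is an assumption for illustration, and bagging typically (though not always) scores higher:

```python
# Sketch: cross-validated accuracy of one deep tree vs. a bagged
# ensemble of the same trees on noisy synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20,
                           flip_y=0.1, random_state=1)

tree = DecisionTreeClassifier(random_state=1)  # fully grown: high variance
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        random_state=1)

tree_acc = np.mean(cross_val_score(tree, X, y, cv=5))
bag_acc = np.mean(cross_val_score(bag, X, y, cv=5))
print(tree_acc, bag_acc)
```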

Random Forest Is an Upgraded Version of Bagging (Key Point)

Random Forest = Bagging + Feature Subsampling
In addition to bootstrapping the data:
  • each split considers only a subset of the features
  • which further decorrelates the trees
👉 variance drops even more
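In scikit-learn this corresponds to RandomForestClassifier, where `max_features` controls the feature subsampling at each split (the data and settings below are illustrative assumptions):

```python
# Sketch: Random Forest = bagging + a random feature subset per split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # each split looks at only sqrt(n_features) features
    bootstrap=True,       # rows are still bootstrap-sampled, as in plain bagging
    random_state=0,
)
rf.fit(X, y)
print(rf.score(X, y))
```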

Summary of Pros and Cons

✅ Pros

  • Very effective against overfitting
  • Stable on noisy data
  • Not very sensitive to hyperparameter tuning

❌ Cons

  • Bias is not noticeably reduced
  • Poor interpretability