Managing Multiple Models

Session Overview

  • Multiple Models
    • Management
    • Assessment
  • Ensembling
  • AutoML

Jumping In!…

Use the Module 4 Repo

  • Open Cloud9
  • Clone the Module 4 Repo
  • Navigate to 01-Multiple-Models
  • Run run-jupyter.sh

Python

Why Manage Multiple Models?

  • Diversity of predictions: Different models capture different patterns in data.
  • Robustness: A combination of models can be more robust to overfitting. (Or you can REALLY overfit your data…)
  • Performance: Sometimes individual models are weak, but together they are strong.

Model Management

Let the computer do it for you!…


  • Use scikit-learn Pipelines and swap out models.
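
A minimal sketch of the swap-out idea, assuming scikit-learn is installed. The synthetic dataset, the step names ("scale", "model"), and the two candidate models are illustrative choices, not part of the Module 4 repo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the workshop dataset (an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical candidates; any scikit-learn classifier could be listed here.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# One pipeline: preprocessing stays fixed, only the final step changes.
pipe = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression())])

cv_scores = {}
for name, estimator in candidates.items():
    # set_params replaces the "model" step with the next candidate.
    pipe.set_params(model=estimator)
    cv_scores[name] = cross_val_score(pipe, X, y, cv=5).mean()

for name, score in cv_scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Keeping preprocessing inside the pipeline means every candidate is scored on identically prepared folds, so the comparison stays fair.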

Model Assessment

What should we look for?


  • In-sample (cross-validated) and out-of-sample performance
  • Where does the model perform well (and where does it fail)?
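
Both checks can be sketched in a few lines, assuming scikit-learn and a synthetic binary classification dataset (the data, split size, and model are placeholders). Cross-validation on the training split gives the in-sample estimate, the held-out split gives the out-of-sample one, and a confusion matrix is one simple way to see where the model fails.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data stands in for a real dataset (an assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)

# In-sample estimate: 5-fold cross-validation on the training split only.
cv_accuracy = cross_val_score(model, X_train, y_train, cv=5).mean()

# Out-of-sample estimate: fit on the training split, score on held-out data.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(f"CV accuracy:   {cv_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
print(cm)
```

A large gap between the two accuracy figures is a hint of overfitting; the off-diagonal cells of the matrix show which class is being confused with which.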

Model Ensembling

Definition: Combining predictions from multiple models.


Methods:

  • Bagging
  • Boosting
  • Stacking

Choosing the Right Method

Bagging: (Bootstrap Aggregating)

  • Multiple bootstrap samples are drawn from the data with replacement, and a model is trained on each. Predictions are combined by averaging (regression) or majority vote (classification).
  • Reduces variance, helps avoid overfitting.
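
A minimal bagging sketch, assuming scikit-learn and a synthetic dataset. It compares a single decision tree with 50 bagged trees; the dataset and the number of trees are arbitrary illustrative choices. Note that scikit-learn's BaggingClassifier averages class probabilities when the base model provides them, which is a soft variant of majority voting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (an assumption).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)

# A fully grown tree: a strong learner that tends to overfit.
single_tree = DecisionTreeClassifier(random_state=0)

# 50 trees, each trained on its own bootstrap sample of the data.
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0
)

tree_score = cross_val_score(single_tree, X, y, cv=5).mean()
bagged_score = cross_val_score(bagged_trees, X, y, cv=5).mean()

print(f"single tree:  {tree_score:.3f}")
print(f"bagged trees: {bagged_score:.3f}")
```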


Boosting:

  • Models are trained sequentially, each focusing on the errors of the previous ones. Their predictions are combined through weighted voting.
  • Reduces bias and improves predictive flexibility.
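
A minimal boosting sketch, assuming scikit-learn. AdaBoost is used here as one example of a boosting algorithm; the slides do not name a specific one. The weak learner is a depth-1 decision tree (a "stump"), and the dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (an assumption).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)

# A single stump: a weak, high-bias learner.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)

# AdaBoost fits stumps one after another, reweighting the training
# points that earlier stumps misclassified, then takes a weighted vote.
boosted_stumps = AdaBoostClassifier(stump, n_estimators=100, random_state=0)

stump_score = cross_val_score(stump, X, y, cv=5).mean()
boosted_score = cross_val_score(boosted_stumps, X, y, cv=5).mean()

print(f"single stump:   {stump_score:.3f}")
print(f"boosted stumps: {boosted_score:.3f}")
```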


Stacking:

  • Exploit strengths of each (different) model.
  • First-layer models make predictions, which serve as inputs to a second-layer (final) model that makes the ultimate prediction.
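
A minimal stacking sketch, assuming scikit-learn's StackingClassifier. The two first-layer models (a random forest and k-nearest neighbors) and the logistic regression final model are illustrative choices, as is the synthetic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data stands in for a real dataset (an assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

stack = StackingClassifier(
    # First layer: deliberately different kinds of models.
    estimators=[
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    # Second layer: learns how to combine the first-layer predictions.
    final_estimator=LogisticRegression(max_iter=1000),
    # Out-of-fold predictions are used to train the final model.
    cv=5,
)

stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)
print(f"stacked accuracy: {stack_accuracy:.3f}")
```

Training the final model on out-of-fold predictions (the cv argument) keeps it from simply trusting whichever first-layer model memorized the training data best.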

Choosing the Right Method


  • Bagging works best with strong (low-bias, high-variance) learners that are prone to overfitting.
  • Boosting is preferred when you have weak learners and the main problem is bias (underfitting).
  • Stacking should be considered when diversity among the models used is expected to improve prediction accuracy.

AutoML


  • Simplifies model selection and hyperparameter tuning.
  • Automatically handles preprocessing (and feature engineering).
  • Tries many models.
  • Optimizes model ensembles.
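
The sketch below is not an AutoML library. It is a small hand-rolled illustration, assuming scikit-learn, of the core loop such tools automate: searching over several model types and their hyperparameters with cross-validation. The candidate models and grids are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", LogisticRegression())])

# Each dict is one model family with its own hyperparameter grid.
param_grid = [
    {
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0, 10.0],
    },
    {
        "model": [RandomForestClassifier(random_state=0)],
        "model__n_estimators": [50, 100],
    },
]

# Try every combination and keep the one with the best CV score.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print("best model:", type(search.best_params_["model"]).__name__)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

Real AutoML tools add smarter search strategies, automated preprocessing choices, and ensembling of the best candidates on top of this basic pattern.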