ML Algorithms and Approaches

Session Overview

  • Supervised
    • Linear Models
    • Support Vector Machines
    • Nearest Neighbor
    • Naive Bayes
    • Trees - Lots of Trees!
  • Unsupervised
    • Clustering
    • Anomaly Detection

Supervised

  • General Approach to these algorithms
  • How they work
  • Focus on when and how to apply

Linear Models

  • Many Flavors
  • Can be used for both Classification and Regression
  • Relatively low complexity model
  • Can be made more complex by adding parameters
  • … and nonlinear terms (splines, etc.)

Flavors of Linear Models

OLS Linear Regression

  • \(y = mx + b\)
  • Fit by Ordinary Least Squares (OLS), which coincides with maximum likelihood estimation (MLE) under Gaussian errors
  • No hyperparameters
  • Baseline
  • Can add parameters
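
A minimal sketch of an OLS fit in Python with scikit-learn (the library choice and synthetic data are illustrative assumptions, not part of the deck):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: true relationship y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=100)

# OLS fit: no hyperparameters to tune
model = LinearRegression().fit(X, y)
print(f"m ~ {model.coef_[0]:.2f}, b ~ {model.intercept_:.2f}")
```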

Generalized Linear Model (GLM)

  • Expansion of basic linear regression
  • Ties the linear predictor to the response using a link function
  • Common uses: Logistic Regression, Poisson Regression
  • Many, many sub-flavors including Generalized Additive Models
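
For illustration, a sketch of logistic regression as a GLM using statsmodels (one reasonable library choice, assumed here; the Binomial family's canonical logit link ties the linear predictor to a probability):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic binary outcomes driven by one feature
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # true logistic relationship
y = rng.binomial(1, p)

# Binomial family + logit link = logistic regression
X = sm.add_constant(x)
result = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(result.params)  # coefficients on the log-odds scale
```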

Regularized Regression

  • Imposes penalties on the size and number of coefficients
  • Options: Ridge, Lasso, or both (ElasticNet)
  • The penalty adds a hyperparameter to tune, but guards against overfitting
  • Really great option for baseline regression…
  • glmnet in R, ElasticNet() in scikit-learn - see the sketch below
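
A minimal ElasticNet sketch with scikit-learn on synthetic data (the hyperparameter values are illustrative, not recommendations):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data: 2 informative features among 20
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=200)

# alpha scales the total penalty; l1_ratio mixes Lasso (1.0) and Ridge (0.0)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))
```

The Lasso component tends to shrink irrelevant coefficients to exactly zero, which is part of what makes this such a strong baseline.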

Support Vector Machines

  • Technically both Classification and Regression
  • But used mainly for Classification
  • Finds hyperplanes that separate the classes (Classification) or fit the data within error constraints (Regression)
  • By its nature -> binary classification, but extended to multiclass (e.g., one-vs-rest)
  • Few hyperparameters (penalty, kernel, maybe gamma)
  • Extremely intuitive and communicable
  • Fast, Easy, Beautiful
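
A sketch of an SVM classifier in scikit-learn showing the few hyperparameters named above (the dataset and values are illustrative; in practice, scale your features first):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Penalty C, kernel, and gamma: the main knobs to turn
clf = SVC(C=1.0, kernel="rbf", gamma="scale").fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```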

Nearest Neighbor

  • Welcome to the neighborhood!
  • Both Classification and Regression
  • Based on the idea that similar points cluster together in multi-dimensional space -> neighbors can predict outcomes
  • Important hyperparameter: number of neighbors
  • Important hyperparameter: weight/consensus function
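
A sketch with scikit-learn's KNeighborsClassifier, highlighting the two hyperparameters above (the dataset choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_neighbors = how many neighbors vote; weights = how their votes count
clf = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```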

Naive Bayes

  • Classification
  • Low Complexity
  • Many flavors
  • Assumes a generative model for each class, with the minimal ‘naive’ assumption that features are conditionally independent given the class
  • Applies Bayes’ theorem to estimate the probability of each class given the observed features
  • Gaussian, Multinomial, etc.
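
A minimal sketch of the Gaussian flavor in scikit-learn (the dataset is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Gaussian flavor: each feature gets a per-class normal distribution
clf = GaussianNB().fit(X, y)

# predict_proba returns the estimated P(class | features)
print(clf.predict_proba(X[:1]).round(3))
```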

Trees

  • Classification and Regression
  • Can handle staggering amounts of complexity
  • Really good with non-linear dynamics
  • Some of the highest performing algorithms for tabular data
  • Many, many flavors
  • Bagging and Boosting

Decision Trees

  • Classification and Regression Trees (CART)
  • Simple trees constructed using binary splitting
  • The most important, largest splits sit toward the top of the tree
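
A CART-style sketch in scikit-learn; printing the fitted rules shows the most informative split at the root (dataset and depth are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree built by greedy binary splitting
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))  # the root holds the single most useful split
```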

Forests of Trees

Ensembling

  • Combining multiple models
  • Improves prediction by combining different types of learners (different models)
  • Can occur at multiple levels
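
One way to sketch this in scikit-learn: a soft-voting ensemble over different model types (the component models here are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Combine different types of learners into one averaged prediction
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    voting="soft",
).fit(X, y)
print(f"training accuracy: {ensemble.score(X, y):.2f}")
```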

Bagging

  • Bootstrap Aggregating
  • Bootstrapping -> Sampling with replacement to estimate population distribution

  • Random Forests -> Bagged Trees with random feature selection

A large number of relatively uncorrelated models, combined, tends to outperform any individual model
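
A random forest sketch in scikit-learn; the out-of-bag score reuses each bootstrap's left-out samples as a built-in validation set (parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each tree sees a bootstrap sample (sampling with replacement) and a
# random subset of features at every split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                oob_score=True, random_state=0).fit(X, y)
print(f"out-of-bag accuracy: {forest.oob_score_:.2f}")
```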

Boosting

  • Sequential application of trees to the residuals of the previous models; combining the models built on those residuals yields a high-performing ensemble
  • Idea -> build upon the error and learn from each iteration
  • Many flavors: AdaBoost, Gradient Boosting, XGBoost, LGBM, CatBoost
  • Among the top-performing non-deep-learning algorithms
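
A gradient boosting sketch using scikit-learn's built-in implementation (XGBoost, LGBM, and CatBoost expose similar knobs; the values here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new shallow tree is fit to the residual errors of the ensemble so far
boost = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=3).fit(X_train, y_train)
print(f"test accuracy: {boost.score(X_test, y_test):.2f}")
```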

Considerations with Tree Based Models

  • Easy to overfit
  • Extremely good at handling non-linear complexity
  • Top-performing non-deep-learning algorithms
  • Almost always require hyperparameter tuning, if not full optimization
  • Can be explainable, but not intuitive to stakeholders

Unsupervised Clustering

K-Means

  • Groups similar points together around a common center
  • Assign initial centers, then iterate (reassign points, recompute centers) to find the ‘best’ centers
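
A minimal K-Means sketch in scikit-learn (the blob data is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Pick starting centers, then alternate assigning points and moving
# centers until the centers stop changing
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("final centers:\n", np.round(kmeans.cluster_centers_, 2))
```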

[Figure: K-Means clustering]

Hierarchical Clustering

  • Uses distance metrics to connect similar groups
  • Top-down (divisive) and bottom-up (agglomerative)
  • Many different Flavors
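
A bottom-up (agglomerative) sketch in scikit-learn; the linkage criterion is one of the "flavors" mentioned above (the data is illustrative):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

# Repeatedly merge the two closest groups, with "closest" defined
# by the linkage criterion (ward, average, complete, single)
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```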

Anomaly Detection

Which points are different from the others?

Isolation Forest

  • Generates an anomaly score from the position of points in randomly built trees: the shorter the path needed to isolate a point, the more anomalous it is
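
A sketch with scikit-learn's IsolationForest on synthetic data with a few planted outliers (an illustrative setup):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly normal points plus a handful of far-off anomalies
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(8, 1, size=(5, 2))])

# Points isolated by short paths in the random trees score as anomalous
iso = IsolationForest(random_state=0).fit(X)
scores = iso.score_samples(X)  # lower score = more anomalous
print("most anomalous indices:", np.argsort(scores)[:5])
```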