ML Algorithms and Approaches

Session Overview

  • Supervised
    • Linear Models
    • Support Vector Machines
    • Nearest Neighbor
    • Naive Bayes
    • Trees - Lots of Trees!
  • Unsupervised
    • Clustering
    • Anomaly Detection

Supervised

  • General Approach to these algorithms
  • How they work
  • Focus on when and how to apply

Linear Models

  • Many Flavors
  • Can be used for both Classification and Regression
  • Relatively low complexity model
  • Can be made more complex by adding parameters
  • … and nonlinear terms (splines, etc.)

Flavors of Linear Models

OLS Linear Regression

  • \(y = mx + b\)
  • Fit by Ordinary Least Squares (OLS), which coincides with maximum likelihood estimation (MLE) under Gaussian errors
  • No hyperparameters
  • Baseline
  • Can add parameters
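
A minimal sketch of an OLS fit in Python with scikit-learn (the library choice and synthetic data are illustrative assumptions, not part of the deck):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: true relationship y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=100)

# OLS fit: no hyperparameters to tune
model = LinearRegression().fit(X, y)
print(f"m ~ {model.coef_[0]:.2f}, b ~ {model.intercept_:.2f}")
```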

Generalized Linear Model (GLM)

  • Expansion of basic linear regression
  • Ties the linear predictor to the response using a link function
  • Common uses: Logistic Regression, Poisson Regression
  • Many, many sub-flavors including Generalized Additive Models
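
For illustration, a sketch of logistic regression as a GLM using statsmodels (one reasonable library choice, assumed here; the Binomial family's canonical logit link ties the linear predictor to a probability):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic binary outcomes driven by one feature
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # true logistic relationship
y = rng.binomial(1, p)

# Binomial family + logit link = logistic regression
X = sm.add_constant(x)
result = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(result.params)  # coefficients on the log-odds scale
```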

Regularized Regression

  • Imposes penalties on the size and number of coefficients
  • Options: Ridge, Lasso, or both (ElasticNet)
  • The penalty adds a hyperparameter to tune, but guards against overfitting
  • Really great option for baseline regression…
  • glmnet in R, ElasticNet() in scikit-learn - see the sketch below
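
A minimal ElasticNet sketch with scikit-learn on synthetic data (the hyperparameter values are illustrative, not recommendations):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data: 2 informative features among 20
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=200)

# alpha scales the total penalty; l1_ratio mixes Lasso (1.0) and Ridge (0.0)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))
```

The Lasso component tends to shrink irrelevant coefficients to exactly zero, which is part of what makes this such a strong baseline.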

Support Vector Machines

  • Technically both Classification and Regression
  • But used mainly for Classification
  • Finds hyperplanes that separate the classes (Classification) or fit the data within error constraints (Regression)
  • By its nature -> binary classification, but extended to multiclass (e.g., one-vs-rest)
  • Few hyperparameters (penalty, kernel, maybe gamma)
  • Extremely intuitive and communicable
  • Fast, Easy, Beautiful
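
A sketch of an SVM classifier in scikit-learn showing the few hyperparameters named above (the dataset and values are illustrative; in practice, scale your features first):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Penalty C, kernel, and gamma: the main knobs to turn
clf = SVC(C=1.0, kernel="rbf", gamma="scale").fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```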

Nearest Neighbor

  • Welcome to the neighborhood!
  • Both Classification and Regression
  • Based on the idea that similar points cluster together in multi-dimensional space -> neighbors can predict outcomes
  • Important hyperparameter: number of neighbors
  • Important hyperparameter: weight/consensus function
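
A sketch with scikit-learn's KNeighborsClassifier, highlighting the two hyperparameters above (the dataset choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_neighbors = how many neighbors vote; weights = how their votes count
clf = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```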

Naive Bayes

  • Classification
  • Low Complexity
  • Many flavors
  • Assumes a generative model for each class, with the minimal ‘naive’ assumption that features are conditionally independent given the class
  • Applies Bayes’ theorem to estimate the probability of each class given the observed features
  • Gaussian, Multinomial, etc.
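
A minimal sketch of the Gaussian flavor in scikit-learn (the dataset is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Gaussian flavor: each feature gets a per-class normal distribution
clf = GaussianNB().fit(X, y)

# predict_proba returns the estimated P(class | features)
print(clf.predict_proba(X[:1]).round(3))
```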

Trees

  • Classification and Regression
  • Can handle staggering amounts of complexity
  • Really good with non-linear dynamics
  • Some of the highest performing algorithms for tabular data
  • Many, many flavors
  • Bagging and Boosting

Decision Trees

  • Classification and Regression Trees (CART)
  • Simple trees constructed using binary splitting
  • The most important, largest splits sit toward the top of the tree
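
A CART-style sketch in scikit-learn; printing the fitted rules shows the most informative split at the root (dataset and depth are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree built by greedy binary splitting
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))  # the root holds the single most useful split
```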

Forests of Trees

Ensembling

  • Combining multiple models
  • Improves prediction by combining different types of learners (different models)
  • Can occur at multiple levels
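
One way to sketch this in scikit-learn: a soft-voting ensemble over different model types (the component models here are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Combine different types of learners into one averaged prediction
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    voting="soft",
).fit(X, y)
print(f"training accuracy: {ensemble.score(X, y):.2f}")
```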

Bagging

  • Bootstrap Aggregating
  • Bootstrapping -> Sampling with replacement to estimate population distribution

  • Random Forests -> Bagged Trees with random feature selection

A large number of relatively uncorrelated models, combined, tends to outperform any individual model
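
A random forest sketch in scikit-learn; the out-of-bag score reuses each bootstrap's left-out samples as a built-in validation set (parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each tree sees a bootstrap sample (sampling with replacement) and a
# random subset of features at every split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                oob_score=True, random_state=0).fit(X, y)
print(f"out-of-bag accuracy: {forest.oob_score_:.2f}")
```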

Boosting

  • Sequential application of trees to the residuals of the previous models; combining the models built on those residuals yields a high-performing ensemble
  • Idea -> build upon the error and learn from each iteration
  • Many flavors: AdaBoost, Gradient Boosting, XGBoost, LGBM, CatBoost
  • Among the top-performing non-deep-learning algorithms
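
A gradient boosting sketch using scikit-learn's built-in implementation (XGBoost, LGBM, and CatBoost expose similar knobs; the values here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new shallow tree is fit to the residual errors of the ensemble so far
boost = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=3).fit(X_train, y_train)
print(f"test accuracy: {boost.score(X_test, y_test):.2f}")
```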

Considerations with Tree Based Models

  • Easy to overfit
  • Extremely good at handling non-linear complexity
  • Top-performing non-deep-learning algorithms
  • Almost always require hyperparameter tuning, if not full optimization
  • Can be explainable, but not intuitive to stakeholders

Unsupervised Clustering

K-Means

  • Groups similar points together around a common center
  • Assign initial centers, then iterate (reassign points, recompute centers) to find the ‘best’ centers
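
A minimal K-Means sketch in scikit-learn (the blob data is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Pick starting centers, then alternate assigning points and moving
# centers until the centers stop changing
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("final centers:\n", np.round(kmeans.cluster_centers_, 2))
```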

[Figure: K-Means clustering]

Hierarchical Clustering

  • Uses distance metrics to connect similar groups
  • Top-down (divisive) and bottom-up (agglomerative)
  • Many different Flavors
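
A bottom-up (agglomerative) sketch in scikit-learn; the linkage criterion is one of the "flavors" mentioned above (the data is illustrative):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

# Repeatedly merge the two closest groups, with "closest" defined
# by the linkage criterion (ward, average, complete, single)
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```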

Anomaly Detection

Which points are different from the others?

Isolation Forest

  • Generates an anomaly score from the position of points in randomly built trees: the shorter the path needed to isolate a point, the more anomalous it is
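
A sketch with scikit-learn's IsolationForest on synthetic data with a few planted outliers (an illustrative setup):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly normal points plus a handful of far-off anomalies
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(8, 1, size=(5, 2))])

# Points isolated by short paths in the random trees score as anomalous
iso = IsolationForest(random_state=0).fit(X)
scores = iso.score_samples(X)  # lower score = more anomalous
print("most anomalous indices:", np.argsort(scores)[:5])
```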