Machine Learning Metrics

Session Overview

What they mean and how to interpret/implement them

  • Why Metrics?
  • Bias-Variance Tradeoff
  • Classification Metrics
  • Regression Metrics

Why Metrics

Begin with the end in mind…


Objective Measures:

  • How do we know we are successful?
  • How do we communicate our success?
  • How do we interpret our success?

Model Prediction Error

We want our models to be:

  • Generalizable (work with previous AND new data)
  • Low error

Bias - Variance Tradeoff

Bias: Difference between model prediction and correct value.

Variance: Variability of a prediction at a given point (how much the prediction changes when the model is trained on different data).

Bias - Variance Tradeoff | Example

Research Objective: Measure soil temperature using remote sensing observations.

Method: Use 10% of cloudless satellite photos, calculate the mean ground temperature from a single wavelength band.


Our Error

Bias-Variance Error

Finding the Sweet Spot

Study Design:

Bias:

  • Remove systematic bias from your sampling

Variance:

  • Increase sample size to reduce variance

Tradeoff?

Algorithms:

  • Less complex models (Linear, Parametric) tend to have higher bias, but lower variance

  • More complex models (Trees, Deep Learning) tend to have lower bias, but higher variance
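
A minimal sketch (not from the slides) of this tradeoff using scikit-learn: a linear model underfits a nonlinear signal (high bias), while an unpruned decision tree fits the training data almost perfectly but generalizes worse (high variance). The synthetic dataset and model choices below are purely illustrative.

# Illustrative sketch: a simple vs. a complex model on the same noisy data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy nonlinear target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # Linear model: similar (but high) train and test error -> bias-dominated.
    # Unpruned tree: near-zero train error, larger test error -> variance-dominated.
    print(type(model).__name__, f"train MSE = {train_err:.3f}", f"test MSE = {test_err:.3f}")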


Training:

Metrics


Finding the sweet spot requires metrics!

Session Overview

What they mean and how to interpret/implement them

  • Why Metrics?
  • Bias-Variance Tradeoff
  • Classification Metrics
  • Regression Metrics

Classification and Regression

Supervised Learning!

Classification:

  • Performance is measured by how many labels it gets right (or wrong).
  • Performance metrics: Accuracy, Precision, Recall

Regression:

  • Performance is measured by how close it comes to the correct value
  • Performance metrics: Mean Absolute Error, Root-Mean-Square-Error

Regression

Root Mean Square Error (RMSE):

  • Procedure: square all the errors, average them, then take the square root.
  • Weights larger errors more heavily

  • Desirable when large errors should be penalized more (i.e., the relative cost of a large error is greater than that of a small one)

  • RMSE increases with the variance of the frequency distribution of error magnitudes.

  • RMSE tends to increase as sample size increases (bad for comparing across different sample sizes)

  • Less intuitively explainable to stakeholders

Mean Absolute Error (MAE):

  • Procedure: take the absolute value of every error, then average the results.
  • All errors are weighted equally.

  • Does not increase with the variance of the frequency distribution of error magnitudes.

  • Not sensitive to sample size.

  • Intuitively explainable to stakeholders
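
A minimal sketch of both procedures with NumPy; the observed and predicted values are made up purely to show how a single large error inflates RMSE more than MAE.

# Illustrative sketch: RMSE and MAE from paired observed/predicted arrays.
import numpy as np

y_true = np.array([2.0, 3.5, 4.0, 5.5, 10.0])   # observed values (made up)
y_pred = np.array([2.5, 3.0, 4.5, 5.0, 7.0])    # model predictions (made up)

errors = y_pred - y_true
rmse = np.sqrt(np.mean(errors ** 2))   # square, average, square root
mae = np.mean(np.abs(errors))          # absolute value, average

# The single large error (10.0 observed vs. 7.0 predicted) pulls RMSE well above MAE.
print(f"RMSE = {rmse:.2f}, MAE = {mae:.2f}")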

Takeaways:

  • With Regression you are measuring error, not accuracy!

  • Low error is good, high error is bad!

  • Continuous metric - know your relative units!

  • Other metrics: \(R^2\), MAPE, Adj \(R^2\), MSE, etc.

Classification

Confusion Matrix:

Accuracy:

  • The number of predictions the model got right, divided by the total number of predictions.
  • Overall, how well did the model do at making correct predictions
  • What happens with imbalanced data?

Balanced Accuracy:

  • Average of the True Positive Rate (recall) and the True Negative Rate (specificity)

\(Balanced\ Accuracy = (TPR + TNR) / 2\)

Precision:

  • What proportion of positive identifications was actually correct?

  • The number of true positives divided by the total number the model thought were positive

\(Precision = TP / (TP + FP)\)

Recall:

  • AKA: Sensitivity

  • What proportion of actual positives was identified correctly?

  • The number of true positives divided by the sum of true positives and false negatives.

\(Recall = TP/(TP + FN)\)

Specificity:

  • What proportion of actual negatives was identified correctly?

  • The number of true negatives divided by the sum of true negatives and false positives

\(Specificity = TN/(TN + FP)\)
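
A minimal sketch of these confusion-matrix metrics using scikit-learn; the label arrays below are illustrative.

# Illustrative sketch: confusion-matrix metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # actual labels (made up)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]   # predicted labels (made up)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy:         ", accuracy_score(y_true, y_pred))          # (TP + TN) / total
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred)) # (TPR + TNR) / 2
print("Precision:        ", precision_score(y_true, y_pred))         # TP / (TP + FP)
print("Recall:           ", recall_score(y_true, y_pred))            # TP / (TP + FN)
print("Specificity:      ", tn / (tn + fp))                          # TN / (TN + FP)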

Classification: Curves

  • Classification algorithms don’t just provide a class.

  • They generate a probability of belonging to each class, then apply a threshold to assign the class.

Receiver Operator Characteristic (ROC) Curve

  • Plots performance of classification model at different thresholds
  • Usually True Positive Rate and False Positive Rate
  • Area Under the Curve (AUC) is used as a metric

Figure: ROC curve plotting True Positive Rate vs. False Positive Rate at different decision thresholds.

Figure: AUC, the area under the ROC curve.
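
A minimal sketch of building the ROC curve and computing AUC with scikit-learn, assuming the classifier exposes class probabilities; the scores below are illustrative.

# Illustrative sketch: ROC curve and AUC from predicted class probabilities.
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                      # actual labels (made up)
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.7]    # predicted P(class = 1) (made up)

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, y_score)

print("AUC =", auc)
# Plotting fpr vs. tpr (e.g., with matplotlib) reproduces the ROC curve described above.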

Classification Takeaways

  • With Classification you are measuring accuracy (or similar metrics), not error!

  • High Accuracy is good, low accuracy is bad

  • Continuous metric, but bounded between 0 and 1 (or 0% and 100%)

  • Use confusion matrices to understand your situation

  • Other metrics: many others, including variations on these curves and the F1 score, which combines precision and recall

But…

Log Loss:

  • Classification algorithms don’t just provide a class.

  • They generate a probability of belonging to each class, then apply a threshold to assign the class.

  • What if we used a metric that measured error of the probabilities?

Log Loss measures how close the predicted class probability is to the correct value. The farther away the probability is, the higher the log loss value.
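
For binary classification the standard formula is \(Log\ Loss = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]\). A minimal sketch with scikit-learn, using made-up probabilities to show that confident wrong predictions are penalized hardest:

# Illustrative sketch: log loss compares predicted probabilities to true labels.
from sklearn.metrics import log_loss

y_true = [1, 1, 0, 0]   # actual labels (made up)

confident_and_right = [0.95, 0.90, 0.05, 0.10]   # predicted P(class = 1)
hesitant            = [0.60, 0.55, 0.45, 0.40]
confident_and_wrong = [0.10, 0.20, 0.90, 0.85]

for name, probs in [("confident & right", confident_and_right),
                    ("hesitant", hesitant),
                    ("confident & wrong", confident_and_wrong)]:
    print(name, log_loss(y_true, probs))   # lower is better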

Session Overview

What they mean and how to interpret/implement them

  • Why Metrics?
  • Bias-Variance Tradeoff
  • Classification Metrics
  • Regression Metrics
