What they mean and how to interpret/implement them
Start with the end in mind…
Objective Measures:
We want our models to be accurate (low bias) and precise (low variance):
Bias: Difference between model prediction and correct value.
Variance: Variability of a prediction at a given point.
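A minimal NumPy sketch of these two quantities, using toy numbers (not from any study): repeated predictions of the same value, e.g. from models trained on different data samples.

```python
import numpy as np

# Toy numbers: repeated predictions of a quantity whose true value is known.
true_value = 20.0
predictions = np.array([18.5, 19.0, 21.5, 22.0, 19.5])

bias = predictions.mean() - true_value  # systematic offset from truth
variance = predictions.var()            # spread of the predictions

print(f"bias = {bias:.2f}, variance = {variance:.2f}")
# prints: bias = 0.10, variance = 1.94
```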
Research Objective: Measure soil temperature using remote sensing observations.
Method: Use 10% of cloudless satellite photos and calculate the mean ground temperature from a single wavelength band.
Study Design:
Bias:
Variance:
Algorithms:
Less complex models (Linear, Parametric) tend to have higher bias, but lower variance
More complex models (Trees, Deep Learning) tend to have lower bias, but higher variance
Training:
Finding the sweet spot requires metrics!
Supervised Learning!
Classification:
Regression:
Root Mean Square Error (RMSE):
Weights larger errors more
Desirable when you need to penalize large errors more (i.e., the relative weight of a large error is greater than that of a small one)
RMSE increases with the variance of the frequency distribution of error magnitudes.
RMSE tends to increase as sample size increases (problematic when comparing across different sample sizes)
Less intuitively explainable to stakeholders
Mean Absolute Error (MAE):
All errors are weighted equally.
Does not increase with the variance of the frequency distribution of error magnitudes.
Not sensitive to sample size.
Intuitively explainable to stakeholders
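The contrast above can be sketched with a few toy soil-temperature values (made up for illustration, assuming NumPy): one large error inflates RMSE much more than MAE.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: all errors weighted equally."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Toy values; the last prediction has one large error.
y_true = [20.0, 21.0, 19.0, 22.0]
y_pred = [20.5, 20.5, 19.5, 18.0]  # errors: 0.5, 0.5, 0.5, 4.0

# RMSE penalizes the single large error more heavily than MAE does.
print(mae(y_true, y_pred))   # 1.375
print(rmse(y_true, y_pred))  # ~2.05
```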
Takeaways:
With Regression you are measuring error, not accuracy!
Low error is good, high error is bad!
Continuous metric - know your relative units!
Other metrics: \(R^2\), MAPE, Adj \(R^2\), MSE, etc.
Accuracy:
What proportion of all predictions was correct?
The number of correct predictions divided by the total number of predictions
Balanced Accuracy:
The average of the recall obtained on each class; useful when classes are imbalanced
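A minimal sketch of both metrics (toy data, assuming NumPy): on imbalanced classes, a model that always predicts the majority class looks good on accuracy but not on balanced accuracy.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall; robust to class imbalance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Toy imbalanced data: 9 negatives, 1 positive; model always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
acc = accuracy(y_true, y_pred)            # 0.9
bal = balanced_accuracy(y_true, y_pred)   # 0.5
```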
Precision:
What proportion of positive identifications was actually correct?
The number of true positives divided by the total number the model predicted positive: \(TP / (TP + FP)\)
Recall:
AKA: Sensitivity
What proportion of actual positives was identified correctly?
The number of true positives divided by the sum of true positives and false negatives: \(TP / (TP + FN)\)
Specificity:
What proportion of actual negatives was identified correctly?
The number of true negatives divided by the sum of true negatives and false positives: \(TN / (TN + FP)\)
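The three ratios above, computed from hypothetical confusion-matrix counts (toy numbers, not from the source):

```python
# Counts from a hypothetical binary confusion matrix.
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)    # of everything predicted positive, how much was right?
recall = tp / (tp + fn)       # of actual positives, how many were found? (sensitivity)
specificity = tn / (tn + fp)  # of actual negatives, how many were found?

print(precision, recall, specificity)  # 0.8, ~0.667, 0.75
```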
Classification algorithms don’t just provide a class.
They generate a probability of belonging to a class, then apply a threshold to assign classes.
Receiver Operating Characteristic (ROC) Curve
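A sketch of the idea (toy labels and probabilities, assuming NumPy): each threshold turns probabilities into classes and yields one (false positive rate, true positive rate) point; sweeping the threshold from high to low traces the ROC curve from (0, 0) to (1, 1).

```python
import numpy as np

def roc_point(y_true, scores, threshold):
    """One (FPR, TPR) point on the ROC curve for a given threshold."""
    y_true = np.asarray(y_true)
    pred = (np.asarray(scores) >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)  # recall / sensitivity
    fpr = fp / np.sum(y_true == 0)  # 1 - specificity
    return float(fpr), float(tpr)

# Toy labels and predicted probabilities.
y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = [roc_point(y, scores, t) for t in (0.9, 0.5, 0.3, 0.0)]
# points: [(0.0, 0.0), (0.0, 0.5), (0.5, 1.0), (1.0, 1.0)]
```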
With Classification you are measuring accuracy (or similar metrics), not error!
High Accuracy is good, low accuracy is bad
Continuous metric, but between 0-1 (or 0 and 100%)
Use confusion matrices to understand your situation
Other metrics: many others, including variations on curves and the F1 score, which combines precision and recall
Log Loss:
What if we used a metric that measured error of the probabilities?
Log Loss measures how close the predicted class probability is to the correct value. The farther away the probability is, the higher the log loss value.
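A minimal sketch of binary log loss (toy numbers, assuming NumPy): for a true label of 1, a confident correct probability scores much lower (better) than a confident wrong one.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

# True label is 1 in both cases.
good = log_loss([1], [0.9])  # ~0.105: probability close to the truth
bad = log_loss([1], [0.1])   # ~2.303: probability far from the truth
```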
Earth System Data Science in the Cloud