Production Machine Learning

Production ML


Delivering your models so they can be used.

Session Overview

  • Packaging
  • Serving
  • (Scaling)
  • Monitoring

Packaging your Models

You have an awesome model, now what?


Serialization

  • Converts an object into a series of bytes.
  • Those bytes can be stored (e.g., on disk).
  • Uses standards so that other services can deserialize the bytes back into objects.
  • For ML, this means taking an in-memory model and saving it in a standard format for reuse (see the sketch below).
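
A minimal sketch of the round trip, using Python's built-in pickle module on a plain object standing in for a trained model:

```python
import pickle

# Any in-memory Python object -- here a stand-in for a trained model.
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

# Serialize: object -> bytes (which could be written to disk).
payload = pickle.dumps(model)
print(type(payload))  # <class 'bytes'>

# Deserialize: bytes -> an equivalent in-memory object.
restored = pickle.loads(payload)
assert restored == model
```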

What gets Serialized?

  • Architecture
  • Weights
  • Hyperparameters
  • Metadata/Dependency Information

Serialization Formats

Pickle

  • NOT SECURE: unpickling can execute arbitrary code, so never load untrusted files
  • Can serialize (almost) any Python object, including models
  • Highly dependent on Python and library versions
  • joblib is often a better choice for models with large numpy arrays (see the sketch below)
  • Common extensions: .pkl, .joblib, .pickle
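
A minimal sketch of saving and reloading a scikit-learn model with joblib (assumes scikit-learn and joblib are installed; the file name model.joblib is just an example):

```python
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

# Train a tiny example model.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# Serialize the fitted model to disk.
dump(model, "model.joblib")

# Later (or in another service): deserialize and reuse it.
restored = load("model.joblib")
print(restored.predict([[1.5]]))
```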

ONNX

  • Open Neural Network Exchange
  • Open, framework-neutral standard for representing models
  • Export from one framework, run in another via ONNX Runtime (see the sketch below)
  • .onnx
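
A sketch of exporting the scikit-learn model from above to ONNX and running it with ONNX Runtime (assumes the skl2onnx and onnxruntime packages; input names are read from the session rather than hard-coded):

```python
import numpy as np
import onnxruntime as ort
from skl2onnx import to_onnx
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]], dtype=np.float32)
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Convert: a sample input lets the converter infer the input signature.
onx = to_onnx(model, X[:1])
with open("model.onnx", "wb") as f:
    f.write(onx.SerializeToString())

# Run inference with a framework-neutral runtime.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
preds = sess.run(None, {input_name: np.array([[1.5]], dtype=np.float32)})
print(preds[0])  # predicted labels
```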

PMML

  • Predictive Model Markup Language
  • XML Based
  • Enough Said

Other Serialization

  • PyTorch & Tensorflow Serialization
    • Optimized for those libraries
    • Generally well designed
    • Widely supported in those frameworks
    • Also consider ONNX and/or Safetensors
  • Safetensor
    • Fast, secure serializer for tensor based models
    • Rust Based
    • Lazy loading, zero copy (close)
    • Supports numpy, PyTorch (really well), and Tensorflow
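
A minimal sketch of saving and loading PyTorch tensors with safetensors (assumes the torch and safetensors packages; the file name is an example):

```python
import torch
from safetensors.torch import save_file, load_file

# A dict of named tensors, e.g. a model's weights.
tensors = {
    "linear.weight": torch.randn(4, 8),
    "linear.bias": torch.zeros(4),
}

# Serialize to a .safetensors file (no arbitrary code execution on load).
save_file(tensors, "weights.safetensors")

# Deserialize back into a dict of named tensors.
restored = load_file("weights.safetensors")
print(restored["linear.bias"].shape)
```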

Session Overview

  • Packaging
  • Serving
  • (Scaling)
  • Monitoring

Serving


Your model package is a function: it takes a set of inputs and returns an output.

Serving


How do we serve functions?

Inference Engines


Inference: Making predictions on new data


  • Serve models
  • Typically expose REST API endpoints (see the sketch below)
  • Are essentially Functions as a Service*


* Many inference engines are not serverless
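
A minimal sketch of a REST inference endpoint using FastAPI (one framework choice among many), serving the joblib model saved earlier:

```python
from fastapi import FastAPI
from joblib import load
from pydantic import BaseModel

app = FastAPI()
model = load("model.joblib")  # deserialize the packaged model at startup


class PredictRequest(BaseModel):
    features: list[float]


@app.post("/predict")
def predict(req: PredictRequest):
    # Inference: a prediction on new data, served as a function call over HTTP.
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```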

Running Inference

Scaling Considerations

  • Desired workload
  • Batch versus single-observation inference
  • Level of parallelization (see the sketch below)
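
A sketch contrasting single-observation and batch inference with the model from earlier (shapes are illustrative; one vectorized batch call usually amortizes per-call overhead across many inputs):

```python
import numpy as np
from joblib import load

model = load("model.joblib")
X = np.random.rand(10_000, 1)

# Single-observation inference: one call per input (per-call overhead dominates).
single = [model.predict(x.reshape(1, -1))[0] for x in X[:100]]

# Batch inference: one vectorized call over many inputs.
batch = model.predict(X)
print(len(single), batch.shape)
```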

Session Overview

  • Packaging
  • Serving
  • (Scaling)
  • Monitoring

Monitoring


Why might we want to monitor our model?

Drift

  • Data Drift
  • Concept Drift
  • Technical Drift

Drift

  • Data Drift
    • Changes in input/feature distributions over time
    • Causes: time-based trends, changes in data collection/processing, changes in calibration
  • Concept Drift
    • The relationship between inputs and outputs changes
    • Causes: definitions change, interpretations change, or the physical relationship changes
  • Technical Drift
    • Computational/technical dependencies change
    • Causes: software updates, package deprecation, or dependency unavailability

Detecting Drift


> The only way to detect drift is to monitor your model with metrics.
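
A sketch of one such metric: a two-sample Kolmogorov-Smirnov test comparing a feature's training-time distribution against recent production data (assumes scipy; the 0.05 threshold is an illustrative choice, not a rule):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Feature values seen at training time vs. recent production traffic.
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted: drift

# The KS statistic measures the distance between the two distributions.
stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.05:  # illustrative threshold
    print(f"Possible data drift detected (KS={stat:.3f}, p={p_value:.2e})")
```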

Production ML


Let’s put a model into production!