Production Machine Learning

Production ML


Delivering your models so they can be used.

Session Overview

  • Packaging
  • Serving
  • (Scaling)
  • Monitoring

Packaging your Models

You have an awesome model, now what?


Serialization

  • Converts an object into a series of bytes.
  • Those bytes can be stored (e.g., on disk).
  • Uses standards so that other services can deserialize the bytes back into objects.
  • For ML, this means taking an in-memory model and saving it in a standard format for reuse (see the sketch below).
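
A minimal sketch of the round trip, using Python's built-in pickle module on a plain object standing in for a trained model:

```python
import pickle

# Any in-memory Python object -- here a stand-in for a trained model.
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

# Serialize: object -> bytes (which could be written to disk).
payload = pickle.dumps(model)
print(type(payload))  # <class 'bytes'>

# Deserialize: bytes -> an equivalent in-memory object.
restored = pickle.loads(payload)
assert restored == model
```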

What gets Serialized?

  • Architecture
  • Weights
  • Hyperparameters
  • Metadata/Dependency Information

Serialization Formats

Pickle

  • NOT SECURE: unpickling can execute arbitrary code, so never load untrusted files
  • Can serialize (almost) any Python object, including models
  • Highly dependent on Python and library versions
  • joblib is often a better choice for models with large numpy arrays (see the sketch below)
  • Common extensions: .pkl, .joblib, .pickle
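
A minimal sketch of saving and reloading a scikit-learn model with joblib (assumes scikit-learn and joblib are installed; the file name model.joblib is just an example):

```python
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

# Train a tiny example model.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# Serialize the fitted model to disk.
dump(model, "model.joblib")

# Later (or in another service): deserialize and reuse it.
restored = load("model.joblib")
print(restored.predict([[1.5]]))
```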

ONNX

  • Open Neural Network Exchange
  • Open, framework-neutral standard for representing models
  • Export from one framework, run in another via ONNX Runtime (see the sketch below)
  • .onnx
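
A sketch of exporting the scikit-learn model from above to ONNX and running it with ONNX Runtime (assumes the skl2onnx and onnxruntime packages; input names are read from the session rather than hard-coded):

```python
import numpy as np
import onnxruntime as ort
from skl2onnx import to_onnx
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]], dtype=np.float32)
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Convert: a sample input lets the converter infer the input signature.
onx = to_onnx(model, X[:1])
with open("model.onnx", "wb") as f:
    f.write(onx.SerializeToString())

# Run inference with a framework-neutral runtime.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
preds = sess.run(None, {input_name: np.array([[1.5]], dtype=np.float32)})
print(preds[0])  # predicted labels
```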

PMML

  • Predictive Model Markup Language
  • XML Based
  • Enough Said

Other Serialization

  • PyTorch & Tensorflow Serialization
    • Optimized for those libraries
    • Generally well designed
    • Widely supported in those frameworks
    • Also consider ONNX and/or Safetensors
  • Safetensor
    • Fast, secure serializer for tensor based models
    • Rust Based
    • Lazy loading, zero copy (close)
    • Supports numpy, PyTorch (really well), and Tensorflow
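
A minimal sketch of saving and loading PyTorch tensors with safetensors (assumes the torch and safetensors packages; the file name is an example):

```python
import torch
from safetensors.torch import save_file, load_file

# A dict of named tensors, e.g. a model's weights.
tensors = {
    "linear.weight": torch.randn(4, 8),
    "linear.bias": torch.zeros(4),
}

# Serialize to a .safetensors file (no arbitrary code execution on load).
save_file(tensors, "weights.safetensors")

# Deserialize back into a dict of named tensors.
restored = load_file("weights.safetensors")
print(restored["linear.bias"].shape)
```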

Session Overview

  • Packaging
  • Serving
  • (Scaling)
  • Monitoring

Serving


Your model package is a function: it takes a set of inputs and returns an output.

Serving


How do we serve functions?

Inference Engines


Inference: Making predictions on new data


  • Serve models
  • Typically expose REST API endpoints (see the sketch below)
  • Are essentially Functions as a Service*


* Many inference engines are not serverless
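
A minimal sketch of a REST inference endpoint using FastAPI (one framework choice among many), serving the joblib model saved earlier:

```python
from fastapi import FastAPI
from joblib import load
from pydantic import BaseModel

app = FastAPI()
model = load("model.joblib")  # deserialize the packaged model at startup


class PredictRequest(BaseModel):
    features: list[float]


@app.post("/predict")
def predict(req: PredictRequest):
    # Inference: a prediction on new data, served as a function call over HTTP.
    prediction = model.predict([req.features])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```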

Running Inference

Scaling Considerations

  • Desired workload
  • Batch versus single-observation inference
  • Level of parallelization (see the sketch below)
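
A sketch contrasting single-observation and batch inference with the model from earlier (shapes are illustrative; one vectorized batch call usually amortizes per-call overhead across many inputs):

```python
import numpy as np
from joblib import load

model = load("model.joblib")
X = np.random.rand(10_000, 1)

# Single-observation inference: one call per input (per-call overhead dominates).
single = [model.predict(x.reshape(1, -1))[0] for x in X[:100]]

# Batch inference: one vectorized call over many inputs.
batch = model.predict(X)
print(len(single), batch.shape)
```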

Session Overview

  • Packaging
  • Serving
  • (Scaling)
  • Monitoring

Monitoring


Why might we want to monitor our model?

Drift

  • Data Drift
  • Concept Drift
  • Technical Drift

Drift

  • Data Drift
    • Changes in input/feature distributions over time
    • Causes: time-based trends, changes in data collection/processing, changes in calibration
  • Concept Drift
    • The relationship between inputs and outputs changes
    • Causes: definitions change, interpretations change, or the physical relationship changes
  • Technical Drift
    • Computational/technical dependencies change
    • Causes: software updates, package deprecation, or dependency unavailability

Detecting Drift


> The only way to detect drift is to monitor your model with metrics.
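
A sketch of one such metric: a two-sample Kolmogorov-Smirnov test comparing a feature's training-time distribution against recent production data (assumes scipy; the 0.05 threshold is an illustrative choice, not a rule):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Feature values seen at training time vs. recent production traffic.
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted: drift

# The KS statistic measures the distance between the two distributions.
stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.05:  # illustrative threshold
    print(f"Possible data drift detected (KS={stat:.3f}, p={p_value:.2e})")
```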

Production ML


Let’s put a model into production!