Production Machine Learning
Production ML
Delivering your models so they can be used.
Packaging your Models
You have an awesome model, now what?
Serialization
- Converts an object to a series of bytes (see the sketch after this list).
- These bytes can be stored (e.g., on disk).
- Uses standards so that other services can deserialize the bytes, converting them back into objects.
- For ML, this means taking an in-memory model and saving it for reuse in a standard format.
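Here is a minimal sketch of that round trip using Python's built-in `pickle` module; the dictionary is a stand-in for a real model object:

```python
import pickle

# A stand-in for an in-memory model object
obj = {"name": "my_model", "weights": [0.1, 0.2, 0.3]}

data = pickle.dumps(obj)       # serialize: object -> bytes (could be written to disk)
restored = pickle.loads(data)  # deserialize: bytes -> object
assert restored == obj
```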
What gets Serialized?
- Architecture
- Weights
- Hyperparameters
- Metadata/Dependency Information
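For a fitted scikit-learn estimator, the pickled object already carries the architecture (the estimator class), the weights (learned attributes such as `coef_`), and the hyperparameters (`get_params()`); dependency metadata has to be captured separately. A hedged sketch, where the file names and metadata fields are illustrative rather than any standard:

```python
import json
import platform

import sklearn

# Record the environment needed to deserialize the model later
metadata = {
    "model_file": "model.joblib",            # illustrative file name
    "python_version": platform.python_version(),
    "sklearn_version": sklearn.__version__,  # pickles often break across versions
}

with open("model_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```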
Pickle
- NOT SECURE: never unpickle data from an untrusted source
- Serializes almost any Python object, including models
- Highly dependent on Python and library versions
- Better to use joblib, which is more efficient for objects holding large NumPy arrays
Joblib
- Common file extensions: .pkl, .joblib, .pickle
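A minimal sketch of both save/load paths; the file names are illustrative:

```python
import pickle

import joblib
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

# pickle: works on (almost) any Python object
with open("model.pickle", "wb") as f:
    pickle.dump(model, f)
with open("model.pickle", "rb") as f:
    restored = pickle.load(f)  # only load files you trust!

# joblib: same idea, better suited to objects with large NumPy arrays
joblib.dump(model, "model.joblib")
restored = joblib.load("model.joblib")
```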
PMML
- Predictive Model Markup Language
- XML-based
- Enough Said
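One common route from Python to PMML is the `sklearn2pmml` package, sketched below under the assumption that it is installed along with the Java runtime it requires; the output file name is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)

# PMMLPipeline wraps ordinary scikit-learn steps for export
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier())])
pipeline.fit(X, y)

# Writes an XML document describing the fitted model
sklearn2pmml(pipeline, "model.pmml")
```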
Serving
Your model package is a function: it takes a set of inputs and returns an output.
How do we serve functions?
Inference Engines
Inference: Making predictions on new data
- Serve models
- Typically expose REST API endpoints
- Are essentially Functions as a Service*
* Many inference engines are not serverless
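As a concrete sketch, here is a tiny REST inference endpoint built with FastAPI; the model path and request schema are assumptions for illustration, not part of any particular engine:

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # deserialize the packaged model at startup


class PredictRequest(BaseModel):
    features: list[float]  # one observation as a flat feature vector


@app.post("/predict")
def predict(request: PredictRequest):
    # scikit-learn expects a 2D array: one row per observation
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```

Run it with, e.g., `uvicorn app:app`; every POST to `/predict` is one function call on the served model.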
Scaling Considerations
- Desired Workload
- Batch versus single observation
- Level of parallelization
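The batch-versus-single trade-off in one small sketch; the model and data are synthetic stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
model = LogisticRegression().fit(rng.random((100, 3)), rng.integers(0, 2, 100))

# Single observation: lowest latency per request, but per-call overhead dominates
single = model.predict(rng.random((1, 3)))

# Batch: amortizes overhead across many rows, much higher throughput
batch = model.predict(rng.random((10_000, 3)))
```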
Monitoring
Why might we want to monitor our model?
Drift
- Data Drift: the distribution of the input data changes
- Concept Drift: the relationship between inputs and outputs changes
- Technical Drift: the surrounding systems change (dependencies, pipelines, infrastructure)
Detecting Drift
> The only way to detect drift is to monitor the model with metrics.
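For data drift, one common metric is a two-sample Kolmogorov-Smirnov test comparing a training-time feature distribution against recent production values; a minimal sketch with synthetic data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5_000)   # feature values seen at training time
production = rng.normal(0.3, 1.0, 5_000)  # recent production values, shifted

statistic, p_value = ks_2samp(reference, production)
if p_value < 0.01:  # the alert threshold is a judgment call
    print(f"Possible data drift (KS statistic = {statistic:.3f})")
```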
Production ML
Let’s put a model into production!