Parallelization Backends and Tools

Session Overview

  • Options
  • Brief Background
  • Exercises
    • Low Level to High Level
    • Work through together

Our Tools

Options        | Python
Single Machine | Multiprocessing / Polars / Dask
Cluster        | Dask / DaskHub
Lambda         | Function / Containers

Single Machine | Python

multiprocessing / concurrent.futures

  • Lower-level Python constructs for parallelization (standard library)
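
As a minimal sketch of this lower-level style, the example below maps a function over inputs with a worker pool from `concurrent.futures`. The names `square` and `parallel_map` are hypothetical and chosen only for illustration; the pool size of 2 is likewise an arbitrary assumption.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(x):
    """Stand-in for CPU-bound work; top-level so worker processes can pickle it."""
    return x * x


def parallel_map(func, values, executor_cls=ProcessPoolExecutor, max_workers=2):
    """Apply func to every value on a pool of workers, preserving input order.

    parallel_map is a hypothetical helper, not part of the standard library.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, values))


if __name__ == "__main__":
    # The guard matters on platforms that start workers as fresh interpreters.
    print(parallel_map(square, range(5)))
```

Passing `executor_cls=ThreadPoolExecutor` swaps processes for threads without changing the calling code, which tends to suit I/O-bound work better than CPU-bound work.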

Dask Delayed

  • Lower-level construct for Dask programming
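
A small sketch of how `dask.delayed` might be used, assuming Dask is installed. The functions `inc` and `add` are hypothetical stand-ins for real work. Calling a delayed function only records a task; nothing executes until `compute()` is called.

```python
from dask import delayed


@delayed
def inc(x):
    # Hypothetical unit of work.
    return x + 1


@delayed
def add(a, b):
    # Hypothetical step that depends on two earlier results.
    return a + b


# Nothing runs here: each call only adds a node to the task graph.
total = add(inc(1), inc(2))

# compute() hands the graph to a scheduler; the two inc calls are
# independent, so the scheduler is free to run them in parallel.
result = total.compute()
print(result)  # 5
```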

Dask

  • Supports base Python, NumPy, Xarray, scikit-learn, and pandas syntax
  • Configurable backend (automated and manual)
  • Primarily lazy computation
  • Supports synchronous and asynchronous calls
  • Essentially the same syntax on a cluster, just a different setup
  • Runs a Dask scheduler
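
The points above can be sketched with `dask.array`, which mirrors NumPy syntax. This is an illustrative example rather than part of the session material: the array size, the chunk size, and the choice of the threaded scheduler are all assumptions.

```python
import dask.array as da

# NumPy-like syntax, but the array is split into chunks (here 2 chunks of 5).
x = da.arange(10, chunks=5)

# Lazy: this builds a task graph and returns a Dask array, not a number.
lazy_total = (x ** 2).sum()

# compute() runs the graph; the scheduler is selected explicitly here.
total = lazy_total.compute(scheduler="threads")
print(total)  # 285
```

On a cluster the array code would typically stay the same. The setup changes instead, for example by creating a `dask.distributed.Client` connected to the cluster's scheduler before calling `compute()`.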