I/O in Python

Session Overview

  • Lazy-Loading Overview
  • Setting up Environment
  • Lazy Loading in Python

Overview: Idealized Pattern

Architecture:

  • Configure parallel backend
  • Lazy-Load
  • Filter/Subset
  • Adjust
    • Rename/Normalize
  • Groupby/Chunk -> Map
  • Aggregate/Summarize -> Reduce

Execution:

  • Load data in parallel as needed for computation
  • Perform subsetting/processing
  • Run MapReduce
  • Return results (either in memory or in files)

The Architecture step builds a computational graph, usually a directed acyclic graph (DAG). The graph is a plan of action: it records what to compute, and no data is loaded or processed until the Execution step runs it.
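
The split between the Architecture and Execution steps can be sketched with the standard library alone. This is a minimal illustration, not the API of any particular lazy-loading library: the `Lazy` class, its method names, and the sample temperature records are all hypothetical. Each method call only adds a step to the plan; loading, filtering, and the map/reduce happen when `compute()` is called.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Lazy:
    """Hypothetical lazy pipeline: records steps, runs them only on compute()."""

    def __init__(self, loader, steps=()):
        self.loader = loader        # zero-argument callable that yields records
        self.steps = tuple(steps)   # the recorded plan (a simple linear graph)

    def _then(self, kind, func):
        # Architecture: return a new plan with one more step; nothing runs here.
        return Lazy(self.loader, self.steps + ((kind, func),))

    def filter(self, pred):
        return self._then("filter", pred)

    def map(self, func):
        return self._then("map", func)

    def compute(self, key, reducer, workers=2):
        # Execution: load, apply the recorded steps, then group and reduce.
        records = self.loader()
        for kind, func in self.steps:
            records = filter(func, records) if kind == "filter" else map(func, records)
        groups = defaultdict(list)
        for rec in records:
            groups[key(rec)].append(rec)
        # Configure a (thread-based) parallel backend and reduce each group on it.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = pool.map(reducer, groups.values())
            return dict(zip(groups.keys(), results))


loaded = []  # records when the loader body actually runs


def load():
    loaded.append(True)
    yield {"Site": "A", "temp_f": 32.0}
    yield {"Site": "a", "temp_f": 50.0}
    yield {"Site": "B", "temp_f": 212.0}
    yield {"Site": "B", "temp_f": None}


# Architecture: lazy-load, filter, rename/normalize. No data is touched yet.
plan = (
    Lazy(load)
    .filter(lambda r: r["temp_f"] is not None)
    .map(lambda r: {"site": r["Site"].lower(), "temp_c": (r["temp_f"] - 32) * 5 / 9})
)
loaded_before = len(loaded)

# Execution: group by site, then compute the mean temperature per group.
result = plan.compute(
    key=lambda r: r["site"],
    reducer=lambda recs: sum(r["temp_c"] for r in recs) / len(recs),
)
```

Here `loaded_before` is 0 because building `plan` never calls the loader, while `result` holds one mean per site once `compute()` has run. Real lazy-loading libraries build richer graphs and schedule them across processes or machines, but the two-phase shape is the same.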

Setting up Environment

  1. Open the Cloud9 environment from earlier today.

  2. Clone the IO-Python Practice Repo.

  3. Navigate into the repo and run the run-jupyter.sh script.

Lazy Loading in Python

  1. Tabular Data

  2. Array Data
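
As a rough illustration of what lazy loading means for each data type, the sketch below uses only the standard library; it is not necessarily the tooling used in the practice repo. The helper names (`iter_chunks`, `read_slice`), the file names, and the sample values are hypothetical. Tabular data is read a few rows at a time, and array data is memory-mapped so that only the requested slice is copied into Python objects.

```python
import csv
import mmap
import os
import struct
import tempfile
from itertools import islice


def iter_chunks(path, chunk_size):
    """Yield lists of row dicts, at most chunk_size rows at a time."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk


def read_slice(path, start, stop):
    """Read little-endian float64 items [start, stop) from a raw binary file."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        raw = mm[start * 8 : stop * 8]  # 8 bytes per float64
    return list(struct.unpack(f"<{stop - start}d", raw))


with tempfile.TemporaryDirectory() as tmpdir:
    # Tabular data: 5 rows on disk, processed 2 rows at a time.
    csv_path = os.path.join(tmpdir, "table.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        writer.writerows([[i, i * 10] for i in range(5)])

    chunk_sums = [
        sum(int(row["value"]) for row in chunk)
        for chunk in iter_chunks(csv_path, 2)
    ]

    # Array data: 1000 float64 values on disk, only 3 are unpacked.
    bin_path = os.path.join(tmpdir, "array.bin")
    with open(bin_path, "wb") as f:
        f.write(struct.pack("<1000d", *map(float, range(1000))))

    window = read_slice(bin_path, 500, 503)
```

In both cases the full dataset never has to sit in memory at once: `chunk_sums` is built one chunk at a time, and `window` contains only the three requested values.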