Welcome Back
Yesterday:
- Coder
- Intro to AWS
- Team Git Repo
Welcome Back
Today:
- I/O
- Containers
- Parallel Computing Intro
- Programmatic Cloud Access
- Team Projects
Session Overview
- Input/Output
- Data Formats
- Parquet
- Zarr/Icechunk
- Visualization Formats
Parquet
- Apache Parquet
- Apache Iceberg
- Columnar Data Store
- Tabular Data
- The Better CSV
- Often order-of-magnitude improvements in file size and read speed over CSV
- Lazy loading in R (Vroom), Python (Polars, DuckDB, Dask), and JavaScript
- Stable, well supported, with an active community
- Underlying foundation for many AWS Serverless Offerings
Data Conversion
- Do it yourself: read the data in as one format, export it as another
- AWS Lambda is very good at this
- Use conversion tools
Parquet and Zarr
- Open up the intro-to-io/intro-io.ipynb notebook.
- Play