In high-performance computing, seconds matter…
The corollary: To solve big data problems, make them small data problems.
“I don’t care about performance, I just want to get the job done…”
“If you don’t care about performance, you won’t get your job done.”
Parallel computing: the simultaneous use of multiple compute resources on a single computational problem.
Many Hands make Light Work
One of these will be a limiting factor on your performance.
Can adjust each of Compute, Memory, Storage, Networking almost independently.
Right-size resources for work.
Atomic Unit of Parallelization:
The INNER loop of your nested for loops.
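A minimal sketch of what this means in practice, using Python's standard library (the grid and the per-cell work are hypothetical stand-ins for your own data and computation):

```python
from concurrent.futures import ThreadPoolExecutor

def process_cell(value):
    # Hypothetical per-cell work: the body of the inner loop.
    return value * value

def process_grid_serial(grid):
    # Nested loops: the inner loop's body is the atomic unit of work.
    return [[process_cell(v) for v in row] for row in grid]

def process_grid_parallel(grid):
    # Each inner-loop iteration is independent, so the inner loop can
    # be farmed out to a pool of workers, row by row.
    with ThreadPoolExecutor() as pool:
        return [list(pool.map(process_cell, row)) for row in grid]
```

For CPU-bound work you would typically swap `ThreadPoolExecutor` for `ProcessPoolExecutor`; the structure is identical.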
Average daily precipitation?
7 day rolling average of temperature?
County mortality index?
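Each of these tasks decomposes the same way: every output value depends only on a small, independent slice of the input. A pure-Python sketch of the rolling-average case (window size and data are illustrative):

```python
def rolling_mean(values, window=7):
    # Each window's mean depends only on its own slice of the data,
    # so the windows can all be computed independently, in parallel.
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]
```

In real workflows you would reach for a vectorized equivalent (e.g. a pandas rolling mean), but the independence of the windows is what makes the task parallelizable.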
… the challenge: programming in parallel.
Common mental model and framework for big data tasks.
Many of the modern digital technologies you use today are built on the MapReduce paradigm.
MapReduce was originally implemented in Java, in Hadoop
… but now it is everywhere.
… including in Python.
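The paradigm fits in a few lines of standard-library Python. A word-count sketch (the classic MapReduce example; the input lines are hypothetical):

```python
from collections import Counter
from functools import reduce

def mapper(line):
    # Map: each line is processed independently into partial word counts.
    return Counter(line.split())

def reducer(a, b):
    # Reduce: merge partial counts pairwise into one result.
    return a + b

lines = ["the quick brown fox", "the lazy dog", "the end"]
counts = reduce(reducer, map(mapper, lines), Counter())
```

Because the map step is independent per line and the reduce step is associative, both can be distributed across many machines; frameworks like Hadoop and Dask handle that distribution for you.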
Note: For each of these options, you can scale Compute, Memory, Storage, and Networking more or less independently.
… And we will be exploring both of these in the next module…
Earth System Data Science in the Cloud