Ingredients for a successful project:
Thinking about Products…
Exploratory Data Analysis (EDA)
The first step in any project.
Initial investigation into the data
Check assumptions
Spot patterns
Find anomalies
Identify problems before investing time
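One way to "find anomalies" early is a simple interquartile-range (IQR) screen. The sketch below is only an illustration: the function name `flag_outliers`, the 1.5 x IQR threshold, and the sample temperature values are all assumptions, not part of the original material.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    This is the common "Tukey fence" rule of thumb; k=1.5 is a
    conventional default, not a requirement.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings (made up for illustration).
readings = [10.1, 10.4, 9.8, 10.0, 10.3, 55.0, 9.9]
suspects = flag_outliers(readings)
print(suspects)
```

A flagged value is a prompt to investigate, not proof of an error: it may be a sensor glitch or a real extreme event.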
Answer:
Are there gaps? Are there assumptions in the data that prevent me from answering my question? Do I need more data?
What challenges might we encounter using these data?
What are our data boundary conditions?
If, for example, we wanted to use these data for ML, where might we have to be careful applying/extending these models? (If we want to predict something in Alaska, but all our data are from the continental US, that could be a problem.)
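The boundary-condition question can be made concrete with a quick extent check. This is a minimal sketch under stated assumptions: the names `Bounds`, `data_bounds`, and `in_bounds` are hypothetical helpers, the coordinates are approximate city locations used only as sample data, and a bounding box is a coarse stand-in for the true coverage of a dataset.

```python
from typing import Iterable, NamedTuple, Tuple


class Bounds(NamedTuple):
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def data_bounds(points: Iterable[Tuple[float, float]]) -> Bounds:
    """Compute the latitude/longitude bounding box of (lat, lon) points."""
    pts = list(points)
    if not pts:
        raise ValueError("no points supplied")
    lats = [lat for lat, _ in pts]
    lons = [lon for _, lon in pts]
    return Bounds(min(lats), max(lats), min(lons), max(lons))


def in_bounds(lat: float, lon: float, b: Bounds) -> bool:
    """True if the location falls inside the training-data bounding box."""
    return b.lat_min <= lat <= b.lat_max and b.lon_min <= lon <= b.lon_max


# Approximate (lat, lon) for four continental US cities (illustrative only).
training_sites = [(47.6, -122.3), (25.8, -80.2), (42.4, -71.1), (32.7, -117.2)]
bounds = data_bounds(training_sites)

print(in_bounds(39.7, -105.0, bounds))  # a Denver-like location
print(in_bounds(61.2, -149.9, bounds))  # an Anchorage-like location
```

A prediction target that falls outside the box is a warning that the model would be extrapolating beyond the data it was trained on.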
Assess:
Quantity
Quality
Volume (How Much?)
Velocity (How fast/frequently?)
Variety (Formats, data types?)
Veracity (Do we trust these data?)
Complete (Missing data?)
Corrupt (Files ok?)
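Several of the checks above (volume, variety, completeness) can be sketched in a few lines. The example below is an assumption-laden illustration: `summarize` and `is_missing` are hypothetical helper names, the records are invented, and treating `None`, empty strings, and NaN as "missing" is one possible convention among many.

```python
import math


def is_missing(value):
    """Treat None, empty strings, and NaN as missing (an assumed convention)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def summarize(rows):
    """Report row count, missing values per column, and value types per column."""
    columns = sorted({key for row in rows for key in row})
    missing = {
        c: sum(1 for r in rows if is_missing(r.get(c))) for c in columns
    }
    types = {
        c: sorted(
            {type(r[c]).__name__ for r in rows if c in r and not is_missing(r[c])}
        )
        for c in columns
    }
    return {"n_rows": len(rows), "missing": missing, "types": types}


# Invented records for illustration only.
rows = [
    {"station": "A", "temp_c": 12.5, "time": "2020-01-01"},
    {"station": "B", "temp_c": None, "time": "2020-01-02"},
    {"station": "C", "temp_c": "13.1", "time": ""},
]
report = summarize(rows)
print(report)
```

Mixed types in one column (here a float and a string in `temp_c`) are a typical variety and veracity problem worth resolving before analysis.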
Data Patterns
What to look for in your data?
Earth System Data Science in the Cloud