Team Project Play

Defining A Project

Ingredients for a successful project:

  • Research Question!
  • Data
  • Analysis, AI/ML, etc.
  • Products

Thinking about Products…

Exploratory Data Analysis (EDA)

  • Exploratory Data Analysis
  • Quantity
  • Quality

EDA

The first step in any project.

  • Initial investigation into the data

  • Check assumptions

  • Spot patterns

  • Find anomalies

  • Identify problems before investing time

EDA Questions

Answer:

  • Will these data help us answer our research question?

Are there gaps? Do assumptions built into the data prevent us from answering the question? Do we need more data?

  • What challenges might we encounter using these data?

  • What are our data boundary conditions?

If, for example, we wanted to use these data for ML, where might we have to be careful applying/extending these models? (If we want to predict something in Alaska, but all our data are from the continental US, that could be a problem.)
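The Alaska example can be turned into a quick check. The sketch below is only an illustration: the `BoundingBox` class, the `bounding_box` helper, and the sample coordinates are hypothetical, and a rectangular latitude/longitude extent is a crude stand-in for a real spatial coverage test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Latitude/longitude extent of a set of observations, in degrees."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        """Return True if the point lies inside the extent (edges included)."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def bounding_box(points):
    """Compute the extent of an iterable of (lat, lon) pairs."""
    lats, lons = zip(*points)
    return BoundingBox(min(lats), max(lats), min(lons), max(lons))


# Hypothetical training locations, all in the continental US.
training_points = [(47.6, -122.3), (25.8, -80.2), (39.7, -105.0), (40.7, -74.0)]
box = bounding_box(training_points)

# A point in Colorado falls inside the training extent;
# a point near Anchorage, Alaska does not.
inside = box.contains(39.0, -104.8)
outside = box.contains(61.2, -149.9)
print(inside, outside)
```

The same idea extends to time ranges and to the value ranges of each input variable: a prediction requested outside any of them is an extrapolation and deserves extra scrutiny.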

EDA Principles

Assess:

  • Quantity

  • Quality

Quantity

  • Volume (How Much?)

  • Velocity (How fast/frequently?)

  • Variety (Formats, data types?)
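As a rough illustration, the three quantity questions can each be answered with a few lines of code. This sketch assumes a small, hypothetical CSV of hourly station readings (the column names `time`, `station`, and `temp_c` are made up) and uses only the Python standard library.

```python
import csv
import io
import statistics
from datetime import datetime

# Hypothetical sample: hourly readings from one station.
raw = """time,station,temp_c
2024-01-01T00:00,A1,3.2
2024-01-01T01:00,A1,3.0
2024-01-01T02:00,A1,2.7
2024-01-01T03:00,A1,2.9
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Volume: how many records and how many fields per record?
n_rows = len(rows)
n_cols = len(rows[0])

# Velocity: typical spacing between consecutive timestamps, in seconds.
times = [datetime.fromisoformat(r["time"]) for r in rows]
gaps = [(later - earlier).total_seconds()
        for earlier, later in zip(times, times[1:])]
median_gap_s = statistics.median(gaps)


def looks_numeric(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


# Variety: which columns hold numbers and which hold text?
kinds = {
    col: "numeric" if all(looks_numeric(r[col]) for r in rows) else "text"
    for col in rows[0]
}

print(n_rows, n_cols, median_gap_s, kinds)
```

For real datasets the same questions usually start at the file level (how many files, how large, how often new ones arrive, in which formats) before moving to individual records.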

Quality

  • Veracity (Do we trust these data?)

  • Complete (Missing data?)

  • Corrupt (Files ok?)

  • Data Patterns

    • Distributions
    • Outliers
    • Drift
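The data-pattern checks above can be sketched with the standard library. The readings below are hypothetical, and the methods are swapped-in common choices rather than anything the session prescribes: Tukey's 1.5 × IQR rule for outliers, and a comparison of the medians of the first and second halves of the series as a very crude drift indicator.

```python
import statistics

# Hypothetical readings in time order; 42.0 is an injected spike.
values = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0,
          11.9, 12.2, 12.0, 12.4, 11.8, 12.1]

# Distribution: summarize with quartiles.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Outliers: flag points more than 1.5 * IQR outside the quartiles.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

# Drift: compare the median of the later half with the earlier half.
# Medians are used so the single spike does not dominate the comparison.
half = len(values) // 2
early, late = values[:half], values[half:]
shift = statistics.median(late) - statistics.median(early)

print(outliers, round(shift, 2))
```

Here the spike is flagged as an outlier, and the later readings sit about two units above the earlier ones. Whether that shift is real drift, a sensor change, or a seasonal effect is a question for the data provider or the metadata.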

Missing Data Tools

Python: missingno
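missingno plots missing-value patterns for a pandas DataFrame (for example with `missingno.matrix(df)`). Before plotting, it can help to see the underlying numbers. The sketch below is a standard-library stand-in, not missingno itself: the sample CSV and the set of strings treated as missing are assumptions.

```python
import csv
import io

# Hypothetical sample with blanks and an "NA" marker.
raw = """station,temp_c,wind_ms
A1,3.2,5.1
A2,,4.8
A3,2.7,NA
A4,,
"""

# Strings treated as missing; adjust to match the dataset's conventions.
MISSING = {"", "NA", "NaN", "null"}

rows = list(csv.DictReader(io.StringIO(raw)))

# Count missing values per column.
missing_counts = {
    col: sum(1 for r in rows if r[col].strip() in MISSING)
    for col in rows[0]
}

# Express the counts as a fraction of all rows.
missing_fraction = {col: n / len(rows) for col, n in missing_counts.items()}

# Count rows with no missing values at all.
complete_rows = sum(
    1 for r in rows if all(v.strip() not in MISSING for v in r.values())
)

print(missing_counts, missing_fraction, complete_rows)
```

Per-column counts show how much is missing; a visual tool such as missingno adds where it is missing, which helps reveal whether gaps cluster in time or co-occur across columns.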

Session Overview

  • Defining a project
  • Introduction to EDA
  • Choosing a language
  • Finding Data
  • Accessing Data
  • Patterns of Analysis
  • Introduction to Data Formats

Finding Data

NOAA Open Data Dissemination

Other Sources

  • USGS, DOC, Census, USDA, NASA, data.gov, etc.

Accessing Data

What to look for in your data?

  1. Has the information you need.
    • Measures what you want.
    • Is complete.
    • Is spatiotemporally appropriate.

Accessing Data

What to look for in your data?

  2. Is accessible.
    • Cloud Object Storage
    • Format.
    • No throttling.