Earth System Data Science in the Cloud

Session Overview

  • Welcome
  • Interim Check-In
  • Data Driven Science
  • What is Data Science?
  • Where We Are/Context
  • Course Goals and Objectives
  • Module Goals and Objectives
  • Course Logistics

Welcome To Module 3!

Interim Check-In


How did you apply Module 2 to your work?


What did you explore?

Where are you in your team environment?

This Module

Data Product Development

Data Driven Science

Data Science

Course Goals & Objectives

  1. Make the Impossible Possible
  2. 10x Performance

Course Goals & Objectives

  • Conversant and practiced in developing cloud-based Earth system data science workflows.

  • Comfortable developing data science products.

  • Comfortable and practiced in working effectively on interdisciplinary teams.

  • Able to rapidly pick up new skills, tools, and techniques.

Principles & Practices

Module Goals & Objectives

By the end of this module, you will be familiar with and conversant in the following areas:

  • Principles of data cleaning and transformation.
  • Applications of parallel computing in the cloud.
  • Differences between statistical analysis and machine learning.
  • Applications of statistical analysis and machine learning to Earth system data in the cloud.

Module Goals & Objectives

Specifically, by the end of the course, you will have accomplished the following:

  • Cleaned data using parallel processing in a distributed cloud environment.
  • Prepared data using feature engineering for statistical analysis and machine learning.
  • Scaled analysis from small development environments to larger testing environments.
  • Built reproducible data pipelines to automatically create analysis-ready datasets.
  • Applied statistical models in parallel over large Earth system datasets.
  • Applied machine learning models at scale to large Earth system datasets.

Module Outline

Days:

  1. AI/ML Intro
  2. Feature Engineering, Metrics & ML
  3. Clusters
  4. ML Algorithms & Approaches
  5. Team Presentations and Next Steps

Team Project Outline

Days:

  1. Data Processing
  2. Feature Engineering
  3. ML Applications
    • Analysis Ready Data
    • First Run ML
  4. Presentation Practice
  5. Presentations

Team Project Deliverables

Presentation

  • 10 minutes (no more, can be less)
  • Team Name
  • Research Question (START)
  • Data Processing Methods
    • What and Why
  • ML Approach
  • Next Steps

Session Overview

  • Welcome
  • What is Data Science?
  • What is Earth System Data Science in the Cloud?
  • Course Goals and Objectives
  • Module Goals and Objectives
  • Course Logistics

Course Logistics

Strategies for Success

This is a lot of information.

  • You would not be here if you could not handle it.
  • Be present.
  • You will not understand everything the first time. That is OK!
  • Keep a journal of topics to return to and explore further.
  • You will see each topic/idea at least 3 times on separate days.
  • Ask questions.
  • Invest the time now…

Final Note

We made this course for you! We want your feedback!

Please reach out anytime on Slack or by email at dwillett@cicsnc.org or ggraham@cicsnc.org.

Team Pre-Project Assessment

ML Assessment