Fundamentals of Computing

Session Overview

  • Computers
  • Programming Languages
  • Python

Computers

  • What is a Computer?
    • Storage
    • Memory (RAM)
    • Compute
    • Networking

What is a Computer?

  • Storage
  • Memory (RAM)
  • Compute
  • Networking

Your Computer

Analysis: Weekly minimum and maximum temperatures for US weather stations for the past year.

  • Storage: 1TB SSD
  • Memory: 16GB RAM
  • Compute: 8 CPUs (No GPUs)
  • Networking: Not really a factor
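
To make the analysis above concrete, here is a minimal sketch of a weekly minimum/maximum calculation using only the Python standard library. The station IDs, dates, and temperatures are made-up sample values and `weekly_min_max` is a hypothetical helper name; a real analysis at this scale would more likely read station files with a library such as pandas.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily readings: (station_id, date, temperature in deg C).
# These values are invented for illustration only.
readings = [
    ("KSEA", date(2024, 1, 1), 3.5),
    ("KSEA", date(2024, 1, 2), 6.0),
    ("KSEA", date(2024, 1, 8), -1.0),
    ("KDEN", date(2024, 1, 1), -4.0),
    ("KDEN", date(2024, 1, 3), 2.5),
]


def weekly_min_max(rows):
    """Return {(station, iso_year, iso_week): (min_temp, max_temp)}."""
    grouped = defaultdict(list)
    for station, day, temp in rows:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        grouped[(station, iso[0], iso[1])].append(temp)
    return {key: (min(temps), max(temps)) for key, temps in grouped.items()}


summary = weekly_min_max(readings)
for key in sorted(summary):
    print(key, summary[key])
```

A year of US station data grouped this way fits comfortably in memory on a laptop, which is why networking is not really a factor for this version of the analysis.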

HPC

Analysis: Weekly minimum and maximum temperatures for global weather stations for all time.

  • Storage: 100s TBs
  • Memory: 256GB RAM
  • Compute: 256 CPUs (No GPUs)
  • Networking: 10-100 Gig/local storage

Cloud

On the cloud, Storage, Memory, Compute, and Networking are independently configurable.

Analysis: Weekly minimum and maximum temperatures for US weather stations for the past year.

Analysis: Weekly minimum and maximum temperatures for global weather stations for all time.

Programming Languages

  • What is a programming language?
  • Types of Programming Languages
  • Compiled and Interpreted
    • Scripting
  • Advantages and Disadvantages
  • The Programming Language Landscape

A Programming Language is a Translation Engine


Human Understandable Commands >>> Computer Understandable Commands


This engine can operate at multiple levels of abstraction and in different ways.

Types of Programming Languages

  • High Level: Python, R
  • Low Level: Assembly
  • Intermediate: C
  • Object Oriented and/or Functional
  • Compiled or Interpreted

Compiled vs. Interpreted Languages

Compiled

  • Engine converts human-understandable syntax to machine code before runtime.
  • Creates an executable program that is not human readable.
  • The compiler does not need to be installed on the machine that runs the program.
  • Tends to be faster at runtime (although compilation takes time).

Interpreted

  • Engine converts human-understandable syntax to machine instructions during runtime.
  • Runs scripts.
  • The interpreter must be installed on the machine that runs the script.
  • Tends to be slower at runtime (but there is no separate compile step).
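
The translation step can be observed from inside Python itself. The sketch below assumes CPython, the reference interpreter, which first compiles source text to an intermediate bytecode and then interprets that bytecode at runtime. The function `fahrenheit_to_celsius` is a made-up example, and the exact instruction names vary between Python versions, so no expected listing is shown.

```python
import dis


def fahrenheit_to_celsius(temp_f):
    """Made-up example function to inspect."""
    return (temp_f - 32) * 5 / 9


# compile() turns human-readable source text into a code object (bytecode),
# and eval() asks the interpreter to execute that code object.
code = compile("1 + 2", "<example>", "eval")
result = eval(code)
print(result)

# dis lists the bytecode instructions the interpreter will step through.
instructions = [ins.opname for ins in dis.get_instructions(fahrenheit_to_celsius)]
print(instructions)
```

A compiled language performs a comparable translation once, ahead of time, and ships the resulting machine code instead of the source.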

Advantages & Disadvantages

  • Performance
  • Human Development & Iteration
  • Level of Abstraction
  • Portability/Compatibility
  • Data Science Applications?

Programming Language Landscape

Compiled

  • Fortran
  • C Family
  • Rust
  • Java (compiled to bytecode that runs on the JVM)
  • Go

Interpreted

  • Python
  • R
  • JavaScript
  • Ruby
  • Julia (just-in-time compiled)
  • MATLAB

Python Overview

  • History
  • Advantages and Disadvantages
  • Parallelization (Global Interpreter Lock)
  • Other Languages…

Python

  • General-purpose programming language (thank you, Guido)
  • Awesome glue language
  • Very rapid development, but sometimes has issues
  • Multifaceted development: watch Pangeo, Anaconda, and many others
  • Everywhere

In Addition:

  • Open Source
  • Scripted
  • Really good REPL
  • Literate Programming Support
  • Flexible Extensions/Package/Library ecosystems

Advantages

  • Basic Stats and Analysis
  • Top-of-Class General AI/ML
  • Best-in-Class Deep Learning
  • De facto standard for LLMs
  • Connections between programming languages
  • Tabular Data!
  • Gridded data!
  • Clusters!

Disadvantages

  • Performance (Can be slow)
  • Significant whitespace (spaces, really?)
  • Less suited to some highly specialized domains

Parallelization in Python

  • Global Interpreter Lock (GIL)
    • Mutex that protects access to Python objects
    • Prevents multiple native threads from executing Python bytecodes at once
  • Multi-processing allows Python code to bypass the GIL by running in separate processes with their own interpreters
  • Multi-threading can still be useful for I/O-bound tasks where the GIL is less of a bottleneck

Also: Compiled libraries (C, Rust) can run code in parallel outside of the Python interpreter.
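
The two approaches above can be sketched with the standard library's `concurrent.futures` module. This is an illustration under stated assumptions rather than a benchmark: `io_task` stands in for an I/O-bound call by sleeping, `cpu_task` is a made-up CPU-bound loop, and the worker counts and sizes are arbitrary.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def io_task(seconds):
    """Stand-in for an I/O-bound call; sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds


def cpu_task(n):
    """Made-up CPU-bound work: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_threads(delays):
    # Threads overlap their waiting, so total time is close to the longest delay.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(io_task, delays))


def run_processes(sizes):
    # Each worker process has its own interpreter and therefore its own GIL.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(cpu_task, sizes))


if __name__ == "__main__":
    start = time.perf_counter()
    run_threads([0.2] * 4)
    print(f"4 threaded waits took {time.perf_counter() - start:.2f}s")
    print(run_processes([100_000, 200_000]))
```

The `if __name__ == "__main__":` guard matters for the process pool: on platforms that start workers by re-importing the main module, unguarded pool creation would run again in every worker.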

Other Data Science Languages

To Be Aware Of:

  • R, Julia

To Be Familiar With:

  • JavaScript - anything and everything web
  • Rust - the data engineering language of the future
  • SQL - the de facto standard for accessing and analyzing tabular data and many cloud APIs
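
As a small illustration of SQL on tabular data, the sketch below runs a grouped query through Python's built-in `sqlite3` module against an in-memory database. The table name, column names, and sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, week INTEGER, temp_c REAL)")

# Invented sample rows: (station, week number, temperature in deg C).
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("KSEA", 1, 3.5),
        ("KSEA", 1, 6.0),
        ("KSEA", 2, -1.0),
        ("KDEN", 1, -4.0),
        ("KDEN", 1, 2.5),
    ],
)

# GROUP BY collapses the rows into one summary row per station and week.
rows = conn.execute(
    """
    SELECT station, week, MIN(temp_c), MAX(temp_c)
    FROM readings
    GROUP BY station, week
    ORDER BY station, week
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The same `SELECT ... GROUP BY` pattern carries over to most SQL databases and cloud query services with little or no change.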