Week 1 The computer environment and Python: Introduction to Coding and Objective Analysis for the Atmospheric and Oceanic Sciences


Statistics and data analysis resources for AOS

We have also created a Dropbox folder containing some of the material below, and that can be accessed here. Please send suggestions on how best to make these resources available.

Published books

• Daniel S. Wilks. Statistical methods in the atmospheric sciences, volume 100 of International geophysics series. Academic Press, 2011

• Hans von Storch and Francis W. Zwiers. Statistical analysis in climate research. Cambridge University Press, 1999

Course lecture notes available online

Some of the most valuable material lives in the lecture notes of courses at other universities. Note that the first two below use MATLAB code for most examples, though the syntax is very similar to Python, and we are happy to help translate.

• Dennis Hartmann’s (U. Washington) course notes for “Objective analysis”

• Chris Bretherton’s (U. Washington) course notes for “Computational methods for data analysis”

• Julien Emile-Geay’s (USC) course notes for “Data analysis in the earth & environmental sciences”

Other good links

Websites

• The folks at NCAR have written a great Climate Data Guide (intro to common tools and methods)

• Johnny Lin, a professor and Python enthusiast, maintains his own Python for the Atmospheric and Oceanic Sciences blog

Documents

• Silvia A. Venegas’ “Statistical methods for signal detection in climate”

• Abdel Hannachi’s “Primer for EOF analysis of climate data”

• Charles E. Grinstead and J. Laurie Snell’s “Introduction to probability”

Green Tea Press

Allen B. Downey, an engineering professor at Olin in Massachusetts, also writes great (and free!) books on learning Python, as well as introductory and advanced statistics with a Python bent. A few of his books are linked below, though more can be found at www.greenteapress.com.

• Think Python → a book on how to ‘think like a computer scientist’ using Python (introductory chapters are great for beginners; later chapters are heavy on object oriented programming)

• Think Stats → a book on probability and statistics using Python

• Think Bayes → a book on Bayesian statistics using Python


UNIX/LINUX

An operating system that is widely used for scientific programming

Interface is the command line in a terminal shell

Synoptic lab computers, professor clusters, Apple products (OSX), Google products (Android), and supercomputers use it

Top-level directory (here the home directory, /home/neil/)

Subdirectory #1 (/home/neil/scripts/): hello_world.py, fortran_is_old.f90, matlab_is_my_friend.m

Subdirectory #2 (/home/neil/figures/): america.eps, china.png, chile.jpg


20 useful commands

1.  “ls” – list the contents of a directory (-l, -h, -r, -t)
2.  “cd” – change directories
3.  “pwd” – print current working directory
4.  “cp” – copy a file to a new filename or to a new directory (-r to copy directories)
5.  “mv” – change a filename or move a file to a new directory
6.  “rm” – remove files (-f, -r)
7.  “mkdir” and “rmdir” – make and remove a directory
8.  “sudo” – “superuser do”: run the following command with administrator privileges (requires your password and sudo rights)
9.  “man” – print the manual for the following command
10.  “which” – show location of shell commands
11.  “top” – print details of running processes (users and operations going on now)
12.  “ssh” – secure shell login to remote machines (ssh -Y or ssh -X for X11 graphics forwarding)
13.  “scp”, “sftp”, “rsync”, “ftp” – transfer files from one machine to another (all but plain ftp do so securely)
14.  “bg” – resume a suspended job in the background (“jobs” prints what jobs are running in the background)
15.  “.” or “./” – refer to the current directory, e.g. to put files there or to run a script located there
16.  “screen” – run a long script in a screen session, which allows you to log off and not kill the script
17.  “ctrl-c” or “ctrl-z” – kill a script (ctrl-c) or suspend it (ctrl-z)
18.  “ctrl-a”, “ctrl-e” – move to the beginning or end of the command line
19.  “cat” – print the contents of a file on the console without opening an editor
20.  “grep” – a magical find-all command: prints every line in a file that matches a pattern


Package managers

A package manager is a collection of software and tools that helps you install things cleanly and easily on your computer.

This is the best way to keep installations clean and manageable. If you have to install software you will use on the command line, USE A PACKAGE MANAGER IF POSSIBLE.

•  For Mac – homebrew (http://brew.sh/)
•  For LINUX – yum (http://yum.baseurl.org/) – also apt, aptitude, pacman
•  For Windows – chocolatey (https://chocolatey.org/)
•  Anaconda – use for Python! (all operating systems)


Text Editors

•  Simply a platform to type code, read text files, or write love notes to your operating system
•  There are so many!
–  Some are great for coding on your laptop, like TextWrangler (OSX), SublimeText, etc.


Text Editors

•  But there are two “classics” that unix systems usually have pre-installed
–  vi (or vim)
–  emacs
–  the learning curve is steep for each, but the payoff is that you can code on any unix machine without any editor compatibility issues
–  “A 2009 survey of Linux Journal readers found that vi was the most widely used text editor…” (Wikipedia aka FACT).


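Your bash profile

•  A script that is executed each time a bash login shell (i.e. a terminal) starts
•  Edit it: vi ~/.bash_profile
•  Reload it after editing: source ~/.bash_profile

(“~” is your home directory; files whose names start with “.” are “hidden” files)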


Your bash profile

Let’s get a nice prompt on the command line (\u = user name, \h = host name, \w = current working directory):

export PS1="\u@\h\w: "

neilberg@whiz~: [type commands here]


Your bash profile

Let’s get some color highlighting (machine dependent):

export LSCOLORS=ExFxBxDxCxegedabagacad

final_paper.pdf

coding_class/


Your bash profile

Let’s define some shortcuts:

alias cheddar="ssh -Y [email protected]"

alias ls="ls -l"

Typing on the command line:

>>> cheddar (executes ssh -Y [email protected])

>>> ls (executes ls -l)


Your bash profile 

Let’s define some paths: 

export NCARG_ROOT=/usr/local/ncarg

>>> echo $NCARG_ROOT

/usr/local/ncarg

Or prepend a directory to your command search path:

export PATH=$NCARG_ROOT/bin:$PATH

>>> echo $PATH

/usr/local/ncarg/bin:/Users/neilberg/anaconda/bin
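To make the effect of `export PATH=$NCARG_ROOT/bin:$PATH` concrete, here is a minimal Python sketch of the same string operation. The helper name `prepend_to_path` and the starting value of `old_path` are assumptions made for illustration; they are not part of the slides. It only shows that PATH is a separator-delimited list of directories searched from left to right.

```python
import os


def prepend_to_path(new_dir, path_string):
    """Return path_string with new_dir placed in front, like PATH=new_dir:$PATH."""
    return new_dir + os.pathsep + path_string


# Hypothetical starting value, mirroring the example above
old_path = "/Users/neilberg/anaconda/bin"
ncarg_root = "/usr/local/ncarg"

new_path = prepend_to_path(ncarg_root + "/bin", old_path)
print(new_path)

# The shell looks for commands in these directories from left to right
search_order = new_path.split(os.pathsep)
print(search_order)
```

Because the new directory comes first, a command found in `/usr/local/ncarg/bin` would take precedence over one with the same name in a later directory.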


Why should I code in Python?

•  Free
–  completely free. 100% free.
•  Open source
–  no company or person owns Python
–  coding geniuses are constantly refining, improving, and adding to the language
•  Readability
–  the “code” reads like speech; forced indents make the code clean and organized
•  High level
–  a ton of the real computer science is done behind the scenes, e.g. no separate compile step
•  Mathematical, scientific, plotting libraries
–  powerful, well documented, and only a few keystrokes away
•  Jobs after you graduate
–  NASA/NOAA/NCAR/NWS, Los Alamos National Laboratory, LLNL, and ESRI all use Python…oh, and same with Google, Reddit, Yahoo, Walt Disney Animation, Pinterest, Dropbox, YouTube, and Yelp.


The Zen of Python (https://www.python.org/dev/peps/pep-0020/)

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Readability counts.
    In the face of ambiguity, refuse the temptation to guess.
    Now is better than never.


[Figure: TIOBE programming language index, http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html]


Coding in Python - resources

•  “Think Python” by Allen Downey (free and awesome)
–  http://www.greenteapress.com/thinkpython/thinkpython.pdf
•  The official documentation (it’s very well written, always has examples)
–  http://docs.scipy.org/doc/numpy/reference/
•  Stack Overflow (stackoverflow.com)
–  Smarter people have already coded your problem (or parts of it)
–  Search previously submitted questions or submit your own
–  Pure programmers can be egotistical and obnoxious. Don’t take offense, they’re just jealous they aren’t atmospheric, oceanic, and space scientists.
•  Google anything and everything
–  copy and paste error messages
–  search for help through message boards, listservs, blog posts, etc.
–  seriously, how did people code before Google?


Coding in Python – installation

•  Xcode – Apple software development tools (https://developer.apple.com/xcode/downloads/), also available through the App Store
•  XQuartz if Mac: http://xquartz.macosforge.org/landing/
•  Python
–  Pre-installed on all Macs, but not all packages you will need
    •  To install missing packages, you can use Homebrew: http://brew.sh/
–  Anaconda is the easiest way to get everything: http://continuum.io/downloads
    •  Available for Windows, Mac, and Linux platforms
–  Other options:
    •  Enthought Canopy - https://www.enthought.com/products/canopy/


Python 2 vs Python 3

•  I (Neil) code with Python version 2.7, but if I started today, I’d go with version 3.4.
–  2.7 is most commonly used today because of its robust libraries and historical usage
–  3.4 is newer and actively being developed, but may not have every library that is available in 2.7
•  There are some incompatible differences, like the “print” command. Aside from that, you probably won’t notice a difference.
•  The following slides will use version 3 syntax, but this syntax was backported (allowed) in version 2.x. So you will not need to change anything if you’re using 2.x.


The essential modules

•  NumPy – Numerical Python (http://www.numpy.org/)
–  arrays (matrices), linear algebra, Fourier transform
–  make the switch now, MATLAB users: http://wiki.scipy.org/NumPy_for_Matlab_Users
•  SciPy – Scientific Python (http://www.scipy.org/)
–  advanced mathematical algorithms
•  Matplotlib – Plotting (http://matplotlib.org/)
–  MATLAB style plotting, but in Python. Winning!
•  netCDF4 – Network Common Data Form version 4 (https://code.google.com/p/netcdf4-python/)
–  the format that many data sets are stored in
–  atmospheric/climate models, reanalyses, observations


Let’s code 

Guido van Rossum, creator of Python 


Check installations

•  Open an interactive Python shell

 

>>> python

Python 2.7.9 |Anaconda 2.0.1 (x86_64)| (default, Dec 15 2014, 10:37:34) [GCC 4.2.1 (Apple Inc. build 5577)] on darwin


Check installations

•  Import the modules you need

>>> import numpy
>>> [blank line if installed correctly]
>>> import numpyy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named numpyy
>>> import scipy
>>> import matplotlib
>>> import netCDF4
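The same check can be run in one go rather than one import at a time. The sketch below is an addition to the slides, not the course's own method: it assumes Python 3.4 or later (for `importlib.util.find_spec`), and the helper name `is_installed` is hypothetical. The deliberately misspelled `numpyy` is included to show what a missing module looks like.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module called module_name."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of each module without raising ImportError
for name in ["numpy", "scipy", "matplotlib", "netCDF4", "numpyy"]:
    status = "found" if is_installed(name) else "MISSING"
    print(name + ": " + status)
```

Any module reported as MISSING can then be installed with your package manager.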


Code on!

•  Make the computer say “Hello World!”

>>> print('Hello World!')
Hello World!

[Note: Python accepts single or double quotes for strings, but they must be plain (straight) quotes]


Code on!

•  For loops – iterating over each item in a collection, one item at a time

>>> for course in ['AOS1', 'AOS2', 'AOS3']:
...     print('I have taken ' + course)
...
I have taken AOS1
I have taken AOS2
I have taken AOS3

Notes:
1. anything in square brackets is a “list”
2. the “for” line must end with a colon
3. all code inside the for loop must be indented (4 spaces by convention)
4. concatenating (joining) strings can be done through the + sign
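A slightly longer sketch of the same loop may help show where the indented block ends and what + can and cannot join. The use of `enumerate` and `str()` here is an addition for illustration and does not appear in the slides; the course names are the ones from the example above.

```python
courses = ['AOS1', 'AOS2', 'AOS3']   # anything in square brackets is a list

messages = []
for number, course in enumerate(courses, start=1):
    # + only joins strings, so the integer is converted with str() first
    message = 'Course ' + str(number) + ': I have taken ' + course
    messages.append(message)
    print(message)

# This line is not indented, so it runs once, after the loop has finished
print('Total courses: ' + str(len(courses)))
```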


Callable scripts

•  The interactive shell is great for simple code and debugging, but we’re writing 100+ line scripts that need to be saved.
•  All Python scripts end with .py
•  Run the code in the current directory with

>>> python my_script_name.py

If the script is executable (its first line is “#!/usr/bin/env python” and you have run “chmod +x my_script_name.py”), you can also call it directly as ./my_script_name.py. Adding “.” to the $PATH environment variable in your .bash_profile avoids having to type ./ every time you call a Python script:

export PATH=.:$PATH
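For reference, a saved script might be laid out as in the sketch below. The file contents, the `main` function, and the message are hypothetical and only illustrate a common layout; the slides do not prescribe one.

```python
#!/usr/bin/env python
"""my_script_name.py: a hypothetical minimal script layout."""


def main():
    # Put the work of the script here
    message = 'Hello from a saved script!'
    print(message)
    return message


# This block runs only when the file is executed as a script,
# not when it is imported from another Python file
if __name__ == '__main__':
    main()
```

Keeping the work inside a function makes it easy to reuse the code later by importing the file from another script.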


NumPy – simple stats 

>>> import numpy as np

>>> a = np.array([1,2,3,4,5], dtype='int')

>>> a

array([1, 2, 3, 4, 5])

>>> a + 5

array([ 6, 7, 8, 9, 10])

>>> np.mean(a)

3.0

>>> np.std(a)

1.41421356237

>>> np.std(a, ddof=1)

1.58113883008
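The two standard deviations differ only in the divisor: `np.std(a)` divides by N (population form), while `ddof=1` divides by N - 1 (sample form). The sketch below recomputes both by hand for the same array; the variable names are chosen here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5], dtype='int')
n = a.size
squared_deviations = (a - a.mean()) ** 2

# Population standard deviation: divide by N (NumPy's default, ddof=0)
population_std = np.sqrt(squared_deviations.sum() / n)

# Sample standard deviation: divide by N - 1 (ddof=1)
sample_std = np.sqrt(squared_deviations.sum() / (n - 1))

# Each pair of values should agree
print(population_std, np.std(a))
print(sample_std, np.std(a, ddof=1))
```

For short observational records the N - 1 form is usually the one wanted, since it estimates the spread of the underlying population from a sample.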


NumPy – mul7‐dimensional arrays 

>>> b = np.array([[5,6,7,8], [9,10,11,12]])

>>> b

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

>>> first_column = b[:,0]

>>> first_row = b[0,:]

>>> first_column

array([5, 9])

>>> first_row

array([5, 6, 7, 8])

Python indexing starts at ZERO!!! 

>>> for value in first_row:
...     if value > 6:
...         print(value)
...
7
8
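The loop-and-test pattern above can also be written without a loop by using a boolean mask, which is the more common NumPy idiom. This is an alternative added for illustration, not something shown on the slides.

```python
import numpy as np

b = np.array([[5, 6, 7, 8], [9, 10, 11, 12]])
first_row = b[0, :]

# An elementwise comparison gives a boolean array of the same shape
mask = first_row > 6
print(mask)

# Indexing with the mask keeps only the elements where it is True
large_values = first_row[mask]
print(large_values)
```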


NumPy – indexing and slicing

•  As we’ve seen, Python indexing starts at ZERO!
•  Moreover, a “range” of numbers in Python does NOT include the end value.

(In Python 3, range returns a range object, so wrap it in list() to see the values; this also works in 2.x.)

>>> list(range(5))
[0, 1, 2, 3, 4]
>>> list(range(1,10))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

What value results from this expression?

>>> range(1,10)[5]

>>> range(1,10)[5]
6

Why not 5?

Because Python indexing starts at zero, so index 5 holds the 6th value in the range.

>>> list(range(1,10))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

indices: [0th, 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th]
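Printing each index next to its value makes the off-by-one relationship easy to see. The sketch below is an added illustration; `enumerate` is not used in the slides.

```python
values = list(range(1, 10))   # list() gives a real list in Python 2 and 3

# Show every index alongside the value stored there
for index, value in enumerate(values):
    print('index ' + str(index) + ' holds value ' + str(value))

print(values[5])    # index 5 is the sixth element
print(values[0])    # the first element
print(values[-1])   # the last element; the end value 10 is excluded
```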


Next level slicing

>>> numbers = list(range(10))
>>> numbers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

•  Reverse
>>> numbers[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

•  Every second element
>>> numbers[::2]
[0, 2, 4, 6, 8]

•  Second half (using the “len” function; // is integer division, which an index requires)
>>> numbers[len(numbers)//2:]
[5, 6, 7, 8, 9]
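All of these are cases of the general slice form `sequence[start:stop:step]`, where any of the three parts may be left out and `stop` is excluded, just as with range. A few more combinations, added here for illustration with hypothetical variable names:

```python
numbers = list(range(10))

first_three = numbers[:3]     # start defaults to 0; stop (index 3) is excluded
middle = numbers[3:7]         # indices 3, 4, 5, 6
every_third = numbers[::3]    # step of 3 across the whole list
last_two = numbers[-2:]       # negative indices count from the end

half = len(numbers) // 2      # integer division, so the result can be an index
first_half = numbers[:half]

print(first_three, middle, every_third, last_two, first_half)
```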


NumPy – missing values

•  “nan” is handled very well in NumPy
–  recommend setting missing values (e.g. 1e20, -999, m, --) to “np.nan”

>>> precip = np.array([25.3, 26.2, 24.5, -999])

>>> precip[precip==-999] = np.nan

or

>>> precip[precip < 0] = np.nan

>>> precip

array([ 25.3, 26.2, 24.5, nan])

•  Average with missing (nan) values
>>> np.nanmean(precip)
25.333333333333332

•  Sum with missing (nan) values
>>> np.nansum(precip)
76.0

•  Max/min with missing (nan) values
>>> np.nanmax(precip)
26.199999 (this is 26.2; the extra decimal places appear because 26.2 cannot be represented exactly as a float64)
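It is worth seeing why the nan-aware functions are needed: the ordinary `np.mean` returns nan as soon as one value is missing. The sketch below uses the same precip array; the `missing` mask is an added illustration. Note that nan can only be stored in floating-point arrays, which precip already is.

```python
import numpy as np

precip = np.array([25.3, 26.2, 24.5, -999])
precip[precip == -999] = np.nan      # flag the missing value

missing = np.isnan(precip)           # boolean mask of missing entries
print(missing.sum())                 # number of missing values

print(np.mean(precip))               # nan: one missing value spoils the plain mean
print(np.nanmean(precip))            # mean of the valid values only
print(precip[~missing].mean())       # same result using the mask explicitly
```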