1 PC204 Lecture 4 Data Structures Copyright 2011 by Tom Ferrin and the Regents of the University of California. All rights reserved

1

PC204

Lecture 4

Data Structures

Copyright 2011 by Tom Ferrin and the Regents of the University of California. All rights reserved.

2

First, to answer a question that came up…

How do you include Python modules that are saved in a different directory?

This example comes from before swampy was a Python package. Suppose you have a bunch of modules saved in a directory called “swampy” like this…

pythagoras> ls -l ~/Desktop/pc204/CaseStudy1/swampy
-rw-r--r-- 1 tef staff 6210 Oct 5 12:09 AmoebaWorld.py
-rwxr-xr-x 1 tef staff 6492 Oct 5 12:09 CellWorld.py*
-rwxr-xr-x 1 tef staff 39590 Oct 5 12:09 Gui.py*
-rwxr-xr-x 1 tef staff 51450 Oct 7 19:55 Gui.pyc*
-rwxr-xr-x 1 tef staff 47260 Oct 5 12:09 Lumpy.py*
-rwxr-xr-x 1 tef staff 16566 Oct 5 12:09 Sync.py*
-rw-r--r-- 1 tef staff 3938 Oct 5 12:09 TurmiteWorld.py
-rw-r--r-- 1 tef staff 7814 Oct 5 12:09 TurtleWorld.py
-rwxr-xr-x 1 tef staff 6122 Oct 5 12:10 World.py*
-rw-r--r-- 1 tef staff 286 Oct 5 12:09 coke.py
-rw-r--r-- 1 tef staff 2761 Oct 5 12:09 danger.gif
-rwxr-xr-x 1 tef staff 265 Oct 5 12:09 lumpy_test.py*
-rwxr-xr-x 1 tef staff 226 Oct 5 12:09 lumpy_test2.py*
-rwxr-xr-x 1 tef staff 301 Oct 5 12:09 lumpy_test3.py*
-rw-r--r-- 1 tef staff 130 Oct 5 12:09 mutex.py
-rw-r--r-- 1 tef staff 377 Oct 5 12:09 readwrite.py
-rw-r--r-- 1 tef staff 283 Oct 5 12:09 turtle_code.py
-rw-r--r-- 1 tef staff 1130523 Oct 5 12:10 words.txt

And suppose you’re working in the directory “CaseStudy1.” You can’t just say “import” because Python looks for the module only in the current directory and in its standard library locations, not in the swampy subdirectory, so this will happen…

3

pythagoras> pwd
/Users/tef/Desktop/pc204/CaseStudy1
pythagoras> python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
>>> from TurtleWorld import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named TurtleWorld
>>>

You need to tell Python to also look in the swampy directory for modules, in addition to the current directory. You can do this with the following two lines of code:

>>> import sys
>>> sys.path.insert(0, "/Users/tef/Desktop/pc204/CaseStudy1/swampy")

Now when you do the import, everything works fine...

>>> from TurtleWorld import *
>>>

The two lines of code tell Python to first import the "sys" module (sys is short for system and is a module containing functions and variables that interface with the Python interpreter and its runtime environment), and then to insert a string at the front of the path variable (sys.path), so that it is searched first. The string is the pathname of the directory containing the other modules. If you have several modules in different directories that you want to import from, you can just use multiple sys.path.insert() lines, each specifying a different directory.
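As a minimal sketch of the same idea in script form, the path manipulation can be wrapped in a small function that avoids inserting the same directory twice. The helper name add_module_dir and the relative directory used here are hypothetical, chosen only for illustration; the code is written to run under both the Python 2 used in these notes and Python 3.

```python
import os
import sys

def add_module_dir(directory):
    """Insert 'directory' at the front of sys.path if it is not already listed.

    Hypothetical helper for illustration; returns the absolute path used.
    """
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)   # index 0 means "searched first"
    return directory

# Hypothetical location; substitute the directory that holds your modules.
swampy_dir = add_module_dir(os.path.join("CaseStudy1", "swampy"))
print(swampy_dir in sys.path)
```

Python does not complain if the directory does not exist; the import simply fails later, so a misspelled path shows up as an ImportError rather than at the insert.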

4

This technique works on Linux, Mac OS X, and Windows, although the pathname looks a little different on Windows...

>>> import sys
>>> sys.path.insert(0, "E:\\home\\conrad\\swampy")

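This is because Windows uses a different way to specify directories: a drive letter followed by names separated by backslashes. Because the backslash is also the escape character in Python string literals, you must write two backslashes for every single backslash in the pathname.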
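The sketch below illustrates why the doubling matters. It compares the doubled-backslash form with a raw string (the r"..." prefix, which tells Python to leave backslashes alone) and shows what happens when a single backslash is accidentally read as an escape. The path E:\temp\new is a made-up example.

```python
# Two ways to write the same Windows directory name in Python source.
doubled = "E:\\home\\conrad\\swampy"    # each backslash escaped
raw = r"E:\home\conrad\swampy"          # raw string: backslashes taken literally

print(doubled == raw)                   # the two spellings give the same string

# A hypothetical path written with single backslashes goes wrong silently:
# "\t" becomes a tab character and "\n" becomes a newline.
bad = "E:\temp\new"
print("\t" in bad)
print("\n" in bad)
```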

On Linux and Mac OS X there’s an alternative way to accomplish the same thing using the PYTHONPATH environment variable…

pythagoras> export PYTHONPATH="/Users/tef/Desktop/pc204/CaseStudy1/swampy"

…and if you want to avoid typing this in every time you open a Terminal window, you can just add this command to your .bashrc file. If you have multiple directories you want to add, put them in a colon-separated list…

pythagoras> export PYTHONPATH="/Users/tef/Desktop/pc204/CaseStudy1/swampy:/Users/tef/MyModules"

5

Quick Review from previous classes…

Python programs can be decomposed into modules, statements, and objects:

Programs are composed of modules;

Modules contain statements;

Statements create and process objects.

“Objects” are also known as “data structures” in some programming languages. They’re called objects in Python to distinguish them from such structures, because the low-level data structure manipulation functions often needed in many programming languages aren’t needed in Python.

Python has several built-in object types. These are…

Object Type     Examples
Numbers         3.1416, 42, 123456789L
Strings         'pc204', "Tom's Story"
Files           text = open('eggs', 'r').read()
Lists           [1, [2, 'three'], 4]
Dictionaries    {'food': 'spam', 'taste': 'yum'}
Tuples          (1, 'spam', 4, 'U')

6

Numbers can be any of several types…

Constant                Interpretation
1234, -24, 0            Normal integers
999999999L              Long integer (see below)
1.23, 3.14e-10, 0.0     Floating point number
0177, 0x9ff             Octal and hexadecimal constant
3+4j, 3.0+4.0j, 3J      Complex number constants

Integers can be in the range of -2,147,483,648 to 2,147,483,647 (i.e. roughly +/- 2.15 billion) on a 32-bit build of Python; 64-bit builds allow a larger range.

Floating point numbers can range from +/- 4.9e-324 to +/- 1.8e+308 and have approximately 16 digits of precision.

Long integers have unlimited precision (i.e. they can have as many digits as your memory space allows).
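These limits can be inspected from Python itself. The following is a small sketch using sys.float_info, which is available from Python 2.6 onward. Note that sys.float_info.min reports the smallest positive normalized float; the 4.9e-324 figure quoted above is the smallest "subnormal" value, which carries less precision.

```python
import sys

print(sys.float_info.max)     # largest finite float, about 1.8e+308
print(sys.float_info.min)     # smallest positive normalized float
print(sys.float_info.dig)     # decimal digits guaranteed to survive a round trip
print(5e-324 > 0.0)           # the smallest subnormal is still greater than zero

big = 2 ** 100                # integers grow past the machine word size as needed
print(big)
```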

More about floating point numbers…

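Floating-point numbers are represented in computer hardware as base 2 (binary) fractions. For example, the decimal fraction 0.125 has the value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction 0.001 has the value 0/2 + 0/4 + 1/8. These two fractions have identical values; the only difference is that the first is written in base 10 and the second in base 2.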

Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine.

Consider the fraction 1/3. You can approximate that as a base 10 fraction as 0.3333333333333. But no matter how many digits you specify, the result is never exactly 1/3.

In the same way, no matter how many base 2 digits you’re willing to use, the decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base 2, 1/10 is the infinitely repeating fraction 0.0001100110011001100110011001100110011001100110011...

It’s easy to forget that the stored value is an approximation to the original decimal fraction, because of the way that floats are displayed at the interpreter prompt. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine.

7

If Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display…

0.1000000000000000055511151231257827021181583404541015625

Since this is more digits than most people find useful, Python keeps the number of digits manageable by displaying a rounded value instead, so that 0.1 prints as 0.1

It’s important to realize that this is, in a real sense, an illusion: the value in the machine is not exactly 1/10, you’re simply rounding the display of the true machine value. This fact becomes apparent as soon as you try to do arithmetic with these values…

>>> 0.1 + 0.2

0.30000000000000004

Note that this is in the very nature of binary floating-point: this is not a bug in Python, and it is not a bug in your code either. You’ll see the same kind of thing in all computer languages that support floating-point arithmetic.

Binary floating-point arithmetic holds several surprises like this. You should read The Perils of Floating Point by Bruce Bush for a more complete account of other common surprises: http://www.lahey.com/float.htm
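One practical consequence is that floats should usually be compared with a tolerance rather than with ==. The sketch below shows this, and also uses the decimal module to print the exact value stored for 0.1. The tolerance 1e-9 is an arbitrary choice for illustration, and constructing a Decimal directly from a float requires Python 2.7 or later.

```python
from decimal import Decimal

total = 0.1 + 0.2
print(total == 0.3)               # False: the binary approximations differ
print(abs(total - 0.3) < 1e-9)    # True: equal to within a small tolerance

# Decimal(float) converts exactly, revealing the value actually stored.
print(Decimal(0.1))
```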

8

9

Python Lists

Lists are ordered collections of arbitrary objects that can be accessed by offsets (just like strings), can vary in length, and can contain other lists (i.e. are nestable). Unlike strings, lists are mutable sequences because they can be modified in place, which means they support operations like deletion, index assignment, and methods. Lists contain, technically, zero or more references to other objects.

Common List Expressions and Methods:

Operation                     Interpretation
L1 = []                       Creates an empty list
L2 = [0, 1, 2, 3]             A four element (or item) list
L3 = ['one', 'two', [1, 2]]   Nested sublists
L2[2]                         Third item in a list
L3[2][0]                      First sublist item in the third list item
L2[i:j]                       Slice (just like in strings)
len(L3)                       Length (just like in strings)
L1 + L2                       Concatenation
L1 * 4                        Repetition
for x in L2:                  Iteration
'two' in L3                   Membership test

10

Common List Expressions and Methods (continued):


L2.append(4)            Grow list at end by 1 item (a '4')
L2.extend([1, 2, 3])    Grow list at end by multiple items
L2.sort()               Sort the list
L2.index(n)             Find index of 'n' in list
L2.reverse()            Reverse the items in the list
del L2[k]               Remove the item at offset k from the list
L2[i:j] = []            Remove the items at offsets i through j-1
L2[2] = 42              Replace 3rd item in the list (index assignment)
L2[1:3] = [0, 0]        Replace 2nd & 3rd list items with zeros (slice assignment)

Just like with strings, items in a list are fetched by indexing; i.e., providing the numeric offset of the desired item in the list. In other words, indexing a list begins at 0 and ends at one less than the length of the list. You can also fetch items from a list using negative offsets, which just count backwards from the end of the list.

11

Here are some simple examples:

>>> L1=[1, 2, 3, 4]
>>> L2=[5, 6]
>>> len(L1)
4
>>> L1[1:3]      # indexing begins at 0!
[2, 3]
>>> L1[:3]       # missing first index means from the beginning
[1, 2, 3]
>>> L1[2:]       # missing second index means through end of the list
[3, 4]
>>> L1[0:-1]     # -1 means up to but not including the last item
[1, 2, 3]
>>> L1 + L2
[1, 2, 3, 4, 5, 6]
>>> L1.append(L2)    # appends L2 itself as a single (nested) item
>>> print L1
[1, 2, 3, 4, [5, 6]]

Indexing and Slicing of lists are very common operations, so just remember...

Indexing (L[i]):
• Fetch items at offsets (the first item is at offset zero)
• Negative indexes mean to count from the end
• L[0] fetches the first item
• L[-2] fetches the second item from the end

Slicing (L[i:j]):
• Extracts contiguous section of items from a list
• Slice boundaries default to zero and the list length
• L[1:3] fetches from offsets 1 up to but not including 3
• L[1:] fetches from offset 1 through the end
• L[:-1] fetches from offset 0 up to but not including the last item
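Because lists are mutable, the same index and slice notation also works on the left side of an assignment. The following sketch walks one list through several of the in-place operations from the tables above; the comment on each line shows the list after that step.

```python
L = [0, 1, 2, 3]
L.append(4)          # [0, 1, 2, 3, 4]
L.extend([5, 6])     # [0, 1, 2, 3, 4, 5, 6]
L[2] = 42            # index assignment:  [0, 1, 42, 3, 4, 5, 6]
L[1:3] = [0, 0]      # slice assignment:  [0, 0, 0, 3, 4, 5, 6]
del L[0]             # [0, 0, 3, 4, 5, 6]
L[3:5] = []          # removes offsets 3 and 4: [0, 0, 3, 6]
L.reverse()          # [6, 3, 0, 0]
L.sort()             # [0, 0, 3, 6]
print(L)
print(L[-1])         # negative offset: last item, 6
```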

12


Example usage in a function:

def extract(s):
    """extract space-separated words from string 's' and return each
    word in a list"""
    result = []                  # begin with empty list
    while s:
        k = 0                    # k will be the end of a slice
        for c in s:              # inspect chars in string one at a time
            if c == ' ':
                break            # exit the loop if a space
            k = k + 1            # increment slice limit by 1
        result.append(s[0:k])    # add a new word to list
        s = s[k+1:]              # save remaining string
    return result                # done - return the list to the caller

print extract('Now is the time for all good people')
['Now', 'is', 'the', 'time', 'for', 'all', 'good', 'people']
print extract('Now')    # how about with just one word in s?
['Now']

It's always good to test your code, especially at the "boundaries" of where it's designed to work. For example, what if we call extract() with an argument that doesn't have any words in it?

print extract('') # test for correct result with null str

[]

That seems right. But what about a string that only has spaces and no words?

print extract('   ') # a string with three spaces

['', '', '']

Hmm. Is a list of three items the correct result?

13

At least for strings, most would argue that it doesn't matter how many spaces there are between words. So, it's one or more spaces that separate words in a string. Similarly, if there aren't any words at all in the string, even if there are some spaces, then the right answer for our function to return should be an empty list. So we need to fix our code and re-test it...

def extract(s):
    """extract space-separated words from string 's' and return each
    word in a list"""
    result = []                      # begin with empty list
    while s:
        k = 0                        # k will be the end of a slice
        for c in s:                  # inspect chars in string one at a time
            if c == ' ':
                break                # exit the loop if a space
            k = k + 1                # increment slice limit by 1
        if k != 0:                   # only add a new word if one was found
            result.append(s[0:k])    # there was, so add it
        s = s[k+1:]                  # save remaining string
    return result                    # done - return the list to the caller

print extract('   ')                  # a string with only three spaces
[]
print extract(' Now is the time ')    # spaces here and there
['Now', 'is', 'the', 'time']

The point here is that programmers have to decide how they want their code to behave and then test to make sure it does what is desired. "Boundary cases" such as this come up all the time in programming and you must get in the habit of testing your code for these cases. For example, does the code do the right thing if a loop never gets executed at all? How about if it's only executed once? How about if it's executed multiple times? You should also get in the habit of documenting the correct behavior – especially with unusual cases – with comments so that if you or someone else later looks at your code, it's clear that you've considered the boundary cases.
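One way to make boundary-case testing a habit is to record each case as an assert statement next to the code, so the intended behavior is both documented and checked every time the file runs. The sketch below repeats the corrected extract() function and adds such checks. The test strings are made up for illustration. The final check compares against the built-in string method split(), which with no arguments applies the same "runs of spaces count as one separator" rule.

```python
def extract(s):
    """Return the space-separated words in 's' as a list (corrected version)."""
    result = []
    while s:
        k = 0
        for c in s:
            if c == ' ':
                break
            k = k + 1
        if k != 0:                   # skip the empty "words" between spaces
            result.append(s[0:k])
        s = s[k+1:]
    return result

# Boundary cases: each assert states the behavior we decided on.
assert extract('') == []                          # while loop never runs
assert extract('   ') == []                       # only spaces, no words
assert extract('Now') == ['Now']                  # while loop runs once
assert extract('  Now   is  ') == ['Now', 'is']   # extra spaces are ignored

# The built-in split() agrees with our function on space-separated text.
assert extract('  Now   is  ') == '  Now   is  '.split()
print('all boundary tests passed')
```

If any assertion fails, Python stops with an AssertionError naming the offending line, which makes it obvious which boundary case broke.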

14

Let's build on our previous example by adding another function…

def instring(w, s):          # is word w contained in s?
    words = extract(s)       # 'words' is now a list of words
    for x in words:          # consider each, one at a time
        if x == w:           # test for a match
            return True
    return False

str = " Now is the time for all good people "
print instring("time", str)
True
print instring("pc204", str)
False
print instring("", str)
False
print instring("", "")
False

The last two cases are again testing boundary conditions. Is an empty string contained in str? For our example here (where our code is designed to operate on words), since an empty string is not a word then it can never be contained in str and our function is therefore working correctly. The point is, you're the programmer and you need to decide, test, and document what the correct behavior of your code should be.

There are some shortcuts for lists:

def instring(w, s):          # is word w contained in s?
    words = extract(s)       # 'words' is now a list of words
    #for x in words:         # consider each, one at a time
    #    if x == w:          # test for a match
    #        return True
    #return False
    return w in words        # Use membership testing

There are Python built-in functions for handling lists:

Function          Return Value
all(L1)           True if all elements in L1 are True
any(L1)           True if any element in L1 is True
min(L1)           Smallest element in L1
max(L1)           Largest element in L1
reversed(L1)      Elements of L1 in reverse order
sorted(L1)        Elements of L1 in sorted order
sum(L1)           Sum of elements in L1
sum(L1, start)    Sum of elements in L1 + start
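Here is a brief sketch of these functions applied to a small sample list (the values are arbitrary). Note that sorted() returns a new list and leaves the original alone, while reversed() returns an iterator that has to be wrapped in list() to see its contents.

```python
L1 = [3, 1, 4, 1, 5]

print(all(L1))               # True: no element is zero
print(any([0, 0, 0]))        # False: every element is zero
print(min(L1))               # 1
print(max(L1))               # 5
print(sorted(L1))            # [1, 1, 3, 4, 5]
print(list(reversed(L1)))    # [5, 1, 4, 1, 3]
print(sum(L1))               # 14
print(sum(L1, 100))          # 114
print(L1)                    # [3, 1, 4, 1, 5]: the original list is unchanged
```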

15

Another built-in function that is very useful:

# Loop through array using indices
for index in range(len(L1)):
    value = L1[index]
    print index, value

# Here’s the short version
for index, value in enumerate(L1):
    print index, value

Finally, list comprehension

# Apply an operation to each element and keep
# the results as a list
L2 = []
for value in L1:
    L2.append(value + 10)

# Using list comprehension:
L2 = [ value + 10 for value in L1 ]

# You can even skip elements:
L3 = [ value + 10 for value in L1 if value < 100 ]

You don’t need to use list comprehension, but you should recognize it when you see it. Sometimes, people get too clever with list comprehension and the code ends up being very difficult to read. Best use is for simple mapping from one set of values to another.
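To see concrete results, here is a short sketch that runs enumerate() and both comprehension forms on sample data. The list name and its values are invented for illustration.

```python
samples = [0, 37, 100, 250]      # hypothetical sample data

# enumerate() yields (index, value) pairs
for index, value in enumerate(samples):
    print("%d: %d" % (index, value))

shifted = [value + 10 for value in samples]
small = [value + 10 for value in samples if value < 100]
print(shifted)    # [10, 47, 110, 260]
print(small)      # [10, 47]
```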

16

17

Python Dictionaries

Dictionaries are unordered collections of arbitrary values that can be accessed via an associated “key”. Dictionaries are of the category mutable mapping, which means they can be modified in place (like lists), but, unlike strings and lists, they don’t support sequence operations such as slicing and concatenation. An item is a (key, value) pair.

Common Dictionary Constants and Operations:


D1 = {}                                         Creates an empty dictionary
D2 = {'tom':1, 'conrad':2}                      Two item dictionary
D3 = {'tom':1, 'conrad':{'greg':3, 'eric':4}}   Nesting
D2['conrad']                                    Retrieval of value by key
D3['conrad']['eric']                            Nested retrieval
D2.has_key('tom')                               Membership test (can also be written 'tom' in D2)
D2.keys()                                       List of all keys in the dictionary
D2.values()                                     List of all values in the dictionary
D2.items()                                      List of (key, value) pairs
D2.get(k, v)                                    Value with key "k" if k in D2, otherwise v (v defaults to None)
D2.setdefault(k, v)                             Like D2.get(), but also adds the item (k, v) to D2 if k is missing
len(D2)                                         Number of keys in the dict.
D2[key] = value                                 Add or change an item
del D2[key]                                     Delete an item

18

Dictionary Examples:

d1 = {'tom':1, 'conrad':2, 'greg':3, 'eric':4}
print d1['greg']             # given a key, fetch associated value
3
print len(d1)                # return number of items in the dict.
4
print d1.has_key('eric')     # test for the presence of a key
True
print d1.keys()              # return list of all keys (order is arbitrary)
['tom', 'eric', 'greg', 'conrad']
d1['tom'] = 42               # assign a new value for key 'tom'
d1['conrad'] = [3, 4, 5]     # items can be arbitrary objects
print d1
{'tom': 42, 'eric': 4, 'greg': 3, 'conrad': [3, 4, 5]}
del d1['greg']               # delete an entry
print d1
{'tom': 42, 'eric': 4, 'conrad': [3, 4, 5]}
d1['al'] = 'good man'        # assigning to a new index
                             # adds a new entry
print d1
{'tom': 42, 'eric': 4, 'al': 'good man', 'conrad': [3, 4, 5]}
d1[101] = 'test'             # keys can be any immutable object
print d1
{'tom': 42, 'eric': 4, 101: 'test', 'al': 'good man', 'conrad': [3, 4, 5]}
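The get() and setdefault() methods from the table are most useful when building a dictionary up incrementally, because they handle the "key not there yet" case without a separate test. The sketch below counts letters with get() and groups words by length with setdefault(). The word list is a made-up sample.

```python
# Count how often each letter occurs, using get() with a default of 0.
counts = {}
for c in 'silly':
    counts[c] = counts.get(c, 0) + 1
print(counts['l'])            # 2
print(counts.get('z', 0))     # 0: 'z' never occurred, and no KeyError is raised

# Group words by length; setdefault() inserts an empty list for a new key.
sample_words = ['timer', 'silly', 'retina', 'python', 'merit']   # hypothetical
by_length = {}
for w in sample_words:
    by_length.setdefault(len(w), []).append(w)
print(by_length[5])           # ['timer', 'silly', 'merit']
print(by_length[6])           # ['retina', 'python']
```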

19

Sets

Sets are unordered non-redundant collections of data, just like dictionaries, but unlike dictionaries they only have keys – no values. They provide for rapid lookups and are ideal for testing for membership.

Set examples:

s1 = set([1, 2, 3, 4])    # this is a set
s1.add(5)                 # adds a new element to the set
s1.add(5)                 # does nothing, since "5" is already a
                          # member of the set
s1.remove(4)              # remove an element
s1.discard(4)             # like remove but won't cause an exception
2 in s1                   # membership test (much faster than a list)
s2 = set([1, 3])
s1 | s2                   # returns union of s1 and s2
s1 & s2                   # returns intersection of s1 and s2
s1 - s2                   # returns difference of s1 and s2
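The following sketch runs the same operations and prints the results. Since sets are unordered, each result is passed through sorted() so the printed output is predictable. The last line shows a common use of sets, removing duplicates from a list; the sample letters are arbitrary.

```python
s1 = set([1, 2, 3, 4])
s1.add(5)
s1.add(5)            # no effect: 5 is already a member
s1.remove(4)
s1.discard(4)        # 4 is already gone; discard() stays silent
s2 = set([1, 3])

print(sorted(s1))         # [1, 2, 3, 5]
print(sorted(s1 | s2))    # [1, 2, 3, 5]
print(sorted(s1 & s2))    # [1, 3]
print(sorted(s1 - s2))    # [2, 5]

# Removing duplicates from a list (the original order is not preserved)
print(sorted(set(['a', 'b', 'a', 'c', 'b'])))    # ['a', 'b', 'c']
```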

20

Tuples

Tuples are just like Python lists, but they are immutable. All the operations that work on lists without modifying them (indexing, slicing, concatenation, repetition, length, membership tests) also work on tuples, but tuples don’t support in-place changes such as index assignment and don’t provide the modifying methods that lists do (e.g. append(), sort(), reverse()). Concatenation, repetition, and slicing applied to tuples return their results in new tuples. The immutability of tuples provides object integrity; you can be sure that a tuple can’t be changed inadvertently somewhere else in your program.

Tuple examples:

t1 = (1, 2, 3, 4)                 # this is a tuple
t2 = 5, 6, 7, 8                   # this too (syntactically unambiguous)
t3 = (9,)                         # a one-item tuple (comma required to
                                  # avoid ambiguity with expressions)
t4 = (9)                          # an expression that evaluates to 9
t5 = ('abc', (1, 2, 3), 'def')    # nested tuples

A common use of tuples is in function return values:

def trivial():
    x = 2.71828
    y = 3.14159
    return (x, y)

m, n = trivial()
print m, n
2.71828 3.14159
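A short sketch of what immutability means in practice: trying to assign to a tuple item raises a TypeError, while concatenation quietly builds a new tuple. The example also shows divmod(), a built-in function that returns a two-item tuple, and the tuple-unpacking idiom for swapping two variables.

```python
t = (1, 2, 3)
try:
    t[0] = 99                 # index assignment is not allowed on a tuple
except TypeError:
    print("tuples cannot be modified in place")

t2 = t + (4,)                 # concatenation builds a new tuple
print(t2)                     # (1, 2, 3, 4)
print(t)                      # (1, 2, 3): the original is untouched

quotient, remainder = divmod(17, 5)    # unpack the returned 2-tuple
print(quotient)               # 3
print(remainder)              # 2

a, b = 1, 2
a, b = b, a                   # swap via tuple packing and unpacking
print(a)                      # 2
```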

21

Here's a more real-world example (available as jumble.py in the lectureNotes directory):

#
# Jumble example:
# This is an example of using strings, lists and dictionaries
# in implementing a simple puzzle solver.
#
# Jumble is a game found in the San Francisco Chronicle.
# The player is given letters from a word in random order
# and the goal is to find the original word.
#
# There are at least a couple different ways of solving
# this problem. For example, we can compute all permutations
# of the given letters and check whether each one is a
# real word. Another is to start with a list of real
# words and check which ones have the same letter composition
# (fingerprint) as the given set of letters.
#
# We are implementing the latter solution. In our program
# words are Python strings. Fingerprints are Python dictionaries
# whose keys are letters and whose values are the number
# of occurrences of those letters (in short, a histogram).
# The list of real words is kept in a Python dictionary
# whose keys are the lengths of words and whose values are
# Python lists of words of those lengths. (The choice to
# use word length as the key is for easy elimination of
# words that cannot possibly match the given input.)
#
# An interesting question is "which approach will be faster?"
# Consider five-letter words (the statistics are similar for
# six-letter words). There are about 3,000 to 4,000 "real"
# words, depending on your dictionary. Is it faster to compute
# the 120 (=5*4*3*2*1) permutations of letters and check them
# against a Python dictionary; or is it faster to compute
# the fingerprints for all the "real" words and compare them
# against the fingerprint of the given letters?
#

22

#
# Main program
#
def main():
    """Main program"""
    # Load a dictionary of words then prompt users
    # for letters that they want to "unjumble"
    load('words.txt')
    while 1:
        line = raw_input('Letters: ')
        characters = line.strip()
        if not characters:
            break                # blank line means quit program
        print jumble(characters)

def jumble(w):
    """Find a word in the dictionary with the same letters as
    'w'"""
    matches = []
    n = length(w)
    fp = fingerprint(w)
    for wol in wordsOfLength(n):
        if fingerprint(wol) == fp:
            matches.append(wol)
    return matches

23

#
# Functions for handling a dictionary of words
#
words = {}

def wordsOfLength(L):
    """Return a list of words of requested length 'L'"""
    return words.get(L, [])

def load(filename):
    """Load dictionary of words from a file"""
    # Only keep words of length 5 and 6 because the
    # SF Chronicle jumble only uses words of those lengths
    f = open(filename)
    while 1:
        line = f.readline()
        if not line:
            break
        w = line.strip()
        n = length(w)
        if n != 5 and n != 6:
            continue
        try:
            words[n].append(w)
        except KeyError:
            words[n] = [w]
    f.close()

24

#
# Functions for handling words
#
def length(word):
    """Return number of alphabetic characters in 'word'"""
    count = 0
    for c in word:
        if c.isalpha():
            count = count + 1
    return count

def fingerprint(word):
    """Return fingerprint (character histogram) of 'word'"""
    fp = {}
    for c in word:
        if not c.isalpha():      # ignore non-alphabetic
            continue
        c = c.lower()            # convert all letters to LC
        try:
            fp[c] = fp[c] + 1
        except KeyError:
            fp[c] = 1
    return fp

#
# If invoked directly, run the program
#
# Some words to try: timer, retina, silly, server, python
#
if __name__ == '__main__':
    main()
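To try the fingerprint idea without the words.txt file, here is a condensed sketch that works on a small in-memory word list. It is not the jumble.py program itself: the function name unjumble and the sample list are hypothetical, the fingerprint is built with the dictionary get() method instead of try/except, and the "group by length" step is omitted because words of different lengths can never have equal fingerprints anyway.

```python
def fingerprint(word):
    """Return a letter histogram for 'word', ignoring case and non-letters."""
    fp = {}
    for c in word.lower():
        if c.isalpha():
            fp[c] = fp.get(c, 0) + 1
    return fp

def unjumble(letters, candidates):
    """Return the candidate words whose fingerprint matches 'letters'."""
    target = fingerprint(letters)
    return [w for w in candidates if fingerprint(w) == target]

# Hypothetical stand-in for the contents of words.txt
sample_words = ['timer', 'merit', 'retina', 'silly', 'server', 'python']

print(unjumble('remit', sample_words))     # ['timer', 'merit']
print(unjumble('nytpho', sample_words))    # ['python']
print(unjumble('zzzzz', sample_words))     # []
```

Grouping the words by length, as the full program does, is purely a speed optimization: it reduces how many fingerprints have to be computed for each query.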

25

Variables, objects, and values:

Variables are just references to objects. Objects have types and categories and may be mutable, but names don’t have these properties. Thus, all the following are true…

>>> x = 42               # x bound to integer
>>> x = 'pc204'          # x is now bound to string
>>> x = [1, 2, 3]        # x is now bound to a list
>>> y = ['a', x, 'c']    # embed a reference to a list
>>> print y
['a', [1, 2, 3], 'c']
>>> x[1] = 'b'           # this changes the object y references
>>> print y
['a', [1, 'b', 3], 'c']

A reference assigned to another reference is still a reference...

>>> x = [1, 2, 3]
>>> z = x                # both x and z reference the same object
>>> y = ['a', z, 'c']
>>> print y
['a', [1, 2, 3], 'c']
>>> x[1] = 'b'           # this still changes the object y references
>>> print y
['a', [1, 'b', 3], 'c']

26

If you don’t want x and y to share the same object, you need to create an explicit copy of the object…

>>> x = [1, 2, 3]

>>> # the ‘empty list’ slice below creates a copy

>>> y = ['a', x[:], 'c']

>>> print y

['a', [1, 2, 3], 'c']

>>> x[1] = 'b' # so now this doesn’t change y

>>> print y

['a', [1, 2, 3], 'c']

>>> print x

[1, 'b', 3]
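One caveat worth knowing is that a slice copy is "shallow": it creates a new outer list, but any nested lists inside it are still shared. The sketch below contrasts an alias, a slice copy, and a full copy made with copy.deepcopy() from the standard library. The variable names are chosen for illustration, and the "is" operator is used to test whether two names refer to the same object.

```python
import copy

x = [1, [2, 3]]
alias = x                  # a second name for the same list
shallow = x[:]             # new outer list, but the inner list is shared
deep = copy.deepcopy(x)    # new outer list and a new inner list

print(alias is x)          # True
print(shallow is x)        # False

x[0] = 'changed'           # modifies the outer list
x[1][0] = 'inner'          # modifies the shared inner list

print(alias)               # ['changed', ['inner', 3]]
print(shallow)             # [1, ['inner', 3]]
print(deep)                # [1, [2, 3]]
```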

Why is this example different? (Hint: what’s the object?)

>>> x = 5
>>> y = ['a', x, 'c']
>>> print y
['a', 5, 'c']
>>> x = 10
>>> print y
['a', 5, 'c']

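Answer: The object is a number. Numbers are immutable numerics and they can’t be changed (the number 5 always equals 5). The assignment x = 10 simply binds the name x to a different object; the list y still references the original 5. There’s no analogy to a list assignment as in the earlier examples.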

27

A Quick Review of what we've covered in the first four weeks…

Objects in Python:

Numbers         immutable numeric
Strings         immutable sequence of characters
Lists           mutable sequence of objects
Dictionaries    mutable mapping of objects
Tuples          immutable sequence of objects
Files           mutable sequence of characters used for long-term storage. Files only have methods.
Functions       immutable sequence of Python statements

Python Operators:

x or y                        Logical 'or' (y evaluated only if x is false)
x and y                       Logical 'and' (y evaluated only if x is true)
not x                         Logical negation
<, <=, >, >=, ==, <>, !=,
is, is not, in, not in        Comparison operators, identity tests, sequence membership
x | y                         Bitwise or
x ^ y                         Bitwise exclusive or
x & y                         Bitwise and
x<<y, x>>y                    Shift x left or right by y bits
x + y, x - y                  Addition/concatenation, subtraction
x * y, x / y, x % y           Multiplication/repetition, division, remainder/format
-x, +x, ~x                    Unary negation, identity, bitwise complement
x[i], x[i:j], x.y, x(…)       Indexing, slicing, qualification, function calls
(…), […], {…}, `…`            Tuple, list, dictionary, conversion to string

28

Python compares objects as follows:

Numbers are compared by relative magnitude;

Strings are compared lexicographically;

Lists and tuples are compared by comparing each component, from left to right;

Dictionaries are compared as though comparing sorted (key,value) lists.

The number zero, any empty object (a string, list, dictionary, or tuple), and the ‘None’ special object are false, while nonzero numbers and nonempty objects are true.
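A few of these rules in action, as a quick sketch. The sample values are arbitrary. String comparison goes by character codes, which is why every uppercase letter sorts before every lowercase one.

```python
print([1, 2, 3] < [1, 2, 4])    # True: the first differing component decides
print((1, 2) < (1, 2, 0))       # True: a matching shorter sequence sorts first
print('apple' < 'banana')       # True
print('Zebra' < 'apple')        # True: 'Z' has a smaller code than 'a'

# Truth values: zero, empty objects and None are false ...
for obj in [0, 0.0, '', [], {}, (), None]:
    assert not obj
# ... while anything nonzero or nonempty is true, even a list holding a zero.
for obj in [1, -0.5, ' ', [0], {'k': None}, (0,)]:
    assert obj
print('truth-value checks passed')
```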

Python keywords:

and del from not while

as elif global or with

assert else if pass yield

break except import print

class exec in raise

continue finally is return

def for lambda try

(Words in boldface on this and previous page have already been discussed in class. The others are still coming in future lectures.)
