PC204
Lecture 4
Data Structures
Copyright 2011 by Tom Ferrin and the Regents of the University of California. All rights reserved.
First, to answer a question that came up…
How do you include Python modules that are saved in a different directory?
This example comes from before swampy was a Python package. Suppose you have a bunch of modules saved in a directory called “swampy” like this…
pythagoras> ls -l ~/Desktop/pc204/CaseStudy1/swampy
-rw-r--r-- 1 tef staff 6210 Oct 5 12:09 AmoebaWorld.py
-rwxr-xr-x 1 tef staff 6492 Oct 5 12:09 CellWorld.py*
-rwxr-xr-x 1 tef staff 39590 Oct 5 12:09 Gui.py*
-rwxr-xr-x 1 tef staff 51450 Oct 7 19:55 Gui.pyc*
-rwxr-xr-x 1 tef staff 47260 Oct 5 12:09 Lumpy.py*
-rwxr-xr-x 1 tef staff 16566 Oct 5 12:09 Sync.py*
-rw-r--r-- 1 tef staff 3938 Oct 5 12:09 TurmiteWorld.py
-rw-r--r-- 1 tef staff 7814 Oct 5 12:09 TurtleWorld.py
-rwxr-xr-x 1 tef staff 6122 Oct 5 12:10 World.py*
-rw-r--r-- 1 tef staff 286 Oct 5 12:09 coke.py
-rw-r--r-- 1 tef staff 2761 Oct 5 12:09 danger.gif
-rwxr-xr-x 1 tef staff 265 Oct 5 12:09 lumpy_test.py*
-rwxr-xr-x 1 tef staff 226 Oct 5 12:09 lumpy_test2.py*
-rwxr-xr-x 1 tef staff 301 Oct 5 12:09 lumpy_test3.py*
-rw-r--r-- 1 tef staff 130 Oct 5 12:09 mutex.py
-rw-r--r-- 1 tef staff 377 Oct 5 12:09 readwrite.py
-rw-r--r-- 1 tef staff 283 Oct 5 12:09 turtle_code.py
-rw-r--r-- 1 tef staff 1130523 Oct 5 12:10 words.txt
And suppose you’re working in the directory “CaseStudy1.” You can’t just say “import” because Python searches only the current directory and the standard locations listed in sys.path, not subdirectories such as swampy, so this will happen…
pythagoras> pwd
/Users/tef/Desktop/pc204/CaseStudy1
pythagoras> python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
>>> from TurtleWorld import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named TurtleWorld
>>>
You need to tell Python to also look in the swampy directory for modules, in addition to the current directory. You can do this with the following two lines of code:
>>> import sys
>>> sys.path.insert(0, "/Users/tef/Desktop/pc204/CaseStudy1/swampy")
Now when you do the import, everything works fine...
>>> from TurtleWorld import *
>>>
The two lines of code tell Python to first import the "sys" module (sys is short for system; it is a module that gives access to variables and functions tied to the Python interpreter and its environment), and then to insert a string at the front (position 0) of sys.path, the list of directories Python searches for modules. The string is the pathname of the directory containing the other modules, and because it is inserted at position 0, that directory is searched before all the others. If you have several modules in different directories that you want to import from, you can just use multiple sys.path.insert() lines, each specifying a different directory.
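As a minimal sketch of the same idea, the hypothetical helper add_module_dir() below puts a directory at the front of sys.path only if it isn't already listed, then imports a throwaway module (pc204_demo_module, also made up for this demo) from a temporary directory. It assumes Python 2.7 or Python 3.

```python
import os
import sys
import tempfile

def add_module_dir(directory):
    """Put 'directory' at the front of the module search path
    (sys.path) unless it is already listed there."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)   # position 0: searched first
    return directory

# Demo: create a throwaway directory holding a one-line module,
# then make that directory importable.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "pc204_demo_module.py"), "w") as f:
    f.write("GREETING = 'hello from another directory'\n")

add_module_dir(demo_dir)
import pc204_demo_module
```

Checking for duplicates keeps sys.path tidy if the same setup code runs more than once.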
This technique works on Linux, Mac OS X, and Windows, but the pathname looks a little different on Windows...
>>> import sys
>>> sys.path.insert(0, "E:\\home\\conrad\\swampy")
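This is because Windows starts a pathname with a drive letter (E:) and separates directory names with backslashes instead of forward slashes. Because the backslash is also the escape character inside Python string literals, each backslash in the path must be written as two (\\). Alternatively, a raw string such as r"E:\home\conrad\swampy" keeps single backslashes exactly as typed.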
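A small sketch of the three ways to spell that Windows path. It uses the standard ntpath module (Python's Windows pathname rules, importable on any operating system) so it can be tried on a Mac or Linux machine too; the last line shows why undoubled backslashes are a trap.

```python
import ntpath   # Windows pathname rules; importable on any operating system

escaped = "E:\\home\\conrad\\swampy"    # each backslash doubled
raw = r"E:\home\conrad\swampy"          # raw string: backslashes kept as typed
joined = ntpath.join("E:\\", "home", "conrad", "swampy")  # let Python add separators

# The trap: in an ordinary string, \n is a newline, not backslash + n
trap = "E:\newdir"
```

All three correct spellings produce the identical string; only the trap differs.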
On Linux and Mac OS X there’s an alternative way to accomplish the same thing using the PYTHONPATH environment variable…
pythagoras> export PYTHONPATH="/Users/tef/Desktop/pc204/CaseStudy1/swampy"
…and if you want to avoid typing this every time you open a Terminal window, you can add this command to your .bashrc file (on Mac OS X, Terminal starts login shells, which read .bash_profile instead). If you have multiple directories you want to add, put them in a colon-separated list…
pythagoras> export PYTHONPATH="/Users/tef/Desktop/pc204/CaseStudy1/swampy:/Users/tef/MyModules"
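To see what PYTHONPATH actually does, this sketch starts a fresh interpreter with PYTHONPATH set and reports that interpreter's sys.path. The helper child_sys_path() is hypothetical, and two temporary directories stand in for the swampy and MyModules directories. It joins the entries with os.pathsep, which is ':' on Linux and Mac OS X and ';' on Windows.

```python
import os
import subprocess
import sys
import tempfile

def child_sys_path(extra_dirs):
    """Start a fresh Python interpreter with PYTHONPATH set to
    'extra_dirs' and return the directories in its sys.path."""
    env = dict(os.environ)
    # os.pathsep is ':' on Linux and Mac OS X, ';' on Windows
    env["PYTHONPATH"] = os.pathsep.join(extra_dirs)
    code = "import sys; print('\\n'.join(sys.path))"
    output = subprocess.check_output([sys.executable, "-c", code], env=env)
    return output.decode().splitlines()

# Two throwaway directories stand in for .../swampy and .../MyModules
dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
search_path = child_sys_path(dirs)
```

The PYTHONPATH entries show up in sys.path in the order given, which is why the environment variable and sys.path.insert() are interchangeable for this purpose.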
Quick Review from previous classes…
Python programs can be decomposed into modules, statements, and objects:
Programs are composed of modules;
Modules contain statements;
Statements create and process objects.
What Python calls “objects” are known as “data structures” in some other programming languages. They’re called objects in Python to set them apart, because the low-level data structure manipulation code that many other languages require isn’t needed in Python.
Python has several built-in object types. These are…
Object Type     Examples
Numbers         3.1416, 42, 123456789L
Strings         'pc204', "Tom's Story"
Files           text = open('eggs', 'r').read()
Lists           [1, [2, 'three'], 4]
Dictionaries    {'food': 'spam', 'taste': 'yum'}
Tuples          (1, 'spam', 4, 'U')
Numbers can be any of several types…
Constant Interpretation
1234, -24, 0 Normal integers
999999999L Long integer (see below)
1.23, 3.14e-10, 0.0 Floating point number
0177, 0x9ff Octal and hexadecimal constants
3+4j, 3.0+4.0j, 3J Complex number constants
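Plain integers are at least 32 bits wide: on a 32-bit system they range from -2,147,483,648 to 2,147,483,647 (i.e. roughly +/- 2.15 billion), while on most 64-bit systems (64-bit Windows is an exception) the limit is sys.maxint, roughly +/- 9.2e+18. A result too large for a plain integer is automatically converted to a long integer.
Floating point numbers can range from +/- 4.9e-324 to +/- 1.8e+308 and have approximately 16 significant decimal digits of precision.
Long integers have unlimited precision (i.e. they can have as many digits as your memory space allows).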
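You can check these limits on your own machine. This sketch reads sys.float_info (available since Python 2.6) and shows what "about 16 digits" means in practice: 2**53 + 1 cannot be stored exactly in a float, while the same arithmetic on integers stays exact.

```python
import sys

largest = sys.float_info.max     # about 1.8e+308
tiniest = 5e-324                 # smallest positive (denormalized) float, about 4.9e-324
digits = sys.float_info.dig      # 15: any 15-significant-digit decimal survives a round trip
lost = float(2 ** 53 + 1) == float(2 ** 53)   # True: a float's 53-bit mantissa drops the +1
big = 2 ** 100                   # exact, because integers have unlimited precision
```

Halving tiniest underflows to 0.0, which is why 4.9e-324 is the bottom of the range.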
More about floating point numbers…
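Floating-point numbers are represented in computer hardware as base 2 (binary) fractions. For example, the decimal fraction 0.125 has the value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction 0.001 has the value 0/2 + 0/4 + 1/8. These two fractions have identical values; the only difference is that the first is written in base 10 and the second in base 2.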
Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine.
Consider the fraction 1/3. You can approximate it as the base 10 fraction 0.3333333333333, but no matter how many digits you write, the result is never exactly 1/3.
In the same way, no matter how many base 2 digits you’re willing to use, the decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base 2, 1/10 is the infinitely repeating fraction 0.0001100110011001100110011001100110011001100110011...
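The repeating pattern can be reproduced with the schoolbook method: double the fraction, and the integer part that appears is the next binary digit. This sketch uses a hypothetical helper, binary_digits(), and works on exact fractions.Fraction values, because the float 0.1 is already an approximation.

```python
from fractions import Fraction

def binary_digits(value, n):
    """Return the first n binary digits after the point of the
    exact fraction 'value' (0 <= value < 1), as a string."""
    digits = []
    for _ in range(n):
        value *= 2                 # shift one binary place to the left
        if value >= 1:
            digits.append("1")
            value -= 1             # keep only the fractional part
        else:
            digits.append("0")
    return "".join(digits)

tenth = binary_digits(Fraction(1, 10), 20)   # '00011001100110011001'
eighth = binary_digits(Fraction(1, 8), 5)    # '00100', i.e. 0.001 in binary
```

1/8 terminates after three digits, while 1/10 keeps cycling through 0011 forever.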
It’s easy to forget that the stored value is an approximation to the original decimal fraction, because of the way that floats are displayed at the interpreter prompt. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine.
If Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display…
0.1000000000000000055511151231257827021181583404541015625
Since this is more digits than most people find useful, Python keeps the number of digits manageable by displaying a rounded value instead, so that 0.1 prints as 0.1
It’s important to realize that this is, in a real sense, an illusion: the value in the machine is not exactly 1/10, you’re simply rounding the display of the true machine value. This fact becomes apparent as soon as you try to do arithmetic with these values…
>>> 0.1 + 0.2
0.30000000000000004
Note that this is in the very nature of binary floating-point: this is not a bug in Python, and it is not a bug in your code either. You’ll see the same kind of thing in all computer languages that support floating-point arithmetic.
Binary floating-point arithmetic holds several surprises like this. You should read The Perils of Floating Point by Bruce Bush for a more complete account of other common surprises: http://www.lahey.com/float.htm
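Two practical consequences, sketched below under the assumption of Python 2.7 or later. First, decimal.Decimal and fractions.Fraction can display the exact value the machine stores for 0.1. Second, floats should be compared with a tolerance rather than ==; close_enough() is a hypothetical helper, and its absolute tolerance of 1e-9 is an assumption that suits values near 1.

```python
from decimal import Decimal
from fractions import Fraction

exact_decimal = Decimal(0.1)     # converts the stored binary value exactly
exact_fraction = Fraction(0.1)   # the same value as a ratio of integers

def close_enough(a, b, tolerance=1e-9):
    """Hypothetical helper: treat two floats as equal when they
    differ by less than an absolute 'tolerance'."""
    return abs(a - b) < tolerance

naive = (0.1 + 0.2 == 0.3)               # False
tolerant = close_enough(0.1 + 0.2, 0.3)  # True
```

Python 3.5 and later also provide math.isclose(), which uses a relative tolerance by default.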
Python Lists
Lists are ordered collections of arbitrary objects that can be accessed by offsets (just like strings), can vary in length, and can contain other lists (i.e. they are nestable). Unlike strings, lists are mutable: they can be modified in place, so they support operations such as deletion, index assignment, and methods like append() and sort() that change the list itself. Technically, a list contains zero or more references to other objects.
Common List Expressions and Methods:
Operation                      Interpretation
L1 = []                        Creates an empty list
L2 = [0, 1, 2, 3]              A four-element (or four-item) list
L3 = ['one', 'two', [1, 2]]    Nested sublists
L2[2]                          Third item in a list
L3[2][0]                       First item of the sublist stored as L3's third item
L2[i:j]                        Slice (just like in strings)
len(L3)                        Length (just like in strings)
L1 + L2                        Concatenation
L1 * 4                         Repetition
for x in L2:                   Iteration
'two' in L3                    Membership test
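Here are the operations from the table applied to its own L1, L2, and L3, as a quick sketch you can paste into the interpreter. One detail worth noticing: membership testing with in looks only at the top-level items, not inside sublists.

```python
L1 = []                          # an empty list
L2 = [0, 1, 2, 3]                # a four-item list
L3 = ['one', 'two', [1, 2]]      # a list holding a nested sublist

third = L2[2]                    # 2
nested_item = L3[2][0]           # 1, the first item of the sublist [1, 2]
section = L2[1:3]                # [1, 2]
size = len(L3)                   # 3: the sublist counts as a single item
joined = L1 + L2                 # [0, 1, 2, 3]
repeated = L2 * 2                # [0, 1, 2, 3, 0, 1, 2, 3]
found = 'two' in L3              # True
found_nested = 1 in L3           # False: 'in' does not look inside sublists
squares = []
for x in L2:                     # iteration visits items in order
    squares.append(x * x)        # ends as [0, 1, 4, 9]
```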
Common List Expressions and Methods (continued):
Operation              Interpretation
L2.append(4)           Grow list at end by 1 item (a 4)
L2.extend([1, 2, 3])   Grow list at end by multiple items
L2.sort()              Sort the list in place
L2.index(n)            Find the offset of the first item equal to n
L2.reverse()           Reverse the items in the list, in place
del L2[k]              Remove the item at offset k
L2[i:j] = []           Remove the items at offsets i through j-1
L2[2] = 42             Replace the 3rd item in the list (index assignment)
L2[1:3] = [0, 0]       Replace the 2nd and 3rd items with zeros (slice assignment)
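A sketch that runs the in-place operations above one after another, with the list's contents after each step in the comments. Note two common surprises: sort() returns None (it changes the list rather than producing a new one), and index() raises ValueError when the value is absent.

```python
L2 = [0, 1, 2, 3]
L2.append(4)            # [0, 1, 2, 3, 4]
L2.extend([1, 2, 3])    # [0, 1, 2, 3, 4, 1, 2, 3]
where = L2.index(2)     # 2: offset of the FIRST item equal to 2
result = L2.sort()      # L2 becomes [0, 1, 1, 2, 2, 3, 3, 4]; result is None
L2.reverse()            # [4, 3, 3, 2, 2, 1, 1, 0]
del L2[0]               # [3, 3, 2, 2, 1, 1, 0]
L2[0:2] = []            # [2, 2, 1, 1, 0]
L2[2] = 42              # [2, 2, 42, 1, 0]
L2[1:3] = [0, 0]        # [2, 0, 0, 1, 0]

try:
    L2.index(99)        # 99 is not in the list...
    missing = False
except ValueError:      # ...so index() raises ValueError
    missing = True
```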
Just like with strings, items in a list are fetched by indexing; i.e., providing the numeric offset of the desired item in the list. In other words, indexing a list begins at 0 and ends at one less than the length of the list. You can also fetch items from a list using negative offsets, which just count backwards from the end of the list.
Here are some simple examples:
>>> L1=[1, 2, 3, 4]
>>> L2=[5, 6]
>>> len(L1)
4
>>> L1[1:3]    # indexing begins at 0!
[2, 3]
>>> L1[:3]     # missing first index means from the beginning
[1, 2, 3]
>>> L1[2:]     # missing second index means through the end of the list
[3, 4]
>>> L1[0:-1]   # -1 is the offset of the last item, so the slice stops just before it
[1, 2, 3]
>>> L1 + L2
[1, 2, 3, 4, 5, 6]
>>> L1.append(L2)
>>> print L1
[1, 2, 3, 4, [5, 6]]
Indexing and Slicing of lists are very common operations, so just remember...
Indexing (L[i]):
• Fetches items at offsets (the first item is at offset zero)
• Negative indexes mean to count from the end
• L[0] fetches the first item
• L[-2] fetches the second item from the end
Slicing (L[i:j]):
• Extracts a contiguous section of items from a list
• Slice boundaries default to zero and the list length
• L[1:3] fetches from offset 1 up to but not including 3
• L[1:] fetches from offset 1 through the end
• L[:-1] fetches from offset 0 up to but not including the last item
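The rules above in runnable form, plus two boundary cases worth knowing: a slice always builds a new list, and slice bounds past the end are quietly clipped, while an index past the end is an error.

```python
L = ['a', 'b', 'c', 'd', 'e']

first = L[0]            # 'a'
second_last = L[-2]     # 'd'
middle = L[1:3]         # ['b', 'c']: offsets 1 and 2, not 3
tail = L[1:]            # ['b', 'c', 'd', 'e']
all_but_last = L[:-1]   # ['a', 'b', 'c', 'd']

middle[0] = 'X'         # a slice is a new list, so L is unchanged

clipped = L[3:100]      # ['d', 'e']: slice bounds past the end are clipped
try:
    L[100]              # ...but an index past the end is an error
    index_error = False
except IndexError:
    index_error = True
```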
Example usage in a function:
def extract(s):
    """extract space-separated words from string 's' and return each
    word in a list"""
    result = []                 # begin with empty list
    while s:
        k = 0                   # k will be the end of a slice
        for c in s:             # inspect chars in string one at a time
            if c == ' ':
                break           # exit the loop if a space
            k = k + 1           # increment slice limit by 1
        result.append(s[0:k])   # add a new word to list
        s = s[k+1:]             # save remaining string
    return result               # done - return the list to the caller

print extract('Now is the time for all good people')
['Now', 'is', 'the', 'time', 'for', 'all', 'good', 'people']
print extract('Now')   # how about with just one word in s?
['Now']
It's always good to test your code, especially at the "boundaries" of where it's designed to work. For example, what if we call extract() with an argument that doesn't have any words in it?
print extract('')      # test for correct result with an empty string
[]
That seems right. But what about a string that only has spaces and no words?
print extract('   ')   # a string with three spaces
['', '', '']
Hmm. Is a list of three items the correct result?
At least for strings, most would argue that it doesn't matter how many spaces there are between words. So, it's one or more spaces that separate words in a string. Similarly, if there aren't any words at all in the string, even if there are some spaces, then the right answer for our function to return should be an empty list. So we need to fix our code and re-test it...
def extract(s):
    """extract space-separated words from string 's' and return each
    word in a list"""
    result = []                     # begin with empty list
    while s:
        k = 0                       # k will be the end of a slice
        for c in s:                 # inspect chars in string one at a time
            if c == ' ':
                break               # exit the loop if a space
            k = k + 1               # increment slice limit by 1
        if k != 0:                  # only add a new word if one was found
            result.append(s[0:k])   # there was, so add it
        s = s[k+1:]                 # save remaining string
    return result                   # done - return the list to the caller

print extract('   ')   # a string with only three spaces
[]
print extract('  Now  is the   time ')   # spaces here and there
['Now', 'is', 'the', 'time']
The point here is that programmers have to decide how they want their code to behave and then test to make sure it does what is desired. "Boundary cases" such as this come up all the time in programming and you must get in the habit of testing your code for these cases. For example, does the code do the right thing if a loop never gets executed at all? How about if it's only executed once? How about if it's executed multiple times? You should also get in the habit of documenting the correct behavior – especially with unusual cases – with comments so that if you or someone else later looks at your code, it's clear that you've considered the boundary cases.
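One way to test boundary cases is to compare your function against a trusted reference. Python's built-in str.split(), called with no arguments, also treats runs of spaces as a single separator and returns an empty list when there are no words, so it makes a good check. This sketch repeats the corrected extract() so it runs on its own; the extra test strings are illustrative. One difference is deliberate: split() also breaks on tabs and newlines, while extract() only looks for spaces.

```python
def extract(s):
    """Corrected extract(): return the space-separated words of 's'
    as a list (runs of spaces count as one separator)."""
    result = []
    while s:
        k = 0
        for c in s:
            if c == ' ':
                break
            k = k + 1
        if k != 0:                  # skip empty "words" between spaces
            result.append(s[0:k])
        s = s[k+1:]
    return result

# Boundary cases from the text, plus a couple more
cases = ['Now is the time', 'Now', '', '   ', '  Now  is the   time ']
same = [extract(c) == c.split() for c in cases]
```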
Let's build on our previous example by adding another function…
def instring(w, s):          # is word w contained in s?
    words = extract(s)       # 'words' is now a list of words
    for x in words:          # consider each, one at a time
        if x == w:           # test for a match
            return True
    return False

text = " Now is the time for all good people "
print instring("time", text)
True
print instring("pc204", text)
False
print instring("", text)
False
print instring("", "")
False
The last two cases are again testing boundary conditions. Is an empty string contained in text? For our example here (where our code is designed to operate on words), an empty string is not a word, so it can never be contained in text, and our function is therefore working correctly. (The variable is called text rather than str because str is the name of Python's built-in string type; assigning to str would hide that built-in.) The point is, you're the programmer and you need to decide, test, and document what the correct behavior of your code should be.
There are some shortcuts for lists:
def instring(w, s):          # is word w contained in s?
    words = extract(s)       # 'words' is now a list of words
    #for x in words:         # consider each, one at a time
    #    if x == w:          # test for a match
    #        return True
    #return False
    return w in words        # use membership testing instead
There are Python built-in functions for handling lists:
Function          Return Value
all(L1)           True if every element in L1 is true (True for an empty list)
any(L1)           True if at least one element in L1 is true (False for an empty list)
min(L1)           Smallest element in L1
max(L1)           Largest element in L1
reversed(L1)      Iterator over the elements of L1 in reverse order
sorted(L1)        New list of the elements of L1 in sorted order
sum(L1)           Sum of elements in L1
sum(L1, start)    start plus the sum of elements in L1
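A quick sketch of each built-in on a small list. Notice that reversed() hands back an iterator (wrap it in list() to see the items) and that sorted() leaves the original list alone.

```python
L1 = [3, 1, 4, 1, 5]

everything_true = all(L1)        # True: no element is 0, empty, or None
all_with_zero = all([3, 0, 5])   # False: 0 counts as false
anything = any([0, '', None])    # False: every element is false
smallest = min(L1)               # 1
largest = max(L1)                # 5
backwards = list(reversed(L1))   # [5, 1, 4, 1, 3]
ordered = sorted(L1)             # [1, 1, 3, 4, 5], a new list
total = sum(L1)                  # 14
total_plus = sum(L1, 100)        # 114
empty_all, empty_any = all([]), any([])   # True, False
```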
Another built-in function that is very useful:
# Loop through a list using indices
for index in range(len(L1)):
    value = L1[index]
    print index, value
# Here's the short version
for index, value in enumerate(L1):
    print index, value
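To confirm the two loops really are equivalent, this sketch collects the (index, value) pairs each one produces instead of printing them. It also shows enumerate()'s optional second argument (available since Python 2.6), which sets the starting number.

```python
L1 = ['spam', 'eggs', 'ham']

by_index = []
for index in range(len(L1)):
    by_index.append((index, L1[index]))

by_enumerate = []
for index, value in enumerate(L1):
    by_enumerate.append((index, value))

numbered = list(enumerate(L1, 1))   # count from 1 instead of 0
```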
Finally, list comprehension
# Apply an operation to each element and keep
# the results as a list
L2 = []
for value in L1:
    L2.append(value + 10)
# Using list comprehension:
L2 = [ value + 10 for value in L1 ]
# You can even skip elements:
L3 = [ value + 10 for value in L1 if value < 100 ]
You don’t need to use list comprehension, but you should recognize it when you see it. Sometimes, people get too clever with list comprehension and the code ends up being very difficult to read. Best use is for simple mapping from one set of values to another.
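Reading a comprehension is easier once you see its loop equivalent side by side. This sketch computes both forms for some sample values (chosen for illustration) and the results match; the if clause in the comprehension plays the role of the if statement inside the loop.

```python
L1 = [5, 50, 500, 95]

L2 = []
for value in L1:
    L2.append(value + 10)
L2_comp = [value + 10 for value in L1]             # [15, 60, 510, 105]

L3_loop = []
for value in L1:
    if value < 100:                                # skip 500
        L3_loop.append(value + 10)
L3 = [value + 10 for value in L1 if value < 100]   # [15, 60, 105]
```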
Python Dictionaries
Dictionaries are unordered collections of arbitrary values that can be accessed via an associated “key”. Dictionaries are of the category mutable mapping, which means they can be modified in place (like lists), but don’t support sequence operations (like strings and lists). An item is a (key, value) pair.
Common Dictionary Constants and Operations:
Operation                                       Interpretation
D1 = {}                                         Creates an empty dictionary
D2 = {'tom':1, 'conrad':2}                      Two-item dictionary
D3 = {'tom':1, 'conrad':{'greg':3, 'eric':4}}   Nesting
D2['conrad']                                    Retrieval of value by key
D3['conrad']['eric']                            Nested retrieval
D2.has_key('tom')                               Membership test (same as 'tom' in D2)
D2.keys()                                       List of all keys in the dictionary
D2.values()                                     List of all values in the dictionary
D2.items()                                      List of (key, value) pairs
D2.get(k, v)                                    Value with key k if k is in D2, otherwise v (v defaults to None)
D2.setdefault(k, v)                             Like D2.get(), but if k is not in D2, also adds the item k:v to D2
len(D2)                                         Number of keys in the dictionary
D2[key] = value                                 Add or change an item
del D2[key]                                     Delete an item
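get() and setdefault() are the two methods in this table that most often replace hand-written "is the key there yet?" checks, such as the try/except KeyError blocks in jumble.py later in these notes. A short sketch, with a made-up sentence as input: get() supplies a starting count for words not seen yet, and setdefault() creates an empty list the first time a word length appears.

```python
text = "the cat sat on the mat"

counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1   # a missing word starts at 0

by_length = {}
for word in text.split():
    # setdefault() stores an empty list the first time a length is seen,
    # then returns the list stored under that key
    by_length.setdefault(len(word), []).append(word)
```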
Dictionary Examples:
d1 = {'tom':1, 'conrad':2, 'greg':3, 'eric':4}
print d1['greg']            # given a key, fetch associated value
3
print len(d1)               # return number of items in the dict.
4
print d1.has_key('eric')    # test for the presence of a key
True
print d1.keys()             # return list of all keys (order is arbitrary)
['tom', 'conrad', 'greg', 'eric']
d1['tom'] = 42              # assign a new value for key 'tom'
d1['conrad'] = [3, 4, 5]    # values can be arbitrary objects
print d1
{'tom': 42, 'eric': 4, 'greg': 3, 'conrad': [3, 4, 5]}
del d1['greg']              # delete an entry
print d1
{'tom': 42, 'eric': 4, 'conrad': [3, 4, 5]}
d1['al'] = 'good man'       # assigning to a new key adds a new entry
print d1
{'tom': 42, 'eric': 4, 'al': 'good man', 'conrad': [3, 4, 5]}
d1[101] = 'test'            # keys can be any immutable object
print d1
{'tom': 42, 'eric': 4, 101: 'test', 'al': 'good man', 'conrad': [3, 4, 5]}
Sets
Sets are unordered collections with no duplicate elements. They behave like dictionaries that have only keys and no values: each element must be an immutable object, and looking an element up is fast. They provide for rapid lookups and are ideal for testing for membership.
Set examples:
s1 = set([1, 2, 3, 4])  # this is a set
s1.add(5)               # adds a new element to the set
s1.add(5)               # does nothing, since 5 is already a member of the set
s1.remove(4)            # remove an element (KeyError if it isn't there)
s1.discard(4)           # like remove, but won't raise an exception if missing
2 in s1                 # membership test (much faster than a list for large collections)
s2 = set([1, 3])
s1 | s2                 # returns union of s1 and s2
s1 & s2                 # returns intersection of s1 and s2
s1 - s2                 # returns difference of s1 and s2 (items in s1 but not in s2)
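The same statements with their results captured, so you can check what each set operation produces. The last line shows a very common use of sets: dropping duplicates from a list.

```python
s1 = set([1, 2, 3, 4])
s1.add(5)
s1.add(5)                # no effect: 5 is already a member
s1.remove(4)
s1.discard(4)            # no error even though 4 is already gone
s2 = set([1, 3])

union = s1 | s2          # set([1, 2, 3, 5])
common = s1 & s2         # set([1, 3])
difference = s1 - s2     # set([2, 5])

try:
    s1.remove(4)         # remove() of a missing element...
    remove_failed = False
except KeyError:         # ...raises KeyError
    remove_failed = True

unique = sorted(set([3, 1, 3, 2, 1]))   # [1, 2, 3]: duplicates dropped
```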
Tuples
Tuples are just like Python lists, but they are immutable. The list operations that don’t modify the list work on tuples too, but tuples don’t provide the methods that change a list in place (e.g. append(), sort(), reverse()), and you can’t assign to a tuple’s items or slices. Concatenation, repetition, and slicing applied to tuples return their results as new tuples. The immutability of tuples provides object integrity; you can be sure that a tuple can’t be changed inadvertently somewhere else in your program.
Tuple examples:
t1 = (1, 2, 3, 4)                # this is a tuple
t2 = 5, 6, 7, 8                  # this too (syntactically unambiguous)
t3 = (9,)                        # a one-item tuple (comma required to
                                 # avoid ambiguity with expressions)
t4 = (9)                         # an expression that evaluates to 9
t5 = ('abc', (1, 2, 3), 'def')   # nested tuples
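A sketch of what immutability means in practice: assigning to an item raises TypeError, operations like concatenation and slicing hand back new tuples, and list() gives you a mutable copy when you really need one. It also confirms the t3 versus t4 distinction from the examples above.

```python
t1 = (1, 2, 3, 4)
t3 = (9,)
t4 = (9)

try:
    t1[0] = 99           # tuples cannot be changed in place
    changed = True
except TypeError:
    changed = False

bigger = t1 + (5,)       # concatenation builds a NEW tuple: (1, 2, 3, 4, 5)
piece = t1[1:3]          # slicing also returns a tuple: (2, 3)
as_list = list(t1)       # convert when you need a mutable copy
as_list.append(5)
```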
A common use of tuples is in function return values:
def trivial():
    x = 2.71828
    y = 3.14159
    return (x, y)

m, n = trivial()
print m, n
2.71828 3.14159
Here's a more real-world example (available as jumble.py in the lectureNotes directory):
#
# Jumble example:
# This is an example of using strings, lists and dictionaries
# in implementing a simple puzzle solver.
#
# Jumble is a game found in the San Francisco Chronicle.
# The player is given letters from a word in random order
# and the goal is to find the original word.
#
# There are at least a couple different ways of solving
# this problem. For example, we can compute all permutations
# of the given letters and check whether each one is a
# real word. Another is to start with a list of real
# words and check which ones have the same letter composition
# (fingerprint) as the given set of letters.
#
# We are implementing the latter solution. In our program
# words are Python strings. Fingerprints are Python dictionaries
# whose keys are letters and whose values are the number
# of occurrences of those letters (in short, a histogram).
# The list of real words is kept in a Python dictionary
# whose keys are the lengths of words and whose values are
# Python lists of words of those lengths. (The choice to
# use word length as the key is for easy elimination of
# words that cannot possibly match the given input.)
#
# An interesting question is "which approach will be faster?"
# Consider five-letter words (the statistics are similar for
# six-letter words). There are about 3,000 to 4,000 "real"
# words, depending on your dictionary. Is it faster to compute
# the 120 (=5*4*3*2*1) permutations of letters and check them
# against a Python dictionary; or is it faster to compute
# the fingerprints for all the "real" words and compare them
# against the fingerprint of the given letters?
#
#
# Main program
#
def main():
    """Main program"""
    # Load a dictionary of words then prompt users
    # for letters that they want to "unjumble"
    load('words.txt')
    while True:
        line = raw_input('Letters: ')
        characters = line.strip()
        if not characters:
            break               # blank line means quit program
        print jumble(characters)

def jumble(w):
    """Find a word in the dictionary with the same letters as 'w'"""
    matches = []
    n = length(w)
    fp = fingerprint(w)
    for wol in wordsOfLength(n):
        if fingerprint(wol) == fp:
            matches.append(wol)
    return matches
#
# Functions for handling a dictionary of words
#
words = {}

def wordsOfLength(L):
    """Return a list of words of requested length 'L'"""
    return words.get(L, [])

def load(filename):
    """Load dictionary of words from a file"""
    # Only keep words of length 5 and 6 because the
    # SF Chronicle jumble only uses words of those lengths
    f = open(filename)
    while True:
        line = f.readline()
        if not line:
            break
        w = line.strip()
        n = length(w)
        if n != 5 and n != 6:
            continue
        try:
            words[n].append(w)
        except KeyError:
            words[n] = [w]
    f.close()
#
# Functions for handling words
#
def length(word):
    """Return number of alphabetic characters in 'word'"""
    count = 0
    for c in word:
        if c.isalpha():
            count = count + 1
    return count

def fingerprint(word):
    """Return fingerprint (character histogram) of 'word'"""
    fp = {}
    for c in word:
        if not c.isalpha():     # ignore non-alphabetic
            continue
        c = c.lower()           # convert all letters to lowercase
        try:
            fp[c] = fp[c] + 1
        except KeyError:
            fp[c] = 1
    return fp

#
# If invoked directly, run the program
#
# Some words to try: timer, retina, silly, server, python
#
if __name__ == '__main__':
    main()
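The comments at the top of jumble.py describe a second strategy, trying every ordering of the letters, without implementing it. As a sketch of that alternative (not part of jumble.py), the hypothetical function unjumble_by_permutation() below uses itertools.permutations, available since Python 2.6, and checks each ordering against a set of real words; the tiny word set is made up for illustration.

```python
from itertools import permutations

def unjumble_by_permutation(letters, real_words):
    """Return, sorted, the words in 'real_words' that use exactly
    the letters in 'letters', by trying every ordering of them."""
    found = set()          # a set: repeated letters yield duplicate orderings
    for ordering in permutations(letters.lower()):
        candidate = "".join(ordering)
        if candidate in real_words:   # fast membership test on a set
            found.add(candidate)
    return sorted(found)

# A made-up word list; jumble.py would load words.txt instead
real_words = set(["timer", "merit", "miter", "remit", "retina", "silly"])
```

Five letters give 5*4*3*2*1 = 120 orderings and six letters give 720, so this approach does a fixed number of lookups per puzzle, whereas the fingerprint approach does one comparison per dictionary word of the right length.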
Variables, objects, and values:
Variables are just references to objects. Objects have types and categories and may be mutable, but names don’t have these properties. Thus, all of the following are legal…
>>> x = 42               # x bound to integer
>>> x = 'pc204'          # x is now bound to string
>>> x = [1, 2, 3]        # x is now bound to a list
>>> y = ['a', x, 'c']    # embed a reference to a list
>>> print y
['a', [1, 2, 3], 'c']
>>> x[1] = 'b'           # this changes the object y references
>>> print y
['a', [1, 'b', 3], 'c']
A reference assigned to another reference is still a reference...
>>> x = [1, 2, 3]
>>> z = x                # both x and z reference the same object
>>> y = ['a', z, 'c']
>>> print y
['a', [1, 2, 3], 'c']
>>> x[1] = 'b'           # this still changes the object y references
>>> print y
['a', [1, 'b', 3], 'c']
If you don’t want x and y to share the same object, you need to create an explicit copy of the object…
>>> x = [1, 2, 3]
>>> # the full slice x[:] below creates a copy of the list
>>> y = ['a', x[:], 'c']
>>> print y
['a', [1, 2, 3], 'c']
>>> x[1] = 'b' # so now this doesn’t change y
>>> print y
['a', [1, 2, 3], 'c']
>>> print x
[1, 'b', 3]
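A sketch of the common ways to copy a list, using the is operator (an identity test: "are these the very same object?"). Both x[:] and list(x) make a new outer list, but only one level deep: for lists of lists, the inner lists are still shared unless you use copy.deepcopy() from the standard copy module.

```python
import copy

x = [1, 2, 3]
alias = x                # same object, no copy at all
slice_copy = x[:]        # new list with the same items
list_copy = list(x)      # another way to make a new list

nested = [[1, 2], [3, 4]]
shallow = nested[:]              # new outer list, inner lists shared
deep = copy.deepcopy(nested)     # inner lists copied as well
nested[0][0] = 'changed'         # visible through shallow, not deep
```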
Why is this example different? (Hint: what’s the object?)
>>> x = 5
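>>> y = ['a', x, 'c']
>>> print y
['a', 5, 'c']
>>> x = 10
>>> print y
['a', 5, 'c']
Answer: The object is a number. Numbers are immutable, so they can’t be changed (the number 5 always equals 5). The statement x = 10 doesn’t modify the object 5; it rebinds the name x to a different object, 10, while y still holds its reference to 5. There’s no equivalent of the in-place change x[1] = 'b' from the earlier examples.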
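The distinction in the answer is between mutating an object and rebinding a name. This sketch shows that rebinding has the same "no effect on y" result even when x refers to a list: only an in-place change such as x[1] = 'b' is visible through y.

```python
x = [1, 2, 3]
y = ['a', x, 'c']
x[1] = 'b'              # mutation: changes the list that y also refers to
x = [9, 9, 9]           # rebinding: x now names a brand-new list,
                        # and y still holds the original (mutated) list

n = 5
z = ['a', n, 'c']
n = 10                  # rebinding again: the 5 stored in z is untouched
```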
A Quick Review of what we've covered in the first four weeks…
Objects in Python:
Numbers immutable numeric
Strings immutable sequence of characters
Lists mutable sequence of objects
Dictionaries mutable mapping of objects
Tuples immutable sequence of objects
Files mutable sequence of characters used for long-term storage. Files only have methods.
Functions immutable sequence of Python statements
Python Operators:
x or y                         Logical 'or' (y evaluated only if x is false)
x and y                        Logical 'and' (y evaluated only if x is true)
not x                          Logical negation
<, <=, >, >=, ==, <>, !=,
is, is not, in, not in         Comparison operators, identity tests, sequence membership
x | y                          Bitwise or
x ^ y                          Bitwise exclusive or
x & y                          Bitwise and
x << y, x >> y                 Shift x left or right by y bits
x + y, x - y                   Addition/concatenation, subtraction
x * y, x / y, x % y            Multiplication/repetition, division, remainder/format
-x, +x, ~x                     Unary negation, identity, bitwise complement
x[i], x[i:j], x.y, x(…)        Indexing, slicing, qualification, function calls
(…), […], {…}, `…`             Tuple, list, dictionary, conversion to string
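Two rows of this table are easy to misread, so here is a sketch of each. The hypothetical helper noisy() records every call, which makes the short-circuit behavior of or visible: its right side runs only when the left side is false, and the result is whichever operand decided the outcome. The bitwise examples use 5 (binary 101) and 3 (binary 011).

```python
calls = []

def noisy(value):
    """Hypothetical helper: record that it was called, then return 'value'."""
    calls.append(value)
    return value

first_true = noisy(0) or noisy('default')   # 0 is false, so the right side runs
skipped = noisy('x') or noisy('never')      # left side is true; right side skipped

bits = (5 | 3, 5 ^ 3, 5 & 3, 1 << 4, 20 >> 2, ~5)   # (7, 6, 1, 16, 5, -6)
```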
Python compares objects as follows:
Numbers are compared by relative magnitude;
Strings are compared lexicographically (character by character);
Lists and tuples are compared by comparing each component, from left to right;
Dictionaries are compared as though comparing sorted (key, value) lists.
Empty objects (strings, lists, dictionaries, tuples), numeric zero, and the special object None are false, while nonempty objects and nonzero numbers are true.
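A sketch of these comparison and truth rules as checks you can run. Character-by-character string comparison uses character codes, so in ASCII every uppercase letter sorts before every lowercase one. Note also that the string '0' and the list [0] are true: they are nonempty, whatever they contain.

```python
results = [
    2 < 10,                  # numbers: by magnitude
    'apple' < 'banana',      # strings: character by character
    'Zebra' < 'apple',       # in ASCII, uppercase sorts before lowercase
    [1, 2, 3] < [1, 2, 4],   # lists: the first differing item decides
    (1, 'b') < (1, 'c'),     # tuples: same rule
    [1, 2] < [1, 2, 0],      # a shorter list that matches so far is smaller
]
falsy = [bool(v) for v in ('', [], {}, (), 0, 0.0, None)]
truthy = [bool(v) for v in ('0', [0], {'k': None}, (None,), -1)]
```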
Python keywords:
and del from not while
as elif global or with
assert else if pass yield
break except import print
class exec in raise
continue finally is return
def for lambda try
(Words in boldface on this and previous page have already been discussed in class. The others are still coming in future lectures.)