CS3101 Python, Lecture 3
Agenda
• Scoping
• Documentation, Coding Practices, Pydoc
• Functions
• Named, optional, and arbitrary arguments
• Generators and Iterators
• Functional programming tools
• lambda, map, filter, reduce
• Regular expressions
• Homework 3
Extra credit solution to HW1
• Dynamic programming: about 15 lines
– Determining whether a bill of N dollars is satisfiable reduces to whether you can satisfy a bill of N – J dollars, where J is the cost of an item on your menu
– Create a list (knapsack) with N+1 empty entries
– Base case: we know we can satisfy a bill of 0 dollars
– For each item on your menu
– For index = 0 to N
• If knapsack[index] is empty, and knapsack[index – item's cost] is not:
• We now know how to satisfy this bill: store the solution list from knapsack[index – item's cost], with the current item appended, at knapsack[index]
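The steps above can be sketched in runnable Python. This is a minimal sketch, not the official HW1 solution: the function name satisfy and the menu format (a dict mapping item names to whole-dollar costs) are assumptions made for illustration. Because items form the outer loop and indexes climb upward, the same item may be ordered more than once.

```python
def satisfy(bill, menu):
    """Return a list of item names whose costs sum to bill, or None.

    bill -- total in whole dollars (integer)
    menu -- dict mapping item name to a positive integer cost (assumed format)
    """
    # knapsack[i] holds one list of items totalling i dollars, or None if unknown
    knapsack = [None] * (bill + 1)
    knapsack[0] = []  # base case: a bill of 0 is satisfied by ordering nothing
    for item, cost in menu.items():
        # start at cost so that index - cost is never negative
        for index in range(cost, bill + 1):
            if knapsack[index] is None and knapsack[index - cost] is not None:
                # extend the smaller solution by the current item
                knapsack[index] = knapsack[index - cost] + [item]
    return knapsack[bill]
```

For example, satisfy(15, {'fries': 2, 'burger': 5}) returns a list of items whose costs add up to 15, while satisfy(3, {'burger': 5}) returns None.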
Homework 3, Exercise 1
• Requirements:
– 1. Write a program that uses regular expressions to retrieve the current weather from a website of your choosing. Just the temperature is OK.
– 2. Use that information to suggest a sport to play.
– ./sport.py
– It's 36 degrees today. You should ski!
http://www.nytimes.com/weather
Homework 3, Exercise 2
• Requirements:
– a) Write a program that uses regular expressions and urllib to print the address of each image on Columbia's homepage (www.columbia.edu)
– b) Use regular expressions to print the title of each of the news stories on the main page
– ./news.py
– ./images.py
Scoping
• Local namespaces / local scope
• A function's parameters and the variables bound within the function
• Module scope
• Variables bound outside functions, at the module level, are global
• Hiding
• Inner scope wins: if a name conflict occurs between a local variable and a global one, the local one takes precedence
Global statement
• Local scope wins by default
• If a function must rebind (assign to) a global variable, declare it first with the global keyword
• Syntax: 'global identifiers', where identifiers contains one or more names separated by commas
• Never use global if your function only reads a global variable, only if it rebinds it
• Global is generally poor style since it breaks encapsulation, but you'll see it out there
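A small sketch of the rules above; the names count, read_count, increment, and shadow are made up for illustration. Only the function that rebinds the global needs the global statement, and an assignment without it silently creates a local instead.

```python
count = 0  # module-level (global) variable

def read_count():
    # Only reads the global, so no global statement is needed
    return count

def increment():
    global count  # required: this function rebinds the global name
    count += 1

def shadow():
    count = 99  # no global statement: this creates a new local, the global is untouched
    return count

increment()
increment()
print(read_count())  # 2
print(shadow())      # 99
print(count)         # 2
```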
Closures and Nested Scope
• Using a def statement within another function's body defines a nested, or inner, function
• The enclosing function is referred to as the outer function
• Nested functions may access the outer function's parameters and variables, but not rebind them (Python 3 adds the nonlocal statement for that)
• This trick can be used to form closures, as we'll see with lambdas
Closures
• This example is adapted from Python in a Nutshell
def make_adder(augend):
    def add(addend):
        return addend + augend
    return add
• Calling make_adder(7) returns a function that accepts a single argument and adds seven to it
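A usage sketch for make_adder: each call creates a new closure that remembers its own augend. The function's __closure__ attribute, inspected on the last line, exposes the captured value.

```python
def make_adder(augend):
    def add(addend):
        # augend is read from the enclosing scope, not passed in
        return addend + augend
    return add

add7 = make_adder(7)
add100 = make_adder(100)
print(add7(3))    # 10
print(add100(3))  # 103
# Each returned function carries its own captured augend
print(add7.__closure__[0].cell_contents)  # 7
```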
Namespace resolution
• Name resolution proceeds as follows
• Local scope (i.e., this function)
• Outer scope (i.e., enclosing functions)
• Module-level scope (i.e., global variables)
• Built-in scope (i.e., predefined Python names such as open and len)
• A word to the wise: do not give your variables names that may conflict with built-ins or with modules you may import
• E.g., 'open = 5' is dangerous if you're using file objects: later calls to open might not resolve where you expect!
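The lookup order can be observed directly. This sketch uses invented names, and it shadows len rather than open so it runs without touching any files.

```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        x = "local"
        return x  # local scope wins
    def inner_no_local():
        return x  # no local x, so the enclosing one is found
    return inner(), inner_no_local()

def module_level():
    return x  # no local or enclosing x: falls back to the global

print(outer())         # ('local', 'enclosing')
print(module_level())  # global

# Shadowing a built-in: the module-level name hides the built-in len
len = lambda seq: -1
print(len([1, 2, 3]))  # -1
del len  # removing the global name makes the built-in visible again
print(len([1, 2, 3]))  # 3
```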
Documentation and Pydoc
• A docstring is the string literal that begins a module, class, or function (method):
• A concise one-sentence summary, followed by a blank line, followed by details.
• References
• http://www.python.org/dev/peps/pep-0257/
def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0:
        return complex_zero
    ...
Code is read MANY more times than it is written
• Trust me, it's worth it
• The first line should be a concise and descriptive statement of purpose
• Self-documentation is good, but do not merely repeat the method name! (e.g., def setToolTip(text): #sets the tool tip)
• The next paragraph should describe the method and any side effects
• Then the arguments
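A sketch of a docstring that follows the conventions above, plus two ways to read it back: the __doc__ attribute and pydoc.render_doc, which produces the same kind of text that help() displays. The function set_tool_tip and its dict-as-widget argument are hypothetical.

```python
import pydoc

def set_tool_tip(widget, text):
    """Show text in a pop-up when the mouse hovers over widget.

    Replaces any tool tip previously attached to widget. Has the side
    effect of storing text under the widget's 'tooltip' key.

    Arguments:
    widget -- dict standing in for a GUI widget (hypothetical)
    text -- the string to display
    """
    widget['tooltip'] = text

# The docstring is stored on the function object...
print(set_tool_tip.__doc__.splitlines()[0])
# ...and pydoc renders it for readers
print(pydoc.render_doc(set_tool_tip))
```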
Python’s thoughts on documentation
•A Foolish Consistency is the Hobgoblin of Little Minds
•http://www.python.org/dev/peps/pep-0008/
Functions, returning multiple values
• Functions can return multiple values (of arbitrary type); just separate them with commas and Python packs them into a tuple
• Always reminded me of MATLAB
def foo():
    return [1,2,3], 4, (5,6)
myList, myInt, myTuple = foo()
A word on mutable arguments
• Be cautious when passing mutable data structures (lists, dictionaries) to functions, especially functions from modules that are not your own: they may modify your data in place
• When in doubt, pass a copy or convert them to tuples
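A sketch of the risk and the two defenses; add_bonus is a hypothetical function that mutates its argument.

```python
def add_bonus(scores):
    # Mutates the caller's list in place: a side effect the caller may not expect
    scores.append(100)
    return scores

original = [70, 80]
add_bonus(original)
print(original)  # [70, 80, 100]: the caller's list changed

safe = [70, 80]
add_bonus(list(safe))  # pass a shallow copy instead
print(safe)  # [70, 80]

try:
    add_bonus(tuple(safe))  # tuples are immutable, so the mutation fails loudly
except AttributeError:
    print("tuples have no append method")
```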
Semantics of argument passing
• Recall that while functions cannot rebind the caller's variables, they can alter mutable objects passed to them
• Positional arguments
• Named arguments
• Special forms * (sequence) and ** (dictionary)
• Order within a function's parameter list:
• zero or more positional, followed by
• zero or more named
• zero or one *
• zero or one **
Positional arguments
• def myFunction(arg1, arg2, arg3, arg4, arg5, arg6):
• .....
• Potential for typos
• Readability declines
• Maintenance a headache
• Frequent headache in Java / C (I’m sure we can all recall some monster functions written by colleagues / fellow students)
• We can do better
Named arguments
• Syntax: identifier = expression
• Named arguments in the function declaration are optional arguments; the expression is their default value if the caller does not provide one
• Two forms
• 1) you may name arguments passed to functions even if they are declared positionally
• 2) you may name parameters within a function's declaration to supply default values
• Outstanding for self-documentation!
Named argument example
def add(a, b):
    return a + b
Equivalent calls:
print add(4,2)
print add(a=4, b=2)
print add(b=2, a=4)
Default argument example
def add(a=4, b=2):
    return a+b
print add(b=4)        # 8
print add(a=2, b=4)   # 6
print add(4, b=2)     # 6
Sequence arguments
• The *args form collects any additional positional arguments into a tuple
def mySum(*args):
    # equivalent to the built-in sum(args)
    total = 0
    for arg in args:
        total += arg
    return total
• Valid calls:
• mySum(4,2,1,3)
• mySum(1)
• mySum(1,23,4,423,234)
• In a call, **dct must be a dictionary whose keys are all strings; the values may be arbitrary
• Each item's key is the parameter name, and its value is the argument
Sequences of named arguments
# ** collects keyword
# arguments into a dictionary
def foo(**args):
    print args
foo(homer='donut', lisa='tofu')
{'homer': 'donut', 'lisa': 'tofu'}
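The call side works in reverse: * unpacks a sequence into positional arguments and **dct unpacks a dictionary of string keys into named arguments. A small sketch with an invented order function:

```python
def order(person, food="donut", drink="Duff"):
    return person + " wants " + food + " and " + drink

args = ("lisa",)                              # spread into positional arguments
kwargs = {"food": "tofu", "drink": "juice"}   # keys become parameter names
print(order(*args, **kwargs))  # lisa wants tofu and juice
print(order("homer"))          # homer wants donut and Duff
```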
Optional arguments are everywhere
# three ways to call the range function
# up to
range(5)
[0, 1, 2, 3, 4]
# from, up to
range(-5, 5)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
# from, up to, step
range(-5, 5, 2)
[-5, -3, -1, 1, 3]
Arbitrary arguments example
We can envision a max function pretty easily
# idea: max2(1,5,3,1)
# >>> 5
# idea: max2('a', 'b', 'c', 'd', 'e')
# >>> e
def max2(*args):
    for arg in args…
Arbitrary arguments example
def max1(*args):
    best = args[0]
    for arg in args[1:]:
        if arg > best:
            best = arg
    return best
def max2(*args):
    return sorted(args)[-1]
Argument matching rules
• General rule: more complicated to the right
• For both calling and definitional code:
– All positional arguments must appear first
– Followed by all keyword arguments
– Followed by the * form
– And finally **
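A sketch (the function name show is invented) with one parameter of each kind in the order above, returning what each one received so you can see how arguments are matched.

```python
def show(a, b=2, *args, **kwargs):
    # positional, then named (with default), then *, then **
    return a, b, args, kwargs

print(show(1))                    # (1, 2, (), {})
print(show(1, 3, 4, 5, x=6))      # (1, 3, (4, 5), {'x': 6})
print(show(*[1, 3], **{'y': 7}))  # (1, 3, (), {'y': 7})
```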
Functions as arguments
• Of course we can pass functions as arguments as well
def myCompare(x, y):
    …
sorted([5, 3, 1, 9], cmp=myCompare)
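A sketch of one possible myCompare (ours; the slide leaves its body elided) that sorts in descending order. The cmp= keyword exists only in Python 2, so this version wraps the comparison function with functools.cmp_to_key (Python 2.7 and 3.2+). It also shows the simpler key= style of passing a function.

```python
import functools

def myCompare(x, y):
    # Negative if x should come first, positive if y should, 0 if tied.
    # This version puts larger numbers first (descending order).
    return y - x

print(sorted([5, 3, 1, 9], key=functools.cmp_to_key(myCompare)))  # [9, 5, 3, 1]
# key= passes a one-argument function that computes a sort key
print(sorted([-7, 3, -1, 9], key=abs))  # [-1, 3, -7, 9]
```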
Lambdas
• The closer you can get to mathematics, the more elegant your programs become
• In addition to the def statement, Python provides an expression which in-lines a function, similar to LISP
• Instead of assigning a name to a function, lambda just returns the function itself: anonymous functions
When should you use Lambda
• Lambda is designed for handling simple functions
– Conciseness: lambdas can live places defs cannot (inside a list literal, or a function call itself)
– Elegance
• Limitations
– Not as general as a def: limited to a single expression, so there is only so much you can squeeze in without using blocks of code
• Use def for larger tasks
• Do not sacrifice readability
– It is more important that your work is a) correct and b) efficient w.r.t. people hours
Quick examples
• Arguments work just like functions, including defaults, *, and **
• The lambda expression returns a function, so you can assign a name to it if you wish
foo = (lambda a, b="simpson": a + " " + b)
foo("lisa")
lisa simpson
foo("bart")
bart simpson
More examples
# Embedding lambdas in a list
myList = [(lambda x: x**2), (lambda x: x**3)]
for func in myList:
    print func(2)
4
8
# Embedding lambdas in a dictionary
donuts = {'homer' : (lambda x: x * 4), 'lisa' : (lambda x: x * 0)}
donuts['homer'](2)
8
donuts['lisa'](2)
0
Multiple arguments
(lambda x, y: x + " likes " + y)('homer', 'donuts')
'homer likes donuts'
State
def remember(x):
    return (lambda y: x + y)
foo = remember(5)
print foo
<function <lambda> at 0x01514970>
foo(2)
7
Maps
• One of the most common tasks with lists is to apply an operation to each element in the sequence
# w/o maps
donuts = [1,2,3,4]
myDonuts = []
for d in donuts:
    myDonuts.append(d * 2)
print myDonuts
[2, 4, 6, 8]
# w/ maps
def more(d): return d * 2
myDonuts = map(more, donuts)
print myDonuts
[2, 4, 6, 8]
Map using Lambdas
donuts = [1,2,3,4]
def more(d): return d * 3
myDonuts = map(more, donuts)
print myDonuts
[3, 6, 9, 12]
myDonuts = map((lambda d: d * 3), donuts)
print myDonuts
[3, 6, 9, 12]
More maps
# map is smart:
# it understands functions requiring multiple arguments
# and operates over sequences in parallel
pow(2, 3)
8
map(pow, [2, 4, 6], [1, 2, 3])
[2, 16, 216]
map((lambda x,y: x + " likes " + y),
    ['homer', 'bart'], ['donuts', 'sugar'])
['homer likes donuts', 'bart likes sugar']
Functional programming tools: filter and reduce
• Theme of functional programming
– apply functions to sequences
• Relatives of map:
– filter and reduce
• Filter:
– keeps only the items for which a test function returns true
• Reduce:
– applies a function to pairs of items and running results
Filter
range(-5, 5)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
def isEven(x): return x % 2 == 0
filter(isEven, range(-5,5))
[-4, -2, 0, 2, 4]
filter((lambda x: x % 2 == 0), range(-5, 5))
[-4, -2, 0, 2, 4]
Reduce
• A bit more complicated
• By default the first item of the sequence is used to initialize the tally
• Roughly equivalent to:
def reduce(fn, seq):
    tally = seq[0]
    for item in seq[1:]:
        tally = fn(tally, item)
    return tally
• FYI: more functional tools are available, e.g., in the operator module
reduce((lambda x, y: x + y), [1,2,3,4])
10
import operator
reduce(operator.add, [1, 2, 3])
6
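The pseudocode above can be made runnable and checked against the real thing. In Python 2 reduce is a built-in; in Python 3 it lives in functools (also available there since Python 2.6). This sketch, my_reduce (a made-up name), also accepts the optional initial value that reduce takes as its third argument.

```python
import functools
import operator

def my_reduce(fn, seq, *initial):
    """Fold fn over seq from the left, like functools.reduce.

    Unlike functools.reduce (which raises TypeError), an empty seq with
    no initial value raises StopIteration here.
    """
    items = iter(seq)
    if initial:
        tally = initial[0]       # caller supplied a starting value
    else:
        tally = next(items)      # default: the first item seeds the tally
    for item in items:
        tally = fn(tally, item)  # combine the running result with the next item
    return tally

print(my_reduce(operator.add, [1, 2, 3, 4]))         # 10
print(functools.reduce(operator.add, [1, 2, 3, 4]))  # 10
print(my_reduce(operator.mul, [1, 2, 3, 4], 10))     # 240
```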
List comprehensions revisited: combining filter and map
# Say we wanted to collect the squares of the even numbers below 11
# Using a list comprehension
[x ** 2 for x in range(11) if x % 2 == 0]
[0, 4, 16, 36, 64, 100]
# Using map and filter
map((lambda x: x ** 2), filter((lambda x: x % 2 == 0), range(11)))
[0, 4, 16, 36, 64, 100]
# Easier way, this uses the optional step argument in range
[x ** 2 for x in range(0, 11, 2)]
[0, 4, 16, 36, 64, 100]
Reading files with list comprehensions
# old way
lines = open('simpsons.csv').readlines()
['homer,donut\n', 'lisa,broccoli\n']
for line in lines:
    line = line.strip()
    …
# with a comprehension
[line.strip() for line in open('simpsons.csv').readlines()]
['homer,donut', 'lisa,broccoli']
# with a lambda
map((lambda line: line.strip()), open('simpsons.csv').readlines())
Generators and Iterators
• Generators are like normal functions in most respects, but they automatically implement the iteration protocol to return a sequence of values over time
• Consider a generator when
– you need to compute a series of values lazily
• Generators
– Save you the work of saving state
– Automatically save theirs when yield is called
• Easy
– Just use "yield" instead of "return"
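To see what automatically implementing the iteration protocol saves you, here is a sketch of a squares sequence written by hand as an iterator class (the class name Squares is invented for illustration) next to a generator version. The class must track its own position and raise StopIteration itself.

```python
class Squares(object):
    """Hand-written iterator yielding 0**2, 1**2, ..., (n-1)**2."""

    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration  # signals that the sequence is exhausted
        value = self.i ** 2
        self.i += 1              # we must save our own state between calls
        return value

    next = __next__              # Python 2 spells this method next()

def gen_squares(n):
    # The generator version: yield saves the loop state for us
    for i in range(n):
        yield i ** 2

print(list(Squares(5)))      # [0, 1, 4, 9, 16]
print(list(gen_squares(5)))  # [0, 1, 4, 9, 16]
```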
Quick example
def genSquares(n):
    for i in range(n):
        yield i ** 2
print genSquares(5)
<generator object at 0x01524BE8>
for i in genSquares(5):
    print i, "then",
0 then 1 then 4 then 9 then 16 then
Error handling preview
def gen():
    i = 0
    while i < 5:
        i += 1
        yield i ** 2
x = gen()
x.next()
1
x.next()
4
…
Traceback (most recent call last):
  File "<pyshell#110>", line 1, in <module>
    x.next()
StopIteration
try:
    x.next()
except StopIteration:
    print "done"
5 Minute Exercise
• Begin writing a generator to produce primes
• Start with 0
• When you find a prime, yield (return) that value
• Write code to call your generator
def genPrimes():
    ….
    yield prime
def main():
    g = genPrimes()
    while True:
        print g.next()
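One possible answer to the exercise, sketched with trial division (our choice of method; a sieve would also work). It follows the slide's skeleton of starting at 0 and yielding each prime, but uses itertools.islice instead of the slide's endless while loop so the demo terminates.

```python
import itertools

def genPrimes():
    """Yield primes forever, starting the search at 0 (trial division)."""
    n = 0
    while True:
        # n is prime if n >= 2 and no d in [2, sqrt(n)] divides it
        if n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

# islice stops the infinite generator after the first 10 values
print(list(itertools.islice(genPrimes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```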
Regular Expressions
• A regular expression (RE) is a string that represents a pattern
• The idea is to check a string against the pattern to see whether it matches, and if so, where
• REs may be compiled or used on the fly
• You may use REs to match, search, substitute, or split strings
• Very powerful, but a bit of a bear syntactically
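A minimal sketch of the two styles: module-level functions such as re.search take the pattern each time, while re.compile returns a pattern object whose methods you can reuse. The sample text is made up.

```python
import re

text = "homer ate 12 donuts"

# On the fly: module-level functions take the pattern as their first argument
print(re.search(r"\d+", text).group())  # 12

# Compiled: build the pattern object once, then reuse its methods
digits = re.compile(r"\d+")
print(digits.search(text).group())      # 12
print(digits.sub("N", text))            # homer ate N donuts
print(digits.split(text))               # ['homer ate ', ' donuts']
```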
Quick examples: Match vs. Search
import re
p = re.compile('[a-z]+')
m = p.match('donut')
print m.group(), m.start(), m.end()
donut 0 5
m = p.search('12 donuts are better than 1')
print m.group(), m.span()
donuts (3, 9)
m = p.match('12 donuts are better than 1')
if m:
    print m.group()
else:
    print "no match"
no match
Quick examples: Multiple hits
import re
p = re.compile(r'\d+\sdonuts')
print p.findall('homer has 4 donuts, bart has 2 donuts')
['4 donuts', '2 donuts']
iterator = p.finditer('99 donuts on the shelf, 98 donuts on the shelf...')
for match in iterator:
    print match.group(), match.span()
99 donuts (0, 9)
98 donuts (24, 33)
Re Patterns 1
Pattern  Matches
.  Matches any character (except a newline, unless re.DOTALL is set)
^  Matches the start of the string
$  Matches the end of the string
*  Matches zero or more cases of the previous RE (greedy: match as many as possible)
+  Matches one or more cases of the previous RE (greedy)
?  Matches zero or one case of the previous RE
*?, +?  Non-greedy versions (match as few as possible)
Re Patterns 2
Pattern  Matches
\d, \D  Matches one digit [0-9] or one non-digit [^0-9]
\s, \S  Matches one whitespace character [ \t\n\r\f\v] or one non-whitespace character
\w, \W  Matches one alphanumeric character or underscore (understands Unicode and various locales if set), or one character that is not
\b, \B  \b matches an empty string, but only at the start or end of a word; \B matches an empty string anywhere else
\Z  Matches an empty string at the end of the whole string
\\  Matches one backslash
{m,n}  Matches m to n cases of the previous RE
[…]  Matches any one of a set of characters
|  Matches either the preceding or following expression
(…)  Matches the RE within the parentheses and captures it as a group
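A few of the patterns from the tables in action; the phone-number format and sample strings are invented for illustration.

```python
import re

# ^ and $ anchor the whole pattern, {3} and {4} fix the digit counts,
# and the two ( ) groups capture the pieces
phone = re.compile(r"^(\d{3})-(\d{4})$")

print(phone.match("555-1234").groups())  # ('555', '1234')
print(phone.match("555-12345"))          # None: $ rejects the fifth digit

# \w includes letters, digits, and the underscore
print(re.findall(r"\w+", "homer_j simpson!"))          # ['homer_j', 'simpson']
# \s+ collapses any run of whitespace
print(re.sub(r"\s+", " ", "too   many\t\tspaces"))    # too many spaces
```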
Gotchas
• RE punctuation is backwards
– "." matches any character when unescaped, or an actual "." when in the form "\."
– "+" and "*" carry regular expression meaning unless escaped
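The gotchas above, demonstrated. The last two lines show a related trap: writing patterns as raw strings (r'...') keeps Python from interpreting backslash escapes before the re module sees them. The sample strings are invented.

```python
import re

# Unescaped "+" is a quantifier: "1+1" means "one or more 1s, then a 1"
print(re.findall("1+1", "1+1=2, 11=11"))             # ['11', '11']
# re.escape backslashes the special characters so "+" is literal
print(re.findall(re.escape("1+1"), "1+1=2, 11=11"))  # ['1+1']
# Same story for ".": escaped, it matches only a real period
print(re.findall(r"1.50", "1.50 or 1x50"))           # ['1.50', '1x50']
print(re.findall(r"1\.50", "1.50 or 1x50"))          # ['1.50']
# Without a raw string, Python turns "\b" into a backspace before re sees it
print(re.search("\bHomer", "Homer Simpson"))           # None
print(re.search(r"\bHomer", "Homer Simpson").group())  # Homer
```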
Quick examples: * vs. +, and \b
.* vs .+
• The pattern
– 'Homer.*Simpson' will match:
• HomerSimpson
• Homer Simpson
• Homer Jay Simpson
• The pattern
– 'Homer.+Simpson' will match:
• Homer Simpson
• Homer Jay Simpson
\b
• The pattern
– r'\bHomer\b' will find a hit searching
– Homer
– Homer Simpson
• The pattern
– r'\bHomer' will find a hit searching
– HomerJaySimpson
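The claims above can be checked directly with re.search, which returns None when there is no hit.

```python
import re

names = ["HomerSimpson", "Homer Simpson", "Homer Jay Simpson"]

# .* allows zero characters between the names; .+ demands at least one
star = [n for n in names if re.search(r"Homer.*Simpson", n)]
plus = [n for n in names if re.search(r"Homer.+Simpson", n)]
print(star)  # ['HomerSimpson', 'Homer Simpson', 'Homer Jay Simpson']
print(plus)  # ['Homer Simpson', 'Homer Jay Simpson']

# \b requires a word boundary on that side of "Homer"
targets = ["Homer", "Homer Simpson", "HomerJaySimpson"]
both = [t for t in targets if re.search(r"\bHomer\b", t)]
left = [t for t in targets if re.search(r"\bHomer", t)]
print(both)  # ['Homer', 'Homer Simpson']
print(left)  # ['Homer', 'Homer Simpson', 'HomerJaySimpson']
```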
Sets of chars: []
• Sets of characters are denoted by listing the characters within brackets
• [abc] will match one of a, b, or c
• Ranges are supported
• [0-9] will match one digit
• You may include special sets within brackets
– Such as \s for a whitespace character or \d for a digit
p = re.compile('[HJ]')
iterator = p.finditer("HomerJaySimpson")
for match in iterator:
    print match.group(), match.span()
H (0, 1)
J (5, 6)
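A few more bracket sets, including a range, a special set inside brackets, and a negated set [^...] (the same form as [^0-9] in the pattern table). The sample strings are invented.

```python
import re

# A range: one or more digits
print(re.findall(r"[0-9]+", "4 donuts, 12 beers"))  # ['4', '12']
# A special set inside brackets: lowercase letters or whitespace
print(re.findall(r"[a-z\s]+", "homer j 42"))        # ['homer j ']
# A leading ^ negates the set: anything except vowels and whitespace
print(re.findall(r"[^aeiou\s]", "donut"))           # ['d', 'n', 't']
```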
Alternatives: |
• A vertical bar matches the pattern on either side
import re
p = re.compile('Homer|Simpson')
iterator = p.finditer("HomerJaySimpson")
for match in iterator:
    print match.group(), match.span()
Homer (0, 5)
Simpson (8, 15)
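One subtlety worth a sketch: | has the lowest precedence of any RE operator, so it splits the whole pattern unless parentheses limit it. The (?:...) form groups without capturing, which keeps findall returning the whole match. The sample string is invented.

```python
import re

text = "Bart Simpson, Lisa Simpson"

# | splits the entire pattern: this means "Bart" OR "Lisa Simpson"
print(re.findall(r"Bart|Lisa Simpson", text))       # ['Bart', 'Lisa Simpson']
# Parentheses limit the alternation to the first name
print(re.findall(r"(?:Bart|Lisa) Simpson", text))   # ['Bart Simpson', 'Lisa Simpson']
```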
RE Substitution
import re
line = 'Hello World!'
r = re.compile('world', re.IGNORECASE)
m = r.search(line)
print m.group()
>>> World
# replace at most 1 occurrence
print r.sub('Mars!!', line, 1)
>>> Hello Mars!!!
RE Splitting
import re
line = 'lots 42 of random 12 digits 77'
r = re.compile(r'\d+')
l = r.split(line)
print l
>>> ['lots ', ' of random ', ' digits ', '']
Groups: ()
• Frequently you need to obtain more information than just whether the RE matched or not
• Regular expressions are often used to dissect strings by writing a RE divided into several subgroups which match different components of interest
p = re.compile(r'(homer\s(jay))\ssimpson')
m = p.match('homer jay simpson')
print m.group(0)
print m.group(1)
print m.group(2)
homer jay simpson
homer jay
jay
Putting it all together (and optional flags)
import re
r = re.compile('simpson', re.IGNORECASE)
print r.search("HomerJaySimpson").group()
Simpson
r = re.compile(r'([A-Z][a-z]+).*?(\d+$)', re.MULTILINE)
iterator = r.finditer('Homer is 42\nMaggie is 6\nBart is 12')
for match in iterator:
    print match.group(1), "was born", match.group(2), "years ago"
Homer was born 42 years ago
Maggie was born 6 years ago
Bart was born 12 years ago
Discussion: Who can explain how this RE works?
Finding tags within HTML
import re
line = '<tag>my eyes! the goggles do \
nothing.</tag>'
r = re.compile('<tag>(.*)</tag>')
m = r.search(line)
print m.group(1)
>>> my eyes! the goggles do nothing.
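The (.*) above works because the line holds a single tag pair. With several pairs on one line, the greedy .* runs to the last </tag>; the non-greedy .*? from the pattern table stops at the first. The sample line is invented.

```python
import re

line = "<tag>donuts</tag> and <tag>beer</tag>"

# Greedy: .* runs to the LAST </tag>, swallowing the middle tags
print(re.search(r"<tag>(.*)</tag>", line).group(1))  # donuts</tag> and <tag>beer
# Non-greedy: .*? stops at the FIRST </tag>, so each pair is found separately
print(re.findall(r"<tag>(.*?)</tag>", line))         # ['donuts', 'beer']
```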
5 Minute Exercise
• Download the Columbia homepage to disk
• Open it with Python
• Use regular expressions to begin extracting the news