CS3101 Python Lecture 3

Agenda

• Scoping

• Documentation, Coding Practices, Pydoc

• Functions

• Named, optional, and arbitrary arguments

• Generators and Iterators

• Functional programming tools

• lambda, map, filter, reduce

• Regular expressions

• Homework 3

Extra credit solution to HW1

• Dynamic programming: roughly 15 lines

– Determining whether a bill of N dollars is satisfiable reduces to whether you can satisfy a bill of N - J dollars, where J is the cost of an item on your menu

– Create an empty list (knapsack) with N+1 entries

– Base case: we know we can satisfy a bill of 0 dollars

– For each item on your menu

– For index in range(0, N + 1)

• If knapsack[index] is empty, and knapsack[index - item's cost] is not:

• We now know how to satisfy this bill, so store the solution from knapsack[index - item's cost] plus the current item at knapsack[index]
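The steps above can be sketched as follows. This is only an illustration in Python 3 syntax (the slides themselves use Python 2). The function name satisfy_bill, the menu dictionary, and its prices are hypothetical, and item costs are assumed to be whole dollars.

```python
def satisfy_bill(total, menu):
    """Return a list of item names whose costs sum to total, or None."""
    # knapsack[i] holds one known way to reach a bill of i dollars, or None
    knapsack = [None] * (total + 1)
    knapsack[0] = []  # base case: a bill of 0 dollars needs no items
    for name, cost in menu.items():
        for index in range(cost, total + 1):
            if knapsack[index] is None and knapsack[index - cost] is not None:
                # extend the known solution for the smaller bill
                knapsack[index] = knapsack[index - cost] + [name]
    return knapsack[total]


# Hypothetical menu: item name -> cost in whole dollars
menu = {"fries": 3, "burger": 5}
order = satisfy_bill(11, menu)
impossible = satisfy_bill(7, menu)
```

Because the inner loop walks upward, an item may be reused any number of times; walking downward instead would allow each item at most once.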

Homework 3, Exercise 1

• Requirements:

– 1. Write a program that uses regular expressions to retrieve the current weather from a website of your choosing. Just the temperature is OK.

– 2. Use that information to suggest a sport to play.

– ./sport.py

– It's 36 degrees today. You should ski!

http://www.nytimes.com/weather

Homework 3, Exercise 2

• Requirements:

– a) Write a program which uses regular expressions and urllib to print the address of each image on Columbia's homepage (www.columbia.edu)

– b) Use regular expressions to print the title of each of the news stories on the main page

– ./news.py

– ./images.py

Scoping

• Local namespaces / local scope

• A function's parameters and the variables that are bound within the function

• Module scope

• Variables outside functions at the module level are global

• Hiding

• Inner scope wins: if a name conflict occurs between a local variable and a global one, the local one takes precedence
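A small sketch of hiding, written in Python 3 syntax (the slides use Python 2). The names flavor, local_flavor, and global_flavor are made up for illustration.

```python
flavor = "glazed"  # module-level (global) name


def local_flavor():
    flavor = "jelly"  # binds a new local name that hides the global one
    return flavor


def global_flavor():
    return flavor  # no local binding here, so the global is found
```

Calling local_flavor() leaves the module-level flavor unchanged, because the assignment inside the function only creates a local name.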

Global statement

• Local scope wins by default

• If a function must rebind a global variable, declare the name first with the global keyword

• 'global identifiers', where identifiers contains one or more IDs separated by commas

• Never use global if your function just accesses a global variable, only if it rebinds it

• Global in general is poor style because it breaks encapsulation, but you'll see it out there
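A minimal sketch of when global is and is not needed, in Python 3 syntax (the slides use Python 2). The names counter, bump, peek, and broken_bump are hypothetical.

```python
counter = 0


def bump():
    global counter  # required: this function rebinds the global
    counter += 1
    return counter


def peek():
    return counter  # read-only access needs no global statement


def broken_bump():
    # Without a global statement, the assignment makes counter local,
    # so reading it first raises UnboundLocalError.
    counter += 1
    return counter
```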

Closures and Nested Scope

• Using a def statement within another function's body defines a nested or inner function

• The parent function is referred to as the outer function

• Nested functions may access the outer function's parameters and variables, but not rebind them

• This trick can be used to form closures, as we'll see with lambdas

Closures

• This example is adapted from Python in a Nutshell

def make_adder(augend):
    def add(addend):
        return addend + augend
    return add

• Calling make_adder(7) returns a function that accepts a single argument and adds seven to it

Namespace resolution

• Name resolution proceeds as follows

• Local scope (i.e., this function)

• Outer scope (i.e., enclosing functions)

• Module level scope (i.e., global variables)

• Built-in scope (i.e., predefined Python names such as open and len)

• A word to the wise: do not give your variables names that may conflict with built-ins or with modules you may import

• E.g., 'open = 5' is dangerous if you're using file objects; later use of the open function might not resolve where you expect!
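The same danger can be shown with any built-in. The sketch below uses len instead of open so that it runs without touching files. It is written in Python 3 syntax (the slides use Python 2), and the function names are hypothetical.

```python
def shadowed_len():
    len = 5  # local name now hides the built-in len in this function
    try:
        return len("donut")  # tries to call the integer 5
    except TypeError:
        return "len is no longer callable here"


def normal_len():
    return len("donut")  # resolves to the built-in scope
```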

Documentation and Pydoc

• A string literal at the beginning of a function, class, or module:

• A concise one-sentence summary, followed by a blank line, followed by detail.

• References

• http://www.python.org/dev/peps/pep-0257/

def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0:
        return complex_zero
    ...

Code is read MANY more times than it is written

• Trust me, it's worth it

• First line should be a concise and descriptive statement of purpose

• Self documentation is good, but do not repeat the method name! (e.g., def setToolTip(text) #sets the tool tip)

• Next paragraph should describe the method and any side effects

• Then arguments

Python’s thoughts on documentation

•A Foolish Consistency is the Hobgoblin of Little Minds

•http://www.python.org/dev/peps/pep-0008/

Functions, returning multiple values

• Functions can return multiple values of arbitrary type; just separate them with commas (they are packed into a tuple)

• Always reminded me of MATLAB

def foo():
    return [1, 2, 3], 4, (5, 6)

myList, myInt, myTuple = foo()

A word on mutable arguments

• Be cautious when passing mutable data structures (lists, dictionaries) to functions, especially if they come from modules that are not your own

• When in doubt, either pass a copy or convert lists to tuples
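A short sketch of why this matters, in Python 3 syntax (the slides use Python 2). The function eat_first is a hypothetical stand-in for third-party code that mutates its argument.

```python
def eat_first(donuts):
    """Hypothetical function that mutates the list it is given."""
    donuts.pop(0)
    return donuts


box = ["glazed", "jelly", "plain"]

eat_first(list(box))        # pass a copy: box keeps all three items
copy_was_safe = len(box) == 3

try:
    eat_first(tuple(box))   # tuples have no pop, so mutation fails loudly
    tuple_was_mutated = True
except AttributeError:
    tuple_was_mutated = False
```

Passing box itself would remove its first element in the caller as well, because the function and the caller share the same list object.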

Semantics of argument passing

• Recall that while functions cannot rebind the caller's variables, they can alter mutable arguments

• Positional arguments

• Named arguments

• Special forms * (sequence) and ** (dictionary)

• Order:

• zero or more positional, followed by

• zero or more named

• zero or 1 *

• zero or 1 **
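The ordering above can be sketched with a single signature that uses all four kinds of parameter. This is Python 3 syntax (the slides use Python 2), and the function and parameter names are invented for illustration.

```python
def order(item, size="large", *extras, **notes):
    """Return the arguments exactly as Python bound them."""
    return item, size, extras, notes


plain = order("donut")
loaded = order("donut", "small", "sprinkles", "glaze", to_go=True)
```

Here plain is ("donut", "large", (), {}), while in loaded the surplus positional values land in the extras tuple and the surplus keyword lands in the notes dictionary.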

Positional arguments

• def myFunction(arg1, arg2, arg3, arg4, arg5, arg6):

• .....

• Potential for typos

• Readability declines

• Maintenance a headache

• Frequent headache in Java / C (I’m sure we can all recall some monster functions written by colleagues / fellow students)

• We can do better

Named arguments

• Syntax: identifier = expression

• Named arguments specified in the function declaration are optional arguments; the expression is their default value if the caller does not provide one

• Two forms

• 1) you may name arguments passed to functions even if they are listed positionally

• 2) you may name arguments within a function's declaration to supply default values

• Outstanding for self documentation!

Named argument example

def add(a, b):
    return a + b

• Equivalent calls:

print add(4, 2)
print add(a=4, b=2)
print add(b=2, a=4)

Default argument example

def add(a=4, b=2):
    return a + b

print add(b=4)
print add(a=2, b=4)
print add(4, b=2)

Sequence arguments

• The * form collects any additional positional arguments into a tuple

def my_sum(*args):
    # equivalent to: return sum(args), using the built-in sum
    # (named my_sum so that it does not hide the built-in)
    total = 0
    for arg in args:
        total += arg
    return total

• Valid calls:

my_sum(4, 2, 1, 3)
my_sum(1)
my_sum(1, 23, 4, 423, 234)

• **dct must be a dictionary whose keys are all strings; the values are arbitrary

• Each item's key is the parameter name, and its value is the argument

Sequences of named arguments

# ** collects keyword arguments into a dictionary
def foo(**args):
    print args

foo(homer='donut', lisa='tofu')
{'homer': 'donut', 'lisa': 'tofu'}

Optional arguments are everywhere

# three ways to call the range function

# up to
range(5)
[0, 1, 2, 3, 4]

# from, up to
range(-5, 5)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

# from, up to, step
range(-5, 5, 2)
[-5, -3, -1, 1, 3]

Arbitrary arguments example

We can envision a max function pretty easily

# idea: max2(1, 5, 3, 1)
# >>> 5
# idea: max2('a', 'b', 'c', 'd', 'e')
# >>> e

def max2(*args):
    for arg in args…

Arbitrary arguments example

def max1(*args):
    best = args[0]
    for arg in args[1:]:
        if arg > best:
            best = arg
    return best

def max2(*args):
    # sorted() is ascending, so the maximum is the last element
    return sorted(args)[-1]

Argument matching rules

• General rule: more complicated to the right

• For both calling and definitional code:

• All positional arguments must appear first

– Followed by all keyword arguments

• Followed by the * form

– And finally **

Functions as arguments

• Of course we can pass functions as arguments as well

def myCompare(x, y):
    …

sorted([5, 3, 1, 9], cmp=myCompare)
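The cmp= keyword shown above is Python 2 only. In Python 3, sorted takes a key function instead, and functools.cmp_to_key adapts an old-style comparison function. The sketch below assumes Python 3; my_compare is a hypothetical comparison that sorts in descending order, not the body elided on the slide.

```python
from functools import cmp_to_key


def my_compare(x, y):
    """Old-style comparison: negative, zero, or positive (descending)."""
    return y - x


# Adapt the comparison function so it can be passed as key=
descending = sorted([5, 3, 1, 9], key=cmp_to_key(my_compare))

# Often a plain key function is simpler than a comparison function
by_length = sorted(["lisa", "homer", "bart", "maggie"], key=len)
```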

Lambdas

• The closer you can get to mathematics the more elegant your programs become

• In addition to the def statement, Python provides an expression which in-lines a function, similar to LISP

• Instead of assigning a name to a function, lambda just returns the function itself (anonymous functions)

When should you use Lambda

• Lambda is designed for handling simple functions

– Conciseness: lambdas can live places defs cannot (inside a list literal, or a function call itself)

– Elegance

• Limitations

– Not as general as a def: limited to a single expression, so there is only so much you can squeeze in without using blocks of code

• Use def for larger tasks

• Do not sacrifice readability

– More important that your work is a) correct and b) efficient w.r.t. people hours

Quick examples

• Arguments work just as they do for functions defined with def, including defaults, *, and **

• The lambda expression returns a function, so you can assign a name to it if you wish

foo = (lambda a, b="simpson": a + " " + b)
foo("lisa")
lisa simpson
foo("bart")
bart simpson

More examples

# Embedding lambdas in a list
myList = [(lambda x: x**2), (lambda x: x**3)]
for func in myList:
    print func(2)

4
8

# Embedding lambdas in a dictionary
donuts = {'homer': (lambda x: x * 4), 'lisa': (lambda x: x * 0)}
donuts['homer'](2)
8
donuts['lisa'](2)
0

Multiple arguments

(lambda x, y: x + " likes " + y)('homer', 'donuts')

'homer likes donuts'

State

def remember(x):
    return (lambda y: x + y)

foo = remember(5)

print foo

<function <lambda> at 0x01514970>

foo(2)

7

Maps

• One of the most common tasks with lists is to apply an operation to each element in the sequence

# w/o maps
donuts = [1, 2, 3, 4]
myDonuts = []
for d in donuts:
    myDonuts.append(d * 2)

print myDonuts
[2, 4, 6, 8]

# w/ maps
def more(d): return d * 2
myDonuts = map(more, donuts)
print myDonuts
[2, 4, 6, 8]

Map using Lambdas

donuts = [1, 2, 3, 4]

def more(d): return d * 3
myDonuts = map(more, donuts)
print myDonuts
[3, 6, 9, 12]

myDonuts = map((lambda d: d * 3), donuts)
print myDonuts
[3, 6, 9, 12]

More maps

# map is smart
# understands functions requiring multiple arguments
# operates over sequences in parallel

pow(2, 3)
8

map(pow, [2, 4, 6], [1, 2, 3])
[2, 16, 216]

map((lambda x, y: x + " likes " + y),
    ['homer', 'bart'], ['donuts', 'sugar'])
['homer likes donuts', 'bart likes sugar']

Functional programming tools: Filter and reduce

• Theme of functional programming

– apply functions to sequences

• Relatives of map:

– filter and reduce

• Filter:

– keeps only the items for which a test function returns true

• Reduce:

– applies a function to pairs of items and running results

Filter

range(-5, 5)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

def isEven(x): return x % 2 == 0
filter(isEven, range(-5, 5))
[-4, -2, 0, 2, 4]

filter((lambda x: x % 2 == 0), range(-5, 5))
[-4, -2, 0, 2, 4]

Reduce

• A bit more complicated

• By default the first item of the sequence is used to initialize the tally

def reduce(fn, seq):
    tally = seq[0]
    for next in seq[1:]:
        tally = fn(tally, next)
    return tally

• FYI More functional tools available

reduce((lambda x, y: x + y), [1, 2, 3, 4])
10

import operator
reduce(operator.add, [1, 2, 3])
6
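In Python 3, reduce is no longer a built-in and must be imported from functools. The sketch below assumes Python 3 and also shows the optional third argument, which supplies the starting tally instead of the first item. The variable names are illustrative.

```python
from functools import reduce
import operator

# ((1 + 2) + 3) + 4
total = reduce(operator.add, [1, 2, 3, 4])

# The initializer (1) is used as the first tally
product = reduce(operator.mul, [1, 2, 3, 4], 1)

# With an initializer, an empty sequence is fine
empty_total = reduce(operator.add, [], 0)
```

Without an initializer, reducing an empty sequence raises TypeError, since there is no first item to start from.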

List comprehensions revisited: combining filter and map

# Say we wanted to collect the squares of the even numbers below 11

# Using a list comprehension
[x ** 2 for x in range(11) if x % 2 == 0]
[0, 4, 16, 36, 64, 100]

# Using map and filter
map((lambda x: x ** 2), filter((lambda x: x % 2 == 0), range(11)))
[0, 4, 16, 36, 64, 100]

# Easier way, this uses the optional stepping argument in range
[x ** 2 for x in range(0, 11, 2)]
[0, 4, 16, 36, 64, 100]

Reading files with list comprehensions

# old way
lines = open('simpsons.csv').readlines()
['homer,donut\n', 'lisa,brocolli\n']
for line in lines:
    line = line.strip()
    …

# with a comprehension
[line.strip() for line in open('simpsons.csv').readlines()]
['homer,donut', 'lisa,brocolli']

# with a lambda
map((lambda line: line.strip()), open('simpsons.csv').readlines())

Generators and Iterators

• Generators are like normal functions in most respects but they automatically implement the iteration protocol to return a sequence of values over time

• Consider a generator when

– you need to compute a series of values lazily

• Generators

– Save you the work of saving state

– Automatically save theirs when yield is called

• Easy

– Just use "yield" instead of "return"
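As a sketch of lazy evaluation and saved state, the generator below produces an endless series but only computes the values that are actually requested. It is written in Python 3 syntax (the slides use Python 2); gen_fib is a hypothetical name, and itertools.islice is used to take the first few values.

```python
from itertools import islice


def gen_fib():
    """Yield Fibonacci numbers forever, one at a time."""
    a, b = 0, 1
    while True:
        yield a          # execution pauses here; a and b are remembered
        a, b = b, a + b  # resumes here on the next request


# Only eight values are ever computed, despite the infinite loop
first_eight = list(islice(gen_fib(), 8))
```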

Quick example

def genSquares(n):
    for i in range(n):
        yield i ** 2

print genSquares(5)
<generator object at 0x01524BE8>

for i in genSquares(5):
    print i, "then",

0 then 1 then 4 then 9 then 16 then

Error handling preview

def gen():
    i = 0
    while i < 5:
        i += 1
        yield i ** 2

x = gen()
x.next()
>>> 1
x.next()
>>> 4
…
Traceback (most recent call last):
  File "<pyshell#110>", line 1, in <module>
    x.next()
StopIteration

try:
    x.next()
except StopIteration:
    print "done"

5 Minute Exercise

• Begin writing a generator to produce primes

• Start with 0

• When you find a prime, yield (return) that value

• Write code to call your generator

def genPrimes():
    ….
    yield prime

def main():
    g = genPrimes()
    while True:
        print g.next()

Regular Expressions

• A regular expression (re) is a string that represents a pattern.

• The idea is to check any string against the pattern to see if it matches, and if so, where

• REs may be compiled or used on the fly

• You may use REs to match, search, substitute, or split strings

• Very powerful, but a bit of a bear syntactically

Quick examples: Match vs. Search

import re
pattern = re.compile('[a-z]+')
m = pattern.match('donut')
print m.group(), m.start(), m.end()
donut 0 5

m = pattern.search('12 donuts are better than 1')
print m.group(), m.span()
donuts (3, 9)

# match only succeeds at the start of the string
m = pattern.match('12 donuts are better than 1')
if m:
    print m.group()
else:
    print "no match"
no match

Quick examples: Multiple hits

import re
p = re.compile('\d+\sdonuts')
print p.findall('homer has 4 donuts, bart has 2 donuts')
['4 donuts', '2 donuts']

import re
p = re.compile('\d+\sdonuts')
iterator = p.finditer('99 donuts on the shelf, 98 donuts on the shelf...')
for match in iterator:
    print match.group(), match.span()

99 donuts (0, 9)
98 donuts (24, 33)

Re Patterns 1

Pattern   Matches

.         Matches any character (except a newline, by default)
^         Matches the start of the string
$         Matches the end of the string
*         Matches zero or more cases of the previous RE (greedy: match as many as possible)
+         Matches one or more cases of the previous RE (greedy)
?         Matches zero or one case of the previous RE
*?, +?    Non-greedy versions (match as few as possible)
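The difference between the greedy and non-greedy forms in the table is easiest to see side by side. This sketch uses Python 3 syntax (the slides use Python 2), and the sample string is made up.

```python
import re

html = "<b>homer</b> and <b>bart</b>"

# Greedy: .* grabs as much as possible, so one match spans both tags
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first closing tag it can
lazy = re.findall(r"<b>.*?</b>", html)
```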

Re Patterns 2

Pattern   Matches

\d, \D    Matches one digit [0-9] or non-digit [^0-9]
\s, \S    Matches one whitespace character [ \t\n\r\f\v] or non-whitespace character
\w, \W    Matches one alphanumeric or non-alphanumeric char (understands Unicode and various locales if set)
\b, \B    Matches an empty string, but only at the start or end of a word (\B: only when not at a word boundary)
\Z        Matches an empty string at the end of a whole string
\\        Matches one backslash
{m,n}     Matches m to n cases of the previous RE
[…]       Matches any one of a set of characters
|         Matches either the preceding or following expression
(…)       Matches the RE within the parentheses and captures it as a group

Gotchas

• RE punctuation is backwards

– "." matches any character when unescaped, or an actual "." when in the form "\."

– "+" and "*" carry regular expression meaning unless escaped
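A quick sketch of both gotchas, in Python 3 syntax (the slides use Python 2). The sample strings are invented; re.escape is the standard library helper that escapes a literal string for use as a pattern.

```python
import re

price = "donuts cost 1.50 or 1950 cents"

# Unescaped "." matches any character, so "1950" is also a hit
unescaped = re.findall(r"1.50", price)

# Escaped "\." matches only a literal period
escaped = re.findall(r"1\.50", price)

# Unescaped "+" means "one or more 1s", so the literal text is not found
plus_as_operator = re.search(r"1+1", "1+1=2")

# re.escape turns the literal text into a safe pattern
plus_as_literal = re.search(re.escape("1+1"), "1+1=2")
```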

Quick examples: .* vs. .+, and \b

.* vs .+

• The pattern 'Homer.*Simpson' will match:

– HomerSimpson
– Homer Simpson
– Homer Jay Simpson

• The pattern 'Homer.+Simpson' will match:

– Homer Simpson
– Homer Jay Simpson

\b

• The pattern r'\bHomer\b' will find a hit searching

– Homer
– Homer Simpson

• The pattern r'\bHomer' will find a hit searching

– HomerJaySimpson
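The word-boundary cases above can be checked directly. This sketch uses Python 3 syntax (the slides use Python 2). Note the raw string prefix r: in an ordinary string literal, "\b" is a backspace character rather than a word boundary. The extra test string "DJHomer" is made up.

```python
import re

whole_word = re.compile(r"\bHomer\b")
word_start = re.compile(r"\bHomer")

hits = {
    "whole in 'Homer Simpson'": whole_word.search("Homer Simpson") is not None,
    "whole in 'HomerJaySimpson'": whole_word.search("HomerJaySimpson") is not None,
    "start in 'HomerJaySimpson'": word_start.search("HomerJaySimpson") is not None,
    "start in 'DJHomer'": word_start.search("DJHomer") is not None,
}
```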

Sets of chars: []

• Sets of characters are denoted by listing the characters within brackets

• [abc] will match one of a, b, or c

• Ranges are supported

• [0-9] will match one digit

• You may include special sets within brackets

– Such as \s for a whitespace character or \d for a digit

p = re.compile('[HJ]')
iterator = p.finditer("HomerJaySimpson")
for match in iterator:
    print match.group(), match.span()

H (0, 1)
J (5, 6)

Alternatives: |

• A vertical bar matches a pattern on either side

import re
p = re.compile('Homer|Simpson')
iterator = p.finditer("HomerJaySimpson")
for match in iterator:
    print match.group(), match.span()

Homer (0, 5)
Simpson (8, 15)

RE Substitution

import re

line = 'Hello World!'

r = re.compile('world', re.IGNORECASE)

m = r.search(line)

print m.group()

>>> World

print r.sub('Mars!!', line, 1)

>>> Hello Mars!!!


RE Splitting

import re

line = 'lots 42 of random 12 digits 77'

r = re.compile('\d+')

l = r.split(line)

print l

>>> ['lots ', ' of random ', ' digits ', '']

Groups: ()

• Frequently you need to obtain more information than just whether the RE matched or not.

• Regular expressions are often used to dissect strings by writing a RE divided into several subgroups which match different components of interest.

p = re.compile('(homer\s(jay))\ssimpson')
m = p.match('homer jay simpson')
print m.group(0)
print m.group(1)
print m.group(2)

homer jay simpson
homer jay
jay

Putting it all together (and optional flags)

import re
r = re.compile('simpson', re.IGNORECASE)
print r.search("HomerJaySimpson").group()
Simpson

r = re.compile('([A-Z][a-z]+).*?(\d+$)', re.MULTILINE)
iterator = r.finditer('Homer is 42\nMaggie is 6\nBart is 12')
for match in iterator:
    print match.group(1), "was born", match.group(2), "years ago"

Homer was born 42 years ago
Maggie was born 6 years ago
Bart was born 12 years ago

Discussion: Who can explain how this RE works?
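One way to approach the discussion question is to run the pattern with and without re.MULTILINE and compare. The sketch below uses Python 3 syntax (the slides use Python 2) and assumes the intended pattern is ([A-Z][a-z]+).*?(\d+$): a capitalized name, then as little as possible, then digits that end a line.

```python
import re

text = 'Homer is 42\nMaggie is 6\nBart is 12'
pattern = r'([A-Z][a-z]+).*?(\d+$)'

# With MULTILINE, $ matches at the end of every line
with_flag = re.findall(pattern, text, re.MULTILINE)

# Without it, $ matches only at the end of the whole string, and
# "." cannot cross a newline, so only the last line can match
without_flag = re.findall(pattern, text)
```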

Finding tags within HTML

import re

line = '<tag>my eyes! the goggles do nothing.</tag>'

r = re.compile('<tag>(.*)</tag>')

m = r.search(line)

print m.group(1)

>>> my eyes! the goggles do nothing.


5 Minute Exercise

• Download the Columbia homepage to disk

• Open it with Python

• Use regular expressions to begin extracting the news
