CS3101 Python Lecture 3

CS3101 Python Lecture 3. Agenda Scoping Documentation, Coding Practices, Pydoc Functions Named, optional, and arbitrary arguments Generators and Iterators


Agenda

• Scoping

• Documentation, Coding Practices, Pydoc

• Functions

• Named, optional, and arbitrary arguments

• Generators and Iterators

• Functional programming tools

• lambda, map, filter, reduce

• Regular expressions

• Homework 3

Extra credit solution to HW1

• Dynamic programming: roughly 15 lines

– Determining whether a bill of N dollars is satisfiable resolves to whether you can satisfy a bill of N – J dollars, where J is the cost of an item on your menu

– Create an empty list (knapsack) with N+1 entries

– Base case: we know we can satisfy a bill of 0 dollars

– For each item on your menu

– For index = 0 to N (the N+1 entries of the knapsack)

• If knapsack[index] is empty, and knapsack[index – item’s cost] is not:

• We now know how to satisfy this bill, so store at knapsack[index] the solution list from knapsack[index – item’s cost] with the current item appended
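The steps above can be sketched roughly as follows. This is a minimal Python 3 sketch, not the official homework solution; the function name satisfy_bill, the menu dictionary, and whole-dollar prices are assumptions made for illustration.

```python
def satisfy_bill(total, menu):
    """Return a list of item names whose costs sum to total, or None."""
    # knapsack[i] holds one list of items costing exactly i dollars,
    # or None if we do not yet know how to reach i.
    knapsack = [None] * (total + 1)
    knapsack[0] = []  # base case: a bill of 0 dollars needs no items
    for name, cost in menu.items():
        # Start at cost so that index - cost never goes negative.
        for index in range(cost, total + 1):
            if knapsack[index] is None and knapsack[index - cost] is not None:
                knapsack[index] = knapsack[index - cost] + [name]
    return knapsack[total]


menu = {"fries": 3, "burger": 5}  # hypothetical menu, prices in dollars
print(satisfy_bill(11, menu))     # one combination costing 11, if any exists
print(satisfy_bill(1, menu))      # None: no combination costs 1
```

Because the inner loop runs upward, an item may be reused within the same pass, so the same dish can appear more than once in a solution.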

Homework 3, Exercise 1

• Requirements:

– 1. Write a program using regular expressions to retrieve the current weather from a website of your choosing. Just the temperature is OK.

– 2. Use that information to suggest a sport to play.

– ./sport.py

– It’s 36 degrees today. You should ski!

http://www.nytimes.com/weather

Homework 3, Exercise 2

• Requirements:

– a) Write a program which uses regular expressions and urllib to print the address of each image on Columbia’s homepage (www.columbia.edu)

– b) Use regular expressions to print the title of each of the news stories on the main page

– ./news.py

– ./images.py

Scoping

• Local namespaces / local scope

• A function’s parameters, and the variables that are bound within the function

• Module scope

• Variables outside functions at the module level are global

• Hiding

• Inner scope wins: if a name conflict occurs between a local variable and a global one, the local one takes precedence
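A small illustration of hiding, written for Python 3 (print as a function) rather than the Python 2 used on the slides. The names flavor, local_flavor, and global_flavor are made up for this sketch.

```python
flavor = "glazed"  # module-level (global) name


def local_flavor():
    flavor = "jelly"  # binds a new local name that hides the global one
    return flavor


def global_flavor():
    return flavor  # no local binding, so lookup falls through to module scope


print(local_flavor())   # jelly
print(global_flavor())  # glazed
print(flavor)           # glazed: the assignment inside local_flavor changed nothing here
```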

Global statement

• Local scope wins by default

• If a function must rebind a global variable (rather than create a local one of the same name), declare the name first with the global statement

• Syntax: ‘global identifiers’, where identifiers contains one or more names separated by commas

• Never use global if your function just accesses a global variable; use it only if the function rebinds it

• Global in general is poor style because it breaks encapsulation, but you’ll see it out there
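A sketch of the difference between reading and rebinding a global, in Python 3 syntax. The counter variable and the three function names are hypothetical.

```python
counter = 0


def read_counter():
    return counter  # reading only: no global statement needed


def bump_without_global():
    counter = 100  # creates a local name; the module-level counter is untouched
    return counter


def bump_with_global():
    global counter  # rebinding the module-level name requires the declaration
    counter += 1
    return counter


print(bump_without_global(), read_counter())  # 100 0
print(bump_with_global(), read_counter())     # 1 1
```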

Closures and Nested Scope

• Using a def statement within another function’s body defines a nested, or inner, function

• The parent function is referred to as the outer function

• Nested functions may access the outer function’s parameters and local variables, but not rebind them (Python 3 later added the nonlocal statement for rebinding)

• This trick can be used to form closures, as we’ll see with lambdas

Closures

• This example is adapted from Python in a Nutshell

def make_adder(augend):
    def add(addend):
        return addend + augend
    return add

• Calling make_adder(7) returns a function that accepts a single argument and adds seven to it
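To see that each returned function remembers its own augend, here is a short usage sketch in Python 3 syntax. The names add_seven and add_ten are made up for illustration.

```python
def make_adder(augend):
    def add(addend):
        # augend is looked up in the enclosing (outer) scope
        return addend + augend
    return add


add_seven = make_adder(7)
add_ten = make_adder(10)

print(add_seven(1))  # 8
print(add_ten(1))    # 11
```

The two closures are independent: creating add_ten does not affect what add_seven adds.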

Namespace resolution

• Name resolution proceeds as follows

• Local scope (i.e., this function)

• Outer scope (i.e., enclosing functions)

• Module level scope (i.e., global variables)

• Built-in scope (i.e., predefined Python names such as open and len)

• A word to the wise: do not give your variables names that may conflict with built-ins or with modules you may import

• E.g., ‘open = 5’ is dangerous if you’re using file objects: a later call to open might not resolve where you expect!
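The danger can be demonstrated safely inside a function, so that only the local scope is affected. This is a Python 3 sketch; the function name and the file name notes.txt are hypothetical, and the file is never actually opened.

```python
def read_with_shadowed_open():
    open = 5  # the local name now hides the built-in open() in this function
    try:
        return open("notes.txt")  # resolves to the integer 5, not the built-in
    except TypeError as err:
        return "TypeError: %s" % err


print(read_with_shadowed_open())  # a TypeError message: an int is not callable
print(callable(open))             # True: outside the function, open is still the built-in
```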

Documentation and Pydoc

• A docstring is a string literal at the beginning of a method, class, or module:

• One-sentence concise summary, followed by a blank line, followed by detail.

• References

• http://www.python.org/dev/peps/pep-0257/

def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0:
        return complex_zero
    ...
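A docstring is stored on the object itself, which is what pydoc and help() read. The sketch below (Python 3) uses a made-up function, scale, to show how the summary line can be retrieved with the standard inspect module.

```python
import inspect


def scale(value, factor=2.0):
    """Multiply value by factor.

    Keyword arguments:
    factor -- the multiplier (default 2.0)
    """
    return value * factor


doc = inspect.getdoc(scale)    # the docstring with its indentation cleaned up
summary = doc.splitlines()[0]  # the one-line summary comes first
print(summary)                 # Multiply value by factor.
```

Running help(scale) in an interactive session displays the same text.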

Code is read MANY more times than it is written

• Trust me, it’s worth it

• First line should be a concise and descriptive statement of purpose

• Self-documentation is good, but do not repeat the method name! (e.g., def setToolTip(text) #sets the tool tip)

• Next paragraph should describe the method and any side effects

• Then arguments


Python’s thoughts on documentation

•A Foolish Consistency is the Hobgoblin of Little Minds

•http://www.python.org/dev/peps/pep-0008/

Functions, returning multiple values

• Functions can return multiple values (of arbitrary type); just separate them by commas

• Always reminded me of MATLAB

def foo():
    return [1,2,3], 4, (5,6)

myList, myInt, myTuple = foo()
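Under the hood, the comma-separated values are packed into a single tuple, which the caller may unpack. A brief Python 3 sketch; the variable names are illustrative.

```python
def foo():
    return [1, 2, 3], 4, (5, 6)


result = foo()
print(type(result).__name__, len(result))  # tuple 3

# Unpacking assigns each element of the returned tuple to its own name.
my_list, my_int, my_tuple = foo()
print(my_int)  # 4
```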

A word on mutable arguments

• Be cautious when passing mutable data structures (lists, dictionaries) to methods, especially methods from modules that are not your own

• When in doubt, either copy them or convert them to tuples
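A sketch of why the caution matters, in Python 3 syntax. The function eat_first stands in for any third-party code that modifies its argument; it and the box list are invented for this example.

```python
def eat_first(donuts):
    """Hypothetical function that mutates its argument in place."""
    donuts.pop(0)
    return len(donuts)


box = ["glazed", "jelly", "plain"]

eat_first(list(box))  # pass a copy: the original is untouched
print(box)            # ['glazed', 'jelly', 'plain']

eat_first(box)        # pass the original: the caller's list shrinks
print(box)            # ['jelly', 'plain']

try:
    eat_first(tuple(box))  # a tuple has no pop(), so the mutation fails loudly
except AttributeError:
    print("tuples cannot be modified in place")
```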

Semantics of argument passing

• Recall that while functions cannot rebind the caller’s variables, they can alter arguments of mutable types

• Positional arguments

• Named arguments

• Special forms *(sequence) and ** (dictionary)

• Sequence:

• zero or more positional followed by

• zero or more named

• zero or 1 *

• zero or 1 **

Positional arguments

def myFunction(arg1, arg2, arg3, arg4, arg5, arg6):
    .....

• Potential for typos

• Readability declines

• Maintenance a headache

• Frequent headache in Java / C (I’m sure we can all recall some monster functions written by colleagues / fellow students)

• We can do better

Named arguments

• Syntax: identifier = expression

• Named arguments specified in the function declaration are optional arguments; the expression is their default value if not provided by the caller

• Two forms

• 1) you may name arguments passed to functions even if they are listed positionally

• 2) you may name arguments within a function’s declaration to supply default values

• Outstanding for self-documentation!

Named argument example

def add(a, b):
    return a + b

• Equivalent calls:

print add(4,2)
print add(a=4, b=2)
print add(b=2, a=4)

Default argument example

def add(a=4, b=2):
    return a+b

print add(b=4)
print add(a=2, b=4)
print add(4, b=2)
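For reference, here is what those calls produce, written in Python 3 syntax (print as a function), which differs from the Python 2 print statement used on the slides.

```python
def add(a=4, b=2):
    return a + b


print(add())          # 6: both defaults used
print(add(b=4))       # 8: a keeps its default of 4
print(add(a=2, b=4))  # 6
print(add(4, b=2))    # 6: positional first, then named
```

The reverse order, a named argument followed by a positional one such as add(a=4, 2), is rejected as a syntax error.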

Sequence arguments

• The * form collects additional positional arguments into a tuple you can iterate over

def mySum(*args):
    # equivalent to return sum(args)
    total = 0
    for arg in args:
        total += arg
    return total

• Valid calls:

mySum(4,2,1,3)
mySum(1)
mySum(1,23,4,423,234)

Sequences of named arguments

• **dct must be a dictionary whose keys are all strings; the values, of course, are arbitrary

• Each item’s key is the parameter name, the value is the argument

# ** collects keyword arguments into a dictionary
def foo(**args):
    print args

foo(homer='donut', lisa='tofu')
{'homer': 'donut', 'lisa': 'tofu'}
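The * and ** forms work in both directions: in a definition they collect arguments, and in a call they unpack a sequence or dictionary into arguments. A Python 3 sketch; the function describe and the sample data are made up.

```python
def describe(*args, **kwargs):
    # args is a tuple of extra positional arguments,
    # kwargs is a dictionary of extra keyword arguments
    return args, kwargs


print(describe(1, 2, homer="donut"))  # ((1, 2), {'homer': 'donut'})

numbers = [4, 2]
favorites = {"homer": "donut", "lisa": "tofu"}  # keys must all be strings

# In a call, * unpacks the list and ** unpacks the dictionary.
args, kwargs = describe(*numbers, **favorites)
print(args)    # (4, 2)
print(kwargs)  # {'homer': 'donut', 'lisa': 'tofu'}
```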

Optional arguments are everywhere

# three ways to call the range function

# up to
range(5)
[0, 1, 2, 3, 4]

# from, up to
range(-5, 5)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

# from, up to, step
range(-5, 5, 2)
[-5, -3, -1, 1, 3]

Arbitrary arguments example

We can envision a max function pretty easily

# idea: max2(1,5,3,1)
# >>> 5
# idea: max2('a', 'b', 'c', 'd', 'e')
# >>> e

def max2(*args):
    for arg in args…

Arbitrary arguments example

def max1(*args):
    best = args[0]
    for arg in args[1:]:
        if arg > best:
            best = arg
    return best

def max2(*args):
    # sorted() is ascending, so the largest value is last
    return sorted(args)[-1]

Argument matching rules

• General rule: more complicated to the right

• For both calling and definitional code:

• All positional arguments must appear first

– Followed by all keyword arguments

• Followed by the * form

– And finally **
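The ordering can be seen in a single definition that uses all four kinds of parameter. This is a Python 3 sketch; the function name order and the argument values are arbitrary.

```python
def order(a, b=2, *args, **kwargs):
    # a: positional, b: named with a default,
    # args: extra positionals, kwargs: extra keywords
    return a, b, args, kwargs


print(order(1))                       # (1, 2, (), {})
print(order(1, 5, 6, 7, x=8))         # (1, 5, (6, 7), {'x': 8})
print(order(1, *(5, 6), **{"x": 8}))  # (1, 5, (6,), {'x': 8})
```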

Functions as arguments

• Of course we can pass functions as arguments as well

def myCompare(x, y):
    …

sorted([5, 3, 1, 9], cmp=myCompare)
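The cmp argument shown above is Python 2 only. In Python 3, sorted accepts a key function instead, and functools.cmp_to_key adapts an old-style comparison function. The comparison below (descending order) is just one hypothetical choice for the elided body of myCompare.

```python
from functools import cmp_to_key


def my_compare(x, y):
    """Return a negative number if x should sort before y (here: descending)."""
    return y - x


# Adapting a comparison function for Python 3's sorted()
print(sorted([5, 3, 1, 9], key=cmp_to_key(my_compare)))  # [9, 5, 3, 1]

# The same ordering expressed directly as a key function
print(sorted([5, 3, 1, 9], key=lambda n: -n))            # [9, 5, 3, 1]
```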

Lambdas

• The closer you can get to mathematics the more elegant your programs become

• In addition to the def statement, Python provides an expression which in-lines a function – similar to LISP

• Instead of assigning a name to a function, lambda just returns the function itself – anonymous functions

When should you use Lambda

• Lambda is designed for handling simple functions

– Conciseness: lambdas can live places defs cannot (inside a list literal, or a function call itself)

– Elegance

• Limitations

– Not as general as a def – limited to a single expression, there is only so much you can squeeze in without using blocks of code

• Use def for larger tasks

• Do not sacrifice readability

– More important that your work is a) correct and b) efficient w.r.t. people hours

Quick examples

• Arguments work just as they do in functions defined with def – including defaults, *, and **

• The lambda expression returns a function, so you can assign a name to it if you wish

foo = (lambda a, b="simpson": a + " " + b)
foo("lisa")
lisa simpson
foo("bart")
bart simpson

More examples

# Embedding lambdas in a list
myList = [(lambda x: x**2), (lambda x: x**3)]
for func in myList:
    print func(2)

4
8

# Embedding lambdas in a dictionary
donuts = {'homer' : (lambda x: x * 4), 'lisa' : (lambda x: x * 0)}
donuts['homer'](2)
8
donuts['lisa'](2)
0

Multiple arguments

(lambda x, y: x + " likes " + y)('homer', 'donuts')

'homer likes donuts'

State

def remember(x):
    return (lambda y: x + y)

foo = remember(5)

print foo

<function <lambda> at 0x01514970>

foo(2)

7

Maps

• One of the most common tasks with lists is to apply an operation to each element in the sequence

# w/o maps
donuts = [1,2,3,4]
myDonuts = []
for d in donuts:
    myDonuts.append(d * 2)

print myDonuts
[2, 4, 6, 8]

# w maps
def more(d): return d * 2
myDonuts = map(more, donuts)
print myDonuts
[2, 4, 6, 8]

Map using Lambdas

donuts = [1,2,3,4]

def more(d): return d * 3
myDonuts = map(more, donuts)
print myDonuts
[3, 6, 9, 12]

myDonuts = map((lambda d: d * 3), donuts)
print myDonuts
[3, 6, 9, 12]

More maps

# map is smart
# understands functions requiring multiple arguments
# operates over sequences in parallel

pow(2, 3)
8

map(pow, [2, 4, 6], [1, 2, 3])
[2, 16, 216]

map((lambda x,y: x + " likes " + y),
    ['homer', 'bart'], ['donuts', 'sugar'])
['homer likes donuts', 'bart likes sugar']

Functional programming tools: Filter and reduce

• Theme of functional programming

– apply functions to sequences

• Relatives of map:

– filter and reduce

• Filter:

– keeps only the items for which a test function returns true

• Reduce:

– Applies functions to pairs of items and running results

Filter

range(-5, 5)
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

def isEven(x): return x % 2 == 0
filter(isEven, range(-5,5))
[-4, -2, 0, 2, 4]

filter((lambda x: x % 2 == 0), range(-5, 5))
[-4, -2, 0, 2, 4]

Reduce

• A bit more complicated

• By default the first item of the sequence is used to initialize the tally

def reduce(fn, seq):
    tally = seq[0]
    for next in seq[1:]:
        tally = fn(tally, next)
    return tally

• FYI: more functional tools are available

reduce((lambda x, y: x + y), [1,2,3,4])
10

import operator
reduce(operator.add, [1, 2, 3])
6
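In Python 3, reduce is no longer a built-in and lives in the functools module. The sketch below compares a hand-written version (named my_reduce here to avoid hiding the real one) with the library function, including the optional initial value.

```python
from functools import reduce
import operator


def my_reduce(fn, seq):
    """Hand-rolled reduce: start with the first item, fold in the rest."""
    tally = seq[0]
    for item in seq[1:]:
        tally = fn(tally, item)
    return tally


print(my_reduce(operator.add, [1, 2, 3, 4]))   # 10
print(reduce(operator.add, [1, 2, 3, 4]))      # 10
print(reduce(operator.mul, [1, 2, 3, 4], 10))  # 240: tally starts at 10
```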

List comprehensions revisited: combining filter and map

# Say we wanted to collect the squares of the even numbers below 11

# Using a list comprehension
[x ** 2 for x in range(11) if x % 2 == 0]
[0, 4, 16, 36, 64, 100]

# Using map and filter
map((lambda x: x ** 2), filter((lambda x: x % 2 == 0), range(11)))
[0, 4, 16, 36, 64, 100]

# Easier way, this uses the optional stepping argument in range
[x ** 2 for x in range(0,11,2)]
[0, 4, 16, 36, 64, 100]
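One difference worth knowing when running these examples today: in Python 3, map and filter return lazy iterators rather than lists, so the result must be wrapped in list() to see the values. A short sketch with illustrative variable names:

```python
evens_squared = [x ** 2 for x in range(11) if x % 2 == 0]
print(evens_squared)  # [0, 4, 16, 36, 64, 100]

# map and filter build a lazy pipeline; nothing is computed yet
lazy = map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, range(11)))
as_list = list(lazy)  # forces evaluation
print(as_list)        # [0, 4, 16, 36, 64, 100]

# map over several sequences in parallel
print(list(map(pow, [2, 4, 6], [1, 2, 3])))  # [2, 16, 216]
```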

Reading files with list comprehensions

# old way
lines = open('simpsons.csv').readlines()
['homer,donut\n', 'lisa,brocolli\n']
for line in lines:
    line = line.strip()
    …

# with a comprehension
[line.strip() for line in open('simpsons.csv').readlines()]
['homer,donut', 'lisa,brocolli']

# with a lambda
map((lambda line: line.strip()), open('simpsons.csv').readlines())

Generators and Iterators

• Generators are like normal functions in most respects, but they automatically implement the iteration protocol to return a sequence of values over time

• Consider a generator when

– you need to compute a series of values lazily

• Generators

– Save you the work of saving state

– Automatically save theirs when yield is called

• Easy

– Just use “yield” instead of “return”

Quick example

def genSquares(n):
    for i in range(n):
        yield i ** 2

print genSquares(5)
<generator object at 0x01524BE8>

for i in genSquares(5):
    print i, "then",

0 then 1 then 4 then 9 then 16 then
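A generator object can be consumed only once, and anything that accepts an iterable can drive it. The following Python 3 sketch renames the function to gen_squares purely for illustration.

```python
def gen_squares(n):
    for i in range(n):
        yield i ** 2  # execution pauses here until the next value is requested


squares = gen_squares(5)
print(list(squares))        # [0, 1, 4, 9, 16]
print(list(squares))        # []: the generator is now exhausted
print(sum(gen_squares(5)))  # 30: values are produced one at a time, no list is built
```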

Error handling preview

def gen():
    i = 0
    while i < 5:
        i += 1
        yield i ** 2

x = gen()
x.next()
>>> 1
x.next()
>>> 4
…
Traceback (most recent call last):
  File "<pyshell#110>", line 1, in <module>
    x.next()
StopIteration

try:
    x.next()
except StopIteration:
    print "done"
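In Python 3 the x.next() method shown above is spelled as the built-in next(x). The sketch below repeats the slide's generator in that form and also shows the optional default argument of next, which avoids the exception.

```python
def gen():
    i = 0
    while i < 5:
        i += 1
        yield i ** 2


x = gen()
print(next(x))  # 1
print(next(x))  # 4
rest = list(x)  # drains the remaining values
print(rest)     # [9, 16, 25]

try:
    next(x)     # nothing left, so StopIteration is raised
except StopIteration:
    print("done")

print(next(x, None))  # None: a default is returned instead of raising
```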

5 Minute Exercise

• Begin writing a generator to produce primes

• Start with 0

• When you find a prime, yield (return) that value

• Write code to call your generator

def genPrimes():
    ….
    yield prime

def main():
    g = genPrimes()
    while True:
        print g.next()

Regular Expressions

• A regular expression (RE) is a string that represents a pattern.

• Idea is to check any string against the pattern to see if it matches, and if so – where

• REs may be compiled or used on the fly

• You may use REs to match, search, substitute, or split strings

• Very powerful – a bit of a bear syntactically

Quick examples: Match vs. Search

import re
p = re.compile('[a-z]+')
m = p.match('donut')
print m.group(), m.start(), m.end()
donut 0 5

m = p.search('12 donuts are better than 1')
print m.group(), m.span()
donuts (3, 9)

m = p.match('12 donuts are better than 1')
if m:
    print m.group()
else:
    print "no match"
no match
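The same searches can be run either with a compiled pattern or on the fly through the module-level functions. A Python 3 sketch using the slide's sample sentence; the variable names are illustrative.

```python
import re

text = "12 donuts are better than 1"

p = re.compile(r"[a-z]+")                # compiled once, reusable
compiled_hit = p.search(text)
inline_hit = re.search(r"[a-z]+", text)  # pattern given on the fly

print(compiled_hit.group(), compiled_hit.span())  # donuts (3, 9)
print(inline_hit.span() == compiled_hit.span())   # True
print(re.match(r"[a-z]+", text))                  # None: match only tries position 0
print(p.findall(text))                            # ['donuts', 'are', 'better', 'than']
```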

Quick examples: Multiple hits

import re
p = re.compile(r'\d+\sdonuts')
print p.findall('homer has 4 donuts, bart has 2 donuts')
['4 donuts', '2 donuts']

import re
p = re.compile(r'\d+\sdonuts')
iterator = p.finditer('99 donuts on the shelf, 98 donuts on the shelf...')
for match in iterator:
    print match.group(), match.span()

99 donuts (0, 9)
98 donuts (24, 33)

Re Patterns 1

Pattern   Matches
.         Matches any character (except a newline, by default)
^         Matches the start of the string
$         Matches the end of the string
*         Matches zero or more cases of the previous RE (greedy – match as many as possible)
+         Matches one or more cases of the previous RE (greedy)
?         Matches zero or one case of the previous RE
*?, +?    Non-greedy versions (match as few as possible)

Re Patterns 2

Pattern   Matches
\d, \D    Matches one digit [0-9] or non-digit [^0-9]
\s, \S    Matches whitespace [ \t\n\r\f\v] or non-whitespace
\w, \W    Matches one alphanumeric character or underscore, or one character that is neither (understands Unicode and various locales if set)
\b, \B    \b matches an empty string, but only at the start or end of a word; \B matches only where there is no word boundary
\Z        Matches an empty string at the end of a whole string
\\        Matches one backslash
{m,n}     Matches m to n cases of the previous RE
[…]       Matches any one of a set of characters
|         Matches either the preceding or following expression
(…)       Matches the RE within the parentheses and saves it as a group

Gotchas

• RE punctuation is backwards

– “.” matches any character when unescaped, or an actual “.” when in the form “\.”

– “+” and “*” carry regular expression meaning unless escaped
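A short Python 3 sketch of both gotchas. The sample strings are made up; re.escape is the standard-library helper that escapes special characters for you.

```python
import re

# An unescaped "." matches any character, so "3.14" also matches "3x14".
print(bool(re.search(r"3.14", "3x14")))         # True
print(bool(re.search(r"3\.14", "3x14")))        # False
print(bool(re.search(r"3\.14", "pi is 3.14")))  # True

# An unescaped "+" means "one or more of the previous item".
print(bool(re.search(r"1+1", "1+1=2")))             # False: looks for 1s followed by a 1
print(bool(re.search(re.escape("1+1"), "1+1=2")))   # True: the + is now literal
```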

Quick examples: .* vs. .+, and \b

.* vs .+

• The pattern

– 'Homer.*Simpson' will match:

• HomerSimpson

• Homer Simpson

• Homer Jay Simpson

• The pattern

– 'Homer.+Simpson' will match:

• Homer Simpson

• Homer Jay Simpson

\b

• The pattern

– r'\bHomer\b' will find a hit searching

– Homer

– Homer Simpson

• The pattern

– r'\bHomer' will find a hit searching

– HomerJaySimpson
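These claims can be checked directly. A Python 3 sketch; the list of names comes from the slide, and the variable names star and plus are made up.

```python
import re

names = ["HomerSimpson", "Homer Simpson", "Homer Jay Simpson"]

star = [n for n in names if re.search(r"Homer.*Simpson", n)]
plus = [n for n in names if re.search(r"Homer.+Simpson", n)]
print(star)  # all three names: .* allows zero characters in between
print(plus)  # only the last two: .+ needs at least one character in between

# \b requires a word boundary at that position
print(bool(re.search(r"\bHomer\b", "Homer Simpson")))    # True
print(bool(re.search(r"\bHomer\b", "HomerJaySimpson")))  # False: no boundary after Homer
print(bool(re.search(r"\bHomer", "HomerJaySimpson")))    # True
```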

Sets of chars: []

• Sets of characters are denoted by listing the characters within brackets

• [abc] will match one of a, b, or c

• Ranges are supported

• [0-9] will match one digit

• You may include special sets within brackets

– Such as \s for a whitespace character or \d for a digit

p = re.compile('[HJ]')
iterator = p.finditer("HomerJaySimpson")
for match in iterator:
    print match.group(), match.span()

H (0, 1)
J (5, 6)

Alternatives: |s

• A vertical bar matches a pattern on either side

import re
p = re.compile('Homer|Simpson')
iterator = p.finditer("HomerJaySimpson")
for match in iterator:
    print match.group(), match.span()

Homer (0, 5)
Simpson (8, 15)

RE Substitution

import re

line = 'Hello World!'

r = re.compile('world', re.IGNORECASE)

m = r.search(line)

print m.group()

>>> World

print r.sub('Mars!!', line, 1)

>>> Hello Mars!!!
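The third argument to sub is a count that limits how many matches are replaced. The Python 3 sketch below uses a longer, made-up sample line so the effect of the count is visible, and also shows subn and a function as the replacement.

```python
import re

line = "Hello World! Goodbye world!"
r = re.compile("world", re.IGNORECASE)

print(r.sub("Mars", line, count=1))  # Hello Mars! Goodbye world!
print(r.sub("Mars", line))           # Hello Mars! Goodbye Mars!
print(r.subn("Mars", line))          # ('Hello Mars! Goodbye Mars!', 2)

# The replacement may also be a function that receives each match object.
print(r.sub(lambda m: m.group().upper(), line))  # Hello WORLD! Goodbye WORLD!
```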

RE Splitting

import re

line = 'lots 42 of random 12 digits 77'

r = re.compile('\d+')

l = r.split(line)

print l

>>> ['lots ', ' of random ', ' digits ', '']

Groups: ()

• Frequently you need to obtain more information than just whether the RE matched or not.

• Regular expressions are often used to dissect strings by writing a RE divided into several subgroups which match different components of interest.

p = re.compile('(homer\s(jay))\ssimpson')
m = p.match('homer jay simpson')
print m.group(0)
print m.group(1)
print m.group(2)

homer jay simpson
homer jay
jay
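Groups are numbered by the position of their opening parenthesis, which is why the nested (jay) is group 2. A Python 3 sketch showing a couple of other ways to inspect the groups of the same match:

```python
import re

p = re.compile(r"(homer\s(jay))\ssimpson")
m = p.match("homer jay simpson")

print(m.group(0))  # homer jay simpson (the whole match)
print(m.groups())  # ('homer jay', 'jay'): all subgroups as a tuple
print(m.span(2))   # (6, 9): where group 2 was found
print(m.start(1))  # 0: group 1 starts at the beginning
```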

Putting it all together (and optional flags)

import re
r = re.compile('simpson', re.IGNORECASE)
print r.search("HomerJaySimpson").group()
Simpson

r = re.compile('([A-Z][a-z]+).*?(\d+$)', re.MULTILINE)
iterator = r.finditer('Homer is 42\nMaggie is 6\nBart is 12')
for match in iterator:
    print match.group(1), "was born", match.group(2), "years ago"

Homer was born 42 years ago
Maggie was born 6 years ago
Bart was born 12 years ago

Discussion: Who can explain how this RE works?

Finding tags within HTML

import re

line = '<tag>my eyes! the goggles do nothing.</tag>'

r = re.compile('<tag>(.*)</tag>')

m = r.search(line)

print m.group(1)

>>> my eyes! the goggles do nothing.
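When a line contains more than one pair of tags, the greedy .* runs to the last closing tag, while the non-greedy .*? stops at the first. A Python 3 sketch; the two-tag sample line is made up for illustration.

```python
import re

line = "<tag>my eyes!</tag> <tag>the goggles do nothing.</tag>"

greedy = re.findall(r"<tag>(.*)</tag>", line)
lazy = re.findall(r"<tag>(.*?)</tag>", line)

print(greedy)  # ['my eyes!</tag> <tag>the goggles do nothing.']
print(lazy)    # ['my eyes!', 'the goggles do nothing.']
```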

5 Minute Exercise

• Download the Columbia homepage to disk

• Open it with Python

• Use regular expressions to begin extracting the news

import re
line = '<tag>my eyes! the goggles do nothing.</tag>'
r = re.compile('<tag>(.*)</tag>')
m = r.search(line)
print m.group(1)
>>> my eyes! the goggles do nothing.