A Taste of Python
Presented by Jordan Baker
October 23, 2009
DevDays Toronto
About Me
• Open Source Developer
• Founder of Scryent, an open source web application and CMS service provider - www.scryent.com
• Founder of Toronto Plone Users Group - www.torontoplone.ca
Agenda
• About Python
• Show me your CODE
• A Spell Checker in 21 lines of code
• Why Python ROCKS
• Resources for further exploration
About Python
http://www.flickr.com/photos/schoffer/196079076/
About Python
• Gotta love a language named after Monty Python’s Flying Circus
• Used in more places than you might know
Significant Whitespace
C-like
if (x == 2) {
    do_something();
}
do_something_else();
Python
if x == 2:
    do_something()
do_something_else()
Significant Whitespace
• less code clutter
• eliminates many common syntax errors
• proper code layout
• use an indentation aware editor or IDE
• Get over it!
Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Python is Interactive
FIZZ BUZZ
1 2 FIZZ 4 BUZZ ... 14 FIZZ BUZZ
def fizzbuzz(n):
    # Start at 1 so that 0 is not reported as "Fizz Buzz".
    for i in range(1, n + 1):
        if not i % 3:
            print "Fizz",
        if not i % 5:
            print "Buzz",
        if i % 3 and i % 5:
            print i,
    print

fizzbuzz(50)
FIZZ BUZZ
class FizzBuzzWriter(object):
    def __init__(self, limit):
        self.limit = limit

    def run(self):
        for n in range(1, self.limit + 1):
            self.write_number(n)

    def write_number(self, n):
        if not n % 3:
            print "Fizz",
        if not n % 5:
            print "Buzz",
        if n % 3 and n % 5:
            print n,
        print

fizzbuzz = FizzBuzzWriter(50)
fizzbuzz.run()
FIZZ BUZZ (OO)
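The FizzBuzz slides use Python 2 print statements. As a rough sketch of how the same idea might look in Python 3, the version below builds each line as a string and returns the lines instead of printing them directly. The names `fizzbuzz_line` and `fizzbuzz` are illustrative choices, not part of the talk.

```python
def fizzbuzz_line(n):
    """Return the FizzBuzz text for a single number n."""
    parts = []
    if n % 3 == 0:
        parts.append("Fizz")
    if n % 5 == 0:
        parts.append("Buzz")
    # Fall back to the number itself when neither rule applies.
    return " ".join(parts) or str(n)


def fizzbuzz(limit):
    """Return the FizzBuzz lines for 1..limit inclusive."""
    return [fizzbuzz_line(n) for n in range(1, limit + 1)]


print("\n".join(fizzbuzz(15)))
```

Returning a list keeps the rules separate from the output, which makes the function easy to check from the interactive prompt.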
A Spell Checker in 21 Lines of Code
• Written by Peter Norvig
• Duplicated in many languages
• Simple Spellchecking algorithm based on probability
• http://norvig.com/spell-correct.html
The Approach
• Census by frequency
• Morph the word (werd)
• Insertions: waerd, wberd, werzd
• Deletions: wrd, wed, wer
• Transpositions: ewrd, wred, wedr
• Replacements: aerd, ward, wbrd, word, wzrd, werz
• Find the one with the highest frequency: were
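The approach can be tried by hand before looking at the real code. The sketch below assumes a tiny made-up frequency table (`counts`) and only a handful of the morphs of "werd"; the actual algorithm generates every morph and takes its counts from a large corpus.

```python
# Hypothetical word counts, standing in for a census of a real corpus.
counts = {"were": 40, "word": 25, "ward": 3, "wed": 2}

# A few morphs of "werd"; the real algorithm generates every
# insertion, deletion, transposition and replacement.
candidates = ["waerd", "wrd", "wed", "wred", "ward", "word", "were"]

# Keep only the morphs that appear in the census ...
known = [w for w in candidates if w in counts]

# ... and choose the one that was seen most often.
best = max(known, key=counts.get)
print(best)  # were
```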
import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

NWORDS = train(words(file('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts    = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)
Norvig Spellchecker
def words(text): return re.findall('[a-z]+', text.lower())

>>> words("The cat in the hat!")
['the', 'cat', 'in', 'the', 'hat']
Regular Expressions
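One consequence of the `[a-z]+` pattern is worth seeing on a less tidy input: anything that is not a lowercase letter acts as a separator. The sample sentence below is made up for illustration.

```python
import re


def words(text):
    # Lowercase first, then collect every run of the letters a-z.
    return re.findall(r"[a-z]+", text.lower())


# Apostrophes split words and digits are dropped entirely.
print(words("Don't panic: it's 2009!"))
# ['don', 't', 'panic', 'it', 's']
```

This is fine for counting word frequencies in a large text, but it would need a richer pattern if contractions or non-English letters mattered.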
>>> d = {'cat':1}
>>> d
{'cat': 1}
>>> d['cat']
1

>>> d['cat'] += 1
>>> d
{'cat': 2}

>>> d['dog'] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'dog'
Dictionaries
# Has a factory for missing keys
>>> d = collections.defaultdict(int)
>>> d['dog'] += 1
>>> d
{'dog': 1}

>>> int
<type 'int'>
>>> int()
0
def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

>>> train(words("The cat in the hat!"))
{'cat': 1, 'the': 2, 'hat': 1, 'in': 1}
defaultdict
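Two details of `defaultdict` are easy to miss, so here is a small Python 3 sketch. It assumes `collections.Counter`, which arrived in Python 2.7 and 3.1 and so was not available in the Python 2.6 used in this talk.

```python
import collections


def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model


tokens = ["the", "cat", "in", "the", "hat"]
model = train(tokens)

# Counter produces the same tally in a single call.
print(dict(model) == dict(collections.Counter(tokens)))  # True

# A membership test does not touch the dictionary ...
print("dog" in model)  # False
# ... but merely looking up a missing key stores the default value.
model["dog"]
print("dog" in model)  # True
```

The spell checker only uses `in` and `.get` on the trained model, so it does not add entries by accident; indexing with square brackets at the prompt does.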
>>> text = file('big.txt').read()
>>> NWORDS = train(words(text))
>>> NWORDS
{'nunnery': 3, 'presnya': 1, 'woods': 22, 'clotted': 1, 'spiders': 1,
 'hanging': 42, 'disobeying': 2, 'scold': 3, 'originality': 6,
 'grenadiers': 8, 'pigment': 16, 'appropriation': 6, 'strictest': 1,
 'bringing': 48, 'revelers': 1, 'wooded': 8, 'wooden': 37,
 'wednesday': 13, 'shows': 50, 'immunities': 3, 'guardsmen': 4,
 'sooty': 1, 'inevitably': 32, 'clavicular': 9, 'sustaining': 5,
 'consenting': 1, 'scraped': 21, 'errors': 16, 'semicircular': 1,
 'cooking': 6, 'spiroch': 25, 'designing': 1, 'pawed': 1, 'succumb': 12,
 'shocks': 1, 'crouch': 2, 'chins': 1, 'awistocwacy': 1, 'sunbeams': 1,
 'perforations': 6, 'china': 43, 'affiliated': 4, 'chunk': 22,
 'natured': 34, 'uplifting': 1, 'slaveholders': 2, 'climbed': 13,
 'controversy': 33, 'natures': 2, 'climber': 1, 'lency': 2,
 'joyousness': 1, 'reproaching': 3, 'insecurity': 1, 'abbreviations': 1,
 'definiteness': 1, 'music': 56, 'therefore': 186, 'expeditionary': 3,
 'primeval': 1, 'unpack': 1, 'circumstances': 107,
 ... (about 6500 more lines) ...

>>> NWORDS['the']
80030
>>> NWORDS['unusual']
32
>>> NWORDS['cephalopod']
0
Reading the File
import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

NWORDS = train(words(file('big.txt').read()))
Training the Probability Model
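The `file()` builtin used above exists only in Python 2. A possible Python 3 version of the training step is sketched below; `load_model` is a hypothetical helper name, and a small temporary file stands in for `big.txt` so the example runs anywhere.

```python
import collections
import os
import re
import tempfile


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model


def load_model(path):
    # open() replaces the Python 2-only file() builtin, and the
    # with-block closes the file once it has been read.
    with open(path, encoding="utf-8") as f:
        return train(words(f.read()))


# A tiny stand-in corpus; the talk trains on big.txt instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("The cat sat. The cat ran.")
    NWORDS = load_model(path)

print(NWORDS["the"], NWORDS["cat"], NWORDS["sat"])  # 2 2 1
```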
# These two are equivalent:
result = []
for v in iter:
    if cond:
        result.append(expr)

[ expr for v in iter if cond ]

# You can nest loops also:
result = []
for v1 in iter1:
    for v2 in iter2:
        if cond:
            result.append(expr)

[ expr for v1 in iter1 for v2 in iter2 if cond ]
List Comprehensions
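The templates above use placeholders (`expr`, `iter`, `cond`). Filled in with concrete values chosen purely for illustration, the equivalence looks like this:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop form: squares of the even numbers.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# Comprehension form of the same thing.
squares_comp = [n * n for n in numbers if n % 2 == 0]
print(squares_loop == squares_comp)  # True

# Nested loops read left to right, outermost loop first.
pairs_loop = []
for a in "ab":
    for c in "xy":
        pairs_loop.append(a + c)

pairs_comp = [a + c for a in "ab" for c in "xy"]
print(pairs_comp)  # ['ax', 'ay', 'bx', 'by']
```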
>>> word = "spam"
>>> word[:1]
's'
>>> word[1:]
'pam'

>>> (word[:1], word[1:])
('s', 'pam')

>>> range(len(word) + 1)
[0, 1, 2, 3, 4]

>>> [(word[:i], word[i:]) for i in range(len(word) + 1)]
[('', 'spam'), ('s', 'pam'), ('sp', 'am'), ('spa', 'm'), ('spam', '')]
String Slicing
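Two properties of slicing make the split list safe to build on, and both can be checked quickly. This is a small Python 3 sketch using the same `word` as the slide.

```python
word = "spam"
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

# Every split glues back together into the original word.
print(all(a + b == word for a, b in splits))  # True

# Slices past the end return an empty string instead of raising,
# which is why expressions like b[1:] and b[2:] are safe on short tails.
print(repr(word[10:]))  # ''
print(repr("m"[1:]))    # ''
```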
>>> word = "spam"
>>> s = [(word[:i], word[i:]) for i in range(len(word) + 1)]

>>> deletes = [a + b[1:] for a, b in s if b]
>>> deletes
['pam', 'sam', 'spm', 'spa']

>>> a, b = ('s', 'pam')
>>> a
's'
>>> b
'pam'

>>> bool('pam')
True
>>> bool('')
False
Deletions
For example: teh => the
>>> transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
>>> transposes
['psam', 'sapm', 'spma']
Transpositions
>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
>>> replaces
['apam', 'bpam', ..., 'zpam', 'saam', ..., 'szam', ..., 'spaz']
Replacements
>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> inserts = [a + c + b for a, b in s for c in alphabet]
>>> inserts
['aspam', ..., 'zspam', 'sapam', ..., 'szpam', 'spaam', ..., 'spamz']
Insertion
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts    = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

>>> edits1("spam")
set(['sptm', 'skam', 'spzam', 'vspam', 'spamj', 'zpam', 'sbam',
     'spham', 'snam', 'sjpam', 'spma', 'swam', 'spaem', 'tspam', 'spmm',
     'slpam', 'upam', 'spaim', 'sppm', 'spnam', 'spem', 'sparm', 'spamr',
     'lspam', 'sdpam', 'spams', 'spaml', 'spamm', 'spamn', 'spum',
     'spamh', 'spami', 'spatm', 'spamk', 'spamd', ..., 'spcam', 'spamy'])
Find all Edits
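It helps to know how many candidates `edits1` produces. For a word of length n there are n deletions, n - 1 transpositions, 26n replacements and 26(n + 1) insertions, or 54n + 25 in total before duplicates are removed. The sketch below checks that arithmetic for "spam"; `edit_lists` is a hypothetical variant of `edits1` that returns the four lists separately.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"


def edit_lists(word):
    """Like edits1, but return the four kinds of edit separately."""
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return deletes, transposes, replaces, inserts


deletes, transposes, replaces, inserts = edit_lists("spam")
print(len(deletes), len(transposes), len(replaces), len(inserts))
# 4 3 104 130

total = len(deletes) + len(transposes) + len(replaces) + len(inserts)
print(total == 54 * len("spam") + 25)  # True

# The set is smaller than the total: replacing a letter with itself
# yields the original word once per position, for example.
distinct = set(deletes + transposes + replaces + inserts)
print("spam" in distinct, len(distinct) < total)  # True True
```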
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)
Known Words
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=NWORDS.get)

>>> bool(set([]))
False

>>> correct("computr")
'computer'

>>> correct("computor")
'computer'

>>> correct("computerr")
'computer'
Correct
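`correct` relies on two small idioms: `or` returns its first truthy operand (an empty set is falsy), and `max` with `key=` picks the candidate with the highest count. A sketch with a made-up `counts` table shows both in isolation.

```python
# Hypothetical counts, standing in for NWORDS.
counts = {"computer": 12, "compute": 5}

# `or` skips the empty set and stops at the first non-empty value,
# so the final fallback list is never reached here.
candidates = set() or {"compute", "computer"} or ["computr"]
print(sorted(candidates))  # ['compute', 'computer']

# key=counts.get ranks each candidate by its count.
print(max(candidates, key=counts.get))  # computer

# When nothing is known, the chain falls through to the word itself.
fallback = set() or set() or ["xyzzy"]
print(max(fallback, key=counts.get))  # xyzzy
```

The order of the chain is the design choice: a known word beats any edit, and closer edits beat more distant ones, regardless of frequency.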
def known_edits2(word):
    return set(
        e2 for e1 in edits1(word)
           for e2 in edits1(e1)
           if e2 in NWORDS
    )

def correct(word):
    candidates = known([word]) or known(edits1(word)) or \
                 known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)

>>> correct("conpuler")
'computer'
>>> correct("cmpuler")
'computer'
Edit Distance 2
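To try the whole thing without `big.txt`, here is a Python 3 sketch of the same functions trained on a one-sentence corpus. The sample sentence is an assumption made for this example; with so little training data the corrector only knows a handful of words.

```python
import collections
import re


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model


# Tiny stand-in corpus; the talk trains on big.txt instead.
NWORDS = train(words("The computer sat on the desk and the computer hummed"))

alphabet = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)


def known(words):
    return set(w for w in words if w in NWORDS)


def correct(word):
    candidates = (known([word]) or known(edits1(word))
                  or known_edits2(word) or [word])
    return max(candidates, key=NWORDS.get)


print(correct("computr"))   # computer (one edit away)
print(correct("conpuler"))  # computer (two edits away)
print(correct("zzzzzz"))    # zzzzzz (nothing known within two edits)
```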
Comparing Python & Java Versions
• http://raelcunha.com/spell-correct.php
• 35 lines of Java
import java.io.*;
import java.util.*;
import java.util.regex.*;

class Spelling {

    private final HashMap<String, Integer> nWords = new HashMap<String, Integer>();

    public Spelling(String file) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(file));
        Pattern p = Pattern.compile("\\w+");
        for(String temp = ""; temp != null; temp = in.readLine()){
            Matcher m = p.matcher(temp.toLowerCase());
            while(m.find()) nWords.put((temp = m.group()), nWords.containsKey(temp) ? nWords.get(temp) + 1 : 1);
        }
        in.close();
    }

    private final ArrayList<String> edits(String word) {
        ArrayList<String> result = new ArrayList<String>();
        for(int i=0; i < word.length(); ++i) result.add(word.substring(0, i) + word.substring(i+1));
        for(int i=0; i < word.length()-1; ++i) result.add(word.substring(0, i) + word.substring(i+1, i+2) + word.substring(i, i+1) + word.substring(i+2));
        for(int i=0; i < word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i+1));
        for(int i=0; i <= word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i));
        return result;
    }

    public final String correct(String word) {
        if(nWords.containsKey(word)) return word;
        ArrayList<String> list = edits(word);
        HashMap<Integer, String> candidates = new HashMap<Integer, String>();
        for(String s : list) if(nWords.containsKey(s)) candidates.put(nWords.get(s),s);
        if(candidates.size() > 0) return candidates.get(Collections.max(candidates.keySet()));
        for(String s : list) for(String w : edits(s)) if(nWords.containsKey(w)) candidates.put(nWords.get(w),w);
        return candidates.size() > 0 ? candidates.get(Collections.max(candidates.keySet())) : word;
    }

    public static void main(String args[]) throws IOException {
        if(args.length > 0) System.out.println((new Spelling("big.txt")).correct(args[0]));
    }
}
import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

NWORDS = train(words(file('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts    = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)
IDEs for Python
• IDEs for Python include:
• PyDev for Eclipse
• WingIDE
• IDLE for Windows, Linux, and Mac
• and more
Why Python ROCKS
• Elegant and readable language - “Executable Pseudocode”
• Standard Libraries - “Batteries Included”
• Very high-level data types
• Dynamically Typed
• It’s FUN!
An Open Source Community
• Projects: Plone, Zope, Grok, BFG, Django, SciPy & NumPy, Google App Engine, PyGame
• PyCon
Resources
• PyGTA
• Toronto Plone Users
• Toronto Django Users
• Stack Overflow
• Dive Into Python
• Python Tutorial
Thanks
• I’d love to hear your questions or comments on this presentation. Reach me at:
• http://twitter.com/hexsprite