51
Introduction to Python Chen Lin Chen Lin [email protected] [email protected] COSI 134a COSI 134a Volen 110 Volen 110 Office Hour: Thurs. 3-5 Office Hour: Thurs. 3-5

Introduction to Python Chen Lin [email protected] COSI 134a Volen 110 Office Hour: Thurs. 3-5

Embed Size (px)

Citation preview

Page 1: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Introduction to Python

Chen LinChen [email protected]@brandeis.edu

COSI 134aCOSI 134aVolen 110Volen 110

Office Hour: Thurs. 3-5Office Hour: Thurs. 3-5

Page 2: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

For More Information?

http://python.org/ - documentation, tutorials, beginners guide, core

distribution, ...Books include: Learning Python by Mark Lutz Python Essential Reference by David Beazley Python Cookbook, ed. by Martelli, Ravenscroft and

Ascher (online at

http://code.activestate.com/recipes/langs/python/) http://wiki.python.org/moin/PythonBooks

Page 3: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Python VideosPython Videos

http://showmedo.com/videotutorials/python“5 Minute Overview (What Does Python

Look Like?)”“Introducing the PyDev IDE for Eclipse”“Linear Algebra with Numpy”And many more

Page 4: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

4 Major Versions of Python4 Major Versions of Python

“Python” or “CPython” is written in C/C++

- Version 2.7 came out in mid-2010

- Version 3.1.2 came out in early 2010

“Jython” is written in Java for the JVM“IronPython” is written in C# for the .Net

environmentGo To Website

Page 5: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Development EnvironmentsDevelopment Environmentswhat IDE to use?what IDE to use? http://stackoverflow.com/questions/81584http://stackoverflow.com/questions/81584

1. PyDev with Eclipse 2. Komodo3. Emacs4. Vim5. TextMate6. Gedit7. Idle8. PIDA (Linux)(VIM Based)9. NotePad++ (Windows)10.BlueFish (Linux)

Page 6: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Pydev with EclipsePydev with Eclipse

Page 7: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Python Interactive ShellPython Interactive Shell% python% pythonPython 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)[GCC 4.2.1 (Apple Inc. build 5646)] on darwin[GCC 4.2.1 (Apple Inc. build 5646)] on darwinType "help", "copyright", "credits" or "license" for more information.Type "help", "copyright", "credits" or "license" for more information.>>>>>>

You can type things directly into a running Python sessionYou can type things directly into a running Python session>>> 2+3*4>>> 2+3*41414>>> name = "Andrew">>> name = "Andrew">>> name>>> name'Andrew''Andrew'>>> print "Hello", name>>> print "Hello", nameHello AndrewHello Andrew>>>>>>

Page 8: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

BackgroundBackgroundData Types/StructureData Types/StructureControl flowControl flowFile I/OFile I/OModulesModulesClassClassNLTKNLTK

Page 9: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

ListList

A compound data type:A compound data type:[0][0][2.3, 4.5][2.3, 4.5][5, "Hello", "there", 9.8][5, "Hello", "there", 9.8][][]Use len() to get the length of a listUse len() to get the length of a list>>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"]>>> len(names)>>> len(names)33

Page 10: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Use [ ] to index items in the listUse [ ] to index items in the list>>> names[0]>>> names[0]‘‘Ben'Ben'>>> names[1]>>> names[1]‘‘Chen'Chen'>>> names[2]>>> names[2]‘‘Yaqin'Yaqin'>>> names[3]>>> names[3]Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>IndexError: list index out of rangeIndexError: list index out of range>>> names[-1]>>> names[-1]‘‘Yaqin'Yaqin'>>> names[-2]>>> names[-2]‘‘Chen'Chen'>>> names[-3]>>> names[-3]‘‘Ben'Ben'

[0] is the first item.[1] is the second item...

Out of range valuesraise an exception

Negative valuesgo backwards fromthe last element.

Page 11: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Strings share many features with listsStrings share many features with lists

>>> smiles = "C(=N)(N)N.C(=O)(O)O">>> smiles = "C(=N)(N)N.C(=O)(O)O">>> smiles[0]>>> smiles[0]'C''C'>>> smiles[1]>>> smiles[1]'(''('>>> smiles[-1]>>> smiles[-1]'O''O'>>> smiles[1:5]>>> smiles[1:5]'(=N)''(=N)'>>> smiles[10:-4]>>> smiles[10:-4]'C(=O)''C(=O)'

Use “slice” notation toget a substring

Page 12: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

String Methods: find, splitString Methods: find, split

smiles = "C(=N)(N)N.C(=O)(O)O"smiles = "C(=N)(N)N.C(=O)(O)O">>> smiles.find("(O)")>>> smiles.find("(O)")1515>>> smiles.find(".")>>> smiles.find(".")99>>> smiles.find(".", 10)>>> smiles.find(".", 10)-1-1>>> smiles.split(".")>>> smiles.split(".")['C(=N)(N)N', 'C(=O)(O)O']['C(=N)(N)N', 'C(=O)(O)O']>>>>>>

Use “find” to find thestart of a substring.

Start looking at position 10.

Find returns -1 if it couldn’tfind a match.

Split the string into partswith “.” as the delimiter

Page 13: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

String operators: in, not inString operators: in, not in

if "Br" in “Brother”:if "Br" in “Brother”:

print "contains brother“print "contains brother“

email_address = “clin”email_address = “clin”

if "@" not in email_address:if "@" not in email_address:

email_address += "@brandeis.edu“email_address += "@brandeis.edu“

Page 14: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

String Method: “strip”, “rstrip”, “lstrip” are ways toString Method: “strip”, “rstrip”, “lstrip” are ways toremove whitespace or selected charactersremove whitespace or selected characters

>>> line = " # This is a comment line \n">>> line = " # This is a comment line \n">>> line.strip()>>> line.strip()'# This is a comment line''# This is a comment line'>>> line.rstrip()>>> line.rstrip()' # This is a comment line'' # This is a comment line'>>> line.rstrip("\n")>>> line.rstrip("\n")' # This is a comment line '' # This is a comment line '>>>>>>

Page 15: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

More String methodsMore String methods

email.startswith(“c") endswith(“u”)email.startswith(“c") endswith(“u”)True/FalseTrue/False

>>> "%[email protected]" % "clin">>> "%[email protected]" % "clin"'[email protected]''[email protected]'

>>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"]>>> ", ".join(names)>>> ", ".join(names)‘‘Ben, Chen, Yaqin‘Ben, Chen, Yaqin‘

>>> “chen".upper()>>> “chen".upper()‘‘CHEN'CHEN'

Page 16: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Unexpected things about stringsUnexpected things about strings

>>> s = "andrew">>> s = "andrew">>> s[0] = "A">>> s[0] = "A"Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>TypeError: 'str' object does not support item TypeError: 'str' object does not support item

assignmentassignment>>> s = "A" + s[1:]>>> s = "A" + s[1:]>>> s>>> s'Andrew‘'Andrew‘

Strings are read only

Page 17: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

““\” is for special characters\” is for special characters

\n -> newline\n -> newline

\t -> tab\t -> tab

\\ -> backslash\\ -> backslash

......

But Windows uses backslash for directories!filename = "M:\nickel_project\reactive.smi" # DANGER!

filename = "M:\\nickel_project\\reactive.smi" # Better!

filename = "M:/nickel_project/reactive.smi" # Usually works

Page 18: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Lists are mutable - some useful Lists are mutable - some useful methodsmethods

>>> ids = ["9pti", "2plv", "1crn"]>>> ids = ["9pti", "2plv", "1crn"]>>> ids.append("1alm")>>> ids.append("1alm")>>> ids>>> ids['9pti', '2plv', '1crn', '1alm']['9pti', '2plv', '1crn', '1alm']>>>ids.extend(L)>>>ids.extend(L) Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.>>> del ids[0]>>> del ids[0]>>> ids>>> ids['2plv', '1crn', '1alm']['2plv', '1crn', '1alm']>>> ids.sort()>>> ids.sort()>>> ids>>> ids['1alm', '1crn', '2plv']['1alm', '1crn', '2plv']>>> ids.reverse()>>> ids.reverse()>>> ids>>> ids['2plv', '1crn', '1alm']['2plv', '1crn', '1alm']>>> ids.insert(0, "9pti")>>> ids.insert(0, "9pti")>>> ids>>> ids['9pti', '2plv', '1crn', '1alm']['9pti', '2plv', '1crn', '1alm']

append an element

remove an element

sort by default order

reverse the elements in a list

insert an element at somespecified position.(Slower than .append())

Page 19: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Tuples: Tuples: sort of an immutable list

>>> yellow = (255, 255, 0) # r, g, b>>> yellow = (255, 255, 0) # r, g, b>>> one = (1,)>>> one = (1,)>>> yellow[0]>>> yellow[0]>>> yellow[1:]>>> yellow[1:](255, 0)(255, 0)>>> yellow[0] = 0>>> yellow[0] = 0Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>TypeError: 'tuple' object does not support item assignmentTypeError: 'tuple' object does not support item assignment

Very common in string interpolation:>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056)'Andrew lives in Sweden at latitude 57.7'

Page 20: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

zipping lists togetherzipping lists together

>>> names>>> names['ben', 'chen', 'yaqin']['ben', 'chen', 'yaqin']

>>> gender =>>> gender = [0, 0, 1] [0, 0, 1]

>>> zip(names, gender)>>> zip(names, gender)[('ben', 0), ('chen', 0), ('yaqin', 1)][('ben', 0), ('chen', 0), ('yaqin', 1)]

Page 21: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

DictionariesDictionaries Dictionaries are lookup tables. They map from a “key” to a “value”.

symbol_to_name = {"H": "hydrogen","He": "helium","Li": "lithium","C": "carbon","O": "oxygen","N": "nitrogen"

} Duplicate keys are not allowed Duplicate values are just fine

Page 22: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Keys can be any immutable valueKeys can be any immutable valuenumbers, strings, tuples, frozensetnumbers, strings, tuples, frozenset, ,

not list, dictionary, set, ...not list, dictionary, set, ...atomic_number_to_name = {atomic_number_to_name = {1: "hydrogen"1: "hydrogen"6: "carbon",6: "carbon",7: "nitrogen"7: "nitrogen"8: "oxygen",8: "oxygen",}}nobel_prize_winners = {nobel_prize_winners = {(1979, "physics"): ["Glashow", "Salam", "Weinberg"],(1979, "physics"): ["Glashow", "Salam", "Weinberg"],(1962, "chemistry"): ["Hodgkin"],(1962, "chemistry"): ["Hodgkin"],(1984, "biology"): ["McClintock"],(1984, "biology"): ["McClintock"],}}

A set is an unordered collection with no duplicate elements.

Page 23: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

DictionaryDictionary

>>> symbol_to_name["C"]>>> symbol_to_name["C"]'carbon''carbon'>>> "O" in symbol_to_name, "U" in symbol_to_name>>> "O" in symbol_to_name, "U" in symbol_to_name(True, False)(True, False)>>> "oxygen" in symbol_to_name>>> "oxygen" in symbol_to_nameFalseFalse>>> symbol_to_name["P"]>>> symbol_to_name["P"]Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>KeyError: 'P'KeyError: 'P'>>> symbol_to_name.get("P", "unknown")>>> symbol_to_name.get("P", "unknown")'unknown''unknown'>>> symbol_to_name.get("C", "unknown")>>> symbol_to_name.get("C", "unknown")'carbon''carbon'

Get the value for a given key

Test if the key exists(“in” only checks the keys,not the values.)

[] lookup failures raise an exception.Use “.get()” if you wantto return a default value.

Page 24: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Some useful dictionary methodsSome useful dictionary methods

>>> symbol_to_name.keys()>>> symbol_to_name.keys()['C', 'H', 'O', 'N', 'Li', 'He']['C', 'H', 'O', 'N', 'Li', 'He']

>>> symbol_to_name.values()>>> symbol_to_name.values()['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']

>>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} )>>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} )>>> symbol_to_name.items()>>> symbol_to_name.items()[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P', [('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P',

'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]

>>> del symbol_to_name['C']>>> del symbol_to_name['C']>>> symbol_to_name>>> symbol_to_name{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': 'helium'}{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': 'helium'}

Page 25: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

BackgroundBackgroundData Types/StructureData Types/Structure

list, string, tuple, dictionarylist, string, tuple, dictionaryControl flowControl flowFile I/OFile I/OModulesModulesClassClassNLTKNLTK

Page 26: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Control FlowControl Flow

Things that are FalseThings that are False The boolean value False The numbers 0 (integer), 0.0 (float) and 0j (complex). The empty string "". The empty list [], empty dictionary {} and empty set set().Things that are TrueThings that are True The boolean value TrueThe boolean value True All non-zero numbers.All non-zero numbers. Any string containing at least one character.Any string containing at least one character. A non-empty data structure.A non-empty data structure.

Page 27: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

IfIf

>>> smiles = "BrC1=CC=C(C=C1)NN.Cl">>> smiles = "BrC1=CC=C(C=C1)NN.Cl">>> bool(smiles)>>> bool(smiles)TrueTrue>>> not bool(smiles)>>> not bool(smiles)FalseFalse>>> if not smiles>>> if not smiles::... print "The SMILES string is empty"... print "The SMILES string is empty"...... The “else” case is always optional

Page 28: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Use “elif” to chain subsequent testsUse “elif” to chain subsequent tests

>>> mode = "absolute">>> mode = "absolute">>> if mode == "canonical":>>> if mode == "canonical":... ... smiles = "canonical"smiles = "canonical"... elif mode == "isomeric":... elif mode == "isomeric":... ... smiles = "isomeric”smiles = "isomeric”... ... elif mode == "absolute": elif mode == "absolute":... ... smiles = "absolute"smiles = "absolute"... else:... else:... ... raise TypeError("unknown mode")raise TypeError("unknown mode")......>>> smiles>>> smiles' absolute '' absolute '>>>>>>

“raise” is the Python way to raise exceptions

Page 29: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Boolean logicBoolean logic

Python expressions can have “and”s and Python expressions can have “and”s and “or”s:“or”s:

if (ben if (ben <=<= 5 and chen 5 and chen >=>= 10 or 10 or

chen chen ==== 500 and ben 500 and ben !=!= 5): 5):

print “Ben and Chen“print “Ben and Chen“

Page 30: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Range TestRange Test

if (3 if (3 <= Time <=<= Time <= 5): 5):

print “Office Hour"print “Office Hour"

Page 31: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

ForFor

>>> names = [“Ben", “Chen", “Yaqin"]>>> names = [“Ben", “Chen", “Yaqin"]

>>> for name in names:>>> for name in names:

... ... print smilesprint smiles

......

BenBen

ChenChen

YaqinYaqin

Page 32: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Tuple assignment in for loopsTuple assignment in for loops

data = [ ("C20H20O3", 308.371),data = [ ("C20H20O3", 308.371),("C22H20O2", 316.393),("C22H20O2", 316.393),("C24H40N4O2", 416.6),("C24H40N4O2", 416.6),("C14H25N5O3", 311.38),("C14H25N5O3", 311.38),("C15H20O2", 232.3181)]("C15H20O2", 232.3181)]

for for (formula, mw)(formula, mw) in data: in data:print "The molecular weight of %s is %s" % (formula, mw)print "The molecular weight of %s is %s" % (formula, mw)

The molecular weight of C20H20O3 is 308.371The molecular weight of C20H20O3 is 308.371The molecular weight of C22H20O2 is 316.393The molecular weight of C22H20O2 is 316.393The molecular weight of C24H40N4O2 is 416.6The molecular weight of C24H40N4O2 is 416.6The molecular weight of C14H25N5O3 is 311.38The molecular weight of C14H25N5O3 is 311.38The molecular weight of C15H20O2 is 232.3181The molecular weight of C15H20O2 is 232.3181

Page 33: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Break, continueBreak, continue

>>> for value in [3, 1, 4, 1, 5, 9, 2]:>>> for value in [3, 1, 4, 1, 5, 9, 2]:... ... print "Checking", value print "Checking", value... ... if value > 8: if value > 8:... ... print "Exiting for loop"print "Exiting for loop"... ... breakbreak... ... elif value < 3: elif value < 3:... ... print "Ignoring"print "Ignoring"... ... continuecontinue... ... print "The square is", value**2 print "The square is", value**2......

Use “break” to stopUse “break” to stopthe for loopthe for loop

Use “continue” to stopUse “continue” to stopprocessing the current itemprocessing the current item

Checking 3Checking 3The square is 9The square is 9Checking 1Checking 1IgnoringIgnoringChecking 4Checking 4The square is 16The square is 16Checking 1Checking 1IgnoringIgnoringChecking 5Checking 5The square is 25The square is 25Checking 9Checking 9Exiting for loopExiting for loop>>>>>>

Page 34: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Range()Range() ““range” creates a list of numbers in a specified rangerange” creates a list of numbers in a specified range range([start,] stop[, step]) -> list of integersrange([start,] stop[, step]) -> list of integers When step is given, it specifies the increment (or decrement).When step is given, it specifies the increment (or decrement).>>> range(5)>>> range(5)[0, 1, 2, 3, 4][0, 1, 2, 3, 4]>>> range(5, 10)>>> range(5, 10)[5, 6, 7, 8, 9][5, 6, 7, 8, 9]>>> range(0, 10, 2)>>> range(0, 10, 2)[0, 2, 4, 6, 8][0, 2, 4, 6, 8]

How to get every second element in a list?for i in range(0, len(data), 2):

print data[i]

Page 35: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

BackgroundBackgroundData Types/StructureData Types/StructureControl flowControl flowFile I/OFile I/OModulesModulesClassClassNLTKNLTK

Page 36: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Reading filesReading files

>>> f = open(“names.txt")>>> f = open(“names.txt")

>>> f.readline()>>> f.readline()

'Yaqin\n''Yaqin\n'

Page 37: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Quick WayQuick Way

>>> lst= [ x for x in open("text.txt","r").readlines() ]>>> lst= [ x for x in open("text.txt","r").readlines() ]>>> lst>>> lst['Chen Lin\n', '[email protected]\n', 'Volen 110\n', 'Office ['Chen Lin\n', '[email protected]\n', 'Volen 110\n', 'Office

Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n', Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n', '[email protected]\n', 'Volen 110\n', 'Offiche Hour: '[email protected]\n', 'Volen 110\n', 'Offiche Hour: Tues. 3-5\n']Tues. 3-5\n']

Ignore the header?Ignore the header?for (i,line) in enumerate(open(‘text.txt’,"r").readlines()):for (i,line) in enumerate(open(‘text.txt’,"r").readlines()): if i == 0: continueif i == 0: continue print lineprint line

Page 38: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Using dictionaries to count Using dictionaries to count occurrencesoccurrences

>>> for line in open('names.txt'):>>> for line in open('names.txt'):... ... name = line.strip()name = line.strip()... ... name_count[name] = name_count.get(name,0)+ name_count[name] = name_count.get(name,0)+

11... ... >>> for (name, count) in name_count.items():>>> for (name, count) in name_count.items():... ... print name, countprint name, count... ... Chen 3Chen 3Ben 3Ben 3Yaqin 3Yaqin 3

Page 39: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

File OutputFile Output

input_file = open(“in.txt")input_file = open(“in.txt")

output_file = open(“out.txt", "w")output_file = open(“out.txt", "w")

for line in input_file:for line in input_file:

output_file.write(line)output_file.write(line)“w” = “write mode”“a” = “append mode”“wb” = “write in binary”“r” = “read mode” (default)“rb” = “read in binary”“U” = “read files with Unixor Windows line endings”

Page 40: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

BackgroundBackgroundData Types/StructureData Types/StructureControl flowControl flowFile I/OFile I/OModulesModulesClassClassNLTKNLTK

Page 41: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

ModulesModules

When a Python program starts it only has access to a basic functions and classes.

(“int”, “dict”, “len”, “sum”, “range”, ...)“Modules” contain additional functionality.Use “import” to tell Python to load a

module.

>>> import math

>>> import nltk

Page 42: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

import the math moduleimport the math module>>> import math>>> import math>>> math.pi>>> math.pi3.14159265358979313.1415926535897931>>> math.cos(0)>>> math.cos(0)1.01.0>>> math.cos(math.pi)>>> math.cos(math.pi)-1.0-1.0>>> dir(math)>>> dir(math)['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh','asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos','asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos','cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod','cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod','frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10','frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10','log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan','log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan','tanh', 'trunc']'tanh', 'trunc']>>> help(math)>>> help(math)>>> help(math.cos)>>> help(math.cos)

Page 43: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

““import” and “from ... import ...”import” and “from ... import ...”

>>> import math>>> import math

math.cosmath.cos

>>> from math import cos, pi

cos

>>> from math import *

Page 44: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

BackgroundBackgroundData Types/StructureData Types/StructureControl flowControl flowFile I/OFile I/OModulesModulesClassClassNLTKNLTK

Page 45: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

ClassesClassesclass ClassName(object): class ClassName(object):

<statement-1> <statement-1> . . . . . . <statement-N> <statement-N>

class MyClass(object): class MyClass(object): """A simple example class""" """A simple example class""" i = 12345 12345 def f(self): def f(self): return self.i return self.i

class DerivedClassName(BaseClassName): class DerivedClassName(BaseClassName): <statement-1> <statement-1> . . . . . . <statement-N> <statement-N>

Page 46: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

BackgroundBackgroundData Types/StructureData Types/StructureControl flowControl flowFile I/OFile I/OModulesModulesClassClassNLTKNLTK

Page 47: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

http://www.nltk.org/bookhttp://www.nltk.org/bookNLTK is on berry patch machines!NLTK is on berry patch machines!

>>>from nltk.book import * >>>from nltk.book import * >>> text1 >>> text1 <Text: Moby Dick by Herman Melville 1851><Text: Moby Dick by Herman Melville 1851>>>> text1.name>>> text1.name'Moby Dick by Herman Melville 1851''Moby Dick by Herman Melville 1851'>>> text1.concordance("monstrous") >>> text1.concordance("monstrous") >>> dir(text1)>>> dir(text1)>>> text1.tokens>>> text1.tokens>>> text1.index("my")>>> text1.index("my")46474647>>> sent2>>> sent2['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', ['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',

'Sussex', '.'] 'Sussex', '.']

Page 48: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Classify TextClassify Text

>>> def gender_features(word): >>> def gender_features(word):

... ... return {'last_letter': word[-1]} return {'last_letter': word[-1]}

>>> gender_features('Shrek') >>> gender_features('Shrek')

{'last_letter': 'k'} {'last_letter': 'k'}

>>> from nltk.corpus import names >>> from nltk.corpus import names

>>> import random >>> import random >>> names = ([(name, 'male') for name in names.words('male.txt')] + >>> names = ([(name, 'male') for name in names.words('male.txt')] +

... [(name, 'female') for name in names.words('female.txt')])... [(name, 'female') for name in names.words('female.txt')])

>>> random.shuffle(names) >>> random.shuffle(names)

Page 49: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

Featurize, train, test, predictFeaturize, train, test, predict

>>> featuresets = [(gender_features(n), g) for (n,g) in names] >>> featuresets = [(gender_features(n), g) for (n,g) in names]

>>> train_set, test_set = featuresets[500:], featuresets[:500] >>> train_set, test_set = featuresets[500:], featuresets[:500]

>>> classifier = nltk.NaiveBayesClassifier.train(train_set)>>> classifier = nltk.NaiveBayesClassifier.train(train_set)

>>> print nltk.classify.accuracy(classifier, test_set) >>> print nltk.classify.accuracy(classifier, test_set)

0.7260.726

>>> classifier.classify(gender_features('Neo')) >>> classifier.classify(gender_features('Neo'))

'male''male'

Page 50: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

from from nltknltk.corpus import .corpus import reutersreuters

Reuters Corpus:Reuters Corpus:10,788 news10,788 news 1.3 million words.1.3 million words. Been classified into Been classified into 9090 topics topicsGrouped into 2 sets, "training" and "test“Grouped into 2 sets, "training" and "test“Categories overlap with each other Categories overlap with each other

http://nltk.googlecode.com/svn/trunk/doc/http://nltk.googlecode.com/svn/trunk/doc/book/ch02.htmlbook/ch02.html

Page 51: Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

ReutersReuters

>>> from nltk.corpus import reuters >>> from nltk.corpus import reuters

>>> reuters.fileids() >>> reuters.fileids()

['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]

>>> reuters.categories() >>> reuters.categories() ['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut', ['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',

'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-

oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]