Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
TEXT MINING
INTRO TO PYTHON
Mattias Villani
StatisticsDept. of Computer and Information Science
Linköping University
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 1 / 21
OVERVIEW
I What is Python? How is it special?I Python’s objectsI If-else, loops and list comprehensionsI FunctionsI ClassesI Modules
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 2 / 21
WHAT IS PYTHON?
I First version in 1991I High-level languageI Emphasizes readabilityI Interpreted (bytecode .py and .pyc) [can be compiled via C/Java]I Automatic memory managementI Strongly dynamically typedI Functional and/or object-orientedI Glue to other programs (interface to C/C++ or Java etc)
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 3 / 21
THE BENEVOLENT DICTATOR FOR LIFE (BDFL)GIUDO VAN ROSSUM
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 4 / 21
PYTHON PECULIARITES (COMPARED TO R/MATLAB)
I Counting begins at 0.I myVector[0:2] returns the first and second element, but not the third.I 3/2 = 1. Integer division. from __future__ import division.I Indentation matters!I Can import specific functions from a module.I Some variable assignments are by reference, others are by copy.I a = b = 1 assigns 1 to both a and b.
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 5 / 21
PYTHON’S OBJECTS
I Built-in types: numbers, strings, lists, dictionaries, tuples and files.I Vectors, arrays and matrices are available in the numpy/scipy
modules.I Python is a strongly typed language. ’mattias’ + 3 gives an error.I Python is a dynamically typed language. No need to declare a
variables type before it is used. Python figures out the object’s type.
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 6 / 21
STRINGS
I s = ’Spam’I s[0] returns first letter, s[-2] return next to last letter. s[0:2]
returns first two letter.I len(s) returns the number of letters.I s.lower(), s.upper(), s.count(’m’), s.endswith(’am’),
...I Which methods are available for my object? Try in Spyder: type
s. followed by TAB.I + operator concatenates strings.I (behind the scenes: the string object has an __add__ method:
s.__add__(anotherString))I sentence = ’Guido is the benevolent dictator for life’. sentence.split()I s*3 returns ’SpamSpamSpam’
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 7 / 21
THE LIST OBJECT
I A list is a container of several variables, possibly of different types.I myList = [’spam’,’spam’,’bacon’,2]I The list object has several associated methods
I myList.append(’egg’)I myList.count(’spam’)I myList.sort()
I + operator concatenates lists: myList + myOtherList merges thetwo lists as one list.
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 8 / 21
THE LIST OBJECT
I Extract elements from a list: myList[1]I Lists inside lists:
I myOtherList = [’monty’,’Python’]I myList[1] = myOtherListI myList[1] returns the list [’monty’,’Python’]I myList[1][1] returns the string ’Python’
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 9 / 21
PYTHON’S OBJECTS: VECTORS AND ARRAYS (AND
MATRICES)I from scipy import *I x = array([1,7,3])I 2-dimensional array (matrix): X = array([[2,3],[4,5]])I Indexing matrices
I First row: X[0,]I Second column: X[,1]I Element in position 1,2: X[0,1]
I Array multiplication (*) is element-wise.I There is also a matrix object: X = array([[2,3],[4,5]])I For matrix objects multplication (*) is matrix multiplication.I Arrays are recommended (not matrices).I Submodule scipy.linalg contains a lot of matrix-functions (det(),
inv(), eig() etc). I recommend: from scipy.linalg import *
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 10 / 21
PYTHON’S OBJECTS: DICTIONARIES
I Unordered collection of objects (elements).I myDict = {’Leif’:23, ’Dag’:17, ’Lyam’:12}I Elements are accessed by keyword not by index (offset):
myDict[’Dag’] returns 17.I Can contain any object: myDict = {’Leif’:[23,14],
’Dag’:17, ’Lyam’:[12,29]}. myDict[’Leif’][1] returns 14.I Numbers can also be used as keys: myDict = {2:’contents of
box2’, 4:’content of box 4’, ’blackbox’:10}I myDict.keys()I myDict.values()I myDict.items()
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 11 / 21
PYTHON’S OBJECTS: TUPLES
I myTuple = (3,4,’mattias’)I Like lists, but immutable (cannot change elements are creation)I Why?
I Faster than listsI Protected from changeI Can be used as keys in dictionariesI Multiple return object from functionI Swapping variable content (a, b) = (b, a) [a,b = b,a also works]I String formatting: name = "Mattias"; age = 39; "My name is%s and I am %d years old" % (name , age)
I Sequence unpacking a , b, c = myTuple
I list(myTuple) returns myTuple as a list. tuple(myList) does theopposite.
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 12 / 21
PYTHON’S OBJECTS: SETS
I Set. Contains objects in no order with no identification.I With a sequence, elements are ordered and identified by position.myVector[2]
I With a dictionary, elements are unordered but identified by some key.myDict[’myKey’]
I With a set, elements stand for themselves. No indexing, nokey-reference.
I Declaration: fib=set( (1,1,2,3,5,8,13) ) returns the set ([1,2, 3, 5, 8, 13])
I Supported methods: len(s), x in s, set1 < set2, union,intersection, add, remove, pop ...
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 13 / 21
BOOLEAN OPERATORS
I True/FalseI andI orI notI a = True; b = False; a and b [returns False].
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 14 / 21
IF-ELSE CONSTRUCTS
IF-ELSE STATEMENTa =1if a==1:
print(’a is one’)elif a==2:
print(’a is one’)else:
print(’a is not one or two’)}
I Switch statements via dictionaries (see Jackson’s Python book).
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 15 / 21
WHILE LOOPS
WHILE LOOPa =10while a>1:
print(’bigger than one’)a = a - 1
else:print(’smaller than one’)
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 16 / 21
LOOPS
I for loops can iterate over any iterable.I iterables: strings, lists, tuples
FOR LOOPword = ’mattias’for letter in word:
print(letter)
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 17 / 21
LOOPS, CONT.
FOR LOOP 2myList = [”]*10for i in range(10):
myList = ’mattias’ + str(i)
FOR LOOP 2myList = [”]*10for i in range(10):
myList = ’mattias’ + str(i)
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 18 / 21
LIST COMPREHENSIONS
I Set definition in mathematics
{x for x ∈ X}
where X is some a finite set.
{f (x) for x ∈ X}
I List comprehension in Python:I myList = [x for x in range(10)]I myList = [sin(x) for x in range(10)] (don’t forget from mathimport sin)
I myList = [x + y for x in linspace(0.1,1,10) for y inlinspace(10,100,10)] (from scipy import linspace)
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 19 / 21
DEFINING FUNCTIONS AND CLASSES
DEFINING FUNCTIONSdef mySquare(x):
return x**2
I Calling the function: mySquare(x)I Classes are defined similarly using the self object.I Make you own module by putting several functions in a .py file. Then
import what you need.
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 20 / 21
MISC
I Comments one individual lines starts with #I Comments spanning over multiple lines """This is a looooong
comment” ” ”
MATTIAS VILLANI (STATISTICS, LIU) TEXT MINING 21 / 21