Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Perl for Linguists
Michael Hammond
University of Arizona
Perl for Linguists – p.1/38
Perl for linguists
Perl for Linguists – p.2/38
Perl for linguists
� Why programming?
Perl for Linguists – p.2/38
Perl for linguists
� Why programming? Why Perl?
Perl for Linguists – p.2/38
Perl for linguists
� Why programming? Why Perl?
� Learn a little Perl.
Perl for Linguists – p.2/38
Perl for linguists
� Why programming? Why Perl?
� Learn a little Perl.
� More advanced Perl. . .
Perl for Linguists – p.2/38
Why programming?
Perl for Linguists – p.3/38
Why programming?
� Collect data
Perl for Linguists – p.3/38
Why programming?
� Collect data
� Analyze data
Perl for Linguists – p.3/38
Why programming?
� Collect data
� Analyze data
� Model theory
Perl for Linguists – p.3/38
Why programming?
� Collect data
� Analyze data
� Model theory
� General professional skills
Perl for Linguists – p.3/38
Code for this tutorial
All the code for this presentation is available overthe web at the following URL:
http://linguistics.arizona.edu/~hammond/berkeley.html
Perl for Linguists – p.4/38
Unzipping the programs
1. Download the file programfiles.zip byright-clicking on the link, selecting ‘save targetas’, and selecting your desktop as thedestination.
2. Unzip the file by double-clicking on thedownloaded file on your desktop and movingall the files there to a new directory on thedesktop.
3. Open the MS-DOS prompt in the newdirectory by double-clicking on the filedoswindow.bat.
Perl for Linguists – p.5/38
Collecting data
� Running experiments locally (expprog.pl)
Perl for Linguists – p.6/38
Collecting data
Running experiments locally (expprog.pl)
Running experiments locally with a GUI(tkexp.pl)
Perl for Linguists – p.6/38
Collecting data
Running experiments locally (expprog.pl)
Running experiments locally with a GUI(tkexp.pl)
Running experiments remotely(Bailey & Hahn replication, bhrep.cgi)
Perl for Linguists – p.6/38
Collecting data
� Running experiments locally (expprog.pl)
� Running experiments locally with a GUI(tkexp.pl)
� Running experiments remotely(Bailey & Hahn replication, bhrep.cgi)
� Assembling corpora from local staticresources (makecorpus.pl)
Perl for Linguists – p.6/38
Collecting data
� Running experiments locally (expprog.pl)
� Running experiments locally with a GUI(tkexp.pl)
� Running experiments remotely(Bailey & Hahn replication, bhrep.cgi)
� Assembling corpora from local staticresources (makecorpus.pl)
� Assembling corpora from nonlocal dynamicresources: the web (websearch.pl)
Perl for Linguists – p.6/38
Analyzing data
Perl for Linguists – p.7/38
Analyzing data
Looking for patterns (visgrep.pl)
Perl for Linguists – p.7/38
Analyzing data
� Looking for patterns (visgrep.pl)
� Counting things (neightk.pl)
Perl for Linguists – p.7/38
Analyzing data
� Looking for patterns (visgrep.pl)
� Counting things (neightk.pl)
� Finding verbs (verbs.pl)
Perl for Linguists – p.7/38
Modeling theory
Perl for Linguists – p.8/38
Modeling theory
� Optimality Theory (web interface,sylpars.pl)
Perl for Linguists – p.8/38
Modeling theory
� Optimality Theory (web interface,sylpars.pl)
� N-gram models (a bunch of examples from acourse on Statistical NLP that I did recently)
Perl for Linguists – p.8/38
General professional skills
Perl for Linguists – p.9/38
General professional skills
� General programming skills
Perl for Linguists – p.9/38
General professional skills
� General programming skills
� Web programming
Perl for Linguists – p.9/38
Why Perl?
Perl for Linguists – p.10/38
Why Perl?
� Free
Perl for Linguists – p.10/38
Why Perl?
� Free
� Multi-platform
Perl for Linguists – p.10/38
Why Perl?
� Free
� Multi-platform
� Easy
Perl for Linguists – p.10/38
Why Perl?
� Free
� Multi-platform
� Easy
� Multiple dialects
Perl for Linguists – p.10/38
Why Perl?
� Free
� Multi-platform
� Easy
� Multiple dialects
� Powerful regular expression tools
Perl for Linguists – p.10/38
Why Perl?
� Free
� Multi-platform
� Easy
� Multiple dialects
� Powerful regular expression tools
� Written by a “linguist”
Perl for Linguists – p.10/38
Why Perl?
� Free
� Multi-platform
� Easy
� Multiple dialects
� Powerful regular expression tools
� Written by a “linguist”
� Perl poetry
Perl for Linguists – p.10/38
Why Perl?
� Free
� Multi-platform
� Easy
� Multiple dialects
� Powerful regular expression tools
� Written by a “linguist”
� Perl poetry
� Obfuscated perl, “japhs”, etc.
Perl for Linguists – p.10/38
Learn a little perl. . .
Perl for Linguists – p.11/38
Learn a little perl. . .
� Windows basics
Perl for Linguists – p.11/38
Learn a little perl. . .
� Windows basics
� Perl syntax
Perl for Linguists – p.11/38
Learn a little perl. . .
� Windows basics
� Perl syntax
� IO
Perl for Linguists – p.11/38
Learn a little perl. . .
� Windows basics
� Perl syntax
� IO
� Regular expressions
Perl for Linguists – p.11/38
Learn a little perl. . .
Windows basics
Perl syntax
IO
Regular expressions
Where to find out more
Perl for Linguists – p.11/38
Windows basics
Perl for Linguists – p.12/38
Windows basics
! It is easiest to invoke perl programs from theDOS prompt. (If configured properly, you canjust double-click them though.)
Perl for Linguists – p.12/38
Windows basics
" It is easiest to invoke perl programs from theDOS prompt. (If configured properly, you canjust double-click them though.)
" perl myprogram.pl: invokes the perlinterpreter with a program myprogram.pl.
Perl for Linguists – p.12/38
Windows basics
# It is easiest to invoke perl programs from theDOS prompt. (If configured properly, you canjust double-click them though.)
# perl myprogram.pl: invokes the perlinterpreter with a program myprogram.pl.
# ctrl-c: stops the current program
Perl for Linguists – p.12/38
A goal
$ We will build our efforts around an exampleprogram: a program that parses a text file inEnglish into sentences and then tries to findall the verbs (verbs.pl).
$ This exemplifies simple data and controlstructures and what we might call“computational linguistic reasoning”.
$ We won’t get to the full version of thisprogram—it’s infinitely expandableactually—but we can get a good start.
Perl for Linguists – p.13/38
Programming overview
Perl for Linguists – p.14/38
Programming overview
% Implement some idea as perl code (write theprogram).
Perl for Linguists – p.14/38
Programming overview
& Implement some idea as perl code (write theprogram).
& Convert code into something the computercan execute (run perl on your code).
Perl for Linguists – p.14/38
Programming overview
' Implement some idea as perl code (write theprogram).
' Convert code into something the computercan execute (run perl on your code).
' Program execution (run perl on your code).
Perl for Linguists – p.14/38
Programming overview
( Implement some idea as perl code (write theprogram).
( Convert code into something the computercan execute (run perl on your code).
( Program execution (run perl on your code).
( Results (what happens as a consequence).
Perl for Linguists – p.14/38
Perl syntax
Perl for Linguists – p.15/38
Perl syntax
) A perl program is a series of statements.
Perl for Linguists – p.15/38
Perl syntax
* A perl program is a series of statements.
* Statements can be organized into groups.
Perl for Linguists – p.15/38
Perl syntax
+ A perl program is a series of statements.
+ Statements can be organized into groups.
+ Statements are operations on some bit ofdata, e.g. “print this string”, “add thesenumbers”, etc.
Perl for Linguists – p.15/38
Perl syntax
, A perl program is a series of statements.
, Statements can be organized into groups.
, Statements are operations on some bit ofdata, e.g. “print this string”, “add thesenumbers”, etc.
, Groups of statements can apply:1. only when specific conditions hold, or2. more than once, etc.
Perl for Linguists – p.15/38
Statements
Perl for Linguists – p.16/38
Statements
- A statement includes some operation(typically marked with following parentheses),ends with a semicolon, and may containmore. For example:
Perl for Linguists – p.16/38
Statements
. A statement includes some operation(typically marked with following parentheses),ends with a semicolon, and may containmore. For example:
. print("Hello!"); (prints the string“Hello!”)
Perl for Linguists – p.16/38
Statements
/ A statement includes some operation(typically marked with following parentheses),ends with a semicolon, and may containmore. For example:
/ print("Hello!"); (prints the string“Hello!”)
/ rand(5); (gets a random number between 0and 5)
Perl for Linguists – p.16/38
Statements
0 A statement includes some operation(typically marked with following parentheses),ends with a semicolon, and may containmore. For example:
0 print("Hello!"); (prints the string“Hello!”)
0 rand(5); (gets a random number between 0and 5)
0 localtime(); (gets the current time)
Perl for Linguists – p.16/38
Try it
Perl for Linguists – p.17/38
Try it
1. Open the DOS window (through the startmenu or by clicking on the doswindow.baticon.)
Perl for Linguists – p.17/38
Try it
1. Open the DOS window (through the startmenu or by clicking on the doswindow.baticon.)
2. Type edit myprogram.pl
Perl for Linguists – p.17/38
Try it
1. Open the DOS window (through the startmenu or by clicking on the doswindow.baticon.)
2. Type edit myprogram.pl
3. Type print("I am a linguist!");
Perl for Linguists – p.17/38
Try it
1. Open the DOS window (through the startmenu or by clicking on the doswindow.baticon.)
2. Type edit myprogram.pl
3. Type print("I am a linguist!");
4. Select Save and then Exit from the Filemenu.
Perl for Linguists – p.17/38
Try it
1. Open the DOS window (through the startmenu or by clicking on the doswindow.baticon.)
2. Type edit myprogram.pl
3. Type print("I am a linguist!");
4. Select Save and then Exit from the Filemenu.
5. Type perl myprogram.pl at the prompt.
Perl for Linguists – p.17/38
Variables
Perl for Linguists – p.18/38
Variables
1 Notice how rand() and localtime() don’tseem to do anything.
Perl for Linguists – p.18/38
Variables
2 Notice how rand() and localtime() don’tseem to do anything.
2 What they do is return a string. You can savethat result and then print it:
Perl for Linguists – p.18/38
Variables
3 Notice how rand() and localtime() don’tseem to do anything.
3 What they do is return a string. You can savethat result and then print it:
$myvariable = localtime();print($myvariable);
Perl for Linguists – p.18/38
Variables
Perl for Linguists – p.19/38
Variables
4 Simple “scalar” variables: $myvar,$aVariable, $x.
Perl for Linguists – p.19/38
Variables
5 Simple “scalar” variables: $myvar,$aVariable, $x.
5 A value is assigned to a variable with theassignment operator =.
Perl for Linguists – p.19/38
Variables
6 Simple “scalar” variables: $myvar,$aVariable, $x.
6 A value is assigned to a variable with theassignment operator =.
6 Variables can hold numbers, strings, etc.
Perl for Linguists – p.19/38
Reading a file
Perl for Linguists – p.20/38
Reading a file
7 First use open() to open a file and associateit with a handle, e.g.open(F, "myfile.txt");.
Perl for Linguists – p.20/38
Reading a file
8 First use open() to open a file and associateit with a handle, e.g.open(F, "myfile.txt");.
8 Read a line from the file with record-readingoperator: 9
F
: .
Perl for Linguists – p.20/38
Reading a file
; First use open() to open a file and associateit with a handle, e.g.open(F, "myfile.txt");.
; Read a line from the file with record-readingoperator: <
F
= .
; When you’re done, close the handle:close(F);.
Perl for Linguists – p.20/38
Sample code
open(F,"myfile.txt");$line = <F>;print($line);close(F);
Perl for Linguists – p.21/38
Control structures
> if: conditional application
> while: iteration as long as some condition istrue
> for: iteration for a specific number of times
> foreach: iteration for every member of a list
Perl for Linguists – p.22/38
More sample code
open(F,"myfile.txt");while ($line = <F>) {print($line);
}close(F);
Perl for Linguists – p.23/38
Arrays
Perl for Linguists – p.24/38
Arrays
? Array variables: a list of variables groupedtogether: @myarray, @thisBigArray,@them.
Perl for Linguists – p.24/38
Arrays
@ Array variables: a list of variables groupedtogether: @myarray, @thisBigArray,@them.
@ Adding an element to the end of an array:push(@myarray, "hat");.
Perl for Linguists – p.24/38
Arrays
A Array variables: a list of variables groupedtogether: @myarray, @thisBigArray,@them.
A Adding an element to the end of an array:push(@myarray, "hat");.
A Retrieving an element from the end of anarray: $it = pop(@myarray);.
Perl for Linguists – p.24/38
Arrays
B Array variables: a list of variables groupedtogether: @myarray, @thisBigArray,@them.
B Adding an element to the end of an array:push(@myarray, "hat");.
B Retrieving an element from the end of anarray: $it = pop(@myarray);.
B Retrieving a specific element:print($myarray[4]); (array indices startat 0).
Perl for Linguists – p.24/38
Sample code again
open(F,"myfile.txt");while ($line = <F>) {push(@mylines, $line);
}close(F);foreach $theline (@mylines) {print($theline);
}
Perl for Linguists – p.25/38
Regular expressions
Perl for Linguists – p.26/38
Regular expressions
C “Regular expression”: a restrictive way ofindicating string patterns.
Perl for Linguists – p.26/38
Regular expressions
D “Regular expression”: a restrictive way ofindicating string patterns.
D $myvar =~ /regexp/: does $myvarcontain the regular expression?
Perl for Linguists – p.26/38
Regular expressions
E “Regular expression”: a restrictive way ofindicating string patterns.
E $myvar =~ /regexp/: does $myvarcontain the regular expression?
E $myvar =~ s/regexp/string/: if$myvar contains the regular expression,replace it with the string.
Perl for Linguists – p.26/38
The verb-finding program
Perl for Linguists – p.27/38
The verb-finding program
F Recall that the program breaks an Englishtext file into sentences and does its best atfinding all the verbs.
Perl for Linguists – p.27/38
The verb-finding program
G Recall that the program breaks an Englishtext file into sentences and does its best atfinding all the verbs.
G An hour is not enough time to figure out howto do all of this, but just the bits we’ve coveredare a lot of it (verbs.pl).
Perl for Linguists – p.27/38
Where to find out more
H The free ActiveState Perl we’re using comeswith extensive web-based documentation.
Perl for Linguists – p.28/38
Where to find out more
I The free ActiveState Perl we’re using comeswith extensive web-based documentation.
I In any perl implementation the perldoccommand can be used to find out lots andlots of stuff.
Perl for Linguists – p.28/38
Where to find out more
J The free ActiveState Perl we’re using comeswith extensive web-based documentation.
J In any perl implementation the perldoccommand can be used to find out lots andlots of stuff.
J The official and best perl website iswww.cpan.org, but see also www.perl.org.
Perl for Linguists – p.28/38
Advanced stuff
Perl for Linguists – p.29/38
Advanced stuff
K Let’s now do some more advanced stuff.
Perl for Linguists – p.29/38
Advanced stuff
L Let’s now do some more advanced stuff.
L ‘Advanced’ in terms of perl.
Perl for Linguists – p.29/38
Advanced stuff
M Let’s now do some more advanced stuff.
M ‘Advanced’ in terms of perl.
M ‘Advanced’ in terms of linguistics.
Perl for Linguists – p.29/38
Advanced perl
Perl for Linguists – p.30/38
Advanced perl
N Object-oriented programming and public perlmodules
Perl for Linguists – p.30/38
Advanced perl
O Object-oriented programming and public perlmodules
O Tk (editperl.pl, visgrep.pl)
Perl for Linguists – p.30/38
Advanced perl
P Object-oriented programming and public perlmodules
P Tk (editperl.pl, visgrep.pl)
P Remote computing (bhexp.cgi, dbiex.pl,websearch.pl)
Perl for Linguists – p.30/38
Object-oriented programming
Perl for Linguists – p.31/38
Object-oriented programming
Q Object-oriented (OO) programming is anotherstyle of programming.
Perl for Linguists – p.31/38
Object-oriented programming
R Object-oriented (OO) programming is anotherstyle of programming.
R OO Programs are a network of things orobjects, not a list of commands.
Perl for Linguists – p.31/38
Object-oriented programming
S Object-oriented (OO) programming is anotherstyle of programming.
S OO Programs are a network of things orobjects, not a list of commands.
S Objects have their own data and specializedfunctions for dealing with their own data.
Perl for Linguists – p.31/38
Object-oriented programming
T Object-oriented (OO) programming is anotherstyle of programming.
T OO Programs are a network of things orobjects, not a list of commands.
T Objects have their own data and specializedfunctions for dealing with their own data.
T Objects can be refer to other objects or inheritthe properties of other objects.
Perl for Linguists – p.31/38
Translating to OO
Perl for Linguists – p.32/38
Translating to OO
U Original verb-finding program: verbs.pl.
Perl for Linguists – p.32/38
Translating to OO
V Original verb-finding program: verbs.pl.
V OO verb-finding program: verbsOO.pl.
Perl for Linguists – p.32/38
Caveats
Perl for Linguists – p.33/38
Caveats
W OO programs are longer.
Perl for Linguists – p.33/38
Caveats
X OO programs are longer.
X OO programs are slower.
Perl for Linguists – p.33/38
Caveats
Y OO programs are longer.
Y OO programs are slower.
Y OO programming in perl is not orthodox OO(lots of “oddments”).
Perl for Linguists – p.33/38
Caveats
Z OO programs are longer.
Z OO programs are slower.
Z OO programming in perl is not orthodox OO(lots of “oddments”).
Z 00 programming isn’t intuitive for most folks.
Perl for Linguists – p.33/38
Caveats
[ OO programs are longer.
[ OO programs are slower.
[ OO programming in perl is not orthodox OO(lots of “oddments”).
[ 00 programming isn’t intuitive for most folks.
[ Unfortunately, 00 programming is essential forsome modules.
Perl for Linguists – p.33/38
Tk
Perl for Linguists – p.34/38
Tk
\ Tk is an independent language for makinggraphical user interfaces.
Perl for Linguists – p.34/38
Tk
] Tk is an independent language for makinggraphical user interfaces.
] There is a special perl module so that youcan build GUIs indirectly using Tk.
Perl for Linguists – p.34/38
Tk
^ Tk is an independent language for makinggraphical user interfaces.
^ There is a special perl module so that youcan build GUIs indirectly using Tk.
^ This module is widely available for Unix/Linuxand Windows, but not clear if it’s available forMacs.
Perl for Linguists – p.34/38
Tk
_ Tk is an independent language for makinggraphical user interfaces.
_ There is a special perl module so that youcan build GUIs indirectly using Tk.
_ This module is widely available for Unix/Linuxand Windows, but not clear if it’s available forMacs.
_ If you want to write programs in Perl thatreally look like modern computer programs,then you need to use Tk.
Perl for Linguists – p.34/38
GUI coding
Perl for Linguists – p.35/38
GUI coding
` Tk programs use OO programming style;buttons, windows, etc. are all “objects”.
Perl for Linguists – p.35/38
GUI coding
a Tk programs use OO programming style;buttons, windows, etc. are all “objects”.
a You create these objects and then yourprogram waits for the user to interact withthem.
Perl for Linguists – p.35/38
GUI coding
b Tk programs use OO programming style;buttons, windows, etc. are all “objects”.
b You create these objects and then yourprogram waits for the user to interact withthem.
b makecorpus.pl makecorpusTk.pl
Perl for Linguists – p.35/38
Remote computing
Perl for Linguists – p.36/38
Remote computing
c CGI (“Common Gateway Interface”):programs that run remotely (generatingjavascript and HTML: bhexp.cgi)
Perl for Linguists – p.36/38
Remote computing
d CGI (“Common Gateway Interface”):programs that run remotely (generatingjavascript and HTML: bhexp.cgi)
d Interacting with local or remote databases(generating sql: dbiex.pl)
Perl for Linguists – p.36/38
Remote computing
e CGI (“Common Gateway Interface”):programs that run remotely (generatingjavascript and HTML: bhexp.cgi)
e Interacting with local or remote databases(generating sql: dbiex.pl)
e Interacting with the web generally(websearch.pl)
Perl for Linguists – p.36/38
Modeling
Perl for Linguists – p.37/38
Modeling
f Perl not good for computational clarity; notwell-defined
Perl for Linguists – p.37/38
Modeling
g Perl not good for computational clarity; notwell-defined
g Perl exceptionally good at string processing
Perl for Linguists – p.37/38
Modeling
h Perl not good for computational clarity; notwell-defined
h Perl exceptionally good at string processing
h Computational tasks as string processing:sylpars.pl
Perl for Linguists – p.37/38
What now?
Perl for Linguists – p.38/38
What now?
i Programming and perl hopefullydemystified. . .
Perl for Linguists – p.38/38
What now?
j Programming and perl hopefullydemystified. . .
j Some ideas about what you can do with it ifyou’re thinking that you need to program.
Perl for Linguists – p.38/38
What now?
k Programming and perl hopefullydemystified. . .
k Some ideas about what you can do with it ifyou’re thinking that you need to program.
k If you already know some perl, perhaps someother ideas about what to do with what youalready know.
Perl for Linguists – p.38/38