Introduction to Perl scripting
Part 1: basic Perl

What is Perl?

- A scripting language
- Practical Extraction and Report Language
- Pathologically Eclectic Rubbish Lister

How do I use Perl?

$ vi hello.pl
print "hello world\n";

$ perl hello.pl
hello world

$ vi add.pl
print $ARGV[0] + $ARGV[1], "\n";

$ perl add.pl 17 25
42

Why Perl?

- Fast text processing
- Simple scripting language
- Cross-platform
- Many extensions for biological data

TMTOWTDI

Motto: TMTOWTDI (There’s More Than One Way To Do It)

- This can be frustrating to new users.
- Focus on understanding what you are doing; don't worry about all the other ways yet.

Getting started

- Primitives
  - Strings: "string", 'string'
  - Numbers: 10, 12e4, 1e-3, 120.0123
- Data types
  - scalars: $var = "a"; $num = 10;
  - lists (arrays): @lst = ('apple', 'orange');
  - hashes: %hash = (1 => 'apple', 2 => 'orange');
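
A minimal runnable sketch of the three data types, plus the practical difference between the two string primitives: double quotes interpolate variables and escapes such as \n, single quotes keep the text literally. The variable names and values are only illustrative.

```perl
use strict;
use warnings;

# scalars hold one value; double quotes interpolate, single quotes do not
my $num = 10;
my $dq  = "num is $num\n";      # "num is 10" followed by a newline
my $sq  = 'num is $num\n';      # literally: num is $num\n

# arrays (lists) are ordered; elements are indexed from 0 with $name[i]
my @lst         = ('apple', 'orange');
my $first_fruit = $lst[0];      # 'apple'
my $n_items     = scalar @lst;  # 2

# hashes map keys to values; write the pairs with =>
my %hash      = (1 => 'apple', 2 => 'orange');
my $fruit_two = $hash{2};       # 'orange'
$hash{3} = 'pear';              # add a new key/value pair

print $dq, $sq, "\n", "$first_fruit $n_items $fruit_two\n";
```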

Starter Code

# assign a variable
$var = 12;
print "var is $var\n";

# concatenate strings
$x = "Alice";
$y = $x . " & Alex are cousins\n";
print $y;

# print can print lists of variables
print $y, "var is ", $var, "\n";

Tidbits

- To print to the screen: print "string";
- Special characters
  - newline: "\n"
  - tab: "\t"
- Conversion between strings and numbers is automatic: it is all about context
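
A small sketch of what "it is all about context" means in practice: the operator decides whether a value is treated as a number or a string, and an array gives its element count when assigned to a scalar. The values are illustrative.

```perl
use strict;
use warnings;

# the operator picks the type: + is numeric, . is string concatenation
my $sum    = "10" + 5;      # 15: the string "10" is used as a number
my $joined = 10 . 5;        # "105": both numbers are used as strings

# the same array gives different results in scalar and list context
my @fruits  = ('apple', 'orange', 'pear');
my $count   = @fruits;      # scalar context: the number of elements, 3
my ($first) = @fruits;      # list context: the first element, 'apple'

print "$sum $joined $count $first\n";   # prints "15 105 3 apple"
```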

Math

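- Standard arithmetic: +, -, *, /
- Modulus operator %: 4 % 2 = 0; 5 % 2 = 1
- Operate in place: $num += 3
- Increment/decrement a variable: $a++, $a--
- Power **: 2**5 = 32
- Square root: sqrt(9) = 3
- Natural logarithm (log base e): log(5)
- Base-10 logarithm: Perl has no built-in log10, so use log(100) / log(10)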
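
The operators above in one runnable sketch. The log10 helper is our own small sub built from the identity on the slide (the POSIX module also exports a log10 function if you prefer a library version).

```perl
use strict;
use warnings;

# Perl's built-in log() is the natural log; derive base 10 from it
sub log10 {
    my ($n) = @_;
    return log($n) / log(10);
}

my $mod   = 5 % 2;     # 1
my $power = 2**5;      # 32
my $root  = sqrt(9);   # 3
my $num   = 7;
$num += 3;             # 10, updated in place
$num++;                # 11

printf "%d %d %d %d %.1f\n", $mod, $power, $root, $num, log10(100);
# prints "1 32 3 11 2.0"
```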

Precision

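- Truncate (drop the fractional part): int($x); this rounds down only for positive numbers
- Round up: POSIX::ceil($x)
- Round down: POSIX::floor($x)
  - load the module first with use POSIX;
- Formatted printing: printf/sprintf
  - %d, %f, %5.2f, %g, %e
  - More coverage later on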
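
A short sketch contrasting int() with POSIX::ceil and POSIX::floor on a negative number, where truncation and rounding down differ, plus sprintf, which returns the formatted string that printf would print. The sample values are arbitrary.

```perl
use strict;
use warnings;
use POSIX ();   # load POSIX without importing; call its functions by full name

my $x     = -2.5;
my $trunc = int($x);          # -2: the fraction is dropped (toward zero)
my $up    = POSIX::ceil($x);  # -2
my $down  = POSIX::floor($x); # -3

# "%5.2f" -> " 3.14" (width 5, 2 decimals), "%d" -> "42", "%e" -> "1.234500e+03"
my $fmt = sprintf("%5.2f|%d|%e", 3.14159, 42.9, 1234.5);
print "$trunc $up $down [$fmt]\n";
```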

Some Math Code

# Pythagorean theorem
# (avoid "my $a" / "my $b": those names are reserved for sort)
my $side_a = 3; my $side_b = 4;
my $c = sqrt($side_a**2 + $side_b**2);

# what's left over from the division
my $x = 22; my $y = 6;
my $div = int( $x / $y );
my $mod = $x % $y;
print $div, " ", $mod, "\n";

output: 3 4

Logic & Equality

- if / unless / elsif / else
    if( TEST ) { DO SOMETHING }
    elsif( TEST ) { SOMETHING ELSE }
    else { DO SOMETHING ELSE IN CASE }
- Equality: == (numbers) and eq (strings)
- Less/greater than: <, <=, >, >=
  - lt, le, gt, ge for string (lexical) comparisons
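
Why there are two sets of operators: the same pair of values can compare differently as numbers and as strings. This sketch uses made-up values to show both cases.

```perl
use strict;
use warnings;

my ($p, $q) = ("10", "10.0");
my $num_equal = ($p == $q) ? 1 : 0;    # 1: both are the number 10
my $str_equal = ($p eq $q) ? 1 : 0;    # 0: the characters differ

# lexical order compares character by character, so "10" comes before "9"
my $lex_less = ("10" lt "9") ? 1 : 0;  # 1
my $num_less = (10 < 9) ? 1 : 0;       # 0

print "$num_equal $str_equal $lex_less $num_less\n";   # prints "1 0 1 0"
```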

Testing equality

$str1 = "mumbo";
$str2 = "jumbo";

if( $str1 eq $str2 ) {
    print "strings are equal\n";
}
if( $str1 lt $str2 ) {
    print "less\n";
} else {
    print "more\n";
}

# numeric comparison ($x and $y hold numbers)
if( $y >= $x ) {
    print "y is greater or equal\n";
}

Boolean Logic

- AND: && and
- OR: || or
- NOT: ! not
- The word forms (and, or, not) do the same job but bind more loosely than the symbols

if( $a > 10 && $a <= 20 ) { }
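
A sketch of the boolean operators in use. Besides testing, && and || stop as soon as the answer is known and return the deciding value, which is why "value || default" is a common idiom. The values are illustrative.

```perl
use strict;
use warnings;

my $a_val    = 15;
my $in_range = ($a_val > 10 && $a_val <= 20) ? 1 : 0;   # 1

# || returns the first true operand; && returns the last one evaluated
my $label = '' || 'unknown';       # 'unknown'
my $both  = 'yes' && 'also';       # 'also'

# ! negates; 'not' does too, but with much lower precedence
my $not_in = !$in_range ? 1 : 0;   # 0

print "$in_range $label $both $not_in\n";   # prints "1 unknown also 0"
```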

Loops

while( TEST ) { }
until( ! TEST ) { }

for( $i = 0; $i < 10; $i++ ) { }

foreach $item ( @list ) { }
for $item ( @list ) { }
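
Each loop form from the slide, filled in with a concrete body so it runs. The list contents and counts are arbitrary examples.

```perl
use strict;
use warnings;

my @list = ('apple', 'orange', 'pear');

my $i = 0;
while ($i < 3) { $i++ }     # runs while the test is true; $i ends at 3
until ($i == 0) { $i-- }    # runs until the test is true; $i ends at 0

my $total = 0;
for (my $j = 0; $j < 10; $j++) { $total += $j }    # 0 + 1 + ... + 9 = 45

my @upper;
foreach my $item (@list) { push @upper, uc $item } # APPLE ORANGE PEAR

print "$i $total @upper\n";   # prints "0 45 APPLE ORANGE PEAR"
```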

Using logic

for( $i = 0; $i < 20; $i++ ) {
    if( $i == 0 ) {
        print "$i is 0\n";
    } elsif( $i % 2 == 0 ) {
        print "$i is even\n";
    } else {
        print "$i is odd\n";
    }
}

What is truth?

- True
  - if( "zero" ) {}
  - if( 23 || -1 || ! 0 ) {}
  - $x = "0 or none"; if( $x ) {}
- False
  - if( 0 || undef || '' || "0" ) { }
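
A quick way to check the rules above: filter a list of candidate values with grep, which keeps only the ones Perl treats as true. Note that strings like "0.0" and "00" are true; only 0, "0", the empty string and undef are false here.

```perl
use strict;
use warnings;

my @values = ('zero', 23, -1, '0 or none', '0.0', '00', 0, '', '0', undef);

# grep keeps the elements for which the block returns a true value
my @true_ones = grep { $_ } @values;

print join(',', @true_ones), "\n";   # prints "zero,23,-1,0 or none,0.0,00"
```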

Special variables

- This is one reason many people dislike Perl: too many little things to remember
- See perldoc perlvar for detailed info

Some special variables

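- $! - error message from the most recent failed system call (e.g. a failed open)
- $, - separator that print places between its arguments
- $" - separator placed between array elements when an array is interpolated, as in print "@array";
- $/ - input record separator ("\n" by default)
- $a, $b - hold the two items being compared in a sort block
- $_ - implicit (default) variable
- perldoc perlvar for more info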
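
A sketch of three of these variables in action. It uses local so the changes only last inside the braces, prints into an in-memory string so the result can be inspected, and opens a path assumed not to exist (/no/such/file) to trigger an error in $!; the exact error text depends on the operating system.

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');

# $" is placed between elements when an array is interpolated into a string
my $interp;
{
    local $" = '-';
    $interp = "@array";              # "a-b-c"
}
my $default_interp = "@array";       # "a b c": $" defaults to a single space

# $, is placed between the arguments of print
my $printed = '';
{
    local $, = ':';
    open(my $out, '>', \$printed) or die $!;   # print into a string
    print $out 'x', 'y', 'z';        # writes "x:y:z"
    close($out);
}

# $! holds the system error message after a failed call
my $error = '';
open(my $in, '<', '/no/such/file') or $error = "$!";

print "$interp | $default_interp | $printed | $error\n";
```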

The Implicit variable

Implicit variable is $_

for( @list ) { print $_ }
while( <IN> ) { print $_ }
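
Many built-ins fall back to $_ when given no argument, which is where the implicit variable pays off. A small sketch with an illustrative codon list:

```perl
use strict;
use warnings;

my @list = ('ATG', 'TAA', 'GGC');

my @lengths;
for (@list) {                 # no loop variable: each element is aliased to $_
    push @lengths, length;    # length with no argument measures $_
}

my @stops = grep { /^TA/ } @list;   # a bare pattern match also tests $_

print "@lengths | @stops\n";        # prints "3 3 3 | TAA"
```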

Input/Output: Getting and Writing Data

Getting Data from Files

open(HANDLE, "filename") || die $!;
$line1 = <HANDLE>;
while( defined($line = <HANDLE>) ) {
    chomp $line;    # remove the trailing newline before comparing
    if( $line eq 'line stuff' ) { }
}

open(HANDLE, "filename") || die $!;
while( <HANDLE> ) {
    print "line is $_";
}

open(HANDLE, "filename") || die $!;
@slurp = <HANDLE>;
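
The same reading pattern written in the three-argument, lexical-filehandle style that current Perl documentation recommends. To keep the sketch self-contained it first writes a tiny FASTA-like file with File::Temp (a core module); in real use $filename would point at your own data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# create a small example file (deleted automatically at exit)
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp ">seq1\nATGC\n>seq2\nGGTA\n";
close($tmp);

# '<' opens read-only; the handle lives in an ordinary variable
open(my $in, '<', $filename) or die "cannot open $filename: $!";
my @headers;
while (my $line = <$in>) {    # the loop ends at end of file
    chomp $line;              # strip the trailing newline
    push @headers, $line if substr($line, 0, 1) eq '>';
}
close($in);

print "@headers\n";           # prints ">seq1 >seq2"
```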

Data from Streams

while( <STDIN> ) {
    print "stdin read: $_";
}

open(GREP, "grep '>' $filename") || die $!;
my $i = 0;
while( <GREP> ) {
    $i++;
}
close(GREP);
print "$i sequences in file\n";
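
For comparison, the same header count can be done inside Perl without starting an external grep process. The sketch builds a three-record example file with File::Temp; that file stands in for $filename on the slide.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# example FASTA file standing in for $filename
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp ">seq1\nATGC\n>seq2\nGGTA\n>seq3\nTTAA\n";
close($tmp);

open(my $in, '<', $filename) or die "cannot open $filename: $!";
my $i = 0;
while (<$in>) {
    $i++ if index($_, '>') >= 0;   # same test as grep '>': the line contains '>'
}
close($in);

print "$i sequences in file\n";    # prints "3 sequences in file"
```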

Can pass data into a program

A loop over <STDIN> reads whatever the shell sends to standard input, so data can be piped or redirected into the script (for example: perl count.pl < input.fa, where the script and file names are placeholders) instead of being read from a named file.

Writing out data

open(OUT, ">outname") || die $!;
print OUT "sequence report\n";
close(OUT);

# appending with >>
open(OUT, ">>outname") || die $!;
print OUT "appended this\n";
close(OUT);

Filehandles as variables

$var = \*STDIN;    # a reference to the STDIN filehandle

open($fh, ">report.txt") || die $!;
print $fh "line 1\n";
close($fh);

open($fh2, "report.txt") || die $!;
$fh = $fh2;
while( <$fh> ) { }
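
A runnable version of the idea that filehandles are ordinary scalar values: write to one, then copy a read handle into the same variable and read through it. The scratch file name comes from File::Temp and is only an example location.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $report) = tempfile(UNLINK => 1);   # scratch file name for the example

open(my $fh, '>', $report) or die "cannot write $report: $!";
print $fh "line 1\n";          # print to the handle stored in $fh
print {$fh} "line 2\n";        # braces make the handle explicit
close($fh);                    # flush before reading the file back

open(my $fh2, '<', $report) or die "cannot read $report: $!";
$fh = $fh2;                    # handles can be copied, passed and returned
my @lines = <$fh>;             # list context reads all remaining lines
close($fh);

print scalar(@lines), " lines read\n";   # prints "2 lines read"
```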

String manipulation

Some string functions

- . concatenates strings
  - $together = $one . " " . $two;
- reverse - reverse a string (in scalar context) or a list
- length - get the length of a string
- uc - uppercase a string; lc - lowercase a string
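
A sketch of these functions on a short example sequence. The one trap worth seeing is reverse: it reverses a string only in scalar context, and print supplies list context.

```perl
use strict;
use warnings;

my $seq = 'atgcc';

my $together = $seq . ' ' . 'ggt';       # "atgcc ggt"
my $rev      = reverse $seq;             # scalar context: "ccgta"
my @rev_list = reverse('a', 'b', 'c');   # list context: ('c', 'b', 'a')
my $len      = length $seq;              # 5
my $upper    = uc $seq;                  # "ATGCC"

print reverse($seq), "\n";               # list context: prints "atgcc" unchanged
print scalar(reverse($seq)), "\n";       # prints "ccgta"
```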

split/join

- split: separate a string into a list based on a delimiter
  - @lst = split("-", "hello-there-mr-frog");
- join: make a string from a list using a delimiter
  - $str = join(" ", @lst);
  - Solves the fencepost problem nicely (you want to put something between each pair of items in a list)

print join("\t", @lst), "\n";
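
A runnable sketch of split and join. One detail the slide leaves implicit: split's first argument is a regular expression, so characters such as | must be escaped. The sample strings are illustrative.

```perl
use strict;
use warnings;

my @lst = split(/-/, 'hello-there-mr-frog');   # ('hello', 'there', 'mr', 'frog')
my $str = join(' ', @lst);                      # "hello there mr frog"

# fencepost: 3 fields need only 2 separators, and join inserts exactly that many
my $tsv = join("\t", 'gene', 'start', 'end');   # "gene\tstart\tend"

# the delimiter is a pattern, so a literal '|' must be escaped
my @fields = split(/\|/, 'a|b|c');              # ('a', 'b', 'c')

print "$str\n$tsv\n@fields\n";
```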

index

index(STRING, SUBSTRING, [STARTINGPOS])
- Find the position of a substring within a string (left-to-right scanning); returns -1 if it is not found

$codon = 'ATG';
$str = 'AGCGCATCGCATGGCGATGCAGATG';
$first = index($str, $codon);
$second = index($str, $codon, $first + length($codon));

rindex
- Same as index, but right-to-left scanning
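
Extending the two index calls into a loop finds every occurrence: keep searching from just past the previous hit until index returns -1. This sketch reuses the slide's sequence and codon.

```perl
use strict;
use warnings;

my $str   = 'AGCGCATCGCATGGCGATGCAGATG';
my $codon = 'ATG';

# collect every 0-based start position of $codon, left to right
my @positions;
my $pos = index($str, $codon);
while ($pos >= 0) {                    # -1 means no further match
    push @positions, $pos;
    $pos = index($str, $codon, $pos + 1);
}

my $last = rindex($str, $codon);       # rightmost match

print "@positions | $last\n";          # prints "10 16 22 | 22"
```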

substr

substr(STRING, START, [LENGTH], [REPLACE]);

- Extract a substring from a larger string
    $orf = substr($str, 10, 40);
    $end = substr($str, 40); # get the end
- Replace part of a string
    substr($str, 21, 10, 'NNNNNNNNNNN');
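
A sketch of both uses of substr on the 25-base sequence from the index slide (so the offsets are smaller than the slide's). The four-argument form edits the string in place and returns the text it replaced; a copy is edited here so the original stays intact.

```perl
use strict;
use warnings;

my $str = 'AGCGCATCGCATGGCGATGCAGATG';   # 25 bases

my $orf = substr($str, 10, 9);    # 9 characters from offset 10: "ATGGCGATG"
my $end = substr($str, 22);       # from offset 22 to the end: "ATG"

# replace 3 characters at offset 10 in a copy; the removed text is returned
my $masked  = $str;
my $removed = substr($masked, 10, 3, 'NNN');   # "ATG"

print "$orf $end $removed\n$masked\n";
```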

Zero based economy...

- The 1st number is '0' for an index or the 1st character in a string
  - as in most programming languages
- Biologists often number the 1st base in a sequence as '1' (GenBank, BioPerl)
- Interbase coordinates number the positions between bases, starting at 0 (Kent-UCSC, Chado-GMOD)

Coordinate systems

Zero based, interbase coordinates
 A T G G G T A G A
0 1 2 3 4 5 6 7 8 9

1 based coordinates
A T G G G T A G A
1 2 3 4 5 6 7 8 9
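
Converting coordinates to the offset and length that substr expects is a common source of off-by-one errors. The two helpers below are hypothetical names written for this sketch: one takes 1-based inclusive start and end (GenBank style), the other interbase start and end.

```perl
use strict;
use warnings;

my $seq = 'ATGGGTAGA';

# hypothetical helper: 1-based, inclusive start..end -> substring
sub one_based_sub {
    my ($s, $start, $end) = @_;
    return substr($s, $start - 1, $end - $start + 1);
}

# hypothetical helper: interbase start..end (between bases) -> substring
sub interbase_sub {
    my ($s, $start, $end) = @_;
    return substr($s, $start, $end - $start);
}

# both select bases 4-6 in 1-based numbering, the "GGT" after the ATG
my $a1 = one_based_sub($seq, 4, 6);
my $a2 = interbase_sub($seq, 3, 6);

print "$a1 $a2\n";   # prints "GGT GGT"
```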

Arrays and Lists

- Lists are ordered sets of items
- Can mix types of scalars (numbers, strings, floats)
- Perl uses lists extensively
- Array variables are prefixed by @

List operations

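- reverse - reverse list order
- $list[$n] - get the item at index $n (counting from 0)
  - $two = $list[2];
- scalar - get the length of an array
  - $len = scalar @list;
  - $last_index = $#list;
- delete $list[10] - clears the entry (it becomes undef); the array only shrinks when its last element is deleted, and perlfunc discourages delete on arrays, so use splice to remove items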
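
A runnable sketch of the list operations, including negative indexes and splice as the usual way to remove an element and close the gap. The list contents are illustrative.

```perl
use strict;
use warnings;

my @list = ('tree', 'frog', 'log', 'bog');

my $two        = $list[2];        # 'log' (index 2 is the third item)
my $last_item  = $list[-1];       # negative indexes count from the end: 'bog'
my $len        = scalar @list;    # 4
my $last_index = $#list;          # 3, one less than the length
my @backwards  = reverse @list;   # ('bog', 'log', 'frog', 'tree')

splice(@list, 1, 1);              # remove 'frog' and shift the rest down

print "@list\n";                  # prints "tree log bog"
```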

Autovivification

Perl automatically allocates space for an item:

$array[0] = 'apple';
print scalar @array, " ";
$array[4] = 'elephant';
$array[25] = 'zebra fish';
print scalar @array, " ";
delete $array[25];    # deleting the last element shrinks the array back to index 4
print scalar @array, "\n";

output: 1 26 5

pop,push,shift,unshift

# remove last item
$last = pop @list;

# remove first item
$first = shift @list;

# add to end of list
push @list, $last;

# add to beginning of list
unshift @list, $first;
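
The four calls together, showing that they undo each other, and that push plus shift gives a first-in, first-out queue (push plus pop would give a stack). Values are illustrative.

```perl
use strict;
use warnings;

my @list = ('a', 'b', 'c');

my $last  = pop @list;        # 'c'; @list is now ('a', 'b')
my $first = shift @list;      # 'a'; @list is now ('b')
push @list, $last;            # ('b', 'c')
unshift @list, $first;        # ('a', 'b', 'c') again

# push + shift: first in, first out
my @queue;
push @queue, $_ for 1 .. 3;   # enqueue 1, 2, 3
my $next = shift @queue;      # dequeue 1

print "@list | $next | @queue\n";   # prints "a b c | 1 | 2 3"
```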

splicing an array

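splice ARRAY, OFFSET, LENGTH, LIST
splice ARRAY, OFFSET, LENGTH
splice ARRAY, OFFSET
splice ARRAY

@list = ('alice', 'chad', 'rod');
($x, $y) = splice(@list, 1, 2);    # $x = 'chad', $y = 'rod'; @list is now ('alice')
splice(@list, 1, 0, ('marvin', 'alex'));
new list: ('alice', 'marvin', 'alex')
(the same insert applied to the original three names gives ('alice', 'marvin', 'alex', 'chad', 'rod'))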
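
A runnable check of the splice example: removal returns the removed elements, and a LENGTH of 0 inserts without removing anything. The second list is a fresh copy of the original names, used to show the insert on an untouched list.

```perl
use strict;
use warnings;

my @list = ('alice', 'chad', 'rod');

# remove 2 elements starting at offset 1; splice returns what it removed
my ($x, $y) = splice(@list, 1, 2);        # 'chad', 'rod'; @list is ('alice')

# LENGTH 0: insert at offset 1 without removing anything
splice(@list, 1, 0, 'marvin', 'alex');    # ('alice', 'marvin', 'alex')

# the same insert on a fresh copy of the original list
my @fresh = ('alice', 'chad', 'rod');
splice(@fresh, 1, 0, 'marvin', 'alex');   # ('alice', 'marvin', 'alex', 'chad', 'rod')

print "@list\n@fresh\n";
```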

Sorting with sort

@list = ('tree', 'frog', 'log');
@sorted = sort @list;
# reverse order
@sorted = sort { $b cmp $a } @list;

# sort based on numerics
@list = (25, 21, 12, 17, 9, 8);
@sorted = sort { $a <=> $b } @list;

# reverse order of sort
@revsorted = sort { $b <=> $a } @list;

How would you sort based on part of string in list?

@list = ('E1', 'F3', 'A2');
@sorted = sort @list; # sort lexical

@sorted = sort { substr($a,1,1) <=> substr($b,1,1) } @list;
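
A runnable version of both sorts, with one variation of our own: dropping the LENGTH argument (substr($a, 1)) compares everything after the first character, so multi-digit suffixes still sort numerically. The extra list is made up for the example.

```perl
use strict;
use warnings;

my @list    = ('E1', 'F3', 'A2');
my @lexical = sort @list;                                             # A2 E1 F3
my @by_num  = sort { substr($a, 1, 1) <=> substr($b, 1, 1) } @list;   # E1 A2 F3

# compare the whole numeric suffix, not just one digit
my @more = sort { substr($a, 1) <=> substr($b, 1) } ('B10', 'C9', 'A100');

print "@lexical | @by_num | @more\n";   # prints "A2 E1 F3 | E1 A2 F3 | C9 B10 A100"
```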

Filter with grep

@list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');

@sl = grep { length($_) == 3 } @list;            # ('cat', 'dog')

@oo = grep { index($_, "oo") >= 0 } @list;       # ('baboon', 'kangaroo')
# use it to count
my $ct = grep { substr($_,1,1) eq 'a' } @list;   # 5

Transforming with map

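@list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');

@lens = map { length($_) } @list;

# ucfirst capitalizes the first letter and returns a new string;
# the 4-argument substr returns the replaced letter instead, and
# changing $_ inside map would also modify @list
@upper = map { ucfirst($_) } @list;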
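
A sketch of map on a shorter illustrative list, including the aliasing rule behind the corrected line: inside map, $_ is the list element itself, so work on a copy when the original must not change.

```perl
use strict;
use warnings;

my @list = ('aardvark', 'baboon', 'cat');

my @lens  = map { length($_) } @list;     # (8, 6, 3)
my @upper = map { ucfirst($_) } @list;    # ('Aardvark', 'Baboon', 'Cat')

# editing by hand: copy $_ first, then return the edited copy as the last value
my @caps = map { my $w = $_; substr($w, 0, 1) = uc substr($w, 0, 1); $w } @list;

print "@lens | @upper | @caps | @list\n";
```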

More list action

@list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');

for $animal ( @list ) {
    if( length($animal) <= 3 ) {
        print "$animal is noisy\n";
    } else {
        print "$animal is quiet\n";
    }
}

Sort complicated stuff

# want to sort these by gene number
@list = ('CG1000.1', 'CG0789.1', 'CG0321.1', 'CG1227.2');
@sorted = sort {
    ($locus_a) = split(/\./, $a);
    ($locus_b) = split(/\./, $b);
    substr($locus_a, 0, 2, '');
    substr($locus_b, 0, 2, '');
    $locus_a cmp $locus_b;
} @list;
print "sorted are ", join(",", @sorted), "\n";
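
The same sort written under use strict, with the key extraction moved into a helper. locus_number is a hypothetical name for this sketch; it uses the slide's split-then-substr steps and compares with <=>, so the order no longer depends on the numbers being zero-padded to the same width.

```perl
use strict;
use warnings;

my @list = ('CG1000.1', 'CG0789.1', 'CG0321.1', 'CG1227.2');

# hypothetical helper: 'CG1000.1' -> '1000'
sub locus_number {
    my ($name) = @_;
    my ($locus) = split(/\./, $name);   # 'CG1000'
    return substr($locus, 2);           # drop the leading 'CG'
}

my @sorted = sort { locus_number($a) <=> locus_number($b) } @list;

print "sorted are ", join(",", @sorted), "\n";
# prints "sorted are CG0321.1,CG0789.1,CG1000.1,CG1227.2"
```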

Scope

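The section of a program in which a variable is valid

- Defined by braces { }
- use strict; makes Perl require variables to be declared
- Use 'my' to declare variables

#!/usr/bin/perl -w
use strict;

my $var = 10;
my $var2 = 'monkey';
print "(outside) var is $var\n" . "(outside) var2 is $var2\n";
{
    my $var;          # a new $var that exists only inside these braces
    $var = 20;
    print "(inside) var is $var\n";
    $var2 = 'ape';    # no 'my', so this changes the outer $var2
}
# prints var is 10 (the outer $var) and var2 is ape
print "(outside) var is $var\n" . "(outside) var2 is $var2\n";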

Good practices

- Declare variables with 'my'
- Always 'use strict'
- 'use warnings' to get warnings

Let’s practice (old code)

@list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');

for $animal ( @list ) {
    if( length($animal) <= 3 ) {
        print "$animal is noisy\n";
    } else {
        print "$animal is quiet\n";
    }
}

Let’s practice

#!/usr/bin/perl
use warnings;
use strict;

my @list = ('aardvark', 'baboon', 'cat', 'dog', 'lamb', 'kangaroo');

for my $animal ( @list ) {
    if( length($animal) <= 3 ) {
        print "$animal is noisy\n";
    } else {
        print "$animal is quiet\n";
    }
}

Editors

- vi filename: begin by using this editor

Make a perl script

$ pico hello.pl

#!/usr/bin/perl
print "hello world\n";

[Control-O, Enter, Control-X, Enter]

$ perl hello.pl
hello world
$ chmod +x hello.pl
$ ./hello.pl
