BINF634 Fall 2013, Lecture 1

BINF 634 Bioinformatics Programming
Instructor: Jeff Solka, Ph.D.
Office: Room 312C OB
Phone: 540-809-9799
Email: jlsolka@gmail.com
Office Hours: By appointment
Required texts:
• Beginning Perl for Bioinformatics by Tisdall and Waliszewski
• Programming Perl (3rd Edition) by Wall, Christiansen and Orwant
Course Meeting Place: Occoquan Prince William Rm. 304B
Course Meeting Times: M: 4:30 pm – 7:10 pm
Course webpage: http://binf.gmu.edu/~jsolka/fall13/binf634/Fall_2013BINF_634_Syllabus_rev1.html


Page 1: BINF634 FALL013 LECTURE 11 BINF 634 Bioinformatics Programming Instructor: Jeff Solka Ph.D. Office: Room 312C OB Phone: 540-809-9799 Email: jlsolka@gmail.com



Acknowledgements

Some of the material used in this course was previously developed by John Grefenstette and John Kopecky.


Experimental Biology, Computational Biology and Bioinformatics

[Figure: experimental biology (problem statement, experiment, results; LIMS) and computational biology (problem statement, simulation, results; SIMS) connected through a shared database and analysis tools. Credit: Rick Stevens]


Bioinformatics Programming Tasks

• Manage large experimental data sets
  • Sequence data
  • Microarray data (gene expression)
  • Mass spec data (proteomics)
  • Genotype project data (HapMap)
  • Clinical data
• Build tools for knowledge discovery
  • Find motifs in sequence data
  • Data clustering
  • Visualization
• Build analysis pipelines
  • Glue several analysis steps together into a single automated process
  • "Munge" data: take data from one application or database and format it for input to another application or database
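
As a small illustration of the data munging task described above, the sketch below converts tab-delimited rows (an identifier, then a sequence) into FASTA text. The input layout and the sub name table_to_fasta are assumptions made for this example; they do not come from any particular tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert "id<TAB>sequence" rows into FASTA records.
# The two-column input layout is an assumption for this sketch.
sub table_to_fasta {
    my ($table) = @_;
    my $fasta = '';
    for my $row (split /\n/, $table) {
        next if $row =~ /^\s*$/;              # skip blank rows
        my ($id, $seq) = split /\t/, $row;    # first two columns only
        $fasta .= ">$id\n" . uc($seq) . "\n"; # header line, then sequence
    }
    return $fasta;
}

my $table = "geneA\tatgccc\ngeneB\tttgaca\n";
print table_to_fasta($table);
```

Real pipelines usually read from files rather than strings, but the same split-and-reformat pattern applies.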


Objectives

• Programming skills
  • Problem solving and debugging
  • Reading and writing documentation
  • Data munging: data filtering and transformation
  • Pattern matching and data mining
  • Visualization and web presentation
  • Object-oriented programming
• Bioinformatics skills
  • Biological sequence analysis
  • Interacting with biological databases
  • Using Bioperl


Background and Prerequisites

• Molecular Biology
  • BIOL 482 or similar course
  • Recombinant DNA by Watson, Gilman, Witkowski and Zoller
    http://www.amazon.com/Recombinant-DNA-Genes-Genomes-Course/dp/0716728664/ref=dp_ob_title_bk
  • Online tutorials
    http://www.biology-online.org/1/5_DNA.htm
• Computer Science
  • IT 108, CS 112 or similar
  • Previous programming experience


Course Policies

• Programming assignments (50%)
  • 3-4 graded programming assignments
• Exams: Midterm (20%) and Final (20%)
  • May include both a closed-book section and open-book programming problems
• In-class quizzes (10%)
• Weekly homework assignments
  • All HW assignments must be submitted to me via email by the beginning of the next class.
  • HW assignments will not be graded individually, but you may be called upon to discuss your work during the next class. Therefore, late assignments will not be accepted.
• Grading will be on the following scale: 93-100 (A), 90-92 (A-), 87-89 (B+), 83-86 (B), 80-82 (B-), 77-79 (C+), 73-76 (C), below 70 (F). Student averages will be rounded to the closest integer to determine final letter grades.
• Keep an eye on the webpage: http://binf.gmu.edu/~jsolka/fall13/binf634/Fall_2013BINF_634_Syllabus_rev1.html


Honor Code Policies

• I take honor code violations very seriously.
  • Programming assignments must be your work.
  • Each assignment will specify whether you may use code from other sources.
  • Any material you take from another source must be acknowledged within the program documentation.
  • You must read and understand the honor code handout.
  • Violations of the honor code WILL be referred to the Honor Council.
• All students must adhere to the GMU Honor Code. See: http://honorcode.gmu.edu/


Pragmatics

• Assignments and announcements
  • Will be posted on the course webpage; check daily
  • Class email will be sent to your email address from Patriot Web
• Accounts
  • You should have an account on the server binf.gmu.edu
  • Systems administrator: Chris Ryan, [email protected]
• Accessing perl
  • Log in from Rooms 304B or 320
  • Log in from off-campus using ssh
    • Go to ftp://ftp.ssh.com/pub/ssh/ for an academic Windows client
    • Alternatively go to http://www.chiark.greenend.org.uk/~sgtatham/putty/
  • Install perl on your own computer; see textbooks and backup slide materials


Pragmatics

• Unix
  • This class will focus on using the Unix operating system
  • We will be using Mac OS X (at least in the classroom)
  • There are numerous UNIX tutorials: http://www.unixtools.com/tutorials.html
• Text editors
  • Perl programs are stored in plain text files
  • I recommend emacs or vim for a Unix text editor (see links for Windows support)
    http://www.claremontmckenna.edu/math/ALee/emacs/emacs.html
    http://www.vim.org
  • If you are interested in an integrated development environment I recommend Eclipse (see backup slides): www.eclipse.org
  • There are tutorials for each editor online
    http://www.gnu.org/software/emacs/tour/
    http://www.yolinux.com/TUTORIALS/LinuxTutorialAdvanced_vi.html


Review: Molecular Biology

• Life evolved from a common origin about 3.5 billion years ago
• All life shares similar biochemistry
  • Proteins: active elements
  • Nucleic acids: informational elements
• Molecular Biology: the study of structure and function of proteins and nucleic acids


Proteins

• Functions:
  • Structural proteins
  • Enzymes
  • Transport
  • Antibody defense
• Structure:
  • Chains of amino acids
  • Typical size ~300 residues
  • Range from about 100 to over 5000 residues

N.B. A residue is one of the 20 building blocks of proteins, also called an amino acid.


DNA

• Double stranded
• Four bases: adenine (A), guanine (G), cytosine (C) and thymine (T)
  • A and G are purines
  • C and T are pyrimidines
• A always paired with T (complementary)
• C always paired with G (complementary)
  => Watson-Crick base pairs (bp)
• DNA may consist of hundreds of millions of bp
• A short sequence (<100) is called an oligonucleotide
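
The base-pairing rules above (A with T, C with G) are enough to compute the opposite strand of a sequence. The following is a minimal sketch; the sub name reverse_complement is a choice made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the complementary strand, read in the opposite direction.
sub reverse_complement {
    my ($dna) = @_;
    my $revcom = reverse $dna;           # scalar assignment: reverses the string
    $revcom =~ tr/ACGTacgt/TGCAtgca/;    # swap each base for its pairing partner
    return $revcom;
}

my $dna = 'ATGCCC';
print "Original:           $dna\n";
print "Reverse complement: ", reverse_complement($dna), "\n";   # GGGCAT
```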


RNA:

single stranded

uses U (uracil) instead of T

less stable than DNA

also used in functional molecules (e.g. rRNA, tRNA)

rRNA = ribosomal RNA

tRNA = transfer RNA

important regulatory functions (siRNA)

siRNA = small interfering RNA

• introns are not translated

• exons are translated


Translation

Translation involves mRNA and ribosomes

Ribosomes made of protein and ribosomal RNA (rRNA)

Transfer RNA (tRNA) makes the connection between specific codons in mRNA and amino acids

As tRNA binds to the next codon in mRNA, its amino acid is bound to the last amino acid in the protein chain

When a STOP codon is encountered, the ribosome releases the mRNA and synthesis ends
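
The codon-by-codon process described above can be sketched in a few lines of Perl. This example makes two assumptions: it starts reading at the first base of the string, and it uses only a small subset of the standard genetic code (the hash %codon_table below), marking any codon outside that subset as X.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Partial codon table: a subset of the standard genetic code,
# chosen only to keep this sketch short. '*' marks a STOP codon.
my %codon_table = (
    AUG => 'M', UUU => 'F', UUC => 'F', GGA => 'G', GGC => 'G',
    AAA => 'K', UGG => 'W',
    UAA => '*', UAG => '*', UGA => '*',
);

# Translate an mRNA string codon by codon until a STOP codon is reached.
sub translate {
    my ($mrna) = @_;
    my $protein = '';
    for my $codon ($mrna =~ /(.{3})/g) {     # successive 3-base codons
        my $aa = exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
        last if $aa eq '*';                  # STOP: synthesis ends
        $protein .= $aa;                     # extend the protein chain
    }
    return $protein;
}

print translate('AUGUUUGGAAAAUGGUAAGGC'), "\n";   # MFGKW
```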


DNA Structure

DNA contains:

• Genes
  "a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions, and/or other functional sequence regions".[1]
• Promoters
  "a promoter is a region of DNA that facilitates the transcription of a particular gene"
• Non-coding regions
  DNA which does not contain instructions for making proteins
• Reading frames
  An open reading frame (ORF): a contiguous sequence of DNA starting at a start codon and ending at a STOP codon
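
The ORF definition above maps naturally onto a regular expression. The sketch below is a simplification under stated assumptions: it takes ATG as the only start codon, searches one strand only, and returns just the first ORF it finds. The sub name first_orf is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first ORF on the given strand: ATG, then whole codons,
# then the first in-frame STOP codon (TAA, TAG or TGA).
sub first_orf {
    my ($dna) = @_;
    if ($dna =~ /(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA))/) {
        return $1;
    }
    return;    # no ORF found
}

my $orf = first_orf('CCATGAAATTTGGGTAACC');
print defined($orf) ? "ORF: $orf\n" : "No ORF found\n";   # ORF: ATGAAATTTGGGTAA
```

The non-greedy `*?` makes the match stop at the first in-frame STOP codon rather than the last one.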


Shotgun DNA Sequencing

More discussion can be found here: http://en.wikipedia.org/wiki/Shotgun_sequencing


Sequence Files -- FASTA Format

>gi|40457238|HIV-1 isolate 97KE128 from Kenya gag gene, partial cds
CTTTTGAATGCATGGGTAAAAGTAATAGAAGAAAGAGGTTTCAGTCCAGAAGTAATACCCATGTTCTCAGCATTATCAGAAGGAGCCACCCCACAAGATTTAAATACGATGCTGAACATAGTGGGGGGACACCAGGCAGCTATGCAAATGCTAAAGGATACCATCAATGAGGAAGCTGCAGAATGGGACAGGTTACATCCAGTACATGCAGGGCCTATTCCGCCAGGCCAGATGAGAGAACCAAGGGGAAGTGACATAGCAGGAACTACTAGTACCCCTCAAGAACAAGTAGGATGGATGACAAACAATCCACCTATCCCAGTGGGAGACATCTATAAAAGATGGATCATCCTGGGCTTAAATAAAATAGTAAGAATGTATAGCCCTGTTAGCATTTTGGACATAAAACAAGGGCCAAAAGAACCCTTTAGAGACTATGTAGATAGGTTCTTTAAAACTCTCAGAGCCGAACAAGCTT

>gi|40457236| HIV-1 isolate 97KE127 from Kenya gag gene, partial cds
TTGAATGCATGGGTGAAAGTAATAGAAGAAAAGGCTTTCAGCCCAGAAGTAATACCCATGTTCTCAGCATTATCAGAAGGAGCCACCCCACAAGATTTAAATATGATGCTGAATATAGTGGGGGGACACCAGGCAGCTATGCAAATGTTAAAAGATACCATCAATGAGGAAGCTGCAGAATGGGACAGGTTACATCCAATACATGCAGGGCCTATTCCACCAGGCCAAATGAGAGAACCAAGGGGAAGTGACATAGCAGGAACTACTAGTACCCCTCAAGAGCAAATAGGATGGATGACAAGCAACCCACCTATCCCAGTGGGAGACATCTATAAAAGATGGATAATCCTGGGATTAAATAAAATAGTAAGAATGTATAGCCCTGTTAGCATTTTGGACATAAAACAAGGGCCAAAAGAACCTTTCAGAGACTATGTAGATAGGTTTTTTAAAACTCTCAGAGCCGAACAAGCTT

>gi|40457234| HIV-1 isolate 97KE126 from Kenya gag gene, partial cds
CCTTTGAATGCATGGGTGAAAGTAATAGAAGAAAAGGCTTTCAGCCCAGAAGTAATACCCATGTTTTCAGCATTATCAGAAGGAGCCACCCCACAAGATTTAAATATGATGCTGAACATAGTGGGGGGGCACCAGGCAGCTATGCAAATGTTAAAAGATACCATCAATGAGGAAGCTGCAGAATGGGACAGGCTACATCCAGCACAGGCAGGGCCTATTGCACCAGGCCAGATAAGAGAACCAAGGGGAAGTGATATAGCAGGAACTACTAGTACCCCTCAAGAACAAATAGCATGGATGACAGGCAACCCGCCTATCCCAGTGGGAGACATCTATAAAAGATGGATAATCCTGGGATTAAATAAAATAGTAAGAATGTATAGCCCTGTTAGCATTTTGGATATAAAACAAGGGCCAAAAGAACCATTCAGAGACTATGTAGACAGGTTCTTTAAAACTCTCAGAGCCGAACAAGCTT
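
In a FASTA file each record is a header line beginning with ">" followed by one or more lines of sequence. A minimal parsing sketch follows; the sub name parse_fasta and the toy records are assumptions made for this example, and the text is parsed from a string so the script is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse FASTA text into a hash: header (without '>') => sequence.
sub parse_fasta {
    my ($text) = @_;
    my (%seqs, $header);
    for my $line (split /\n/, $text) {
        $line =~ s/\s+$//;                 # strip trailing whitespace
        next if $line eq '';
        if ($line =~ /^>(.*)/) {           # header line starts a new record
            $header = $1;
            $seqs{$header} = '';
        } elsif (defined $header) {
            $seqs{$header} .= $line;       # a sequence may span several lines
        }
    }
    return %seqs;
}

# Two toy records (not real data).
my $fasta = join "\n",
    '>seq1 toy record', 'ACGT', 'ACGT',
    '>seq2 another toy record', 'TTGA', '';

my %seqs = parse_fasta($fasta);
for my $id (sort keys %seqs) {
    print "$id: ", length($seqs{$id}), " bp\n";
}
```

For real work, Bioperl provides tested sequence readers; this sketch only shows the structure of the format.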


GenBank Record

LOCUS       AK091721     2234 bp    mRNA    linear   PRI 20-JAN-2006
DEFINITION  Homo sapiens cDNA FLJ34402 fis, clone HCHON2001505.
ACCESSION   AK091721
VERSION     AK091721.1  GI:21750158
KEYWORDS    oligo capping; fis (full insert sequence).
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
            Hominidae; Homo.
  TITLE     Complete sequencing and characterization of 21,243 full-length
            human cDNAs
  JOURNAL   Nat. Genet. 36 (1), 40-45 (2004)
FEATURES             Location/Qualifiers
     source          1..2234
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
     CDS             529..1995
                     /note="unnamed protein product"
                     /codon_start=1
                     /protein_id="BAC03731.1"
                     /db_xref="GI:21750159"
                     /translation="MVAERSPARSPGSWLFPGLWLLVLSGPGGLLRAQEQPSCRRAFD
                     ...
                     RLDALWALLRRQYDRVSLMRPQEGDEGRCINFSRVPSQ"
ORIGIN
        1 gttttcggag tgcggaggga gttggggccg ccggaggaga agagtctcca ctcctagttt
       61 gttctgccgt cgccgcgtcc cagggacccc ttgtcccgaa gcgcacggca gcggggggaa
...
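
Because each GenBank field starts with a keyword at a fixed position, simple pattern matches can pull out individual values. The sketch below extracts the accession and the CDS coordinates from a few lines of the record above. The sub name genbank_summary is hypothetical, and the patterns assume a simple start..end CDS location.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the accession and CDS coordinates out of GenBank flat-file text.
sub genbank_summary {
    my ($record) = @_;
    my %info;
    ($info{accession}) = $record =~ /^ACCESSION\s+(\S+)/m;
    if ($record =~ /^\s+CDS\s+(\d+)\.\.(\d+)/m) {
        @info{qw(cds_start cds_end)} = ($1, $2);
    }
    return %info;
}

# A few lines taken from the record on the slide.
my $record = join "\n",
    'LOCUS       AK091721     2234 bp    mRNA    linear   PRI 20-JAN-2006',
    'ACCESSION   AK091721',
    'FEATURES             Location/Qualifiers',
    '     CDS             529..1995',
    '';

my %info = genbank_summary($record);
my $cds_length = $info{cds_end} - $info{cds_start} + 1;
print "Accession: $info{accession}\n";
print "CDS: $info{cds_start}..$info{cds_end} ($cds_length bp)\n";   # 1467 bp
```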


Why Perl?

• Widely used in bioinformatics
  • Bioperl: http://www.bioperl.org/wiki/Main_Page
• Ease of programming
  • Excellent pattern matching features
  • Good for gluing other programs together
  • Easy to learn (enough to get started)
• Rapid prototyping
  • Few lines of code needed for many problems
  • One-liners
• Portability
  • Runs on Unix, Windows, Macs
• Open source culture
  • Many sources of help (try: % perldoc perldoc)
    % perldoc -f print
    http://perldoc.perl.org/index-tutorials.html
  • Many sources of useful modules (http://www.cpan.org/)


Variables

The types of Perl variables are indicated by the initial symbol:

$var stores a scalar (a single string or number)
    $x = 10;
    $s = "ATTGCGT";
    $x = 3.1417;

@var stores an array (a list of values)
    @a = (10, 20, 30);
    @a = (100, $x, "Jones", $s);
    print "@a\n";    # prints "100 3.1417 Jones ATTGCGT"

%var stores a hash (associative array)
    %ages = ( John => 30, Mary => 22, Lakshmi => 27 );
    print $ages{"Mary"}, "\n";    # prints 22
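
A hash is initialized from a parenthesized list of key/value pairs and indexed with curly braces. The sketch below reuses the names and ages from the slide and adds a loop over the keys, which goes slightly beyond what the slide shows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parentheses build the list of key/value pairs stored in the hash.
my %ages = ( John => 30, Mary => 22, Lakshmi => 27 );

# A single element is a scalar, so it is written with $ and curly braces.
print "Mary is $ages{Mary}\n";

# keys returns the keys in no particular order, so sort them for display.
for my $name (sort keys %ages) {
    print "$name => $ages{$name}\n";
}

print "Number of entries: ", scalar(keys %ages), "\n";   # 3
```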


Declaring Variables

use strict;

• Putting use strict; at the top of your programs will tell perl to slap your hands with a fatal error whenever you break certain rules.
  • Requires us to declare all variables
  • Avoids creating variables by typos
• Variables may be declared using my, our or local
  • For now, we only need to use my:

    my $a;              # value of $a is undef
    my ($a, $b, $c);    # $a, $b, $c are all undef
    my @array;          # value of @array is ()

• Can combine declaration and initialization:

    my @array = qw/A list of words/;
    my $a = "A string";
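
To see what use strict; protects against, the sketch below misspells a declared variable. Normally this error stops the program at compile time; here the misspelled line is wrapped in a string eval (a device used only for this demonstration) so the script can keep running and print the message. One caveat worth knowing: the names $a and $b are special to Perl's sort function and are exempt from this check, so the example uses a longer name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = 'ATGCCC';

# The misspelled name $dan is compiled inside a string eval so that this
# script still runs and can show the error that strict produces.
my $ok  = eval q{ my $copy = $dan; 1 };
my $err = $@;

print "strict rejected the code: $err" unless $ok;
print "The declared variable still holds: $dna\n";
```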


How Things Can Go Wrong

http://www.perlmonks.org/?node_id=269642

Come back and examine this after we have discussed references.


Scalar and List Context

All operations in Perl are evaluated in either scalar or list context, and may behave differently depending on context

    @array = ('one', 'two', 'three');
    $a = @array;      # scalar context for assignment, returns size
    print $a;         # prints 3

    ($a) = @array;    # list context for assignment
    print $a;         # prints 'one'

    ($a, $b) = @array;
    print "$a, $b";   # prints 'one, two'
    ($a, $b, $c, $d) = @array;   # $d is undefined

In computer science a list is an ordered collection of values
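
Context also affects built-in functions, not just assignment. As a further illustration (not taken from the slides), the sketch below calls reverse in both contexts: in list context it reverses the order of the elements, while in scalar context it concatenates them and reverses the characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = ('one', 'two', 'three');

my $count   = @array;     # scalar context: number of elements
my ($first) = @array;     # list context: first element
print "Count: $count, first: $first\n";    # Count: 3, first: one

# The same function behaves differently in the two contexts.
my @rev_list = reverse('ATG', 'CCC');      # list context: ('CCC', 'ATG')
my $rev_str  = reverse('ATG', 'CCC');      # scalar context: 'CCCGTA'
print "List context:   @rev_list\n";
print "Scalar context: $rev_str\n";
```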


String Operations

Ways to concatenate strings

    $DNA1 = "ATG";
    $DNA2 = "CCC";
    $DNA3 = $DNA1 . $DNA2;    # concatenation operator
    $DNA3 = "$DNA1$DNA2";     # string interpolation
    print "$DNA3";            # prints ATGCCC

    $DNA3 = '$DNA1$DNA2';     # no string interpolation
    print "$DNA3";            # prints $DNA1$DNA2
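
Two related idioms often appear alongside the concatenation operator: the appending form .= and the built-in join. The sketch below is an illustration with made-up fragments; the variable names are choices made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fragments = ('ATG', 'CCC', 'TAA');

# Build a string piece by piece with the appending operator.
my $dna = '';
$dna .= $_ for @fragments;

# join concatenates a list, placing a separator between the elements.
my $joined   = join('', @fragments);
my $labelled = join('-', @fragments);

print "$dna\n";                            # ATGCCCTAA
print "$labelled\n";                       # ATG-CCC-TAA
print "Length: ", length($dna), "\n";      # Length: 9
```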


Arrays

An array stores an ordered list of scalars:

    @gene_array = ('EGF1', 'TFEC', 'CFTR', 'LOC1691');
    print "@gene_array\n";

Output:
    EGF1 TFEC CFTR LOC1691

# there's more than one way to do it (see previous slide on declaring variables)

    @gene_array = qw/EGF1 TFEC CFTR LOC1691/;

http://www.perlmeme.org/howtos/perlfunc/qw_function.html


Arrays

An array stores an ordered list of scalars:

    @a = ('one', 'two', 'three', 'four');

The array is indexed by integers starting with 0:

    print "$a[1] $a[0] $a[3]\n";

prints:

    two one four

Notice: $a[i] is a scalar since we used the $ method of addressing the variable
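
A few more array operations are sketched below, using the gene names from the earlier slide. The extra name BRCA1 is added only for illustration, and the functions push and shift go slightly beyond what the slides cover.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @genes = qw/EGF1 TFEC CFTR LOC1691/;

# Negative indices count from the end of the array.
print "First: $genes[0], last: $genes[-1]\n";        # First: EGF1, last: LOC1691
print "Number of genes: ", scalar(@genes), "\n";     # Number of genes: 4

push @genes, 'BRCA1';          # add an element at the end
my $removed = shift @genes;    # remove and return the first element

print "Removed $removed; remaining: @genes\n";
```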


Unix Commands I

cat --- for creating and displaying short files
chmod --- change permissions
cd --- change directory
cp --- for copying files
date --- display date
echo --- echo argument
ftp --- connect to a remote machine to download or upload files
grep --- search file
head --- display first part of file
ls --- see what files you have
lpr --- standard print command
more --- use to read files
mkdir --- create directory
mv --- for moving and renaming files


Unix Commands II

pwd --- find out what directory you are in
rm --- remove a file
rmdir --- remove directory
setenv --- set an environment variable
sort --- sort file
tail --- display last part of file
tar --- create an archive, add or extract files
ssh --- log in to another machine
wc --- count characters, words, lines

This site has a nice reference card:
http://www.digilife.be/quickreferences/QRC/UNIX%20commands%20reference%20card.pdf


chmod and tar

• chmod
  • There is a nice tutorial here: http://www.perlfect.com/articles/chmod.shtml
• tar
  • There is a nice tutorial here: http://www.apl.jhu.edu/Misc/Unix-info/tar/tar_2.html


Running perl on binf.gmu.edu

% ssh binf.gmu.edu
Password: ******

-- Create binf634 directory (don't type stuff in red)
% mkdir binf634
% cd binf634
% ls

-- Copy a file to current directory
-- (the "." means "current directory")
% cp ~jsolka/public_html/fall13/binf634/bookcode/examples/example4-1.pl .
% ls
% ls -l
% l


Running perl on binf.gmu.edu

% cat example4-1.pl
#!/usr/bin/perl -w
# Example 4-1 Storing DNA in a variable, and printing it out

# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Next, we print the DNA onto the screen
print $DNA;

# Finally, we'll specifically tell the program to exit.
exit;

-- Changing permissions
% chmod 755 example4-1.pl

-- Running a perl script
% example4-1.pl


Editing a Perl Script

-- Read the Emacs or vi tutorial.
-- Make a copy and edit the copy
% cp example4-1.pl first.pl
% l
% e first.pl
-- 1. Change 'print $DNA;' to 'print $DNA, "\n";'
-- 2. Now add a comment:
# Author: your name
% cat first.pl
#!/usr/bin/perl -w
# Author: Jeff Solka
# Example 4-1 Storing DNA in a variable, and printing it out

# First we store the DNA in a variable called $DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Next, we print the DNA onto the screen
print $DNA, "\n";

# Finally, we'll specifically tell the program to exit.
exit;


For Next Week

• Read Tisdall chapters 1-5.
  • Be ready to ask questions
  • Be ready to answer questions
• HW 1: Write programs as described in the following exercises from "Beginning Perl for Bioinformatics" by Tisdall: 4.3, 4.4, 4.5, 5.2, 5.4 and 5.6
  • For each exercise, create a perl script called exX.Y.pl, for example, ex4.3.pl for the first exercise.
  • Email me the assignments at jlsolka@gmail.com
  • Use the following format: Binf634.initialoffirstname.lastname.ex.4.3
• No class next week because of Labor Day


Some of the Details


Alternative Development Environments


What is Eclipse?

Eclipse is a multi-language software development platform comprising an IDE and a plug-in system to extend it. It is written primarily in Java and is used to develop applications in this language and, by means of the various plug-ins, in other languages as well: C/C++, Cobol, Python, Perl, PHP and more.

The initial codebase originated from VisualAge.[1] In its default form it is meant for Java developers, consisting of the Java Development Tools (JDT). Users can extend its capabilities by installing plug-ins written for the Eclipse software framework, such as development toolkits for other programming languages, and can write and contribute their own plug-in modules. Language packs provide translations into over a dozen natural languages.[2]

Released under the terms of the Eclipse Public License, Eclipse is free and open source software.

http://en.wikipedia.org/wiki/Eclipse_(software)


What Operating Systems Does Eclipse Run Under?

• Linux
• Mac OS X
• Windows
  • XP
  • Vista


Languages Supported by the Eclipse IDE

• Java
  • Out of the box
• Perl
  • Via the EPIC library
  • Note that one must also have a Perl interpreter installed
• Python
  • Via the PyDev library
  • Note that one must also have a Python interpreter installed


Advantages and Disadvantages of the Eclipse Development Environment

• Advantages
  • Support for a plethora of languages
  • Industrial strength
    • Used by many professional software developers
    • Has support for configuration management
• Disadvantages
  • Can be slow when developing in languages other than Java (may be mere anecdotal evidence)


Installing Eclipse Under Windows XP - I

• First make sure that you have a Java Runtime Environment installed

  Microsoft Windows XP [Version 5.1.2600]
  (C) Copyright 1985-2001 Microsoft Corp.

  C:\Documents and Settings\Owner>java -version
  java version "1.5.0_05"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_05-b05)
  Java HotSpot(TM) Client VM (build 1.5.0_05-b05, mixed mode)

  C:\Documents and Settings\Owner>

• If you don't have a JRE installed go to
  http://www.oracle.com/technetwork/java/javase/downloads/java-se-jre-7-download-432155.html


Installing Eclipse Under Windows XP - II

• Obtain the Eclipse zipped file from the Eclipse downloads link at http://www.eclipse.org/downloads/
  • I believe that I chose this one: Eclipse IDE for Java Developers (85 MB)
  • I think that the current version of Eclipse is 4.3
• Unzip it into an eclipse folder under your Windows Program Files directory
  • In my case here: C:\Program Files\eclipse
• Note that Eclipse does not modify your system's registry


Installing Eclipse Under Windows XP - III

• Once installed (unzipped)
  • Double click on the eclipse.exe icon
  • There is a "hello world" Java tutorial
• There are a number of other tutorials
  • Eclipse3-1.pdf (http://www.cs.umanitoba.ca/~eclipse/Eclipse3-1.pdf)


Downloading ActiveState's ActivePerl

• Go here and click on the Windows download link: http://www.activestate.com/activeperl/
  • I previously set this up using version 5.10
• Use this self-extracting binary to install the program
  • This takes a long time (30 minutes or more, go enjoy your favorite beverage)


Installing the Eclipse EPIC Library

• This is my synopsis of this EPIC webpage tutorial: http://www.epic-ide.org/download.php
• This is also a helpful site: http://www.epic-ide.org/faq.php
• Under Eclipse use the Help -> Software Updates tab
• Switch to the Available Software tab
• Choose Add Site and enter http://e-p-i-c.sf.net/updates
• Tick the newly created site and click the Install button


Creating Your First PERL Program Under the Eclipse IDE - I

• Under Eclipse go to Window -> Open Perspective -> Other
  • Choose PERL
• Under Eclipse go to Window -> Preferences
  • Click on the PERL + and enter the full path to the ActiveState PERL executable
  • In my case it is "C:\Perl\bin\perl5.10.0.exe"


Creating Your First PERL Program Under the Eclipse IDE - II

• Click on File -> New PERL Project
  • Call it something like HelloWorld
• Click on File -> New PERL File
  • Call it something like HelloWorldPerl
  • Left click on this file symbol and make sure its extension is .pl (now it should have a camel symbol)
• Enter your code

    print "Hello from ActivePerl!\n";

• Now you should be able to choose Run from the top menu, or left click on the program symbol and choose Run As Perl Local
• If all goes well a console window with the output

    Hello from ActivePerl!

  should show up


Debugging With Eclipse and PERL

• The Perl PPM package PadWalker has to be installed before you can debug your PERL programs under Eclipse
• Follow the steps on the next two slides to install PadWalker within ActiveState PERL


First Find the Package (PadWalker)

Find a package. To find a package in the repository:
• Click the All packages button
• Enter text from the package's name or abstract in the Filter field
  • As text is entered in the Filter field, the list of packages is automatically updated as the substring match becomes more precise. Click the magnifying glass icon to filter on different meta-data (e.g. Author).
• Alternatively, just start typing the name of the package. The Package List will highlight the first package that matches the string you have typed.


Next Install the Package (PadWalker)

Install a package. To install a package from the repository:
• Click on the desired package in the Package List to select it.
• Mark the package by:
  • Clicking the Mark for install button or,
  • Hitting the "+" key or,
  • Selecting Install <package-name> from the Action menu or,
  • Right-clicking the selection and choosing Install <package-name> from the context menu.
• Click the Run marked actions button or select Run Marked Actions (Ctrl-Enter) from the File menu.
• In my case I installed PadWalker 1.7


Installing PadWalker Via ppm

• There are other interesting discussions here, but they seem to have been somewhat superseded by the GUI-based ActiveState PERL ppm interface
  http://trouchelle.com/perl/ppmrepview.pl


Editors


http://www.viemu.com/vi-vim-cheat-sheet.gif


http://refcards.com/docs/gildeas/gnu-emacs/emacs-refcard-a4.pdf