Perl Part I: A Biology Primer



Conceptual Biology

H. sapiens did not create the genetic code – but they did invent the transistor

Biological life is not optimized – the modern synthesis

Nature vs. Nurture

What are the best ways to understand the important differences that make the difference?

A Molecular Primer

Hierarchy of the eukaryote
• Organism > System > Organ > Tissue > Cell > Organelle > Protein > RNA > DNA

Put Simply: DNA → RNA → Protein

The Building Blocks

DNA is composed of four building blocks
• Nucleic acids, nucleotides, bases
• Adenine, Cytosine, Guanine, Thymine

RNA also has four building blocks
• Adenine, Cytosine, Guanine, Uracil

Proteins are composed of 20 building blocks
• Amino acids, residues
• Fragments of proteins are called peptides

DNA, RNA and Proteins are polymers

Code  Nucleic Acid(s)  w/ Sugar    w/ P
A     Adenine          Adenosine   Adenylic Acid
C     Cytosine         Cytidine    Cytidylic Acid
G     Guanine          Guanosine   Guanylic Acid
T     Thymine          Thymidine   Thymidylic Acid
U     Uracil           Uridine     Uridylic Acid

Code  Nucleic Acid(s)
M     A or C (amino)
R     A or G (purine)
W     A or T (weak)
S     C or G (strong)
Y     C or T (pyrimidine)
K     G or T (keto)
V     A or C or G
H     A or C or T
D     A or G or T
B     C or G or T
N     A, G, C, T (any)
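The ambiguity codes in the table can be stored in a Perl hash and used to build a search pattern. This is only a minimal sketch: the hash name `%iupac` and the helper `iupac_to_pattern` are made up for illustration and are not part of the lecture.

```perl
use strict;
use warnings;

# Nucleotide codes from the table, mapped to the bases they stand for
my %iupac = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',   U => 'U',
    M => 'AC',  R => 'AG',  W => 'AT',  S => 'CG',  Y => 'CT',  K => 'GT',
    V => 'ACG', H => 'ACT', D => 'AGT', B => 'CGT', N => 'ACGT',
);

# Hypothetical helper: turn a sequence containing ambiguity codes into a
# regular-expression pattern, e.g. 'AMT' becomes 'A[AC]T'
sub iupac_to_pattern {
    my ($seq) = @_;
    my $pattern = '';
    foreach my $code (split //, uc $seq) {
        die "Unknown code: $code\n" unless exists $iupac{$code};
        my $bases = $iupac{$code};
        $pattern .= length($bases) > 1 ? "[$bases]" : $bases;
    }
    return $pattern;
}

my $pattern = iupac_to_pattern('GAYNTC');
print "$pattern\n";
print "GATATC matches\n" if 'GATATC' =~ /^$pattern$/;
```

A hash is a natural fit here because each one-letter code is looked up directly, with no need to scan a list.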


DNA        RNA
A = T  →  A
C = G  →  C
G = C  →  G
C = G  →  C
T = A  →  U
T = A  →  U
M = K  →  M
W = W  →  ?
N = N  →  N
C = G  →  C
C = G  →  C
T = A  →  U
Y = R  →  ?
B = V  →  ?
N = N  →  N
K = M  →  ?
S = S  →  S
T = A  →  U
T = A  →  U
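The base pairing and the DNA → RNA step in the figure can be sketched with Perl's `tr` operator, which replaces characters one for one. The sub names `complement` and `to_rna` are illustrative, and the sketch assumes uppercase input; it also fills in the rows marked "?", so try those by hand first.

```perl
use strict;
use warnings;

# Pair every base (or ambiguity code) with its partner on the other strand
sub complement {
    my ($dna) = @_;
    (my $comp = $dna) =~ tr/ACGTMRWSYKVHDBN/TGCAKYWSRMBDHVN/;
    return $comp;
}

# Write a DNA strand as RNA by replacing every T with U
sub to_rna {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/T/U/;
    return $rna;
}

my $strand = 'ACGCTTMWN';    # the first nine rows of the figure, read top to bottom
print "Strand:     $strand\n";
print "Complement: ", complement($strand), "\n";
print "RNA:        ", to_rna($strand), "\n";
```

Because `tr` maps each character independently, complementing twice returns the original strand.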


•One Dimensional

•Two Dimensional

•Three Dimensional


DNA        RNA
A = T  →  A
T = A  →  U
G = C  →  G
C = G  →  C
T = A  →  U
T = A  →  U
M = K  →  M
W = W  →  ?
N = N  →  N
C = G  →  C
C = G  →  C
T = A  →  U
Y = R  →  ?
B = V  →  ?
N = N  →  N
K = M  →  ?
S = S  →  S
T = A  →  U
T = A  →  U


One-Letter Code  Amino Acid                  Three-Letter Code
A                Alanine                     Ala
C                Cysteine                    Cys
D                Aspartic acid               Asp
E                Glutamic acid               Glu
F                Phenylalanine               Phe
G                Glycine                     Gly
H                Histidine                   His
I                Isoleucine                  Ile
K                Lysine                      Lys
L                Leucine                     Leu
M                Methionine                  Met
N                Asparagine                  Asn
P                Proline                     Pro
Q                Glutamine                   Gln
R                Arginine                    Arg
S                Serine                      Ser
T                Threonine                   Thr
V                Valine                      Val
W                Tryptophan                  Trp
X                Unknown                     Xxx
Y                Tyrosine                    Tyr
Z                Glutamic acid or Glutamine  Glx
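As a small illustration, the amino acid table can be held in a hash to convert one-letter codes to three-letter codes. The names `%three_letter` and `to_three_letter` are hypothetical, and alanine (A, Ala) is included because it is one of the 20 standard amino acids.

```perl
use strict;
use warnings;

# One-letter code => three-letter code
my %three_letter = (
    A => 'Ala', C => 'Cys', D => 'Asp', E => 'Glu', F => 'Phe',
    G => 'Gly', H => 'His', I => 'Ile', K => 'Lys', L => 'Leu',
    M => 'Met', N => 'Asn', P => 'Pro', Q => 'Gln', R => 'Arg',
    S => 'Ser', T => 'Thr', V => 'Val', W => 'Trp', Y => 'Tyr',
    X => 'Xxx', Z => 'Glx',
);

# Hypothetical helper: spell out a peptide, using Xxx for any unknown letter
sub to_three_letter {
    my ($peptide) = @_;
    my @names = map { exists $three_letter{$_} ? $three_letter{$_} : 'Xxx' }
                split //, uc $peptide;
    return join '-', @names;
}

print to_three_letter('MLP'), "\n";
```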


Reading frame 1 (codons start at the first base)

AUG -> Met (Start)
CUU -> Leu
AA?, AU?, CA?, CU? -> Asn, Lys, Ile, Met, His, Gln, Leu
CCU -> Pro
UU?, UG?, UC?, CU?, CG?, CC? -> Phe, Leu, Cys, Stop, Trp, Ser, Leu, Arg, Pro
UCU, UGU, GCU, GGU -> Ser, Cys, Ala, Gly


Reading frame 2 (codons start at the second base)

UGC -> Cys
UUC, UUA -> Phe, Leu
A?C, U?C -> Ile, Thr, Asn, Ser, Phe, Ser, Tyr, Cys
CUC, CUU -> Leu
U?U, U?G, C?U, C?G -> Phe, Ser, Tyr, Cys, Leu, Stop, Trp, Leu, Pro, His, Arg, Gln
GUU, CUU -> Val, Leu
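The codon look-ups above can be checked with a short script. This is a sketch under stated assumptions: it uses the standard genetic code, written in the common compact layout (bases ordered U, C, A, G at each codon position), and the helper `possible_amino_acids` is a made-up name. It returns one-letter codes, with `*` standing for Stop.

```perl
use strict;
use warnings;

# Build the standard codon table from the compact 64-letter layout
my @bases  = qw(U C A G);
my $aminos = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
foreach my $b1 (@bases) {
    foreach my $b2 (@bases) {
        foreach my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aminos, $i++, 1);
        }
    }
}

# RNA ambiguity codes and the bases they can stand for
my %iupac = (
    A => 'A',   C => 'C',   G => 'G',   U => 'U',
    M => 'AC',  R => 'AG',  W => 'AU',  S => 'CG',  Y => 'CU',  K => 'GU',
    V => 'ACG', H => 'ACU', D => 'AGU', B => 'CGU', N => 'ACGU',
);

# Hypothetical helper: every amino acid an ambiguous RNA codon could encode.
# Assumes a three-letter codon made only of the codes in %iupac.
sub possible_amino_acids {
    my ($codon) = @_;
    my ($p1, $p2, $p3) = map { $iupac{$_} } split //, uc $codon;
    my %seen;
    foreach my $b1 (split //, $p1) {
        foreach my $b2 (split //, $p2) {
            foreach my $b3 (split //, $p3) {
                $seen{ $codon_table{"$b1$b2$b3"} } = 1;
            }
        }
    }
    return join '', sort keys %seen;
}

foreach my $codon (qw(AUG CUU MWN CCU YBN KSU)) {
    print "$codon -> ", possible_amino_acids($codon), "\n";
}
```

Collecting the results in a hash removes duplicates, so an amino acid reachable through several codons (such as Leu) is listed only once.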

DNA

RNA

Protein

Lecture II

Part II: One-Dimensional Strings

Hello World…

• A few perls of wisdom
• Concatenating Sequences
• Making a reverse complement
• Read sequences from data files

Every journey starts with a first 10bp

#!/usr/bin/perl -w
#Storing DNA in a variable, and printing it out

#First, store the DNA in a variable called $DNA
$DNA = 'CGGGCTATTC';

#Next, print the DNA onto the screen
print $DNA;

#Finally, explicitly tell the program to end
exit;


Concatenating DNA Fragments

#!/usr/bin/perl -w
#Store DNA in 2 variables
$DNA1 = 'AGTGCGTCGCTAG';
$DNA2 = 'ACCGCATGCATTG';

#Concatenate using string interpolation
$DNA3 = "$DNA1$DNA2";
print "$DNA3\n\n";

#Concatenate using the dot operator
$DNA3 = $DNA1 . $DNA2;
print "$DNA3\n\n";

#Or simply print the two fragments one after the other
print $DNA1, $DNA2, "\n";

exit;
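When there are more than two fragments, the same idea extends to `join` and the `.=` operator. This sketch reuses the two fragments from the slide; the variable names are illustrative.

```perl
use strict;
use warnings;

my @fragments = ('AGTGCGTCGCTAG', 'ACCGCATGCATTG');

# join with an empty separator glues all fragments together in one step
my $joined = join '', @fragments;

# .= appends to an existing string, one fragment at a time
my $built = '';
$built .= $_ foreach @fragments;

print "$joined\n";
print "Length: ", length($joined), " bp\n";
```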

Transcription: DNA to RNA

#!/usr/bin/perl -w
$DNA = 'ACGACTGCACGATCGTACG';

#Print the DNA onto the screen
print "$DNA\n\n";

#Transcribe the DNA->RNA by substituting all T's with U's
$RNA = $DNA;
$RNA =~ s/T/U/g;

#Print the result to the screen
print "Here is the result of DNA->RNA:\t$RNA\n\n";

exit;

$RNA =~ s/T/U/g;

$RNA   Variable
=~     Binding operator
s      Substitute operator
/      Delimiters to separate the parts of the operator
T      Pattern to be replaced
U      Replacement text for the pattern
g      Pattern modifier

Pattern modifiers
g = globally
i = case insensitive
m = multiline
s = single line
x = permit comments
o = compile only once for speed
e = treat replacement as Perl code
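A few of these modifiers in action, as a small sketch using the DNA string from the transcription slide. The variable names are illustrative. Note that `s///` returns the number of substitutions it made, which makes it handy for counting.

```perl
use strict;
use warnings;

my $dna = 'ACGACTGCACGATCGTACG';

# With g: every T is replaced, and the return value is the number replaced
my $rna   = $dna;
my $count = ($rna =~ s/T/U/g);

# Without g: only the first T is replaced
my $first_only = $dna;
$first_only =~ s/T/U/;

# With i: the pattern matches both t and T
my $mixed = 'acgtACGT';
$mixed =~ s/t/U/gi;

print "$rna ($count substitutions)\n";
print "$first_only\n";
print "$mixed\n";
```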

Calculating the Reverse Complement

#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';

#Print the starting DNA onto the screen
print "Here is the starting DNA:\t$DNA\n\n";

#Calculate the reverse complement: first reverse the DNA into
#a new variable called $revcom
$revcom = reverse $DNA;

#Substitute all bases by their complement
#(Caution: these substitutions run one after another, so each one
#also changes bases produced by the one before it. Every T made from
#an A is turned back into A, and every G made from a C back into C.)
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/C/G/g;
$revcom =~ s/G/C/g;

print "$revcom\n";

Calculating the Reverse Complement

#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';

#Print the starting DNA onto the screen
print "Here is the starting DNA:\t$DNA\n\n";

#Calculate the reverse complement: first reverse the DNA into
#a new variable called $revcom
$revcom = reverse $DNA;

#Translate every base to its complement in a single pass
$revcom =~ tr/ACGTacgt/TGCAtgca/;

print "$revcom\n";
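To see why `tr` is the right tool here, the two approaches can be compared side by side. The sub names below are made up for this sketch. The sequential version applies four `s///g` steps in a row, so later steps undo earlier ones; `tr` maps every character exactly once.

```perl
use strict;
use warnings;

# Reverse complement using tr: each base is translated exactly once
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# The sequential-substitution attempt, kept only to show how it goes wrong
sub revcom_sequential {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ s/A/T/g;    # all A become T ...
    $rc =~ s/T/A/g;    # ... then all T (old and new) become A
    $rc =~ s/C/G/g;    # all C become G ...
    $rc =~ s/G/C/g;    # ... then all G (old and new) become C
    return $rc;
}

my $dna = 'ACGTCAGTCGAGCT';
print "tr version:         ", revcom($dna), "\n";
print "sequential version: ", revcom_sequential($dna), "\n";
```

A quick sanity check for any reverse-complement routine is that applying it twice gives back the original sequence.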

Reading Data from Files

#### Sample Data in FASTA Format ####

>NM_012345 | Sample Data | Muppet Stuffing Protein

MNIDDKLEFGDEMGOSSRTMV

FGDLVRSMPHOEILAADEVLISHEE

GLOYAKLEFGDEMGOGHDDEFGVY

Reading Files

#!/usr/bin/perl -w
#The name of the file containing the sequence data
$proteinFilename = 'NM_012345.pep';

#Open the file, and associate a 'filehandle' with it
open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!";

#Read from the file with the input operator <>
#(assigned to a scalar, this reads only the first line)
$muppetProtein = <PROTEINFILE>;

#Print what was read
print "Here is the protein:\t$muppetProtein\n\n";

exit;


Let's try this again …

#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';
open(PROTEINFILE, $proteinFilename);

#Each read from the filehandle returns the next line
$muppetProtein = <PROTEINFILE>;
print "Here is the first line:\t$muppetProtein\n\n";

$muppetProtein = <PROTEINFILE>;
print "Here is the second line:\t$muppetProtein\n\n";

$muppetProtein = <PROTEINFILE>;
print "Here is the third line:\t$muppetProtein\n\n";

close PROTEINFILE;
exit;

Using Arrays to Read Files

#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';

#Open the file
open(PROTEINFILE, $proteinFilename);

#Read the sequence data from the file, and store it in the array
#variable @protein (one line per element)
@protein = <PROTEINFILE>;

#Print the protein onto the screen
print @protein;

close PROTEINFILE;
exit;
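Reading line by line also makes it possible to separate the FASTA header from the sequence and to join the sequence lines into one string. The following is a minimal sketch, not a full FASTA parser: `read_fasta` is a hypothetical helper that handles a single record, and the demo opens a filehandle on an in-memory string (holding the sample data shown earlier) so it runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: read one FASTA record from an open filehandle and
# return the header (without '>') and the sequence as a single string
sub read_fasta {
    my ($fh) = @_;
    my ($header, $sequence) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;                      # remove the trailing newline
        next if $line =~ /^\s*$/;         # skip blank lines
        if ($line =~ /^>(.*)/) {
            $header = $1;                 # header line starts with '>'
        } else {
            $sequence .= $line;           # append each sequence line
        }
    }
    return ($header, $sequence);
}

# Demo data: the sample record, held in a string instead of a file
my $data = ">NM_012345 | Sample Data | Muppet Stuffing Protein\n"
         . "MNIDDKLEFGDEMGOSSRTMV\n"
         . "FGDLVRSMPHOEILAADEVLISHEE\n"
         . "GLOYAKLEFGDEMGOGHDDEFGVY\n";

open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";
my ($header, $protein) = read_fasta($fh);
close $fh;

print "Header:  $header\n";
print "Protein: $protein\n";
```

To read a real file instead, open it with `open(my $fh, '<', $proteinFilename)` and pass `$fh` to the same helper.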

Arrays

#Here's one way to declare an array
@bases = ('A','C','G','T');

#Now print each element of the array
print "\nFirst element: " , $bases[0];
print "\nSecond element: " , $bases[1];
print "\nThird element: " , $bases[2];
print "\nFourth element: " , $bases[3];

Arrays

#Here's one way to declare an array
@bases = ('A','C','G','T');

#Now print each element of the array in a row
print "\nHere are all of the bases: " , @bases;
#This prints out: 'Here are all of the bases: ACGT'

#But, you can print them out with spaces in between
#by interpolating the array inside double quotes
print "\nHere they are with spaces: " , "@bases";

Arrays

#Here's one way to declare an array
@bases = ('A','C','G','T');

#Here's how to take an element off of the end
$base1 = pop @bases;
print "Here's the last element: ", $base1, "\n\n";

#The other elements still remain
print "\nHere are the remaining elements: " , "@bases";

Arrays

#Here's one way to declare an array
@bases = ('A','C','G','T');

#Here's how to take an element off of the front
$base2 = shift @bases;
print "Here's the first element: ", $base2, "\n\n";

#The other elements still remain
print "\nHere are the remaining elements: " , "@bases";

Arrays

#Here's one way to declare an array
@bases = ('A','C','G','T');

#Here's how you put an element at the beginning of an array
#Our example will put the last element at the beginning
$base1 = pop @bases;
unshift (@bases, $base1);
print "Here's the last element put first: " , "@bases\n\n";

Arrays

#Here's one way to declare an array
@bases = ('A','C','G','T');

#Here's how you put an element at the end of an array
#Our example will put the first element at the end
$base1 = shift @bases;
push (@bases, $base1);
print "Here's the first element put last: " , "@bases\n\n";
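The four operations pair up naturally: `shift` with `push` rotates an array to the left, and `pop` with `unshift` rotates it to the right. A short sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Rotate left: take the first element off the front and add it to the end
push @bases, shift @bases;
my @rotated_left = @bases;
print "Rotated left:  @rotated_left\n";

# Rotate right: take the last element off the end and add it to the front
unshift @bases, pop @bases;
print "Rotated right: @bases\n";
```

After one left rotation followed by one right rotation the array is back in its original order.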

Arrays

#Here's one way to declare an array
@bases = ('A','C','G','T');

#Here's how to reverse an array
@reverse = reverse @bases;

#Here's how to get the length
print scalar(@bases), "\n\n";

#Here's how to insert an element at an arbitrary place
#(at index 2, removing 0 elements)
splice (@bases, 2, 0, 'X');
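`splice` can remove elements as well as insert them: its arguments are the array, the starting index, the number of elements to remove, and an optional list to put in their place. A minimal sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Insert 'X' at index 2 without removing anything
splice(@bases, 2, 0, 'X');
my $after_insert = "@bases";
my $count        = scalar @bases;

# Remove one element at index 2; splice returns what it removed
my @removed = splice(@bases, 2, 1);

# reverse returns a new list and leaves @bases unchanged
my @reversed = reverse @bases;

print "After insert: $after_insert ($count elements)\n";
print "Removed:      @removed\n";
print "Restored:     @bases\n";
print "Reversed:     @reversed\n";
```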

Arrays

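#Arrays can be evaluated as lists and scalars
@bases = ('A','C','G','T');

#Here's how to print the array
print "@bases\n";

#Here's how to assign it to a scalar (scalar context gives
#the number of elements, so this prints 4)
$a = @bases; print $a;

#Here's how to assign an array to a list (list context copies
#elements, so $a gets the first element and this prints A)
($a) = @bases; print $a;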
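The difference between scalar and list context is easy to overlook, because the only visible change is a pair of parentheses. The sketch below uses descriptive variable names rather than `$a`, since `$a` and `$b` are also the special variables that Perl's `sort` uses.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Scalar context: the array evaluates to its number of elements
my $count = @bases;

# List context: elements are copied in order; extras are dropped
my ($first)           = @bases;
my ($base1, $base2)   = @bases;

# $#bases is the index of the last element (one less than the count)
my $last_index = $#bases;

print "Count: $count\n";
print "First: $first\n";
print "First two: $base1 $base2\n";
print "Last index: $last_index, last element: $bases[$last_index]\n";
```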