
Proteins Determine Function


Page 1: Proteins Determine Function

Proteins Determine Function

• Proteins make us tick
• Problems occur when
  – Proteins are missing
  – Proteins are malfunctioning
  – Proteins are present that should not be there
• Controlling disease
  – Understanding protein function
  – Stopping protein function
  – Supplying missing/desired protein function

04/21/23 1 Sequence Alignment

Page 2: Proteins Determine Function

The Central Dogma


Page 3: Proteins Determine Function

The Human Genome

• The entire collection of our DNA
  – Consists of about 3.5 × 10^9 base pairs
• Our genome is split across chromosomes
  – Makes packaging easier/more efficient

Page 4: Proteins Determine Function

Human Genome Project

• Began in 1990, a 13-year effort coordinated by the DOE and the NIH. Project goals included:
  – identify all the approximately 30,000 genes in human DNA,
  – determine the sequences of the 3 billion chemical base pairs that make up human DNA,
  – store this information in databases,
  – improve tools for data analysis,
  – transfer related technologies to the private sector, and
  – address the ethical, legal, and social issues.
• The Human Genome Project ended in 2003 with the completion of the human genetic sequence.

Page 5: Proteins Determine Function

DNA Sequencing

• Determining the sequence of nucleotides in a strand of DNA
  – 1978 Sanger
    • Determined the DNA sequence for phi-X174
    • About 5.4 × 10^3 bases (5,386 nucleotides)
    • Took about 2 years to sequence
  – 2001 Venter/Celera
    • Sequenced the human genome
    • 3 × 10^9 base pairs
    • Took about 9 months to sequence

Page 6: Proteins Determine Function

Sequence DataTCTAGGAGGGAAGCACCCACCTCCCCTAAGCTCCATCTCCCTGAGCACTCATTTCCCAATGACCATACCAGGTTTTGGCCCTAGAGAGTTTATTACAAAATAAGAAAGAGAAGTCTGGGGAAGGTTCACTCATCATAGAATTTTGGCAGTTCATTGCCCAAGATGACTCGATGGTCCACACCGGCAGCTGTAATAGTGACCAGGTAGATGACACCCCCGCTTGAGCCATCCCGGCTCATGGCCAGAGCAATAGCTGCAGAGGGTTTCAAGTTGGAGAGAGGGAGAGAGAGGATGGCTTAGCTTCAAAAATCTTTTTACTCCCCCTCCATCCATATGCCTACTACCACTTTCACCTCAAAACTCATCTTCCAGGAAGGCATATTTAGTGGTGTGCTGGTAAATCAGTTTTTTTACAAAAAGGCTTCCATATGTGGCATCTGCTGATGTCCGTGGTGTAAATGCTCCCGCTATGATGAATTGCAAGTTACAAATAGCTAAGCAGTTCACAAATCCTTGACTATTTAACAGTCCGCTCTCATGAGTGGTCCCAAGCCAGCCTCAGCACACCTCAGCACACCACTGGTTCTTTTTTTTTTTTTTTTTCTCCAGACAGGGTCTCTCTCTGTCACCTAGGCTGCAGCGCAGTGGTGCAATCACCGCTCACTACAGCCTTGATCTCCCCGGCTCAGATGATCTTTCCACCTCAGCCTCCTGAGTAGCTGGGACTACAGGTGTGCACCACTATGCCCAGTTCATTTTTTTTTTTACTTTTTTTTATTGTTTTTTGTGGAGACAGGGTTTCACCATCTTGCCTAGGCTGGCCTCAAACTCCTGGGCTCAAGTAATCCTCCTGCCTCAGCCTCCCAAATTGTTGGCATTACAGGTGTGAGCCACTGTGCTTAGCACACCACTGGTTCTCACAGTGACTGTGTATCCTCATTTGATTTACTCAGAACAGCCCTGGTTTATCCGTATTGCCCAAGAACCCCATTGAGCTTTGCATTTGTCCTGCCCCTTTTCACTCTTAAAAGTGTACCAGGCCCGGCATTAACTTAAATGGCCACCCCTGTATTTCTCTTCCTGTTCCTCATAATCTACTTCCTTCCCATGTTTCAAAGCCCTCCCCAGGTACCCTTCCACTTGGCTGGTTACCGTCTGTGGTGAAGCGCCTGCACTCCTCGGGAGACATGCCTGGCTTATATGCTGCATCCACATAACCATAGATAAAGGTGCTGCCGGAGCCACCAATGGCAAAAGGCTGTCGAGTCAGCATTCCTCCCAGGGTTCCATATACCTGGGAAAGGGATCCTCAGGTTAAAGAATCATCAAGCCCTTCCTTCCCACTGAGACATTAAGTGGTCTCTGCACCCTGCAATGAAGCCCTGGTATCTCATATCCCCAAAGTACTATGCTTTCAGAGGTAGTGTCCTTGGAACTCATTGCTAGAATGACATAGGACTTCCATCTTCCTCTGCAGGAGAGTGGGGAAGCCCAGAGGAGAGAGTGCTTTGGGAGAAACTCACCTGACCTCCTTCACGTTGGTCCCAGCCAGCTACCATGAGATGTGCAGACAAGTCCTCTCGATATTTATAGCTGATATTTCTCACCACATTTGCAGCAGCCAAAACAAGTGGAGGTTCCTCCAGTTCTATCCTGAGGGAAATATTAGGAATAAAGGTTGATAGAATTTTAAGTCTCATTCTCCTATACTGTTACCATCATCCCTGCTAAACGACCCCTGAAAACTGTAACTGCAATAGCTCAAACTGCAGCCTCCCTCCCACATGTACAGGGGAACCAGAGTCCCACACCACCAACTGGTAAGAAGCTTTCAATTGCTCACTCTTTTGCTCAGCCCCACCCACATAACTTTCTTTTGGCTGCAAGGACCCTGCTCTTATGGGGAAAAGCAGATAAGGTTCACTCGGTTCACCACCGCCTCGCTGTCAGGAGGGAGTCAACAGTCACCAAGTTAAAACTCAGGTTT
TTTTTTTTTTTTTTTTTTTTTGAGACAGTCTCACTCTGTCACCCAGGCTGGAGTGCAGTGGATCAATCTTGGGCTCACTGCAAACTTCGCCTCCCTGGTTCAAGTGATTCTCCTGCCTCAGCCTCCCGAATAGCTGGGATTACAGGCACCCACCACCAAGCCCAGCTAATGTTTGTATTTTCAGTAGAGACAAGGTCCCAACATGTTGGCCAGGCTGGTCTCAAACTCCTGACCTCAAATATCTGCCCACCTCGGCATCCCAAAGTGCTGAGATTATAGATGTGAGCCACTGCACCCAACCAGAACTCAGGAATTTTTGAGGGTGATCATTCAATGTCTCTCAAATTTCTTTGACAAGAGAATAGCATGAAGTTTAATGCTTGGATTAAAGCAGGAGGCAAATAATCATCTCAGATATTATTAATCACTGCAGATGTTAATCAAAATTAGGCTTATTTTTCAGGCTTAGATTTTATAACAAAGCAAAAAATGCTAAGGTAAGAAAAATATGCCTCATCAATTTTCTTTGCTATTAACAATCTTGAGAGAGTTATGTTCTATGGAACATAATGTCAGTAATATTGACCTAACCCCATATACTCATTTTGCATGTGAGGAAATTGGTTAGGAGTGGGAGAAGAGACAAAATAGTTCAATATATGGTAAATGAGAAACCAGGTATCTGCTTGACAGAATCATCTTTTTGATCCCTAAGCACAGATGGAAAGAAGACCCTCAAAAATCTATCTCCTGTCCCCCTCTCAGACCCTATTCCTTTACTCATCCCTGTACACTACTGGGACAGGTCACATACACATTCAGACCCCAGATCCTCCTCCACAAATTCAGAGACCCAAGCACCCACCAAATAGCTTATCATAGTGGCTTTTGGGGAAGGTCAACTCCATTCCTCCAAGGCTCCAGTTTGCCAGTCTTTTCATGAATGGGTAAGGAAAGTGTGTATTTGAGGCCATTAGCTTCTTTCCAAATGCATACATCTTCACTTTTACTCACCCTGCAGACACTCGGGAATCAGAACCCATCACAACGCCCCCGTCAAACTCCACTGCCATGATGGTGGTCTGCAGAGACACAGAATATGGAATGTCAGGGCAAGAACAGCCTTGATGCCCTCATGTTAGAGAAGAAGAAACATTCCCAGAGAGGCGAAGTGACTGGCTCAAAGATTACACAGTAACAGGCCAGAGCTGACTGTCAGTACAGGCTTTTTTTCCCTTCATCTTTCCACTTTCTCTATTGCTTCATCCGGCTGCAGGGGAATGCCACAGCCCAGCTGTGATACAACACAGAAAGAACTGTGTCCCTAAGTTCCAACTTGCCTAGTGGAATCCTCTCCACTGTAGAGAGGTGGAG…….. 19,000 base pairs omitted!!!


Page 7: Proteins Determine Function

Genes

• Definition varies
  – To a geneticist, a gene is the region of a chromosome that confers a particular trait
  – To a molecular biologist, a gene is a sequence of DNA that encodes a protein or RNA and includes all of the relevant regulatory sequences
    • Genes can be turned on/off, up/down
• Not all parts of the DNA sequence are involved with the production of proteins

Page 8: Proteins Determine Function

Gene Structure


Page 9: Proteins Determine Function

Locate Genes

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
 2.00 Prom +   5833   5872   40                             -14.22
 2.01 Init +   6023   6620  598  1  1   57   87   371 0.621  27.34
 2.02 Intr +   7157   7271  115  0  1  122   41    76 0.997   5.81
 2.03 Intr +   7420   7550  131  1  2   83    9   151 0.979   7.04
 2.04 Intr +   8510   8715  206  0  2  123   91    98 0.826  12.52
 2.05 Intr +   9142   9339  198  0  0   69   39   276 0.998  20.35
 2.06 Intr +  10541  10669  129  1  0   89   78   131 0.992  13.09
 2.07 Intr +  10819  11007  189  0  0  137   87   125 0.999  17.08
 2.08 Intr +  11567  11740  174  1  0  101   80   233 0.966  23.94
 2.09 Intr +  11984  12146  163  1  1  103   78   108 0.999  10.85
 2.10 Intr +  12455  12591  137  0  2  101   94   127 0.999  14.89
 2.11 Intr +  13874  14050  177  1  0  113   16   188 0.043  14.22
 2.12 Intr +  16570  16717  148  0  1   98   64   116 0.984  10.01
 2.13 Intr +  16876  16987  112  2  1  103  115   134 0.999  17.04
 2.14 Intr +  17396  17525  130  2  1  100   98   174 0.999  20.30
 2.15 Intr +  17924  18128  205  1  1   81   55   295 0.999  24.27
 2.16 Term +  18612  18700   89  1  2   96   39   148 0.999   8.52
 2.17 PlyA +  18919  18924    6                               1.05

Page 10: Proteins Determine Function

Codons

Alanine, Arginine, Aspartic Acid, Asparagine, Cysteine, Glutamic Acid, Glutamine, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine, Valine
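The step from codons to the twenty amino acids listed above can be sketched in a few lines of Python. This is only an illustration: it assumes the standard genetic code and a DNA coding strand read in frame from its first base, and the names `CODON_TABLE` and `translate` are invented here rather than taken from the slides.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order.
# '*' marks a stop codon. (Assumption: the standard nuclear code.)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding-strand DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCTTCTTCTTAA"))  # MASS
```

Each three-base codon maps to one single-letter amino acid code, so the output uses the same alphabet as the predicted peptide sequences shown in gene-finder output.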

Page 11: Proteins Determine Function

Conversion to Protein


Page 12: Proteins Determine Function

Predict Protein Sequence>18:21:37|GENSCAN_predicted_peptide_2|966_aaMASSRCPAPRGCRCLPGASLAWLGTVLLLLADWVLLRTALPRIFSLLVPTALPLLRVWAVGLSRWAVLWLGACGVLRATVGSKSENAGAQGWLAALKPLAAALGLALPGLALFRELISWGAPGSADSTRLLHWGSHPTAFVVSYAAALPAAALWHKLGSLWVPGGQGGSGNPVRRLLGCLGSETRRLSLFLVLVVLSSLGEMAIPFFTGRLTDWILQDGSADTFTRNLTLMSILTIASAVLEFVGDGIYNNTMGHVHSHLQGEVFGAVLRQETEFFQQNQTGNIMSRVTEDTSTLSDSLSENLSLFLWYLVRGLCLLGIMLWGSVSLTMVTLITLPLLFLLPKKVGKWYQLLEVQVRESLAKSSQVAIEALSAMPTVRSFANEEGEAQKFREKLQEIKTLNQKEAVAYAVNSWTTSISGMLLKVGILYIGGQLVTSGAVSSGNLVTFVLYQMQFTQAVEVLLSIYPRVQKAVGSSEKIFEYLDRTPRCPPSGLLTPLHLEGLVQFQDVSFAYPNRPDVLVLQGLTFTLRPGEVTALVGPNGSGKSTVAALLQNLYQPTGGQLLLDGKPLPQYEHRYLHRQVAAVGQEPQVFGRSLQENIAYGLTQKPTMEEITAAAVKSGAHSFISGLPQGYDTEVDEAGSQLSGGQRQAVALARALIRKPCVLILDDATSALDANSQLQVEQLLYESPERYSRSVLLITQHLSLVEQADHILFLEGGAIREGGTHQQLMEKKGCYWAMPTEFFQSLGGDGERNVQIEMAHGTTTLAFKFQHGVIAAVDSRASAGSYISALRVNKVIEINPYLLGTMSGCAADCQYWERLLAKECRLYYLRNGERISVSAASKLLSNMMCQYRGMGLSMGSMICGWDKKGPGLYYVDEHGTRLSGNMFSTGSGNTYAYGVMDSGYRPNLSPEEAYDLGRRAIAYATHRDSYSGGVVNMYHMKEDGWVKVESTDVSDLLHQYREANQ


Page 13: Proteins Determine Function

hello.exe011110100100101111111110000001111110010101000011110100010001101010100011101110101101110100000010101100011010011101001110001010011010000001000100011101

100010001001011110010100111000101101101000010101010111001101000110000000101101001011111100000111011111001011100110011010101111111010101011111101111010

001010101000010101011100011110110110001010010110101011001111100010010110100110110110100011001000111101001111011110110101101111111110101010110011010011

001011011010101001010000011011101101100010110100001110110011001001000101101111001010111001101000011000110001110111011111111001101110000100110100011001

111010000011111011000010011010011110010010001100101110111001000100100011000001000111101100110100110101001111000111001100000010000010000101110110010110

111011100110110100010000010100001001001010001100010010000010101000000110111011111111111010110110000001010001011110000101011011101101000011000110010101

011001110001001100100101111100110001010000011000010010110011010000111011110100000011000110111000000010000101010011101101010010111000011110101100101101

111010001101011111011100000001011101010101000101110101011110000011001010000011001000000101011000101100011101101010100011111100010000010001001100011100

101111000101000101100110010001100100110010011010100100111001010001010000101000011001101110100001010001010010010111000101110010100000111100100011111100

001100110010001010001100000100100111111100001110001010111000110000011100011100001101100010000111111001111011111101001011000100100100011001011011111111

100011000000111101010000110010110110001011101110101111000001111101111110111001011010110111001010000101001010101001100100000111010001101100100101101001

101111111011000101011111010010011001010110111100111111011100100010000101110100000100111110001001111010110101000001000111101110001100100011010011100111

011101100101100001111001110111010010101001111111001011111101100101110110111000000100101100110110100010001001011011101011001100001000000110100010011001

000010110110111111101000011000010001010000111000010001111001000111111011110001011100100011010111001011101000111011101110000100001011000011010110001110

111001100101111010110111001010110010010010001010100010001100010000001001110101100011011010100111000010001010000011101011111110111010010101010110001101

010111110010011101010111010011110110000101011110001111000101010011101110101001101100011010001000000101001010011001000110101001010011010110010111100101

001110100001010101010111000100000001101110111000010001101001111111100010100010011111001000001101001110101000110010110010001010100100011111100010111010

101011101100111010101010101001001000110111110011001100100101001011000101100111000110110011100101100111010001010101010100111101011100011010011001101100

000001100100001000010110001100111001101010001001001110101100110010011101001101001011110110100111000110011001100001011011000111011111000000011010010100

111011000001101001001101010101000110110110100011101011001000100111100111111111010000000011100010100110010110010111110011011010100000111000001010010010

000110000101101110111010110010011010001111100110001000100001011011001000101011101111010111111100001111110100111001001000101100110100000100001100110101

101011010110100010111011111010101100101001110100101100011010001111110110001000110100111010110010100010010001100100110000101111001100100011000100010010


Page 14: Proteins Determine Function

Find Similar Proteins

gi|549042|sp|Q03518|TAP1_HUMAN   ANTIGEN PEPTIDE TRANSPORTER ...    1221   0.0
gi|2506117|sp|P36370|TAP1_RAT    ANTIGEN PEPTIDE TRANSPORTER 1...    831   0.0
gi|2506116|sp|P21958|TAP1_MOUSE  ANTIGEN PEPTIDE TRANSPORTER...      827   0.0
gi|1172602|sp|P28062|PRCY_HUMAN  PROTEASOME COMPONENT C13 PR...      455   e-127

Page 15: Proteins Determine Function

Sequence Alignment

• Sequences of genes and proteins are compared to infer
  – Structural, functional, and evolutionary relationships between the sequences
• Over time evolution causes
  – Substitutions, which change residues in a sequence
  – Insertions/deletions, which add or remove residues (gaps)

Page 16: Proteins Determine Function

Sequence Alignment

• Aligning two sequences is the cornerstone of Bioinformatics
• What are we looking for when aligning sequences?
  – Identity: Two sequences that have a certain number of residues in common at aligned positions
  – Similarity: Often a number of positions will be replaced by ones of similar chemical properties
  – Homology: Two sequences that are evolutionarily related and stem from a common ancestor

Page 17: Proteins Determine Function

Possible Alignments

HEAGAWGHE-E
-PA--W-HEAE

HEAGAWGHE-E
---PAW-HEAE

HEAGAWGHE-E
--P-AW-HEAE

Callouts on the figure: substitution, insertion, deletion

Page 18: Proteins Determine Function

How Do We Choose?

• Obviously there are many alignments; how do we choose?
• Biologists provide a scoring mechanism that can be used to determine which alignment is best
• A simple scoring mechanism:
  – 0 for a match
  – 1 for a mismatch or gap
  – Lowest score is the best

Page 19: Proteins Determine Function

Alignments With Score

HEAGAWGHE-E
-PA--W-HEAE

Score: 6
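The simple scoring mechanism (0 for a match, 1 for a mismatch or gap) is easy to express in code. The sketch below assumes the two rows of an alignment are given as equal-length strings with "-" marking a gap; the function name `score` is chosen here for illustration.

```python
def score(top, bottom):
    """Score two gapped rows: 0 per match, 1 per mismatch or gap column."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        # A column costs 1 if either row has a gap or the residues differ.
        if a == "-" or b == "-" or a != b:
            total += 1
    return total


print(score("HEAGAWGHE-E", "-PA--W-HEAE"))  # 6
```

Under this scheme a lower total is better, so comparing candidate alignments reduces to comparing these integers.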

Page 20: Proteins Determine Function

Computing The Result

• We know how to generate different alignments

• We know how to score alignments

• One algorithm might be to generate all possible alignments and choose the one with the best score
  – Not feasible!!
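To see why exhaustive generation is not feasible, it helps to count the alignments. The sketch below assumes the three-move model used in this deck (match/mismatch, gap in one sequence, gap in the other), so the count for lengths m and n is the sum of the counts for the three smaller problems. The function name `count_alignments` is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m, n):
    """Number of distinct alignments of two sequences of lengths m and n."""
    if m == 0 or n == 0:
        return 1  # only one way left: pad the remaining residues with gaps
    return (count_alignments(m - 1, n - 1)   # last column is a match/mismatch
            + count_alignments(m - 1, n)     # last column is a gap in sequence 2
            + count_alignments(m, n - 1))    # last column is a gap in sequence 1


for length in (2, 3, 10, 50):
    print(length, count_alignments(length, length))
```

The count grows exponentially with sequence length, which is what rules out scoring every alignment for sequences of realistic size.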

Page 21: Proteins Determine Function

Alignment

• At every step in the process of aligning two sequences, S1 and S2, you have to make one of three decisions
  – Match/mismatch
  – Add gap to S1
  – Add gap to S2
• For example, when aligning ABC with BC, the three possible first steps are (in the same order)

    ABC    -ABC    ABC
    BC     BC      -BC

Page 22: Proteins Determine Function

Using An Array

• An array could be used to keep track of the possible moves
  – Diagonal: match/mismatch
  – Across: gap in sequence 2
  – Down: gap in sequence 1

Page 23: Proteins Determine Function

Example

       -      A      B      C
  -    -
       -
  B
  C

Page 24: Proteins Determine Function

Example

       -      A      B      C
  -    -      A      AB     ABC
       -      -      --     ---
  B    -
       B
  C    --
       BC

Page 25: Proteins Determine Function

Example

       -      A      B      C
  -    -      A      AB     ABC
       -      -      --     ---
  B    -      A-
       B      -B
  C    --
       BC

Candidate for cell (B, A); lowercase letters are still to be aligned:

  A-  bc
  -B  c

Page 26: Proteins Determine Function

Example

       -      A      B      C
  -    -      A      AB     ABC
       -      -      --     ---
  B    -      A
       B      B
  C    --
       BC

Candidate for cell (B, A); lowercase letters are still to be aligned:

  A   bc
  B   c

Page 27: Proteins Determine Function

Example

       -      A      B      C
  -    -      A      AB     ABC
       -      -      --     ---
  B    -      -A
       B      B-
  C    --
       BC

Candidate for cell (B, A); lowercase letters are still to be aligned:

  -A  bc
  B-  c

Page 28: Proteins Determine Function

Example

       -      A             B      C
  -    -      A             AB     ABC
       -      -             --     ---
  B    -      A-  A   -A
       B      -B  B   B-
  C    --
       BC

Candidates for cell (B, A), with scores in parentheses; lowercase letters are still to be aligned:

  A-  bc   (2)
  -B  c

  A   bc   (1)
  B   c

  -A  bc   (2)
  B-  c

Page 29: Proteins Determine Function

Example

       -      A             B      C
  -    -      A             AB     ABC
       -      -             --     ---
  B    -      A-  A   -A    AB-
       B      -B  B   B-    --B
  C    --
       BC

Candidate for cell (B, B), reached from the cell above; lowercase letters are still to be aligned:

  AB-  c
  --B  c

Page 30: Proteins Determine Function

Example

       -      A             B      C
  -    -      A             AB     ABC
       -      -             --     ---
  B    -      A-  A   -A    AB
       B      -B  B   B-    -B
  C    --
       BC

Candidate for cell (B, B), reached along the diagonal; lowercase letters are still to be aligned:

  AB  c
  -B  c

Page 31: Proteins Determine Function

Example

       -      A             B                   C
  -    -      A             AB                  ABC
       -      -             --                  ---
  B    -      A-  A   -A    A-B  AB   -AB
       B      -B  B   B-    -B-  B-   B--
  C    --
       BC

Candidates for cell (B, B), reached from the cell to its left; lowercase letters are still to be aligned:

  A-B  c
  -B-  c

  AB   c
  B-   c

  -AB  c
  B--  c

Page 32: Proteins Determine Function

Example

       -      A             B                         C
  -    -      A             AB                        ABC
       -      -             --                        ---
  B    -      A-  A   -A    AB-  A-B  AB   -AB  AB
       B      -B  B   B-    --B  -B-  B-   B--  -B
  C    --
       BC

All candidates for cell (B, B), with scores in parentheses; lowercase letters are still to be aligned:

  AB-  c   (3)
  --B  c

  A-B  c   (3)
  -B-  c

  AB   c   (2)
  B-   c

  -AB  c   (3)
  B--  c

  AB   c   (1)
  -B   c
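The cell-by-cell enumeration in these example slides can be reproduced with a short brute-force generator. This is a sketch for tiny inputs only (the number of alignments explodes quickly); the names `all_alignments` and `score` are chosen here, and the scoring is the simple 0/1 scheme from earlier in the deck.

```python
def all_alignments(s1, s2):
    """Yield every (top, bottom) alignment of s1 (top row) and s2 (bottom row)."""
    if not s1 and not s2:
        yield "", ""
        return
    if s1 and s2:  # diagonal move: align the two leading residues
        for top, bottom in all_alignments(s1[1:], s2[1:]):
            yield s1[0] + top, s2[0] + bottom
    if s1:  # across move: leading residue of s1 against a gap
        for top, bottom in all_alignments(s1[1:], s2):
            yield s1[0] + top, "-" + bottom
    if s2:  # down move: a gap against the leading residue of s2
        for top, bottom in all_alignments(s1, s2[1:]):
            yield "-" + top, s2[0] + bottom


def score(top, bottom):
    """0 per match, 1 per mismatch or gap (a gap never equals a residue)."""
    return sum(1 for a, b in zip(top, bottom) if a != b)


for top, bottom in sorted(all_alignments("AB", "B"), key=lambda p: score(*p)):
    print(top, bottom, score(top, bottom))
```

Aligning AB with B yields five alignments, and the best of them has score 1, which is the value a dynamic-programming cell would keep.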

Page 33: Proteins Determine Function

Dynamic Programming

• The word "Programming" in the name has nothing to do with writing computer programs.
  – Mathematicians use the word to describe a set of rules which anyone can follow to solve a problem.
  – They do not have to be written in a computer language.
• Dynamic programming was the brainchild of an American mathematician, Richard Bellman
  – Stores the results for small sub-problems and looks them up, rather than recomputing them, when they are needed later to solve larger sub-problems

Page 34: Proteins Determine Function

Fib(6)

[Figure: recursion tree for Fib(6). Its 25 nodes include repeated subtrees for Fib(4), Fib(3), and Fib(2).]
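The Fib(6) recursion tree illustrates the waste that dynamic programming removes. The sketch below compares a naive recursive Fibonacci with a version that stores each sub-result in a table; the call counter and the function names are illustrative additions, not part of the slides.

```python
def fib_naive(n, counter):
    """Plain recursion; counter[0] records how many calls are made."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_table(n):
    """Bottom-up dynamic programming: each sub-problem is solved once."""
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]  # look up, do not recompute
    return table[n]


calls = [0]
print(fib_naive(6, calls), calls[0])  # 8 25
print(fib_table(6))                   # 8
```

The naive version makes one call per node of the recursion tree, while the table version does a fixed amount of work per index from 2 to n.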


Page 36: Proteins Determine Function

Keep Only the Best Match

• There is no need to remember every possible extension

• We only need to keep the best one
  – There may be ties, which is okay

• How do you determine which one is the best?
  – Ask a Biologist!!!!

Page 37: Proteins Determine Function

Example

       -  A  B  C
  -    0  1  2  3
  B    1  1
  C    2

Candidates for cell (B, A):

  A-    Score == 2
  -B

  A     Score == 1
  B

  -A    Score == 2
  B-

Page 38: Proteins Determine Function

Example

       -  A  B  C
  -    0  1  2  3
  B    1  1  1
  C    2

Candidates for cell (B, B):

  AB-   Score == 3
  --B

  AB    Score == 1
  -B

  AB    Score == 2
  B-

Page 39: Proteins Determine Function

Example

       -  A  B  C
  -    0  1  2  3
  B    1  1  1  2
  C    2

Candidates for cell (B, C):

  ABC-  Score == 4
  ---B

  ABC   Score == 3
  --B

  ABC   Score == 2
  -B-

Page 40: Proteins Determine Function

Example

       -  A  B  C
  -    0  1  2  3
  B    1  1  1  2
  C    2  2  2  1

Note:

Down or across always adds one to the score

Diagonal will add either one or a zero depending on whether or not the bases match

  ABC   Score == 1
  -BC
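The filled array for ABC against BC can be reproduced with a few lines of Python. This sketch assumes the deck's simple scoring (down or across adds one, diagonal adds zero on a match and one on a mismatch, lowest score wins); the function name `fill_min` is illustrative.

```python
def fill_min(top, side):
    """Fill the score array for `top` (across) against `side` (down)."""
    rows, cols = len(side) + 1, len(top) + 1
    M = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        M[0][j] = j  # first row: every step is a gap
    for i in range(rows):
        M[i][0] = i  # first column: every step is a gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = M[i - 1][j - 1] + (0 if side[i - 1] == top[j - 1] else 1)
            across = M[i][j - 1] + 1
            down = M[i - 1][j] + 1
            M[i][j] = min(diagonal, across, down)  # keep only the best
    return M


for row in fill_min("ABC", "BC"):
    print(row)
# [0, 1, 2, 3]
# [1, 1, 1, 2]
# [2, 2, 2, 1]
```

The bottom-right entry is the score of the best full alignment, found without enumerating every alignment.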

Page 41: Proteins Determine Function

Dynamic Programming

• Three steps
  – Initialization
  – Matrix Fill
  – Traceback
• These steps apply in general whether you are doing sequence alignment or folding predictions

Page 42: Proteins Determine Function

Recurrence

• A mathematical relationship that defines $f_n$ as some combination of $f_i$ with $i < n$
• For our next alignment we will use the recurrence
  – $M_{i,j}$ = maximum of
    • $M_{i-1,j-1} + S_{i,j}$ (match/mismatch)
    • $M_{i,j-1} + w$ (gap in sequence 1)
    • $M_{i-1,j} + w$ (gap in sequence 2)
  – Where
    • $S_{i,j}$ = 1 if match, 0 if mismatch
    • $w$ = 0 (gap penalty)
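The recurrence translates almost directly into code. The sketch below is one possible reading of it, with the first sequence written across the top and the second down the side; the function name `fill_max` and the choice to initialise the borders with multiples of `w` (all zeros when `w` is 0) are assumptions made here.

```python
def fill_max(top, side, w=0):
    """Fill M using M[i][j] = max(diagonal + S, across + w, down + w)."""
    rows, cols = len(side) + 1, len(top) + 1
    M = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        M[i][0] = i * w  # border column: only gaps so far
    for j in range(1, cols):
        M[0][j] = j * w  # border row: only gaps so far
    for i in range(1, rows):
        for j in range(1, cols):
            s = 1 if side[i - 1] == top[j - 1] else 0  # S(i,j)
            M[i][j] = max(M[i - 1][j - 1] + s,
                          M[i][j - 1] + w,
                          M[i - 1][j] + w)
    return M


M = fill_max("GAATTCAGTTA", "GGATCGA")
for row in M:
    print(" ".join(str(v) for v in row))
print("best score:", M[-1][-1])  # best score: 6
```

With a match reward of 1 and no penalties, the bottom-right value counts the largest number of matching positions any alignment of the two sequences can achieve.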

Page 43: Proteins Determine Function

Initialization

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0
  G    0
  A    0
  T    0
  C    0
  G    0
  A    0

Page 44: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1
  G    0
  A    0
  T    0
  C    0
  G    0
  A    0

Page 45: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1
  G    0  1
  A    0  1
  T    0  1
  C    0  1
  G    0  1
  A    0  1

Page 46: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1
  G    0  1  1
  A    0  1
  T    0  1
  C    0  1
  G    0  1
  A    0  1

Page 47: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1
  G    0  1  1
  A    0  1  2
  T    0  1  2
  C    0  1  2
  G    0  1  2
  A    0  1  2

Page 48: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1
  G    0  1  1  1
  A    0  1  2  2
  T    0  1  2  2
  C    0  1  2  2
  G    0  1  2  2
  A    0  1  2  3

Page 49: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1
  G    0  1  1  1  1
  A    0  1  2  2  2
  T    0  1  2  2  3
  C    0  1  2  2  3
  G    0  1  2  2  3
  A    0  1  2  3  3

Page 50: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1
  G    0  1  1  1  1  1
  A    0  1  2  2  2  2
  T    0  1  2  2  3  3
  C    0  1  2  2  3  3
  G    0  1  2  2  3  3
  A    0  1  2  3  3  3

Page 51: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1
  G    0  1  1  1  1  1  1
  A    0  1  2  2  2  2  2
  T    0  1  2  2  3  3  3
  C    0  1  2  2  3  3  4
  G    0  1  2  2  3  3  4
  A    0  1  2  3  3  3  4

Page 52: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1
  A    0  1  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3
  C    0  1  2  2  3  3  4  4
  G    0  1  2  2  3  3  4  4
  A    0  1  2  3  3  3  4  5

Page 53: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2
  A    0  1  2  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4
  G    0  1  2  2  3  3  4  4  5
  A    0  1  2  3  3  3  4  5  5

Page 54: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2  2
  A    0  1  2  2  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4  4
  G    0  1  2  2  3  3  4  4  5  5
  A    0  1  2  3  3  3  4  5  5  5

Page 55: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2  2  2
  A    0  1  2  2  2  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4  4  4
  G    0  1  2  2  3  3  4  4  5  5  5
  A    0  1  2  3  3  3  4  5  5  5  5

Page 56: Proteins Determine Function

Matrix Fill

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2  2  2  2
  A    0  1  2  2  2  2  2  2  2  2  2  3
  T    0  1  2  2  3  3  3  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4  4  4  4
  G    0  1  2  2  3  3  4  4  5  5  5  5
  A    0  1  2  3  3  3  4  5  5  5  5  6

Page 57: Proteins Determine Function

Traceback

• We now know the score of the alignment
• Need to work back to find the actual alignment
  – Look at possible predecessors
  – Pick a valid one
  – Repeat until at position 0,0
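The traceback steps can be sketched as follows. The code refills the matrix with the match = 1, mismatch = 0, gap = 0 recurrence and then walks back from the bottom-right cell, at each step picking a predecessor that could have produced the current value. The function name `align`, the "_" gap symbol, and the tie-breaking order (diagonal, then across, then down) are choices made for this illustration; other valid choices give other, equally good alignments.

```python
def align(top, side, w=0):
    """Return (score, aligned_top, aligned_side) for one optimal alignment."""
    rows, cols = len(side) + 1, len(top) + 1
    M = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        M[i][0] = i * w
    for j in range(1, cols):
        M[0][j] = j * w
    for i in range(1, rows):
        for j in range(1, cols):
            s = 1 if side[i - 1] == top[j - 1] else 0
            M[i][j] = max(M[i - 1][j - 1] + s, M[i][j - 1] + w, M[i - 1][j] + w)

    # Traceback: start at the last cell and repeat until position 0,0.
    i, j = rows - 1, cols - 1
    out_top, out_side = [], []
    while i > 0 or j > 0:
        s = 1 if i > 0 and j > 0 and side[i - 1] == top[j - 1] else 0
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + s:
            out_top.append(top[j - 1])      # diagonal: match/mismatch
            out_side.append(side[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and M[i][j] == M[i][j - 1] + w:
            out_top.append(top[j - 1])      # across: gap in the side sequence
            out_side.append("_")
            j -= 1
        else:
            out_top.append("_")             # down: gap in the top sequence
            out_side.append(side[i - 1])
            i -= 1
    return M[-1][-1], "".join(reversed(out_top)), "".join(reversed(out_side))


score, row1, row2 = align("GAATTCAGTTA", "GGATCGA")
print(score)
print(row1)
print(row2)
```

Because ties are common in this matrix, the alignment printed here need not be identical to any particular one drawn by hand; it only has to contain the same number of matching columns.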

Page 58: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2  2  2  2
  A    0  1  2  2  2  2  2  2  2  2  2  3
  T    0  1  2  2  3  3  3  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4  4  4  4
  G    0  1  2  2  3  3  4  4  5  5  5  5
  A    0  1  2  3  3  3  4  5  5  5  5  6

Page 59: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2  2  2
  A    0  1  2  2  2  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4  4  4
  G    0  1  2  2  3  3  4  4  5  5  5
  A                                     6

Page 60: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2  2  2
  A    0  1  2  2  2  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4  4  4
  G    0  1  2  2  3  3  4  4  5  5  5
  A                                     6

Page 61: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2  2
  A    0  1  2  2  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4  4
  G    0  1  2  2  3  3  4  4  5  5  5
  A                                     6

Page 62: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1  2
  A    0  1  2  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3  3
  C    0  1  2  2  3  3  4  4  4
  G    0  1  2  2  3  3  4  4  5  5  5
  A                                     6

Page 63: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1  1
  G    0  1  1  1  1  1  1  1
  A    0  1  2  2  2  2  2  2
  T    0  1  2  2  3  3  3  3
  C    0  1  2  2  3  3  4  4
  G                            5  5  5
  A                                     6

Page 64: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0  0
  G    0  1  1  1  1  1  1
  G    0  1  1  1  1  1  1
  A    0  1  2  2  2  2  2
  T    0  1  2  2  3  3  3
  C    0  1  2  2  3  3  4  4
  G                            5  5  5
  A                                     6

Page 65: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0  0
  G    0  1  1  1  1  1
  G    0  1  1  1  1  1
  A    0  1  2  2  2  2
  T    0  1  2  2  3  3
  C                      4  4
  G                            5  5  5
  A                                     6

Page 66: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0  0
  G    0  1  1  1  1
  G    0  1  1  1  1
  A    0  1  2  2  2
  T                   3
  C                      4  4
  G                            5  5  5
  A                                     6

Page 67: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0  0
  G    0  1  1  1
  G    0  1  1  1
  A    0  1  2  2  2
  T                   3
  C                      4  4
  G                            5  5  5
  A                                     6

Page 68: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0  0
  G    0  1  1
  G    0  1  1
  A             2  2
  T                   3
  C                      4  4
  G                            5  5  5
  A                                     6

Page 69: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0  0
  G    0  1
  G          1
  A             2  2
  T                   3
  C                      4  4
  G                            5  5  5
  A                                     6

Page 70: Proteins Determine Function

Traceback

       -  G  A  A  T  T  C  A  G  T  T  A
  -    0
  G       1
  G          1
  A             2  2
  T                   3
  C                      4  4
  G                            5  5  5
  A                                     6

Page 71: Proteins Determine Function

An Answer

• So one possible alignment is:

G _ A A T T C A G T T A

G G _ A _ T C _ G _ _ A

• There are other possible answers


Page 72: Proteins Determine Function

RNA Folding

• It is not the number of genes that matters, it is how we use them
  – RNA editing
  – Splicing
  – RNA Folding
• Predicting how RNA folds is extremely computationally intensive
• Many algorithms have been written to do this
• Want to look at ways to do this on high performance computing platforms

Page 73: Proteins Determine Function

RNA Secondary Structure

[Figure: secondary structure of an RNA molecule, drawn from its 5′ end to its 3′ end]

Page 74: Proteins Determine Function

Dynalign (a 4-D Dynamic Programming Algorithm):

Sankoff, D. “Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems”. SIAM Journal on Applied Mathematics. 45: 810-825 (1985)

Mathews & Turner. Journal of Molecular Biology. 317: 191-203 (2002)

Zuker Algorithm for Secondary Structure Prediction (2D dynamic programming algorithm)

Algorithm for Sequence Alignment (2D dynamic programming algorithm)

Simultaneously finds the sequence alignment and thermodynamically favorable common secondary structure.

Page 75: Proteins Determine Function

Inputs, Optimization, and Outputs:

Input: Sequence A, Sequence B

Optimization (minimize $\Delta G^\circ_{total}$):

$\Delta G^\circ_{total} = \Delta G^\circ_{sequence\ A} + \Delta G^\circ_{sequence\ B} + (\Delta G^\circ_{gap})(\text{number of gaps})$

Output: Sequence Alignment, Structure of A, Structure of B, where each base pair (BP) in A must be homologous to a BP in B

Page 76: Proteins Determine Function

V Array

V(i,j,k,l) = min


Page 77: Proteins Determine Function

A Total of 16 Energy Equations


Page 78: Proteins Determine Function

Why V?

• For any set of pairs i to j and k to l, the lowest free energy is
  – V(i,j,k,l) + V(j,I+N,l,N2)

Page 79: Proteins Determine Function

Traceback


Page 80: Proteins Determine Function

Traceback


Page 81: Proteins Determine Function

Simplifications to Make Calculation Tractable:

Two sequences at most are considered.

The maximum distance allowed between aligned nucleotides in the two sequences is restricted by a parameter, M.

Running time is $O(M^3 N^3)$.

Storage increases proportionally to $M^2 N^2$, where N is the length of the shorter of the two sequences.

(*Pentium III 600 MHz, 512 MB RAM.)

Page 82: Proteins Determine Function

Parallelization

• Reduce array sizes
  – Arrays are sparse
  – Pre filter canonical pairs as unimportant
    • N=388, M=75, less than 4GB
• Fill routine can be easily parallelized
  – Distributed memory means larger sequences
• Traceback is a problem
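The scaling figures quoted for the 4-D algorithm can be turned into rough numbers. The sketch below is a back-of-the-envelope estimate only: the slides give proportionalities, so the constant factor of 1 and the bytes-per-entry values tried here are assumptions, and the function names are invented for illustration.

```python
def dynalign_scaling(n, m):
    """Return (time_units, storage_entries) for sequence length n and window m.

    Assumes time ~ M^3 N^3 and storage ~ M^2 N^2 with a constant factor of 1.
    """
    return m ** 3 * n ** 3, m ** 2 * n ** 2


def storage_gb(n, m, bytes_per_entry):
    """Approximate storage in decimal gigabytes for a given entry size."""
    _, entries = dynalign_scaling(n, m)
    return entries * bytes_per_entry / 1e9


n, m = 388, 75
time_units, entries = dynalign_scaling(n, m)
print("array entries:", entries)
for size in (2, 4):  # hypothetical entry sizes in bytes
    print(size, "bytes per entry:", round(storage_gb(n, m, size), 2), "GB")
```

Doubling N multiplies the storage estimate by 4 and the time estimate by 8, which is why restricting M and exploiting sparsity matter so much for longer sequences.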