Theory and Application of Multiple Sequence Alignments Brett Pickett, PhD a.k.a What is a Multiple Sequence Alignment, How to Make One, and What to Do With It

Theory and Application of Multiple Sequence Alignments

Brett Pickett, PhD

a.k.a What is a Multiple Sequence Alignment,

How to Make One, and What to Do With It

History

• Structure of DNA discovered (1953)

• First (phage) genome determined in 1977

• Human genome project begun in 1990

• First free-living organism (Haemophilus influenzae) sequenced in 1995

• Human “Rough draft” completed in 2000

– NHGRI (public) vs. J. Craig Venter (private)

• Used “super” computer to put human genome together in right order

What is a Genome?

• Genetic material required for organism to replicate

– Eukaryotes (Humans): multiple chromosomes

– Prokaryotes (Bacteria): 1 chromosome

– Viruses: “what’s a chromosome?”

• Genome sizes

– Human (3.2 Gb): 10 trillion cells in human body X 2 m of DNA per cell, laid end to end

• About 500,000 times around Earth

• About 67 roundtrips to the sun

– Bacteria (580 kb – 10 Mb)

– Virus (3.5 kb – 1.3 Mb)

http://www.rsc.org/chemsoc/timeline/pages/2001.html

Why are Genomes so Important?

• Encode all organismal functions

– DNA -> RNA -> protein

• Unique to each organism

– Find differences (mutations) only by comparing genomes with each other

www.thednastore.com/images/cells/mrdna1.jpg

How are Sequences Made?

1. Make lots of copies of original sequence (PCR)

2. Put the copies into a machine to make even more copies

3. Fluorescent (glow-in-the-dark) bases get incorporated randomly into new DNA molecule

4. Laser detects glowing bases and tells the computer the order of bases = sequence

http://bjpsbiotech.edublogs.org/files/2007/12/electropherogram.jpg

What’s the Next Step?

• After sequence is determined, then what?

• Make sense of it by comparing with other related (homologous) sequences

– Multiple Sequence Alignment

What is an Alignment?

• Lining up related (homologous) positions

– Allows comparison

[Figure: the same sequences shown unaligned, then aligned]

Comparing Sequences (Genomes)

• All DNA contains a unique genetic “fingerprint”

• Similarity reveals

– Related function

– Shared evolutionary history

education.vetmed.vt.edu/.../FINGERPRINT.jpg

Aligning with Computational Methods

• Computers can’t “see” patterns

– Use math to find best alignment by assigning scores

– Match

– Mismatch

– Gap

• Internal – Insertion / deletion (indel)

• Terminal – Missing information?

What is a Gap?

• Allows bases to be lined up even if sequences are different lengths

– Insertions / deletions (indels)

• Impossible to tell which sequence has lost (gained) information

– Terminal gaps

• Sequence is either naturally shorter or artificially cut off

Nucleotide Alignment

[Figure labels: Mismatches, Gaps]

• Custom Scores

– Match

– Mismatch

– Gap-opening penalty

• Penalized for not having letter (begin a gap)

• Why?

– Gap-extension penalty

• Little or no penalty for lengthening a gap

• Why?

– Scores balance between mismatch & gap
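As a rough sketch of how these custom scores combine, the function below scores a pair of sequences that has already been aligned, one column at a time. The function name and the default values (match 5, mismatch -2, gap opening -4, gap extension 0) are assumptions chosen for illustration; real alignment tools ship their own defaults.

```python
def score_alignment(top, bottom, match=5, mismatch=-2, gap_open=-4, gap_extend=0):
    """Score two equal-length aligned strings; '-' marks a gap position."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    gap_in = None  # row holding the current gap run: "top", "bottom", or None
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # Starting a gap pays the opening penalty; lengthening it pays
            # only the (smaller) extension penalty.
            total += gap_extend if gap_in == row else gap_open
            gap_in = row
        else:
            total += match if a == b else mismatch
            gap_in = None
    return total


print(score_alignment("AATC", "A-TC"))    # one 1-base gap
print(score_alignment("AATTC", "A--TC"))  # one 2-base gap
```

With these assumed numbers a long gap costs the same as a short one, while several separate gaps each pay the opening penalty. That is one way to answer the "Why?" above: a single insertion or deletion event can add or remove several bases at once.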

Dynamic Programming

• Used to calculate alignment

– Breaks a very complicated process into smaller steps

– Helps computers to solve the problem faster

[Figure: scoring grid with Sequence 1 along the top and Sequence 2 down the side; labels: Math, Read]

http://www.myspacepimper.com/images/232763/Disney-s-Goofy-Baking-a-Cake.htm

Manual Alignment

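Sequence 1 (across the top): A A T C

Sequence 2 (down the side): A T C

Best score in each cell:

          A    A    T    C
     0    0    0    0    0
A    0    5    5    1   -2
T    0    1    3   10    6
C    0   -2   -1    6   15

Candidate scores in each cell (from left, diagonal, from above) -> best score:

Row A: (-4, 5, -4) -> 5; (1, 5, -4) -> 5; (1, -2, -4) -> 1; (-3, -2, -4) -> -2

Row T: (-4, -2, 1) -> 1; (-3, 3, 1) -> 3; (-1, 10, -3) -> 10; (6, -1, -6) -> 6

Row C: (-4, -2, -3) -> -2; (-6, -1, -1) -> -1; (-5, 1, 6) -> 6; (2, 15, 2) -> 15

Match = 5 Mismatch = -2 Gap Opening = -4 Gap Extension = 0

Traceback: Follow the highest scores back to the beginning. Up or sideways = gap, diagonal = homology (line up).

A A T C
A - T C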
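The grid on this slide can be reproduced with a short dynamic programming sketch. The function name is hypothetical. Two assumptions are made to match the numbers shown: the first row and column stay at 0, and every gap step costs -4 (the sketch does not implement a separate gap-extension score).

```python
def fill_and_trace(seq1, seq2, match=5, mismatch=-2, gap=-4):
    """seq1 runs across the top of the grid, seq2 down the side."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    # First row and column stay 0, as on the slide (leading gaps are free).
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(
                score[i][j - 1] + gap,       # from the left: gap in seq2
                score[i - 1][j - 1] + pair,  # diagonal: line the two bases up
                score[i - 1][j] + gap,       # from above: gap in seq1
            )

    # Traceback: walk from the bottom-right cell back to the top-left corner.
    i, j = rows - 1, cols - 1
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(seq1[j - 1]); bottom.append(seq2[i - 1])
            i -= 1; j -= 1
        elif j > 0 and (i == 0 or score[i][j] == score[i][j - 1] + gap):
            top.append(seq1[j - 1]); bottom.append("-")
            j -= 1
        else:
            top.append("-"); bottom.append(seq2[i - 1])
            i -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


grid, top, bottom = fill_and_trace("AATC", "ATC")
for row in grid:
    print(row)
print(top)
print(bottom)
```

Because leading gaps are free in this sketch, its traceback places the gap at the first position (-ATC), whereas the slide draws it at the second position (A-TC). Both versions line up the same T and C columns.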

Computer-Generated Alignment

• Much faster than we are

– 2 GHz = roughly 2 billion clock cycles per second

– Don’t get tired, make arithmetic mistakes, or get hand cramps

Alignment Process

Types of Alignment

• Global

– Aligns entire sequence

– Permits gaps

– Forced even if sequences not homologous

• Local

– Aligns only the best-matching (highest-scoring) region, with few or no gaps
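To make the global vs. local distinction concrete, here is a minimal sketch of local alignment scoring in the style of the Smith-Waterman algorithm. The slides do not name an algorithm, so that choice, the function name, and the score values are all assumptions. The key difference from a global alignment is that a cell is never allowed to drop below 0, so a good region is not dragged down by unrelated sequence on either side.

```python
def best_local_score(seq1, seq2, match=5, mismatch=-2, gap=-4):
    """Return the score of the best-scoring local region shared by two sequences."""
    best = 0
    prev = [0] * (len(seq1) + 1)  # previous row of the scoring grid
    for b in seq2:
        curr = [0]
        for j, a in enumerate(seq1, start=1):
            pair = match if a == b else mismatch
            cell = max(
                0,                   # restart: ignore everything before this point
                prev[j - 1] + pair,  # diagonal: line the two bases up
                prev[j] + gap,       # gap in seq1
                curr[j - 1] + gap,   # gap in seq2
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Only the shared "ATC" in the middle is similar; the flanks are unrelated.
print(best_local_score("GGGATCGGG", "TTTATCTTT"))
```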

Beware!

• The computer is not always right

– Alignments

• Optimal: highest score

• True: evolutionarily correct

– Can be improved

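• Hard for computer to accurately place indels (gaps)

– Apply prior knowledge: codons

[Figure: nucleotide sequence vs. amino acid sequence. Read in frame, AAA CCC translates to Lys Pro; with a gap placed inside a codon, AA- ACC C reads as ??? Thr ?. Other labels in the figure: Asn, Lys]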
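The codon point can be checked with a small sketch that reads an aligned nucleotide string three letters at a time. The function name is hypothetical and the lookup holds only the four codons needed here (a tiny subset of the standard genetic code), so anything else is reported as unknown.

```python
# Tiny subset of the standard genetic code, enough for this example only.
CODONS = {"AAA": "Lys", "AAC": "Asn", "CCC": "Pro", "ACC": "Thr"}


def translate(aligned):
    """Translate an aligned nucleotide string codon by codon, left to right."""
    out = []
    for k in range(0, len(aligned), 3):
        codon = aligned[k:k + 3]
        if len(codon) < 3:
            out.append("?")    # leftover bases: not a full codon
        elif codon == "---":
            out.append("-")    # a whole codon missing: the reading frame is kept
        else:
            out.append(CODONS.get(codon, "???"))  # gap inside a codon, or unlisted
    return out


print(translate("AAACCC"))     # in frame
print(translate("AA-ACCC"))    # gap placed inside a codon
print(translate("AAA---CCC"))  # gap placed on codon boundaries
```

Placing gaps in multiples of three on codon boundaries keeps the downstream amino acids readable, which is the kind of prior knowledge a person can apply when hand-correcting a computer-generated alignment.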

BLAST

• Basic Local Alignment Search Tool

– Most frequently used alignment tool

– Local alignment of 1 sequence (query) against all known sequences (subjects) in database

• Uses a “heuristic” to reduce number of sequences it actually has to align

– Like using “Google” to find most homologous sequences
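The idea of a heuristic prefilter can be sketched as follows. This is not BLAST's actual implementation, which is considerably more involved; it only illustrates the general idea of indexing short words so that most database sequences never need to be aligned. The function names, the word length of 4, and the toy sequences are all assumptions.

```python
from collections import defaultdict


def build_index(subjects, k=4):
    """Map every k-letter word to the names of the subjects that contain it."""
    index = defaultdict(set)
    for name, seq in subjects.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidates(query, index, k=4):
    """Return the subjects that share at least one k-letter word with the query."""
    hits = set()
    for i in range(len(query) - k + 1):
        hits |= index.get(query[i:i + k], set())
    return hits


database = {"s1": "ACGTACGTAA", "s2": "TTTTTTTTTT", "s3": "GGACGTGG"}
index = build_index(database)
# Only the subjects returned here would go on to a full local alignment.
print(sorted(candidates("CCACGTCC", index)))
```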

BLAST Input

BLAST Output

How Does This Impact Me?

• Human Microbiome project

– Sequence all bacteria in intestines

• Millions of bacteria in each gram of excrement

– Which ones make us sick? How different is flora between people?

• Ocean Virus Metagenomics project

– Try to get an idea of virus diversity across the globe

• Boat goes around North America collecting samples

– Billions of viruses in each gallon of seawater

How Does This Impact Me (cont’d)?

• Used to take swabs, grow colonies on agar

– Antimicrobial resistance in turkeys

• Sequencing removes middle step

• How to quickly assign genus and species to new sequences?

– BLAST

• Project: New Phage from ponds

Other Uses for Alignments

SNP Detection

• Single Nucleotide Polymorphism

– Single-base changes at an alignment position, occurring in at least one sequence

– May have biological significance

• Antibiotic resistance

• Changes could avoid detection by immune system

• Cause of genetic disease (e.g., cystic fibrosis, CF)
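Once sequences are aligned, finding candidate SNPs amounts to scanning the alignment column by column. The sketch below is a simplified illustration: the function name and the toy sequences are assumptions, gaps are ignored, and no attempt is made to judge whether a difference is biologically meaningful or a sequencing error.

```python
def find_snps(alignment):
    """alignment maps a name to an aligned sequence; all must be the same length.

    Returns a list of (position, {base: [names]}) for every column that holds
    more than one distinct base. Positions are 1-based.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    snps = []
    for pos in range(length):
        bases = {}
        for name in names:
            base = alignment[name][pos]
            if base != "-":  # gaps are indels, not SNPs
                bases.setdefault(base, []).append(name)
        if len(bases) > 1:
            snps.append((pos + 1, bases))
    return snps


toy = {"seq1": "ATGCA", "seq2": "ATGCA", "seq3": "ATTCA"}
print(find_snps(toy))
```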

Phylogenetic Trees

• Computer generated by:

– Examining alignment

– Looking for shared mutations

• Show relationship(s) between sequences

– History of sequences

• Where they came from

• Genetic changes that have occurred
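The slides do not say which tree-building method is used. One common starting point, assumed here purely for illustration, is to count the fraction of aligned positions at which each pair of sequences differs; sequences with the smallest distance would be joined first by a distance-based method. The function names and toy sequences are hypothetical, and the sketch stops at the distance table rather than building a tree.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of compared positions that differ; columns with a gap are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(alignment):
    """Pairwise distances for every pair of named, aligned sequences."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


toy = {"a": "ATGCAT", "b": "ATGCAT", "c": "ATTCAA"}
table = distance_table(toy)
print(table)
print(min(table, key=table.get))  # the closest pair would be grouped first
```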

[Figure: example tree from the iOS Phylogram App (Free) with leaves CY065067, CY061195, CY065107, GU562458, CY065059, CY098563, CY098130, CY065011, CY061578; labels: Clade, Node, Leaf, Branch]

Recombination

• Can occur in all types of organisms

– Eukaryotes

– Prokaryotes

– Viruses

• May change characteristic of organism

– Make you sick (or not)

– Not recognized by immune system

– Fast way of getting lots of genetic changes

[Figure labels: Breakpoint, RdRP, Genome 1, Genome 2, Daughter Sequence, Major Parent, Minor Parent]

Reassortment

• Chromosomes (segments) from one organism replace those from another

– May change characteristic of organism

• Make you sick (or not)

• Not recognized by immune system

• Fast way of getting lots of genetic changes


Other Analysis Options

• Align Sequences

• Look for genetic changes (genotype) that are associated with traits (phenotype)

– Host

– How sick it makes you

– Drug resistance

– Inherited disease

• Do any mutations consistently accompany the traits?

– Genome-Wide Association Studies

http://lovestats.wordpress.com/dman/

How Does an Alignment Get a Score?

• Amino acids

– Identical >> Similar >> Dissimilar

Score Lookup Table (Matrix)

• Symmetrical

• Positive scores on diagonal (matches)

• Some mismatches get negative scores

• Some mismatches don’t
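A score lookup table can be sketched as a small dictionary. The numbers below are illustrative values in the style of a BLOSUM-type matrix for four amino acids (L, I, D, E); they are not copied from the slide, and the function names are hypothetical. Each unordered pair is stored once, which makes the table symmetrical by construction.

```python
# Illustrative scores only: identical pairs are positive, chemically similar
# pairs (I/L, D/E) are mildly positive, dissimilar pairs are negative.
SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6, ("E", "E"): 5,
    ("I", "L"): 2, ("D", "E"): 2,
    ("D", "L"): -4, ("D", "I"): -3, ("E", "L"): -3, ("E", "I"): -3,
}


def lookup(a, b):
    """Score for aligning amino acid a with b; the order does not matter."""
    key = (a, b) if (a, b) in SCORES else (b, a)
    return SCORES[key]


def score_columns(top, bottom):
    """Sum the lookup scores over the columns of a gap-free aligned pair."""
    return sum(lookup(a, b) for a, b in zip(top, bottom))


print(lookup("L", "I"), lookup("I", "L"))  # same score either way round
print(lookup("L", "D"))                    # dissimilar pair
print(score_columns("LIDE", "LLEE"))
```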