Page 1: Sequence Alignments

Summer Bioinformatics Workshop 2008

Sequence Alignments

Chi-Cheng Lin, Ph.D.
Associate Professor
Department of Computer Science
Winona State University – Rochester Center
[email protected]

Page 2: Sequence Alignments

Sequence Alignments
  • Cornerstone of bioinformatics
  • What is a sequence?
      • Nucleotide sequence
      • Amino acid sequence
  • Pairwise and multiple sequence alignments
  • What alignments can help with
      • Determine the function of a newly discovered gene sequence
      • Determine evolutionary relationships among genes, proteins, and species
      • Predict the structure and function of a protein

Page 3: Sequence Alignments

Why Align Sequences?
  • The draft human genome is available
  • Automated gene finding is possible
  • Gene: AGTACGTATCGTATAGCGTAA
      • What does it do?
  • One approach: Is there a similar gene in another species?
      • Align sequences with known genes
      • Find the gene with the “best” match

Page 4: Sequence Alignments

Visualization of Sequence Alignment
  • Dot plot
      • One of the simplest and oldest methods for sequence alignment
      • Visualization of regions of similarity
          • Assign one sequence to the horizontal axis
          • Assign the other to the vertical axis
          • Place a dot wherever the two sequences match
          • Diagonal lines indicate adjacent regions of identity

Page 5: Sequence Alignments

A Simple Example
  • Construct a simple dot plot for
TAGTCGATG
TGGTCATC
  • The alignment is
TAGTCGATG
TGGTC-ATC

  T A G T C G A T G
T *     *       *
G     *     *     *
G     *     *     *
T *     *       *
C         *
A   *         *
T *     *       *
C         *
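The dot plot for this example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `dot_plot` and the text layout (two characters per column, `*` for a match) are illustrative choices, not part of the workshop material.

```python
def dot_plot(horizontal, vertical):
    """Return the dot plot as a list of text rows, one per vertical residue."""
    rows = ["  " + " ".join(horizontal)]  # header row: the horizontal sequence
    for v in vertical:
        # Mark '*' wherever the vertical residue equals the horizontal one.
        cells = ["*" if v == h else " " for h in horizontal]
        rows.append((v + " " + " ".join(cells)).rstrip())
    return rows


plot = dot_plot("TAGTCGATG", "TGGTCATC")
print("\n".join(plot))
```

Runs of `*` along a diagonal in the printed grid correspond to stretches where the two sequences are identical.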

Page 6: Sequence Alignments

Genes Accumulate Mutations over Time
  • Mistakes in gene replication or repair
      • Deletions, duplications
      • Insertions, inversions
      • Translocations
      • Point mutations
  • Environmental factors
      • Radiation
      • Oxidation

Page 7: Sequence Alignments

Deletions
  • Codon deletion: ACG ATA GCG TAT GTA TAG CCG…
      • Effect depends on the protein, position, etc.
      • Almost always deleterious
      • Sometimes lethal
  • Frame shift mutation:
ACG ATA GCG TAT GTA TAG CCG…
ACG ATA GCG ATG TAT AGC CG?…
      • Almost always lethal
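The difference between the two kinds of deletion is easy to see by re-reading the sequence in triplets after the deletion. The sketch below uses the slide's sequence; the helper names `codons` and `delete` are hypothetical, and the choice of which codon to remove (the fourth one, TAT) is an assumption made for illustration, since the slide does not say which codon is deleted.

```python
def codons(seq):
    """Split a sequence into triplets; a trailing partial codon is kept."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def delete(seq, start, length):
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


gene = "ACGATAGCGTATGTATAGCCG"

# Assumption: the deleted codon is the fourth one (TAT, bases 9-11).
codon_deleted = codons(delete(gene, 9, 3))
# Deleting a single base (the first T of TAT) shifts the reading frame.
frame_shifted = codons(delete(gene, 9, 1))

print(" ".join(codon_deleted))
print(" ".join(frame_shifted))
```

After the codon deletion every downstream codon is unchanged, whereas after the single-base deletion every downstream codon is different, which is why frame shifts are usually far more damaging.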

Page 8: Sequence Alignments

Indels
  • Comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:

ACGTCTGATACGCCGTATCGTCTATCT
ACGTCTGAT---CCGTATCGTCTATCT

Page 9: Sequence Alignments

The Genetic Code

Substitutions
  • Substitutions are mutations accepted by natural selection.
  • Synonymous: CGC → CGA
  • Non-synonymous: GAU → GAA
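Whether a substitution is synonymous can be checked by translating the codon before and after the change. The sketch below contains only the four codons from this slide (a full genetic code table has 64 entries); the names `CODON_TO_AMINO_ACID` and `classify` are illustrative.

```python
# Partial genetic code: only the codons that appear on this slide.
CODON_TO_AMINO_ACID = {
    "CGC": "Arg",
    "CGA": "Arg",
    "GAU": "Asp",
    "GAA": "Glu",
}


def classify(before, after):
    """Label a codon substitution by whether the amino acid changes."""
    same = CODON_TO_AMINO_ACID[before] == CODON_TO_AMINO_ACID[after]
    return "synonymous" if same else "non-synonymous"


print("CGC -> CGA:", classify("CGC", "CGA"))
print("GAU -> GAA:", classify("GAU", "GAA"))
```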

Page 10: Sequence Alignments

Point Mutation Example: Sickle-cell Disease

Wild-type hemoglobin DNA:  3’----CTT----5’
mRNA:                      5’----GAA----3’
Normal hemoglobin:         ------[Glu]------

Mutant hemoglobin DNA:     3’----CAT----5’
mRNA:                      5’----GUA----3’
Mutant hemoglobin:         ------[Val]------
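The path from DNA to amino acid in this example can be traced in code. This is a minimal sketch that assumes the DNA shown is the template strand written 3’ to 5’, so that base-by-base complementing yields the mRNA written 5’ to 3’. The names `transcribe` and `CODON_TO_AMINO_ACID` are illustrative, and the codon table holds only the two codons needed here.

```python
# DNA base on the template strand -> complementary RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial genetic code: only the two codons in this example.
CODON_TO_AMINO_ACID = {"GAA": "Glu", "GUA": "Val"}


def transcribe(template_3_to_5):
    """Complement a template strand (3'->5') into mRNA (5'->3')."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


for label, dna in [("wild-type", "CTT"), ("mutant", "CAT")]:
    mrna = transcribe(dna)
    print(label, dna, "->", mrna, "->", CODON_TO_AMINO_ACID[mrna])
```

A single base change in the DNA (T to A in the middle position) is enough to replace glutamic acid with valine.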

Page 11: Sequence Alignments

image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.

Page 12: Sequence Alignments

Comparing Two Sequences
  • Point mutations, easy:
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
  • Indels are difficult, must align sequences:
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
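When two sequences have the same length and differ only by point mutations, a position-by-position comparison is all that is needed. The sketch below lists the differing positions for the first pair above; the function name `point_differences` is illustrative, and positions are reported 1-based.

```python
def point_differences(a, b):
    """List (position, base_in_a, base_in_b) for every mismatch, 1-based."""
    if len(a) != len(b):
        raise ValueError("sequences differ in length; align them first")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


seq1 = "ACGTCTGATACGCCGTATAGTCTATCT"
seq2 = "ACGTCTGATTCGCCCTATCGTCTATCT"

diffs = point_differences(seq1, seq2)
print(diffs)
```

The same function refuses the second pair (27 bases against 20), which is the practical reason indels require an alignment step before the sequences can be compared.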

Page 13: Sequence Alignments

Scoring a Sequence Alignment
  • Example
      • Match score: +1
      • Mismatch score: +0
      • Gap penalty: –1

ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT

  • Matches: 18 × (+1)
  • Mismatches: 2 × 0
  • Gaps: 7 × (–1)
  • Score = 18 + 0 + (–7) = +11
  • Various scoring schemes exist.
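The tally above can be checked mechanically. This is a minimal sketch using the slide's scoring scheme as default parameters; the function name `score_alignment` is illustrative, and a column counts as a gap if either row has `-` in it.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Return (matches, mismatches, gaps, total score) for two aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


result = score_alignment(
    "ACGTCTGATACGCCGTATAGTCTATCT",
    "----CTGATTCGC---ATCGTCTATCT",
)
print(result)
```

Changing the keyword arguments (for example `mismatch=-1`) shows how a different scoring scheme changes the total for the same alignment.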

Page 14: Sequence Alignments

How can we find an optimal alignment?

Finding the alignment is computationally hard:
ACGTCTGATACGCCGTATAGTCTATCT
CTGAT---TCG-CATCGTC--T-ATCT

There are ~888,000 possibilities to align the two sequences given above.
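The slide does not say how this figure was obtained. One plausible reading, offered here as an assumption, is that it counts the ways to place the 7 required gaps among the 27 columns when gaps are inserted only into the shorter sequence, which is a binomial coefficient.

```python
import math

longer = "ACGTCTGATACGCCGTATAGTCTATCT"
shorter = "CTGATTCGCATCGTCTATCT"

# Assumption: gaps go only into the shorter sequence, one column per gap.
gaps_needed = len(longer) - len(shorter)
placements = math.comb(len(longer), gaps_needed)

print(gaps_needed, placements)
```

Under that assumption the count is 888,030; allowing gaps in both sequences would make the number of candidate alignments larger still.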

Algorithms using a technique called “dynamic programming” are used – out of the scope of this workshop.
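For readers who want a glimpse of the idea anyway, the sketch below computes the best global alignment score with a standard dynamic-programming recurrence (the Needleman-Wunsch approach), using the match, mismatch, and gap values from the scoring slide. It is an illustration rather than workshop material: the function name `global_alignment_score` is hypothetical, and only the score is returned, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score of a and b, computed row by row."""
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


best = global_alignment_score(
    "ACGTCTGATACGCCGTATAGTCTATCT",
    "CTGATTCGCATCGTCTATCT",
)
print(best)
```

Each cell is filled from three neighbours, so the work grows with the product of the two sequence lengths instead of with the number of possible alignments.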

Page 15: Sequence Alignments

Global and Local Alignments
  • Global alignment – score the entire alignment
  • Local alignment – find the best matching subsequence
  • Why local sequence alignment?
      • Global alignment is useful only if the sequences to be aligned are very similar
      • Subsequence comparison between a DNA sequence and a genome
      • Identify
          • Conserved regions
          • Protein function domains

Page 16: Sequence Alignments

Example
  • Compare the two sequences:
TTGACACCCTCCCAATT
ACCCCAGGCTTTACACAG
  • Global alignment (does it look good?)
TTGACACCCTCC-CAATT
    ||  ||   ||
ACCCCAGGCTTTACACAG
  • Local alignment (does it look good?)
---------TTGACACCCTCCCAATT
         || ||||
ACCCCAGGCTTTACACAG--------
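A local alignment score can be computed with a small variation of the global dynamic-programming recurrence: every cell is floored at zero, and the answer is the largest cell anywhere in the table (the Smith-Waterman approach). The sketch below is illustrative only. The function name `local_alignment_score` is hypothetical, and the scoring values (match +1, mismatch –1, gap –1) are an assumption that differs from the earlier scoring slide, because a mismatch score of 0 would let a local alignment extend through mismatches at no cost.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (score only, no traceback)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment start afresh at any cell.
            cell = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


score = local_alignment_score("TTGACACCCTCCCAATT", "ACCCCAGGCTTTACACAG")
print(score)
```

Because negative-scoring stretches are discarded, the poorly matching ends of the two sequences no longer drag the score down, which is the point of the local alignment shown above.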

Page 17: Sequence Alignments

Where do we get sequences to work with?
  • Biological databases
      • NCBI Entrez (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?term=)
  • Wet labs
  • Simulations
  • Other people’s results
  • On-line education resources
      • BEDROCK (http://www.bioquest.org/bedrock/)
  • BLAST results