102
خدا بنامDNA Sequencing مطهریفضل ابوال سید

DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

Page 1: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

بنام خداDNA Sequencing

سید ابوالفضل مطهری

Page 2: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Sequencing Problem

Page 3: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Sequencing Problem

S = X1X2X3X4 . . . XG�1XG

Page 4: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Sequencing Problem

finding the sequence

S = X1X2X3X4 . . . XG�1XG

Page 5: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Sequencing Problem

finding the sequenceor finding some parts of the sequence

S = X1X2X3X4 . . . XG�1XG

Page 6: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Sequencing Problem

finding the sequenceor finding some parts of the sequence

DNA Sequencing sequence is the genome

S = X1X2X3X4 . . . XG�1XG

Page 7: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Genome

Page 8: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Genome

Human Genome consists of 23 chromosomes

Page 9: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Genome

Human Genome consists of 23 chromosomesEach chromosome contains a DNA sequence

Page 10: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Genome

Human Genome consists of 23 chromosomes Each chromosome contains a DNA sequence The total length isG = 3.2⇥ 109

Page 11: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Genome

Page 12: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

DexoyriboNucleic Acid (DNA)

pPhosphate

Sugar

Bases

GuanineCytosine

AdenineThymine

T A

C G

Page 13: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

DexoyriboNucleic Acid (DNA)

pPhosphate

Sugar

Bases

GuanineCytosine

AdenineThymine

T A

C G

pCovalent

Page 14: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

DexoyriboNucleic Acid (DNA)

pPhosphate

Sugar

Bases

GuanineCytosine

AdenineThymine

T A

C G

pCovalent

G C

A THydrogen

Page 15: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Single Stranded DNAp

Page 16: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Single Stranded DNAp p

Page 17: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Single Stranded DNAp p p

Page 18: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Single Stranded DNAp p p p

Page 19: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Single Stranded DNAp p p p p

Page 20: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Single Stranded DNAp p p p p

TA

CG G

TACGG.....

Page 21: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Fred Sanger

1918-2013

Won the Nobel Prize for Chemistry twice Sequencing Insulin DNA Sequencing

Page 22: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Phi X 174 Bactriophage

First Genome to be Sequenced Genome Size: 5386 bp Fred Sanger (1977)

Page 23: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Phi X 174 Bactriophage

First Genome to be Sequenced Genome Size: 5386 bp Fred Sanger (1977)

GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTAGATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATCTGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTTTCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTTCGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTACGGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTACGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAGTGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCATCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTACTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAAGGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTTGGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACAACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGCTCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGCATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAACCTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTTGATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGCCGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGACTAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTGTATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGATTATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTTATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGAGTGTGAGGTTATAACGCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGCTTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTATATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTGTCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGCCTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTGAATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGCCGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTGCTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAAAGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTGGTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGATAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTATCTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGGTTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGAGATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGACCAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTATGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCAAACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCGTCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTTCTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTATTGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGCATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATGTTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGAATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGGGACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCCCTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATTGCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAGGCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTTATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCGCAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGCCGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTCGTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCATCGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACCTGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA

Page 24: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Lambda Phage

Genome Size: 48,502 bp Fred Sanger, et al (1982) Method: Shotgun Sequencing

Page 25: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Page 26: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Reads

Page 27: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Reads

Hearken to the reed-flute

Page 28: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Reads

Hearken to the reed-flute reed-flute, how it complains,

it complains, Lamenting its

Lamenting its banishment frombanishment from its home

Page 29: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Reads

Hearken to the reed-flute

reed-flute, how it complains,

it complains, Lamenting its

Lamenting its banishment frombanishment from its home

Page 30: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Reads

Hearken to the reed-flutereed-flute, how it complains,

it complains, Lamenting its

Lamenting its banishment frombanishment from its home

Page 31: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Reads

Hearken to the reed-flutereed-flute, how it complains,it complains, Lamenting its

Lamenting its banishment frombanishment from its home

Page 32: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Reads

Hearken to the reed-flutereed-flute, how it complains,it complains, Lamenting itsLamenting its banishment from

banishment from its home

Page 33: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Shotgun Sequencing

Reads

Hearken to the reed-flutereed-flute, how it complains,it complains, Lamenting itsLamenting its banishment frombanishment from its home

Page 34: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Human Genome

DNA Shotgun Sequencing

Page 35: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

{A,C,G, T}

DNA Shotgun Sequencing

Page 36: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

{A,C,G, T}

Reads

DNA Shotgun Sequencing

Page 37: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

{A,C,G, T}

Reads

DNA Shotgun Sequencing

Page 38: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

{A,C,G, T}

Reads

ACCGTCCTTAAGG

DNA Shotgun Sequencing

Page 39: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

{A,C,G, T}

Reads

ACCGTCCTTAAGG

ACCGTCGTTAAGG

DNA Shotgun Sequencing

Page 40: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

{A,C,G, T}

Reads

ACCGTCCTTAAGG

ACCGTCGTTAAGG

ACCGTCGTTAAGG

DNA Shotgun Sequencing

Page 41: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

{A,C,G, T}

Reads

GTTTCCTAAGTGGACCGTCGTTAAGG

ACTGCTTCGTCCTA

CCTTAAGCGGTTCACTGCTTCGTCCTA

DNA Shotgun Sequencing

Page 42: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Starting Big Projects

Page 43: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Starting Big Projects

Genome of s. cerevisiae Size of Genome: 12 Mb 16 Chromosomes Finished 1996

1989

Page 44: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Starting Big Projects

1990E. Coli: 4.6 Mb M. Capricolum: 1 Mb C. elegans: 100 Mb

Genome of s. cerevisiae Size of Genome: 12 Mb 16 Chromosomes Finished 1996

1989

Page 45: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Starting Big Projects

1990E. Coli: 4.6 Mb M. Capricolum: 1 Mb C. elegans: 100 Mb

1990Human Genome Project (HGP) Size of Genome: 3.2 Gb 23 Chromosomes Finished 2003

Genome of s. cerevisiae Size of Genome: 12 Mb 16 Chromosomes Finished 1996

1989

Page 46: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Strategy: Hierarchical SG

Page 47: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Strategy: Hierarchical SG

50-200 kbp

Random Fragments

Page 48: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Strategy: Hierarchical SG

Tiling path

Physical Map

50-200 kbp

Random Fragments

Page 49: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Strategy: Hierarchical SG

Tiling path

Physical Map

50-200 kbp

Random Fragments

Shotgun

500 bp

Page 50: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

H Influenzae

The first sequenced free-living organism Genome Size: 1.8 Mb Strategy: Whole Genome Shotgun Sequencing The Institute for Genomic Research TIGR Finished 1995 Done in 13 months Cost: 50 cents per base

Page 51: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Whole Genome Shotgun Sequencing

Page 52: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Whole Genome Shotgun Sequencing

Shotgun

Number of Reads: N Read Length: L ATCCGGTACGGATCGTACA

Page 53: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

HGP FINISHED

Page 54: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

There is Never Enough Sequencing

courtesy: Batzoglou

Page 55: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

There is Never Enough Sequencing

100 million species

courtesy: Batzoglou

Page 56: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

There is Never Enough Sequencing

100 million species

7 billion individuals courtesy: Batzoglou

Page 57: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

There is Never Enough Sequencing

100 million species

7 billion individuals

Somatic mutations (e.g., HIV, cancer)

courtesy: Batzoglou

Page 58: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Sequencing Growth

Cost of one human genome • 2004: $30,000,000 • 2008: $100,000 • 2010: $10,000 • 2013: $4,000

(today) • ???: $300

courtesy: Batzoglou

Page 59: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Next Generation Sequencing

Very High Throughput Higher error rate Shorter Length Very Cheap

Page 60: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Computational Challenges

courtesy: Batzoglou

Page 61: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Computational ChallengesHow to assemble the reads?

courtesy: Batzoglou

Page 62: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Computational ChallengesHow to assemble the reads?For given genome size G, what are the requirements

courtesy: Batzoglou

Page 63: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Computational ChallengesHow to assemble the reads?For given genome size G, what are the requirements

Number of Reads: N

courtesy: Batzoglou

Page 64: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Computational ChallengesHow to assemble the reads?For given genome size G, what are the requirements

Number of Reads: NLength of Reads: L

courtesy: Batzoglou

Page 65: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Computational ChallengesHow to assemble the reads?For given genome size G, what are the requirements

Number of Reads: NLength of Reads: L

N

L

Possible or not

courtesy: Batzoglou

Page 66: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Coverage Condition

Page 67: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Coverage Condition

Ncov

is the minimum number of reads required to cover the sequence

Page 68: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Coverage Condition

Ncov

is the minimum number of reads required to cover the sequence

N

L

Ncov

impossible

Page 69: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

courtesy: Tse

Page 70: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

L-spectrum is given

Page 71: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

L-spectrum is given

ATAGCCTAGCGAT

Page 72: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

L-spectrum is given

ATAGCCTAGCGATATAGC

TAGCCAGCCTGCCTA

CCTAGCTAGC

TAGCGAGCGA

GCGAT

Page 73: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

de Bruijn Graph Add a node for each (L-1)- mer in a read Add edges for adjacent (L-1)-mers

Page 74: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

TAGC

AGCC

AGCG

GCCC

GCGA

CCCT CCTA

CTAG

ATAG

CGAT

de Bruijn Graph Add a node for each (L-1)- mer in a read Add edges for adjacent (L-1)-mers

Page 75: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

TAGC

AGCC

AGCG

GCCC

GCGA

CCCT CCTA

CTAG

ATAG

CGAT

de Bruijn Graph Add a node for each (L-1)- mer in a read Add edges for adjacent (L-1)-mers

Find an Eulerian Path

Page 76: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

ATAGCCCTAGCGAT

TAGC

AGCC

AGCG

GCCC

GCGA

CCCT CCTA

CTAG

ATAG

CGAT

de Bruijn Graph Add a node for each (L-1)- mer in a read Add edges for adjacent (L-1)-mers

Find an Eulerian Path

Page 77: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

A

B

A

Page 78: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

A

B

A

A AB B

Interleaved repeats

Page 79: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

A

B

A

A AB B

Interleaved repeats

A A A

Triple repeat

Page 80: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

A

B

A

A AB B

Interleaved repeats

A A A

Triple repeat

Lcritis the minimum read length where there exists a unique solution for the problem

Page 81: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Repeat Condition

N

L

L > Lcrit

Lcrit

impo

ssib

le

Page 82: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Putting together

N

Ncov

L̄Normalized Read Length

Normalized number of reads

Page 83: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Putting together

N

Ncov

1covered

not covered

Normalized Read Length

Normalized number of reads

Page 84: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Putting together

N

Ncov

1covered

not covered

L̄crit

triple orinterleaved repeats

no triple orinterleaved repeats

Normalized Read Length

Normalized number of reads

Page 85: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Putting together

N

Ncov

1covered

not covered

L̄crit

triple orinterleaved repeats

no triple orinterleaved repeats

Normalized Read Length

Normalized number of reads

Page 86: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 87: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 88: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 89: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 90: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 91: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 92: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 93: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 94: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 95: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 96: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

what about the rest?

I. Find two reads with the largest overlap and merge them into one read.II. Repeat 1 until only one contig remains.

Algorithm

The Greedy Algorithm

Page 97: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

RANDOM Sequence

N

Ncov

1

Page 98: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

RANDOM Sequence

N

Ncov

1

Is feasible

Motahari, Bresler, Tse (2012)

Page 99: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Human Chromosome 19

500 1000 1500 2000 2500 3000 3500 4000 45000

1

2

3

4

5

6

7

8

9

10

N

Ncov

L̄crit

Longest triple repeat Longest interleaved repeat

Longest repeat

(Bresler, Bresler & T. 12)

Page 100: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

Human Chromosome 19

500 1000 1500 2000 2500 3000 3500 4000 45000

1

2

3

4

5

6

7

8

9

10

N

Ncov

L̄crit

Longest triple repeat Longest interleaved repeat

Longest repeat

Greedy Algorithm

Modified de Bruijn Algorithm

(Bresler, Bresler & T. 12)

Page 101: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

WHY?

(Bresler, Bresler & T. 12)

Page 102: DNA Sequencing - Sharificpc.sharif.edu/acmicpc14/docs/talks/Motahari - DNA Sequencing.pdfDNA Sequencing sequence is the genome S = X 1 X 2 X 3 X 4...X G1 X G. Genome. Genome Human

WHY?

0 1000 2000 3000 40000

5

10

15

data

i.i.d. fit

log ( number of repeats) Human chromosome 19

(Bresler, Bresler & T. 12)