
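Page 1

[Figure: the 3-mers AAA, AAB, ABB, BBB and BBA drawn two ways. Overlap graph: one node per 3-mer, edges labeled with overlap length (1 or 2). De Bruijn graph: nodes are the 2-mers AA, AB, BB and BA.]

Overlap-Layout-Consensus (OLC) assembly

De Bruijn Graph based (DBG) assembly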
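The two graph types in the figure can be reproduced in a few lines. This is a minimal Python sketch, assuming the reads are exactly the five 3-mers shown and that self-overlaps are not drawn; the function names `overlap_graph` and `de_bruijn_graph` are hypothetical, not taken from any assembler.

```python
from collections import defaultdict


def overlap_graph(reads, min_olap=1):
    """Overlap graph: edge a -> b labeled with the length of the longest
    proper suffix of a that equals a prefix of b (self-edges skipped)."""
    edges = {}
    for a in reads:
        for b in reads:
            if a == b:
                continue
            # Try the longest proper overlap first, down to min_olap.
            for olap in range(min(len(a), len(b)) - 1, min_olap - 1, -1):
                if a[-olap:] == b[:olap]:
                    edges[(a, b)] = olap
                    break
    return edges


def de_bruijn_graph(kmers):
    """De Bruijn graph: nodes are (k-1)-mers; each k-mer contributes one
    edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


reads = ["AAA", "AAB", "ABB", "BBB", "BBA"]
olc_edges = overlap_graph(reads)
dbg = de_bruijn_graph(reads)
```

On these reads the overlap graph has 11 edges (five with overlap 2, six with overlap 1), while the De Bruijn graph needs only the four 2-mer nodes, with one edge per 3-mer.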

Page 2

[Figure: a large graph whose nodes are DNA 7-mers, such as CGAAAAA, GAAAAAC and AAAAAAC.]

Page 3

[Figure: the same 7-mer graph as on page 2.]

Page 4

[Figure: overlap graph of 7-character reads (for example to_ever, o_every, _every_, urn_tur, _turn_t, _there_ and a_seaso), with edges labeled by overlap length 4, 5 or 6.]

Green edges can be inferred from blue.
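The caption refers to transitive edges: if a→b and b→c are both present, the edge a→c adds no information. Below is a minimal sketch, assuming exact overlaps and a minimum overlap of 4 (the smallest edge label in the figure). `transitive_reduction` is a hypothetical helper, and its single-intermediate check is a simplification of the transitive reduction used to build string graphs.

```python
def overlap_graph(reads, min_olap):
    """Edge a -> b labeled with the longest suffix/prefix overlap >= min_olap."""
    edges = {}
    for a in reads:
        for b in reads:
            if a == b:
                continue
            for olap in range(min(len(a), len(b)) - 1, min_olap - 1, -1):
                if a[-olap:] == b[:olap]:
                    edges[(a, b)] = olap
                    break
    return edges


def transitive_reduction(edges):
    """Drop edge a -> c when a path a -> b -> c implies the same overlap,
    i.e. olap(a, b) + olap(b, c) - len(b) == olap(a, c)."""
    out = {}
    for (a, b), w in edges.items():
        out.setdefault(a, {})[b] = w
    reduced = dict(edges)
    for (a, c), w_ac in edges.items():
        for b, w_ab in out.get(a, {}).items():
            w_bc = out.get(b, {}).get(c)
            if b != c and w_bc is not None and w_ab + w_bc - len(b) == w_ac:
                reduced.pop((a, c), None)  # a -> c is inferable via b
                break
    return reduced


text = "to_every_thing"
reads = [text[i:i + 7] for i in range(len(text) - 6)]  # eight 7-character reads
edges = overlap_graph(reads, min_olap=4)
reduced = transitive_reduction(edges)
```

For the eight reads of to_every_thing, the 18 overlap edges reduce to the 7 edges of overlap 6, leaving a single unambiguous path from to_ever to y_thing.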

Page 5

[Figure: graph of the 5-mers from two haplotypes, labeled Maternal and Paternal.]

Maternal: GTAGTCTCGGCATATGCGCCG

Paternal: GTAGTCTCGGTATATGCGCCG
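The haplotype picture can be checked by comparing k-mer sets. This is a minimal sketch, assuming the first sequence is the maternal haplotype and the second the paternal one (following the label order in the figure) and k = 5; `kmers` is a hypothetical helper.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


maternal = "GTAGTCTCGGCATATGCGCCG"
paternal = "GTAGTCTCGGTATATGCGCCG"
k = 5

m, p = kmers(maternal, k), kmers(paternal, k)
shared = m & p              # 5-mers present in both haplotypes
maternal_only = m - p       # 5-mers unique to the maternal haplotype
paternal_only = p - m       # 5-mers unique to the paternal haplotype

# Positions where the two (equal-length) haplotypes differ.
diff = [i for i, (a, b) in enumerate(zip(maternal, paternal)) if a != b]
```

The haplotypes differ only at index 10 (C vs. T). Because that position lies far enough from either end, it changes k = 5 consecutive 5-mers on each haplotype, while the other 12 are shared.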

Page 6

[Figure: overlap graph of 8-character reads (for example to_every, very_thi, _thing_t, ng_turn_, turn_tur, urn_turn, _turn_th, n_there_, ere_is_a, is_a_sea, a_season), with edges labeled by overlap length 4 to 7. Panel labels: to_every_thing_turn_; _turn_there_is_a_season; _turn (repeated).]

Page 7

[Figure with panels labeled Genome and Contigs.]
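A toy example shows why an assembly ends up as contigs rather than one genome-length sequence. This sketch assumes error-free reads covering every k-mer. The genome `GGACTGCCACTGTT` is hypothetical, chosen so that the 4-mer ACTG occurs twice. The code builds a De Bruijn graph and spells its maximal non-branching paths (unitigs), which is one common way to define contigs; the function names are also hypothetical.

```python
from collections import defaultdict


def de_bruijn(genome, k):
    """One edge per distinct k-mer, from its (k-1)-prefix to its (k-1)-suffix."""
    out_edges = defaultdict(list)
    indeg = defaultdict(int)
    seen = set()
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        if kmer in seen:  # repeated k-mers collapse into a single edge
            continue
        seen.add(kmer)
        u, v = kmer[:-1], kmer[1:]
        out_edges[u].append(v)
        indeg[v] += 1
    nodes = set(out_edges) | set(indeg)
    return out_edges, indeg, nodes


def contigs(genome, k):
    """Spell maximal non-branching paths; isolated cycles are ignored here."""
    out_edges, indeg, nodes = de_bruijn(genome, k)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(out_edges[n]) == 1

    result = []
    for v in sorted(nodes):
        if one_in_one_out(v):
            continue  # paths start only at branching or terminal nodes
        for w in out_edges[v]:
            path = v
            while one_in_one_out(w):
                path += w[-1]
                w = out_edges[w][0]
            path += w[-1]
            result.append(path)
    return result


genome = "GGACTGCCACTGTT"
print(contigs(genome, 4))  # ['ACTG', 'CTGCCACT', 'CTGTT', 'GGACT']
```

The repeat ACTG becomes its own contig, and the graph alone cannot say which of the two flanking contigs (CTGCCACT or CTGTT) follows it each time; that ambiguity is where the contigs break.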

Page 8

Chaisson MJ, et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature. 2015 Jan 29;517(7536):608-11.