Lecture: Graph-Based Genome Scaffolding - IGMigm.univ-mlv.fr/~mweller/scaf_lec.pdf · Sanger...

Preview:

Citation preview

Lecture: Graph-Based

Genome Scaffolding

Mathias Wellermathias.weller@univ-mlv.fr

Montpellier, 2017

DNA

- double strand

- inside nucleus (safe)

RNA

- single strand

- outside nucleus

- transfers genetic code

- Thymine (T) → Uracil (U)

Polymerase

2/33

DNA

- double strand

- inside nucleus (safe)

RNA

- single strand

- outside nucleus

- transfers genetic code

- Thymine (T) → Uracil (U)

Polymerase

2/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAATGGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA

GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAATGGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA

GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT

GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTAGGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT

GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTAGGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT

GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTAGGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT

GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA

GGA*

GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT

GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA

GGA*GGACCTGCCCA*

GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT

GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA

GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT

GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA

GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Sanger Sequencing [Sanger et al ’77]

CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT

GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA

GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*

Sanger Sequencing

1. split helix & create thousands of copies

2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)

4. stir & let polymerase act

5. measure the length of each fragment

each length is the position of a T in the template

Problem

unreliable after a couple hundred bp

chop up DNA into pieces and read those

3/33

Next Generation Sequencing ( )

ACCA

AGTCTGGAGAGTC

TGAGTACCA

ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG

ACTCATGGTCTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTCTGGAGAGTC

TGAGTACCA

ACTCA......ACCTC

TGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG

ACTCATGGTCTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTCTGGAGAGTC

TGAGTACCA

ACTCA......ACCTC

TGGTACTCA......ACCTCTCAG

TGGTACTCA......ACCTCTCAGACCTCTCAG

ACTCATGGTCTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTC

TGGAGAGTC

TGAGTACCA

ACTCA......ACCTC

TGGTACTCA......ACCTCTCAG

TGGTACTCA......ACCTCTCAGACCTCTCAG

ACTCATGGTCTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTC

TGGAGAGTC

TGAGTACCA

ACTCA......ACCTCTGGTACTCA......ACCTCTCAG

TGGTACTCA......ACCTCTCAG

ACCTCTCAG

ACTCATGGTCTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTC

TGGAGAGTC

TGAGTACCA

ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAG

ACCTCTCAG

ACTCATGGT

CTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTC

TGGAGAGTC

TGAGTACCA

ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAG

ACCTCTCAG

ACTCATGGT

CTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTCTGGAGAGTC

TGAGTACCA

ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG

ACTCATGGT

CTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTCTGGAGAGTC

TGAGTACCA

ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG

ACTCATGGT

CTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTCTGGAGAGTC

TGAGTACCA

ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG

ACTCATGGT

CTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Next Generation Sequencing ( )

ACCA

AGTCTGGAGAGTC

TGAGTACCA

ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG

ACTCATGGT

CTGAGAGGT......TGAGTACCA

TGGTACTCA......ACCTCTCAG

1. chop DNA into smaller pieces

2. add anchors to each end of each piece

3. “flow cell” containing anchor places

4. strand anchors its two ends to two anchor places

5. polymerase completes the strand into double-strand

6. double strand is denaturized into single strands

7. rinse, repeat (last 3 steps) until flow chip is “full”

8. read all strands from their anchor points outwards

Paired-End reads (distance between reads = “insert size”)

4/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTCACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA

TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA

TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA

TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 1: parts of the sequence might not be covered by reads

sequence with “high coverage”

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA

TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 1: parts of the sequence might not be covered by reads

sequence with “high coverage”

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 1: parts of the sequence might not be covered by reads

sequence with “high coverage”

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 1: parts of the sequence might not be covered by reads

sequence with “high coverage”

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 2: Shortest Common Superstring is NP-hard

“Overlap-Layout-Consensus” assemblers

Problem: Θ(n2) too slow in practice

DeBruijn-graph based assembly

1. chop all reads into “k-mers”

2. builds overlap graph

(“DeBruijn graph”)

3. find

k = 4

GAAC

AACT

ACTTCTTC

TTCG

TCGC

CGCT

CCTT

CTTG

TTGG

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 2: Shortest Common Superstring is NP-hard

“Overlap-Layout-Consensus” assemblers

1. produce best pairwise overlaps

2. layout the reads according to the overlaps

3. for each position, compute consensus base

Problem: Θ(n2) too slow in practice

DeBruijn-graph based assembly

1. chop all reads into “k-mers”

2. builds overlap graph

(“DeBruijn graph”)

3. find

k = 4

GAAC

AACT

ACTTCTTC

TTCG

TCGC

CGCT

CCTT

CTTG

TTGG

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTC CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 2: Shortest Common Superstring is NP-hard

“Overlap-Layout-Consensus” assemblers

1. produce best pairwise overlaps

2. layout the reads according to the overlaps

3. for each position, compute consensus base

Problem: Θ(n2) too slow in practice

DeBruijn-graph based assembly

1. chop all reads into “k-mers”

2. builds overlap graph

(“DeBruijn graph”)

3. find

k = 4

GAAC

AACT

ACTTCTTC

TTCG

TCGC

CGCT

CCTT

CTTG

TTGG

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 2: Shortest Common Superstring is NP-hard

“Overlap-Layout-Consensus” assemblers

Problem: Θ(n2) too slow in practice

DeBruijn-graph based assembly

1. chop all reads into “k-mers”

2. builds overlap graph

(“DeBruijn graph”)

3. find

k = 4

GAAC

AACT

ACTTCTTC

TTCG

TCGC

CGCT

CCTT

CTTG

TTGG

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 2: Shortest Common Superstring is NP-hard

“Overlap-Layout-Consensus” assemblers

Problem: Θ(n2) too slow in practice

DeBruijn-graph based assembly

1. chop all reads into “k-mers”

2. builds overlap graph

(“DeBruijn graph”)

3. find path using all overlaps

k = 4

GAAC

AACT

ACTTCTTC

TTCG

TCGC

CGCT

CCTT

CTTG

TTGG

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 2: Shortest Common Superstring is NP-hard

“Overlap-Layout-Consensus” assemblers

Problem: Θ(n2) too slow in practice

DeBruijn-graph based assembly

1. chop all reads into “k-mers”

2. builds overlap graph

(“DeBruijn graph”)

3. find Eulerian path

k = 4

GAAC

AACT

ACTTCTTC

TTCG

TCGC

CGCT

CCTT

CTTG

TTGG

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 2: Shortest Common Superstring is NP-hard

“Overlap-Layout-Consensus” assemblers

Problem: Θ(n2) too slow in practice

DeBruijn-graph based assembly

1. chop all reads into “k-mers”

2. builds overlap graph

(“DeBruijn graph”)

3. find Eulerian path

k = 4

GAAC

AACT

ACTTCTTC

TTCG

TCGC

CGCTCCTT

CTTG

TTGG

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 2: Shortest Common Superstring is NP-hard

“Overlap-Layout-Consensus” assemblers

Problem: Θ(n2) too slow in practice

DeBruijn-graph based assembly

1. chop all reads into “k-mers”

2. builds overlap graph

(“DeBruijn graph”)

3. find Eulerian path

k = 4

GAAC

AACT

ACTTCTTC

TTCG

TCGC

CGCTCCTT

CTTG

TTGG

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 3: repeats (common in DNA) make assembly ambiguous

end product is a set of “contiguous regions”

Problem: “contig soup” not very useful

But: we have paired-end information!

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 3: repeats (common in DNA) make assembly ambiguous

end product is a set of “contiguous regions”

Problem: “contig soup” not very useful

But: we have paired-end information!

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 3: repeats (common in DNA) make assembly ambiguous

end product is a set of “contiguous regions”

Problem: “contig soup” not very useful

But: we have paired-end information!

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 3: repeats (common in DNA) make assembly ambiguous

end product is a set of “contiguous regions”

Problem: “contig soup” not very useful

But: we have paired-end information!

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 3: repeats (common in DNA) make assembly ambiguous

end product is a set of “contiguous regions”

Problem: “contig soup” not very useful

But: we have paired-end information!

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 3: repeats (common in DNA) make assembly ambiguous

end product is a set of “contiguous regions”

Problem: “contig soup” not very useful

But: we have paired-end information!

5/33

Sequence Assembly: Overview

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC

ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC

Goal: reconstruct sequence

Idea: overlap reads

Problem 3: repeats (common in DNA) make assembly ambiguous

end product is a set of “contiguous regions”

Problem: “contig soup” not very useful

But: we have paired-end information!

5/33

Genome Scaffolding: Previous Work

Goal: order & orient contigs

Idea: use pairing information on reads to “link” contigs together

- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]

I removes reads in high-coverage area (likely repeats)I orientation step (heuristic) + ordering step (heuristic)I coded in Pearl (!!!)I (observed sparse contig graph)

- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]

- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]

- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]

- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]

- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]

6/33

Genome Scaffolding: Previous Work

Goal: order & orient contigs

Idea: use pairing information on reads to “link” contigs together

- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]

- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]

I heuristic contig extensionI “reasonable time”

- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]

- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]

- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]

- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]

6/33

Genome Scaffolding: Previous Work

Goal: order & orient contigs

Idea: use pairing information on reads to “link” contigs together

- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]

- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]

- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]

I np+O(1) time (p =#edge-deletions (≥ feedback edge set))I most work done by a heuristic “graph contraction”

- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]

- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]

- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]

6/33

Genome Scaffolding: Previous Work

Goal: order & orient contigs

Idea: use pairing information on reads to “link” contigs together

- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]

- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]

- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]

- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]

I Mixed-Integer Quadratic ProgrammingI deals with uncertain data (slack variables)

“intractable even for small # of contigs”

I heuristic workaround:I solve relaxed formulation & use slack values ILP

- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]

- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]

6/33

Genome Scaffolding: Previous Work

Goal: order & orient contigs

Idea: use pairing information on reads to “link” contigs together

- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]

- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]

- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]

- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]

- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]

I orientation step: use FPT algo for Odd Cycle TransersalI ordering step: heuristic

- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]

6/33

Genome Scaffolding: Previous Work

Goal: order & orient contigs

Idea: use pairing information on reads to “link” contigs together

- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]

- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]

- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]

- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]

- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]

- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]

6/33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

2

5

10

9

14

3

613

10

14

3

6

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

2

5

10

9

14

3

613

10

14

3

6

Strategy

1. map reads into contigs

2. pair contigs according to read-pairing (weighted)

3. cover “scaffold graph” with (heavy) alternating paths

each path corresponds to a chromosome

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGG

GTTTTTAC

CTACTGA

2

5

10

9

14

3

613

10

14

3

6

Strategy

1. map reads into contigs

2. pair contigs according to read-pairing (weighted)

3. cover “scaffold graph” with (heavy) alternating paths

each path corresponds to a chromosome

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

2

5

10

9

14

3

613

10

14

3

6

Strategy

1. map reads into contigs

2. pair contigs according to read-pairing (weighted)

3. cover “scaffold graph” with (heavy) alternating paths

each path corresponds to a chromosome

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

2

5

10

9

14

3

613

10

14

3

6

Strategy

1. map reads into contigs

2. pair contigs according to read-pairing (weighted)

3. cover “scaffold graph” with (heavy) alternating paths

each path corresponds to a chromosome

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

2

5

10

9

14

3

613

10

14

3

6

Strategy

1. map reads into contigs

2. pair contigs according to read-pairing (weighted)

3. cover “scaffold graph” with (heavy) alternating paths

each path corresponds to a chromosome

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

25

10

9

14

3

613

10

14

3

6

Strategy

1. map reads into contigs

2. pair contigs according to read-pairing (weighted)

3. cover “scaffold graph” with (heavy) alternating paths

each path corresponds to a chromosome

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

25

10

9

14

3

613

10

14

3

6

Strategy

1. map reads into contigs

2. pair contigs according to read-pairing (weighted)

3. cover “scaffold graph” with (heavy) alternating paths

each path corresponds to a chromosome

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

25

10

9

14

3

613

10

14

3

6

Strategy

1. map reads into contigs

2. pair contigs according to read-pairing (weighted)

3. cover “scaffold graph” with (heavy) alternating paths

each path corresponds to a chromosome

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

25

10

9

14

3

613

10

14

3

6

ScaffoldingInput: Graph G , perfect matching M, weights ω, k, σp ∈ N

Question: Can G be covered by

≤σp alternating paths

σc alternating cycles

of total weight ≥ k?

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

25

10

9

14

3

613

10

14

3

6

ScaffoldingInput: Graph G , perfect matching M, weights ω, k, σp, σc ∈ N

Question: Can G be covered by

≤σp alternating paths &

≤σc alternating cycles

of total weight ≥ k?

7 /33

Graph-Based Scaffolding

AACGACAC

TCCTTGGG

TTTTTACG

TCGCGG

CTAGGCCATTGATTGCGGGTCCAGGTGCTG

GTTAATGTCCGAGCATAAAACTCTGGTTGGC

GTACTGAACTTGGGTTCCATAGGACCCAGA

AGAGCTTGACAGTAACACATTTAGGAGCACGCG

CGTCGCGG

ACTTGGGGTTTTTAC

CTACTGA

25

10

9

14

3

613

10

14

3

6

Exact ScaffoldingInput: Graph G , perfect matching M, weights ω, k, σp, σc ∈ N

Question: Can G be covered by

σp alternating paths &

σc alternating cycles

of total weight ≥ k?

7 /33

19

10

71

26

22

32

59

71

78

2 60

85

1

38

1

17

19

7

25

8/33

140

262

149

119

336

116

397

72

1

75

14

5

100

1

21

16

10

37

45

1

22

1

7

61

78

69

1

15

51

40

65

2

61

8/33

1

63

94

25

9

28

29

55

2

17

1

67

62

1

5

71

26

24

47

83

71

34

71

68

72

1

51

19

79

54

151

67

6

66

34

68

59

2

23

11

12

1

5

1

2

6

1

49

13

71

28

71

5

6

11

33

67

24

88

82

2

42

25

1

39

14

65

5

2

70

24

89

46

1

1

1

80

1

73

64

36

72

33

73

67

59

1

14

1

1

70

3

1

33

53

2

2

1

135

4

2980

74

80

84

11

31

1

1

62

73

38

31

33

77

108

28

13

1 4

1

70

8

5

1

47

3

31

2

1

18

56

49

45

2

2

8/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Construction

Given a directed graph D .

1. make a copy of D

2. duplicate all vertices M3. “slide” down all arrow tips & ignore directions

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Construction

Given a directed graph D .

1. make a copy of D2. duplicate all vertices M

3. “slide” down all arrow tips & ignore directions

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Construction

Given a directed graph D .

1. make a copy of D2. duplicate all vertices M3. “slide” down all arrow tips & ignore directions

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Lemma

D admits a directed Hamiltonian path ⇔ M can be covered with a

single alternating path in G .

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Lemma

D admits a directed Hamiltonian path ⇔ M can be covered with a

single alternating path in G .

“⇒”: replace each v in the Hamiltonian path by vup → vlow .

alternating X covers M X

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Lemma

D admits a directed Hamiltonian path ⇔ M can be covered with a

single alternating path in G .

“⇒”: replace each v in the Hamiltonian path by vup → vlow .

alternating X covers M X

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Lemma

D admits a directed Hamiltonian path ⇔ M can be covered with a

single alternating path in G .

“⇐”: contract each matching edge in the covering alternating path.

hits all vertices exactly once X is valid directed path X

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Lemma

D admits a directed Hamiltonian path ⇔ M can be covered with a

single alternating path in G .

“⇐”: contract each matching edge in the covering alternating path.

hits all vertices exactly once X is valid directed path X

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Theorem

Scaffolding is NP-hard, even restricted to

• bipartite graphs

• (σp, σc) ∈ {(0, 1), (1, 0)} and

• ω : E → {0}.

Corollary

Scaffolding with 2 weights is NP-hard in any

sufficiently dense graph class.

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Theorem

Scaffolding is NP-hard, even restricted to

• supergraphs of bipartite graphs

• (σp, σc) ∈ {(0, 1), (1, 0)} and

• ω : E → {0, 1}.

Corollary

Scaffolding with 2 weights is NP-hard in any

sufficiently dense graph class.

9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Theorem

Scaffolding is NP-hard, even restricted to

• supergraphs of bipartite graphs

• (σp, σc) ∈ {(0, 1), (1, 0)} and

• ω : E → {0, 1}.

Corollary

Scaffolding with 2 weights is NP-hard in any

sufficiently dense graph class.9/33

Hardness Warm up: Hamiltonian Path

Recall: Scaffolding

Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &

≤ σc alternating cycles of total weight ≥ k?

Theorem

Exact Scaffolding is NP-hard, even restricted to

• supergraphs of bipartite graphs

• (σp, σc) ∈ {(0, 1), (1, 0)} and

• ω : E → {0, 1}.

Corollary

Exact Scaffolding with 2 weights is NP-hard in any

sufficiently dense graph class.9/33

Scaffolding in Co-Bipartites

???

?

Wait, what?

Recap: Corollary

Scaffolding with 2 weights is NP-hard in any

sufficiently dense graph class.

Unweighted!

10 /33

Scaffolding in Co-Bipartites

???

?

Wait, what?

Recap: Corollary

Scaffolding with 2 weights is NP-hard in any

sufficiently dense graph class.

Unweighted!

10 /33

Scaffolding in Co-Bipartites

???

?

Wait, what?

Recap: Corollary

Scaffolding with 2 weights is NP-hard in any

sufficiently dense graph class.

Unweighted!

10 /33

Scaffolding in Co-Bipartites

???

?

Wait, what?

Recap: Corollary

Scaffolding with 2 weights is NP-hard in any

sufficiently dense graph class.

Unweighted!

10 /33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

no edges between X & Y need 2 objects (paths/cycles)

otherwise can always cover G with 1 path

TODO

decide if we can cover with 1 cycle

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X

#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X

#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X

#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X

#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y

#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y

#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y

#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Observation

∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]

Observation

#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd

find any non-matching edge between X & Y#matching edges between X & Y is 0

all other cases are X (tedious case analysis)

11 / 33

Scaffolding in Unweighted Co-Bipartites

X Y

forbidden!

Theorem

Scaffolding can be solved in O(n + m) time on co-bipartite graphs

11 / 33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `

M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `

M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g

delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce k

or g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u

u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below”

∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−`

take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Case 2parent g of p is matched “above”

either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`

12 /33

Scaffolding in Unweighted Trees

Observation

no alternating cycles in a tree

Observation

consider a lowest leaf `M is perfect ` matched

parent p of ` has only 1 child

Case 1parent g of p is matched “below”

g is matched to a leaf `′

always take `−p−g−`′

g

p

`

Theorem

Scaffolding can be solved in O(n) time on unweighted trees

12 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[p, x ]v = max. weight collected

below v with p finished paths

“under x ”

up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[p, x ]v = max. weight collected

below v with p finished paths

“under x ”

up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[p, x ]v = max. weight collected

below v with p finished paths

“under x ”

up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .[p,X]v := max

p1, p2, . . . , pc∑pi = p

∑1≤i≤c

maxx∈{X,X}

[pi , x ]vi

BAD IDEA!

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[p, x ]v = max. weight collected

below v with p finished paths

“under x ”

up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .[p,X]v := max

p1, p2, . . . , pc∑pi = p

∑1≤i≤c

maxx∈{X,X}

[pi , x ]vi

BAD IDEA!

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈M

ω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M

{[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

v

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 0

1 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

Scaffolding in Weighted Trees

Dynamic Programming Idea

bottom-up traversal; in each

vertex v , need to remember:

• #paths used below v• v incident with non-matching?

Semantics

[j , p, x ]v = max. weight collected

below v with p finished paths

“under x ” up to vj(abbrev: last child [p, x ]v)

v

v0

v1 v2 v3v4

v5

vj

125

9 6

3

1 1 X 0

1 X 0

1 X 01 X 0

1 X 01 0 X 9

1 1 X 0

2 1 X 9

2 2 X 0

1 X 9

2 X 0

1 1 X 0

2 1 X 3

2 2 X 0

1 X 3

2 X 0

1 2 X 9

1 3 X 0

1 2 X 9

1 3 X 0

2 3 X 12

2 3 X 21

2 4 X 9

2 4 X 12

2 5 X 0

. . . .

Recurrence

Let v1, v2, . . . , vc be the children of v .

[0, 0,X]v :=0

[j , p, x ]v :=maxpj≤p

max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{

[pj − 1,X]vj[pj − 1,X]vj

}+ [j − 1, p − pj , x ]v if vvj ∈M

13 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding

1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

Proof

Result S∗ is a valid solution X

Note: taking an edge forbids ≤ 3 OPT edges

mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them

3ω(S∗) ≥ OPT

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

Proof

Result S∗ is a valid solution XNote: taking an edge forbids ≤ 3 OPT edges

mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them

3ω(S∗) ≥ OPT

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

Proof

Result S∗ is a valid solution XNote: taking an edge forbids ≤ 3 OPT edges

mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them

3ω(S∗) ≥ OPT

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

Proof

Result S∗ is a valid solution XNote: taking an edge forbids ≤ 3 OPT edges

mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them

3ω(S∗) ≥ OPT

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

Proof

Result S∗ is a valid solution XNote: taking an edge forbids ≤ 3 OPT edges

mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them

3ω(S∗) ≥ OPT

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

Theorem

Scaffolding in complete graphs can be

3-approximated in O(|V | log |V |) time.

Remark

For Exact Scaffolding, we have to keep an eye on the number of

components too.

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

Theorem

Scaffolding in complete (bipartite) graphs can be

3-approximated in O(|V | log |V |) time.

Remark

For Exact Scaffolding, we have to keep an eye on the number of

components too.

14 /33

3-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Scaffolding1. sort all edges by weight

2. repeatedly take heaviest edge, if

possible

Theorem

Scaffolding in complete (bipartite) graphs can be

3-approximated in O(|V | log |V |) time.

Remark

For Exact Scaffolding, we have to keep an eye on the number of

components too.

14 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding

1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remain

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remain

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remain

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remain

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remain

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remain

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remain

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remain

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remainProof

Result S∗ is a valid solution X

ω(S∗) ≥ ω(fix) ≥ ω(S)/2 ≥ OPT/2

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remainProof

Result S∗ is a valid solution Xω(S∗) ≥ ω(fix) ≥ ω(S)/2 ≥ OPT/2

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remainTheorem

Exact Scaffolding in complete graphs can be

2-approximated in O(|V |2) time.

Remark

For Scaffolding, replace Step 3 by either merging cycles or

removing lightest edge, whatever looses less weight

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remainTheorem

Exact Scaffolding in complete (bipartite) graphs can be

2-approximated in O(|V |2) time.

Remark

For Scaffolding, replace Step 3 by either merging cycles or

removing lightest edge, whatever looses less weight

15 /33

2-Approximation in Dense Graphs

σp = 1, σc = 1?

Approximate Exact Scaffolding1. compute max.-weight perfect

matching S S ∪M is collection of cycles

2. “fix” all but lightest edge per cycle

3. repeatedly flip any lightest non-fix

4-cycle intersecting 2 cycles

until at most σc + σp cycles remain

4. repeatedly remove lightest non-fix

cycle-edge

until at most σc cycles remainTheorem

Exact Scaffolding in complete (bipartite) graphs can be

2-approximated in O(|V |2) time.

Remark

For Scaffolding, replace Step 3 by either merging cycles or

removing lightest edge, whatever looses less weight

15 /33

Exact Algorithms I: Brute Force

j ii − 2j ii − 2j ii − 2

[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj

[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E

[p, c , i − 1]i = maxj<i−2j even

{

[p − 1, c , j ]i−2

[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E

Observation

An ordering of V (G ) certifies YES-instances of Scaffolding.

try all O(n!) certificates

contigs force every other vertex O(n!!)

16 /33

Exact Algorithms I: Brute Force

j ii − 2j ii − 2j ii − 2

[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj

[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E

[p, c , i − 1]i = maxj<i−2j even

{

[p − 1, c , j ]i−2

[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E

Observation

An ordering of V (G ) certifies YES-instances of Scaffolding.

try all O(n!) certificates

contigs force every other vertex O(n!!)

16 /33

Exact Algorithms I: Brute Force

j ii − 2

j ii − 2j ii − 2

[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj

[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E

[p, c , i − 1]i = maxj<i−2j even

{

[p − 1, c , j ]i−2

[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E

Observation

An ordering of V (G ) certifies YES-instances of Scaffolding.

try all O(n!) certificates

contigs force every other vertex O(n!!)

16 /33

Exact Algorithms I: Brute Force

j ii − 2

j ii − 2

j ii − 2

[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj

[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E

[p, c , i − 1]i = maxj<i−2j even

{[p − 1, c , j ]i−2

[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E

Observation

An ordering of V (G ) certifies YES-instances of Scaffolding.

try all O(n!) certificates

contigs force every other vertex O(n!!)

16 /33

Exact Algorithms I: Brute Force

j ii − 2j ii − 2

j ii − 2

[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj

[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E

[p, c , i − 1]i = maxj<i−2j even

{[p − 1, c , j ]i−2

[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E

Observation

An ordering of V (G ) certifies YES-instances of Scaffolding.

try all O(n!) certificates

contigs force every other vertex O(n!!)

16 /33

Exact Algorithms I: Brute Force

j ii − 2j ii − 2

j ii − 2

[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj

[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E

[p, c , i − 1]i = maxj<i−2j even

{[p − 1, c , j ]i−2

[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E

Observation

An ordering of V (G ) certifies YES-instances of Scaffolding.

try all O(n!) certificates

contigs force every other vertex O(n!!)

16 /33

Exact Algorithms I: Brute Force

j ii − 2j ii − 2

j ii − 2

[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj

[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E

[p, c , i − 1]i = maxj<i−2j even

{[p − 1, c , j ]i−2

[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E

Observation

An ordering of V (G ) certifies YES-instances of Scaffolding.

try all O(n!) certificates

contigs force every other vertex O(n!!)

16 /33

Exact Algorithms I: Brute Force

j ii − 2j ii − 2

j ii − 2

[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj

[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E

[p, c , i − 1]i = maxj<i−2j even

{[p − 1, c , j ]i−2

[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E

Observation

An ordering of V (G ) certifies YES-instances of Scaffolding.

try all O(n!) certificates

contigs force every other vertex O(√

2n · n/2!)

16 /33

Exact Algorithms II: Dynamic Programming

S − xy

x yw

u

Semantics

[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v

Computation

Let xy ∈M. Then, [{xy}, 0, 0, x , y ] := 0 and

[S , p, c , u, y ] := maxw∈G [S−xy ]

u 6=w

[S − xy , p, c , u,w ] + ω(wx)

[S , p, c , x , y ] :=

maxu,w∈G [S−xy ]

{

[S − xy , p − 1, c , u,w ]

[S − xy , p, c − 1, u,w ] + ω(wu) if wu ∈ E (G ) \M

17 /33

Exact Algorithms II: Dynamic Programming

S − xy

x ywu

Semantics

[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v

Computation

Let xy ∈M. Then, [{xy}, 0, 0, x , y ] := 0 and

[S , p, c , u, y ] := maxw∈G [S−xy ]

u 6=w

[S − xy , p, c , u,w ] + ω(wx)

[S , p, c , x , y ] :=

maxu,w∈G [S−xy ]

{

[S − xy , p − 1, c , u,w ]

[S − xy , p, c − 1, u,w ] + ω(wu) if wu ∈ E (G ) \M

17 /33

Exact Algorithms II: Dynamic Programming

S − xy

x ywu

Semantics

[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v

Computation

Let xy ∈M. Then, [{xy}, 0, 0, x , y ] := 0 and

[S , p, c , u, y ] := maxw∈G [S−xy ]

u 6=w

[S − xy , p, c , u,w ] + ω(wx)

[S , p, c , x , y ] := maxu,w∈G [S−xy ]

{[S − xy , p − 1, c , u,w ]

[S − xy , p, c − 1, u,w ] + ω(wu) if wu ∈ E (G ) \M

17 /33

Exact Algorithms II: Dynamic Programming

S − xy

x ywu

Semantics

[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v

Computation

Let xy ∈M. Then, [{xy}, 0, 0, x , y ] := 0 and

[S , p, c , u, y ] := maxw∈G [S−xy ]

u 6=w

[S − xy , p, c , u,w ] + ω(wx)

[S , p, c , x , y ] := maxu,w∈G [S−xy ]

{[S − xy , p − 1, c , u,w ]

[S − xy , p, c − 1, u,w ] + ω(wu) if wu ∈ E (G ) \M

17 /33

Exact Algorithms II: Dynamic Programming

S − xy

x ywu

Semantics

[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v

Theorem

Scaffolding can be solved in O(√

2nn3σpσc) time.

17 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both?

remove all non-matching edges from parent u, except uv

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both?

remove all non-matching edges from parent u, except uv

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both?

remove all non-matching edges from parent u, except uv

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both?

remove all non-matching edges from parent u, except uv

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both?

remove all non-matching edges from parent u, except uv

u

v

Observation

• v in path & u in cycle 1 path X• v in path & u in path 2 paths X

unless it’s the same path!

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both?

remove all non-matching edges from parent u, except uv

u

v

Observation

• v in path & u in cycle 1 path X• v in path & u in path 2 paths X

unless it’s the same path!

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both?

remove all non-matching edges from parent u, except uv

u

v

Observation

• v in path & u in cycle 1 path X• v in path & u in path 2 paths X

unless it’s the same path!But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both?

remove all non-matching edges from parent u, except uv

u

v

Observation

• v in path & u in cycle 1 path X• v in path & u in path 2 paths X

unless it’s the same path!

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if unweighted, can we take both? X

remove all non-matching edges from parent u, except uv

u

v

Observation

• v in path & u in cycle 1 path X• v in path & u in path 2 paths X

unless it’s the same path!But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!

remove all non-matching edges from parent u, except uv

u

v

Observation

• v in path & u in cycle 1 path X• v in path & u in path 2 paths X

unless it’s the same path!

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!

remove all non-matching edges from parent u, except uv

u

v

Observation

• v in path & u in cycle 1 path X• v in path & u in path 2 paths X

unless it’s the same path!

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!

remove all non-matching edges from parent u, except uv

Corollary

Scaffolding can be solved in O(n) on quasi-forests if σp = 0.

Scaffolding can be solved in O(n2σp+1) in quasi-forests.

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!

remove all non-matching edges from parent u, except uv

Corollary

Scaffolding can be solved in O(n) on quasi-forests if σp = 0.Scaffolding can be solved in O(n2σp+1) in quasi-forests.

But is it even NP-hard?

18 /33

Sparse Graphs: Quasi-Forest

Recall

• Scaffolding is hard in any sufficiently

dense graph class

• Scaffolding is easy in trees

A Shot at Sparsity

G is Quasi-forest ⇔ G −M is forest

Observation

Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!

remove all non-matching edges from parent u, except uv

Corollary

Scaffolding can be solved in O(n) on quasi-forests if σp = 0.Scaffolding can be solved in O(n2σp+1) in quasi-forests.

But is it even NP-hard?18 /33

Sparse Graphs: Quasi-Forest

Weighted 2-SATInput: ϕ on X in 2-CNF, weights ω : X × {0, 1} → N, k ∈ N

Question: is there a satisfying assignment for ϕ of weight ≤ k?

Remark

Independent Set is special case of Weighted 2-SAT

19 /33

Sparse Graphs: Quasi-Forest

Weighted 2-SATInput: ϕ on X in 2-CNF, weights ω : X × {0, 1} → N, k ∈ N

Question: is there a satisfying assignment for ϕ of weight ≤ k?

Remark

Independent Set is special case of Weighted 2-SAT

19 /33

Sparse Graphs: Quasi-Forest

19 /33

Sparse Graphs: Quasi-Forest

19 /33

Sparse Graphs: Quasi-Forest

19 /33

Sparse Graphs: Quasi-Forest

19 /33

Sparse Graphs: Quasi-Forest

19 /33

Sparse Graphs: Quasi-Forest

19 /33

Sparse Graphs: Quasi-Forest

Observation

∃ weight-k satisfying assignment

∃ weight-k cover with ≤ nalternating paths

Theorem

Scaffolding is NP-hard even if G −Mis a collection of paths with weights

0/1

Corollary

no 2o(n+m)-time algorithm (ETH)

no no(k)-time algorithm (FPT 6=W[t])

19 /33

Sparse Graphs: Quasi-Forest

Observation

∃ weight-k satisfying assignment

∃ weight-k cover with ≤ nalternating paths

Theorem

Scaffolding is NP-hard even if G −Mis a collection of paths with weights

0/1

Corollary

no 2o(n+m)-time algorithm (ETH)

no no(k)-time algorithm (FPT 6=W[t])

19 /33

Sparse Graphs: Quasi-Forest

Observation

∃ weight-k satisfying assignment

∃ weight-k cover with ≤ nalternating paths

Theorem

Scaffolding is NP-hard even if G −Mis a collection of paths with weights

0/1

Corollary

no 2o(n+m)-time algorithm (ETH)

no no(k)-time algorithm (FPT 6=W[t])

19 /33

Sparse Graphs: Quasi-Forest

Observation

∃ weight-k satisfying assignment

∃ weight-k cover with ≤ nalternating paths

Theorem

Scaffolding is NP-hard even if G −Mis a collection of paths with weights

0/1

Corollary

no 2o(n+m)-time algorithm (ETH)

no no(k)-time algorithm (FPT 6=W[t])

19 /33

Sparse Graphs: Quasi-Forest

Observation

∃ weight-k satisfying assignment

∃ weight-k cover with ≤ nalternating paths

Theorem

Scaffolding is NP-hard even if G −Mis a collection of paths with weights

0/1

Corollary

no 2o(n+m)-time algorithm (ETH)

no no(k)-time algorithm (FPT 6=W[t])

19 /33

Sparse Graphs: Quasi-Forest

Observation

∃ weight-k satisfying assignment

∃ weight-k cover with ≤ nalternating paths

Theorem

Scaffolding is NP-hard even if G −Mis a collection of paths with weights

0/1

Corollary

no 2o(n+m)-time algorithm (ETH)

no no(k)-time algorithm (FPT 6=W[t])

19 /33

Sparse Graphs: Quasi-Forest

Observation

∃ weight-k satisfying assignment

∃ weight-k cover with ≤ nalternating paths

Theorem

Scaffolding is NP-hard even if G −Mis a collection of paths with weights

0/1

Corollary

no 2o(n+m)-time algorithm (ETH)

no no(k)-time algorithm (FPT 6=W[t])

19 /33

Other Forms of Tree-Likeness

Tree Decompositions

tree T , each vertex i associated to some Xi ⊆ V (G ) s.t.

1. ∀ e ∈ E (G ), there is some i ∈ V (T ) with e ∈ Xi2. ∀ v ∈ V (G ), bags containing v induce a connected subtree

treewidth tw = size of largest bag - 1

Hope

Practical instances of Scaffolding have low treewidth (they

originate from linear structure)

Nice Decompositions

Leaf: X = ∅Introduce v : i has single child j and Xi \ Xj = {v}Forget v : i has single child j and Xj \ Xi = {v}Introduce uv : i has single child j and uv ⊆ Xi = Xj

(each edge introduced exactly once)

Join: i has 2 children j and ` and Xi = Xj = X`

20/33

Other Forms of Tree-Likeness

Tree Decompositions

tree T , each vertex i associated to some Xi ⊆ V (G ) s.t.

1. ∀ e ∈ E (G ), there is some i ∈ V (T ) with e ∈ Xi2. ∀ v ∈ V (G ), bags containing v induce a connected subtree

treewidth tw = size of largest bag - 1

Hope

Practical instances of Scaffolding have low treewidth (they

originate from linear structure)

Nice Decompositions

Leaf: X = ∅Introduce v : i has single child j and Xi \ Xj = {v}Forget v : i has single child j and Xj \ Xi = {v}Introduce uv : i has single child j and uv ⊆ Xi = Xj

(each edge introduced exactly once)

Join: i has 2 children j and ` and Xi = Xj = X`

20/33

Other Forms of Tree-Likeness

Tree Decompositions

tree T , each vertex i associated to some Xi ⊆ V (G ) s.t.

1. ∀ e ∈ E (G ), there is some i ∈ V (T ) with e ∈ Xi2. ∀ v ∈ V (G ), bags containing v induce a connected subtree

treewidth tw = size of largest bag - 1

Hope

Practical instances of Scaffolding have low treewidth (they

originate from linear structure)

Nice Decompositions

Leaf: X = ∅Introduce v : i has single child j and Xi \ Xj = {v}Forget v : i has single child j and Xj \ Xi = {v}Introduce uv : i has single child j and uv ⊆ Xi = Xj

(each edge introduced exactly once)

Join: i has 2 children j and ` and Xi = Xj = X`

20/33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X

#matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X

#matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X

#matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X

#matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X

#matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Semantics

[d ,P, p, c]i = max. weight of any S with M∩ E (Gi ) ⊆ S ⊆ E (Gi ) and

1. each vertex v ∈ Xi has degree d(v) in Gi [S ],2. for each uv ∈ P , Gi [S ] contains an alternating path. . .

u = v : . . . from u avoiding d−1(1)u 6= v : . . . from u to v

3. Gi [S ] contains p alt. paths & c alt. cycles avoiding d−1(1)

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Leaf Bag

[∅,∅, 0, 0]i = 0

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Introduce v (single child j)[d ,P, p, c]i = [d |v→⊥,P, p, c]j

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Introduce uv (single child j)Case 1: d(u) = d(v) = 2

u v

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Introduce uv (single child j)Case 1: d(u) = d(v) = 2

u v [d ,P, p, c]i = [d |u→1,v→1,P + uv , p, c − 1]j

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Introduce uv (single child j)Case 1: d(u) = d(v) = 2

u v u v u v u v

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Forget v (single child j)

u

[d ,P, p, c]i = max

[d |v→1,P + vv , p − 1, c]j

maxuu∈P

[d |v→1, (P − uu) + uv , p, c]j

maxx∈{0,2}

[d |v→x ,P, p, c]j

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Forget v (single child j)

u

v

[d ,P, p, c]i = max

[d |v→1,P + vv , p − 1, c]j

maxuu∈P

[d |v→1, (P − uu) + uv , p, c]j

maxx∈{0,2}

[d |v→x ,P, p, c]j

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Forget v (single child j)u v

[d ,P, p, c]i = max

[d |v→1,P + vv , p − 1, c]j

maxuu∈P

[d |v→1, (P − uu) + uv , p, c]j

maxx∈{0,2}

[d |v→x ,P, p, c]j

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Forget v (single child j)

u

v

[d ,P, p, c]i = max

[d |v→1,P + vv , p − 1, c]j

maxuu∈P

[d |v→1, (P − uu) + uv , p, c]j

maxx∈{0,2}

[d |v→x ,P, p, c]j

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Join Bag (children j & `)

[d ,P, p, c]i = maxdj ,Pj ,pj ,cj

maxP`

Pj t P` = P

[dj ,Pj , pj , cj ]j + [d − dj ,P`, p − pj , c − cj ]`

O(3tw · twtw /2 ·σp · σc) table entries

and O((tw +2)tw · σp · σc · n) time

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Join Bag (children j & `)

[d ,P, p, c]i = maxdj ,Pj ,pj ,cj

maxP`

Pj t P` = P

[dj ,Pj , pj , cj ]j + [d − dj ,P`, p − pj , c − cj ]`

O(3tw · twtw /2 ·σp · σc) table entries

and O((tw +2)tw · σp · σc · n) time

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Join Bag (children j & `)

[d ,P, p, c]i = maxdj ,Pj ,pj ,cj

maxP`

Pj t P` = P

[dj ,Pj , pj , cj ]j + [d − dj ,P`, p − pj , c − cj ]`

O(2tw · twtw /2 ·σp · σc) table entries

and O((tw +2)tw · σp · σc · n) time

21 /33

How do Solutions Interact with Bags?

Xi

Gi [S ]

Ingredients

• degree-function d : X → {0, 1, 2}• “pairing” ⊆

(X2

)∪ X #matchings possibilities O(|X ||X |/2)

• #paths and #cycles completed “below the bag”

Join Bag (children j & `)

[d ,P, p, c]i = maxdj ,Pj ,pj ,cj

maxP`

Pj t P` = P

[dj ,Pj , pj , cj ]j + [d − dj ,P`, p − pj , c − cj ]`

O(2tw · twtw /2 ·σp · σc) table entries and O((tw +2)tw · σp · σc · n) time

21 /33

Too slow

in prac-

tice!

22/33

Integer Linear Program Formulation

s

tc

t

p- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t

∑v yvu =

∑v yuv

- path bounds:∑

v yvt ≤ σ- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈Cyuv < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

23/33

Integer Linear Program Formulation

s

tc

t

p

- chromosomes = disjoint s-t-paths

- bin. variables yuv = 1⇔ u → v usedx{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t

∑v yvu =

∑v yuv

- path bounds:∑

v yvt ≤ σ- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈Cyuv < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

23/33

Integer Linear Program Formulation

s

tc

t

p

- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t

∑v yvu =

∑v yuv

- path bounds:∑

v yvt ≤ σ

- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈Cyuv < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

23/33

Integer Linear Program Formulation

s

tc

t

p

- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t

∑v yvu =

∑v yuv

- path bounds:∑

v yvt ≤ σ- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈Cyuv < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

23/33

Integer Linear Program Formulation

s

tc

t

p

- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t

∑v yvu =

∑v yuv

- path bounds:∑

v yvt ≤ σ- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈Cyuv < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

23/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t

∑v yvu =

∑v yuv

- path bounds:∑

v yvt ≤ σ- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈Cyuv < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

23/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

23/33

Extension: Contig Jumps

Mean read length: 70bp

Mean insert size: 472bp

140

262

149

119

336

116

397

72

1

75

14

5

100

1

21

16

10

37

45

1

22

1

7

61

78

69

1

15

51

40

65

2

61

24/33

Extension: Contig Jumps

Mean read length: 70bp

Mean insert size: 472bp

140

262

149

119

336

116

397

72

1

75

14

5

100

1

21

16

10

37

45

1

22

1

7

61

78

69

1

15

51

40

65

2

61

24/33

Extension: Contig Jumps

Mean read length: 70bp

Mean insert size: 472bp

140

262

149

119

336

116

397

72

1

75

14

5

100

1

21

16

10

37

45

1

22

1

7

61

78

69

1

15

51

40

65

2

61

24/33

Extension: Contig Jumps

Mean read length: 70bp

Mean insert size: 472bp

140

262

149

119

336

116

397

72

1

75

14

5

100

1

21

16

10

37

45

1

22

1

7

61

78

69

1

15

51

40

65

2

61

24/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

25/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

25/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

25/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

25/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

25/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

Jump Mechanicsfor each non-contig uv ,

1. introduce a variable zuv

2. construct “jump network” between u and v that fits in the gap

3. add zuv to x{u,v}extra: preprocess instance to finish incomplete jumps

25/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu+zuv + zvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

Jump Mechanicsfor each non-contig uv ,

1. introduce a variable zuv

2. construct “jump network” between u and v that fits in the gap

3. add zuv to x{u,v}extra: preprocess instance to finish incomplete jumps

25/33

Extension: Contig Jumps

Mean read length: 70bp

Mean insert size: 472bp

140

262

149

119

336

116

397

72

1

75

14

5

100

1

21

16

10

37

45

1

22

1

7

61

78

69

1

15

51

40

65

2

61

26/33

Extension: Contig Jumps

Mean read length: 70bp

Mean insert size: 472bp

140

262

149

119

336

116

397

72

1

75

14

5

100

1

21

16

10

37

45

1

22

1

7

61

78

69

1

15

51

40

65

2

61

26/33

Extension: Contig Jumps

Mean read length: 70bp

Mean insert size: 472bp

140

262

149

119

336

116

397

72

1

75

14

5

100

1

21

16

10

37

45

1

22

1

7

61

78

69

1

15

51

40

65

2

61

26/33

ILP Extension: Multiplicities

GGTGCGAGAGAGGTCATGGATTGCAACGA

GGTGCGAGAGGCCACTCCAATTGCAACGA

×2

×1

×1

×2

27/33

ILP Extension: Multiplicities

GGTGCGAGAGAGGTCATGGATTGCAACGA

GGTGCGAGAGGCCACTCCAATTGCAACGA

×2

×1

×1

×2

27/33

ILP Extension: Multiplicities

GGTGCGAGAGAGGTCATGGATTGCAACGA

GGTGCGAGAGGCCACTCCAATTGCAACGA

×2

×1

×1

×2

27/33

ILP Extension: Multiplicities

70

5

47

67

70

135

49

3931

5

49

56

80

80

34

6

11

2489

19

25

83

73

82

71

9

14

63

72

64

28

29

33

66

33

68

94

62

62

11

47

18

67

8

6

74

59

12

34

46

88

42

79

77

13

17

26

24

70

65

45

33

73

67

23

71

108

67

36

13

55

5

38

72

51

71

5

28

73

31

71

59

80

54

84

28

29

71

151

24

5

6

33

68

53

25

×2

28/33

ILP Extension: Multiplicities

70

5

47

67

70

135

49

3931

5

49

56

80

80

34

6

11

2489

19

25

83

73

82

71

9

14

63

72

64

28

29

33

66

33

68

94

62

62

11

47

18

67

8

6

74

59

12

34

46

88

42

79

77

13

17

26

24

70

65

45

33

73

67

23

71

108

67

36

13

55

5

38

72

51

71

5

28

73

31

71

59

80

54

84

28

29

71

151

24

5

6

33

68

53

25

×2

28/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used

x{u,v} = yuv + yvu+zuv + zvu

- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈C(yuv−yutc ) < |C |

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics

!!!

Multiplicities1. make yuv , x{u,v} integers in domain [0,m({u, v})]2. change callback

29/33

Integer Linear Program Formulation

s tc

tp- chromosomes = disjoint s-{tp, tc}-paths- int. variables yuv = `⇔ u → v used ` times

x{u,v} = yuv + yvu+zuv + zvu

- force contigs: ∀uv∈Mxuv≥1- path preservation: ∀u 6=s,tp ,tc

∑v yvu =

∑v yuv

- path & cycle bounds:∑

v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):

∀ cycle C :∑

uv∈Cyuv ≤ |C | ·mmax ·

∑u∈C ,v /∈C

yuv

- objective: max∑e∈E

x{u,v} · ω(e)

- cycle consistency: ∀uyutc ≤ ysu

- jump mechanics !!!Multiplicities1. make yuv , x{u,v} integers in domain [0,m({u, v})]2. change callback

29/33

Linearization of Solutions

Problem

no unique chromosome-configuration explaining solution

Proof

30/33

Linearization of Solutions

Problem

no unique chromosome-configuration explaining solution

Proof

30/33

Linearization of Solutions

Problem

no unique chromosome-configuration explaining solution

Proof

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

3 2 2 1 3 3 5 5 71

2 1 2 1

Proof

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇒”: contraposition; let p = ambigous path

2 2 21

11

(G ,M,m) not uniquely linearizable

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇒”: contraposition; let p = ambigous path

2 2 21

11

(G ,M,m) not uniquely linearizable

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇒”: contraposition; let p = ambigous path

2 2 21

11

(G ,M,m) not uniquely linearizable

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇒”: contraposition; let p = ambigous path

2 2 21

11

(G ,M,m) not uniquely linearizable

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇐”: let (G ,M,m) be free of ambigous paths

Reduction (does not decrease number of linearizations):

5

2

22

result is collection of alternating paths & cycles

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇐”: let (G ,M,m) be free of ambigous paths

Reduction (does not decrease number of linearizations):

3 3 3

5

2

22

result is collection of alternating paths & cycles

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇐”: let (G ,M,m) be free of ambigous paths

Reduction (does not decrease number of linearizations):

3

5

2

22

result is collection of alternating paths & cycles

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇐”: let (G ,M,m) be free of ambigous paths

Reduction (does not decrease number of linearizations):

3

5

2

22

result is collection of alternating paths & cycles

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇐”: let (G ,M,m) be free of ambigous paths

Reduction (does not decrease number of linearizations):

3

3

22

result is collection of alternating paths & cycles

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proof

“⇐”: let (G ,M,m) be free of ambigous paths

Reduction (does not decrease number of linearizations):

3

3

22

result is collection of alternating paths & cycles

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily

missassembly

2. isolate each ambiguity

information loss

3. cut as few ends as possible

computationally hard

4. cut as few multiplicities as possible

computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily

missassembly

2. isolate each ambiguity

information loss

3. cut as few ends as possible

computationally hard

4. cut as few multiplicities as possible

computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity

information loss

3. cut as few ends as possible

computationally hard

4. cut as few multiplicities as possible

computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity

information loss

3. cut as few ends as possible

computationally hard

4. cut as few multiplicities as possible

computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible

computationally hard

4. cut as few multiplicities as possible

computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible

computationally hard

4. cut as few multiplicities as possible

computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible

computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible

computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

Multiplicitiesone &

#non-matching adj. to contig

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

Multiplicitiesone &

#non-matching adj. to contig

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

Multiplicitiesone &

#non-matching adj. to contig

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

Multiplicitiesone &

#non-matching adj. to contig

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

30/33

Linearization of Solutions

Theorem

(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”

(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)

Proposals

1. decide arbitrarily missassembly

2. isolate each ambiguity information loss

3. cut as few ends as possible computationally hard

4. cut as few multiplicities as possible computationally hard

30/33

Conclusion

What we saw

- 3-step sequencing technique:

1. produce paired-end reads

2. assemble reads to contigs

3. scaffold contigs to chromosomes using read-pairings

- computationally hard problem for dense graphs with weights 0/1

- no constant-factor approx or subexponential-time algorithm

for linear quasi trees with weights 0/1

- O(n2) time on unweighted cliques/co-bipartite/split

- O(n · σp · σc) time for constant treewidth

- 2-approximable in cliques/complete bipartite in O(n3) time

- O(√

2n

poly(n)) time exact algorithm

- ILP formulation with contig jumps & multiplicities

- Linearization problem raised by multiplicities in solution

31 /33

Conclusion

What we saw

- 3-step sequencing technique:

1. produce paired-end reads

2. assemble reads to contigs

3. scaffold contigs to chromosomes using read-pairings

- computationally hard problem for dense graphs with weights 0/1

- no constant-factor approx or subexponential-time algorithm

for linear quasi trees with weights 0/1

- O(n2) time on unweighted cliques/co-bipartite/split

- O(n · σp · σc) time for constant treewidth

- 2-approximable in cliques/complete bipartite in O(n3) time

- O(√

2n

poly(n)) time exact algorithm

- ILP formulation with contig jumps & multiplicities

- Linearization problem raised by multiplicities in solution

31 /33

Conclusion

What we saw

- 3-step sequencing technique:

1. produce paired-end reads

2. assemble reads to contigs

3. scaffold contigs to chromosomes using read-pairings

- computationally hard problem for dense graphs with weights 0/1

- no constant-factor approx or subexponential-time algorithm

for linear quasi trees with weights 0/1

- O(n2) time on unweighted cliques/co-bipartite/split

- O(n · σp · σc) time for constant treewidth

- 2-approximable in cliques/complete bipartite in O(n3) time

- O(√

2n

poly(n)) time exact algorithm

- ILP formulation with contig jumps & multiplicities

- Linearization problem raised by multiplicities in solution

31 /33

Conclusion

What we saw

- 3-step sequencing technique:

1. produce paired-end reads

2. assemble reads to contigs

3. scaffold contigs to chromosomes using read-pairings

- computationally hard problem for dense graphs with weights 0/1

- no constant-factor approx or subexponential-time algorithm

for linear quasi trees with weights 0/1

- O(n2) time on unweighted cliques/co-bipartite/split

- O(n · σp · σc) time for constant treewidth

- 2-approximable in cliques/complete bipartite in O(n3) time

- O(√

2n

poly(n)) time exact algorithm

- ILP formulation with contig jumps & multiplicities

- Linearization problem raised by multiplicities in solution

31 /33

Conclusion

What we saw

- 3-step sequencing technique:

1. produce paired-end reads

2. assemble reads to contigs

3. scaffold contigs to chromosomes using read-pairings

- computationally hard problem for dense graphs with weights 0/1

- no constant-factor approx or subexponential-time algorithm

for linear quasi trees with weights 0/1

- O(n2) time on unweighted cliques/co-bipartite/split

- O(n · σp · σc) time for constant treewidth

- 2-approximable in cliques/complete bipartite in O(n3) time

- O(√

2n

poly(n)) time exact algorithm

- ILP formulation with contig jumps & multiplicities

- Linearization problem raised by multiplicities in solution

31 /33

Conclusion

What we saw

- 3-step sequencing technique:

1. produce paired-end reads

2. assemble reads to contigs

3. scaffold contigs to chromosomes using read-pairings

- computationally hard problem for dense graphs with weights 0/1

- no constant-factor approx or subexponential-time algorithm

for linear quasi trees with weights 0/1

- O(n2) time on unweighted cliques/co-bipartite/split

- O(n · σp · σc) time for constant treewidth

- 2-approximable in cliques/complete bipartite in O(n3) time

- O(√

2n

poly(n)) time exact algorithm

- ILP formulation with contig jumps & multiplicities

- Linearization problem raised by multiplicities in solution

31 /33

Conclusion

What we saw

- 3-step sequencing technique:

1. produce paired-end reads

2. assemble reads to contigs

3. scaffold contigs to chromosomes using read-pairings

- computationally hard problem for dense graphs with weights 0/1

- no constant-factor approx or subexponential-time algorithm

for linear quasi trees with weights 0/1

- O(n2) time on unweighted cliques/co-bipartite/split

- O(n · σp · σc) time for constant treewidth

- 2-approximable in cliques/complete bipartite in O(n3) time

- O(√

2n

poly(n)) time exact algorithm

- ILP formulation with contig jumps & multiplicities

- Linearization problem raised by multiplicities in solution

31 /33

Conclusion

What we saw

- 3-step sequencing technique:

1. produce paired-end reads

2. assemble reads to contigs

3. scaffold contigs to chromosomes using read-pairings

- computationally hard problem for dense graphs with weights 0/1

- no constant-factor approx or subexponential-time algorithm

for linear quasi trees with weights 0/1

- O(n2) time on unweighted cliques/co-bipartite/split

- O(n · σp · σc) time for constant treewidth

- 2-approximable in cliques/complete bipartite in O(n3) time

- O(√

2n

poly(n)) time exact algorithm

- ILP formulation with contig jumps & multiplicities

- Linearization problem raised by multiplicities in solution

31 /33

Conclusion

Outlook

- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone

correction using small reads?

- generally: multi-library scaffolding

- other sources for contig-connections (phylogenetic

information?)

- better parameters for Scaffolding and Scaffold Linearization

analyze practical instances

- approximation/heuristics for Scaffold Linearization

32/33

Conclusion

Outlook

- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone

correction using small reads?

- generally: multi-library scaffolding

- other sources for contig-connections (phylogenetic

information?)

- better parameters for Scaffolding and Scaffold Linearization

analyze practical instances

- approximation/heuristics for Scaffold Linearization

32/33

Conclusion

Outlook

- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone

correction using small reads?

- generally: multi-library scaffolding

- other sources for contig-connections (phylogenetic

information?)

- better parameters for Scaffolding and Scaffold Linearization

analyze practical instances

- approximation/heuristics for Scaffold Linearization

32/33

Conclusion

Outlook

- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone

correction using small reads?

- generally: multi-library scaffolding

- other sources for contig-connections (phylogenetic

information?)

- better parameters for Scaffolding and Scaffold Linearization

analyze practical instances

- approximation/heuristics for Scaffold Linearization

32/33

Conclusion

Outlook

- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone

correction using small reads?

- generally: multi-library scaffolding

- other sources for contig-connections (phylogenetic

information?)

- better parameters for Scaffolding and Scaffold Linearization

analyze practical instances

- approximation/heuristics for Scaffold Linearization

32/33

Fin!

33/33

Recommended