RNA Assembly Using extending method. Wei Xueliang 2010-04-07


Page 1: RNA Assembly  Using extending method

RNA Assembly Using extending method.

Wei Xueliang, 2010-04-07

Page 2: RNA Assembly  Using extending method

Overview

• Why abandon deBruijn.
• Why abandon Extended deBruijn.
• Introduction to current method.
• Handle the old problem.
• The new problem.
• Todo

Page 3: RNA Assembly  Using extending method

Why abandon deBruijn.

• De Bruijn graph's (dis)advantages:
  – Very fast.
  – The coverage distribution and the K value strongly affect the result.
• Key: the coverage is not uniformly distributed in RNA assembly.
  – There is no single best K value.
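The coverage point can be illustrated with a minimal Python sketch. It assumes that two reads are linked in the graph only when they share a whole k-mer (some de Bruijn formulations use the (k-1)-mer instead, which shifts the threshold by one). The genome string and the names `kmers` and `linked` are made up for illustration and do not come from the slides.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def linked(read_a, read_b, k):
    """True if the two reads share at least one k-mer."""
    return bool(kmers(read_a, k) & kmers(read_b, k))


# Hypothetical low-coverage region: only two reads, overlapping by 6 bases.
genome = "ACGTTGCAGATTACAGGTCC"
read_a = genome[:12]   # covers positions 0..11
read_b = genome[6:]    # covers positions 6..19

for k in (4, 6, 7):
    print(k, linked(read_a, read_b, k))
```

In this toy case the reads stay linked up to K = 6 and fall apart at K = 7. A deeply covered region would tolerate a larger K, which is one way to read "no best K value" when coverage varies along the transcript.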

Page 4: RNA Assembly  Using extending method

Why abandon deBruijn.

• The length of the red part is 27.

Page 5: RNA Assembly  Using extending method

deBruijn Graph of K = 28

Page 6: RNA Assembly  Using extending method

deBruijn Graph of K = 29

Page 7: RNA Assembly  Using extending method

deBruijn Graph of K = 30

Page 8: RNA Assembly  Using extending method

Why abandon deBruijn.

• Key: the coverage is not uniformly distributed in RNA assembly.
  – There is no single best K value.
• Can we run the program many times with different K values?
• That is not a de novo assembler's job.
  – Time.
  – Provide highly accurate contigs within a limited time.
  – Scaffolding programs.

Page 9: RNA Assembly  Using extending method

Why abandon Extended deBruijn.

• My Extended de Bruijn method:
  – Use two or more K values at the same time.

Page 10: RNA Assembly  Using extending method

Why abandon Extended deBruijn.

• The coverage changes faster than I expected, so many K values are needed.
• Converting between different K values is difficult.
• Memory problem for big K: when K > 32, each K-index needs > 50G (with a 10G data set).
• Throw the K away.
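One plausible reason the K > 32 boundary shows up in the memory bullet (an assumption, the slide does not say so) is that a k-mer packed at 2 bits per base fits in a single 64-bit word only up to K = 32. The sketch below shows that packing; `CODE`, `encode_kmer` and `words_needed` are illustrative names, not part of the original program.

```python
# 2-bit code per base (assumed encoding, any fixed mapping works).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def words_needed(k, word_bits=64):
    """Number of machine words needed to hold one packed k-mer."""
    return (2 * k + word_bits - 1) // word_bits


print(encode_kmer("ACGT"))   # 0b00011011
print(words_needed(32), words_needed(33))
```

Going from K = 32 to K = 33 doubles the key size in such an index, before counting any per-entry overhead.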

Page 11: RNA Assembly  Using extending method

Introduction to the new method

• From Pramila's genome assembly method.
• Start from any tag and do a correction.
• If the correction succeeds, continue.

Page 12: RNA Assembly  Using extending method

Introduction to the new method

• Find all the tags that overlap by at least 24 bp (a magic number).
• Use these overlapping tags to extend the base, then continue adding more tags.
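The extension step can be sketched as a greedy loop. This is a simplified illustration, not the presented program: it uses exact matching only, extends in one direction, and always takes the tag with the longest overlap. The function names and the toy tags are assumptions; `min_overlap` defaults to the 24 bp from the slide but is lowered in the demo so short strings can be used.

```python
def suffix_prefix_overlap(base, tag, min_overlap):
    """Longest n >= min_overlap such that base ends with tag[:n], else 0."""
    for n in range(min(len(base), len(tag)), min_overlap - 1, -1):
        if base.endswith(tag[:n]):
            return n
    return 0


def extend(base, tags, min_overlap=24):
    """Greedily extend base to the right with tags that overlap its end."""
    unused = list(tags)
    while True:
        best_tag, best_len = None, 0
        for tag in unused:
            n = suffix_prefix_overlap(base, tag, min_overlap)
            # The tag must contribute at least one new base.
            if n > best_len and n < len(tag):
                best_tag, best_len = tag, n
        if best_tag is None:
            return base
        base += best_tag[best_len:]
        unused.remove(best_tag)


# Toy data: four 10-base tags cut from a 20-base sequence.
genome = "ACGTTGCAGATTACAGGTCC"
tags = [genome[0:10], genome[4:14], genome[8:18], genome[10:20]]
contig = extend(tags[0], tags[1:], min_overlap=4)
print(contig == genome)
```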

Page 13: RNA Assembly  Using extending method

Introduction to the new method

• How can the overlapping tags be found quickly while allowing mismatches?
• Index and union:
    {Tag3}, {Tag2, Tag3}, {Tag3, Tag4}
    Union => {Tag1, Tag2, Tag3, Tag4}
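A minimal sketch of the index-and-union idea, assuming the index maps short fixed-length seeds to the set of tags containing them. The seed length, the function names and the toy tags are all illustrative. Because the query is cut into non-overlapping seeds, a single mismatch can spoil at most one seed, so a tag with one mismatch is still returned through another seed.

```python
from collections import defaultdict


def build_index(tags, seed_len):
    """Map every seed_len-mer to the set of tag ids that contain it."""
    index = defaultdict(set)
    for tag_id, tag in enumerate(tags):
        for i in range(len(tag) - seed_len + 1):
            index[tag[i:i + seed_len]].add(tag_id)
    return index


def candidate_tags(index, query, seed_len):
    """Union of the tag sets hit by the non-overlapping seeds of query."""
    hits = set()
    for i in range(0, len(query) - seed_len + 1, seed_len):
        hits |= index.get(query[i:i + seed_len], set())
    return hits


tags = ["ACGTTGCA", "ACGTAGCA", "TTTTTTTT"]   # tag 1 differs from tag 0 by one base
index = build_index(tags, seed_len=4)
print(candidate_tags(index, "ACGTTGCA", seed_len=4))
```

The union only yields candidates; each one still has to be verified base by base against the query.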

Page 14: RNA Assembly  Using extending method

Introduction to the new method

• How can the next overlapping tags be found quickly while allowing mismatches?
• V1 <= U3
• V2 <= (U1 << 1) + 0
• V3 <= (U2 << 1) + 0

Page 15: RNA Assembly  Using extending method

Handle the old problem.

• What if the overlapping part is shorter than 24 bp?

Page 16: RNA Assembly  Using extending method

Handle the old problem.

• Check the tags one by one, in descending order of overlap length.
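The ordering can be sketched as follows, assuming a Hamming-distance check on the overlapping part. The mismatch limit, the minimum length and the toy tags are assumptions chosen for illustration. A minimum length is still needed, because with mismatches allowed a very short overlap matches almost anything.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def longest_overlap(base, tag, max_mismatches=1, min_len=4):
    """Longest n >= min_len where base's last n bases match tag's first n
    bases with at most max_mismatches differences, else 0."""
    for n in range(min(len(base), len(tag)), min_len - 1, -1):
        if hamming(base[-n:], tag[:n]) <= max_mismatches:
            return n
    return 0


def rank_by_overlap(base, tags, max_mismatches=1, min_len=4):
    """Overlapping tags as (overlap, tag) pairs, longest overlap first."""
    scored = [(longest_overlap(base, t, max_mismatches, min_len), t) for t in tags]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)


base = "ACGTTGCAGA"
ranked = rank_by_overlap(base, ["CTGATTTTTT", "CCCCCCCCCC", "TGCAGATTAC"])
print(ranked)
```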

Page 17: RNA Assembly  Using extending method

Handle the old problem.

Overlap   A Count   A %       G Count   G %
60        1         6.67%     1         4.76%
52        3         20.00%    1         4.76%
44        6         40.00%    2         9.52%
36        10        66.67%    10        47.62%
30        11        73.33%    16        76.19%
24        15        100.00%   21        100.00%
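The percentages in this table are consistent with cumulative counts divided by the count at the smallest overlap (24). A short sketch that reproduces them from the counts, with `cumulative_share` being an illustrative name rather than anything from the original program:

```python
def cumulative_share(rows):
    """rows: (overlap, cumulative_count) pairs, longest overlap first.
    Returns (overlap, count, percent of the final count) triples."""
    total = rows[-1][1]
    return [(ov, cnt, round(100.0 * cnt / total, 2)) for ov, cnt in rows]


a_rows = [(60, 1), (52, 3), (44, 6), (36, 10), (30, 11), (24, 15)]
g_rows = [(60, 1), (52, 1), (44, 2), (36, 10), (30, 16), (24, 21)]

for row in cumulative_share(a_rows):
    print("A", row)
for row in cumulative_share(g_rows):
    print("G", row)
```

Read this way, each row answers: of all tags supporting this base with an overlap of at least 24, what share already supports it at this longer overlap?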

Page 18: RNA Assembly  Using extending method

Handle the old problem.

G is the high-expression case (High Exp).

Overlap   A Count   A %       G Count   G %
56        1         6.67%     5         2.50%
50        3         20.00%    10        5.00%
44        6         40.00%    20        10.00%
36        10        66.67%    120       60.00%
30        11        73.33%    150       75.00%
24        15        100.00%   200       100.00%

Page 19: RNA Assembly  Using extending method

Handle the old problem.

• Degree of approximation.

Page 20: RNA Assembly  Using extending method

Handle the old problem.

• Fewer tips.
• No bubbles.
  – Because we overlap with mismatches.
  – We use whole tags.

Page 21: RNA Assembly  Using extending method

The new problem.

• Speed.
• The tail of a tag often has more errors.
  – Reverse extending problem.

Page 22: RNA Assembly  Using extending method

Todo

• Handle the reverse extending problem.
• Speed.
• Finish the comparison between the de Bruijn method (Velvet) and my method.
• Paired-end tags.

Page 23: RNA Assembly  Using extending method

• Thank you very much for your attention.