RNA Sequence Assembly WEI Xueliang

Page 1: RNA Sequence Assembly

RNA Sequence Assembly

WEI Xueliang

Page 2: RNA Sequence Assembly

Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do

Page 3: RNA Sequence Assembly

Sequence Assembly

• Goal: obtain the DNA/RNA sequence.

• A sequencing machine cannot read a whole genome in one go; it reads small pieces of 20 to 1000 bases.

• Definition: read = tag = fragment.

Page 4: RNA Sequence Assembly

De novo sequence assembly

Page 5: RNA Sequence Assembly

Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do

Page 6: RNA Sequence Assembly

De novo sequence assembly

Calculating the overlaps between reads takes a huge amount of time.
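To make the cost concrete, here is a minimal sketch of a naive overlap computation. It assumes overlap means the longest suffix of one read that equals a prefix of another, and that every ordered pair of reads is compared; the function names and the example reads are hypothetical, not taken from the slides.

```python
from itertools import permutations

def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def all_overlaps(reads, min_len=3):
    """Compare every ordered pair of reads: n * (n - 1) comparisons for n reads."""
    result = {}
    for a, b in permutations(reads, 2):
        n = overlap(a, b, min_len)
        if n:
            result[(a, b)] = n
    return result

# Made-up reads for illustration only.
reads = ["AAGACT", "GACTCC", "CTCCGG"]
overlaps = all_overlaps(reads)
```

The number of comparisons grows quadratically with the number of reads, which is why this approach becomes slow on large data sets.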

Page 7: RNA Sequence Assembly

DE BRUIJN GRAPH

K-mer: a length-k substring of a tag.

Each node has an out-degree of at most 4.

Hashing a node (A=0, C=1, G=2, T=3): “CTG” => (132)_4 = (30)_10

“CTG” => “TGG”: shift left to get (1320)_4, then take it modulo (1000)_4 to get (320)_4.

Add (2)_4 for ‘G’: (320)_4 + (2)_4 = (322)_4
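The node hashing can be sketched in a few lines. This assumes the digit mapping A=0, C=1, G=2, T=3, which is the mapping implied by “CTG” => (132)_4; the names CODE, encode and roll are hypothetical.

```python
# Base-4 digit for each nucleotide, chosen to match the slide's example CTG -> (132)_4.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(kmer):
    """Read a k-mer as a base-4 number."""
    value = 0
    for base in kmer:
        value = value * 4 + CODE[base]
    return value

def roll(value, next_base, k):
    """Hash of the next k-mer: shift left one digit, drop the leading digit, append the new base."""
    return (value * 4) % (4 ** k) + CODE[next_base]

h_ctg = encode("CTG")        # (132)_4
h_tgg = roll(h_ctg, "G", 3)  # moves from node CTG to node TGG without re-reading the k-mer
```

With this mapping the hash of a successor node is computed in constant time from the hash of the current node, so following an edge never requires rescanning the k-mer.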

Page 8: RNA Sequence Assembly

DE BRUIJN GRAPH (CONT’)

If there are repeats, like “GACT”:

A 3-mer De Bruijn graph cannot tell which path is correct. A 6-mer graph recovers the correct sequence.

Larger K gives a better result.
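The effect of K on repeats can be checked with a small sketch. The sequence below is made up so that “GACT” occurs twice; nodes are k-mers and an edge joins two k-mers that are consecutive in a read. The function names are hypothetical, and the whole sequence is used as a single read to keep the example short.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each k-mer node to the set of k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph

def branching_nodes(graph):
    """Nodes with more than one successor, i.e. where the path is ambiguous."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

sequence = "AAGACTCGGACTTA"  # made-up example; GACT appears twice
ambiguous_k3 = branching_nodes(de_bruijn([sequence], 3))
ambiguous_k6 = branching_nodes(de_bruijn([sequence], 6))
```

With K = 3 the node ACT has two successors (CTC and CTT), so the graph alone cannot decide which way to go. With K = 6 every node spans the repeat plus some of its context and no node branches.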

Page 9: RNA Sequence Assembly

De novo sequence assembly

Suppose K = tag length (20-mer).

TGACGTAGCTATGTATTTTG
GACGTAGCTATGTATTTTGT (no 20-mer)

Coverage is not high enough to support a large K.
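A rough sketch of why coverage limits K: it measures which fraction of a sequence's k-mers actually appears in the reads. The sequence is the first 12 bases of the tag shown above, and the read length (6) and read start positions (0, 3, 6) are assumptions chosen for illustration; the function names are hypothetical.

```python
def kmers(seq, k):
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def covered_fraction(genome, reads, k):
    """Fraction of the genome's k-mers that occur in at least one read."""
    wanted = kmers(genome, k)
    seen = set()
    for read in reads:
        seen |= kmers(read, k)
    return len(wanted & seen) / len(wanted)

genome = "TGACGTAGCTAT"                           # first 12 bases of the example tag
reads = [genome[s:s + 6] for s in (0, 3, 6)]      # assumed sparse sampling, read length 6

fraction_k3 = covered_fraction(genome, reads, 3)
fraction_k6 = covered_fraction(genome, reads, 6)
```

With K = 3 every 3-mer of the sequence is present in some read. With K equal to the read length, only the 6-mers that happen to start exactly where a read starts are present, so most nodes of the graph are missing.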

Page 10: RNA Sequence Assembly

Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do

Page 11: RNA Sequence Assembly

MY METHOD.

Tag length = 6, K = 3.

When we have AAGACT?, try every possible extension:

AAGACTC
AAGACTT
AAGACTG

Check each against the tags: AGACTC is a tag.

The correct extension is therefore AAGACTC.
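The check on this slide can be sketched as follows. This is one reading of the example, not the author's implementation: a candidate base is kept only if the last tag-length bases of the extended sequence form a known tag. The function name and the contents of the tag set (apart from AGACTC) are hypothetical.

```python
def supported_extensions(contig, candidates, tags, tag_len):
    """Keep the candidate bases whose extension ends in a known tag."""
    return [base for base in candidates if (contig + base)[-tag_len:] in tags]

# Tag set for illustration; only AGACTC is named on the slide.
tags = {"AAGACT", "AGACTC"}

# Candidates C, T, G are the branches offered by the 3-mer graph.
choices = supported_extensions("AAGACT", "CTG", tags, 6)
```

The small K keeps the graph connected at low coverage, while the full-length tags are used only to choose among the branches.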

Page 12: RNA Sequence Assembly

Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do

Page 13: RNA Sequence Assembly

RNA ASSEMBLY

Page 14: RNA Sequence Assembly

ALTERNATIVE SPLICING

The graph

All cDNA sequences.

Page 15: RNA Sequence Assembly

RNA ASSEMBLY’S PROBLEM

Merge? Index the sequence.

Page 16: RNA Sequence Assembly

RNA ASSEMBLY’S PROBLEM(CONT’)

Solution?

Page 17: RNA Sequence Assembly

RNA ASSEMBLY’S PROBLEM(CONT’)

Index Tags

Page 18: RNA Sequence Assembly

RNA ASSEMBLY’S PROBLEM(CONT’)

Solution?

Speed?

Page 19: RNA Sequence Assembly

SINGLE TAG’S LIMITATION

|Yellow sequence| >= tag length.

Tag length is 25-100 bp.

A single tag is not enough!

Page 20: RNA Sequence Assembly

DATASET - PAIRED END TAGS

Fragment length is usually > 1k.

Some RNA sequences are shorter than 1k.

Page 21: RNA Sequence Assembly

TO DO

• Handle large data sets (10G).
• Improve accuracy.
• Use PET (paired-end tag) data.

Page 22: RNA Sequence Assembly

Thanks!!