RNA Sequence Assembly

WEI Xueliang

Overview

• Sequence Assembly
• Current Method
• My Method
• RNA Assembly
• To Do

Sequence Assembly

• Goal: get the DNA/RNA sequence.

• A sequencing machine cannot read a whole genome in one go; it reads small pieces of between 20 and 1000 bases.

• Define: Read = Tag = Fragment

De novo sequence assembly

Calculating the overlaps between all pairs of tags takes a huge amount of time.
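To illustrate why overlap computation is expensive, here is a minimal sketch of a naive all-pairs approach. The function names `overlap` and `all_overlaps` and the `min_len` threshold are hypothetical, chosen for illustration; the slides do not specify how overlaps are computed.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def all_overlaps(tags, min_len=3):
    """Compare every ordered pair of tags: n * (n - 1) comparisons."""
    result = {}
    for i, a in enumerate(tags):
        for j, b in enumerate(tags):
            if i != j:
                n = overlap(a, b, min_len)
                if n:
                    result[(i, j)] = n
    return result
```

With n tags this performs n * (n - 1) pairwise comparisons, which is what makes the approach slow on large read sets.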

DE BRUIJN GRAPH

K-mer: a substring of length k of a tag.

Each node has an out-degree of at most 4 (one per possible next base).
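A minimal sketch of how such a graph could be built, assuming each k-mer is a node and an edge links two k-mers that appear consecutively in a tag. The names `kmers` and `build_graph` and the example tags are hypothetical.

```python
from collections import defaultdict


def kmers(tag, k):
    """All length-k substrings of a tag, in order."""
    return [tag[i:i + k] for i in range(len(tag) - k + 1)]


def build_graph(tags, k):
    """Map each k-mer to the set of k-mers that directly follow it in some tag."""
    graph = defaultdict(set)
    for tag in tags:
        ks = kmers(tag, k)
        for left, right in zip(ks, ks[1:]):
            graph[left].add(right)
    return graph


graph = build_graph(["AAGACT", "GACTCC"], 3)
# A successor shares k-1 bases with its predecessor and adds one new base,
# so no node can have more than 4 successors.
```

Because a successor differs from its predecessor only in the newly appended base, the out-degree bound of 4 follows directly from the four-letter alphabet.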

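Hashing the node: encode each base as a base-4 digit (A=0, C=1, G=2, T=3). “CTG” => (132)_4 = (30)_10

Moving from “CTG” to “TGG”: shift (132)_4 left by one digit to get (1320)_4, then take it modulo (1000)_4 to get (320)_4.

Add the digit for ‘G’, (2)_4: (320)_4 + (2)_4 = (322)_4 = (58)_10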
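The hashing step can be sketched as a rolling base-4 encoding. The digit assignment C=1, T=3, G=2 is taken from the slide's value for “CTG”; A=0 is an assumption (it is the only digit left). The names `CODE`, `encode`, and `roll` are hypothetical.

```python
# Base-4 digit for each nucleotide. C, G, T follow the slide's "CTG" => (132)_4;
# A=0 is assumed as the remaining digit.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Interpret a k-mer as a base-4 number."""
    value = 0
    for base in kmer:
        value = value * 4 + CODE[base]
    return value


def roll(value, next_base, k):
    """Hash of the next k-mer: shift left one digit, drop the top digit, add the new base."""
    return (value * 4) % (4 ** k) + CODE[next_base]


h = encode("CTG")      # 30
h2 = roll(h, "G", 3)   # same value as encode("TGG")
```

The update takes constant time per step, so the hash of every k-mer in a tag can be computed without re-reading the whole k-mer each time.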

DE BRUIJN GRAPH (CONT’)

If there are repeats, like “GACT”:

A 3-mer De Bruijn graph cannot tell which path is the correct one. A 6-mer graph can recover the correct sequence.

The larger K is, the better the result.
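A small sketch of the effect of K on repeats. The sequence below is a made-up example (not from the slides) in which “GACT” occurs twice, followed by different bases; the function name `branching_nodes` is hypothetical. For simplicity the k-mers are taken directly from the sequence, as if coverage were perfect.

```python
from collections import defaultdict


def branching_nodes(sequence, k):
    """Return the k-mers that are followed by more than one distinct k-mer."""
    successors = defaultdict(set)
    for i in range(len(sequence) - k):
        successors[sequence[i:i + k]].add(sequence[i + 1:i + k + 1])
    return sorted(node for node, nxt in successors.items() if len(nxt) > 1)


sequence = "AAGACTCCGACTGG"  # hypothetical sequence with "GACT" repeated

print(branching_nodes(sequence, 3))  # ['ACT']: two possible ways out of the repeat
print(branching_nodes(sequence, 6))  # []: every 6-mer has a unique successor
```

With K = 3 the node “ACT” has two successors, so the graph alone cannot decide the path; with K = 6 each k-mer spans the repeat plus enough context to make the path unambiguous.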

De novo sequence assembly

Suppose we use K = length of tag (20-mer):

TGACGTAGCTATGTATTTTG
GACGTAGCTATGTATTTTGT

(no shared 20-mer)

Coverage is not enough to support large K.

MY METHOD.

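Tag length = 6, K = 3.

Suppose the sequence so far is AAGACT and the next base is unknown: AAGACT?

Try every possible extension:

AAGACTC
AAGACTT
AAGACTG

Check each candidate against the tags: AGACTC is a tag.

So the correct extension should be AAGACTC.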
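The check above can be sketched as follows. This is only an illustration of the idea on the slide, not the author's actual implementation: the function name `extend`, the tag set, and the choice to try all four bases (rather than only the successors found in the K = 3 graph) are assumptions.

```python
def extend(contig, tags, tag_len, alphabet="ACGT"):
    """Return the one-base extensions of contig whose last tag_len bases form a known tag."""
    supported = []
    for base in alphabet:
        candidate = contig + base
        if candidate[-tag_len:] in tags:
            supported.append(candidate)
    return supported


# Hypothetical tag set containing the tag from the slide example.
tags = {"AAGACT", "AGACTC"}
print(extend("AAGACT", tags, 6))  # ['AAGACTC']
```

Using the full tag as evidence resolves a branch that the short k-mers alone cannot, while the graph itself can still be built with a small K.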

RNA ASSEMBLY

ALTERNATIVE SPLICING

The graph

All cDNA sequences.

RNA ASSEMBLY’S PROBLEM

Merge? Index the sequence.

RNA ASSEMBLY’S PROBLEM(CONT’)

Solution?

RNA ASSEMBLY’S PROBLEM(CONT’)

Index Tags

RNA ASSEMBLY’S PROBLEM(CONT’)

Solution?

Speed?

SINGLE TAG’S LIMITATION

|Yellow Sequence| >= length of tag.

Length of tag: 25-100 bp.

A single tag is not enough!

DATASET - PAIRED END TAGS

Fragment length is usually > 1k.

Some RNA sequences are shorter than 1k.

TO DO

• Handle large data sets (10G).
• Improve accuracy.
• Use PETs data.

Thanks!!
