Next-Generation Informatics David Dooling <[email protected]> AGBT Bioinformatics 2009-02-05

Talk from the Bioinformatics session of the Advances in Genome Biology and Technology 2009 meeting.

Page 1: Next-Generation Informatics

Next-Generation Informatics

David Dooling <[email protected]>

AGBT Bioinformatics

2009-02-05

Page 2: Next-Generation Informatics

Framing the problem

Page 3: Next-Generation Informatics

Framing the problem

[Chart, x-axis 2000 to 2010. Series: Moore's Law, CPU, Storage, Sequence, Personnel]

Page 4: Next-Generation Informatics

Different perspectives

Page 5: Next-Generation Informatics

LIMS

Page 6: Next-Generation Informatics

LIMS - Illumina/Solexa

Page 7: Next-Generation Informatics

LIMS - Roche/454

Page 8: Next-Generation Informatics

Analysis

Page 9: Next-Generation Informatics

Analysis - cDNA

[Flowchart; labels as extracted]
• Solexa cDNA reads
• Maq/Tophat
• Maq
• [Transcriptome] OR [Genome + Splice Junctions (SJs)] OR [Genome]
• Variant discovery/ASE
• SNPs, Indels
• Gene expression (to exquisite sensitivity)
• Read depth
• Splice isotypes
• Reads map to novel SJs or introns
• Novel genes
• Reads map to “non-genic” regions
• Velvet, GenScan

Page 10: Next-Generation Informatics

Project Lead

Page 11: Next-Generation Informatics

Changing pipelines

Page 12: Next-Generation Informatics

Changing pipelines - LIMS

Prep
• PCR
• cDNAs
• Hybrid Selection
• Bisulfite
• Sample Pooling
• Jumping Libraries
• WGS

Tech-Specific Prep / Detection
• 3730
• Solexa
• 454
• SOLiD
• Church Polony(?)
• Helicos(?)

Primary Analysis
• Phred
• (Technology-specific)
• Flow-space
• Color-space
• ...

Submission
• NCBI Trace
• Project Archives (e.g., DCC)
• NCBI SRA
• NCBI Medical Archive

Courtesy of Toby Bloom

Page 13: Next-Generation Informatics

Changing pipelines - Analysis

Aligners
• BLAST, BLAT, PASH, ssaha
• runMapping, ELAND
• mapreads, Arachne
• MAQ, exonerate, SHRiMP, SPLIGN, Mosaik
• SLIM Search, SXOligoSearch
• SOAP2, NovoCraft
• Bowtie, Tophat

Assemblers
• Phrap, Arachne, PCAP
• Phusion, Euler
• ATLAS, Newbler, Velvet, Forge
• SSAKE, VCAKE
• Euler-USR, SHARCGS
• CABOG

Page 14: Next-Generation Informatics

Framing the solution

Page 15: Next-Generation Informatics

Past is prologue

Page 16: Next-Generation Informatics

Convert this…

Page 17: Next-Generation Informatics

… into this

Page 18: Next-Generation Informatics

Convert this…

Page 19: Next-Generation Informatics

… into this

Page 20: Next-Generation Informatics

UR

• Object-relational mapping (ORM) layer
– Interact with persistence layer (e.g., relational database) through objects and methods
– Automatic, dynamic class definitions
– Moose-like [1] object definition syntax
• Object context
– In-memory transactions (even across databases)
– Caching/deferred loading
• Dynamic command-line interface
• Integrated documentation system

[1] http://www.iinteractive.com/moose/
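
The object-context ideas on the UR slide (deferred loading, caching, in-memory transactions) can be illustrated with a small sketch. This is not UR's API: UR itself is a Perl framework, and the Python names below (`Context`, `Entity`, `Sample`, `demo`), the `sample` table, and its single row are all hypothetical, chosen only to show the general pattern under those assumptions.

```python
import sqlite3


class Context:
    """Hypothetical object context: an identity-map cache plus an undo log."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}   # (table, id) -> object, so each row maps to one object
        self.queries = 0  # SELECTs issued, to make caching visible
        self.undo = []    # (object, column, old value) for uncommitted changes

    def get(self, cls, obj_id):
        # Return the cached object if present; no database access happens here.
        key = (cls.table, obj_id)
        if key not in self.cache:
            self.cache[key] = cls(self, obj_id)
        return self.cache[key]

    def load(self, obj):
        # Deferred load: called only when a column is first touched.
        self.queries += 1
        cols = ", ".join(obj.columns)
        row = self.conn.execute(
            f"SELECT {cols} FROM {obj.table} WHERE id = ?", (obj.id,)
        ).fetchone()
        return dict(zip(obj.columns, row))

    def rollback(self):
        # Undo in-memory changes in reverse order; the database is untouched.
        while self.undo:
            obj, name, old = self.undo.pop()
            obj._data[name] = old

    def commit(self):
        # Write the current in-memory value of every changed column.
        for obj, name, _old in self.undo:
            self.conn.execute(
                f"UPDATE {obj.table} SET {name} = ? WHERE id = ?",
                (obj._data[name], obj.id),
            )
        self.conn.commit()
        self.undo.clear()


class Entity:
    table = None
    columns = ()

    def __init__(self, context, obj_id):
        # Bypass __setattr__ so bookkeeping fields are stored directly.
        self.__dict__.update(context=context, id=obj_id, _data=None)

    def _ensure_loaded(self):
        if self._data is None:
            self._data = self.context.load(self)

    def __getattr__(self, name):
        # Invoked only when normal lookup fails, i.e. for column names.
        if name in type(self).columns:
            self._ensure_loaded()
            return self._data[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in type(self).columns:
            self._ensure_loaded()
            self.context.undo.append((self, name, self._data[name]))
            self._data[name] = value
        else:
            object.__setattr__(self, name, value)


class Sample(Entity):
    table = "sample"                # hypothetical table
    columns = ("name", "platform")  # hypothetical columns


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT, platform TEXT)")
    conn.execute("INSERT INTO sample VALUES (1, 'lib-A', 'Solexa')")  # made-up row
    ctx = Context(conn)
    s = ctx.get(Sample, 1)           # object exists, nothing loaded yet
    before = ctx.queries
    name, platform = s.name, s.platform  # first access loads; second hits the cache
    same = ctx.get(Sample, 1) is s   # identity map returns the same object
    s.platform = "454"
    ctx.rollback()                   # in-memory change discarded
    rolled_back = s.platform
    s.platform = "454"
    ctx.commit()                     # change written to the database
    stored = conn.execute("SELECT platform FROM sample WHERE id = 1").fetchone()[0]
    return before, name, platform, ctx.queries, same, rolled_back, stored
```

Calling `demo()` exercises each idea in turn: fetching the object issues no query, reading two columns issues exactly one, a rolled-back assignment never reaches the database, and a committed one does.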

Page 21: Next-Generation Informatics

Genome Workflow

Page 22: Next-Generation Informatics

Genome Model

Page 23: Next-Generation Informatics

Past is prologue…

Page 24: Next-Generation Informatics

… but with a wrinkle

• Lab personnel accept the software you give them
• Analysts are more than happy to develop their own
• We need to make it easy for analysts to build tools within the system

Page 25: Next-Generation Informatics

Easy Perl API

Page 26: Next-Generation Informatics

Pairing

Analyst

Programmer

Page 27: Next-Generation Informatics

Variant Detection Pipeline

Page 28: Next-Generation Informatics

cDNA Analysis

Page 29: Next-Generation Informatics

16S Pipeline

Page 30: Next-Generation Informatics

Assembly and Annotation Pipeline

Page 31: Next-Generation Informatics

Challenges

• There is still much more work to do
• Sequencing is demolishing Moore’s law
• The cult of traces
• The richness of data
• Visualization

Page 32: Next-Generation Informatics

CIRCOS

Page 33: Next-Generation Informatics

Thanks

Web Site: http://genome.wustl.edu/
Blog: http://www.politigenomics.com/
LIMS Paper: http://www.biomedcentral.com/1471-2105/8/362
UR Presentation: http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/