
The Metagenomics RAST server: Annotation, Analysis, and Comparisons Perfect for Pyrosequencing

Rob Edwards

Department of Computer Science, San Diego State University

Mathematics and Computer Sciences Division, Argonne National Laboratory

Roche Life Sciences Workshop, Sept 2008

www.nmpdr.org www.theseed.org

Outline

• Metagenomics

• Tools for analyzing sequences

• Computational Challenges

• Does it work?


How much has been sequenced?

[Figure: number of known sequences vs. year, with milestones marked for the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and the start of environmental sequencing]


How much will be sequenced?

[Figure: projected sequencing milestones, including 100 people, everybody in San Diego, everybody in the USA, all cultured bacteria, one genome from every species, and most major microbial environments]


Metagenomics (Just sequence it)

Samples: 200 liters water, 5-500 g fresh fecal matter, 50 g soil

• Concentrate and purify bacteria, viruses, etc.

• Epifluorescent microscopy

• Extract nucleic acids

• Sequence

• Publish papers

Modern Metagenomics

• Marine: near-shore water (~100 samples), off-shore water (~50 samples), near- and off-shore sediments

• Metazoan-associated: corals, fish, human blood, human stool

• Terrestrial/Soil: Terragenomics, Amazon rainforest, Konza prairie, Joshua Tree desert, air

• Freshwater: aquifer, glacial lake

• Extreme: hot springs (84 °C; 78 °C), soda lake (pH 13), solar saltern (>35% salt)

The Problem

How do you generate consistent and accurate annotations for metagenomes?


The SEED Family


Annotations using subsystems

FIG developed the notion of a subsystem: a generalization of "pathway" as a collection of functional roles jointly involved in a biological process or complex.

Subsystems were extended into FIGfams: protein families whose members perform the same function.
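To make the relationship between subsystems and FIGfams concrete, here is a minimal sketch of the two data structures: a subsystem as a named set of functional roles, and a FIGfam table mapping each family to the one function its members share. Annotating a protein then means looking up its family's function and finding every subsystem that contains that role. The family IDs, the subsystem, and the `annotate` helper are hypothetical and illustrative only, not taken from the SEED.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """A named collection of functional roles that act together in one process."""
    name: str
    roles: set[str] = field(default_factory=set)


# Hypothetical FIGfam table: family ID -> the single function its members share.
FIGFAMS = {
    "FIG_A": "Sulfate adenylyltransferase subunit 1",
    "FIG_B": "Adenylylsulfate kinase",
}

# Hypothetical subsystem assembled from those functional roles.
SUBSYSTEMS = [
    Subsystem(
        "Sulfate assimilation",
        {"Sulfate adenylyltransferase subunit 1", "Adenylylsulfate kinase"},
    ),
]


def annotate(family_id):
    """Return (function, subsystem names) for a protein assigned to a FIGfam."""
    function = FIGFAMS.get(family_id)
    if function is None:
        # Protein is not in any known family: no function, no subsystems.
        return None, []
    hits = [s.name for s in SUBSYSTEMS if function in s.roles]
    return function, hits


print(annotate("FIG_A"))
```

Because every member of a family carries the same function, one lookup per family gives consistent annotations across genomes and metagenomes, which is the problem the earlier slide poses.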


Annotation of Complete Genomes

• Automated processing of user-submitted genomes

• Takes 1-7 hours depending on size and complexity of the genome

• ~2,000 external submissions, including hundreds of genomes not yet publicly released

• Reannotation of >500 genomes completed

• 1,000 users, 200 organizations, 25 countries

http://rast.nmpdr.org/


The metagenomics RAST server


Automated Processing


Summary View

Metagenomics Tools: Annotation & Subsystems


Metagenomics Tools: Annotation & KEGG maps

Metagenomics Tools: Recruitment Plots

Metagenomics Tools: Phylogenetic Reconstruction

Metagenomics Tools: Comparative Tools

Computational Requirements

[Figure: hours of compute time vs. input size (MB)]

~19 hours of compute per input megabyte
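The quoted rate turns into a simple back-of-the-envelope estimate. This is a minimal sketch assuming compute time scales linearly with input size at ~19 single-CPU hours per megabyte, and (for the wall-clock figure) that the work splits perfectly evenly across CPUs; both assumptions, and the function names, are illustrative rather than a description of how the server actually schedules jobs.

```python
HOURS_PER_MB = 19.0  # single-CPU rate quoted on the slide


def estimate_compute_hours(input_mb, hours_per_mb=HOURS_PER_MB):
    """Single-CPU compute estimate, assuming time grows linearly with input size."""
    if input_mb < 0:
        raise ValueError("input size must be non-negative")
    return input_mb * hours_per_mb


def wall_clock_hours(input_mb, n_cpus):
    """Idealized wall-clock time if the work divides evenly across n_cpus (an assumption)."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    return estimate_compute_hours(input_mb) / n_cpus


# A hypothetical 100 MB metagenome: 1,900 CPU hours, or 19 hours on 100 CPUs.
print(estimate_compute_hours(100), wall_clock_hours(100, 100))
```

The linear model is only as good as the trend in the plot; real runs will also pay scheduling and I/O overhead that perfect division across CPUs ignores.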


How much so far?

986 metagenomes

79,417,238 sequences

17,306,834,870 bp (17 Gbp)

Average: ~15-20 Mbp per metagenome

Compute time (on a single CPU):

328,814 hours = 13,700 days = 38 years

~300 GS20, ~300 FLX, ~300 Sanger
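The totals above can be cross-checked with a few divisions. The sketch below uses only the numbers on the slide; the one added assumption is that one input megabyte corresponds to roughly one million bases, which lets the totals be compared with the ~19 hours per megabyte rate from the previous slide.

```python
# Totals copied from the "How much so far" slide.
N_METAGENOMES = 986
N_SEQUENCES = 79_417_238
TOTAL_BP = 17_306_834_870
TOTAL_CPU_HOURS = 328_814

avg_bp_per_metagenome = TOTAL_BP / N_METAGENOMES  # mean metagenome size in bp
avg_read_length = TOTAL_BP / N_SEQUENCES  # mean sequence length in bp
cpu_days = TOTAL_CPU_HOURS / 24
cpu_years = cpu_days / 365.25
# Assumes one input megabyte is roughly one million bases.
hours_per_mbp = TOTAL_CPU_HOURS / (TOTAL_BP / 1_000_000)

print(f"Average metagenome size: {avg_bp_per_metagenome / 1e6:.1f} Mbp")
print(f"Average sequence length: {avg_read_length:.0f} bp")
print(f"Single-CPU time: {cpu_days:,.0f} days = {cpu_years:.1f} years")
print(f"Implied rate: {hours_per_mbp:.1f} CPU hours per Mbp")
```

The results (about 17.6 Mbp per metagenome, about 218 bp per sequence, about 37.5 CPU-years, and about 19 CPU hours per Mbp) agree with the rounded figures on this slide and the previous one.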


Lots of sequences, all pyrosequencing


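Metagenomics Tools: Functional Heat Maps

[Figure: functional comparison of metagenomes, with axes labeled CDA 60.2% and CDA 21.7%. Functional categories: sulfur, respiration, capsule, motility, membrane transport, stress, signaling, phosphorus, RNA. Environments: mine, saltern, marine, microbialites, coral, fish, animals, freshwater.]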
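The slide does not say how the per-environment profiles are computed. As a minimal sketch, assuming each metagenome has already been reduced to counts of sequences per functional category (for example by subsystem annotation), one common way to make samples of different sizes comparable before drawing a heat map is to convert counts to percentages. The function names and sample counts below are hypothetical, and the canonical discriminant analysis shown on the slide is not reproduced here.

```python
def relative_abundance(counts):
    """Convert per-category sequence counts for one metagenome into percentages."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


def heat_map_rows(samples, categories):
    """Build a samples x categories table of percentages; absent categories become 0."""
    rows = {}
    for name, counts in samples.items():
        pct = relative_abundance(counts)
        rows[name] = [pct.get(category, 0.0) for category in categories]
    return rows


# Hypothetical counts for two toy samples (not real data).
samples = {
    "sample_A": {"Sulfur": 30, "Motility": 10, "Stress": 60},
    "sample_B": {"Sulfur": 5, "Motility": 45},
}
rows = heat_map_rows(samples, ["Sulfur", "Motility", "Stress"])
print(rows)
```

Normalizing within each sample keeps a large metagenome from dominating the color scale; categories not listed in `categories` still count toward each sample's total, so percentages reflect the whole metagenome.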

From Sequences To Environments

Dinsdale et al., Nature 2008

Workshops

Free workshops on NMPDR, RAST, mg-RAST, SEED

Contact Leslie McNeil or visit http://www.nmpdr.org/


Acknowledgements

Environmental Genomics: Forest Rohwer, and all the labs that provided sequence

Metagenomics Annotation Server: Rick Stevens, Folker Meyer, Bob Olson, Daniel Paarmann, Mark D'Souza, Jared Wilkening, Andreas Wilke

Statistics & Web services: Liz Dinsdale, Robert Schmieder, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat

FIG: Ross Overbeek, Veronika Vonstein, Annotators


Artist: Paula Morris

Argonne Sequencing: Marc Domanus, Areej Ammar

Artist's impression: not all machines are known to explode

Terragenomics

Differences between soil samples