Kayo Arima California Institute for Telecommunications and Information Technology (Calit2)-University of California, San Diego Division Cyber Metagenomics; Challenge to See The Unseen Majority in The Ocean

Page 1:

Kayo Arima

California Institute for Telecommunications and Information Technology (Calit2)-University of California, San Diego Division

Cyber Metagenomics; Challenge to See The Unseen Majority in The Ocean

Page 2:

Looking Back Nearly 4 Billion Years in the Evolution of Microbe Genomics

Source: Falkowski and Vargas, Science 304 (5667): 58

Eukaryotes have nuclei.

Prokaryotes have genes but no nuclear membrane.

Page 3:

Evolution is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World

You Are Here

Source: Carl Woese, et al

Much of Genome Work Has Occurred in Animals

Page 4:

Two completely different approaches to obtaining microbial genomic information

Microbial whole genomics:

Environmental sample → Culture (grow) in lab → Isolate the colony → Culture the isolated colony → DNA extraction → Enzymatic digestion → Shotgun sequencing → Gene assembly

Metagenomics:

Environmental sample → DNA extraction → Enzymatic digestion → Shotgun sequencing → Scaffold assembly

Source: Karin Remington, J. Craig Venter Institute
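Both workflows end with shotgun sequencing followed by an assembly step. As a rough illustration only, the sketch below merges short reads by their longest suffix/prefix overlap. The greedy strategy, the function names `overlap` and `greedy_assemble`, and the example reads are assumptions made for this sketch; the slides do not say which assembler was used, and production assemblers also handle sequencing errors, repeats, and both strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the list of contigs left when no pair overlaps by at
    least `min_len` bases. This is a toy model, not a real assembler.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge: remaining reads stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Made-up reads, chosen so that consecutive reads share 4 bases.
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"])
print(contigs)  # ['ATGGCGTACCTTA']
```

When reads come from many different organisms, as in an environmental sample, many of them never find a partner and stay as separate short contigs.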

              

  

Page 5:

Down Side of Metagenomics

• Often fragmentary

• Often highly divergent

• Rarely any known activity

• No chromosomal placement

• No organism of origin

• Ab initio ORF predictions

• Huge data
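Ab initio ORF prediction means calling candidate genes from the sequence alone, without a reference organism. A minimal sketch of the idea follows, assuming the standard genetic code (start codon ATG, stop codons TAA, TAG, TGA) and scanning only the three forward reading frames. The function name `find_orfs` and the `min_codons` threshold are hypothetical, and real gene finders also scan the reverse strand and use statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon. `min_codons` is
    the minimum number of codons before the stop (ATG included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Made-up fragment with one ORF (ATG AAA TTT TAG) in frame 2.
fragment = "CCATGAAATTTTAGGG"
print(find_orfs(fragment))  # [(2, 14, 2)]
```

Note that an ORF cut off by the end of a fragment has no stop codon and is not reported by this sketch, which is one reason fragmentary metagenomic reads are hard to annotate.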

Page 6:

Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale…

GenBank (www.ncbi.nlm.nih.gov/Genbank): 100 Billion Bases!

Protein Data Bank (www.rcsb.org/pdb/holdings.html): 35,000 Structures

Total Data < 1TB

Page 7:

Full Genome Sequencing is Exploding: Most Sequenced Genomes are Bacterial

Chart categories: Archaeal, Bacterial, Eukaryal

Ongoing Genomes: Total 1665

www.genomesonline.org

First Genome 1995 6 Genomes/ Year 2000

Completed Genomes: Total 422

90 Metagenomes

Page 8:

Marine Metagenomics

• Microbes account for more than 90% of ocean biomass, mediate all biochemical cycles in the oceans, and are responsible for 98% of primary production in the sea.

• Metagenomics is a breakthrough sequencing approach to examine the open-space microbial species without the need for isolation and lab cultivation of individual species.

Page 9:

PI Larry Smarr

Paul Gilna Ex. Dir.


Page 10:

Marine Genome Sequencing Project: Measuring the Genetic Diversity of Ocean Microbes

Sorcerer II data from this area has already reached 10% of GenBank.

The Entire Data Set Will Double the Number of Proteins in GenBank!

Page 11:

Sample Metadata from GOS

• Site Metadata

– Location (lat/long, water depth)

– Site characterization (finite list of types plus “other”)

– Site description (free text)

– Country

• Sampling Metadata

– Sample collection date/time

– Sampling depth

– Conditions at time of sampling (e.g., stormy, surface temperature)

– Sample physical/chemical measurements (T (°C), S (ppt), chl a (mg m⁻³), etc.)

– “author”

• Experimental Parameters

– Filter size

– Insert size
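The three metadata groups listed above map naturally onto a nested record. The sketch below is one possible shape, not the actual GOS or Calit2 schema: all class names, field names, and units in the field names are assumptions, and the example values are made-up placeholders rather than real sample data. Filter size and insert size are kept as free strings because the slide gives no units for them.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SiteMetadata:
    latitude: float
    longitude: float
    water_depth_m: float
    site_type: str            # one of a finite list of types, or "other"
    description: str = ""     # free text
    country: str = ""


@dataclass
class SamplingMetadata:
    collected_at: datetime
    sampling_depth_m: float
    conditions: str = ""                          # e.g. "stormy"
    temperature_c: Optional[float] = None         # T (°C)
    salinity_ppt: Optional[float] = None          # S (ppt)
    chlorophyll_a_mg_m3: Optional[float] = None   # chl a (mg m^-3)
    author: str = ""


@dataclass
class ExperimentalParameters:
    filter_size: str
    insert_size: str


@dataclass
class SampleRecord:
    site: SiteMetadata
    sampling: SamplingMetadata
    experiment: ExperimentalParameters


# Placeholder values only; they do not describe a real GOS sample.
example = SampleRecord(
    site=SiteMetadata(latitude=0.0, longitude=0.0, water_depth_m=100.0,
                      site_type="other", description="placeholder site"),
    sampling=SamplingMetadata(collected_at=datetime(2000, 1, 1, 12, 0),
                              sampling_depth_m=5.0, conditions="calm"),
    experiment=ExperimentalParameters(filter_size="unspecified",
                                      insert_size="unspecified"),
)
print(asdict(example)["site"]["site_type"])  # other
```

Measurements that were not taken stay `None` rather than zero, so a missing temperature cannot be mistaken for a reading of 0 °C.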

Page 12:

Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server

[Architecture diagram; labels recovered from the figure:]

• Data sources: Sargasso Sea Data; Sorcerer II Expedition (GOS); JGI Community Sequencing Project; Moore Marine Microbial Project; NASA Goddard Satellite Data; Community Microbial Metagenomics Data

• Core: Flat File Server Farm; Database Farm; Dedicated Compute Farm (1000 CPUs); 10 GigE Fabric

• TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs)

• Access: Traditional User (Request/Response); Web Portal; Web Services; Web (other service); Local Cluster; Local Environment; Direct Access Lambda Cnxns

Source: Phil Papadopoulos, SDSC, Calit2

Page 13:

Marine Metagenomics

Who is there?

Drug discovery

Environmental survey

Microbial genetic survey

Microbial genomic survey

Symbiosis

Organism discovery

Marine conservation

Evolution study

Bioenergy discovery

Endosymbiosis

Biogeochemistry mapping

Metabolic pathway discovery
