The Ocean Sampling Day's Metagenome Analysis: Standards, Pipelines and First Results

Renzo Kottmann, Microbial Genomics and Bioinformatics Research Group
[email protected]
Hinxton, 2015-11-18

Attribution-NonCommercial-ShareAlike CC BY-NC-SA

MAX PLANCK INSTITUTE FOR MARINE MICROBIOLOGY

Investigation of the diversity, structure and distribution of microbial populations through approaches based on nucleic acid analyses

Junior Group of Molecular Ecology
Dr. Rudolf Amann

Common Bioinformatics View on Metagenomics:

Give me big metagenomic sequence data, I assign gene functions

Data Centric View on Metagenomics

Sketch from Martin Fowler: http://martinfowler.com/articles/bigData/

OSD Sampling

Scientists

Masame/GustaME

A global mega-sequencing campaign

June Solstice 2014/15

MyOSD Citizen Science Project

www.my-osd.org

Key facts

> 200 sites participated in OSD
• 4 OSD Pilot Events (Jun/Dec 2012 + Jun/Dec 2013)
• 2 OSD Main Events (June 2014 and June 2015)

Participating OSD sites ranged from:
• subtropical waters in Hawaii to extreme environments such as the Fram Strait in the Arctic Ocean

Each main year:
• ~150 Metagenomes
• ~200 Amplicon (16S/18S) samples

2 Citizen Science Campaigns
• MyOSD 2014 and 2015
• ~190 Amplicon (16S/18S) samples

Standards & Data Harvesting

Scientists

Masame/GustaME

Standard Sampling Protocols: OSD Handbook

http://www.microb3.eu/sites/default/files/osd/OSD_Handbook_v2.0.pdf P ten Hoopen et al.

Data Standards: M2B3

Core:
• Absolute minimum for Micro-B3 data sets
  - Time and place
  - Event/sample ID
  - Temperature
  - Salinity
  - Molecular sampling protocol

Recommended:
• Remainder of mandatory fields for existing standards and others

Optional:
• Useful fields from existing standards

P ten Hoopen et al. Marine microbial biodiversity, bioinformatics and biotechnology (M2B3) data reporting and service standards
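
The core tier listed above can be read as a checklist that every sample record has to pass. A minimal sketch in Python, assuming a record is a plain dictionary; the field names and the example values are hypothetical and are not the official M2B3 identifiers:

```python
# Hypothetical field names for the M2B3 core items (time and place,
# event/sample ID, temperature, salinity, molecular sampling protocol).
# The official M2B3 identifiers may differ.
CORE_FIELDS = (
    "sampling_time",
    "latitude",
    "longitude",
    "sample_id",
    "temperature",
    "salinity",
    "sampling_protocol",
)


def missing_core_fields(record):
    """Return the core fields that are absent or empty in a sample record."""
    return [field for field in CORE_FIELDS if record.get(field) in (None, "")]


# Invented example record; the sampling protocol is deliberately left out.
sample = {
    "sampling_time": "2014-06-21T10:00:00Z",
    "latitude": 54.18,
    "longitude": 7.9,
    "sample_id": "EXAMPLE-1",
    "temperature": 14.2,
    "salinity": 32.1,
}
print(missing_core_fields(sample))
```

A record with an empty result would satisfy the core tier; the recommended and optional tiers could be checked the same way with their own field lists.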

Environmental Data

• Air temperature
• Water temperature
• Salinity
• Phosphate
• Secchi depth
• …

Information System

Curation

Contextual Data

MicroB3 Summer School, May-June 2014

Logsheets

OSD Citizen App

https://itunes.apple.com/us/app/osd-citizen/id834353532?mt=8

https://play.google.com/store/apps/details?id=com.iw.esa

Early, consistent, digital acquisition of environmental data

Sample Registration

Logistics

More than 1000 filters (OSD2014)

High-throughput Visualization of Big Diversity

Ocean Sampling Day: Data Flow Overview

Diagram labels:
• SAMPLES with BARCODES
• METADATA
• SAMPLING
• CONTEXTUAL DATA
• Sequencing Centre +
• SEQUENCING DATA
• CONTEXTUAL DATA & METADATA
• SEQUENCING DATA & METADATA

http://www.oceansamplingday.org

Bioinformatics Pipelines

Scientists

Masame/GustaME

Bioinformatics Pipelines

Pre-Processing

Sequence Data Pre-Processing

Definition: Filter original raw sequence data
• based on well-defined sequence quality criteria

Goal:
• provide all OSD participants (and the whole scientific community) with a single, quality-controlled dataset
• ensure comparability and repeatability of analysis results

Scope:
• covers both amplicon (i.e. 16S/18S rDNA) and shotgun (i.e. metagenome) data sequenced with

Documentation: http://tinyurl.com/osd-pre-processing
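
To illustrate what filtering raw reads against quality criteria can look like, here is a minimal sketch in Python. It assumes reads are available as (header, sequence, quality) tuples with Phred+33 quality strings. The two thresholds are illustrative assumptions, not the OSD criteria; those are described in the documentation linked above.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(records, min_mean_quality=20, min_length=50):
    """Keep (header, sequence, quality) records that pass both criteria.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    kept = []
    for header, sequence, quality in records:
        long_enough = len(sequence) >= min_length
        good_enough = mean_phred(quality) >= min_mean_quality
        if long_enough and good_enough:
            kept.append((header, sequence, quality))
    return kept


# Invented reads: one good, one with low quality, one too short.
reads = [
    ("read_good", "ACGT" * 15, "I" * 60),
    ("read_low_quality", "ACGT" * 15, "#" * 60),
    ("read_short", "ACGT", "IIII"),
]
print([header for header, _, _ in filter_reads(reads)])
```

Applying one fixed set of criteria to every sample is what makes the resulting dataset comparable across sites.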

Pipeline View
• Raw reads
• Quality evaluation
• Pre-processing
• Read-level analysis
• Assembly-level analysis

Bioinformatics Pipelines

Metagenomics

Amplicons 16S/18S

Bioinformatics Pipelines

Scientists

Masame/GustaME

Open Collaboration

All data publicly available before analysis

Open Analysis Group
• Working on > 40 scientific questions/topics

Analysis grouped into 3 topic areas:
1. Diversity (using OTU-based metrics and alternatives such as MED, UniPept etc.)
2. Insights into metabolic functions (with focus on human impact) and their role in the ecosystems, from metagenomes
3. Towards an understanding of broad-scale ecological patterns
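
Topic area 1 mentions OTU-based metrics without naming a specific one. As an illustration only, the Shannon index, H' = -Σ p_i ln p_i, is one widely used OTU-based diversity metric; whether the OSD analysis group used it is an assumption, and the counts below are made up.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over non-zero OTU counts."""
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log(count / total)
        for count in otu_counts
        if count > 0
    )


# Two invented samples with four OTUs each: one even, one dominated by one OTU.
even = shannon_index([25, 25, 25, 25])
skewed = shannon_index([97, 1, 1, 1])
print(even, skewed)
```

The even sample yields the higher value, which is the behaviour expected of a diversity index that rewards evenness as well as richness.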

Some preliminary Results

By Antonio Fernandez Guerra (MPI Bremen, Oxford University)

Malaspina, GOS, TARA, OSD

OSD in Context of other mega-sequencing campaigns

Sunagawa, et al., 2015

Most complete view of the Ocean Microbiome

First sampling 2013

243 sites

7.2 Tbp

Ocean Microbial-Reference Gene Catalog (OM-RGC)
• Assembled reads
• Annotated
• Clustered to generate a non-redundant set of reference genes.

Sunagawa et al., http://ocean-microbiome.embl.de/companion.html

OM-RGC

Incremental clustering (cd-hit-2d): ~7 M

OSD genes present in OM-RGC

After Clustering at 95%:

~3.6 M

OSD genes NOT present in OM-RGC

~4.5 M

Adding OSD to the Ocean Microbiome Reference Gene Catalogue
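
The idea of adding OSD genes to an existing catalogue can be pictured as follows: a new gene that matches a catalogue entry at the identity threshold counts as present, anything else is novel and extends the catalogue. The sketch below is a toy illustration in Python and not how cd-hit-2d works internally. It uses an ungapped, position-by-position identity over the shorter sequence as a stand-in for alignment; only the 95% threshold comes from the slide, and the sequences and names are invented.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions over the shorter sequence (ungapped)."""
    length = min(len(seq_a), len(seq_b))
    if length == 0:
        return 0.0
    return sum(a == b for a, b in zip(seq_a, seq_b)) / length


def extend_catalogue(catalogue, new_genes, threshold=0.95):
    """Split new genes into present/novel and add novel ones to the catalogue."""
    extended = list(catalogue)
    present, novel = [], []
    for name, sequence in new_genes.items():
        if any(identity(sequence, ref) >= threshold for ref in extended):
            present.append(name)
        else:
            novel.append(name)
            extended.append(sequence)  # novel genes grow the catalogue
    return present, novel, extended


# Invented data: one catalogue entry and two hypothetical new genes.
catalogue = ["ATGGCTAGCTAGGATCCGAT"]
new_genes = {
    "gene_a": "ATGGCTAGCTAGGATCCGAT",  # identical to the catalogue entry
    "gene_b": "TTTTTTTTTTCCCCCCCCCC",  # unrelated sequence
}
present, novel, extended = extend_catalogue(catalogue, new_genes)
print(present, novel, len(extended))
```

Real tools align sequences and use heuristics to avoid all-against-all comparison; the loop above only conveys the present/novel bookkeeping.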

Malaspina, GOS, TARA, OSD

OSD is 75% Coastal Sampling Day

~50% of OSD genes NOT present in OM-RGC

Under-sampling of coasts
• Where 30-40% of the human population lives

The Ocean Resistome

(slides left out; please contact the author for further information)

The dark side of the metagenomes

Unravelling the unknown in the metagenomic protein universe

Knowns and unknowns in the metagenomic protein universe
• known: sequences with significant similarity to known protein domains (PFAM)
• unknown: all the sequences without an assigned function

Knowns and unknowns in the metagenomic protein universe
• known: sequences with significant similarity to known protein domains (PFAM)
• unknown: all the sequences without an assigned function
• Labels: unknown, known, environmental unknown, genomic unknown

Metagenomic Network Analysis

Cluster1800572 environmental unknown

SAR11_0487 Tryptophan synthase

SAR11_1266 hypothetical protein

SAR11_0686 hypothetical protein

SAR11_1277 aspartate racemase

Discover the unknowns (Global Ocean Survey)

Bioinformatics Pipelines

Scientists

Masame/GustaME

Ecological Analysis Tools: GustaME

http://mb3is.megx.net/gustame

Buttigieg & Ramette, 2014

Multivariate AnalysiS Applications for Microbial Ecology (MASAME)

http://mb3is.megx.net/masame/

Summary Ocean Sampling Day

• A great open collaboration, open science community
• Sharing data is a prerequisite!

Build on existing infrastructure(s)
• Only added missing components
• Cover all aspects of studying marine microbiomes:
  - Data gathering
  - Bioinformatics analysis
  - Archiving/dissemination
  - Ecological analysis

Comparative metagenomics that takes the whole data lifecycle into account
• From sampling to ecological and biotechnological analysis

Thanks for your attention

1st Marine Board Forum: Marine data Challenges: from Observation to Information

http://twitter.com/Micro_B3
http://www.oceansamplingday.org

Ecological Traits of Metagenomes

Established database of calculated traits
• Available at http://mb3is.megx.net/mg-traits

PCA of Codon Usage (see http://mb3is.megx.net/mg-traits/traits-summary)
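
Codon usage, the trait named in the PCA above, can be expressed per metagenome as relative codon frequencies. A minimal sketch in Python, assuming in-frame coding sequences are already available. How mg-traits actually computes this trait is not described in these slides, so the function and the example sequences are illustrative assumptions.

```python
from collections import Counter


def codon_usage(coding_sequences):
    """Relative frequency of each codon across in-frame coding sequences."""
    counts = Counter()
    for sequence in coding_sequences:
        sequence = sequence.upper()
        # Step through complete codons only; a trailing partial codon is ignored.
        usable = len(sequence) - len(sequence) % 3
        for start in range(0, usable, 3):
            counts[sequence[start:start + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: count / total for codon, count in counts.items()}


# Two invented coding sequences.
usage = codon_usage(["ATGGCTGCTTAA", "ATGGCTTGA"])
print(usage)
```

One such frequency vector per metagenome gives the samples-by-codons matrix on which a PCA could then be run.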
