Attribution-NonCommercial-ShareAlike CC BY-NC-SA
The Ocean Sampling Day's Metagenome Analysis:
Standards, Pipelines and First Results
Microbial Genomics and Bioinformatics Research Group
Renzo Kottmann
[email protected], 2015-11-18
MAX PLANCK INSTITUTE FOR MARINE MICROBIOLOGY
Investigation of the diversity, structure and distribution of microbial populations through approaches based on nucleic acid analyses
Junior Group of Molecular Ecology
Dr. Rudolf Amann
Common Bioinformatics View on Metagenomics:
Give me big metagenomic sequence data, I assign gene functions
Data Centric View on Metagenomics
Sketch from Martin Fowler: http://martinfowler.com/articles/bigData/
A global mega-sequencing campaign
June Solstice 2014/15
Key facts
> 200 sites participated in OSD
• 4 OSD Pilot Events (Jun/Dec 2012 + Jun/Dec 2013)
• 2 OSD Main Events (June 2014 and June 2015)
Participating OSD sites ranged from:
• subtropical waters in Hawaii to extreme environments such as the Fram Strait in the Arctic Ocean
Each main year:
• ~150 Metagenomes
• ~200 Amplicon (16S/18S) samples
2 Citizen Science Campaigns
• MyOSD 2014 and 2015
• ~190 Amplicon (16S/18S) samples
Standards & Data Harvesting
Scientists
Masame/GustaME
Standard Sampling Protocols: OSD Handbook
http://www.microb3.eu/sites/default/files/osd/OSD_Handbook_v2.0.pdf P ten Hoopen et al.
Data Standards: M2B3
Core:
• Absolute minimum for Micro-B3 data sets: time and place, event/sample ID, temperature, salinity, molecular sampling protocol
Recommended:
• Remainder of mandatory fields for existing standards and others
Optional:
• Useful fields from existing standards
P ten Hoopen et al. Marine microbial biodiversity, bioinformatics and biotechnology (M2B3) data reporting and service standards
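A minimal sketch of how a record could be checked against the core tier listed above. The field names and the example values are hypothetical stand-ins chosen for illustration; the M2B3 standard itself defines the authoritative terms and formats.

```python
# Hypothetical field names for the core tier (time and place, event/sample ID,
# temperature, salinity, molecular sampling protocol). Not the official M2B3 terms.
CORE_FIELDS = (
    "sampling_time",
    "latitude",
    "longitude",
    "sample_id",
    "temperature",
    "salinity",
    "sampling_protocol",
)


def missing_core_fields(record):
    """Return the core fields that are absent or empty in a sample record."""
    return [
        name
        for name in CORE_FIELDS
        if record.get(name) is None or record.get(name) == ""
    ]


# Made-up example record; the values are illustrative only.
sample = {
    "sampling_time": "2014-06-21T10:00:00Z",
    "latitude": 54.18,
    "longitude": 7.9,
    "sample_id": "EXAMPLE-1",
    "temperature": 14.2,
    "salinity": None,  # not reported
    "sampling_protocol": "example-protocol",
}

print(missing_core_fields(sample))
```

A record would be accepted only when the returned list is empty; recommended and optional fields could be checked the same way but reported as warnings rather than errors.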
Environmental Data
• Air temperature
• Water temperature
• Salinity
• Phosphate
• Secchi depth
• …
OSD Citizen App
https://itunes.apple.com/us/app/osd-citizen/id834353532?mt=8
https://play.google.com/store/apps/details?id=com.iw.esa
Early, consistent, digital acquisition of environmental data
More than 1000 filters (OSD2014)
High-throughput Visualization of Big Diversity
Ocean Sampling Day: Data Flow Overview
http://www.oceansamplingday.org
Diagram labels: SAMPLING; SAMPLES with BARCODES; METADATA; CONTEXTUAL DATA; Sequencing Centre; SEQUENCING DATA; CONTEXTUAL DATA & METADATA; SEQUENCING DATA & METADATA
Sequence Data Pre-Processing
Definition:
• Filter original raw sequence data based on well-defined sequence quality criteria
Goal:
• provide all OSD participants (and the whole scientific community) with a single, quality-controlled dataset
• ensure comparability and repeatability of analysis results
Scope:
• covers both amplicon (i.e. 16S/18S rDNA) and shotgun (i.e. metagenome) data sequenced with
Documentation:
http://tinyurl.com/osd-pre-processing
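To illustrate what filtering raw reads on sequence quality criteria can look like, here is a small sketch. The thresholds (minimum length, minimum mean Phred score, rejection of reads containing N) are assumptions made for this example and are not the OSD criteria; those are given in the linked documentation. The function names are hypothetical as well.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def passes_filter(sequence, quality, min_length=50, min_mean_quality=20.0):
    """Illustrative criteria only: length, no ambiguous bases, mean quality."""
    if len(sequence) < min_length or "N" in sequence.upper():
        return False
    return mean_phred(quality) >= min_mean_quality


# Three toy reads: a good one, one with an ambiguous base, one of low quality.
fastq_text = (
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTNCGT\n+\nIIIIIIII\n"
    "@read3\nACGTACGT\n+\n########\n"
)

kept = [
    header
    for header, sequence, quality in read_fastq(io.StringIO(fastq_text))
    if passes_filter(sequence, quality, min_length=8)
]
print(kept)
```

Applying one fixed, documented set of thresholds to every sample is what makes downstream results comparable between sites.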
Raw reads
Quality evaluation
Pre-processing
Read-level analysis
Assembly-level analysis
Pipeline View
Bioinformatics Pipelines
Metagenomics
Amplicons 16S/18S
Open Collaboration
All data publicly available before analysis
Open Analysis Group
• Working on > 40 scientific questions/topics
Analysis grouped into 3 topic areas:
1. Diversity (using OTU-based metrics and alternatives such as MED, UniPept etc.)
2. Insights into metabolic functions (with a focus on human impact) and their role in ecosystems, from metagenomes
3. Towards an understanding of broad-scale ecological patterns
Some preliminary Results
By Antonio Fernandez Guerra (MPI Bremen, Oxford University)
Malaspina, GOS, TARA, OSD
OSD in Context of other mega-sequencing campaigns
Sunagawa, et al., 2015
Most complete view of the Ocean Microbiome
First sampling 2013
243 sites
7.2Tbp
Ocean Microbial-Reference Gene Catalog (OM-RGC)
• Assembled reads
• Annotated
• Clustered to generate a non-redundant set of reference genes
Sunagawa et al., http://ocean-microbiome.embl.de/companion.html
Adding OSD to the Ocean Microbiome Reference Gene Catalogue
OM-RGC
Incremental clustering (cd-hit-2d): ~7 M
OSD genes present in OM-RGC
After clustering at 95%:
~3.6 M
OSD genes NOT present in OM-RGC
~4.5 M
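The idea of comparing a new gene set against an existing catalogue at a 95% identity threshold can be sketched as follows. This is a toy stand-in and not the cd-hit-2d algorithm: it uses a naive ungapped, left-aligned identity and an all-against-all comparison, and the function names and sequences are invented for the example.

```python
def ungapped_identity(a, b):
    """Fraction of matching positions over the shorter sequence (no gaps)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def split_by_reference(queries, references, threshold=0.95):
    """Split query genes into those with a reference hit and those without."""
    present, novel = [], []
    for name, sequence in queries.items():
        has_hit = any(
            ungapped_identity(sequence, reference) >= threshold
            for reference in references.values()
        )
        (present if has_hit else novel).append(name)
    return present, novel


# Made-up toy sequences, 40 nt each.
catalogue = {"ref1": "ACGT" * 10}
new_genes = {
    "q_near": "ACGT" * 9 + "ACGA",  # one mismatch in 40 positions (97.5%)
    "q_far": "TTTT" * 10,           # matches only where the reference has T
}

present, novel = split_by_reference(new_genes, catalogue)
print(present, novel)
```

Genes in the second list are the candidates for extending the catalogue. Real tools rely on alignment and word-based heuristics to make this comparison feasible for millions of sequences.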
Malaspina, GOS, TARA, OSD
OSD is 75% Coastal Sampling Day
~50% of OSD genes NOT present in OM-RGC
Under-sampling of coasts
• Where 30-40% of the human population lives
The Ocean Resistome
(slides left out; please contact the author for further information)
The dark side of the metagenomes
Unravelling the unknown in the metagenomic protein universe
Knowns and unknowns in the metagenomic protein universe
• known: sequences with significant similarity to known protein domains (PFAM)
• unknown: all the sequences without an assigned function
Figure labels: unknown, known, environmental unknown, genomic unknown
Metagenomic Network Analysis
Cluster1800572 environmental unknown
SAR11_0487 Tryptophan synthase
SAR11_1266 hypothetical protein
SAR11_0686 hypothetical protein
SAR11_1277 aspartate racemase
Discover the unknowns (Global Ocean Survey)
Ecological Analysis Tools: Gustame
http://mb3is.megx.net/gustame
Multivariate AnalysiS Applications for Microbial Ecology (MASAME)
http://mb3is.megx.net/masame/
Summary
Ocean Sampling Day
• A great open collaboration, open science community
• Sharing data is a prerequisite!
Build on existing infrastructure(s)
• Only added missing components
• Cover all aspects of studying marine microbiomes: data gathering, bioinformatics analysis, archiving/dissemination, ecological analysis
Comparative metagenomics which takes into account the whole data lifecycle
• From sampling to ecological and biotechnological analysis
Thanks for your attention
1st Marine Board Forum: Marine data Challenges: from Observation to Information
http://twitter.com/Micro_B3
http://www.oceansamplingday.org
Ecological Traits of Metagenomes
Established database of calculated traits
• Available at http://mb3is.megx.net/mg-traits
PCA of Codon Usage (see http://mb3is.megx.net/mg-traits/traits-summary)
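As an illustration of the kind of trait that feeds such a PCA, the sketch below computes relative codon frequencies for a single in-frame coding sequence. It is an assumption-laden example and not the mg-traits implementation; the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def codon_usage(sequence):
    """Relative frequency of each codon in an in-frame coding sequence."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Skip codons containing ambiguous bases such as N.
    counts = Counter(codon for codon in codons if set(codon) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: count / total for codon, count in counts.items()}


usage = codon_usage("ATGGCTGCTTAA")  # codons: ATG, GCT, GCT, TAA
print(usage)
```

Stacking one such frequency vector per metagenome (over all 64 codons) gives the samples-by-codons matrix on which a PCA could then be run.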