Upload
others
View
19
Download
0
Embed Size (px)
Citation preview
SHAMAN : SHiny Application for Metagenomic ANalysisStevenn Volant, Amine GhozlaneHub Bioinformatique et Biostatistique – C3BI, USR 3756 IP CNRSBiomics – CITECH
Image from Alberts Molecular Biology of the Cell 5th
Ribosome
ITS (1) : located between 18S and 5.8S rRNA genes
2 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Quantitative metagenomics pipeline
Mapping
Merging Clustering500
1000
300
Samples
1 N
OTU 1
OTU 2
OTU 3
16S/18S/ITS
3 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
HUB – 16S/18S/ITS pipeline
A A
A A
A
A
Dereplication
Chimera filteringGold database
Amplicon sequences
Clustering
Taxonomic Annotation
Mapping
OTU
Countmatrix
PhylogenyMafft/Fastree
Vsearch3
1.Genomics, 102(5), 500-506. 2.Bioinformatics Vol.27 no. 21 2011, p2957-2963. 3.https://github.com/torognes/vsearch
Sample
Amplicon
FastQC
FLASH2
Silva SSU 123RDP
Greengenes13,4
Filter Human PhiX reads
Alientrimmer1
26 min
100 samples50 pairs30 min
Findley
UNITE150 min
4 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Readprocessing
OTU building
Uparse/Vsearch
Integrated in QIIME, MOTHUR and LotuS
Nature methods, 10(10), 996-998.
5 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Quantitative metagenomics pipeline
Mapping
Merging Clustering500
1000
300
Samples
1 N
OTU 1
OTU 2
OTU 3
16S/18S/ITS
6 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
« être éclairé » (Toungouse - Sibérie)
SHAMAN
SHAMAN : shaman.c3bi.pasteur.fr« There is no disputing the importance of statistical analysis in biological research, but too often it is
considered only after an experiment is completed, when it may be too late. »
7 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
SHAMAN : shaman.c3bi.pasteur.fr
8 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Counts Annotation
SHAMAN : shaman.c3bi.pasteur.fr« There is no disputing the importance of statistical analysis in biological research, but too often it is
considered only after an experiment is completed, when it may be too late. »
9 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Metagenomic vs RNA-seq
Metagenomic RNA-seq
DistributionOverdispersed counts → Negative
binomialOverdispersed counts → Negative
binomial
Constraints Highly abundant species Highly expressed genes
Goal
Find differentially abundant features (species, familly, …):
OTU distributions and abundances vary between conditions
Find differentially expressed genes: Distributions and expression vary
between conditions
DESeq2 approach is usually used for RNA-seq dataset
Metagenomic data are similar to RNA-seq data
10 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Data normalization
How ?✗ Fitting the distributions (Total Read Count, UpperQuartile, Median, Full Quantile)✗ Account for the feature length (RPKM)✗ Concept of « effective reads number » (TMM, DESeq2)
Why ? ✗ To correct technical biaises and make samples comparables.
Remarks?✗ Some methods normalize the counts, others the library sizes✗ Some are designed for differential analysis
11 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
DESeq2 normalization (OTU level)
Assumption✗ Most of the OTU have the « same » abundance between 2 conditions
where
Xij : Number of mapped reads of the OTU i in sample jn: Number of samples
Normalization factor :
12 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Comparison with RPKM (1/3)
● RPKM : Reads Per Kilobase per Million mapped reads
Assumption✗ Counts are proportional to abundance, the length and the sequencing deepth.
● Method
Normalized counts =
per Million Per Kilobase
Length Number of reads of sample j
13 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Comparison with RPKM (2/3)
Comparison of 7 normalization methods
14 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Comparison with RPKM (3/3)
Results on real data (7 samples) FDR and Power
15 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Dillies M. et al., Bioinformatics 2013
To sum up
DESeq2 normalization provides better results
Recommend using DESeq2 to perform analysis of differential abundance
16 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Statistical model of DESeq2
Generalized Linear Model
Dispersion
Log2 fold change
Advantages✗ Allows complex experimental designs.
17 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Size factor
Moyenne
Dispersion estimation
Problem✗ Get a good estimate of the dispersion with a small number of samples.
Modelisation of the dispersion :
Function of the mean of normalized count
Local parametric regression
18 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Contrasts (comparisons)
Aim✗ Testing a specific effect without having to re-fit the model.
Advantages✗ Parameters are estimated with all samples.
Coefficients
Contrast vector
Covariance matrix
19 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Conclusions
SHAMAN
16s/18s/its analysis
Strong statistical approach
Several visualizations available
Access : http://shaman.c3bi.pasteur.fr
Incoming features
WGS analysis
New visualizations (Taxonomy plot, Krona, continuous data)
Compatibility with FROGS
CIB – FROGS 16S/18S – GALAXY Pasteur
Galaxy team : Mathieu Valade, Fabien MareuilEmmanuel Quevillon, Eric Deveaud
Vsearchswarm
21 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
Acknowledgements
CiTech Biomics Pole
Sean KENNEDYBéatrice REGNAULT
Olivier GASCUELMarie-Agnès DILLIESChristophe MALABATHugo VARETNicolas MAILLETPierre LECHATAnna ZHUKOVARachel TORCHET
C3BI Bioinformatics Biostatistics Hub