Statistical and Visualization Methods for Metagenomic Analysis
Héctor Corrada Bravo, Center for Bioinformatics and Computational Biology

Hmp 201512



Methods for detecting differential abundance in metagenomic data


metagenomeSeq
- 16S differential abundance
- R/Bioconductor infrastructure for metagenomic assays
- Longitudinal data
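The sketch below makes "differential abundance" concrete for a single feature. It is a minimal exact two-group permutation test on log2 relative abundances. This is a generic illustration and not the statistical model that metagenomeSeq implements. The function names and the pseudocount of 0.5 are assumptions chosen for the example.

```python
import itertools
import math
from statistics import mean


def log_abundance(counts, feature, pseudocount=0.5):
    """log2 relative abundance of one feature in each sample (pseudocount avoids log(0))."""
    return [math.log2((sample[feature] + pseudocount) / sum(sample)) for sample in counts]


def mean_difference(values, group_b):
    """Mean of the samples in group_b minus the mean of the remaining samples."""
    a = [v for i, v in enumerate(values) if i not in group_b]
    b = [v for i, v in enumerate(values) if i in group_b]
    return mean(b) - mean(a)


def permutation_test(counts, labels, feature, pseudocount=0.5):
    """Exact two-sided permutation test; labels are 0 (group A) or 1 (group B)."""
    values = log_abundance(counts, feature, pseudocount)
    observed_b = {i for i, g in enumerate(labels) if g == 1}
    observed = mean_difference(values, observed_b)
    n_extreme = n_total = 0
    # Enumerate every way of assigning len(observed_b) samples to group B.
    for combo in itertools.combinations(range(len(values)), len(observed_b)):
        stat = mean_difference(values, set(combo))
        n_total += 1
        if abs(stat) >= abs(observed) - 1e-12:  # tolerance for float round-off
            n_extreme += 1
    return observed, n_extreme / n_total


counts = [[1, 99], [1, 99], [1, 99], [50, 50], [50, 50], [50, 50]]
labels = [0, 0, 0, 1, 1, 1]
diff, p = permutation_test(counts, labels, feature=0)
```

With three samples per group there are only 20 relabelings. The smallest attainable two-sided p-value is therefore 2/20 = 0.1, which this example reaches. Exact enumeration grows combinatorially, so larger studies would need sampled permutations or a model-based test.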

metagenomeFeatures
- An incipient attempt at regularizing 16S feature annotations in R/Bioconductor
- E.g., greengenes13.5MgDb
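Regularizing feature annotations largely means parsing free-form taxonomy strings into a fixed set of ranks. The sketch below assumes Greengenes-style strings, in which each rank carries a one-letter prefix and a double underscore (`k__Bacteria; p__Firmicutes; ...`) and an empty name marks an unassigned rank. `parse_taxonomy` is a hypothetical helper, not a metagenomeFeatures function.

```python
# Assumed Greengenes-style rank prefixes.
RANKS = {"k": "Kingdom", "p": "Phylum", "c": "Class", "o": "Order",
         "f": "Family", "g": "Genus", "s": "Species"}


def parse_taxonomy(taxon):
    """Map a prefixed taxonomy string to {rank: name or None}."""
    result = {rank: None for rank in RANKS.values()}
    for token in taxon.split(";"):
        token = token.strip()
        if not token:
            continue
        prefix, sep, name = token.partition("__")
        if not sep or prefix not in RANKS:
            raise ValueError(f"unrecognized taxonomy token: {token!r}")
        result[RANKS[prefix]] = name or None  # empty name -> unassigned rank
    return result


lineage = parse_taxonomy(
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; "
    "f__Lachnospiraceae; g__; s__"
)
```

Returning every rank, even the unassigned ones, gives each feature the same set of keys. That is what lets annotations from different sources be compared.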

msd16s
- Example data, as an infrastructure object
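Shipping example data as an infrastructure object keeps counts and their annotations together and consistent. The dataclass below is a hypothetical Python illustration of that idea, not a port of the R object that msd16s uses. It checks dimensions on construction and subsets counts and sample annotations in a single step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CountExperiment:
    counts: List[List[int]]             # rows = features, columns = samples
    feature_data: List[Dict[str, str]]  # one annotation record per feature
    sample_data: List[Dict[str, str]]   # one annotation record per sample

    def __post_init__(self):
        # Reject objects whose counts and annotations disagree in shape.
        if len(self.counts) != len(self.feature_data):
            raise ValueError("one feature annotation per count row is required")
        n_samples = len(self.sample_data)
        for row in self.counts:
            if len(row) != n_samples:
                raise ValueError("each count row must have one entry per sample")

    def subset_samples(self, keep: Callable[[Dict[str, str]], bool]) -> "CountExperiment":
        """Return a new experiment restricted to samples where keep(record) is true."""
        idx = [j for j, s in enumerate(self.sample_data) if keep(s)]
        return CountExperiment(
            counts=[[row[j] for j in idx] for row in self.counts],
            feature_data=list(self.feature_data),
            sample_data=[self.sample_data[j] for j in idx],
        )
```

Because subsetting always goes through the object, a caller cannot drop samples from the count matrix while forgetting to drop them from the sample annotation.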

R/Bioconductor strengths
- Infrastructure objects: interoperability; faster startup for method development
- Strict development practices: documentation, use cases, vignettes
- Annotation infrastructure: again, interoperability across experiments and data types
- Exploratory analysis
- Reproducibility: vignettes, R Markdown, etc.
- Recently, exploratory and interactive visualization: Shiny, epiviz

Integrative, visual and computational exploratory analysis of genomic data
- Browser-based
- Interactive
- Integration of data
- Reproducible dissemination
- Communication with R/Bioconductor: epivizr package

Software systems to support creative exploratory analysis of large genome-wide datasets...

Computed measurements: create new measurements from integrated measurements and visualize them
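A computed measurement has to align existing measurements on shared coordinates before combining them. The sketch below assumes each measurement is stored as a mapping from a hypothetical `(sequence, start, end)` region key to a value. The combining operation is any two-argument function, and regions present in only one input are dropped.

```python
import operator
from typing import Callable, Dict, Tuple

Region = Tuple[str, int, int]  # hypothetical key: (sequence name, start, end)


def computed_measurement(a: Dict[Region, float], b: Dict[Region, float],
                         op: Callable[[float, float], float]) -> Dict[Region, float]:
    """Combine two measurements region by region, keeping only shared regions."""
    shared = sorted(a.keys() & b.keys())
    return {region: op(a[region], b[region]) for region in shared}


case = {("chr1", 0, 100): 2.0, ("chr1", 100, 200): 5.0}
control = {("chr1", 0, 100): 1.0, ("chr1", 200, 300): 3.0}
difference = computed_measurement(case, control, operator.sub)
```

Here `difference` holds a single region, `("chr1", 0, 100)`, because that is the only region both inputs cover. The result has the same shape as its inputs, so it can be visualized like any other measurement.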

Summarization: summarize integrated measurements (computed on data subsets)
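Summarizing over data subsets amounts to grouping samples by an annotation and applying a summary function to each group. The sketch below assumes one value per sample and a parallel list of group labels. The mean is the default summary, but any callable that accepts a list works.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_group(values, labels, summary=mean):
    """Apply `summary` to the values of each group; returns {label: summary value}."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: summary(members) for label, members in groups.items()}


per_group_mean = summarize_by_group([1, 2, 3, 10], ["a", "a", "b", "b"])
per_group_max = summarize_by_group([1, 2, 3, 10], ["a", "a", "b", "b"], summary=max)
```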

Dynamically extensible: easily integrate new data sources and data types, and add new visualizations.

Data providers define the coordinate space
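One way to read "data providers define the coordinate space" is that each provider declares its own coordinate system and answers range queries within it. The interface below is a hypothetical sketch of that reading, not the epiviz API. The coordinate space could be positions on a chromosome or, as an assumption for metagenomic data, the left-to-right order of leaves in a feature hierarchy.

```python
import bisect
from abc import ABC, abstractmethod
from typing import List, Tuple


class DataProvider(ABC):
    """Hypothetical provider interface: each provider declares its own coordinate space."""
    coordinate_space: str

    @abstractmethod
    def query(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return (coordinate, value) pairs with start <= coordinate < end."""


class SortedPointProvider(DataProvider):
    def __init__(self, coordinate_space: str, points: List[Tuple[int, float]]):
        self.coordinate_space = coordinate_space
        self._points = sorted(points)
        self._keys = [coord for coord, _ in self._points]

    def query(self, start, end):
        # Binary search on the sorted coordinates for the half-open range [start, end).
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return self._points[lo:hi]


provider = SortedPointProvider("chr1", [(5, 1.0), (1, 0.5), (10, 2.0)])
visible = provider.query(1, 10)
```

A visualization only needs the `query` contract. New data sources can therefore be added without changing the views, which is one reading of the extensibility goal above.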

One interpretation of Big Data is many sources of relevant contextual data

- Easily access and integrate contextual data
- Driven by exploratory analysis of immediate data
- Iterative process
- Visual and computational exploration go hand in hand

Visualization design goals

- Context: integrate and align multiple data sources; navigate; search
- Connect: brushing
- Encode: map visualization properties to data on the fly
- Reconfigure: multiple views of the same data
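Brushing connects views: a selection made in one view is highlighted in every linked view. The sketch below shows this as a hypothetical publish/subscribe hub, with views reduced to objects that record their highlighted features. The class and method names are illustrative assumptions, not part of epiviz or Metaviz.

```python
from typing import Callable, Iterable, List, Set


class BrushHub:
    """Hypothetical hub that broadcasts a brushed selection to every registered view."""

    def __init__(self):
        self._listeners: List[Callable[[Set[str]], None]] = []

    def register(self, on_brush: Callable[[Set[str]], None]) -> None:
        self._listeners.append(on_brush)

    def brush(self, selected: Iterable[str]) -> None:
        selection = set(selected)
        for on_brush in self._listeners:
            on_brush(selection)


class View:
    def __init__(self, name: str):
        self.name = name
        self.highlighted: Set[str] = set()

    def on_brush(self, selection: Set[str]) -> None:
        self.highlighted = set(selection)  # copy so views do not share state


hub = BrushHub()
heatmap, tree = View("heatmap"), View("tree")
hub.register(heatmap.on_brush)
hub.register(tree.on_brush)
hub.brush(["g__Bacteroides"])
```

Views never reference each other directly, so adding a new view only requires registering it with the hub.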

Visualization design goals

- Data: select and filter; tight-knit integration with R/Bioconductor (current work: filters applied in the visualization propagate to the data environment)
- Model: new 'measurements' that result from modeling, suggested by the data context

Metagenomic visualization
How can we effectively navigate large datasets whose features are organized hierarchically?

Metaviz: browser-based, interactive exploratory analysis of metagenomic data

- Connection to R/Bioconductor with the metavizr package
- Built on metagenomeSeq and metagenomeFeatures infrastructure

Metaviz
- Exploration of hierarchically organized features
- Geared towards 16S for now
- Hierarchical organization is also relevant to WGS
- Integration is a big part of the design: the framework is designed for data integration
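A basic operation when exploring hierarchically organized features is collapsing leaf-level counts to a chosen level of the hierarchy. The sketch below assumes each feature's lineage is given as a root-to-leaf tuple of names. It keys each ancestor by its full path prefix, so identically named nodes in different branches are not merged. The function name and layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def aggregate_to_level(lineages: Sequence[Sequence[str]],
                       counts: Sequence[Sequence[int]],
                       level: int) -> Dict[str, List[int]]:
    """Sum leaf count rows into their ancestor at depth `level` (0 = top rank)."""
    if len(lineages) != len(counts):
        raise ValueError("one lineage per count row is required")
    n_samples = len(counts[0]) if counts else 0
    totals: Dict[str, List[int]] = defaultdict(lambda: [0] * n_samples)
    for lineage, row in zip(lineages, counts):
        node = ";".join(lineage[: level + 1])  # path prefix identifies the ancestor
        acc = totals[node]
        for j, c in enumerate(row):
            acc[j] += c
    return dict(totals)


lineages = [("Bacteria", "Firmicutes", "Clostridia"),
            ("Bacteria", "Firmicutes", "Bacilli"),
            ("Bacteria", "Bacteroidetes", "Bacteroidia")]
counts = [[1, 2], [3, 4], [5, 6]]  # rows = features, columns = samples
phylum_counts = aggregate_to_level(lineages, counts, level=1)
```

Moving the `level` up or down corresponds to collapsing or expanding the hierarchy. This is one way a viewer could let users navigate many features without plotting every leaf at once.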

Acknowledgements
- Brianna Lindsey, O. Colin Stine, Owen White, Anup Mahurkar: University of Maryland Baltimore
- Jim Nataro: University of Virginia
- NIGMS, Genentech

Florin Chelaru (now @ MIT)

Joseph Paulson (now @ Harvard)

Mihai Pop (@ UMD)