Modelling the Intestinal Microbiota

Béatrice Laroche, INRA, MaIAGE

Joint work with Sébastien Raguideau, Sandra Plancade (MaIAGE), Marion Leclerc (MICALIS), Simon Labarthe (MaIAGE), Magali Ribot (U. Orléans)

22/09/2017, Journée chaire MMD



The human intestinal microbiota

All microorganisms in the human intestine: bacteria, fungi, archaea.

Germ-free at birth, then progressive colonization and increasing diversity until adulthood, driven by
• Environment (including health status)
• Genetics
• Diet

Very large number of individuals
• 10^14 microorganisms
• 1000 different species

Symbiotic relationship between the microorganisms and their host.

Main ecosystemic functions

• Ecological barrier
• Immune system education
• Fermentation of non-digestible dietary fibers

O’Hara and Shanahan (2006), Mosca et al. (2016)

Metagenomics: culture-free analysis of the genetic material of the microorganisms in a sample

Two approaches:

Selective amplification
• Barcoding: amplification of a target sequence
• Taxonomic analyses through 16S rDNA amplification

Fragmentation & direct sequencing
• Whole Genome Shotgun (WGS): total DNA fragmentation and production of small reads
• Read mapping to annotated gene catalogues, counting
• Access to all gene abundances, functional approaches

Count table (one row per sample):

Sample 1    1   2   0  13   4
Sample 2    2   0  30   1  14
...
Sample n   10   5   3  14

Ecological framework

Community ecology
• Focuses on species diversity and interactions
• 16S data
• Barrier effect against pathogens
• Screening of probiotic strains

Functional ecology
• Focuses on the functions of individuals in their ecosystem and environment
• Organisms described in terms of functional traits rather than taxonomy
• WGS data
• Fiber metabolism, host/microbe interaction
• Understanding structure & mechanisms

Outline

• Modelling Aggregated Functional Traits (AFT) for fiber degradation from WGS data
  Sébastien Raguideau, Sandra Plancade (MaIAGE), Marion Leclerc (MICALIS)

• Mechanistic model of fiber degradation in the human gut at the organ scale

• Dynamic interaction models & parameter inference from 16S data


Modelling AFT for fiber degradation

• How can we define AFT of fiber degradation

• Model & mathematical formulation

• Results

Functional traits

Functional traits are quantifiable morphological or physiological characteristics of individuals that directly or indirectly impact their fitness or performance.

Single individual
• Trait: leaf surface
• Value: x cm²

Very large number of individuals: Aggregated Functional Traits (AFT)
• Trait: leaf surface
• Value: total or mean surface of all the leaves (spectral remote sensing)

Violle et al. (2007), Homolová et al. (2013), Moreno García et al. (2014)

I) Functional traits of fiber degradation

AFT = global catalytic potential for reaction steps in fiber degradation

AFT value: total abundance of all the genes that potentially code for the synthesis of the enzyme catalyzing the reaction step

Gene family associated to a reaction → transcription and synthesis → enzyme catalyzing the reaction → biochemical reaction

Fierer N, Barberan A, Laughlin DC. Seeing the forest for the genes: using metagenomics to infer the aggregated traits of microbial communities.

Hydrolysis: breakdown of fibers into simple sugar molecules
• Glycoside hydrolases (GH)
• Polysaccharide lyases (PL)

Catabolic conversion of sugars into SCFA, and syntrophic pathways
• KEGG Orthologies (KO)

KOs can be represented on a directed graph

Nodes: metabolic compounds
• red: extracellular
• blue: assumed intracellular

Edges: 61 AFT (= KOs), a simplified representation of many reaction steps

ONLY A TOPOLOGICAL GRAPH: NO STOICHIOMETRY, NO FLUXES
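A topological graph of this kind can be stored as a plain list of directed edges labelled by KO. The sketch below is only an illustration: the compound names, the KO labels (`KO_1`, ...) and the node colours are hypothetical placeholders, not the graph of the talk. It lists, for each node, the AFTs that produce it and the AFTs that consume it, which is all a graph without stoichiometry or fluxes can tell us.

```python
from collections import defaultdict

# Hypothetical edges: (KO label, substrate node, product node).
EDGES = [
    ("KO_1", "sugar", "pyruvate"),
    ("KO_2", "pyruvate", "acetyl-CoA"),
    ("KO_3", "pyruvate", "lactate"),
    ("KO_4", "acetyl-CoA", "acetate"),
]

# Hypothetical node colours: red = extracellular, blue = assumed intracellular.
COLOUR = {
    "sugar": "red",
    "pyruvate": "blue",
    "acetyl-CoA": "blue",
    "lactate": "red",
    "acetate": "red",
}


def producers_and_consumers(edges):
    """Map each node to the KOs producing it and to the KOs consuming it."""
    producing = defaultdict(list)
    consuming = defaultdict(list)
    for ko, substrate, product in edges:
        consuming[substrate].append(ko)
        producing[product].append(ko)
    return producing, consuming


producing, consuming = producers_and_consumers(EDGES)
blue_nodes = sorted(node for node, colour in COLOUR.items() if colour == "blue")
for node in blue_nodes:
    print(node, "produced by", producing[node], "consumed by", consuming[node])
```

The producer and consumer lists of the blue nodes are what the ratio constraints on intracellular metabolites are written with later in the talk.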

Selection of 86 AFT: 25 GH/PL + 61 KOs

Topological graph whose edges are the 61 KOs

AFT measured as the total abundance of the genes for the GH, PL or KO

1408 metagenomic samples from 3 projects (HMP, MetaHIT, MicrObes)

AFT abundance matrix A, of size 1408 × 86
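Building A amounts to summing, in each sample, the abundances of all genes annotated with the same GH, PL or KO. A minimal sketch with NumPy, under the assumption that each gene maps to at most one selected AFT (the function name, the toy counts and the mapping are hypothetical):

```python
import numpy as np


def aft_abundance_matrix(gene_counts, gene_to_aft, n_aft):
    """Sum gene abundances (samples x genes) into AFT abundances (samples x AFTs).

    gene_to_aft[g] is the AFT index of gene g, or None if the gene
    belongs to none of the selected AFTs.
    """
    gene_counts = np.asarray(gene_counts, dtype=float)
    membership = np.zeros((gene_counts.shape[1], n_aft))
    for gene, aft in enumerate(gene_to_aft):
        if aft is not None:
            membership[gene, aft] = 1.0
    return gene_counts @ membership


# Toy data: 2 samples, 4 genes. Genes 0 and 1 belong to AFT 0, gene 2 to AFT 1,
# gene 3 to no selected AFT.
counts = [[1, 2, 5, 7],
          [0, 3, 1, 9]]
A = aft_abundance_matrix(counts, gene_to_aft=[0, 0, 1, None], n_aft=2)
print(A)
```

With the real data the same product would give a 1408 × 86 matrix.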


Outline

• How can we define AFT of fiber degradation

• Model & mathematical formulation

• Results

Annotated genes in a sample: a a a a b c c c c d d e

Abundance of 5 AFT:

          a  b  c  d  e
Sample    4  1  4  2  1
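The abundance row is a simple count of the annotated genes per AFT. As a small illustration with the standard library (the variable names are ours):

```python
from collections import Counter

# Annotated genes found in the toy sample, one letter per gene copy.
annotated_genes = ["a", "a", "a", "a", "d", "b", "d", "c", "c", "c", "c", "e"]
aft_order = ["a", "b", "c", "d", "e"]

counts = Counter(annotated_genes)
sample_row = [counts[aft] for aft in aft_order]
print(sample_row)  # [4, 1, 4, 2, 1]
```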

Latent hierarchical structure:
• genes associated on genomes,
• microorganisms associated within subcommunities.

Functional subcommunities of biotic and abiotic origin, with their gene counts over (a, b, c, d, e):
h1 = (2, 1, 2, 1, 0)
h2 = (2, 0, 0, 1, 0)
h3 = (0, 0, 2, 0, 1)

Annotated genes in a sample, abundance of 5 AFT:

          a  b  c  d  e
Sample    4  1  4  2  1

Latent hierarchical structure:
• genes associated on genomes,
• microorganisms associated within subcommunities,
but we cannot access the genomes.

The functional subcommunities are characterized by Combined Aggregated Functional Traits (CAFT), defined by their composition in terms of AFT (h1, h2, h3).

Latent hierarchical structure: genes associated within subcommunities.

Functional subcommunities of biotic and abiotic origin, and annotated genes in a sample (abundance of 5 AFT):

          a  b  c  d  e
h1        2  1  2  1  0
h2        2  0  0  1  0
h3        0  0  2  0  1
Sample    4  1  4  2  1

• The fiber degradation potential can be represented by a limited number of CAFTs
• All samples are mixtures of these CAFTs, in variable proportions
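With the toy vectors h1, h2, h3 of the earlier example, the mixture idea can be checked numerically: the sample row is one unit of each subcommunity profile. The second sample below and its weights are hypothetical, added only to show variable proportions.

```python
import numpy as np

# Rows: toy subcommunity profiles h1, h2, h3 over the AFTs (a, b, c, d, e).
H = np.array([[2, 1, 2, 1, 0],
              [2, 0, 0, 1, 0],
              [0, 0, 2, 0, 1]])

# The sample of the example contains each subcommunity once.
w = np.array([1, 1, 1])
sample = w @ H
print(sample.tolist())  # [4, 1, 4, 2, 1]

# Hypothetical second sample: twice h1, no h2, once h3.
other = np.array([2, 0, 1]) @ H
print(other.tolist())  # [4, 2, 6, 2, 1]
```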

AFT abundances are modelled as a mixture of k CAFTs in variable proportions

$A_{ij}$: abundance of AFT $j$ in sample $i$

$W_{il}$: abundance of CAFT $l$ in sample $i$

$H_{lj}$: proportion of AFT $j$ in CAFT $l$

k is unknown

A, W, H nonnegative => NMF (Nonnegative Matrix Factorization)
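Written out with the three quantities defined above, the mixture model reads as follows. The dimension symbols $n$ (samples) and $p$ (AFTs) are our notation, not the slide's; in the application $n = 1408$ and $p = 86$.

```latex
A_{ij} \approx \sum_{l=1}^{k} W_{il} H_{lj}
\quad\Longleftrightarrow\quad
A \approx W H,
\qquad
A \in \mathbb{R}_{+}^{n \times p},\;
W \in \mathbb{R}_{+}^{n \times k},\;
H \in \mathbb{R}_{+}^{k \times p}.
```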

Nonnegative Matrix Factorization (NMF) problem

• Data fit: Frobenius norm (Gaussian error model); other possible choice: KL divergence (Poisson model)
• Penalization to lift the identifiability problem and impose sparsity: ridge, exclusive LASSO
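The objective formula itself is not reproduced in this transcript. As a rough sketch of how the named ingredients combine, the function below evaluates a Frobenius data fit plus a ridge term (sum of squared entries) and an exclusive-LASSO term (sum over groups of squared L1 norms). Which factor receives which penalty, the grouping by rows of H, and the weights `mu` and `lam` are assumptions made for illustration, not the formulation of the talk.

```python
import numpy as np


def penalized_nmf_objective(A, W, H, mu=0.0, lam=0.0):
    """Frobenius fit + mu * ridge(W) + lam * exclusive LASSO(H, grouped by row).

    The assignment of penalties to W and H is an assumption of this sketch.
    """
    fit = np.linalg.norm(A - W @ H, "fro") ** 2
    ridge = np.sum(W ** 2)
    exclusive_lasso = np.sum(np.sum(np.abs(H), axis=1) ** 2)
    return float(fit + mu * ridge + lam * exclusive_lasso)


W = np.array([[1.0, 0.0],
              [0.0, 2.0]])
H = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
A = W @ H

print(penalized_nmf_objective(A, W, H))                   # exact fit, no penalty
print(penalized_nmf_objective(A, W, H, mu=1.0, lam=1.0))  # penalties only
```

Because the exclusive-LASSO term squares the L1 norm of each group, it favours a few large entries within a group over many small ones, which is how it encourages sparsity.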

CAFTs characterize AFT frequencies in subcommunities

A subcommunity is a set of microorganisms

Find upper bounds on the functional trait (FT) ratios valid for all genomes (if possible!)

Derive constraints for all CAFTs

Example: a subcommunity made of microorganisms 1, 2, ..., n carrying genes a and b, with counts $N_a$ and $N_b$:

$N_a \le 3 N_b$, $\quad N_b \le \frac{2}{3} N_a$

and hence, for every CAFT $l$:

$H_{la} \le 3 H_{lb}$, $\quad H_{lb} \le \frac{2}{3} H_{la}$
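The bounds can be read off a set of reference genomes as the largest observed ratio in each direction. The sketch below uses three hypothetical genomes chosen so that the bounds come out as 3 and 2/3; the function name and the data are ours. It also checks that the same inequalities hold for the totals of the subcommunity, which is why they carry over to the CAFTs.

```python
from fractions import Fraction

# Hypothetical genomes: (copies of gene a, copies of gene b).
genomes = [(3, 1), (3, 2), (2, 1)]


def ratio_upper_bound(genomes, num, den):
    """Smallest r such that N_num <= r * N_den in every genome.

    Returns None when no finite bound exists (a genome carries the
    numerator gene but not the denominator gene).
    """
    if any(g[den] == 0 and g[num] > 0 for g in genomes):
        return None
    return max(
        (Fraction(g[num], g[den]) for g in genomes if g[den] > 0),
        default=Fraction(0),
    )


r_ab = ratio_upper_bound(genomes, 0, 1)  # N_a <= r_ab * N_b
r_ba = ratio_upper_bound(genomes, 1, 0)  # N_b <= r_ba * N_a
print(r_ab, r_ba)

# The bounds survive summation over the genomes of a subcommunity.
total_a = sum(g[0] for g in genomes)
total_b = sum(g[1] for g in genomes)
print(total_a <= r_ab * total_b, total_b <= r_ba * total_a)
```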

Look for specific constraints associated with blue nodes (intracellular metabolites)

• Use 190 genomes of the most prevalent microorganisms in the gut
• Check if bounds exist for blue nodes
• Calibrate the ratio bounds

For each blue node (metabolite) $m$ and each CAFT $l$:

$\sum_{\text{AFT } x \text{ producing } m} H_{lx} \;\le\; r_1^m \sum_{\text{AFT } y \text{ consuming } m} H_{ly}$

$\sum_{\text{AFT } y \text{ consuming } m} H_{ly} \;\le\; r_2^m \sum_{\text{AFT } x \text{ producing } m} H_{lx}$

Example: $H_{l,13} + H_{l,14} \le r_1 H_{l,15}$ and $H_{l,15} \le r_2 \left( H_{l,13} + H_{l,14} \right)$

25 blue nodes => 38 effective constraints out of 50 possible, written $F H^T \le 0$
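Each pair of inequalities for a blue node becomes two rows of F. A small sketch for the example with AFT 13, 14 and 15 (0-based indices 12, 13, 14 below); the bound values `r1` and `r2` and the function names are hypothetical, and the sketch assumes no AFT both produces and consumes the same node.

```python
import numpy as np


def constraint_rows(n_aft, producers, consumers, r1, r2):
    """Two rows f such that f @ H[l] <= 0 encodes the ratio bounds of one node."""
    row1 = np.zeros(n_aft)          # sum(producers) - r1 * sum(consumers) <= 0
    row1[producers] = 1.0
    row1[consumers] = -r1
    row2 = np.zeros(n_aft)          # sum(consumers) - r2 * sum(producers) <= 0
    row2[consumers] = 1.0
    row2[producers] = -r2
    return np.vstack([row1, row2])


def is_feasible(F, H, tol=1e-12):
    """True when every CAFT (row of H) satisfies F H^T <= 0."""
    return bool(np.all(F @ H.T <= tol))


# Hypothetical bounds r1 = 2.0 and r2 = 1.5.
F = constraint_rows(n_aft=86, producers=[12, 13], consumers=[14], r1=2.0, r2=1.5)

H_ok = np.zeros((1, 86))
H_ok[0, [12, 13, 14]] = [1.0, 1.0, 1.5]    # 2 <= 2 * 1.5 and 1.5 <= 1.5 * 2

H_bad = np.zeros((1, 86))
H_bad[0, [12, 13, 14]] = [1.0, 1.0, 0.5]   # 2 <= 2 * 0.5 fails

print(is_feasible(F, H_ok), is_feasible(F, H_bad))
```

Stacking such rows for all blue nodes with finite bounds gives the full matrix F.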

Constrained NMF problem

• Bi-convex problem, alternate minimization (shown to converge to a stationary point)
• Nesterov accelerated projected gradient
• Semi-explicit solution of the Lagrangian min-max problem and Nesterov

Grippo and Sciandrone (2000), Nesterov (1998), Arrow et al. (1958)
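To make the alternating scheme concrete, here is a simplified sketch: each block is updated by an accelerated projected gradient on the Frobenius objective, with nonnegativity as the only constraint. It leaves out the penalties and the constraints $F H^T \le 0$ (and thus the Lagrangian step), so it is not the solver of the talk; the function names and the iteration counts are ours.

```python
import numpy as np


def nesterov_nnls(A, H, W0, n_iter=200):
    """Approximately solve min_{W >= 0} ||A - W H||_F^2 for fixed H."""
    HHt = H @ H.T
    AHt = A @ H.T
    lipschitz = 2.0 * np.linalg.norm(HHt, 2)   # Lipschitz constant of the gradient
    if lipschitz == 0.0:
        return W0.copy()
    W = W0.copy()
    Y = W0.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = 2.0 * (Y @ HHt - AHt)
        W_new = np.maximum(Y - grad / lipschitz, 0.0)    # projection onto W >= 0
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Y = W_new + ((t - 1.0) / t_new) * (W_new - W)    # momentum step
        W, t = W_new, t_new
    return W


def alternating_nmf(A, k, n_outer=50, seed=0):
    """Alternate the two convex subproblems of the bi-convex NMF problem."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    H = rng.random((k, A.shape[1]))
    for _ in range(n_outer):
        W = nesterov_nnls(A, H, W)
        # The H-step is the same problem on the transposed data.
        H = nesterov_nnls(A.T, W.T, H.T).T
    return W, H


rng = np.random.default_rng(1)
A_toy = rng.random((10, 2)) @ rng.random((2, 6))   # synthetic nonnegative data
W_fit, H_fit = alternating_nmf(A_toy, k=2)
print(np.linalg.norm(A_toy - W_fit @ H_fit) / np.linalg.norm(A_toy))
```

Each subproblem is convex once the other factor is fixed, which is what makes the alternation well defined.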


Outline

• Aggregated Functional Traits (AFTs) of fiber degradation

• Model & mathematical formulation

• Results

• Solve for a range of values for k (use of SVD)
• Selection according to 3 criteria

Relative reconstruction error
• Decreasing with k
• Optimal k = slope change
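The shape of the error curve can be illustrated with the truncated SVD, whose rank-k error is a lower bound on the error of any rank-k NMF. This is a stand-in for the NMF fits, on synthetic data of known rank 3; the function name is ours.

```python
import numpy as np


def relative_error_curve(A, k_values):
    """Relative Frobenius error of the best rank-k approximation of A."""
    s = np.linalg.svd(A, compute_uv=False)
    total = np.sum(s ** 2)
    return [float(np.sqrt(np.sum(s[k:] ** 2) / total)) for k in k_values]


rng = np.random.default_rng(0)
A_rank3 = rng.random((30, 3)) @ rng.random((3, 12))   # synthetic rank-3 data
errors = relative_error_curve(A_rank3, range(1, 6))
for k, err in zip(range(1, 6), errors):
    print(k, err)
```

On this synthetic matrix the error drops to numerical zero at k = 3 and stays flat afterwards, which is the kind of slope change the criterion looks for; on real data the break is less sharp.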

Bi-cross-validation

Assess the predictive capacity

• Random 10-fold split on rows and columns, giving the blocks
    A11  A12
    A21  A22
• Training set (red), validation set (green)
• Optimal k = minimum BiCV error
• AFT (columns) are not independent => no theoretical guarantee
• Not possible to use with constraints

Owen and Perry (2009)
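The principle of bi-cross-validation can be sketched with the truncated SVD: the held-out block A11 is predicted from the other three blocks through a rank-k approximation of A22. An NMF version would replace the truncated SVD by an NMF fit; the code below is the simpler SVD variant on synthetic data, with hypothetical function and argument names, and assumes k does not exceed the rank of A22.

```python
import numpy as np


def bicv_error(A, row_holdout, col_holdout, k):
    """Relative error on the held-out block A11, predicted as A12 pinv(A22_k) A21."""
    rows = np.zeros(A.shape[0], dtype=bool)
    rows[row_holdout] = True
    cols = np.zeros(A.shape[1], dtype=bool)
    cols[col_holdout] = True
    A11 = A[np.ix_(rows, cols)]       # validation block
    A12 = A[np.ix_(rows, ~cols)]
    A21 = A[np.ix_(~rows, cols)]
    A22 = A[np.ix_(~rows, ~cols)]     # training block
    U, s, Vt = np.linalg.svd(A22, full_matrices=False)
    # Pseudo-inverse of the rank-k truncation of A22: V_k diag(1/s_k) U_k^T.
    A22k_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T
    A11_hat = A12 @ A22k_pinv @ A21
    return float(np.linalg.norm(A11 - A11_hat) / np.linalg.norm(A11))


rng = np.random.default_rng(0)
A_rank2 = rng.random((20, 2)) @ rng.random((2, 10))   # synthetic rank-2 data
err_k1 = bicv_error(A_rank2, [0, 1, 2, 3, 4], [0, 1, 2], k=1)
err_k2 = bicv_error(A_rank2, [0, 1, 2, 3, 4], [0, 1, 2], k=2)
print(err_k1, err_k2)
```

In practice the error is averaged over the random folds and over a range of k, and the k with the smallest average is retained.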

Concordance index on H

Assess the robustness of CAFT inference

• Separate NMF on two-fold splits: $A_1 \approx W_1 H_1$, $A_2 \approx W_2 H_2$
• Similarity between $H_1$ and $H_2$ up to an orthogonal transformation
• Mean over several random splits

Jiang et al. (2012)
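One way to compare two factor matrices "up to an orthogonal transformation" is the orthogonal Procrustes alignment. The sketch below uses it to build a similarity score; this is an illustrative choice and not necessarily the index defined by Jiang et al. (2012). The function name is ours.

```python
import numpy as np


def procrustes_similarity(H1, H2):
    """1 minus the relative residual after the best orthogonal alignment of H2 to H1.

    The orthogonal matrix Q minimizing ||H1 - Q H2||_F is U V^T, where
    U S V^T is the SVD of H1 H2^T.
    """
    U, _, Vt = np.linalg.svd(H1 @ H2.T)
    Q = U @ Vt
    residual = np.linalg.norm(H1 - Q @ H2) / np.linalg.norm(H1)
    return 1.0 - float(residual)


rng = np.random.default_rng(0)
H_a = rng.random((3, 8))
permutation = np.eye(3)[[2, 0, 1]]        # reordering the CAFTs is orthogonal
print(procrustes_similarity(H_a, permutation @ H_a))     # close to 1
print(procrustes_similarity(H_a, rng.random((3, 8))))    # strictly below 1
```

Averaging such a score over several random two-fold splits gives a stability curve as a function of k.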

Selection of k (number of CAFT): relative reconstruction error, bi-cross-validation error and concordance, each plotted against k

• Relative reconstruction error: k=4 or k=6
• BiCV error: k=4 or k=7
• Concordance: k=4 (or k=6?)

Retained value: k=4

[Figure over the 61 KO + 25 GH PL, scale from 0 to 1]

Abundances of 4 CAFT in each group

Metavariables: clinical study × health status
• Chinese × Diabetes
• Chinese × Healthy
• Spanish × Crohn
• Spanish × Healthy
• Spanish × Ulcerative Colitis

Difference between Spanish Crohn/Healthy

Confirmed on external data (ongoing work)

Wrap up

• Genomic and metagenomic data
• Ecological model, metabolic knowledge
• AFT, structure, constraint
• W matrix: characterize populations
• H matrix: characterize subcommunities

Raguideau et al. Inferring Aggregated Functional Traits from Metagenomic Data Using Constrained Non-Negative Matrix Factorization: Application to Fiber Degradation in the Human Gut Microbiota. PLoS Computational Biology, 2016.

Set of scripts and code developed in Python and C, available soon.

Outline

• Modelling Aggregated Functional Traits (AFT) for fiber degradation from WGS data

• Mechanistic model of fiber degradation in the human gut at the organ scale
  Integrate all the knowledge we gain in a deterministic mechanistic model, started in 2009 (ODE)
  Simon Labarthe (MaIAGE), Magali Ribot (U. Orléans)

• Dynamic interaction models & parameter inference from 16S data

Cylindrical geometry

+ Boundary conditions

• Mixture model: transport, reaction, diffusion, friction
  + Stokes (fluid mechanics)
  + Keller-Segel (swimming)
• Solved using a MAC grid and adapted numerical schemes (time splitting)
• Simplified model (homogenization) => highly efficient simulation

Next steps (PhD starting soon)
• Improve the structuring of the bacterial population using the previous results, and build dFBA-like models
• Improve boundary terms = host/microbe crosstalk, mucus/microbe interaction…

Outline

• Modelling Aggregated Functional Traits (AFT) for fiber degradation from WGS data

• Mechanistic model of fiber degradation in the human gut at the organ scale

• Dynamic interaction models & parameter inference from 16S data
  Recently started
  Simon Labarthe, Nicolas Brunel (U Evry) for deterministic, Generalized Lotka-Volterra models
  S. Widder (U Vienna) for stochastic IBM
  + collab. INRA
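For readers unfamiliar with the deterministic side, a generalized Lotka-Volterra model takes the form $\dot{x}_i = x_i \left( r_i + \sum_j a_{ij} x_j \right)$. The sketch below integrates a two-species toy system with a classical Runge-Kutta scheme; the growth rates, the interaction matrix and the function names are hypothetical, chosen so that the two species coexist.

```python
import numpy as np


def glv_rhs(x, r, A):
    """Right-hand side of the generalized Lotka-Volterra equations."""
    return x * (r + A @ x)


def simulate_glv(x0, r, A, dt=0.01, n_steps=5000):
    """Fixed-step fourth-order Runge-Kutta integration."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(n_steps):
        k1 = glv_rhs(x, r, A)
        k2 = glv_rhs(x + 0.5 * dt * k1, r, A)
        k3 = glv_rhs(x + 0.5 * dt * k2, r, A)
        k4 = glv_rhs(x + dt * k3, r, A)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        trajectory.append(x.copy())
    return np.array(trajectory)


# Hypothetical parameters: two weakly competing species.
r = np.array([1.0, 0.8])
A_int = np.array([[-1.0, -0.5],
                  [-0.3, -1.0]])
trajectory = simulate_glv([0.1, 0.1], r, A_int)
equilibrium = np.linalg.solve(A_int, -r)   # interior fixed point: r + A x = 0
print(trajectory[-1], equilibrium)
```

Parameter inference then consists in recovering `r` and the (ideally sparse) interaction matrix from observed abundance time series.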

Stochastic IBM

• Individual interaction rules and rates (interaction, death, immigration)
• Markov jump process

Objective = inference of parameters & model selection
• High dimension (data/model reduction)
• Sparse interactions
• Noisy and poorly sampled data
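A Markov jump process of this kind can be simulated exactly with Gillespie's algorithm. The sketch below is deliberately reduced to a single population with three event types (immigration, death, pairwise interaction leading to the loss of one individual); the rates and names are hypothetical, and a realistic individual-based model would track several species and their interaction matrix.

```python
import random


def gillespie(n0, immigration, death, competition, t_max, seed=0):
    """Simulate a population count as a continuous-time Markov jump process.

    Events: immigration (+1) at a constant rate, death (-1) at rate death * n,
    pairwise interaction (-1) at rate competition * n * (n - 1).
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        rates = [immigration, death * n, competition * n * (n - 1)]
        total = sum(rates)
        if total == 0.0:
            break                          # absorbing state, nothing can happen
        t += rng.expovariate(total)        # exponential waiting time
        if t > t_max:
            break
        u = rng.random() * total           # pick the event proportionally to its rate
        n += 1 if u < rates[0] else -1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie(n0=5, immigration=2.0, death=0.1,
                          competition=0.01, t_max=50.0)
print(len(times), counts[-1])
```

Inference has to recover such rates from a few noisy snapshots of the counts, which is where the difficulties listed above come from.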

Thank you for your attention