Statistical learning of biological networks: a brief overview
Florence d’Alché–Buc
IBISC CNRS, Université d'Evry, GENOPOLE, Evry, France
Email: [email protected]
Statistical learning of biological networks: a brief overview 1 / 30
Biological networks
Motivation
Identify and understand complex mechanisms at work in the cell.
Biological networks:
- signaling pathways
- gene regulatory networks
- protein-protein interaction networks
- metabolic pathways
Use experimental data and prior knowledge AND statistical inference to unravel biological networks and predict their behaviour.
How to learn biological networks from data?
Data-mining approaches: extract co-expressed and/or co-regulated patterns, reduce dimension [large-scale data, often preliminary to more accurate modelling or prediction]
Modeling approaches: model the network behaviour; can be used to simulate and predict the network as a system [smaller-scale data]
Predictive approaches: predict (only) edges in an unsupervised or supervised way [large- or medium-scale data]
Learning (biological) networks
Outline
1 Introduction
2 Supervised Predictive approaches
3 Modelling approaches
4 Conclusion
Supervised learning of the regulation concept
Instance Problem 1 (transcriptional regulatory networks):
Training sample S = {(w_i = (v_i, v'_i), y_i), i = 1...n} where w_i are pairs of components v_i and v'_i (think transcription factor and potential regulatee) and y_i ∈ Y indicates whether v_i is a transcription factor for v'_i. We wish to be able to predict new regulations.
Reference: Qian et al. 2003, Bioinformatics.
In symbolic machine learning, this corresponds to the framework of relational learning, classically associated with inductive logic programming (ILP) and more recently with statistical ILP: the predicate interaction(X,Y) can be learned from labeled examples.
Supervised learning of interactions
From a known network where each vertex is described by some input feature vector x, predict the edges involving new vertices described by their input feature vector.
Supervised prediction of a protein-protein interaction network
Instance Problem 2 (protein-protein interaction networks):
Training sample S = {(w_i = (v_i, v'_i), y_i), i = 1...n} where w_i are couples of components v_i and v'_i (think proteins) and y_i ∈ Y indicates whether there is an edge between v_i and v'_i. We wish to predict interactions for test and training input data.
Noble et al. 2005 (SVM) with kernel combination.
Further studied by Biau and Bleakley 2006, Bleakley et al. 2007.
Similarity or kernel learning
In the case of undirected graphs, a similarity between components can be learnt instead of a classification function.
Yamanishi and Vert's work (2005) first introduced this kind of approach.
We proposed a new way of formulating the problem as regression in an output space endowed with a kernel (Geurts et al. 2006, 2007).
Supervised learning with output (kernel) feature space
Suppose we have a learning sample LS = {x_i = x(v_i), i = 1, ..., N} drawn from a fixed but unknown probability distribution, and additional information provided by a Gram matrix K = (k_ij) with k_ij = k(v_i, v_j), i, j = 1, ..., N, that expresses how close the objects v_i are to each other.
Let φ denote the implicit output feature map and k the positive definite kernel defined on V × V such that ⟨φ(v), φ(v')⟩ = k(v, v').

From a learning sample {(x_i, K_ij) | i = 1, ..., N, j = 1, ..., N} with x_i ∈ X, find a function f : X → F that minimizes the expectation of some loss function ℓ : F × F → ℝ over the joint distribution of input/output pairs:

E_{x, φ(v)} { ℓ(f(x), φ(v)) }
Application to supervised inference of edges in agraph 1
For objects v_1, ..., v_N, let us assume we have: feature vectors x(v_i), i = 1...N, and a Gram matrix K defined as K_{i,j} = k(v_i, v_j). The kernel k reflects the proximity between the objects v as vertices in the known graph.
Reminder: the kernel k is a positive definite (similarity) function. For such a function, there exists a feature map φ : V → F such that k(v, v') = ⟨φ(v), φ(v')⟩.
Supervised inference of edges in a graph
Use a machine learning method that can infer a function h : X → F to get, for a given x(v), an approximation of φ(v), and thus an approximation g(x(v), x(v')) = ⟨h(x(v)), h(x(v'))⟩ of the kernel value between v and v', described by their input feature vectors x(v) and x(v').
Connect these two vertices if g(x(v), x(v')) > θ.
(By varying θ we obtain different trade-offs between true-positive and false-positive rates.)
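The two steps above can be sketched in a few lines; the learned map h is a stand-in here (a random linear map, purely for illustration), not the actual models of Geurts et al.:

```python
import numpy as np

def predict_edges(h, X, theta):
    """Score every pair of vertices with the approximated kernel
    g(x, x') = <h(x), h(x')>, then threshold at theta.
    X: (N, d) array holding one input feature vector per vertex."""
    Phi_hat = h(X)              # approximations of phi(v), shape (N, p)
    G = Phi_hat @ Phi_hat.T     # approximated kernel values for all pairs
    return G > theta            # boolean adjacency matrix of predicted edges

# Toy run with a hypothetical linear map standing in for a trained model
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))     # 5 vertices, 3 input features each
W = rng.normal(size=(3, 2))
adjacency = predict_edges(lambda Z: Z @ W, X, theta=0.5)
```

Since g is an inner product, the score matrix is symmetric, so the predicted graph is undirected by construction.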
A kernel on graph nodes
Diffusion kernel (Kondor and Lafferty, 2002):
The Gram matrix K with K_{i,j} = k(v_i, v_j) is given by:

K = exp(−βL)

where the graph Laplacian L is defined by:

L_{i,j} = d_i, the degree of node v_i, if i = j; −1 if v_i and v_j are connected; 0 otherwise.
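As a minimal sketch (not from the slides), the diffusion kernel can be computed from an adjacency matrix with a matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A, beta):
    """Diffusion kernel K = exp(-beta * L) of Kondor and Lafferty,
    built from the 0/1 adjacency matrix A of an undirected graph."""
    L = np.diag(A.sum(axis=1)) - A   # Laplacian: degrees on the diagonal, -1 on edges
    return expm(-beta * L)           # matrix exponential, not element-wise exp

# Path graph 0 - 1 - 2: node 1 is closer, in diffusion terms, to node 0 than node 2 is
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
K = diffusion_kernel(A, beta=1.0)
```

Because L is symmetric positive semi-definite, K is symmetric positive definite, hence a valid kernel on the graph nodes.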
Interpretability: rules and clusters (an example with a protein-protein network)
Network completion and function prediction for yeast data
Challenges and limitations in supervised predictive approaches
Semi-supervised learning, or even transductive learning
Issue: unbalanced distribution of positive and negative examples
Local approach (the graph is not seen as a single variable)
Data (labeled examples) are not i.i.d.: regulations are not independent
Graphical models: from simple interaction models to complex ones
Graphical Gaussian model estimation: estimating partial correlation as a measure of conditional independence (classified as graph prediction in my terminology)
Bayesian network estimation: modelling directed interactions
Dynamic Bayesian network estimation: modelling directed interactions through time
State-space model estimation: modelling observed and hidden dynamical processes as well
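A minimal sketch of the first item: partial correlations can be read off the inverse of the covariance (precision) matrix, and near-zero entries suggest conditional independence. The chain example below is illustrative, not from the slides; real applications need regularized estimators when variables outnumber samples.

```python
import numpy as np

def partial_correlations(X):
    """Graphical Gaussian model estimation in its simplest form:
    partial correlations from the inverse empirical covariance.
    rho[i, j] close to 0 suggests i and j are conditionally
    independent given all other variables."""
    omega = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)                    # standard normalization
    np.fill_diagonal(rho, 1.0)
    return rho

# Chain x0 -> x1 -> x2: x0 and x2 are marginally correlated
# but conditionally independent given x1
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = x0 + 0.5 * rng.normal(size=5000)
x2 = x1 + 0.5 * rng.normal(size=5000)
rho = partial_correlations(np.column_stack([x0, x1, x2]))
```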
Focus on state-space models
Goal:
Quantitative models (easier to learn, encompass mechanistic models: biological relevance)
Taking time into account
Some variables are not measured: assumption of a hidden process
Linear Gaussian models: parameters encapsulate the network structure (Perrin et al. 03, Rangel et al. 04)
Nonlinear models (more biologically relevant): the structure is encapsulated in the form of the transition function (Nachman 04, Rogers et al. 06, Quach et al. 07)

x(t_{k+1}) = F(x(t_k), u; θ) + ε_h(t_k)
y(t_k) = H(x(t_k), u(t_k); θ) + ε(t_k)
System of Ordinary Differential Equations (ODE)
dx(t)/dt = f(x(t), u(t); θ)
Let us focus on gene regulatory networks.
x(t): state variables at time t
- protein concentrations
- mRNA concentrations
f: the form of f encodes the nature of the interactions (and their structure)
- linear/nonlinear models
- Michaelis-Menten kinetics
- mass action kinetics
- ...
θ: parameter set (kinetic parameters, rate constants, ...)
u(t): input variables at time t
Reverse Engineering of Biological Networks
Given:
An ODE model:

dx(t)/dt = f(x(t), u(t); θ)

A partial and noisy observation model:

y(t) = H(x(t), u(t); θ) + ε(t)

where H is a nonlinear observation function and ε(t) is an i.i.d. noise.
A sequence of observed data: y_{1:K} = {y_1, ..., y_K} at times t_1, t_2, ..., t_K.

Goal:
Structure estimation
Parameter estimation θ
State estimation x(t)
Structure learning
Case 1: very few variables are involved; a combinatorial search over structures can then be carried out. For each candidate structure, the parameters have to be estimated.
Case 2: more than a dozen variables are involved; it is then worth using an algorithm dedicated to structure learning. Structure learning in nonlinear dynamical models, as in static Bayesian networks, can be solved by a stochastic exploration of the (huge) set of candidates, using an appropriate criterion that takes into account the data and the parameter estimates given the candidate structure. MCMC methods and evolutionary approaches are used.
In the following, we assume that the network structure is given.
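Case 2's stochastic exploration can be sketched as a greedy edge-flip search (MCMC and evolutionary variants follow the same propose-and-score loop); the score function and target structure below are toy stand-ins for a data-based criterion such as a penalized likelihood:

```python
import numpy as np

def stochastic_structure_search(score, n_nodes, n_iters=1000, seed=0):
    """Explore candidate structures by proposing single edge flips and
    keeping a flip only when the score improves. The score is assumed to
    embed data fit and parameter estimation for the candidate."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_nodes, n_nodes), dtype=int)   # current candidate adjacency
    best = score(A)
    for _ in range(n_iters):
        i, j = rng.integers(n_nodes, size=2)
        if i == j:
            continue                              # no self-loops
        A[i, j] ^= 1                              # flip one directed edge
        s = score(A)
        if s > best:
            best = s                              # keep the improving flip
        else:
            A[i, j] ^= 1                          # revert the flip
    return A, best

# Toy criterion: distance to a hypothetical target structure to recover
target = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
A_hat, s = stochastic_structure_search(lambda A: -np.abs(A - target).sum(), 3)
```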
An example of Nonlinear State-Space Model
Continuous-time ODE model:

dx(t)/dt = f(x(t), u(t); θ)
y(t) = H(x(t), u(t); θ) + ε(t)

The system at discrete time points t_1, ..., t_K:

x(t_{k+1}) = F(x(t_k), u; θ)
y(t_k) = H(x(t_k), u(t_k); θ) + ε(t_k)

with

F(x(t_k), u; θ) = x(t_k) + ∫_{t_k}^{t_{k+1}} f(x(τ), u(τ); θ) dτ
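The integral defining F rarely has a closed form for nonlinear f, so it is approximated numerically; a sketch with a fixed-step fourth-order Runge-Kutta scheme (inputs u omitted for brevity):

```python
import numpy as np

def discretize(f, x_k, t_k, t_k1, n_steps=100):
    """Approximate F(x(t_k)) = x(t_k) + integral of f over [t_k, t_{k+1}]
    with fixed-step 4th-order Runge-Kutta."""
    h = (t_k1 - t_k) / n_steps
    x, t = np.asarray(x_k, dtype=float), t_k
    for _ in range(n_steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

# Linear decay dx/dt = -x over one time unit: F(1) should be close to e^{-1}
x_next = discretize(lambda t, x: -x, np.array([1.0]), 0.0, 1.0)
```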
Bayesian inference
Given:
Prior distribution over the initial state and parameters: p(x_1, θ)
A state transition model: p(x_k | x_{k−1}, θ)
An observation model: p(y_k | x_k, θ)
A sequence of observations: y_{1:K} = {y_1, ..., y_K}

Estimating the posterior distributions:
Focus on the filtering distribution p(x_k, θ | y_{1:k})
Tool: unscented Kalman filter to deal with the nonlinearities (Quach et al., 2007)
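The unscented Kalman filter of Quach et al. builds on the unscented transform; a sketch of its sigma-point construction (parameter names follow the common van der Merwe formulation, an assumption rather than the slides' notation):

```python
import numpy as np

def sigma_points(mean, cov, alpha=1e-3, kappa=0.0):
    """2n+1 sigma points and mean-weights of the unscented transform,
    the building block of the unscented Kalman filter."""
    n = mean.shape[0]
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)         # matrix square root
    pts = np.vstack([mean, mean + S.T, mean - S.T]) # central point, then +/- columns
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wm[0] = lam / (n + lam)                         # central weight (can be negative)
    return pts, wm

# Propagating the points through the nonlinearity approximates the predicted
# mean of the transformed distribution: sum_i w_i * F(sigma_i)
mean, cov = np.zeros(2), np.eye(2)
pts, wm = sigma_points(mean, cov)
pred_mean = wm @ np.tanh(pts)     # example nonlinear transition F
```

The weighted sigma points reproduce the original mean and covariance exactly, which is what lets the filter handle nonlinearities without computing Jacobians.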
Example: the Repressilator
[Elowitz and Leibler, Nature 2000]

dr_1/dt = vmax_1 · k_12^n / (k_12^n + p_2^n) − k_1^mRNA · r_1
dr_2/dt = vmax_2 · k_23^n / (k_23^n + p_3^n) − k_2^mRNA · r_2
dr_3/dt = vmax_3 · k_31^n / (k_31^n + p_1^n) − k_3^mRNA · r_3
dp_1/dt = k_1 r_1 − k_1^protein · p_1
dp_2/dt = k_2 r_2 − k_2^protein · p_2
dp_3/dt = k_3 r_3 − k_3^protein · p_3

mRNAs are observed, proteins are hidden.
mRNA and protein degradation rate constants are supposed to be known.
Estimate 9 parameters.
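A simulation sketch of this system with scipy; the parameter values below are illustrative placeholders, not the ones estimated in the slides:

```python
import numpy as np
from scipy.integrate import solve_ivp

def repressilator(t, state, vmax, k, k_tl, d_mrna, d_prot, n=2.0):
    """Right-hand side of the repressilator ODEs above:
    state = (r1, r2, r3, p1, p2, p3)."""
    r, p = state[:3], state[3:]
    repressor = [p[1], p[2], p[0]]   # p2 represses gene 1, p3 gene 2, p1 gene 3
    dr = [vmax[i] * k[i] ** n / (k[i] ** n + repressor[i] ** n) - d_mrna[i] * r[i]
          for i in range(3)]
    dp = [k_tl[i] * r[i] - d_prot[i] * p[i] for i in range(3)]
    return dr + dp

# Illustrative values for (vmax, k, translation rates, degradation rates)
params = ([5.0] * 3, [1.0] * 3, [1.0] * 3, [1.0] * 3, [0.5] * 3)
sol = solve_ivp(repressilator, (0.0, 50.0), [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                args=params)
```

In the estimation setting, only the three mRNA trajectories r_i would be observed (with noise); the protein trajectories p_i are the hidden states the filter must reconstruct.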
Parameter Estimation
Challenges in (dynamical) modelling approaches
Identifiability of dynamical models
Theoretical results about sample complexity
Scaling to large networks
Non-stationarity
Incorporating other components: space, cellular compartments, ...
Coupled systems: metabolic and regulatory networks, protein-protein interactions and regulatory networks
MORE DATA: benchmark problems, challenges
General conclusion and perspective
Different views of the learning problem, different scales, different prior knowledge
Some of these methods could be combined to take part in the same discovery process
Need for building data repositories and demand for biological validation
References
C. Auliac, V. Frouin, X. Gidrol, F. d'Alché-Buc, Evolutionary approaches for the reverse-engineering of gene regulatory networks: a study on a realistic biological dataset, accepted at BMC Bioinformatics, to appear in 2008.
P. Geurts, N. Touleimat, M. Dutreix, F. d'Alché-Buc, Inferring biological networks with output kernel trees, BMC Bioinformatics, to appear, May 3, 2007.
Kato, K. Tsuda, EM-based algorithm for kernel matrix completion, Bioinformatics, vol. 21, 2005.
B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, F. d'Alché-Buc, Inference of gene regulatory networks with dynamic Bayesian networks, Bioinformatics (Oxford Press), vol. 19, ii138-ii148, 2003.
M. Quach, N. Brunel, F. d'Alché-Buc, Estimating parameters and hidden variables in nonlinear state-space models based on ODEs for biological networks inference, Bioinformatics, 23:3209-3216, 2007.
Y. Yamanishi, J.-P. Vert, M. Kanehisa, Supervised enzyme network inference from the integration of genomic data and chemical information, Bioinformatics, vol. 21, 2005.