Statistical learning of biological networks: a brief overview
Florence d’Alché–Buc
IBISC CNRS, Université d'Evry, GENOPOLE, Evry, France
Email: [email protected]
Statistical learning of biological networks: a brief overview 1 / 30
Biological networks
Motivation
Identify and understand complex mechanisms at work in the cell.
Biological networks:
- signaling pathways
- gene regulatory networks
- protein-protein interaction networks
- metabolic pathways
Use experimental data and prior knowledge AND statistical inference to unravel biological networks and predict their behaviour.
How to learn biological networks from data?
Data-mining approaches: extract co-expressed and/or co-regulated patterns, reduce dimension [large-scale data, often preliminary to more accurate modelling or prediction]
Modeling approaches: model the network behaviour; can be used to simulate and predict the network as a system [smaller-scale data]
Predictive approaches: predict (only) edges in an unsupervised or supervised way [large- or medium-scale data]
Learning (biological) networks
Outline
1 Introduction
2 Supervised Predictive approaches
3 Modelling approaches
4 Conclusion
Supervised learning of the regulation concept
Instance Problem 1 (transcriptional regulatory networks):
Training sample S = {(w_i = (v_i, v'_i), y_i), i = 1...n} where w_i are pairs of components v_i and v'_i (think transcription factor and potential regulatee) and y_i ∈ Y indicates whether v_i is a transcription factor for v'_i. We wish to be able to predict new regulations.
Reference: Qian et al. 2003, Bioinformatics.
In symbolic machine learning, this corresponds to the framework of relational learning, classically associated with inductive logic programming (ILP) and more recently with statistical ILP: the predicate interaction(X,Y) can be learned from labeled examples.
Supervised learning of interactions
From a known network where each vertex is described by some input feature vector x, predict the edges involving new vertices described by their input feature vector.
Supervised prediction of a protein-protein interaction network
Instance Problem 2 (protein-protein interaction networks):
Training sample S = {(w_i = (v_i, v'_i), y_i), i = 1...n} where w_i are couples of components v_i and v'_i (think proteins) and y_i ∈ Y indicates whether there is an edge between v_i and v'_i. We wish to predict interactions for test and training input data.
Noble et al. 2005 (SVM) with kernel combination.
Further studied by Biau and Bleakley 2006, Bleakley et al. 2007.
Similarity or kernel learning
In the case of undirected graphs, a similarity between components can be learnt instead of a classification function.
Yamanishi and Vert's work (2005) first introduced this kind of approach.
We proposed a new way of formulating the problem as regression in an output space endowed with a kernel (Geurts et al. 2006, 2007).
Supervised learning with output (kernel) feature space
Suppose we have a learning sample LS = {x_i = x(v_i), i = 1, ..., N} drawn from a fixed but unknown probability distribution, and additional information provided by a Gram matrix K = (k_ij) with k_ij = k(v_i, v_j), i, j = 1, ..., N, that expresses how close the objects v_i are to each other.
Let φ denote the implicit output feature map and k the positive definite kernel defined on V × V such that ⟨φ(v), φ(v')⟩ = k(v, v').

From a learning sample {(x_i, K_ij) | i = 1, ..., N, j = 1, ..., N} with x_i ∈ X, find a function f : X → F that minimizes the expectation of some loss function ℓ : F × F → ℝ over the joint distribution of input/output pairs:

E_{x, φ(v)} { ℓ(f(x), φ(v)) }
Application to supervised inference of edges in agraph 1
For objects v_1, ..., v_N, let us assume we have: feature vectors x(v_i), i = 1...N, and a Gram matrix K defined as K_{i,j} = k(v_i, v_j). The kernel k reflects the proximity between the objects v as vertices in the known graph.
Reminder: the kernel k is a positive definite (similarity) function. For such a function, there exists a feature map φ : V → F such that k(v, v') = ⟨φ(v), φ(v')⟩.
Supervised inference of edges in a graph
Use a machine learning method that can infer a function h : X → F to get, for a given x(v), an approximation of φ(v), and thus an approximation g(x(v), x(v')) = ⟨h(x(v)), h(x(v'))⟩ of the kernel value between v and v', described by their input feature vectors x(v) and x(v').
Connect these two vertices if g(x(v), x(v')) > θ.
(By varying θ we obtain different trade-offs between true-positive and false-positive rates.)
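The two steps above can be sketched in a few lines; the learned map h is a stand-in here (a random linear map, purely for illustration), not the actual models of Geurts et al.:

```python
import numpy as np

def predict_edges(h, X, theta):
    """Score every pair of vertices with the approximated kernel
    g(x, x') = <h(x), h(x')>, then threshold at theta.
    X: (N, d) array holding one input feature vector per vertex."""
    Phi_hat = h(X)              # approximations of phi(v), shape (N, p)
    G = Phi_hat @ Phi_hat.T     # approximated kernel values for all pairs
    return G > theta            # boolean adjacency matrix of predicted edges

# Toy run with a hypothetical linear map standing in for a trained model
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))     # 5 vertices, 3 input features each
W = rng.normal(size=(3, 2))
adjacency = predict_edges(lambda Z: Z @ W, X, theta=0.5)
```

Since g is an inner product, the score matrix is symmetric, so the predicted graph is undirected by construction.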
A kernel on graph nodes
Diffusion kernel (Kondor and Lafferty, 2002):
The Gram matrix K with K_{i,j} = k(v_i, v_j) is given by:

K = exp(−βL)

where the graph Laplacian L is defined by:

L_{i,j} = d_i, the degree of node v_i, if i = j; −1 if v_i and v_j are connected; 0 otherwise.
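As a minimal sketch (not from the slides), the diffusion kernel can be computed from an adjacency matrix with a matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A, beta):
    """Diffusion kernel K = exp(-beta * L) of Kondor and Lafferty,
    built from the 0/1 adjacency matrix A of an undirected graph."""
    L = np.diag(A.sum(axis=1)) - A   # Laplacian: degrees on the diagonal, -1 on edges
    return expm(-beta * L)           # matrix exponential, not element-wise exp

# Path graph 0 - 1 - 2: node 1 is closer, in diffusion terms, to node 0 than node 2 is
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
K = diffusion_kernel(A, beta=1.0)
```

Because L is symmetric positive semi-definite, K is symmetric positive definite, hence a valid kernel on the graph nodes.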
Interpretability: rules and clusters (an example with a protein-protein network)
Network completion and function prediction for yeast data
Challenges and limitations in supervised predictive approaches
Semi-supervised learning, or even transductive learning
Issue: unbalanced distribution of positive and negative examples
Local approach (the graph is not seen as a single variable)
Data (labeled examples) are not i.i.d.: regulations are not independent
Graphical models: from simple interaction models to complex ones
Graphical Gaussian model estimation: estimating partial correlation as a measure of conditional independence (classified as graph prediction in my terminology)
Bayesian network estimation: modelling directed interactions
Dynamic Bayesian network estimation: modelling directed interactions through time
State-space model estimation: modelling observed and hidden dynamical processes as well
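A minimal sketch of the first item: partial correlations can be read off the inverse of the covariance (precision) matrix, and near-zero entries suggest conditional independence. The chain example below is illustrative, not from the slides; real applications need regularized estimators when variables outnumber samples.

```python
import numpy as np

def partial_correlations(X):
    """Graphical Gaussian model estimation in its simplest form:
    partial correlations from the inverse empirical covariance.
    rho[i, j] close to 0 suggests i and j are conditionally
    independent given all other variables."""
    omega = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)                    # standard normalization
    np.fill_diagonal(rho, 1.0)
    return rho

# Chain x0 -> x1 -> x2: x0 and x2 are marginally correlated
# but conditionally independent given x1
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = x0 + 0.5 * rng.normal(size=5000)
x2 = x1 + 0.5 * rng.normal(size=5000)
rho = partial_correlations(np.column_stack([x0, x1, x2]))
```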
Focus on state-space models
Goal:
Quantitative models (easier to learn, encompass mechanistic models: biological relevance)
Taking time into account
Some variables are not measured: assumption of a hidden process
Linear Gaussian models: parameters encapsulate the network structure (Perrin et al. 03, Rangel et al. 04)
Nonlinear models (more biologically relevant): the structure is encapsulated in the form of the transition function (Nachman 04, Rogers et al. 06, Quach et al. 07)

x(t_{k+1}) = F(x(t_k), u; θ) + ε_h(t_k)
y(t_k) = H(x(t_k), u(t_k); θ) + ε(t_k)
System of Ordinary Differential Equations (ODE)
dx(t)/dt = f(x(t), u(t); θ)
Let us focus on gene regulatory networks.
x(t): state variables at time t
- protein concentrations
- mRNA concentrations
f: the form of f encodes the nature of the interactions (and their structure)
- linear/nonlinear models
- Michaelis-Menten kinetics
- mass action kinetics
- ...
θ: parameter set (kinetic parameters, rate constants, ...)
u(t): input variables at time t
Reverse Engineering of Biological Networks
Given:
An ODE model:

dx(t)/dt = f(x(t), u(t); θ)

A partial and noisy observation model:

y(t) = H(x(t), u(t); θ) + ε(t)

where H is a nonlinear observation function and ε(t) is an i.i.d. noise.
A sequence of observed data: y_{1:K} = {y_1, ..., y_K} at times t_1, t_2, ..., t_K.

Goal:
Structure estimation
Parameter estimation θ
State estimation x(t)
Structure learning
Case 1: very few variables are involved; a combinatorial search over structures can then be carried out. For each candidate structure, the parameters have to be estimated.
Case 2: more than a dozen variables are involved; it is then worth using an algorithm dedicated to structure learning. Structure learning in nonlinear dynamical models, as in static Bayesian networks, can be solved by a stochastic exploration of the (huge) set of candidates, using an appropriate criterion that takes into account the data and the parameter estimates given the candidate structure. MCMC methods and evolutionary approaches are used.
In the following, we assume that the network structure is given.
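Case 2's stochastic exploration can be sketched as a greedy edge-flip search (MCMC and evolutionary variants follow the same propose-and-score loop); the score function and target structure below are toy stand-ins for a data-based criterion such as a penalized likelihood:

```python
import numpy as np

def stochastic_structure_search(score, n_nodes, n_iters=1000, seed=0):
    """Explore candidate structures by proposing single edge flips and
    keeping a flip only when the score improves. The score is assumed to
    embed data fit and parameter estimation for the candidate."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_nodes, n_nodes), dtype=int)   # current candidate adjacency
    best = score(A)
    for _ in range(n_iters):
        i, j = rng.integers(n_nodes, size=2)
        if i == j:
            continue                              # no self-loops
        A[i, j] ^= 1                              # flip one directed edge
        s = score(A)
        if s > best:
            best = s                              # keep the improving flip
        else:
            A[i, j] ^= 1                          # revert the flip
    return A, best

# Toy criterion: distance to a hypothetical target structure to recover
target = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
A_hat, s = stochastic_structure_search(lambda A: -np.abs(A - target).sum(), 3)
```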
An example of Nonlinear State-Space Model
Continuous-time ODE model:

dx(t)/dt = f(x(t), u(t); θ)
y(t) = H(x(t), u(t); θ) + ε(t)

The system at discrete time points t_1, ..., t_K:

x(t_{k+1}) = F(x(t_k), u; θ)
y(t_k) = H(x(t_k), u(t_k); θ) + ε(t_k)

with

F(x(t_k), u; θ) = x(t_k) + ∫_{t_k}^{t_{k+1}} f(x(τ), u(τ); θ) dτ
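The integral defining F rarely has a closed form for nonlinear f, so it is approximated numerically; a sketch with a fixed-step fourth-order Runge-Kutta scheme (inputs u omitted for brevity):

```python
import numpy as np

def discretize(f, x_k, t_k, t_k1, n_steps=100):
    """Approximate F(x(t_k)) = x(t_k) + integral of f over [t_k, t_{k+1}]
    with fixed-step 4th-order Runge-Kutta."""
    h = (t_k1 - t_k) / n_steps
    x, t = np.asarray(x_k, dtype=float), t_k
    for _ in range(n_steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

# Linear decay dx/dt = -x over one time unit: F(1) should be close to e^{-1}
x_next = discretize(lambda t, x: -x, np.array([1.0]), 0.0, 1.0)
```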
Bayesian inference
Given:
Prior distribution over the initial state and parameters: p(x_1, θ)
A state transition model: p(x_k | x_{k−1}, θ)
An observation model: p(y_k | x_k, θ)
A sequence of observations: y_{1:K} = {y_1, ..., y_K}

Estimating the posterior distributions:
Focus on the filtering distribution p(x_k, θ | y_{1:k})
Tool: unscented Kalman filter to deal with the nonlinearities (Quach et al., 2007)
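The unscented Kalman filter of Quach et al. builds on the unscented transform; a sketch of its sigma-point construction (parameter names follow the common van der Merwe formulation, an assumption rather than the slides' notation):

```python
import numpy as np

def sigma_points(mean, cov, alpha=1e-3, kappa=0.0):
    """2n+1 sigma points and mean-weights of the unscented transform,
    the building block of the unscented Kalman filter."""
    n = mean.shape[0]
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)         # matrix square root
    pts = np.vstack([mean, mean + S.T, mean - S.T]) # central point, then +/- columns
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wm[0] = lam / (n + lam)                         # central weight (can be negative)
    return pts, wm

# Propagating the points through the nonlinearity approximates the predicted
# mean of the transformed distribution: sum_i w_i * F(sigma_i)
mean, cov = np.zeros(2), np.eye(2)
pts, wm = sigma_points(mean, cov)
pred_mean = wm @ np.tanh(pts)     # example nonlinear transition F
```

The weighted sigma points reproduce the original mean and covariance exactly, which is what lets the filter handle nonlinearities without computing Jacobians.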
Example: the Repressilator
[Elowitz and Leibler, Nature 2000]

dr_1/dt = vmax_1 · k_12^n / (k_12^n + p_2^n) − k_1^mRNA · r_1
dr_2/dt = vmax_2 · k_23^n / (k_23^n + p_3^n) − k_2^mRNA · r_2
dr_3/dt = vmax_3 · k_31^n / (k_31^n + p_1^n) − k_3^mRNA · r_3
dp_1/dt = k_1 r_1 − k_1^protein · p_1
dp_2/dt = k_2 r_2 − k_2^protein · p_2
dp_3/dt = k_3 r_3 − k_3^protein · p_3

mRNAs are observed, proteins are hidden.
mRNA and protein degradation rate constants are supposed to be known.
Estimate 9 parameters.
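A simulation sketch of this system with scipy; the parameter values below are illustrative placeholders, not the ones estimated in the slides:

```python
import numpy as np
from scipy.integrate import solve_ivp

def repressilator(t, state, vmax, k, k_tl, d_mrna, d_prot, n=2.0):
    """Right-hand side of the repressilator ODEs above:
    state = (r1, r2, r3, p1, p2, p3)."""
    r, p = state[:3], state[3:]
    repressor = [p[1], p[2], p[0]]   # p2 represses gene 1, p3 gene 2, p1 gene 3
    dr = [vmax[i] * k[i] ** n / (k[i] ** n + repressor[i] ** n) - d_mrna[i] * r[i]
          for i in range(3)]
    dp = [k_tl[i] * r[i] - d_prot[i] * p[i] for i in range(3)]
    return dr + dp

# Illustrative values for (vmax, k, translation rates, degradation rates)
params = ([5.0] * 3, [1.0] * 3, [1.0] * 3, [1.0] * 3, [0.5] * 3)
sol = solve_ivp(repressilator, (0.0, 50.0), [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                args=params)
```

In the estimation setting, only the three mRNA trajectories r_i would be observed (with noise); the protein trajectories p_i are the hidden states the filter must reconstruct.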
Parameter Estimation
Challenges in (dynamical) modelling approaches
Identifiability of dynamical models
Theoretical results about sample complexity
Scaling to large networks
Non-stationarity
Incorporating other components: space, cellular compartments, ...
Coupled systems: metabolic and regulatory networks, protein-protein interactions and regulatory networks
MORE DATA: benchmark problems, challenges
General conclusion and perspective
Different views of the learning problem, different scales, different prior knowledge
Some of these methods could be combined to take part in the same discovery process
Need for building data repositories and demand for biological validation
References
C. Auliac, V. Frouin, X. Gidrol, F. d'Alché-Buc, Evolutionary approaches for the reverse-engineering of gene regulatory networks: a study on a realistic biological dataset, accepted at BMC Bioinformatics, to appear in 2008.
P. Geurts, N. Touleimat, M. Dutreix, F. d'Alché-Buc, Inferring biological networks with output kernel trees, BMC Bioinformatics, to appear, May 3, 2007.
Kato, K. Tsuda, EM-based algorithm for kernel matrix completion, Bioinformatics, vol. 21, 2005.
B.-E. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, F. d'Alché-Buc, Inference of gene regulatory networks with dynamic Bayesian networks, Bioinformatics (Oxford Press), vol. 19, ii138-ii148, 2003.
M. Quach, N. Brunel, F. d'Alché-Buc, Estimating parameters and hidden variables in nonlinear state-space models based on ODEs for biological networks inference, Bioinformatics, 23:3209-3216, 2007.
Y. Yamanishi, J.-P. Vert, M. Kanehisa, Supervised enzyme network inference from the integration of genomic data and chemical information, Bioinformatics, vol. 21, 2005.