14
Research Article Exact and Heuristic Methods for Network Completion for Time-Varying Genetic Networks Natsu Nakajima and Tatsuya Akutsu Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan Correspondence should be addressed to Tatsuya Akutsu; [email protected] Received 13 August 2013; Revised 9 January 2014; Accepted 22 January 2014; Published 9 March 2014 Academic Editor: Nasimul Noman Copyright © 2014 N. Nakajima and T. Akutsu. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Robustness in biological networks can be regarded as an important feature of living systems. A system maintains its functions against internal and external perturbations, leading to topological changes in the network with varying delays. To understand the flexibility of biological networks, we propose a novel approach to analyze time-dependent networks, based on the framework of network completion, which aims to make the minimum amount of modifications to a given network so that the resulting network is most consistent with the observed data. We have developed a novel network completion method for time-varying networks by extending our previous method for the completion of stationary networks. In particular, we introduce a double dynamic programming technique to identify change time points and required modifications. Although this extended method allows us to guarantee the optimality of the solution, this method has relatively low computational efficiency. In order to resolve this difficulty, we developed a heuristic method for speeding up the calculation of minimum least squares errors. We demonstrate the effectiveness of our proposed methods through computational experiments using synthetic data and real microarray gene expression data. e results indicate that our methods exhibit good performance in terms of completing and inferring gene association networks with time-varying structures. 1. Introduction Computational analysis of gene regulatory networks is an important topic in systems biology. A gene regulatory net- work is a collection of genes and their correlations and causal interactions. It is oſten represented as a directed graph in which the nodes correspond to genes and the edges correspond to regulatory relationships between two genes. Gene regulatory networks play important roles in cells. For example, gene regulatory networks maintain organisms through protein production, response to the external envi- ronment, and control of cell division processes. erefore, deciphering gene regulatory network structures is important for understanding cellular systems, which might also be use- ful for the prediction of adverse effects of new drugs and the detection of target genes for the development of new drugs. In order to infer gene regulatory networks, various kinds of data have been used, such as gene expression profiles (particularly mRNA expression profiles), CHromatin ImmunoPrecipita- tion (ChIP)-chip data for transcription binding information, DNA-protein interaction data, and protein-protein interac- tion data [13]. However, many existing studies have focused on the use of gene expression profiles, because expression data from a large number of genes can be simultaneously observed due to developments in DNA microarray technol- ogy [13]. Various mathematical models and computational methods have been applied and/or developed to infer gene regulatory networks from gene expression profiles, which include Boolean networks [4, 5], Bayesian networks [6, 7], dynamic Bayesian networks [8], differential equations [9, 10], and graphical Gaussian models [11]. In Boolean networks, the state of each gene is simplified into 0 or 1 and the gene regulation rules are given as Boolean functions, where 0 and 1 mean that a gene is active (in high expression) and inactive (in low expression), respectively. In the most widely used Boolean network model, it is assumed that the states of genes change synchronously according to discrete time steps. In Bayesian networks, the states of genes are usually classified into discrete values and the gene regulation rules are given Hindawi Publishing Corporation BioMed Research International Volume 2014, Article ID 684014, 13 pages http://dx.doi.org/10.1155/2014/684014

Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

Research ArticleExact and Heuristic Methods for Network Completion forTime-Varying Genetic Networks

Natsu Nakajima and Tatsuya Akutsu

Bioinformatics Center Institute for Chemical Research Kyoto University Gokasho Uji Kyoto 611-0011 Japan

Correspondence should be addressed to Tatsuya Akutsu takutsukuicrkyoto-uacjp

Received 13 August 2013 Revised 9 January 2014 Accepted 22 January 2014 Published 9 March 2014

Academic Editor Nasimul Noman

Copyright copy 2014 N Nakajima and T Akutsu This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

Robustness in biological networks can be regarded as an important feature of living systems A system maintains its functionsagainst internal and external perturbations leading to topological changes in the network with varying delays To understand theflexibility of biological networks we propose a novel approach to analyze time-dependent networks based on the framework ofnetwork completion which aims to make the minimum amount of modifications to a given network so that the resulting networkis most consistent with the observed data We have developed a novel network completion method for time-varying networksby extending our previous method for the completion of stationary networks In particular we introduce a double dynamicprogramming technique to identify change time points and required modifications Although this extended method allows us toguarantee the optimality of the solution this method has relatively low computational efficiency In order to resolve this difficultywe developed a heuristicmethod for speeding up the calculation ofminimum least squares errorsWe demonstrate the effectivenessof our proposed methods through computational experiments using synthetic data and real microarray gene expression data Theresults indicate that our methods exhibit good performance in terms of completing and inferring gene association networks withtime-varying structures

1 Introduction

Computational analysis of gene regulatory networks is animportant topic in systems biology A gene regulatory net-work is a collection of genes and their correlations andcausal interactions It is often represented as a directedgraph in which the nodes correspond to genes and theedges correspond to regulatory relationships between twogenes Gene regulatory networks play important roles in cellsFor example gene regulatory networks maintain organismsthrough protein production response to the external envi-ronment and control of cell division processes Thereforedeciphering gene regulatory network structures is importantfor understanding cellular systems which might also be use-ful for the prediction of adverse effects of new drugs and thedetection of target genes for the development of newdrugs Inorder to infer gene regulatory networks various kinds of datahave been used such as gene expression profiles (particularlymRNA expression profiles) CHromatin ImmunoPrecipita-tion (ChIP)-chip data for transcription binding information

DNA-protein interaction data and protein-protein interac-tion data [1ndash3] However many existing studies have focusedon the use of gene expression profiles because expressiondata from a large number of genes can be simultaneouslyobserved due to developments in DNA microarray technol-ogy [1ndash3] Various mathematical models and computationalmethods have been applied andor developed to infer generegulatory networks from gene expression profiles whichinclude Boolean networks [4 5] Bayesian networks [6 7]dynamic Bayesian networks [8] differential equations [9 10]and graphical Gaussian models [11] In Boolean networksthe state of each gene is simplified into 0 or 1 and the generegulation rules are given as Boolean functions where 0 and1 mean that a gene is active (in high expression) and inactive(in low expression) respectively In the most widely usedBoolean network model it is assumed that the states of geneschange synchronously according to discrete time steps InBayesian networks the states of genes are usually classifiedinto discrete values and the gene regulation rules are given

Hindawi Publishing CorporationBioMed Research InternationalVolume 2014 Article ID 684014 13 pageshttpdxdoiorg1011552014684014

2 BioMed Research International

in the form of conditional probabilities Although standardBayesian networks can only handle static data and acyclicnetworks dynamic Bayesian networks can handle time seriesdata and cyclic networks In differential equation models thedynamics of gene expression are represented by a set of linearor nonlinear equations (one equation per gene) In graphicalGaussianmodels partial correlations are used as ameasure ofindependence of any two genes by which direct interactionsare distinguished from indirect interactions For details ofthese models and methods see reviewcomparison papers[1ndash3]

These network models assume that the topology ofthe network does not change through time whereas thereal gene regulatory network in the cell might dynamicallychange its structure depending on time the effects of certainshocks and so forth Therefore many reverse engineeringtools have recently been proposed which can reconstructtime-varying biological networks based on time-series geneexpression data Yoshida et al [12] developed a dynamiclinear model with Markov switching that represents changepoints in regimes that evolve according to a first-orderMarkov process Fujita et al [13] proposed a method basedon the dynamic autoregressivemodelThismodel extends thevector autoregression (VAR) model which can be applied tothe inference of nonlinear time-dependent biological corre-lations such as dynamic gene regulatory networks Robinsonand Hartemink [14] proposed a model called a nonstationarydynamic Bayesian network based on dynamic Bayesiannetworks which allows inference from data generated bynonstationary processes in a time-dependent manner Lebreet al [15] also introduced the autoregressive time-varying(ARTIVA) algorithm for the analysis of time-varying networktopologies from time course data which is generated fromdifferent processes This model adopts a combination ofreversible jump Markov chain Monte Carlo (RJMCMC) anddynamic Bayesian networks (DBN) in which RJMCMC isused for the identification of change time points and theresulting networks and DBN is used to represent causalinteractions among genes Thorne and Stumpf [8] presenteda method to model the regulatory network structure betweendistinct segments with a set of hidden states by applyingthe hierarchical Dirichlet process hidden Markov model[16] including a potentially infinite number of states and aBayesian networkmodel for estimating relationships betweengenes Rassol and Bouaynaya [17] presented a new methodbased on constrained and smoothed Kalman filtering whichis capable of estimating time-varying networks from time-series data including unobserved and noisy measurementsThe dynamics of genetic modules are represented as a linear-state space equation and the observability of linear time-varying systems is defined by imposing sparse constraintsin Kalman filters Ahmed et al [18] proposed an algorithmcalled Tesla with machine learning which can be cast in theformof a convex optimization problemThebasic assumptionin this method is that networks at close time points do nothave significant topological differences but have commonedges with high probability in contrast networks at distanttime points are markedly different The regulatory networks

are represented by Markov random fields at arbitrary timeintervals

As mentioned above there have been many studiesand attempts to analyze both time-independent and time-dependent networks from time-series expression data how-ever gene regulatory systems in living organisms are socomplicated that any mathematical model has limitationsand there is not yet a standard or established method forinference even for time-independent networks One of thepossible reasons is that there exists an insufficient number ofhigh-quality time-series datasets to reconstruct the dynamicbehavior of the network In other words it is difficult toreveal a correct or nearly correct network based on a smallamount of data that includes some noise Hence in ourrecent study we proposed a new approach for the analysisof time-independent networks called network completion[19 20] in which the minimum amount of modificationsare made to a given network so that the resulting networkis most consistent with the observed data Similar conceptshave been independently proposed [21ndash24] In additionnetwork completion can be applied to inference of networksby starting with the null network

In this paper we present two novel methods for thecompletion and inference of time-varying networks usingdynamic programming and least squares fitting (DPLSQ)DPLSQ-TV (DPLSQ-TV was presented in a preliminary ver-sion of this paper [25] however in this paper more detailedcomputational experiments are performed andDPLSQ-HS isnewly introduced) and DPLSQ-HS where TV and HS standfor time varying and heuristics DPLSQ-TV is an extension ofDPLSQ [20] such that it can identify the time points at whichthe structure of the gene regulatory network changes Sincethe additions and deletions of edges are basic modificationsin network completion we need to extend DPLSQ so thatthese operations can be performed at several time points InDPLSQ-TV these edges and time points are identified by anovel double dynamic programming method in which theinner loop is used to identify static network structures and theouter loop is used to determine change points It is to be notedthat a single dynamic programming (DP)methodwas used inour previous work on the completion and inference of time-independent networks [20] whereas a double DP method isemployed here in order to cope with time-varying networksOur proposed methods also allow us to find an optimalsolution in polynomial time if the maximum indegree (iethe maximum number of input genes to a gene) is boundedby a constant Although DPLSQ-TV is guaranteed to findan optimal solution in polynomial time the degree of thepolynomial is not low which prevents themethod frombeingapplied to the completion of large networks Therefore wefurther propose a heuristic method called DPLSQ-HS tospeed up the calculation of the minimum least squares errorby applying restriction constraints that limit the number ofcombinations of incoming nodes

We evaluate the efficiency of our methods through com-putational experiments using synthetic data and microarraygene expression data from the life cycle of D melanogasterand the cell cycle of S cerevisiae We also demonstrate the

BioMed Research International 3

Expression

Gene1

Gene2

Gene3

Time1 2 3 4 5 6 7 8 9 10 11

V1

V2

V3V1

V2

V3V1

V2

V3

Inference

E0 E1 E2

c1 = 5 c2 = 9

12 = m

Figure 1 Inference (ie completion starting with the null network) of time varying structure of a genetic networkThis example correspondsto the case of119873 = 1 119899 = 3 119861 = 2 and119898 = 12 The change points are 119888

1= 5 and 119888

2= 9

effectiveness of the proposed methods by comparing ourresults with those of ARTIVA [15]

2 Method

In this section we present DPLSQ-TV a DP-based methodfor the completion of a time-varying networkWe assume thatthere exist119898 time points (1 2 119898) which are divided into119861 + 1 intervals [1 119888

1minus 1] [119888

1 119888

2minus 1] [119888

119861 119898]

where 119861 indicates the number of change points A differentnetwork is associated with each interval We assume that theset of genes does not change therefore only the edge setchanges according to the time interval Let 119881 = V

1 V

119899

be the set of genes Let 119864 be the initial set of directededges (ie initial set of gene regulation relationships) and let1198640 1198641 119864

119861be the sets of directed edges (ie the output)

where 119864119894denotes the edge set for the 119894th interval

Then the problem is defined as follows given an initialnetwork119866(119881 119864) consisting of 119899 genes119873 time series datasetseach of which consists of 119898 time points for 119899 genes andthe positive integers ℎ 119896 and 119861 infer 119861 change points (ie1198881 1198882 119888

119861) and complete the initial network 119866(119881 119864) by

adding 119896 edges and deleting ℎ edges in total such that thetotal least-squares error isminimizedThis results in the set ofedges 119864

0 1198641 119864

119861at the corresponding time intervals (see

Figure 1) It is to be noted that if we start with an empty set ofedges (ie 119864 = 0) the problem corresponds to the inferenceof a time-varying network

21 Model of Differential Equation and Estimation of Param-eters We assume that the dynamics of each node V

119894are

determined by the following differential equation

119889119909119894

119889119905= 119886119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895119909119894119896+ 119887119894120596 (1)

where 119909119894corresponds to the expression value of node V

119894 120596

denotes random noise and V1198941 V

119894ℎare incoming nodes to

i1 i2ih

ai2

ai1

aih

ai12

i

middot middot middot

Figure 2 Dynamics model for a node

V119894 The second and third terms of the right-hand side of the

equation represent the linear and nonlinear effects to nodeV119894 respectively (see Figure 2) where a positive value for 119886119894

119895

or 119886119894

119895119896corresponds to an activation effect and a negative

value for 119886119894119895or 119886119894119895119896

corresponds to an inhibition effect Thismodel is an extension of the linear differential equationmodel [3] It is also a variant of the recurrent neural networkmodel [27] although the sigmoid function is replaced hereby an identify function and nonlinear terms representingcooperating regulations are added instead

In practice we replace the derivative of (1) by thedifference and ignore the noise term as follows

119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895(119905) + sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905))

(2)

where Δ119905 denotes the unit time This kind of discretizationis also employed for linear and recurrent neural networkmodels [3 27]

4 BioMed Research International

In our previous method DPLSQ [20] we assume thattime series data 119910

119894(119905)s which correspond to 119909

119894(119905)s in (2) are

given for 119905 = 0 1 119898 where we distinguish an observedexpression value 119910

119894(119905) from an expression value 119909

119894(119905) in

the mathematical model equation (2) Then the parameters119886119894

119895s and 119886

119894

119895119896s are estimated from these time series data by

minimizing the following objective function (ie the sum ofthe least squares errors) for each node V

119894

119898

sum

119905=1

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

119910119894(119905 + 1)

minus[

[

119910119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119910119894119895(119905) +sum

119895lt119896

119886119894

119895119896119910119894119895(119905) 119910119894119896(119905))]

]

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

2

(3)

It should be noted that 119910119894(119905) is the observed expression

value of gene V119894at time 119905 and V

1198941 V1198942 V

119894ℎare tentative

incoming nodes to node V119894 Incoming nodes to each node

are determined so that the sum of these values for all nodesis minimized under the constraint that the total numberof edges is equal to the specified number In order tominimize the sum of least squares errors for all genes alongwith determining the incoming nodes and correspondingparameters DP is applied Readers are referred to [20] for thedetails of DPLSQ

22 Completion by Addition of Edges In this subsection wepresent our proposed method for network completion oftime-varying networks by the addition of edges and extendthis to a general case (ie network completion by the additionand deletion of edges) in the following subsection Forsimplicity we assume 119873 = 1 where we can extend themethod to the case of 119873 gt 1 by changing the definitionof 119878119894V1198951 V1198952 V119895119889

[119901 119902] onlyWe assume that the set of nodes (ie the set of genes) 119881

and the set of initial edges 119864 are given Let the current setof incoming nodes to V

119894be V1198941 V

119894119889 We define the least

squares error for V119894during the time period between 119901 and 119902

as119878119894

V1198941 V1198942 V119894119889 [119901 119902]

= min1198861198940 119886119894119895119886119894119895119896

119902minus1

sum

119905=119901

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

119910119894(119905 + 1)

minus [

[

119910119894(119905) + Δ119905

times (119886119894

0+

119889

sum

119895=1

119886119894

119895119910119894119895(119905) +sum

119895lt119896

119886119894

119895119896119910119894119895(119905) 119910119894119896(119905))]

]

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

2

(4)

where 119910119894(119905) denotes the observed expression value of gene V

119894

at time 119905The parameters (ie 1198861198940 119886119894119895 119886119894119895119896) needed to attain this

minimum value can be computed by a standard least squaresfitting method

Because network completion is considered to involve theaddition of edges let 119890minus(V

119894) = V

1198951 V

119895119889 be the set of initial

incoming nodes to V119894 Let 120590+

119896119895 119895[119901 119902] denote the minimum

least squares error when adding 119896119895edges to the 119895th node

during the time from 119901 to 119902 which is formally defined as

120590+

119896119895 119895[119901 119902] = min

1198951 1198952119895119896119895

119878119895

119890minus(V119895)cupV1198951 V1198952 V119895119896119895

[119901 119902] (5)

where each V119895119897must be selected from119881minus V

119895minus 119890minus(V119895) In order

to avoid combinatorial explosion we constrain themaximum119896119895to be a small constant 119870 and let 120590+

119896119895 119895[119901 119902] = +infin for

119896119895gt 119870 or 119896

119895+ |119890minus(V119895)| ge 119899

Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

(6)

where 1198880= 1 and 119888

119861+1minus 1 = 119898

Here we define119863+[119896 119894 119901 119902] as

119863+[119896 119894 119901 119902] = min

1198961+1198962+sdotsdotsdot+119896119894=119896

119894

sum

119895=1

120590+

119896119895 119895[119901 119902]

(7)

The entries of 119863+[119896 119895 119901 119902] can be computed by the

following DP algorithm

119863+[119896 1 119901 119902] = 120590

+

1198961[119901 119902]

119863+[119896 119895 + 1 119901 119902] = min

1198961015840+11989610158401015840=119896

119863+[1198961015840 119895] + 120590

+

11989610158401015840119895+1

[119901 119902]

(8)

It is to be noted that119863+[119896 119899 119901 119902] is determined uniquelyregardless of the ordering of nodes in the network Thecorrectness of this DP algorithm can be seen as follows

min1198961+1198962+sdotsdotsdot+119896119899=119896

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]

= min1198961015840+11989610158401015840=119896

min1198961+1198962+sdotsdotsdot+119896119899minus1=119896

1015840

119899minus1

sum

119895=1

120590+

119896119895 119895[119901 119902] + 120590

+

11989610158401015840119899[119901 119902]

= min1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 minus 1 119901 119902] + 120590

+

11989610158401015840119899[119901 119902]

(9)

Next we define 119864+[119896 119887 119902] as

119864+[119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

(10)

BioMed Research International 5

where 119888119887+1

minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm

119864+[119896 0 119902] = 119863

+[119896 119899 1 119902]

119864+[119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 119901 119902] + 119864

+[11989610158401015840 119887 minus 1 119901]

(11)

The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]

The correctness of this DP algorithm can be seen asfollows

min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

= min119888119887minus1lt119888119887

1198961015840+11989610158401015840=119896

min1198881ltsdotsdotsdotlt119888119887minus1

1198960+sdotsdotsdot+119896

119887minus1=1198961015840

119887minus1

sum

119894=0

min1198961+sdotsdotsdot+119896119899=119896

119894

times[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

+ min1198961+sdotsdotsdot+119896119899=119896

10158401015840

[

[

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]]

]

= min119901lt119902

1198961015840+11989610158401015840=119896

119864+[1198961015840 119887 minus 1 119901] + 119863

+[11989610158401015840 119899 119901 119902]

(12)

23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here

Let 120590ℎ119895 119896119895 119895

[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896

119895edges

to 119890minus(V119895) and deleting ℎ

119895edges from 119890

minus(V119895) where the added

and deleted edges must be disjointed We constrain themaximum 119896

119895and ℎ

119895to the small constants 119870 and 119867 We let

120590ℎ119895 119896119895 119895

[119901 119902] = +infin if 119896119895gt 119870 ℎ

119895gt 119867 119896

119895minusℎ119895+|119890minus(V119895)| ge 119899

or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119861=ℎ

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(13)

Here we define119863[ℎ 119896 119894 119901 119902] as

119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896

ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ

119894

sum

119895=1

120590ℎ119895 119896119895 119895

[119901 119902]

(14)

As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by

119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895

[119901 119902]

119863 [ℎ 119896 119895 + 1 119901 119902]

= min1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590

ℎ1015840101584011989610158401015840119895+1

[119901 119902]

(15)

Next we define 119864[ℎ 119896 119887 119902] as

119864 [ℎ 119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119887=ℎ

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(16)

119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm

119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]

119864 [ℎ 119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ

10158401015840 11989610158401015840 119887 minus 1 119901]

(17)

24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges

First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901

2+ 1199013) time where 119898 is the

number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

Next we analyze the time complexity required for com-puting120590

ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590

ℎ119895 119896119895 119895

is 119874(119898119899119870+1

) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590

ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)

because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the

6 BioMed Research International

size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)

Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899

2) where we assume that 119861 is a constant Since the

number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is

119874(1198983119899119870+1

+ 11989821198993) (18)

It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898

3119899119870+1

+ 11989821198993) Although this complexity is not small

it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2

3 Heuristic Method

Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33

31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V

119894 we maintain119872 combinations of 119896 incoming

nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the

set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V

1198941 V

119894119896minus1 isin 1198781

119894cup1198782

119894cupsdot sdot sdotcup 119878

119896minus1

119894

where 1198941lt 1198942lt sdot sdot sdot lt 119894

119896minus1 we calculate the least squares error

for each V119895such that 119895 gt 119894

119896minus1is the 119896th incoming node to V

119894

The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878

119896

119894

32 Algorithm The following is the description of the algo-rithm to compute 120590

+

119896119894[119901 119902] in DPLSQ-HS where 120590

+

119896119894[119901 119902]

does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31

Step 1 For each period [119901 119902] repeat Steps 2ndash6

Step 2 Let 1198780119894= 0 for all 119894 = 1 119899

Step 3 For 119894 = 1 to 119899 do Steps 4ndash7

Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870

Step 5 For each combination V1198941 V

119894119896minus1 isin 119878

1

119894cup 1198782

119894cup

sdot sdot sdot cup 119878119896minus1

119894and each node V

119895such that 119895 gt 119894

119896minus1(119895 gt 0 if

119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V

119894119896minus1 V119894) (V119895 V119894)

Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896

119894

Step 7 Let 120590+

119896119894[119901 119902] be the minimum least squares error

among these top119872 combinations

The other parts of the algorithm are the same as in DPLSQ-TV

33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges

In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

As for the time complexity of computing 120590ℎ119895 119896119895 119895

[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is

well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590

ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since

the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590

ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because

119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are

BioMed Research International 7

k = 1 k = 2 k = 3

i i i

S1i

S1i S

1iS

2i

S2i

S3i

Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote

the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V

119894119896minus1 isin 1198781

119894cup 1198782

119894cup sdot sdot sdot cup 119878

119896minus1

119894 we calculate

the least squares error for each V119895such that 119895 gt 119894

119896minus1as a 119896th incoming node to V

119894

119874(11989821198993) as described in Section 24 Hence the total time

complexity of DPLSQ-HS is

119874(1198983119899 log 119899 + 119898

21198993) (19)

If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898

3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898

3119899119870+1

)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice

4 Results

We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod

41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866

1 1198662 and 119866

3generated by randomly adding and

deleting edges from the initial network In this method foreach node V

119894with ℎ input nodes we considered the following

model119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905) + 119887

119894120596)

(20)

where 119886119894

119895s and 119886

119894

119895119896s are constants selected uniformly at

random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886

119894

119895119896s is smaller than that for

119886119894

119895s is that nonlinear terms are not considered as strong as

linear terms It should also be noted that 119887119894120596 is a stochastic

term where 119887119894is a constant (we used 119887

119894= 005) and 120596 is

random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910

119894(119905) we used

119910119894(119905) = 119909

119894(119905) + 119900

119894120598 (21)

where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]

As for the time series data we generated an originaldataset with 30 time points including two change points 119888

1=

10 1198882= 20 which is generated by merging three datasets for

1198661 1198662 and 119866

3 Since the use of time series data beginning

from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896

0edges and deleting ℎ

0edges per node resulting in

1198661 1198662 and 119866

3 additionally we also conducted DPLSQ-HS

experiments in which the initial network 119866 was modified byrandomly adding 119896

0edges per node using the default values

of 1198960= ℎ0= 1 We evaluated the performance of this method

by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and

8 BioMed Research International

MMP-3

WNT5A

HADHB

RET-1

S100P

Synuclein

Pirin

MART-1

RHO-C

STC2

Figure 4 Structure of WNT5A network [26]

Table 1 Result on completion with synthetic data

(a) Using DPLSQ-TV

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 160Accuracy 0830 0685 0554 0433

CPU time (sec) 53974 58574 50499 61189

119899 = 20

Time points error 000 000 000 210Accuracy 0781 0637 0541 0299

CPU time (sec) 75898 85391 142903 124400

119899 = 30

Time points error 000 000 035 400Accuracy 0539 0524 0418 0310

CPU time (sec) 264190 244480 234377 276081

119899 = 40

Time points error 000 000 035 350Accuracy 0585 0498 0458 0241

CPU time (sec) 342065 333398 317420 337274

(b) Using DPLSQ-HS

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 685Accuracy 0627 0600 0533 0307

CPU time (sec) 32783 32030 35594 30765

119899 = 20

Time points error 000 000 000 870Accuracy 0488 0473 0413 0153

CPU time (sec) 52129 51348 56723 44699

119899 = 30

time points error 000 000 425 880Accuracy 0469 0386 0286 0097

CPU time (sec) 76852 86844 76313 74653

119899 = 40

Time points error 000 000 085 865Accuracy 0510 0413 0308 0093

CPU time (sec) 76360 98398 97657 102813

119899 = 60

Time points error 000 000 000 075Accuracy 0411 0375 0382 0355

CPU time (sec) 371635 395596 372192 391110

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

2 BioMed Research International

in the form of conditional probabilities Although standardBayesian networks can only handle static data and acyclicnetworks dynamic Bayesian networks can handle time seriesdata and cyclic networks In differential equation models thedynamics of gene expression are represented by a set of linearor nonlinear equations (one equation per gene) In graphicalGaussianmodels partial correlations are used as ameasure ofindependence of any two genes by which direct interactionsare distinguished from indirect interactions For details ofthese models and methods see reviewcomparison papers[1ndash3]

These network models assume that the topology ofthe network does not change through time whereas thereal gene regulatory network in the cell might dynamicallychange its structure depending on time the effects of certainshocks and so forth Therefore many reverse engineeringtools have recently been proposed which can reconstructtime-varying biological networks based on time-series geneexpression data Yoshida et al [12] developed a dynamiclinear model with Markov switching that represents changepoints in regimes that evolve according to a first-orderMarkov process Fujita et al [13] proposed a method basedon the dynamic autoregressivemodelThismodel extends thevector autoregression (VAR) model which can be applied tothe inference of nonlinear time-dependent biological corre-lations such as dynamic gene regulatory networks Robinsonand Hartemink [14] proposed a model called a nonstationarydynamic Bayesian network based on dynamic Bayesiannetworks which allows inference from data generated bynonstationary processes in a time-dependent manner Lebreet al [15] also introduced the autoregressive time-varying(ARTIVA) algorithm for the analysis of time-varying networktopologies from time course data which is generated fromdifferent processes This model adopts a combination ofreversible jump Markov chain Monte Carlo (RJMCMC) anddynamic Bayesian networks (DBN) in which RJMCMC isused for the identification of change time points and theresulting networks and DBN is used to represent causalinteractions among genes Thorne and Stumpf [8] presenteda method to model the regulatory network structure betweendistinct segments with a set of hidden states by applyingthe hierarchical Dirichlet process hidden Markov model[16] including a potentially infinite number of states and aBayesian networkmodel for estimating relationships betweengenes Rassol and Bouaynaya [17] presented a new methodbased on constrained and smoothed Kalman filtering whichis capable of estimating time-varying networks from time-series data including unobserved and noisy measurementsThe dynamics of genetic modules are represented as a linear-state space equation and the observability of linear time-varying systems is defined by imposing sparse constraintsin Kalman filters Ahmed et al [18] proposed an algorithmcalled Tesla with machine learning which can be cast in theformof a convex optimization problemThebasic assumptionin this method is that networks at close time points do nothave significant topological differences but have commonedges with high probability in contrast networks at distanttime points are markedly different The regulatory networks

are represented by Markov random fields at arbitrary timeintervals

As mentioned above there have been many studiesand attempts to analyze both time-independent and time-dependent networks from time-series expression data how-ever gene regulatory systems in living organisms are socomplicated that any mathematical model has limitationsand there is not yet a standard or established method forinference even for time-independent networks One of thepossible reasons is that there exists an insufficient number ofhigh-quality time-series datasets to reconstruct the dynamicbehavior of the network In other words it is difficult toreveal a correct or nearly correct network based on a smallamount of data that includes some noise Hence in ourrecent study we proposed a new approach for the analysisof time-independent networks called network completion[19 20] in which the minimum amount of modificationsare made to a given network so that the resulting networkis most consistent with the observed data Similar conceptshave been independently proposed [21ndash24] In additionnetwork completion can be applied to inference of networksby starting with the null network

In this paper we present two novel methods for thecompletion and inference of time-varying networks usingdynamic programming and least squares fitting (DPLSQ)DPLSQ-TV (DPLSQ-TV was presented in a preliminary ver-sion of this paper [25] however in this paper more detailedcomputational experiments are performed andDPLSQ-HS isnewly introduced) and DPLSQ-HS where TV and HS standfor time varying and heuristics DPLSQ-TV is an extension ofDPLSQ [20] such that it can identify the time points at whichthe structure of the gene regulatory network changes Sincethe additions and deletions of edges are basic modificationsin network completion we need to extend DPLSQ so thatthese operations can be performed at several time points InDPLSQ-TV these edges and time points are identified by anovel double dynamic programming method in which theinner loop is used to identify static network structures and theouter loop is used to determine change points It is to be notedthat a single dynamic programming (DP)methodwas used inour previous work on the completion and inference of time-independent networks [20] whereas a double DP method isemployed here in order to cope with time-varying networksOur proposed methods also allow us to find an optimalsolution in polynomial time if the maximum indegree (iethe maximum number of input genes to a gene) is boundedby a constant Although DPLSQ-TV is guaranteed to findan optimal solution in polynomial time the degree of thepolynomial is not low which prevents themethod frombeingapplied to the completion of large networks Therefore wefurther propose a heuristic method called DPLSQ-HS tospeed up the calculation of the minimum least squares errorby applying restriction constraints that limit the number ofcombinations of incoming nodes

We evaluate the efficiency of our methods through com-putational experiments using synthetic data and microarraygene expression data from the life cycle of D melanogasterand the cell cycle of S cerevisiae We also demonstrate the

BioMed Research International 3

Expression

Gene1

Gene2

Gene3

Time1 2 3 4 5 6 7 8 9 10 11

V1

V2

V3V1

V2

V3V1

V2

V3

Inference

E0 E1 E2

c1 = 5 c2 = 9

12 = m

Figure 1 Inference (ie completion starting with the null network) of time varying structure of a genetic networkThis example correspondsto the case of119873 = 1 119899 = 3 119861 = 2 and119898 = 12 The change points are 119888

1= 5 and 119888

2= 9

effectiveness of the proposed methods by comparing ourresults with those of ARTIVA [15]

2 Method

In this section we present DPLSQ-TV a DP-based methodfor the completion of a time-varying networkWe assume thatthere exist119898 time points (1 2 119898) which are divided into119861 + 1 intervals [1 119888

1minus 1] [119888

1 119888

2minus 1] [119888

119861 119898]

where 119861 indicates the number of change points A differentnetwork is associated with each interval We assume that theset of genes does not change therefore only the edge setchanges according to the time interval Let 119881 = V

1 V

119899

be the set of genes Let 119864 be the initial set of directededges (ie initial set of gene regulation relationships) and let1198640 1198641 119864

119861be the sets of directed edges (ie the output)

where 119864119894denotes the edge set for the 119894th interval

Then the problem is defined as follows given an initialnetwork119866(119881 119864) consisting of 119899 genes119873 time series datasetseach of which consists of 119898 time points for 119899 genes andthe positive integers ℎ 119896 and 119861 infer 119861 change points (ie1198881 1198882 119888

119861) and complete the initial network 119866(119881 119864) by

adding 119896 edges and deleting ℎ edges in total such that thetotal least-squares error isminimizedThis results in the set ofedges 119864

0 1198641 119864

119861at the corresponding time intervals (see

Figure 1) It is to be noted that if we start with an empty set ofedges (ie 119864 = 0) the problem corresponds to the inferenceof a time-varying network

21 Model of Differential Equation and Estimation of Param-eters We assume that the dynamics of each node V

119894are

determined by the following differential equation

119889119909119894

119889119905= 119886119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895119909119894119896+ 119887119894120596 (1)

where 119909119894corresponds to the expression value of node V

119894 120596

denotes random noise and V1198941 V

119894ℎare incoming nodes to

i1 i2ih

ai2

ai1

aih

ai12

i

middot middot middot

Figure 2 Dynamics model for a node

V119894 The second and third terms of the right-hand side of the

equation represent the linear and nonlinear effects to nodeV119894 respectively (see Figure 2) where a positive value for 119886119894

119895

or 119886119894

119895119896corresponds to an activation effect and a negative

value for 119886119894119895or 119886119894119895119896

corresponds to an inhibition effect Thismodel is an extension of the linear differential equationmodel [3] It is also a variant of the recurrent neural networkmodel [27] although the sigmoid function is replaced hereby an identify function and nonlinear terms representingcooperating regulations are added instead

In practice we replace the derivative of (1) by thedifference and ignore the noise term as follows

119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895(119905) + sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905))

(2)

where Δ119905 denotes the unit time This kind of discretizationis also employed for linear and recurrent neural networkmodels [3 27]

4 BioMed Research International

In our previous method DPLSQ [20] we assume thattime series data 119910

119894(119905)s which correspond to 119909

119894(119905)s in (2) are

given for 119905 = 0 1 119898 where we distinguish an observedexpression value 119910

119894(119905) from an expression value 119909

119894(119905) in

the mathematical model equation (2) Then the parameters119886119894

119895s and 119886

119894

119895119896s are estimated from these time series data by

minimizing the following objective function (ie the sum ofthe least squares errors) for each node V

119894

119898

sum

119905=1

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

119910119894(119905 + 1)

minus[

[

119910119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119910119894119895(119905) +sum

119895lt119896

119886119894

119895119896119910119894119895(119905) 119910119894119896(119905))]

]

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

2

(3)

It should be noted that 119910119894(119905) is the observed expression

value of gene V119894at time 119905 and V

1198941 V1198942 V

119894ℎare tentative

incoming nodes to node V119894 Incoming nodes to each node

are determined so that the sum of these values for all nodesis minimized under the constraint that the total numberof edges is equal to the specified number In order tominimize the sum of least squares errors for all genes alongwith determining the incoming nodes and correspondingparameters DP is applied Readers are referred to [20] for thedetails of DPLSQ

22 Completion by Addition of Edges In this subsection wepresent our proposed method for network completion oftime-varying networks by the addition of edges and extendthis to a general case (ie network completion by the additionand deletion of edges) in the following subsection Forsimplicity we assume 119873 = 1 where we can extend themethod to the case of 119873 gt 1 by changing the definitionof 119878119894V1198951 V1198952 V119895119889

[119901 119902] onlyWe assume that the set of nodes (ie the set of genes) 119881

and the set of initial edges 119864 are given Let the current setof incoming nodes to V

119894be V1198941 V

119894119889 We define the least

squares error for V119894during the time period between 119901 and 119902

as119878119894

V1198941 V1198942 V119894119889 [119901 119902]

= min1198861198940 119886119894119895119886119894119895119896

119902minus1

sum

119905=119901

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

119910119894(119905 + 1)

minus [

[

119910119894(119905) + Δ119905

times (119886119894

0+

119889

sum

119895=1

119886119894

119895119910119894119895(119905) +sum

119895lt119896

119886119894

119895119896119910119894119895(119905) 119910119894119896(119905))]

]

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

2

(4)

where 119910119894(119905) denotes the observed expression value of gene V

119894

at time 119905The parameters (ie 1198861198940 119886119894119895 119886119894119895119896) needed to attain this

minimum value can be computed by a standard least squaresfitting method

Because network completion is considered to involve theaddition of edges let 119890minus(V

119894) = V

1198951 V

119895119889 be the set of initial

incoming nodes to V119894 Let 120590+

119896119895 119895[119901 119902] denote the minimum

least squares error when adding 119896119895edges to the 119895th node

during the time from 119901 to 119902 which is formally defined as

120590+

119896119895 119895[119901 119902] = min

1198951 1198952119895119896119895

119878119895

119890minus(V119895)cupV1198951 V1198952 V119895119896119895

[119901 119902] (5)

where each V119895119897must be selected from119881minus V

119895minus 119890minus(V119895) In order

to avoid combinatorial explosion we constrain themaximum119896119895to be a small constant 119870 and let 120590+

119896119895 119895[119901 119902] = +infin for

119896119895gt 119870 or 119896

119895+ |119890minus(V119895)| ge 119899

Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

(6)

where 1198880= 1 and 119888

119861+1minus 1 = 119898

Here we define119863+[119896 119894 119901 119902] as

119863+[119896 119894 119901 119902] = min

1198961+1198962+sdotsdotsdot+119896119894=119896

119894

sum

119895=1

120590+

119896119895 119895[119901 119902]

(7)

The entries of 119863+[119896 119895 119901 119902] can be computed by the

following DP algorithm

119863+[119896 1 119901 119902] = 120590

+

1198961[119901 119902]

119863+[119896 119895 + 1 119901 119902] = min

1198961015840+11989610158401015840=119896

119863+[1198961015840 119895] + 120590

+

11989610158401015840119895+1

[119901 119902]

(8)

It is to be noted that119863+[119896 119899 119901 119902] is determined uniquelyregardless of the ordering of nodes in the network Thecorrectness of this DP algorithm can be seen as follows

min1198961+1198962+sdotsdotsdot+119896119899=119896

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]

= min1198961015840+11989610158401015840=119896

min1198961+1198962+sdotsdotsdot+119896119899minus1=119896

1015840

119899minus1

sum

119895=1

120590+

119896119895 119895[119901 119902] + 120590

+

11989610158401015840119899[119901 119902]

= min1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 minus 1 119901 119902] + 120590

+

11989610158401015840119899[119901 119902]

(9)

Next we define 119864+[119896 119887 119902] as

119864+[119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

(10)

BioMed Research International 5

where 119888119887+1

minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm

119864+[119896 0 119902] = 119863

+[119896 119899 1 119902]

119864+[119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 119901 119902] + 119864

+[11989610158401015840 119887 minus 1 119901]

(11)

The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]

The correctness of this DP algorithm can be seen asfollows

min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

= min119888119887minus1lt119888119887

1198961015840+11989610158401015840=119896

min1198881ltsdotsdotsdotlt119888119887minus1

1198960+sdotsdotsdot+119896

119887minus1=1198961015840

119887minus1

sum

119894=0

min1198961+sdotsdotsdot+119896119899=119896

119894

times[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

+ min1198961+sdotsdotsdot+119896119899=119896

10158401015840

[

[

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]]

]

= min119901lt119902

1198961015840+11989610158401015840=119896

119864+[1198961015840 119887 minus 1 119901] + 119863

+[11989610158401015840 119899 119901 119902]

(12)

23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here

Let 120590ℎ119895 119896119895 119895

[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896

119895edges

to 119890minus(V119895) and deleting ℎ

119895edges from 119890

minus(V119895) where the added

and deleted edges must be disjointed We constrain themaximum 119896

119895and ℎ

119895to the small constants 119870 and 119867 We let

120590ℎ119895 119896119895 119895

[119901 119902] = +infin if 119896119895gt 119870 ℎ

119895gt 119867 119896

119895minusℎ119895+|119890minus(V119895)| ge 119899

or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119861=ℎ

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(13)

Here we define119863[ℎ 119896 119894 119901 119902] as

119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896

ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ

119894

sum

119895=1

120590ℎ119895 119896119895 119895

[119901 119902]

(14)

As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by

119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895

[119901 119902]

119863 [ℎ 119896 119895 + 1 119901 119902]

= min1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590

ℎ1015840101584011989610158401015840119895+1

[119901 119902]

(15)

Next we define 119864[ℎ 119896 119887 119902] as

119864 [ℎ 119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119887=ℎ

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(16)

119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm

119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]

119864 [ℎ 119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ

10158401015840 11989610158401015840 119887 minus 1 119901]

(17)

24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges

First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901

2+ 1199013) time where 119898 is the

number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

Next we analyze the time complexity required for com-puting120590

ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590

ℎ119895 119896119895 119895

is 119874(119898119899119870+1

) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590

ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)

because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the

6 BioMed Research International

size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)

Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899

2) where we assume that 119861 is a constant Since the

number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is

119874(1198983119899119870+1

+ 11989821198993) (18)

It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898

3119899119870+1

+ 11989821198993) Although this complexity is not small

it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2

3 Heuristic Method

Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33

31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V

119894 we maintain119872 combinations of 119896 incoming

nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the

set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V

1198941 V

119894119896minus1 isin 1198781

119894cup1198782

119894cupsdot sdot sdotcup 119878

119896minus1

119894

where 1198941lt 1198942lt sdot sdot sdot lt 119894

119896minus1 we calculate the least squares error

for each V119895such that 119895 gt 119894

119896minus1is the 119896th incoming node to V

119894

The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878

119896

119894

32 Algorithm The following is the description of the algo-rithm to compute 120590

+

119896119894[119901 119902] in DPLSQ-HS where 120590

+

119896119894[119901 119902]

does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31

Step 1 For each period [119901 119902] repeat Steps 2ndash6

Step 2 Let 1198780119894= 0 for all 119894 = 1 119899

Step 3 For 119894 = 1 to 119899 do Steps 4ndash7

Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870

Step 5 For each combination V1198941 V

119894119896minus1 isin 119878

1

119894cup 1198782

119894cup

sdot sdot sdot cup 119878119896minus1

119894and each node V

119895such that 119895 gt 119894

119896minus1(119895 gt 0 if

119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V

119894119896minus1 V119894) (V119895 V119894)

Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896

119894

Step 7 Let 120590+

119896119894[119901 119902] be the minimum least squares error

among these top119872 combinations

The other parts of the algorithm are the same as in DPLSQ-TV

33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges

In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

As for the time complexity of computing 120590ℎ119895 119896119895 119895

[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is

well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590

ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since

the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590

ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because

119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are

BioMed Research International 7

k = 1 k = 2 k = 3

i i i

S1i

S1i S

1iS

2i

S2i

S3i

Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote

the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V

119894119896minus1 isin 1198781

119894cup 1198782

119894cup sdot sdot sdot cup 119878

119896minus1

119894 we calculate

the least squares error for each V119895such that 119895 gt 119894

119896minus1as a 119896th incoming node to V

119894

119874(11989821198993) as described in Section 24 Hence the total time

complexity of DPLSQ-HS is

119874(1198983119899 log 119899 + 119898

21198993) (19)

If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898

3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898

3119899119870+1

)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice

4 Results

We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod

41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866

1 1198662 and 119866

3generated by randomly adding and

deleting edges from the initial network In this method foreach node V

119894with ℎ input nodes we considered the following

model119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905) + 119887

119894120596)

(20)

where 119886119894

119895s and 119886

119894

119895119896s are constants selected uniformly at

random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886

119894

119895119896s is smaller than that for

119886119894

119895s is that nonlinear terms are not considered as strong as

linear terms It should also be noted that 119887119894120596 is a stochastic

term where 119887119894is a constant (we used 119887

119894= 005) and 120596 is

random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910

119894(119905) we used

119910119894(119905) = 119909

119894(119905) + 119900

119894120598 (21)

where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]

As for the time series data we generated an originaldataset with 30 time points including two change points 119888

1=

10 1198882= 20 which is generated by merging three datasets for

1198661 1198662 and 119866

3 Since the use of time series data beginning

from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896

0edges and deleting ℎ

0edges per node resulting in

1198661 1198662 and 119866

3 additionally we also conducted DPLSQ-HS

experiments in which the initial network 119866 was modified byrandomly adding 119896

0edges per node using the default values

of 1198960= ℎ0= 1 We evaluated the performance of this method

by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and

8 BioMed Research International

MMP-3

WNT5A

HADHB

RET-1

S100P

Synuclein

Pirin

MART-1

RHO-C

STC2

Figure 4 Structure of WNT5A network [26]

Table 1 Result on completion with synthetic data

(a) Using DPLSQ-TV

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 160Accuracy 0830 0685 0554 0433

CPU time (sec) 53974 58574 50499 61189

119899 = 20

Time points error 000 000 000 210Accuracy 0781 0637 0541 0299

CPU time (sec) 75898 85391 142903 124400

119899 = 30

Time points error 000 000 035 400Accuracy 0539 0524 0418 0310

CPU time (sec) 264190 244480 234377 276081

119899 = 40

Time points error 000 000 035 350Accuracy 0585 0498 0458 0241

CPU time (sec) 342065 333398 317420 337274

(b) Using DPLSQ-HS

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 685Accuracy 0627 0600 0533 0307

CPU time (sec) 32783 32030 35594 30765

119899 = 20

Time points error 000 000 000 870Accuracy 0488 0473 0413 0153

CPU time (sec) 52129 51348 56723 44699

119899 = 30

time points error 000 000 425 880Accuracy 0469 0386 0286 0097

CPU time (sec) 76852 86844 76313 74653

119899 = 40

Time points error 000 000 085 865Accuracy 0510 0413 0308 0093

CPU time (sec) 76360 98398 97657 102813

119899 = 60

Time points error 000 000 000 075Accuracy 0411 0375 0382 0355

CPU time (sec) 371635 395596 372192 391110

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

BioMed Research International 3

Expression

Gene1

Gene2

Gene3

Time1 2 3 4 5 6 7 8 9 10 11

V1

V2

V3V1

V2

V3V1

V2

V3

Inference

E0 E1 E2

c1 = 5 c2 = 9

12 = m

Figure 1 Inference (ie completion starting with the null network) of time varying structure of a genetic networkThis example correspondsto the case of119873 = 1 119899 = 3 119861 = 2 and119898 = 12 The change points are 119888

1= 5 and 119888

2= 9

effectiveness of the proposed methods by comparing ourresults with those of ARTIVA [15]

2 Method

In this section we present DPLSQ-TV a DP-based methodfor the completion of a time-varying networkWe assume thatthere exist119898 time points (1 2 119898) which are divided into119861 + 1 intervals [1 119888

1minus 1] [119888

1 119888

2minus 1] [119888

119861 119898]

where 119861 indicates the number of change points A differentnetwork is associated with each interval We assume that theset of genes does not change therefore only the edge setchanges according to the time interval Let 119881 = V

1 V

119899

be the set of genes Let 119864 be the initial set of directededges (ie initial set of gene regulation relationships) and let1198640 1198641 119864

119861be the sets of directed edges (ie the output)

where 119864119894denotes the edge set for the 119894th interval

Then the problem is defined as follows given an initialnetwork119866(119881 119864) consisting of 119899 genes119873 time series datasetseach of which consists of 119898 time points for 119899 genes andthe positive integers ℎ 119896 and 119861 infer 119861 change points (ie1198881 1198882 119888

119861) and complete the initial network 119866(119881 119864) by

adding 119896 edges and deleting ℎ edges in total such that thetotal least-squares error isminimizedThis results in the set ofedges 119864

0 1198641 119864

119861at the corresponding time intervals (see

Figure 1) It is to be noted that if we start with an empty set ofedges (ie 119864 = 0) the problem corresponds to the inferenceof a time-varying network

21 Model of Differential Equation and Estimation of Param-eters We assume that the dynamics of each node V

119894are

determined by the following differential equation

119889119909119894

119889119905= 119886119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895119909119894119896+ 119887119894120596 (1)

where 119909119894corresponds to the expression value of node V

119894 120596

denotes random noise and V1198941 V

119894ℎare incoming nodes to

i1 i2ih

ai2

ai1

aih

ai12

i

middot middot middot

Figure 2 Dynamics model for a node

V119894 The second and third terms of the right-hand side of the

equation represent the linear and nonlinear effects to nodeV119894 respectively (see Figure 2) where a positive value for 119886119894

119895

or 119886119894

119895119896corresponds to an activation effect and a negative

value for 119886119894119895or 119886119894119895119896

corresponds to an inhibition effect Thismodel is an extension of the linear differential equationmodel [3] It is also a variant of the recurrent neural networkmodel [27] although the sigmoid function is replaced hereby an identify function and nonlinear terms representingcooperating regulations are added instead

In practice we replace the derivative of (1) by thedifference and ignore the noise term as follows

119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895(119905) + sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905))

(2)

where Δ119905 denotes the unit time This kind of discretizationis also employed for linear and recurrent neural networkmodels [3 27]

4 BioMed Research International

In our previous method DPLSQ [20] we assume thattime series data 119910

119894(119905)s which correspond to 119909

119894(119905)s in (2) are

given for 119905 = 0 1 119898 where we distinguish an observedexpression value 119910

119894(119905) from an expression value 119909

119894(119905) in

the mathematical model equation (2) Then the parameters119886119894

119895s and 119886

119894

119895119896s are estimated from these time series data by

minimizing the following objective function (ie the sum ofthe least squares errors) for each node V

119894

119898

sum

119905=1

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

119910119894(119905 + 1)

minus[

[

119910119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119910119894119895(119905) +sum

119895lt119896

119886119894

119895119896119910119894119895(119905) 119910119894119896(119905))]

]

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

2

(3)

It should be noted that 119910119894(119905) is the observed expression

value of gene V119894at time 119905 and V

1198941 V1198942 V

119894ℎare tentative

incoming nodes to node V119894 Incoming nodes to each node

are determined so that the sum of these values for all nodesis minimized under the constraint that the total numberof edges is equal to the specified number In order tominimize the sum of least squares errors for all genes alongwith determining the incoming nodes and correspondingparameters DP is applied Readers are referred to [20] for thedetails of DPLSQ

22 Completion by Addition of Edges In this subsection wepresent our proposed method for network completion oftime-varying networks by the addition of edges and extendthis to a general case (ie network completion by the additionand deletion of edges) in the following subsection Forsimplicity we assume 119873 = 1 where we can extend themethod to the case of 119873 gt 1 by changing the definitionof 119878119894V1198951 V1198952 V119895119889

[119901 119902] onlyWe assume that the set of nodes (ie the set of genes) 119881

and the set of initial edges 119864 are given Let the current setof incoming nodes to V

119894be V1198941 V

119894119889 We define the least

squares error for V119894during the time period between 119901 and 119902

as119878119894

V1198941 V1198942 V119894119889 [119901 119902]

= min1198861198940 119886119894119895119886119894119895119896

119902minus1

sum

119905=119901

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

119910119894(119905 + 1)

minus [

[

119910119894(119905) + Δ119905

times (119886119894

0+

119889

sum

119895=1

119886119894

119895119910119894119895(119905) +sum

119895lt119896

119886119894

119895119896119910119894119895(119905) 119910119894119896(119905))]

]

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

2

(4)

where 119910119894(119905) denotes the observed expression value of gene V

119894

at time 119905The parameters (ie 1198861198940 119886119894119895 119886119894119895119896) needed to attain this

minimum value can be computed by a standard least squaresfitting method

Because network completion is considered to involve theaddition of edges let 119890minus(V

119894) = V

1198951 V

119895119889 be the set of initial

incoming nodes to V119894 Let 120590+

119896119895 119895[119901 119902] denote the minimum

least squares error when adding 119896119895edges to the 119895th node

during the time from 119901 to 119902 which is formally defined as

120590+

119896119895 119895[119901 119902] = min

1198951 1198952119895119896119895

119878119895

119890minus(V119895)cupV1198951 V1198952 V119895119896119895

[119901 119902] (5)

where each V119895119897must be selected from119881minus V

119895minus 119890minus(V119895) In order

to avoid combinatorial explosion we constrain themaximum119896119895to be a small constant 119870 and let 120590+

119896119895 119895[119901 119902] = +infin for

119896119895gt 119870 or 119896

119895+ |119890minus(V119895)| ge 119899

Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

(6)

where 1198880= 1 and 119888

119861+1minus 1 = 119898

Here we define119863+[119896 119894 119901 119902] as

119863+[119896 119894 119901 119902] = min

1198961+1198962+sdotsdotsdot+119896119894=119896

119894

sum

119895=1

120590+

119896119895 119895[119901 119902]

(7)

The entries of 119863+[119896 119895 119901 119902] can be computed by the

following DP algorithm

119863+[119896 1 119901 119902] = 120590

+

1198961[119901 119902]

119863+[119896 119895 + 1 119901 119902] = min

1198961015840+11989610158401015840=119896

119863+[1198961015840 119895] + 120590

+

11989610158401015840119895+1

[119901 119902]

(8)

It is to be noted that119863+[119896 119899 119901 119902] is determined uniquelyregardless of the ordering of nodes in the network Thecorrectness of this DP algorithm can be seen as follows

min1198961+1198962+sdotsdotsdot+119896119899=119896

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]

= min1198961015840+11989610158401015840=119896

min1198961+1198962+sdotsdotsdot+119896119899minus1=119896

1015840

119899minus1

sum

119895=1

120590+

119896119895 119895[119901 119902] + 120590

+

11989610158401015840119899[119901 119902]

= min1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 minus 1 119901 119902] + 120590

+

11989610158401015840119899[119901 119902]

(9)

Next we define 119864+[119896 119887 119902] as

119864+[119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

(10)

BioMed Research International 5

where 119888119887+1

minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm

119864+[119896 0 119902] = 119863

+[119896 119899 1 119902]

119864+[119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 119901 119902] + 119864

+[11989610158401015840 119887 minus 1 119901]

(11)

The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]

The correctness of this DP algorithm can be seen asfollows

min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

= min119888119887minus1lt119888119887

1198961015840+11989610158401015840=119896

min1198881ltsdotsdotsdotlt119888119887minus1

1198960+sdotsdotsdot+119896

119887minus1=1198961015840

119887minus1

sum

119894=0

min1198961+sdotsdotsdot+119896119899=119896

119894

times[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

+ min1198961+sdotsdotsdot+119896119899=119896

10158401015840

[

[

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]]

]

= min119901lt119902

1198961015840+11989610158401015840=119896

119864+[1198961015840 119887 minus 1 119901] + 119863

+[11989610158401015840 119899 119901 119902]

(12)

23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here

Let 120590ℎ119895 119896119895 119895

[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896

119895edges

to 119890minus(V119895) and deleting ℎ

119895edges from 119890

minus(V119895) where the added

and deleted edges must be disjointed We constrain themaximum 119896

119895and ℎ

119895to the small constants 119870 and 119867 We let

120590ℎ119895 119896119895 119895

[119901 119902] = +infin if 119896119895gt 119870 ℎ

119895gt 119867 119896

119895minusℎ119895+|119890minus(V119895)| ge 119899

or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119861=ℎ

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(13)

Here we define119863[ℎ 119896 119894 119901 119902] as

119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896

ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ

119894

sum

119895=1

120590ℎ119895 119896119895 119895

[119901 119902]

(14)

As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by

119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895

[119901 119902]

119863 [ℎ 119896 119895 + 1 119901 119902]

= min1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590

ℎ1015840101584011989610158401015840119895+1

[119901 119902]

(15)

Next we define 119864[ℎ 119896 119887 119902] as

119864 [ℎ 119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119887=ℎ

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(16)

119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm

119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]

119864 [ℎ 119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ

10158401015840 11989610158401015840 119887 minus 1 119901]

(17)

24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges

First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901

2+ 1199013) time where 119898 is the

number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

Next we analyze the time complexity required for com-puting120590

ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590

ℎ119895 119896119895 119895

is 119874(119898119899119870+1

) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590

ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)

because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the

6 BioMed Research International

size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)

Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899

2) where we assume that 119861 is a constant Since the

number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is

119874(1198983119899119870+1

+ 11989821198993) (18)

It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898

3119899119870+1

+ 11989821198993) Although this complexity is not small

it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2

3 Heuristic Method

Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33

31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V

119894 we maintain119872 combinations of 119896 incoming

nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the

set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V

1198941 V

119894119896minus1 isin 1198781

119894cup1198782

119894cupsdot sdot sdotcup 119878

119896minus1

119894

where 1198941lt 1198942lt sdot sdot sdot lt 119894

119896minus1 we calculate the least squares error

for each V119895such that 119895 gt 119894

119896minus1is the 119896th incoming node to V

119894

The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878

119896

119894

32 Algorithm The following is the description of the algo-rithm to compute 120590

+

119896119894[119901 119902] in DPLSQ-HS where 120590

+

119896119894[119901 119902]

does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31

Step 1 For each period [119901 119902] repeat Steps 2ndash6

Step 2 Let 1198780119894= 0 for all 119894 = 1 119899

Step 3 For 119894 = 1 to 119899 do Steps 4ndash7

Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870

Step 5 For each combination V1198941 V

119894119896minus1 isin 119878

1

119894cup 1198782

119894cup

sdot sdot sdot cup 119878119896minus1

119894and each node V

119895such that 119895 gt 119894

119896minus1(119895 gt 0 if

119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V

119894119896minus1 V119894) (V119895 V119894)

Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896

119894

Step 7 Let 120590+

119896119894[119901 119902] be the minimum least squares error

among these top119872 combinations

The other parts of the algorithm are the same as in DPLSQ-TV

33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges

In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

As for the time complexity of computing 120590ℎ119895 119896119895 119895

[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is

well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590

ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since

the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590

ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because

119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are

BioMed Research International 7

k = 1 k = 2 k = 3

i i i

S1i

S1i S

1iS

2i

S2i

S3i

Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote

the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V

119894119896minus1 isin 1198781

119894cup 1198782

119894cup sdot sdot sdot cup 119878

119896minus1

119894 we calculate

the least squares error for each V119895such that 119895 gt 119894

119896minus1as a 119896th incoming node to V

119894

119874(11989821198993) as described in Section 24 Hence the total time

complexity of DPLSQ-HS is

119874(1198983119899 log 119899 + 119898

21198993) (19)

If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898

3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898

3119899119870+1

)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice

4 Results

We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod

41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866

1 1198662 and 119866

3generated by randomly adding and

deleting edges from the initial network In this method foreach node V

119894with ℎ input nodes we considered the following

model119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905) + 119887

119894120596)

(20)

where 119886119894

119895s and 119886

119894

119895119896s are constants selected uniformly at

random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886

119894

119895119896s is smaller than that for

119886119894

119895s is that nonlinear terms are not considered as strong as

linear terms It should also be noted that 119887119894120596 is a stochastic

term where 119887119894is a constant (we used 119887

119894= 005) and 120596 is

random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910

119894(119905) we used

119910119894(119905) = 119909

119894(119905) + 119900

119894120598 (21)

where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]

As for the time series data we generated an originaldataset with 30 time points including two change points 119888

1=

10 1198882= 20 which is generated by merging three datasets for

1198661 1198662 and 119866

3 Since the use of time series data beginning

from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896

0edges and deleting ℎ

0edges per node resulting in

1198661 1198662 and 119866

3 additionally we also conducted DPLSQ-HS

experiments in which the initial network 119866 was modified byrandomly adding 119896

0edges per node using the default values

of 1198960= ℎ0= 1 We evaluated the performance of this method

by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and

8 BioMed Research International

MMP-3

WNT5A

HADHB

RET-1

S100P

Synuclein

Pirin

MART-1

RHO-C

STC2

Figure 4 Structure of WNT5A network [26]

Table 1 Result on completion with synthetic data

(a) Using DPLSQ-TV

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 160Accuracy 0830 0685 0554 0433

CPU time (sec) 53974 58574 50499 61189

119899 = 20

Time points error 000 000 000 210Accuracy 0781 0637 0541 0299

CPU time (sec) 75898 85391 142903 124400

119899 = 30

Time points error 000 000 035 400Accuracy 0539 0524 0418 0310

CPU time (sec) 264190 244480 234377 276081

119899 = 40

Time points error 000 000 035 350Accuracy 0585 0498 0458 0241

CPU time (sec) 342065 333398 317420 337274

(b) Using DPLSQ-HS

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 685Accuracy 0627 0600 0533 0307

CPU time (sec) 32783 32030 35594 30765

119899 = 20

Time points error 000 000 000 870Accuracy 0488 0473 0413 0153

CPU time (sec) 52129 51348 56723 44699

119899 = 30

time points error 000 000 425 880Accuracy 0469 0386 0286 0097

CPU time (sec) 76852 86844 76313 74653

119899 = 40

Time points error 000 000 085 865Accuracy 0510 0413 0308 0093

CPU time (sec) 76360 98398 97657 102813

119899 = 60

Time points error 000 000 000 075Accuracy 0411 0375 0382 0355

CPU time (sec) 371635 395596 372192 391110

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

4 BioMed Research International

In our previous method DPLSQ [20] we assume thattime series data 119910

119894(119905)s which correspond to 119909

119894(119905)s in (2) are

given for 119905 = 0 1 119898 where we distinguish an observedexpression value 119910

119894(119905) from an expression value 119909

119894(119905) in

the mathematical model equation (2) Then the parameters119886119894

119895s and 119886

119894

119895119896s are estimated from these time series data by

minimizing the following objective function (ie the sum ofthe least squares errors) for each node V

119894

119898

sum

119905=1

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

119910119894(119905 + 1)

minus[

[

119910119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119910119894119895(119905) +sum

119895lt119896

119886119894

119895119896119910119894119895(119905) 119910119894119896(119905))]

]

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

2

(3)

It should be noted that 119910119894(119905) is the observed expression

value of gene V119894at time 119905 and V

1198941 V1198942 V

119894ℎare tentative

incoming nodes to node V119894 Incoming nodes to each node

are determined so that the sum of these values for all nodesis minimized under the constraint that the total numberof edges is equal to the specified number In order tominimize the sum of least squares errors for all genes alongwith determining the incoming nodes and correspondingparameters DP is applied Readers are referred to [20] for thedetails of DPLSQ

22 Completion by Addition of Edges In this subsection wepresent our proposed method for network completion oftime-varying networks by the addition of edges and extendthis to a general case (ie network completion by the additionand deletion of edges) in the following subsection Forsimplicity we assume 119873 = 1 where we can extend themethod to the case of 119873 gt 1 by changing the definitionof 119878119894V1198951 V1198952 V119895119889

[119901 119902] onlyWe assume that the set of nodes (ie the set of genes) 119881

and the set of initial edges 119864 are given Let the current setof incoming nodes to V

119894be V1198941 V

119894119889 We define the least

squares error for V119894during the time period between 119901 and 119902

as119878119894

V1198941 V1198942 V119894119889 [119901 119902]

= min1198861198940 119886119894119895119886119894119895119896

119902minus1

sum

119905=119901

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

119910119894(119905 + 1)

minus [

[

119910119894(119905) + Δ119905

times (119886119894

0+

119889

sum

119895=1

119886119894

119895119910119894119895(119905) +sum

119895lt119896

119886119894

119895119896119910119894119895(119905) 119910119894119896(119905))]

]

100381610038161003816100381610038161003816100381610038161003816100381610038161003816

2

(4)

where 119910119894(119905) denotes the observed expression value of gene V

119894

at time 119905The parameters (ie 1198861198940 119886119894119895 119886119894119895119896) needed to attain this

minimum value can be computed by a standard least squaresfitting method

Because network completion is considered to involve theaddition of edges let 119890minus(V

119894) = V

1198951 V

119895119889 be the set of initial

incoming nodes to V119894 Let 120590+

119896119895 119895[119901 119902] denote the minimum

least squares error when adding 119896119895edges to the 119895th node

during the time from 119901 to 119902 which is formally defined as

120590+

119896119895 119895[119901 119902] = min

1198951 1198952119895119896119895

119878119895

119890minus(V119895)cupV1198951 V1198952 V119895119896119895

[119901 119902] (5)

where each V119895119897must be selected from119881minus V

119895minus 119890minus(V119895) In order

to avoid combinatorial explosion we constrain themaximum119896119895to be a small constant 119870 and let 120590+

119896119895 119895[119901 119902] = +infin for

119896119895gt 119870 or 119896

119895+ |119890minus(V119895)| ge 119899

Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

(6)

where 1198880= 1 and 119888

119861+1minus 1 = 119898

Here we define119863+[119896 119894 119901 119902] as

119863+[119896 119894 119901 119902] = min

1198961+1198962+sdotsdotsdot+119896119894=119896

119894

sum

119895=1

120590+

119896119895 119895[119901 119902]

(7)

The entries of 119863+[119896 119895 119901 119902] can be computed by the

following DP algorithm

119863+[119896 1 119901 119902] = 120590

+

1198961[119901 119902]

119863+[119896 119895 + 1 119901 119902] = min

1198961015840+11989610158401015840=119896

119863+[1198961015840 119895] + 120590

+

11989610158401015840119895+1

[119901 119902]

(8)

It is to be noted that119863+[119896 119899 119901 119902] is determined uniquelyregardless of the ordering of nodes in the network Thecorrectness of this DP algorithm can be seen as follows

min1198961+1198962+sdotsdotsdot+119896119899=119896

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]

= min1198961015840+11989610158401015840=119896

min1198961+1198962+sdotsdotsdot+119896119899minus1=119896

1015840

119899minus1

sum

119895=1

120590+

119896119895 119895[119901 119902] + 120590

+

11989610158401015840119899[119901 119902]

= min1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 minus 1 119901 119902] + 120590

+

11989610158401015840119899[119901 119902]

(9)

Next we define 119864+[119896 119887 119902] as

119864+[119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

(10)

BioMed Research International 5

where 119888119887+1

minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm

119864+[119896 0 119902] = 119863

+[119896 119899 1 119902]

119864+[119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 119901 119902] + 119864

+[11989610158401015840 119887 minus 1 119901]

(11)

The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]

The correctness of this DP algorithm can be seen asfollows

min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

= min119888119887minus1lt119888119887

1198961015840+11989610158401015840=119896

min1198881ltsdotsdotsdotlt119888119887minus1

1198960+sdotsdotsdot+119896

119887minus1=1198961015840

119887minus1

sum

119894=0

min1198961+sdotsdotsdot+119896119899=119896

119894

times[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

+ min1198961+sdotsdotsdot+119896119899=119896

10158401015840

[

[

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]]

]

= min119901lt119902

1198961015840+11989610158401015840=119896

119864+[1198961015840 119887 minus 1 119901] + 119863

+[11989610158401015840 119899 119901 119902]

(12)

23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here

Let 120590ℎ119895 119896119895 119895

[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896

119895edges

to 119890minus(V119895) and deleting ℎ

119895edges from 119890

minus(V119895) where the added

and deleted edges must be disjointed We constrain themaximum 119896

119895and ℎ

119895to the small constants 119870 and 119867 We let

120590ℎ119895 119896119895 119895

[119901 119902] = +infin if 119896119895gt 119870 ℎ

119895gt 119867 119896

119895minusℎ119895+|119890minus(V119895)| ge 119899

or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119861=ℎ

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(13)

Here we define119863[ℎ 119896 119894 119901 119902] as

119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896

ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ

119894

sum

119895=1

120590ℎ119895 119896119895 119895

[119901 119902]

(14)

As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by

119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895

[119901 119902]

119863 [ℎ 119896 119895 + 1 119901 119902]

= min1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590

ℎ1015840101584011989610158401015840119895+1

[119901 119902]

(15)

Next we define 119864[ℎ 119896 119887 119902] as

119864 [ℎ 119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119887=ℎ

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(16)

119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm

119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]

119864 [ℎ 119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ

10158401015840 11989610158401015840 119887 minus 1 119901]

(17)

24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges

First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901

2+ 1199013) time where 119898 is the

number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

Next we analyze the time complexity required for com-puting120590

ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590

ℎ119895 119896119895 119895

is 119874(119898119899119870+1

) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590

ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)

because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the

6 BioMed Research International

size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)

Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899

2) where we assume that 119861 is a constant Since the

number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is

119874(1198983119899119870+1

+ 11989821198993) (18)

It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898

3119899119870+1

+ 11989821198993) Although this complexity is not small

it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2

3 Heuristic Method

Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33

31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V

119894 we maintain119872 combinations of 119896 incoming

nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the

set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V

1198941 V

119894119896minus1 isin 1198781

119894cup1198782

119894cupsdot sdot sdotcup 119878

119896minus1

119894

where 1198941lt 1198942lt sdot sdot sdot lt 119894

119896minus1 we calculate the least squares error

for each V119895such that 119895 gt 119894

119896minus1is the 119896th incoming node to V

119894

The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878

119896

119894

32 Algorithm The following is the description of the algo-rithm to compute 120590

+

119896119894[119901 119902] in DPLSQ-HS where 120590

+

119896119894[119901 119902]

does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31

Step 1 For each period [119901 119902] repeat Steps 2ndash6

Step 2 Let 1198780119894= 0 for all 119894 = 1 119899

Step 3 For 119894 = 1 to 119899 do Steps 4ndash7

Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870

Step 5 For each combination V1198941 V

119894119896minus1 isin 119878

1

119894cup 1198782

119894cup

sdot sdot sdot cup 119878119896minus1

119894and each node V

119895such that 119895 gt 119894

119896minus1(119895 gt 0 if

119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V

119894119896minus1 V119894) (V119895 V119894)

Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896

119894

Step 7 Let 120590+

119896119894[119901 119902] be the minimum least squares error

among these top119872 combinations

The other parts of the algorithm are the same as in DPLSQ-TV

33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges

In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

As for the time complexity of computing 120590ℎ119895 119896119895 119895

[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is

well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590

ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since

the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590

ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because

119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are

BioMed Research International 7

k = 1 k = 2 k = 3

i i i

S1i

S1i S

1iS

2i

S2i

S3i

Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote

the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V

119894119896minus1 isin 1198781

119894cup 1198782

119894cup sdot sdot sdot cup 119878

119896minus1

119894 we calculate

the least squares error for each V119895such that 119895 gt 119894

119896minus1as a 119896th incoming node to V

119894

119874(11989821198993) as described in Section 24 Hence the total time

complexity of DPLSQ-HS is

119874(1198983119899 log 119899 + 119898

21198993) (19)

If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898

3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898

3119899119870+1

)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice

4 Results

We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod

41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866

1 1198662 and 119866

3generated by randomly adding and

deleting edges from the initial network In this method foreach node V

119894with ℎ input nodes we considered the following

model119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905) + 119887

119894120596)

(20)

where 119886119894

119895s and 119886

119894

119895119896s are constants selected uniformly at

random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886

119894

119895119896s is smaller than that for

119886119894

119895s is that nonlinear terms are not considered as strong as

linear terms It should also be noted that 119887119894120596 is a stochastic

term where 119887119894is a constant (we used 119887

119894= 005) and 120596 is

random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910

119894(119905) we used

119910119894(119905) = 119909

119894(119905) + 119900

119894120598 (21)

where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]

As for the time series data we generated an originaldataset with 30 time points including two change points 119888

1=

10 1198882= 20 which is generated by merging three datasets for

1198661 1198662 and 119866

3 Since the use of time series data beginning

from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896

0edges and deleting ℎ

0edges per node resulting in

1198661 1198662 and 119866

3 additionally we also conducted DPLSQ-HS

experiments in which the initial network 119866 was modified byrandomly adding 119896

0edges per node using the default values

of 1198960= ℎ0= 1 We evaluated the performance of this method

by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and

8 BioMed Research International

MMP-3

WNT5A

HADHB

RET-1

S100P

Synuclein

Pirin

MART-1

RHO-C

STC2

Figure 4 Structure of WNT5A network [26]

Table 1 Result on completion with synthetic data

(a) Using DPLSQ-TV

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 160Accuracy 0830 0685 0554 0433

CPU time (sec) 53974 58574 50499 61189

119899 = 20

Time points error 000 000 000 210Accuracy 0781 0637 0541 0299

CPU time (sec) 75898 85391 142903 124400

119899 = 30

Time points error 000 000 035 400Accuracy 0539 0524 0418 0310

CPU time (sec) 264190 244480 234377 276081

119899 = 40

Time points error 000 000 035 350Accuracy 0585 0498 0458 0241

CPU time (sec) 342065 333398 317420 337274

(b) Using DPLSQ-HS

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 685Accuracy 0627 0600 0533 0307

CPU time (sec) 32783 32030 35594 30765

119899 = 20

Time points error 000 000 000 870Accuracy 0488 0473 0413 0153

CPU time (sec) 52129 51348 56723 44699

119899 = 30

time points error 000 000 425 880Accuracy 0469 0386 0286 0097

CPU time (sec) 76852 86844 76313 74653

119899 = 40

Time points error 000 000 085 865Accuracy 0510 0413 0308 0093

CPU time (sec) 76360 98398 97657 102813

119899 = 60

Time points error 000 000 000 075Accuracy 0411 0375 0382 0355

CPU time (sec) 371635 395596 372192 391110

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

BioMed Research International 5

where 119888119887+1

minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm

119864+[119896 0 119902] = 119863

+[119896 119899 1 119902]

119864+[119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

119863+[1198961015840 119899 119901 119902] + 119864

+[11989610158401015840 119887 minus 1 119901]

(11)

The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]

The correctness of this DP algorithm can be seen asfollows

min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

= min119888119887minus1lt119888119887

1198961015840+11989610158401015840=119896

min1198881ltsdotsdotsdotlt119888119887minus1

1198960+sdotsdotsdot+119896

119887minus1=1198961015840

119887minus1

sum

119894=0

min1198961+sdotsdotsdot+119896119899=119896

119894

times[

[

119899

sum

119895=1

120590+

119896119895 119895[119888119894 119888119894+1

minus 1]]

]

+ min1198961+sdotsdotsdot+119896119899=119896

10158401015840

[

[

119899

sum

119895=1

120590+

119896119895 119895[119901 119902]]

]

= min119901lt119902

1198961015840+11989610158401015840=119896

119864+[1198961015840 119887 minus 1 119901] + 119863

+[11989610158401015840 119899 119901 119902]

(12)

23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here

Let 120590ℎ119895 119896119895 119895

[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896

119895edges

to 119890minus(V119895) and deleting ℎ

119895edges from 119890

minus(V119895) where the added

and deleted edges must be disjointed We constrain themaximum 119896

119895and ℎ

119895to the small constants 119870 and 119867 We let

120590ℎ119895 119896119895 119895

[119901 119902] = +infin if 119896119895gt 119870 ℎ

119895gt 119867 119896

119895minusℎ119895+|119890minus(V119895)| ge 119899

or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as

min1198881lt1198882ltsdotsdotsdotlt119888119861

1198960+1198961+sdotsdotsdot+119896

119861=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119861=ℎ

119861

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(13)

Here we define119863[ℎ 119896 119894 119901 119902] as

119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896

ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ

119894

sum

119895=1

120590ℎ119895 119896119895 119895

[119901 119902]

(14)

As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by

119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895

[119901 119902]

119863 [ℎ 119896 119895 + 1 119901 119902]

= min1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590

ℎ1015840101584011989610158401015840119895+1

[119901 119902]

(15)

Next we define 119864[ℎ 119896 119887 119902] as

119864 [ℎ 119896 119887 119902]

= min1198881lt1198882ltsdotsdotsdotlt119888119887

1198960+1198961+sdotsdotsdot+119896

119887=119896

ℎ0+ℎ1+sdotsdotsdot+ℎ

119887=ℎ

119887

sum

119894=0

min1198961+1198962+sdotsdotsdot+119896119899=119896

119894

ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894

[

[

119899

sum

119895=1

120590ℎ119895 119896119895 119895

[119888119894 119888119894+1

minus 1]]

]

(16)

119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm

119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]

119864 [ℎ 119896 119887 119902]

= min119901lt119902

1198961015840+11989610158401015840=119896

ℎ1015840+ℎ10158401015840=ℎ

119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ

10158401015840 11989610158401015840 119887 minus 1 119901]

(17)

24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges

First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901

2+ 1199013) time where 119898 is the

number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

Next we analyze the time complexity required for com-puting120590

ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590

ℎ119895 119896119895 119895

is 119874(119898119899119870+1

) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590

ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)

because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the

6 BioMed Research International

size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)

Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899

2) where we assume that 119861 is a constant Since the

number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is

119874(1198983119899119870+1

+ 11989821198993) (18)

It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898

3119899119870+1

+ 11989821198993) Although this complexity is not small

it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2

3 Heuristic Method

Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33

31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V

119894 we maintain119872 combinations of 119896 incoming

nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the

set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V

1198941 V

119894119896minus1 isin 1198781

119894cup1198782

119894cupsdot sdot sdotcup 119878

119896minus1

119894

where 1198941lt 1198942lt sdot sdot sdot lt 119894

119896minus1 we calculate the least squares error

for each V119895such that 119895 gt 119894

119896minus1is the 119896th incoming node to V

119894

The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878

119896

119894

32 Algorithm The following is the description of the algo-rithm to compute 120590

+

119896119894[119901 119902] in DPLSQ-HS where 120590

+

119896119894[119901 119902]

does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31

Step 1 For each period [119901 119902] repeat Steps 2ndash6

Step 2 Let 1198780119894= 0 for all 119894 = 1 119899

Step 3 For 119894 = 1 to 119899 do Steps 4ndash7

Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870

Step 5 For each combination V1198941 V

119894119896minus1 isin 119878

1

119894cup 1198782

119894cup

sdot sdot sdot cup 119878119896minus1

119894and each node V

119895such that 119895 gt 119894

119896minus1(119895 gt 0 if

119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V

119894119896minus1 V119894) (V119895 V119894)

Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896

119894

Step 7 Let 120590+

119896119894[119901 119902] be the minimum least squares error

among these top119872 combinations

The other parts of the algorithm are the same as in DPLSQ-TV

33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges

In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

As for the time complexity of computing 120590ℎ119895 119896119895 119895

[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is

well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590

ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since

the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590

ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because

119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are

BioMed Research International 7

k = 1 k = 2 k = 3

i i i

S1i

S1i S

1iS

2i

S2i

S3i

Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote

the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V

119894119896minus1 isin 1198781

119894cup 1198782

119894cup sdot sdot sdot cup 119878

119896minus1

119894 we calculate

the least squares error for each V119895such that 119895 gt 119894

119896minus1as a 119896th incoming node to V

119894

119874(11989821198993) as described in Section 24 Hence the total time

complexity of DPLSQ-HS is

119874(1198983119899 log 119899 + 119898

21198993) (19)

If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898

3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898

3119899119870+1

)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice

4 Results

We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod

41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866

1 1198662 and 119866

3generated by randomly adding and

deleting edges from the initial network In this method foreach node V

119894with ℎ input nodes we considered the following

model119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905) + 119887

119894120596)

(20)

where 119886119894

119895s and 119886

119894

119895119896s are constants selected uniformly at

random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886

119894

119895119896s is smaller than that for

119886119894

119895s is that nonlinear terms are not considered as strong as

linear terms It should also be noted that 119887119894120596 is a stochastic

term where 119887119894is a constant (we used 119887

119894= 005) and 120596 is

random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910

119894(119905) we used

119910119894(119905) = 119909

119894(119905) + 119900

119894120598 (21)

where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]

As for the time series data we generated an originaldataset with 30 time points including two change points 119888

1=

10 1198882= 20 which is generated by merging three datasets for

1198661 1198662 and 119866

3 Since the use of time series data beginning

from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896

0edges and deleting ℎ

0edges per node resulting in

1198661 1198662 and 119866

3 additionally we also conducted DPLSQ-HS

experiments in which the initial network 119866 was modified byrandomly adding 119896

0edges per node using the default values

of 1198960= ℎ0= 1 We evaluated the performance of this method

by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and

8 BioMed Research International

MMP-3

WNT5A

HADHB

RET-1

S100P

Synuclein

Pirin

MART-1

RHO-C

STC2

Figure 4 Structure of WNT5A network [26]

Table 1 Result on completion with synthetic data

(a) Using DPLSQ-TV

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 160Accuracy 0830 0685 0554 0433

CPU time (sec) 53974 58574 50499 61189

119899 = 20

Time points error 000 000 000 210Accuracy 0781 0637 0541 0299

CPU time (sec) 75898 85391 142903 124400

119899 = 30

Time points error 000 000 035 400Accuracy 0539 0524 0418 0310

CPU time (sec) 264190 244480 234377 276081

119899 = 40

Time points error 000 000 035 350Accuracy 0585 0498 0458 0241

CPU time (sec) 342065 333398 317420 337274

(b) Using DPLSQ-HS

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 685Accuracy 0627 0600 0533 0307

CPU time (sec) 32783 32030 35594 30765

119899 = 20

Time points error 000 000 000 870Accuracy 0488 0473 0413 0153

CPU time (sec) 52129 51348 56723 44699

119899 = 30

time points error 000 000 425 880Accuracy 0469 0386 0286 0097

CPU time (sec) 76852 86844 76313 74653

119899 = 40

Time points error 000 000 085 865Accuracy 0510 0413 0308 0093

CPU time (sec) 76360 98398 97657 102813

119899 = 60

Time points error 000 000 000 075Accuracy 0411 0375 0382 0355

CPU time (sec) 371635 395596 372192 391110

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

6 BioMed Research International

size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)

Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899

2) where we assume that 119861 is a constant Since the

number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is

119874(1198983119899119870+1

+ 11989821198993) (18)

It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898

3119899119870+1

+ 11989821198993) Although this complexity is not small

it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2

3 Heuristic Method

Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33

31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V

119894 we maintain119872 combinations of 119896 incoming

nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the

set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V

1198941 V

119894119896minus1 isin 1198781

119894cup1198782

119894cupsdot sdot sdotcup 119878

119896minus1

119894

where 1198941lt 1198942lt sdot sdot sdot lt 119894

119896minus1 we calculate the least squares error

for each V119895such that 119895 gt 119894

119896minus1is the 119896th incoming node to V

119894

The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878

119896

119894

32 Algorithm The following is the description of the algo-rithm to compute 120590

+

119896119894[119901 119902] in DPLSQ-HS where 120590

+

119896119894[119901 119902]

does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31

Step 1 For each period [119901 119902] repeat Steps 2ndash6

Step 2 Let 1198780119894= 0 for all 119894 = 1 119899

Step 3 For 119894 = 1 to 119899 do Steps 4ndash7

Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870

Step 5 For each combination V1198941 V

119894119896minus1 isin 119878

1

119894cup 1198782

119894cup

sdot sdot sdot cup 119878119896minus1

119894and each node V

119895such that 119895 gt 119894

119896minus1(119895 gt 0 if

119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V

119894119896minus1 V119894) (V119895 V119894)

Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896

119894

Step 7 Let 120590+

119896119894[119901 119902] be the minimum least squares error

among these top119872 combinations

The other parts of the algorithm are the same as in DPLSQ-TV

33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges

In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)

As for the time complexity of computing 120590ℎ119895 119896119895 119895

[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is

well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590

ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since

the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590

ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because

119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-

puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are

BioMed Research International 7

k = 1 k = 2 k = 3

i i i

S1i

S1i S

1iS

2i

S2i

S3i

Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote

the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V

119894119896minus1 isin 1198781

119894cup 1198782

119894cup sdot sdot sdot cup 119878

119896minus1

119894 we calculate

the least squares error for each V119895such that 119895 gt 119894

119896minus1as a 119896th incoming node to V

119894

119874(11989821198993) as described in Section 24 Hence the total time

complexity of DPLSQ-HS is

119874(1198983119899 log 119899 + 119898

21198993) (19)

If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898

3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898

3119899119870+1

)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice

4 Results

We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod

41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866

1 1198662 and 119866

3generated by randomly adding and

deleting edges from the initial network In this method foreach node V

119894with ℎ input nodes we considered the following

model119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905) + 119887

119894120596)

(20)

where 119886119894

119895s and 119886

119894

119895119896s are constants selected uniformly at

random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886

119894

119895119896s is smaller than that for

119886119894

119895s is that nonlinear terms are not considered as strong as

linear terms It should also be noted that 119887119894120596 is a stochastic

term where 119887119894is a constant (we used 119887

119894= 005) and 120596 is

random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910

119894(119905) we used

119910119894(119905) = 119909

119894(119905) + 119900

119894120598 (21)

where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]

As for the time series data we generated an originaldataset with 30 time points including two change points 119888

1=

10 1198882= 20 which is generated by merging three datasets for

1198661 1198662 and 119866

3 Since the use of time series data beginning

from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896

0edges and deleting ℎ

0edges per node resulting in

1198661 1198662 and 119866

3 additionally we also conducted DPLSQ-HS

experiments in which the initial network 119866 was modified byrandomly adding 119896

0edges per node using the default values

of 1198960= ℎ0= 1 We evaluated the performance of this method

by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and

8 BioMed Research International

MMP-3

WNT5A

HADHB

RET-1

S100P

Synuclein

Pirin

MART-1

RHO-C

STC2

Figure 4 Structure of WNT5A network [26]

Table 1 Result on completion with synthetic data

(a) Using DPLSQ-TV

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 160Accuracy 0830 0685 0554 0433

CPU time (sec) 53974 58574 50499 61189

119899 = 20

Time points error 000 000 000 210Accuracy 0781 0637 0541 0299

CPU time (sec) 75898 85391 142903 124400

119899 = 30

Time points error 000 000 035 400Accuracy 0539 0524 0418 0310

CPU time (sec) 264190 244480 234377 276081

119899 = 40

Time points error 000 000 035 350Accuracy 0585 0498 0458 0241

CPU time (sec) 342065 333398 317420 337274

(b) Using DPLSQ-HS

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 685Accuracy 0627 0600 0533 0307

CPU time (sec) 32783 32030 35594 30765

119899 = 20

Time points error 000 000 000 870Accuracy 0488 0473 0413 0153

CPU time (sec) 52129 51348 56723 44699

119899 = 30

time points error 000 000 425 880Accuracy 0469 0386 0286 0097

CPU time (sec) 76852 86844 76313 74653

119899 = 40

Time points error 000 000 085 865Accuracy 0510 0413 0308 0093

CPU time (sec) 76360 98398 97657 102813

119899 = 60

Time points error 000 000 000 075Accuracy 0411 0375 0382 0355

CPU time (sec) 371635 395596 372192 391110

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

BioMed Research International 7

k = 1 k = 2 k = 3

i i i

S1i

S1i S

1iS

2i

S2i

S3i

Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote

the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V

119894119896minus1 isin 1198781

119894cup 1198782

119894cup sdot sdot sdot cup 119878

119896minus1

119894 we calculate

the least squares error for each V119895such that 119895 gt 119894

119896minus1as a 119896th incoming node to V

119894

119874(11989821198993) as described in Section 24 Hence the total time

complexity of DPLSQ-HS is

119874(1198983119899 log 119899 + 119898

21198993) (19)

If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898

3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898

3119899119870+1

)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice

4 Results

We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod

41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866

1 1198662 and 119866

3generated by randomly adding and

deleting edges from the initial network In this method foreach node V

119894with ℎ input nodes we considered the following

model119909119894(119905 + 1)

= 119909119894(119905) + Δ119905(119886

119894

0+

sum

119895=1

119886119894

119895119909119894119895+ sum

119895lt119896

119886119894

119895119896119909119894119895(119905) 119909119894119896(119905) + 119887

119894120596)

(20)

where 119886119894

119895s and 119886

119894

119895119896s are constants selected uniformly at

random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886

119894

119895119896s is smaller than that for

119886119894

119895s is that nonlinear terms are not considered as strong as

linear terms It should also be noted that 119887119894120596 is a stochastic

term where 119887119894is a constant (we used 119887

119894= 005) and 120596 is

random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910

119894(119905) we used

119910119894(119905) = 119909

119894(119905) + 119900

119894120598 (21)

where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]

As for the time series data we generated an originaldataset with 30 time points including two change points 119888

1=

10 1198882= 20 which is generated by merging three datasets for

1198661 1198662 and 119866

3 Since the use of time series data beginning

from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896

0edges and deleting ℎ

0edges per node resulting in

1198661 1198662 and 119866

3 additionally we also conducted DPLSQ-HS

experiments in which the initial network 119866 was modified byrandomly adding 119896

0edges per node using the default values

of 1198960= ℎ0= 1 We evaluated the performance of this method

by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and

8 BioMed Research International

MMP-3

WNT5A

HADHB

RET-1

S100P

Synuclein

Pirin

MART-1

RHO-C

STC2

Figure 4 Structure of WNT5A network [26]

Table 1 Result on completion with synthetic data

(a) Using DPLSQ-TV

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 160Accuracy 0830 0685 0554 0433

CPU time (sec) 53974 58574 50499 61189

119899 = 20

Time points error 000 000 000 210Accuracy 0781 0637 0541 0299

CPU time (sec) 75898 85391 142903 124400

119899 = 30

Time points error 000 000 035 400Accuracy 0539 0524 0418 0310

CPU time (sec) 264190 244480 234377 276081

119899 = 40

Time points error 000 000 035 350Accuracy 0585 0498 0458 0241

CPU time (sec) 342065 333398 317420 337274

(b) Using DPLSQ-HS

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 685Accuracy 0627 0600 0533 0307

CPU time (sec) 32783 32030 35594 30765

119899 = 20

Time points error 000 000 000 870Accuracy 0488 0473 0413 0153

CPU time (sec) 52129 51348 56723 44699

119899 = 30

time points error 000 000 425 880Accuracy 0469 0386 0286 0097

CPU time (sec) 76852 86844 76313 74653

119899 = 40

Time points error 000 000 085 865Accuracy 0510 0413 0308 0093

CPU time (sec) 76360 98398 97657 102813

119899 = 60

Time points error 000 000 000 075Accuracy 0411 0375 0382 0355

CPU time (sec) 371635 395596 372192 391110

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

8 BioMed Research International

MMP-3

WNT5A

HADHB

RET-1

S100P

Synuclein

Pirin

MART-1

RHO-C

STC2

Figure 4 Structure of WNT5A network [26]

Table 1 Result on completion with synthetic data

(a) Using DPLSQ-TV

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 160Accuracy 0830 0685 0554 0433

CPU time (sec) 53974 58574 50499 61189

119899 = 20

Time points error 000 000 000 210Accuracy 0781 0637 0541 0299

CPU time (sec) 75898 85391 142903 124400

119899 = 30

Time points error 000 000 035 400Accuracy 0539 0524 0418 0310

CPU time (sec) 264190 244480 234377 276081

119899 = 40

Time points error 000 000 035 350Accuracy 0585 0498 0458 0241

CPU time (sec) 342065 333398 317420 337274

(b) Using DPLSQ-HS

Observation error level01 03 05 07

119899 = 10

Time points error 000 000 000 685Accuracy 0627 0600 0533 0307

CPU time (sec) 32783 32030 35594 30765

119899 = 20

Time points error 000 000 000 870Accuracy 0488 0473 0413 0153

CPU time (sec) 52129 51348 56723 44699

119899 = 30

time points error 000 000 425 880Accuracy 0469 0386 0286 0097

CPU time (sec) 76852 86844 76313 74653

119899 = 40

Time points error 000 000 085 865Accuracy 0510 0413 0308 0093

CPU time (sec) 76360 98398 97657 102813

119899 = 60

Time points error 000 000 000 075Accuracy 0411 0375 0382 0355

CPU time (sec) 371635 395596 372192 391110

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

BioMed Research International 9

DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866

11198662 and119866

3were obtained by addition and

deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1

The accuracy is defined as follows

ℎ + 119896 + sum119861

119894=0(10038161003816100381610038161003816119864119894cap 1198641015840

119894

10038161003816100381610038161003816minus1003816100381610038161003816119864119894

1003816100381610038161003816)

ℎ + 119896 (22)

where 119864119894and 119864

1015840

119894(119894 = 0 1 119861) are respectively the sets of

edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861

119894=0(|119864119894| minus

|119864119894cap 1198641015840

119894|) corresponds to the number of false positives and

ℎ + 119896 + sum119861

119894=0(|119864119894cap 1198641015840

119894| minus |119864

119894|) corresponds to the number of

true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as

1

119861

119861

sum

119894=1

10038161003816100381610038161003816119888119894minus 1198881015840

119894

10038161003816100381610038161003816 (23)

where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As

for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1

It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small

It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points

Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The

variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900

119894= 05 respectively The variances for DPLSQ-

HS were 001188 and 000732 for 119900119894= 03 and 119900

119894= 05

respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability

In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting

We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points

42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae

The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

10 BioMed Research International

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 01

(a)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 03

(b)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 05

(c)

0

005

01

015

02

025

5 10 15 20 25 30

Poste

rior p

roba

bilit

y

Time point

Observation error level = 07

(d)

Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data

YOX1CLN1

MCM1

CDC28

SWI4

SWI6CLN3

MBP1

CLB5

YHP1

Figure 6 Structure of part of the yeast cell cycle network

the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively

The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67

time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle

We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points

In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200

datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

BioMed Research International 11

Table 2 Result on inference of change points in S cerevisiae data

119888119894(correct answer) 119888

1015840

119894(DPLSQ-TV) 119888

1015840

119894(DPLSQ-HS) ARTIVA

119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60

CPU time (sec) 1414718 90880 mdash

Table 3 Result on inference of change points in D melanogasterdata

119888119894

(correct answer)1198881015840

119894

(DPLSQ-TV)1198881015840

119894

(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42

CPU time (sec) 12156082 262096

dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset

The results are shown in Tables 2 and 3 119888119894s are the

values of the change point in the original data and 1198881015840

119894s are

the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation

Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods

The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the

Table 4 Comparative experiment for inference of change points

119888119894

(correct answer)1198881015840

119894

(DPLSQ-HS) ARTIVA

119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61

CPU time (sec) 260640 mdash

number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888

119894s are three change

time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)

5 Conclusion

In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows

To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

12 BioMed Research International

methods works in polynomial time if the maximum indegreeis bounded by a constant

The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation

Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866

119894and 119866

119894+1

Although 119866119894and 119866

119894+1are inferred independently from the

original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work

Conflict of Interests

The authors declare that they have no conflict of interests

Acknowledgments

The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)

References

[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007

[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009

[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration

in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009

[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998

[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000

[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000

[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003

[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012

[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000

[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006

[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002

[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005

[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007

[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008

[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010

[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010

[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012

[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008

[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009

[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 13: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

BioMed Research International 13

[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008

[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009

[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011

[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009

[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013

[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002

[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012

[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998

[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009

[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 14: Research Article Exact and Heuristic Methods for Network ...downloads.hindawi.com/journals/bmri/2014/684014.pdf · real gene regulatory network in the cell might dynamically change

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology