Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Research ArticleExact and Heuristic Methods for Network Completion forTime-Varying Genetic Networks
Natsu Nakajima and Tatsuya Akutsu
Bioinformatics Center Institute for Chemical Research Kyoto University Gokasho Uji Kyoto 611-0011 Japan
Correspondence should be addressed to Tatsuya Akutsu takutsukuicrkyoto-uacjp
Received 13 August 2013 Revised 9 January 2014 Accepted 22 January 2014 Published 9 March 2014
Academic Editor Nasimul Noman
Copyright copy 2014 N Nakajima and T Akutsu This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited
Robustness in biological networks can be regarded as an important feature of living systems A system maintains its functionsagainst internal and external perturbations leading to topological changes in the network with varying delays To understand theflexibility of biological networks we propose a novel approach to analyze time-dependent networks based on the framework ofnetwork completion which aims to make the minimum amount of modifications to a given network so that the resulting networkis most consistent with the observed data We have developed a novel network completion method for time-varying networksby extending our previous method for the completion of stationary networks In particular we introduce a double dynamicprogramming technique to identify change time points and required modifications Although this extended method allows us toguarantee the optimality of the solution this method has relatively low computational efficiency In order to resolve this difficultywe developed a heuristicmethod for speeding up the calculation ofminimum least squares errorsWe demonstrate the effectivenessof our proposed methods through computational experiments using synthetic data and real microarray gene expression data Theresults indicate that our methods exhibit good performance in terms of completing and inferring gene association networks withtime-varying structures
1 Introduction
Computational analysis of gene regulatory networks is animportant topic in systems biology A gene regulatory net-work is a collection of genes and their correlations andcausal interactions It is often represented as a directedgraph in which the nodes correspond to genes and theedges correspond to regulatory relationships between twogenes Gene regulatory networks play important roles in cellsFor example gene regulatory networks maintain organismsthrough protein production response to the external envi-ronment and control of cell division processes Thereforedeciphering gene regulatory network structures is importantfor understanding cellular systems which might also be use-ful for the prediction of adverse effects of new drugs and thedetection of target genes for the development of newdrugs Inorder to infer gene regulatory networks various kinds of datahave been used such as gene expression profiles (particularlymRNA expression profiles) CHromatin ImmunoPrecipita-tion (ChIP)-chip data for transcription binding information
DNA-protein interaction data and protein-protein interac-tion data [1ndash3] However many existing studies have focusedon the use of gene expression profiles because expressiondata from a large number of genes can be simultaneouslyobserved due to developments in DNA microarray technol-ogy [1ndash3] Various mathematical models and computationalmethods have been applied andor developed to infer generegulatory networks from gene expression profiles whichinclude Boolean networks [4 5] Bayesian networks [6 7]dynamic Bayesian networks [8] differential equations [9 10]and graphical Gaussian models [11] In Boolean networksthe state of each gene is simplified into 0 or 1 and the generegulation rules are given as Boolean functions where 0 and1 mean that a gene is active (in high expression) and inactive(in low expression) respectively In the most widely usedBoolean network model it is assumed that the states of geneschange synchronously according to discrete time steps InBayesian networks the states of genes are usually classifiedinto discrete values and the gene regulation rules are given
Hindawi Publishing CorporationBioMed Research InternationalVolume 2014 Article ID 684014 13 pageshttpdxdoiorg1011552014684014
2 BioMed Research International
in the form of conditional probabilities Although standardBayesian networks can only handle static data and acyclicnetworks dynamic Bayesian networks can handle time seriesdata and cyclic networks In differential equation models thedynamics of gene expression are represented by a set of linearor nonlinear equations (one equation per gene) In graphicalGaussianmodels partial correlations are used as ameasure ofindependence of any two genes by which direct interactionsare distinguished from indirect interactions For details ofthese models and methods see reviewcomparison papers[1ndash3]
These network models assume that the topology ofthe network does not change through time whereas thereal gene regulatory network in the cell might dynamicallychange its structure depending on time the effects of certainshocks and so forth Therefore many reverse engineeringtools have recently been proposed which can reconstructtime-varying biological networks based on time-series geneexpression data Yoshida et al [12] developed a dynamiclinear model with Markov switching that represents changepoints in regimes that evolve according to a first-orderMarkov process Fujita et al [13] proposed a method basedon the dynamic autoregressivemodelThismodel extends thevector autoregression (VAR) model which can be applied tothe inference of nonlinear time-dependent biological corre-lations such as dynamic gene regulatory networks Robinsonand Hartemink [14] proposed a model called a nonstationarydynamic Bayesian network based on dynamic Bayesiannetworks which allows inference from data generated bynonstationary processes in a time-dependent manner Lebreet al [15] also introduced the autoregressive time-varying(ARTIVA) algorithm for the analysis of time-varying networktopologies from time course data which is generated fromdifferent processes This model adopts a combination ofreversible jump Markov chain Monte Carlo (RJMCMC) anddynamic Bayesian networks (DBN) in which RJMCMC isused for the identification of change time points and theresulting networks and DBN is used to represent causalinteractions among genes Thorne and Stumpf [8] presenteda method to model the regulatory network structure betweendistinct segments with a set of hidden states by applyingthe hierarchical Dirichlet process hidden Markov model[16] including a potentially infinite number of states and aBayesian networkmodel for estimating relationships betweengenes Rassol and Bouaynaya [17] presented a new methodbased on constrained and smoothed Kalman filtering whichis capable of estimating time-varying networks from time-series data including unobserved and noisy measurementsThe dynamics of genetic modules are represented as a linear-state space equation and the observability of linear time-varying systems is defined by imposing sparse constraintsin Kalman filters Ahmed et al [18] proposed an algorithmcalled Tesla with machine learning which can be cast in theformof a convex optimization problemThebasic assumptionin this method is that networks at close time points do nothave significant topological differences but have commonedges with high probability in contrast networks at distanttime points are markedly different The regulatory networks
are represented by Markov random fields at arbitrary timeintervals
As mentioned above there have been many studiesand attempts to analyze both time-independent and time-dependent networks from time-series expression data how-ever gene regulatory systems in living organisms are socomplicated that any mathematical model has limitationsand there is not yet a standard or established method forinference even for time-independent networks One of thepossible reasons is that there exists an insufficient number ofhigh-quality time-series datasets to reconstruct the dynamicbehavior of the network In other words it is difficult toreveal a correct or nearly correct network based on a smallamount of data that includes some noise Hence in ourrecent study we proposed a new approach for the analysisof time-independent networks called network completion[19 20] in which the minimum amount of modificationsare made to a given network so that the resulting networkis most consistent with the observed data Similar conceptshave been independently proposed [21ndash24] In additionnetwork completion can be applied to inference of networksby starting with the null network
In this paper we present two novel methods for thecompletion and inference of time-varying networks usingdynamic programming and least squares fitting (DPLSQ)DPLSQ-TV (DPLSQ-TV was presented in a preliminary ver-sion of this paper [25] however in this paper more detailedcomputational experiments are performed andDPLSQ-HS isnewly introduced) and DPLSQ-HS where TV and HS standfor time varying and heuristics DPLSQ-TV is an extension ofDPLSQ [20] such that it can identify the time points at whichthe structure of the gene regulatory network changes Sincethe additions and deletions of edges are basic modificationsin network completion we need to extend DPLSQ so thatthese operations can be performed at several time points InDPLSQ-TV these edges and time points are identified by anovel double dynamic programming method in which theinner loop is used to identify static network structures and theouter loop is used to determine change points It is to be notedthat a single dynamic programming (DP)methodwas used inour previous work on the completion and inference of time-independent networks [20] whereas a double DP method isemployed here in order to cope with time-varying networksOur proposed methods also allow us to find an optimalsolution in polynomial time if the maximum indegree (iethe maximum number of input genes to a gene) is boundedby a constant Although DPLSQ-TV is guaranteed to findan optimal solution in polynomial time the degree of thepolynomial is not low which prevents themethod frombeingapplied to the completion of large networks Therefore wefurther propose a heuristic method called DPLSQ-HS tospeed up the calculation of the minimum least squares errorby applying restriction constraints that limit the number ofcombinations of incoming nodes
We evaluate the efficiency of our methods through com-putational experiments using synthetic data and microarraygene expression data from the life cycle of D melanogasterand the cell cycle of S cerevisiae We also demonstrate the
BioMed Research International 3
Expression
Gene1
Gene2
Gene3
Time1 2 3 4 5 6 7 8 9 10 11
V1
V2
V3V1
V2
V3V1
V2
V3
Inference
E0 E1 E2
c1 = 5 c2 = 9
12 = m
Figure 1 Inference (ie completion starting with the null network) of time varying structure of a genetic networkThis example correspondsto the case of119873 = 1 119899 = 3 119861 = 2 and119898 = 12 The change points are 119888
1= 5 and 119888
2= 9
effectiveness of the proposed methods by comparing ourresults with those of ARTIVA [15]
2 Method
In this section we present DPLSQ-TV a DP-based methodfor the completion of a time-varying networkWe assume thatthere exist119898 time points (1 2 119898) which are divided into119861 + 1 intervals [1 119888
1minus 1] [119888
1 119888
2minus 1] [119888
119861 119898]
where 119861 indicates the number of change points A differentnetwork is associated with each interval We assume that theset of genes does not change therefore only the edge setchanges according to the time interval Let 119881 = V
1 V
119899
be the set of genes Let 119864 be the initial set of directededges (ie initial set of gene regulation relationships) and let1198640 1198641 119864
119861be the sets of directed edges (ie the output)
where 119864119894denotes the edge set for the 119894th interval
Then the problem is defined as follows given an initialnetwork119866(119881 119864) consisting of 119899 genes119873 time series datasetseach of which consists of 119898 time points for 119899 genes andthe positive integers ℎ 119896 and 119861 infer 119861 change points (ie1198881 1198882 119888
119861) and complete the initial network 119866(119881 119864) by
adding 119896 edges and deleting ℎ edges in total such that thetotal least-squares error isminimizedThis results in the set ofedges 119864
0 1198641 119864
119861at the corresponding time intervals (see
Figure 1) It is to be noted that if we start with an empty set ofedges (ie 119864 = 0) the problem corresponds to the inferenceof a time-varying network
21 Model of Differential Equation and Estimation of Param-eters We assume that the dynamics of each node V
119894are
determined by the following differential equation
119889119909119894
119889119905= 119886119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895119909119894119896+ 119887119894120596 (1)
where 119909119894corresponds to the expression value of node V
119894 120596
denotes random noise and V1198941 V
119894ℎare incoming nodes to
i1 i2ih
ai2
ai1
aih
ai12
i
middot middot middot
Figure 2 Dynamics model for a node
V119894 The second and third terms of the right-hand side of the
equation represent the linear and nonlinear effects to nodeV119894 respectively (see Figure 2) where a positive value for 119886119894
119895
or 119886119894
119895119896corresponds to an activation effect and a negative
value for 119886119894119895or 119886119894119895119896
corresponds to an inhibition effect Thismodel is an extension of the linear differential equationmodel [3] It is also a variant of the recurrent neural networkmodel [27] although the sigmoid function is replaced hereby an identify function and nonlinear terms representingcooperating regulations are added instead
In practice we replace the derivative of (1) by thedifference and ignore the noise term as follows
119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895(119905) + sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905))
(2)
where Δ119905 denotes the unit time This kind of discretizationis also employed for linear and recurrent neural networkmodels [3 27]
4 BioMed Research International
In our previous method DPLSQ [20] we assume thattime series data 119910
119894(119905)s which correspond to 119909
119894(119905)s in (2) are
given for 119905 = 0 1 119898 where we distinguish an observedexpression value 119910
119894(119905) from an expression value 119909
119894(119905) in
the mathematical model equation (2) Then the parameters119886119894
119895s and 119886
119894
119895119896s are estimated from these time series data by
minimizing the following objective function (ie the sum ofthe least squares errors) for each node V
119894
119898
sum
119905=1
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
119910119894(119905 + 1)
minus[
[
119910119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119910119894119895(119905) +sum
119895lt119896
119886119894
119895119896119910119894119895(119905) 119910119894119896(119905))]
]
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
2
(3)
It should be noted that 119910119894(119905) is the observed expression
value of gene V119894at time 119905 and V
1198941 V1198942 V
119894ℎare tentative
incoming nodes to node V119894 Incoming nodes to each node
are determined so that the sum of these values for all nodesis minimized under the constraint that the total numberof edges is equal to the specified number In order tominimize the sum of least squares errors for all genes alongwith determining the incoming nodes and correspondingparameters DP is applied Readers are referred to [20] for thedetails of DPLSQ
22 Completion by Addition of Edges In this subsection wepresent our proposed method for network completion oftime-varying networks by the addition of edges and extendthis to a general case (ie network completion by the additionand deletion of edges) in the following subsection Forsimplicity we assume 119873 = 1 where we can extend themethod to the case of 119873 gt 1 by changing the definitionof 119878119894V1198951 V1198952 V119895119889
[119901 119902] onlyWe assume that the set of nodes (ie the set of genes) 119881
and the set of initial edges 119864 are given Let the current setof incoming nodes to V
119894be V1198941 V
119894119889 We define the least
squares error for V119894during the time period between 119901 and 119902
as119878119894
V1198941 V1198942 V119894119889 [119901 119902]
= min1198861198940 119886119894119895119886119894119895119896
119902minus1
sum
119905=119901
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
119910119894(119905 + 1)
minus [
[
119910119894(119905) + Δ119905
times (119886119894
0+
119889
sum
119895=1
119886119894
119895119910119894119895(119905) +sum
119895lt119896
119886119894
119895119896119910119894119895(119905) 119910119894119896(119905))]
]
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
2
(4)
where 119910119894(119905) denotes the observed expression value of gene V
119894
at time 119905The parameters (ie 1198861198940 119886119894119895 119886119894119895119896) needed to attain this
minimum value can be computed by a standard least squaresfitting method
Because network completion is considered to involve theaddition of edges let 119890minus(V
119894) = V
1198951 V
119895119889 be the set of initial
incoming nodes to V119894 Let 120590+
119896119895 119895[119901 119902] denote the minimum
least squares error when adding 119896119895edges to the 119895th node
during the time from 119901 to 119902 which is formally defined as
120590+
119896119895 119895[119901 119902] = min
1198951 1198952119895119896119895
119878119895
119890minus(V119895)cupV1198951 V1198952 V119895119896119895
[119901 119902] (5)
where each V119895119897must be selected from119881minus V
119895minus 119890minus(V119895) In order
to avoid combinatorial explosion we constrain themaximum119896119895to be a small constant 119870 and let 120590+
119896119895 119895[119901 119902] = +infin for
119896119895gt 119870 or 119896
119895+ |119890minus(V119895)| ge 119899
Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
(6)
where 1198880= 1 and 119888
119861+1minus 1 = 119898
Here we define119863+[119896 119894 119901 119902] as
119863+[119896 119894 119901 119902] = min
1198961+1198962+sdotsdotsdot+119896119894=119896
119894
sum
119895=1
120590+
119896119895 119895[119901 119902]
(7)
The entries of 119863+[119896 119895 119901 119902] can be computed by the
following DP algorithm
119863+[119896 1 119901 119902] = 120590
+
1198961[119901 119902]
119863+[119896 119895 + 1 119901 119902] = min
1198961015840+11989610158401015840=119896
119863+[1198961015840 119895] + 120590
+
11989610158401015840119895+1
[119901 119902]
(8)
It is to be noted that119863+[119896 119899 119901 119902] is determined uniquelyregardless of the ordering of nodes in the network Thecorrectness of this DP algorithm can be seen as follows
min1198961+1198962+sdotsdotsdot+119896119899=119896
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]
= min1198961015840+11989610158401015840=119896
min1198961+1198962+sdotsdotsdot+119896119899minus1=119896
1015840
119899minus1
sum
119895=1
120590+
119896119895 119895[119901 119902] + 120590
+
11989610158401015840119899[119901 119902]
= min1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 minus 1 119901 119902] + 120590
+
11989610158401015840119899[119901 119902]
(9)
Next we define 119864+[119896 119887 119902] as
119864+[119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
(10)
BioMed Research International 5
where 119888119887+1
minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm
119864+[119896 0 119902] = 119863
+[119896 119899 1 119902]
119864+[119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 119901 119902] + 119864
+[11989610158401015840 119887 minus 1 119901]
(11)
The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]
The correctness of this DP algorithm can be seen asfollows
min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
= min119888119887minus1lt119888119887
1198961015840+11989610158401015840=119896
min1198881ltsdotsdotsdotlt119888119887minus1
1198960+sdotsdotsdot+119896
119887minus1=1198961015840
119887minus1
sum
119894=0
min1198961+sdotsdotsdot+119896119899=119896
119894
times[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
+ min1198961+sdotsdotsdot+119896119899=119896
10158401015840
[
[
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]]
]
= min119901lt119902
1198961015840+11989610158401015840=119896
119864+[1198961015840 119887 minus 1 119901] + 119863
+[11989610158401015840 119899 119901 119902]
(12)
23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here
Let 120590ℎ119895 119896119895 119895
[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896
119895edges
to 119890minus(V119895) and deleting ℎ
119895edges from 119890
minus(V119895) where the added
and deleted edges must be disjointed We constrain themaximum 119896
119895and ℎ
119895to the small constants 119870 and 119867 We let
120590ℎ119895 119896119895 119895
[119901 119902] = +infin if 119896119895gt 119870 ℎ
119895gt 119867 119896
119895minusℎ119895+|119890minus(V119895)| ge 119899
or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119861=ℎ
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(13)
Here we define119863[ℎ 119896 119894 119901 119902] as
119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896
ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ
119894
sum
119895=1
120590ℎ119895 119896119895 119895
[119901 119902]
(14)
As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by
119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895
[119901 119902]
119863 [ℎ 119896 119895 + 1 119901 119902]
= min1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590
ℎ1015840101584011989610158401015840119895+1
[119901 119902]
(15)
Next we define 119864[ℎ 119896 119887 119902] as
119864 [ℎ 119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119887=ℎ
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(16)
119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm
119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]
119864 [ℎ 119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ
10158401015840 11989610158401015840 119887 minus 1 119901]
(17)
24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges
First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901
2+ 1199013) time where 119898 is the
number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
Next we analyze the time complexity required for com-puting120590
ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590
ℎ119895 119896119895 119895
is 119874(119898119899119870+1
) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590
ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)
because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the
6 BioMed Research International
size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)
Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899
2) where we assume that 119861 is a constant Since the
number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is
119874(1198983119899119870+1
+ 11989821198993) (18)
It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898
3119899119870+1
+ 11989821198993) Although this complexity is not small
it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2
3 Heuristic Method
Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33
31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V
119894 we maintain119872 combinations of 119896 incoming
nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the
set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V
1198941 V
119894119896minus1 isin 1198781
119894cup1198782
119894cupsdot sdot sdotcup 119878
119896minus1
119894
where 1198941lt 1198942lt sdot sdot sdot lt 119894
119896minus1 we calculate the least squares error
for each V119895such that 119895 gt 119894
119896minus1is the 119896th incoming node to V
119894
The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878
119896
119894
32 Algorithm The following is the description of the algo-rithm to compute 120590
+
119896119894[119901 119902] in DPLSQ-HS where 120590
+
119896119894[119901 119902]
does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31
Step 1 For each period [119901 119902] repeat Steps 2ndash6
Step 2 Let 1198780119894= 0 for all 119894 = 1 119899
Step 3 For 119894 = 1 to 119899 do Steps 4ndash7
Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870
Step 5 For each combination V1198941 V
119894119896minus1 isin 119878
1
119894cup 1198782
119894cup
sdot sdot sdot cup 119878119896minus1
119894and each node V
119895such that 119895 gt 119894
119896minus1(119895 gt 0 if
119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V
119894119896minus1 V119894) (V119895 V119894)
Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896
119894
Step 7 Let 120590+
119896119894[119901 119902] be the minimum least squares error
among these top119872 combinations
The other parts of the algorithm are the same as in DPLSQ-TV
33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges
In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
As for the time complexity of computing 120590ℎ119895 119896119895 119895
[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is
well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590
ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since
the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590
ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because
119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are
BioMed Research International 7
k = 1 k = 2 k = 3
i i i
S1i
S1i S
1iS
2i
S2i
S3i
Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote
the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V
119894119896minus1 isin 1198781
119894cup 1198782
119894cup sdot sdot sdot cup 119878
119896minus1
119894 we calculate
the least squares error for each V119895such that 119895 gt 119894
119896minus1as a 119896th incoming node to V
119894
119874(11989821198993) as described in Section 24 Hence the total time
complexity of DPLSQ-HS is
119874(1198983119899 log 119899 + 119898
21198993) (19)
If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898
3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898
3119899119870+1
)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice
4 Results
We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod
41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866
1 1198662 and 119866
3generated by randomly adding and
deleting edges from the initial network In this method foreach node V
119894with ℎ input nodes we considered the following
model119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905) + 119887
119894120596)
(20)
where 119886119894
119895s and 119886
119894
119895119896s are constants selected uniformly at
random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886
119894
119895119896s is smaller than that for
119886119894
119895s is that nonlinear terms are not considered as strong as
linear terms It should also be noted that 119887119894120596 is a stochastic
term where 119887119894is a constant (we used 119887
119894= 005) and 120596 is
random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910
119894(119905) we used
119910119894(119905) = 119909
119894(119905) + 119900
119894120598 (21)
where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]
As for the time series data we generated an originaldataset with 30 time points including two change points 119888
1=
10 1198882= 20 which is generated by merging three datasets for
1198661 1198662 and 119866
3 Since the use of time series data beginning
from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896
0edges and deleting ℎ
0edges per node resulting in
1198661 1198662 and 119866
3 additionally we also conducted DPLSQ-HS
experiments in which the initial network 119866 was modified byrandomly adding 119896
0edges per node using the default values
of 1198960= ℎ0= 1 We evaluated the performance of this method
by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and
8 BioMed Research International
MMP-3
WNT5A
HADHB
RET-1
S100P
Synuclein
Pirin
MART-1
RHO-C
STC2
Figure 4 Structure of WNT5A network [26]
Table 1 Result on completion with synthetic data
(a) Using DPLSQ-TV
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 160Accuracy 0830 0685 0554 0433
CPU time (sec) 53974 58574 50499 61189
119899 = 20
Time points error 000 000 000 210Accuracy 0781 0637 0541 0299
CPU time (sec) 75898 85391 142903 124400
119899 = 30
Time points error 000 000 035 400Accuracy 0539 0524 0418 0310
CPU time (sec) 264190 244480 234377 276081
119899 = 40
Time points error 000 000 035 350Accuracy 0585 0498 0458 0241
CPU time (sec) 342065 333398 317420 337274
(b) Using DPLSQ-HS
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 685Accuracy 0627 0600 0533 0307
CPU time (sec) 32783 32030 35594 30765
119899 = 20
Time points error 000 000 000 870Accuracy 0488 0473 0413 0153
CPU time (sec) 52129 51348 56723 44699
119899 = 30
time points error 000 000 425 880Accuracy 0469 0386 0286 0097
CPU time (sec) 76852 86844 76313 74653
119899 = 40
Time points error 000 000 085 865Accuracy 0510 0413 0308 0093
CPU time (sec) 76360 98398 97657 102813
119899 = 60
Time points error 000 000 000 075Accuracy 0411 0375 0382 0355
CPU time (sec) 371635 395596 372192 391110
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
2 BioMed Research International
in the form of conditional probabilities Although standardBayesian networks can only handle static data and acyclicnetworks dynamic Bayesian networks can handle time seriesdata and cyclic networks In differential equation models thedynamics of gene expression are represented by a set of linearor nonlinear equations (one equation per gene) In graphicalGaussianmodels partial correlations are used as ameasure ofindependence of any two genes by which direct interactionsare distinguished from indirect interactions For details ofthese models and methods see reviewcomparison papers[1ndash3]
These network models assume that the topology ofthe network does not change through time whereas thereal gene regulatory network in the cell might dynamicallychange its structure depending on time the effects of certainshocks and so forth Therefore many reverse engineeringtools have recently been proposed which can reconstructtime-varying biological networks based on time-series geneexpression data Yoshida et al [12] developed a dynamiclinear model with Markov switching that represents changepoints in regimes that evolve according to a first-orderMarkov process Fujita et al [13] proposed a method basedon the dynamic autoregressivemodelThismodel extends thevector autoregression (VAR) model which can be applied tothe inference of nonlinear time-dependent biological corre-lations such as dynamic gene regulatory networks Robinsonand Hartemink [14] proposed a model called a nonstationarydynamic Bayesian network based on dynamic Bayesiannetworks which allows inference from data generated bynonstationary processes in a time-dependent manner Lebreet al [15] also introduced the autoregressive time-varying(ARTIVA) algorithm for the analysis of time-varying networktopologies from time course data which is generated fromdifferent processes This model adopts a combination ofreversible jump Markov chain Monte Carlo (RJMCMC) anddynamic Bayesian networks (DBN) in which RJMCMC isused for the identification of change time points and theresulting networks and DBN is used to represent causalinteractions among genes Thorne and Stumpf [8] presenteda method to model the regulatory network structure betweendistinct segments with a set of hidden states by applyingthe hierarchical Dirichlet process hidden Markov model[16] including a potentially infinite number of states and aBayesian networkmodel for estimating relationships betweengenes Rassol and Bouaynaya [17] presented a new methodbased on constrained and smoothed Kalman filtering whichis capable of estimating time-varying networks from time-series data including unobserved and noisy measurementsThe dynamics of genetic modules are represented as a linear-state space equation and the observability of linear time-varying systems is defined by imposing sparse constraintsin Kalman filters Ahmed et al [18] proposed an algorithmcalled Tesla with machine learning which can be cast in theformof a convex optimization problemThebasic assumptionin this method is that networks at close time points do nothave significant topological differences but have commonedges with high probability in contrast networks at distanttime points are markedly different The regulatory networks
are represented by Markov random fields at arbitrary timeintervals
As mentioned above there have been many studiesand attempts to analyze both time-independent and time-dependent networks from time-series expression data how-ever gene regulatory systems in living organisms are socomplicated that any mathematical model has limitationsand there is not yet a standard or established method forinference even for time-independent networks One of thepossible reasons is that there exists an insufficient number ofhigh-quality time-series datasets to reconstruct the dynamicbehavior of the network In other words it is difficult toreveal a correct or nearly correct network based on a smallamount of data that includes some noise Hence in ourrecent study we proposed a new approach for the analysisof time-independent networks called network completion[19 20] in which the minimum amount of modificationsare made to a given network so that the resulting networkis most consistent with the observed data Similar conceptshave been independently proposed [21ndash24] In additionnetwork completion can be applied to inference of networksby starting with the null network
In this paper we present two novel methods for thecompletion and inference of time-varying networks usingdynamic programming and least squares fitting (DPLSQ)DPLSQ-TV (DPLSQ-TV was presented in a preliminary ver-sion of this paper [25] however in this paper more detailedcomputational experiments are performed andDPLSQ-HS isnewly introduced) and DPLSQ-HS where TV and HS standfor time varying and heuristics DPLSQ-TV is an extension ofDPLSQ [20] such that it can identify the time points at whichthe structure of the gene regulatory network changes Sincethe additions and deletions of edges are basic modificationsin network completion we need to extend DPLSQ so thatthese operations can be performed at several time points InDPLSQ-TV these edges and time points are identified by anovel double dynamic programming method in which theinner loop is used to identify static network structures and theouter loop is used to determine change points It is to be notedthat a single dynamic programming (DP)methodwas used inour previous work on the completion and inference of time-independent networks [20] whereas a double DP method isemployed here in order to cope with time-varying networksOur proposed methods also allow us to find an optimalsolution in polynomial time if the maximum indegree (iethe maximum number of input genes to a gene) is boundedby a constant Although DPLSQ-TV is guaranteed to findan optimal solution in polynomial time the degree of thepolynomial is not low which prevents themethod frombeingapplied to the completion of large networks Therefore wefurther propose a heuristic method called DPLSQ-HS tospeed up the calculation of the minimum least squares errorby applying restriction constraints that limit the number ofcombinations of incoming nodes
We evaluate the efficiency of our methods through com-putational experiments using synthetic data and microarraygene expression data from the life cycle of D melanogasterand the cell cycle of S cerevisiae We also demonstrate the
BioMed Research International 3
Expression
Gene1
Gene2
Gene3
Time1 2 3 4 5 6 7 8 9 10 11
V1
V2
V3V1
V2
V3V1
V2
V3
Inference
E0 E1 E2
c1 = 5 c2 = 9
12 = m
Figure 1 Inference (ie completion starting with the null network) of time varying structure of a genetic networkThis example correspondsto the case of119873 = 1 119899 = 3 119861 = 2 and119898 = 12 The change points are 119888
1= 5 and 119888
2= 9
effectiveness of the proposed methods by comparing ourresults with those of ARTIVA [15]
2 Method
In this section we present DPLSQ-TV a DP-based methodfor the completion of a time-varying networkWe assume thatthere exist119898 time points (1 2 119898) which are divided into119861 + 1 intervals [1 119888
1minus 1] [119888
1 119888
2minus 1] [119888
119861 119898]
where 119861 indicates the number of change points A differentnetwork is associated with each interval We assume that theset of genes does not change therefore only the edge setchanges according to the time interval Let 119881 = V
1 V
119899
be the set of genes Let 119864 be the initial set of directededges (ie initial set of gene regulation relationships) and let1198640 1198641 119864
119861be the sets of directed edges (ie the output)
where 119864119894denotes the edge set for the 119894th interval
Then the problem is defined as follows given an initialnetwork119866(119881 119864) consisting of 119899 genes119873 time series datasetseach of which consists of 119898 time points for 119899 genes andthe positive integers ℎ 119896 and 119861 infer 119861 change points (ie1198881 1198882 119888
119861) and complete the initial network 119866(119881 119864) by
adding 119896 edges and deleting ℎ edges in total such that thetotal least-squares error isminimizedThis results in the set ofedges 119864
0 1198641 119864
119861at the corresponding time intervals (see
Figure 1) It is to be noted that if we start with an empty set ofedges (ie 119864 = 0) the problem corresponds to the inferenceof a time-varying network
21 Model of Differential Equation and Estimation of Param-eters We assume that the dynamics of each node V
119894are
determined by the following differential equation
119889119909119894
119889119905= 119886119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895119909119894119896+ 119887119894120596 (1)
where 119909119894corresponds to the expression value of node V
119894 120596
denotes random noise and V1198941 V
119894ℎare incoming nodes to
i1 i2ih
ai2
ai1
aih
ai12
i
middot middot middot
Figure 2 Dynamics model for a node
V119894 The second and third terms of the right-hand side of the
equation represent the linear and nonlinear effects to nodeV119894 respectively (see Figure 2) where a positive value for 119886119894
119895
or 119886119894
119895119896corresponds to an activation effect and a negative
value for 119886119894119895or 119886119894119895119896
corresponds to an inhibition effect Thismodel is an extension of the linear differential equationmodel [3] It is also a variant of the recurrent neural networkmodel [27] although the sigmoid function is replaced hereby an identify function and nonlinear terms representingcooperating regulations are added instead
In practice we replace the derivative of (1) by thedifference and ignore the noise term as follows
119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895(119905) + sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905))
(2)
where Δ119905 denotes the unit time This kind of discretizationis also employed for linear and recurrent neural networkmodels [3 27]
4 BioMed Research International
In our previous method DPLSQ [20] we assume thattime series data 119910
119894(119905)s which correspond to 119909
119894(119905)s in (2) are
given for 119905 = 0 1 119898 where we distinguish an observedexpression value 119910
119894(119905) from an expression value 119909
119894(119905) in
the mathematical model equation (2) Then the parameters119886119894
119895s and 119886
119894
119895119896s are estimated from these time series data by
minimizing the following objective function (ie the sum ofthe least squares errors) for each node V
119894
119898
sum
119905=1
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
119910119894(119905 + 1)
minus[
[
119910119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119910119894119895(119905) +sum
119895lt119896
119886119894
119895119896119910119894119895(119905) 119910119894119896(119905))]
]
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
2
(3)
It should be noted that 119910119894(119905) is the observed expression
value of gene V119894at time 119905 and V
1198941 V1198942 V
119894ℎare tentative
incoming nodes to node V119894 Incoming nodes to each node
are determined so that the sum of these values for all nodesis minimized under the constraint that the total numberof edges is equal to the specified number In order tominimize the sum of least squares errors for all genes alongwith determining the incoming nodes and correspondingparameters DP is applied Readers are referred to [20] for thedetails of DPLSQ
22 Completion by Addition of Edges In this subsection wepresent our proposed method for network completion oftime-varying networks by the addition of edges and extendthis to a general case (ie network completion by the additionand deletion of edges) in the following subsection Forsimplicity we assume 119873 = 1 where we can extend themethod to the case of 119873 gt 1 by changing the definitionof 119878119894V1198951 V1198952 V119895119889
[119901 119902] onlyWe assume that the set of nodes (ie the set of genes) 119881
and the set of initial edges 119864 are given Let the current setof incoming nodes to V
119894be V1198941 V
119894119889 We define the least
squares error for V119894during the time period between 119901 and 119902
as119878119894
V1198941 V1198942 V119894119889 [119901 119902]
= min1198861198940 119886119894119895119886119894119895119896
119902minus1
sum
119905=119901
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
119910119894(119905 + 1)
minus [
[
119910119894(119905) + Δ119905
times (119886119894
0+
119889
sum
119895=1
119886119894
119895119910119894119895(119905) +sum
119895lt119896
119886119894
119895119896119910119894119895(119905) 119910119894119896(119905))]
]
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
2
(4)
where 119910119894(119905) denotes the observed expression value of gene V
119894
at time 119905The parameters (ie 1198861198940 119886119894119895 119886119894119895119896) needed to attain this
minimum value can be computed by a standard least squaresfitting method
Because network completion is considered to involve theaddition of edges let 119890minus(V
119894) = V
1198951 V
119895119889 be the set of initial
incoming nodes to V119894 Let 120590+
119896119895 119895[119901 119902] denote the minimum
least squares error when adding 119896119895edges to the 119895th node
during the time from 119901 to 119902 which is formally defined as
120590+
119896119895 119895[119901 119902] = min
1198951 1198952119895119896119895
119878119895
119890minus(V119895)cupV1198951 V1198952 V119895119896119895
[119901 119902] (5)
where each V119895119897must be selected from119881minus V
119895minus 119890minus(V119895) In order
to avoid combinatorial explosion we constrain themaximum119896119895to be a small constant 119870 and let 120590+
119896119895 119895[119901 119902] = +infin for
119896119895gt 119870 or 119896
119895+ |119890minus(V119895)| ge 119899
Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
(6)
where 1198880= 1 and 119888
119861+1minus 1 = 119898
Here we define119863+[119896 119894 119901 119902] as
119863+[119896 119894 119901 119902] = min
1198961+1198962+sdotsdotsdot+119896119894=119896
119894
sum
119895=1
120590+
119896119895 119895[119901 119902]
(7)
The entries of 119863+[119896 119895 119901 119902] can be computed by the
following DP algorithm
119863+[119896 1 119901 119902] = 120590
+
1198961[119901 119902]
119863+[119896 119895 + 1 119901 119902] = min
1198961015840+11989610158401015840=119896
119863+[1198961015840 119895] + 120590
+
11989610158401015840119895+1
[119901 119902]
(8)
It is to be noted that119863+[119896 119899 119901 119902] is determined uniquelyregardless of the ordering of nodes in the network Thecorrectness of this DP algorithm can be seen as follows
min1198961+1198962+sdotsdotsdot+119896119899=119896
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]
= min1198961015840+11989610158401015840=119896
min1198961+1198962+sdotsdotsdot+119896119899minus1=119896
1015840
119899minus1
sum
119895=1
120590+
119896119895 119895[119901 119902] + 120590
+
11989610158401015840119899[119901 119902]
= min1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 minus 1 119901 119902] + 120590
+
11989610158401015840119899[119901 119902]
(9)
Next we define 119864+[119896 119887 119902] as
119864+[119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
(10)
BioMed Research International 5
where 119888119887+1
minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm
119864+[119896 0 119902] = 119863
+[119896 119899 1 119902]
119864+[119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 119901 119902] + 119864
+[11989610158401015840 119887 minus 1 119901]
(11)
The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]
The correctness of this DP algorithm can be seen asfollows
min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
= min119888119887minus1lt119888119887
1198961015840+11989610158401015840=119896
min1198881ltsdotsdotsdotlt119888119887minus1
1198960+sdotsdotsdot+119896
119887minus1=1198961015840
119887minus1
sum
119894=0
min1198961+sdotsdotsdot+119896119899=119896
119894
times[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
+ min1198961+sdotsdotsdot+119896119899=119896
10158401015840
[
[
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]]
]
= min119901lt119902
1198961015840+11989610158401015840=119896
119864+[1198961015840 119887 minus 1 119901] + 119863
+[11989610158401015840 119899 119901 119902]
(12)
23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here
Let 120590ℎ119895 119896119895 119895
[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896
119895edges
to 119890minus(V119895) and deleting ℎ
119895edges from 119890
minus(V119895) where the added
and deleted edges must be disjointed We constrain themaximum 119896
119895and ℎ
119895to the small constants 119870 and 119867 We let
120590ℎ119895 119896119895 119895
[119901 119902] = +infin if 119896119895gt 119870 ℎ
119895gt 119867 119896
119895minusℎ119895+|119890minus(V119895)| ge 119899
or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119861=ℎ
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(13)
Here we define119863[ℎ 119896 119894 119901 119902] as
119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896
ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ
119894
sum
119895=1
120590ℎ119895 119896119895 119895
[119901 119902]
(14)
As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by
119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895
[119901 119902]
119863 [ℎ 119896 119895 + 1 119901 119902]
= min1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590
ℎ1015840101584011989610158401015840119895+1
[119901 119902]
(15)
Next we define 119864[ℎ 119896 119887 119902] as
119864 [ℎ 119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119887=ℎ
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(16)
119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm
119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]
119864 [ℎ 119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ
10158401015840 11989610158401015840 119887 minus 1 119901]
(17)
24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges
First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901
2+ 1199013) time where 119898 is the
number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
Next we analyze the time complexity required for com-puting120590
ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590
ℎ119895 119896119895 119895
is 119874(119898119899119870+1
) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590
ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)
because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the
6 BioMed Research International
size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)
Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899
2) where we assume that 119861 is a constant Since the
number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is
119874(1198983119899119870+1
+ 11989821198993) (18)
It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898
3119899119870+1
+ 11989821198993) Although this complexity is not small
it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2
3 Heuristic Method
Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33
31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V
119894 we maintain119872 combinations of 119896 incoming
nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the
set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V
1198941 V
119894119896minus1 isin 1198781
119894cup1198782
119894cupsdot sdot sdotcup 119878
119896minus1
119894
where 1198941lt 1198942lt sdot sdot sdot lt 119894
119896minus1 we calculate the least squares error
for each V119895such that 119895 gt 119894
119896minus1is the 119896th incoming node to V
119894
The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878
119896
119894
32 Algorithm The following is the description of the algo-rithm to compute 120590
+
119896119894[119901 119902] in DPLSQ-HS where 120590
+
119896119894[119901 119902]
does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31
Step 1 For each period [119901 119902] repeat Steps 2ndash6
Step 2 Let 1198780119894= 0 for all 119894 = 1 119899
Step 3 For 119894 = 1 to 119899 do Steps 4ndash7
Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870
Step 5 For each combination V1198941 V
119894119896minus1 isin 119878
1
119894cup 1198782
119894cup
sdot sdot sdot cup 119878119896minus1
119894and each node V
119895such that 119895 gt 119894
119896minus1(119895 gt 0 if
119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V
119894119896minus1 V119894) (V119895 V119894)
Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896
119894
Step 7 Let 120590+
119896119894[119901 119902] be the minimum least squares error
among these top119872 combinations
The other parts of the algorithm are the same as in DPLSQ-TV
33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges
In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
As for the time complexity of computing 120590ℎ119895 119896119895 119895
[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is
well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590
ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since
the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590
ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because
119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are
BioMed Research International 7
k = 1 k = 2 k = 3
i i i
S1i
S1i S
1iS
2i
S2i
S3i
Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote
the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V
119894119896minus1 isin 1198781
119894cup 1198782
119894cup sdot sdot sdot cup 119878
119896minus1
119894 we calculate
the least squares error for each V119895such that 119895 gt 119894
119896minus1as a 119896th incoming node to V
119894
119874(11989821198993) as described in Section 24 Hence the total time
complexity of DPLSQ-HS is
119874(1198983119899 log 119899 + 119898
21198993) (19)
If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898
3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898
3119899119870+1
)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice
4 Results
We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod
41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866
1 1198662 and 119866
3generated by randomly adding and
deleting edges from the initial network In this method foreach node V
119894with ℎ input nodes we considered the following
model119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905) + 119887
119894120596)
(20)
where 119886119894
119895s and 119886
119894
119895119896s are constants selected uniformly at
random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886
119894
119895119896s is smaller than that for
119886119894
119895s is that nonlinear terms are not considered as strong as
linear terms It should also be noted that 119887119894120596 is a stochastic
term where 119887119894is a constant (we used 119887
119894= 005) and 120596 is
random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910
119894(119905) we used
119910119894(119905) = 119909
119894(119905) + 119900
119894120598 (21)
where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]
As for the time series data we generated an originaldataset with 30 time points including two change points 119888
1=
10 1198882= 20 which is generated by merging three datasets for
1198661 1198662 and 119866
3 Since the use of time series data beginning
from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896
0edges and deleting ℎ
0edges per node resulting in
1198661 1198662 and 119866
3 additionally we also conducted DPLSQ-HS
experiments in which the initial network 119866 was modified byrandomly adding 119896
0edges per node using the default values
of 1198960= ℎ0= 1 We evaluated the performance of this method
by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and
8 BioMed Research International
MMP-3
WNT5A
HADHB
RET-1
S100P
Synuclein
Pirin
MART-1
RHO-C
STC2
Figure 4 Structure of WNT5A network [26]
Table 1 Result on completion with synthetic data
(a) Using DPLSQ-TV
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 160Accuracy 0830 0685 0554 0433
CPU time (sec) 53974 58574 50499 61189
119899 = 20
Time points error 000 000 000 210Accuracy 0781 0637 0541 0299
CPU time (sec) 75898 85391 142903 124400
119899 = 30
Time points error 000 000 035 400Accuracy 0539 0524 0418 0310
CPU time (sec) 264190 244480 234377 276081
119899 = 40
Time points error 000 000 035 350Accuracy 0585 0498 0458 0241
CPU time (sec) 342065 333398 317420 337274
(b) Using DPLSQ-HS
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 685Accuracy 0627 0600 0533 0307
CPU time (sec) 32783 32030 35594 30765
119899 = 20
Time points error 000 000 000 870Accuracy 0488 0473 0413 0153
CPU time (sec) 52129 51348 56723 44699
119899 = 30
time points error 000 000 425 880Accuracy 0469 0386 0286 0097
CPU time (sec) 76852 86844 76313 74653
119899 = 40
Time points error 000 000 085 865Accuracy 0510 0413 0308 0093
CPU time (sec) 76360 98398 97657 102813
119899 = 60
Time points error 000 000 000 075Accuracy 0411 0375 0382 0355
CPU time (sec) 371635 395596 372192 391110
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 3
Expression
Gene1
Gene2
Gene3
Time1 2 3 4 5 6 7 8 9 10 11
V1
V2
V3V1
V2
V3V1
V2
V3
Inference
E0 E1 E2
c1 = 5 c2 = 9
12 = m
Figure 1 Inference (ie completion starting with the null network) of time varying structure of a genetic networkThis example correspondsto the case of119873 = 1 119899 = 3 119861 = 2 and119898 = 12 The change points are 119888
1= 5 and 119888
2= 9
effectiveness of the proposed methods by comparing ourresults with those of ARTIVA [15]
2 Method
In this section we present DPLSQ-TV a DP-based methodfor the completion of a time-varying networkWe assume thatthere exist119898 time points (1 2 119898) which are divided into119861 + 1 intervals [1 119888
1minus 1] [119888
1 119888
2minus 1] [119888
119861 119898]
where 119861 indicates the number of change points A differentnetwork is associated with each interval We assume that theset of genes does not change therefore only the edge setchanges according to the time interval Let 119881 = V
1 V
119899
be the set of genes Let 119864 be the initial set of directededges (ie initial set of gene regulation relationships) and let1198640 1198641 119864
119861be the sets of directed edges (ie the output)
where 119864119894denotes the edge set for the 119894th interval
Then the problem is defined as follows given an initialnetwork119866(119881 119864) consisting of 119899 genes119873 time series datasetseach of which consists of 119898 time points for 119899 genes andthe positive integers ℎ 119896 and 119861 infer 119861 change points (ie1198881 1198882 119888
119861) and complete the initial network 119866(119881 119864) by
adding 119896 edges and deleting ℎ edges in total such that thetotal least-squares error isminimizedThis results in the set ofedges 119864
0 1198641 119864
119861at the corresponding time intervals (see
Figure 1) It is to be noted that if we start with an empty set ofedges (ie 119864 = 0) the problem corresponds to the inferenceof a time-varying network
21 Model of Differential Equation and Estimation of Param-eters We assume that the dynamics of each node V
119894are
determined by the following differential equation
119889119909119894
119889119905= 119886119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895119909119894119896+ 119887119894120596 (1)
where 119909119894corresponds to the expression value of node V
119894 120596
denotes random noise and V1198941 V
119894ℎare incoming nodes to
i1 i2ih
ai2
ai1
aih
ai12
i
middot middot middot
Figure 2 Dynamics model for a node
V119894 The second and third terms of the right-hand side of the
equation represent the linear and nonlinear effects to nodeV119894 respectively (see Figure 2) where a positive value for 119886119894
119895
or 119886119894
119895119896corresponds to an activation effect and a negative
value for 119886119894119895or 119886119894119895119896
corresponds to an inhibition effect Thismodel is an extension of the linear differential equationmodel [3] It is also a variant of the recurrent neural networkmodel [27] although the sigmoid function is replaced hereby an identify function and nonlinear terms representingcooperating regulations are added instead
In practice we replace the derivative of (1) by thedifference and ignore the noise term as follows
119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895(119905) + sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905))
(2)
where Δ119905 denotes the unit time This kind of discretizationis also employed for linear and recurrent neural networkmodels [3 27]
4 BioMed Research International
In our previous method DPLSQ [20] we assume thattime series data 119910
119894(119905)s which correspond to 119909
119894(119905)s in (2) are
given for 119905 = 0 1 119898 where we distinguish an observedexpression value 119910
119894(119905) from an expression value 119909
119894(119905) in
the mathematical model equation (2) Then the parameters119886119894
119895s and 119886
119894
119895119896s are estimated from these time series data by
minimizing the following objective function (ie the sum ofthe least squares errors) for each node V
119894
119898
sum
119905=1
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
119910119894(119905 + 1)
minus[
[
119910119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119910119894119895(119905) +sum
119895lt119896
119886119894
119895119896119910119894119895(119905) 119910119894119896(119905))]
]
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
2
(3)
It should be noted that 119910119894(119905) is the observed expression
value of gene V119894at time 119905 and V
1198941 V1198942 V
119894ℎare tentative
incoming nodes to node V119894 Incoming nodes to each node
are determined so that the sum of these values for all nodesis minimized under the constraint that the total numberof edges is equal to the specified number In order tominimize the sum of least squares errors for all genes alongwith determining the incoming nodes and correspondingparameters DP is applied Readers are referred to [20] for thedetails of DPLSQ
22 Completion by Addition of Edges In this subsection wepresent our proposed method for network completion oftime-varying networks by the addition of edges and extendthis to a general case (ie network completion by the additionand deletion of edges) in the following subsection Forsimplicity we assume 119873 = 1 where we can extend themethod to the case of 119873 gt 1 by changing the definitionof 119878119894V1198951 V1198952 V119895119889
[119901 119902] onlyWe assume that the set of nodes (ie the set of genes) 119881
and the set of initial edges 119864 are given Let the current setof incoming nodes to V
119894be V1198941 V
119894119889 We define the least
squares error for V119894during the time period between 119901 and 119902
as119878119894
V1198941 V1198942 V119894119889 [119901 119902]
= min1198861198940 119886119894119895119886119894119895119896
119902minus1
sum
119905=119901
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
119910119894(119905 + 1)
minus [
[
119910119894(119905) + Δ119905
times (119886119894
0+
119889
sum
119895=1
119886119894
119895119910119894119895(119905) +sum
119895lt119896
119886119894
119895119896119910119894119895(119905) 119910119894119896(119905))]
]
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
2
(4)
where 119910119894(119905) denotes the observed expression value of gene V
119894
at time 119905The parameters (ie 1198861198940 119886119894119895 119886119894119895119896) needed to attain this
minimum value can be computed by a standard least squaresfitting method
Because network completion is considered to involve theaddition of edges let 119890minus(V
119894) = V
1198951 V
119895119889 be the set of initial
incoming nodes to V119894 Let 120590+
119896119895 119895[119901 119902] denote the minimum
least squares error when adding 119896119895edges to the 119895th node
during the time from 119901 to 119902 which is formally defined as
120590+
119896119895 119895[119901 119902] = min
1198951 1198952119895119896119895
119878119895
119890minus(V119895)cupV1198951 V1198952 V119895119896119895
[119901 119902] (5)
where each V119895119897must be selected from119881minus V
119895minus 119890minus(V119895) In order
to avoid combinatorial explosion we constrain themaximum119896119895to be a small constant 119870 and let 120590+
119896119895 119895[119901 119902] = +infin for
119896119895gt 119870 or 119896
119895+ |119890minus(V119895)| ge 119899
Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
(6)
where 1198880= 1 and 119888
119861+1minus 1 = 119898
Here we define119863+[119896 119894 119901 119902] as
119863+[119896 119894 119901 119902] = min
1198961+1198962+sdotsdotsdot+119896119894=119896
119894
sum
119895=1
120590+
119896119895 119895[119901 119902]
(7)
The entries of 119863+[119896 119895 119901 119902] can be computed by the
following DP algorithm
119863+[119896 1 119901 119902] = 120590
+
1198961[119901 119902]
119863+[119896 119895 + 1 119901 119902] = min
1198961015840+11989610158401015840=119896
119863+[1198961015840 119895] + 120590
+
11989610158401015840119895+1
[119901 119902]
(8)
It is to be noted that119863+[119896 119899 119901 119902] is determined uniquelyregardless of the ordering of nodes in the network Thecorrectness of this DP algorithm can be seen as follows
min1198961+1198962+sdotsdotsdot+119896119899=119896
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]
= min1198961015840+11989610158401015840=119896
min1198961+1198962+sdotsdotsdot+119896119899minus1=119896
1015840
119899minus1
sum
119895=1
120590+
119896119895 119895[119901 119902] + 120590
+
11989610158401015840119899[119901 119902]
= min1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 minus 1 119901 119902] + 120590
+
11989610158401015840119899[119901 119902]
(9)
Next we define 119864+[119896 119887 119902] as
119864+[119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
(10)
BioMed Research International 5
where 119888119887+1
minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm
119864+[119896 0 119902] = 119863
+[119896 119899 1 119902]
119864+[119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 119901 119902] + 119864
+[11989610158401015840 119887 minus 1 119901]
(11)
The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]
The correctness of this DP algorithm can be seen asfollows
min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
= min119888119887minus1lt119888119887
1198961015840+11989610158401015840=119896
min1198881ltsdotsdotsdotlt119888119887minus1
1198960+sdotsdotsdot+119896
119887minus1=1198961015840
119887minus1
sum
119894=0
min1198961+sdotsdotsdot+119896119899=119896
119894
times[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
+ min1198961+sdotsdotsdot+119896119899=119896
10158401015840
[
[
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]]
]
= min119901lt119902
1198961015840+11989610158401015840=119896
119864+[1198961015840 119887 minus 1 119901] + 119863
+[11989610158401015840 119899 119901 119902]
(12)
23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here
Let 120590ℎ119895 119896119895 119895
[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896
119895edges
to 119890minus(V119895) and deleting ℎ
119895edges from 119890
minus(V119895) where the added
and deleted edges must be disjointed We constrain themaximum 119896
119895and ℎ
119895to the small constants 119870 and 119867 We let
120590ℎ119895 119896119895 119895
[119901 119902] = +infin if 119896119895gt 119870 ℎ
119895gt 119867 119896
119895minusℎ119895+|119890minus(V119895)| ge 119899
or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119861=ℎ
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(13)
Here we define119863[ℎ 119896 119894 119901 119902] as
119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896
ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ
119894
sum
119895=1
120590ℎ119895 119896119895 119895
[119901 119902]
(14)
As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by
119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895
[119901 119902]
119863 [ℎ 119896 119895 + 1 119901 119902]
= min1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590
ℎ1015840101584011989610158401015840119895+1
[119901 119902]
(15)
Next we define 119864[ℎ 119896 119887 119902] as
119864 [ℎ 119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119887=ℎ
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(16)
119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm
119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]
119864 [ℎ 119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ
10158401015840 11989610158401015840 119887 minus 1 119901]
(17)
24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges
First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901
2+ 1199013) time where 119898 is the
number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
Next we analyze the time complexity required for com-puting120590
ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590
ℎ119895 119896119895 119895
is 119874(119898119899119870+1
) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590
ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)
because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the
6 BioMed Research International
size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)
Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899
2) where we assume that 119861 is a constant Since the
number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is
119874(1198983119899119870+1
+ 11989821198993) (18)
It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898
3119899119870+1
+ 11989821198993) Although this complexity is not small
it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2
3 Heuristic Method
Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33
31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V
119894 we maintain119872 combinations of 119896 incoming
nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the
set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V
1198941 V
119894119896minus1 isin 1198781
119894cup1198782
119894cupsdot sdot sdotcup 119878
119896minus1
119894
where 1198941lt 1198942lt sdot sdot sdot lt 119894
119896minus1 we calculate the least squares error
for each V119895such that 119895 gt 119894
119896minus1is the 119896th incoming node to V
119894
The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878
119896
119894
32 Algorithm The following is the description of the algo-rithm to compute 120590
+
119896119894[119901 119902] in DPLSQ-HS where 120590
+
119896119894[119901 119902]
does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31
Step 1 For each period [119901 119902] repeat Steps 2ndash6
Step 2 Let 1198780119894= 0 for all 119894 = 1 119899
Step 3 For 119894 = 1 to 119899 do Steps 4ndash7
Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870
Step 5 For each combination V1198941 V
119894119896minus1 isin 119878
1
119894cup 1198782
119894cup
sdot sdot sdot cup 119878119896minus1
119894and each node V
119895such that 119895 gt 119894
119896minus1(119895 gt 0 if
119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V
119894119896minus1 V119894) (V119895 V119894)
Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896
119894
Step 7 Let 120590+
119896119894[119901 119902] be the minimum least squares error
among these top119872 combinations
The other parts of the algorithm are the same as in DPLSQ-TV
33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges
In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
As for the time complexity of computing 120590ℎ119895 119896119895 119895
[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is
well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590
ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since
the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590
ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because
119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are
BioMed Research International 7
k = 1 k = 2 k = 3
i i i
S1i
S1i S
1iS
2i
S2i
S3i
Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote
the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V
119894119896minus1 isin 1198781
119894cup 1198782
119894cup sdot sdot sdot cup 119878
119896minus1
119894 we calculate
the least squares error for each V119895such that 119895 gt 119894
119896minus1as a 119896th incoming node to V
119894
119874(11989821198993) as described in Section 24 Hence the total time
complexity of DPLSQ-HS is
119874(1198983119899 log 119899 + 119898
21198993) (19)
If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898
3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898
3119899119870+1
)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice
4 Results
We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod
41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866
1 1198662 and 119866
3generated by randomly adding and
deleting edges from the initial network In this method foreach node V
119894with ℎ input nodes we considered the following
model119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905) + 119887
119894120596)
(20)
where 119886119894
119895s and 119886
119894
119895119896s are constants selected uniformly at
random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886
119894
119895119896s is smaller than that for
119886119894
119895s is that nonlinear terms are not considered as strong as
linear terms It should also be noted that 119887119894120596 is a stochastic
term where 119887119894is a constant (we used 119887
119894= 005) and 120596 is
random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910
119894(119905) we used
119910119894(119905) = 119909
119894(119905) + 119900
119894120598 (21)
where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]
As for the time series data we generated an originaldataset with 30 time points including two change points 119888
1=
10 1198882= 20 which is generated by merging three datasets for
1198661 1198662 and 119866
3 Since the use of time series data beginning
from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896
0edges and deleting ℎ
0edges per node resulting in
1198661 1198662 and 119866
3 additionally we also conducted DPLSQ-HS
experiments in which the initial network 119866 was modified byrandomly adding 119896
0edges per node using the default values
of 1198960= ℎ0= 1 We evaluated the performance of this method
by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and
8 BioMed Research International
MMP-3
WNT5A
HADHB
RET-1
S100P
Synuclein
Pirin
MART-1
RHO-C
STC2
Figure 4 Structure of WNT5A network [26]
Table 1 Result on completion with synthetic data
(a) Using DPLSQ-TV
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 160Accuracy 0830 0685 0554 0433
CPU time (sec) 53974 58574 50499 61189
119899 = 20
Time points error 000 000 000 210Accuracy 0781 0637 0541 0299
CPU time (sec) 75898 85391 142903 124400
119899 = 30
Time points error 000 000 035 400Accuracy 0539 0524 0418 0310
CPU time (sec) 264190 244480 234377 276081
119899 = 40
Time points error 000 000 035 350Accuracy 0585 0498 0458 0241
CPU time (sec) 342065 333398 317420 337274
(b) Using DPLSQ-HS
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 685Accuracy 0627 0600 0533 0307
CPU time (sec) 32783 32030 35594 30765
119899 = 20
Time points error 000 000 000 870Accuracy 0488 0473 0413 0153
CPU time (sec) 52129 51348 56723 44699
119899 = 30
time points error 000 000 425 880Accuracy 0469 0386 0286 0097
CPU time (sec) 76852 86844 76313 74653
119899 = 40
Time points error 000 000 085 865Accuracy 0510 0413 0308 0093
CPU time (sec) 76360 98398 97657 102813
119899 = 60
Time points error 000 000 000 075Accuracy 0411 0375 0382 0355
CPU time (sec) 371635 395596 372192 391110
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
4 BioMed Research International
In our previous method DPLSQ [20] we assume thattime series data 119910
119894(119905)s which correspond to 119909
119894(119905)s in (2) are
given for 119905 = 0 1 119898 where we distinguish an observedexpression value 119910
119894(119905) from an expression value 119909
119894(119905) in
the mathematical model equation (2) Then the parameters119886119894
119895s and 119886
119894
119895119896s are estimated from these time series data by
minimizing the following objective function (ie the sum ofthe least squares errors) for each node V
119894
119898
sum
119905=1
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
119910119894(119905 + 1)
minus[
[
119910119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119910119894119895(119905) +sum
119895lt119896
119886119894
119895119896119910119894119895(119905) 119910119894119896(119905))]
]
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
2
(3)
It should be noted that 119910119894(119905) is the observed expression
value of gene V119894at time 119905 and V
1198941 V1198942 V
119894ℎare tentative
incoming nodes to node V119894 Incoming nodes to each node
are determined so that the sum of these values for all nodesis minimized under the constraint that the total numberof edges is equal to the specified number In order tominimize the sum of least squares errors for all genes alongwith determining the incoming nodes and correspondingparameters DP is applied Readers are referred to [20] for thedetails of DPLSQ
22 Completion by Addition of Edges In this subsection wepresent our proposed method for network completion oftime-varying networks by the addition of edges and extendthis to a general case (ie network completion by the additionand deletion of edges) in the following subsection Forsimplicity we assume 119873 = 1 where we can extend themethod to the case of 119873 gt 1 by changing the definitionof 119878119894V1198951 V1198952 V119895119889
[119901 119902] onlyWe assume that the set of nodes (ie the set of genes) 119881
and the set of initial edges 119864 are given Let the current setof incoming nodes to V
119894be V1198941 V
119894119889 We define the least
squares error for V119894during the time period between 119901 and 119902
as119878119894
V1198941 V1198942 V119894119889 [119901 119902]
= min1198861198940 119886119894119895119886119894119895119896
119902minus1
sum
119905=119901
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
119910119894(119905 + 1)
minus [
[
119910119894(119905) + Δ119905
times (119886119894
0+
119889
sum
119895=1
119886119894
119895119910119894119895(119905) +sum
119895lt119896
119886119894
119895119896119910119894119895(119905) 119910119894119896(119905))]
]
100381610038161003816100381610038161003816100381610038161003816100381610038161003816
2
(4)
where 119910119894(119905) denotes the observed expression value of gene V
119894
at time 119905The parameters (ie 1198861198940 119886119894119895 119886119894119895119896) needed to attain this
minimum value can be computed by a standard least squaresfitting method
Because network completion is considered to involve theaddition of edges let 119890minus(V
119894) = V
1198951 V
119895119889 be the set of initial
incoming nodes to V119894 Let 120590+
119896119895 119895[119901 119902] denote the minimum
least squares error when adding 119896119895edges to the 119895th node
during the time from 119901 to 119902 which is formally defined as
120590+
119896119895 119895[119901 119902] = min
1198951 1198952119895119896119895
119878119895
119890minus(V119895)cupV1198951 V1198952 V119895119896119895
[119901 119902] (5)
where each V119895119897must be selected from119881minus V
119895minus 119890minus(V119895) In order
to avoid combinatorial explosion we constrain themaximum119896119895to be a small constant 119870 and let 120590+
119896119895 119895[119901 119902] = +infin for
119896119895gt 119870 or 119896
119895+ |119890minus(V119895)| ge 119899
Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
(6)
where 1198880= 1 and 119888
119861+1minus 1 = 119898
Here we define119863+[119896 119894 119901 119902] as
119863+[119896 119894 119901 119902] = min
1198961+1198962+sdotsdotsdot+119896119894=119896
119894
sum
119895=1
120590+
119896119895 119895[119901 119902]
(7)
The entries of 119863+[119896 119895 119901 119902] can be computed by the
following DP algorithm
119863+[119896 1 119901 119902] = 120590
+
1198961[119901 119902]
119863+[119896 119895 + 1 119901 119902] = min
1198961015840+11989610158401015840=119896
119863+[1198961015840 119895] + 120590
+
11989610158401015840119895+1
[119901 119902]
(8)
It is to be noted that119863+[119896 119899 119901 119902] is determined uniquelyregardless of the ordering of nodes in the network Thecorrectness of this DP algorithm can be seen as follows
min1198961+1198962+sdotsdotsdot+119896119899=119896
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]
= min1198961015840+11989610158401015840=119896
min1198961+1198962+sdotsdotsdot+119896119899minus1=119896
1015840
119899minus1
sum
119895=1
120590+
119896119895 119895[119901 119902] + 120590
+
11989610158401015840119899[119901 119902]
= min1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 minus 1 119901 119902] + 120590
+
11989610158401015840119899[119901 119902]
(9)
Next we define 119864+[119896 119887 119902] as
119864+[119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
(10)
BioMed Research International 5
where 119888119887+1
minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm
119864+[119896 0 119902] = 119863
+[119896 119899 1 119902]
119864+[119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 119901 119902] + 119864
+[11989610158401015840 119887 minus 1 119901]
(11)
The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]
The correctness of this DP algorithm can be seen asfollows
min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
= min119888119887minus1lt119888119887
1198961015840+11989610158401015840=119896
min1198881ltsdotsdotsdotlt119888119887minus1
1198960+sdotsdotsdot+119896
119887minus1=1198961015840
119887minus1
sum
119894=0
min1198961+sdotsdotsdot+119896119899=119896
119894
times[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
+ min1198961+sdotsdotsdot+119896119899=119896
10158401015840
[
[
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]]
]
= min119901lt119902
1198961015840+11989610158401015840=119896
119864+[1198961015840 119887 minus 1 119901] + 119863
+[11989610158401015840 119899 119901 119902]
(12)
23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here
Let 120590ℎ119895 119896119895 119895
[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896
119895edges
to 119890minus(V119895) and deleting ℎ
119895edges from 119890
minus(V119895) where the added
and deleted edges must be disjointed We constrain themaximum 119896
119895and ℎ
119895to the small constants 119870 and 119867 We let
120590ℎ119895 119896119895 119895
[119901 119902] = +infin if 119896119895gt 119870 ℎ
119895gt 119867 119896
119895minusℎ119895+|119890minus(V119895)| ge 119899
or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119861=ℎ
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(13)
Here we define119863[ℎ 119896 119894 119901 119902] as
119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896
ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ
119894
sum
119895=1
120590ℎ119895 119896119895 119895
[119901 119902]
(14)
As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by
119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895
[119901 119902]
119863 [ℎ 119896 119895 + 1 119901 119902]
= min1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590
ℎ1015840101584011989610158401015840119895+1
[119901 119902]
(15)
Next we define 119864[ℎ 119896 119887 119902] as
119864 [ℎ 119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119887=ℎ
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(16)
119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm
119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]
119864 [ℎ 119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ
10158401015840 11989610158401015840 119887 minus 1 119901]
(17)
24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges
First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901
2+ 1199013) time where 119898 is the
number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
Next we analyze the time complexity required for com-puting120590
ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590
ℎ119895 119896119895 119895
is 119874(119898119899119870+1
) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590
ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)
because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the
6 BioMed Research International
size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)
Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899
2) where we assume that 119861 is a constant Since the
number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is
119874(1198983119899119870+1
+ 11989821198993) (18)
It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898
3119899119870+1
+ 11989821198993) Although this complexity is not small
it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2
3 Heuristic Method
Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33
31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V
119894 we maintain119872 combinations of 119896 incoming
nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the
set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V
1198941 V
119894119896minus1 isin 1198781
119894cup1198782
119894cupsdot sdot sdotcup 119878
119896minus1
119894
where 1198941lt 1198942lt sdot sdot sdot lt 119894
119896minus1 we calculate the least squares error
for each V119895such that 119895 gt 119894
119896minus1is the 119896th incoming node to V
119894
The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878
119896
119894
32 Algorithm The following is the description of the algo-rithm to compute 120590
+
119896119894[119901 119902] in DPLSQ-HS where 120590
+
119896119894[119901 119902]
does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31
Step 1 For each period [119901 119902] repeat Steps 2ndash6
Step 2 Let 1198780119894= 0 for all 119894 = 1 119899
Step 3 For 119894 = 1 to 119899 do Steps 4ndash7
Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870
Step 5 For each combination V1198941 V
119894119896minus1 isin 119878
1
119894cup 1198782
119894cup
sdot sdot sdot cup 119878119896minus1
119894and each node V
119895such that 119895 gt 119894
119896minus1(119895 gt 0 if
119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V
119894119896minus1 V119894) (V119895 V119894)
Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896
119894
Step 7 Let 120590+
119896119894[119901 119902] be the minimum least squares error
among these top119872 combinations
The other parts of the algorithm are the same as in DPLSQ-TV
33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges
In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
As for the time complexity of computing 120590ℎ119895 119896119895 119895
[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is
well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590
ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since
the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590
ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because
119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are
BioMed Research International 7
k = 1 k = 2 k = 3
i i i
S1i
S1i S
1iS
2i
S2i
S3i
Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote
the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V
119894119896minus1 isin 1198781
119894cup 1198782
119894cup sdot sdot sdot cup 119878
119896minus1
119894 we calculate
the least squares error for each V119895such that 119895 gt 119894
119896minus1as a 119896th incoming node to V
119894
119874(11989821198993) as described in Section 24 Hence the total time
complexity of DPLSQ-HS is
119874(1198983119899 log 119899 + 119898
21198993) (19)
If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898
3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898
3119899119870+1
)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice
4 Results
We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod
41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866
1 1198662 and 119866
3generated by randomly adding and
deleting edges from the initial network In this method foreach node V
119894with ℎ input nodes we considered the following
model119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905) + 119887
119894120596)
(20)
where 119886119894
119895s and 119886
119894
119895119896s are constants selected uniformly at
random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886
119894
119895119896s is smaller than that for
119886119894
119895s is that nonlinear terms are not considered as strong as
linear terms It should also be noted that 119887119894120596 is a stochastic
term where 119887119894is a constant (we used 119887
119894= 005) and 120596 is
random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910
119894(119905) we used
119910119894(119905) = 119909
119894(119905) + 119900
119894120598 (21)
where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]
As for the time series data we generated an originaldataset with 30 time points including two change points 119888
1=
10 1198882= 20 which is generated by merging three datasets for
1198661 1198662 and 119866
3 Since the use of time series data beginning
from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896
0edges and deleting ℎ
0edges per node resulting in
1198661 1198662 and 119866
3 additionally we also conducted DPLSQ-HS
experiments in which the initial network 119866 was modified byrandomly adding 119896
0edges per node using the default values
of 1198960= ℎ0= 1 We evaluated the performance of this method
by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and
8 BioMed Research International
MMP-3
WNT5A
HADHB
RET-1
S100P
Synuclein
Pirin
MART-1
RHO-C
STC2
Figure 4 Structure of WNT5A network [26]
Table 1 Result on completion with synthetic data
(a) Using DPLSQ-TV
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 160Accuracy 0830 0685 0554 0433
CPU time (sec) 53974 58574 50499 61189
119899 = 20
Time points error 000 000 000 210Accuracy 0781 0637 0541 0299
CPU time (sec) 75898 85391 142903 124400
119899 = 30
Time points error 000 000 035 400Accuracy 0539 0524 0418 0310
CPU time (sec) 264190 244480 234377 276081
119899 = 40
Time points error 000 000 035 350Accuracy 0585 0498 0458 0241
CPU time (sec) 342065 333398 317420 337274
(b) Using DPLSQ-HS
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 685Accuracy 0627 0600 0533 0307
CPU time (sec) 32783 32030 35594 30765
119899 = 20
Time points error 000 000 000 870Accuracy 0488 0473 0413 0153
CPU time (sec) 52129 51348 56723 44699
119899 = 30
time points error 000 000 425 880Accuracy 0469 0386 0286 0097
CPU time (sec) 76852 86844 76313 74653
119899 = 40
Time points error 000 000 085 865Accuracy 0510 0413 0308 0093
CPU time (sec) 76360 98398 97657 102813
119899 = 60
Time points error 000 000 000 075Accuracy 0411 0375 0382 0355
CPU time (sec) 371635 395596 372192 391110
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 5
where 119888119887+1
minus 1 = 119902 119864+[119896 119887 119902] can be computed by thefollowing DP algorithm
119864+[119896 0 119902] = 119863
+[119896 119899 1 119902]
119864+[119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
119863+[1198961015840 119899 119901 119902] + 119864
+[11989610158401015840 119887 minus 1 119901]
(11)
The introduction of 119864+[119896 119887 119902] and the corresponding DPprocedure are themethodologically novel points of this workcompared with our previous work [20]
The correctness of this DP algorithm can be seen asfollows
min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
= min119888119887minus1lt119888119887
1198961015840+11989610158401015840=119896
min1198881ltsdotsdotsdotlt119888119887minus1
1198960+sdotsdotsdot+119896
119887minus1=1198961015840
119887minus1
sum
119894=0
min1198961+sdotsdotsdot+119896119899=119896
119894
times[
[
119899
sum
119895=1
120590+
119896119895 119895[119888119894 119888119894+1
minus 1]]
]
+ min1198961+sdotsdotsdot+119896119899=119896
10158401015840
[
[
119899
sum
119895=1
120590+
119896119895 119895[119901 119902]]
]
= min119901lt119902
1198961015840+11989610158401015840=119896
119864+[1198961015840 119887 minus 1 119901] + 119863
+[11989610158401015840 119899 119901 119902]
(12)
23 Completion by Addition and Deletion of Edges The aboveDP procedure can be modified for the deletion of edges andfor the addition and deletion of edges as inDPLSQ [20] Sincethe former case is a subcase of the latter one we describe onlythe latter one (addition and deletion of edges) here
Let 120590ℎ119895 119896119895 119895
[119901 119902] denote the minimum least squares errorfor the time period between 119901 and 119902 when adding 119896
119895edges
to 119890minus(V119895) and deleting ℎ
119895edges from 119890
minus(V119895) where the added
and deleted edges must be disjointed We constrain themaximum 119896
119895and ℎ
119895to the small constants 119870 and 119867 We let
120590ℎ119895 119896119895 119895
[119901 119902] = +infin if 119896119895gt 119870 ℎ
119895gt 119867 119896
119895minusℎ119895+|119890minus(V119895)| ge 119899
or 119896119895minus ℎ119895+ |119890minus(V119895)| lt 0 hold Then the problem is stated as
min1198881lt1198882ltsdotsdotsdotlt119888119861
1198960+1198961+sdotsdotsdot+119896
119861=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119861=ℎ
119861
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(13)
Here we define119863[ℎ 119896 119894 119901 119902] as
119863[ℎ 119896 119894 119901 119902] = min1198961+1198962+sdotsdotsdot+119896119894=119896
ℎ1+ℎ2+sdotsdotsdot+ℎ119894=ℎ
119894
sum
119895=1
120590ℎ119895 119896119895 119895
[119901 119902]
(14)
As in the previous subsection 119863[ℎ 119896 119895 119901 119902] can becomputed by
119863[ℎ 119896 1 119901 119902] = 120590ℎ119895 119896119895 119895
[119901 119902]
119863 [ℎ 119896 119895 + 1 119901 119902]
= min1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119895 119901 119902] + 120590
ℎ1015840101584011989610158401015840119895+1
[119901 119902]
(15)
Next we define 119864[ℎ 119896 119887 119902] as
119864 [ℎ 119896 119887 119902]
= min1198881lt1198882ltsdotsdotsdotlt119888119887
1198960+1198961+sdotsdotsdot+119896
119887=119896
ℎ0+ℎ1+sdotsdotsdot+ℎ
119887=ℎ
119887
sum
119894=0
min1198961+1198962+sdotsdotsdot+119896119899=119896
119894
ℎ1+ℎ2+sdotsdotsdot+ℎ119899=ℎ119894
[
[
119899
sum
119895=1
120590ℎ119895 119896119895 119895
[119888119894 119888119894+1
minus 1]]
]
(16)
119864[ℎ 119896 119887 119902] can be computed by the following DP algo-rithm
119864 [ℎ 119896 0 119902] = 119863 [ℎ 119896 119899 1 119902]
119864 [ℎ 119896 119887 119902]
= min119901lt119902
1198961015840+11989610158401015840=119896
ℎ1015840+ℎ10158401015840=ℎ
119863 [ℎ1015840 1198961015840 119899 119901 119902] + 119864 [ℎ
10158401015840 11989610158401015840 119887 minus 1 119901]
(17)
24 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-TV Since completion by theaddition of edges and completion by the deletion of edges arespecial cases of completion by the addition and deletion ofedges we focus on completion by the addition and deletionof edges
First we analyze the time complexity required per leastsquares fitting It is known that least squares fitting for a linearsystem can be done in 119874(119898119901
2+ 1199013) time where 119898 is the
number of data points and 119901 is the number of parametersIn our proposed method we assume that the maximumindegree is bounded by a constant and the numbers ofaddition and deletion edges in a given network are boundedby the constants 119870 and119867 respectively In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
Next we analyze the time complexity required for com-puting120590
ℎ119895 119896119895 119895[119901 119902]The total time required to compute120590
ℎ119895 119896119895 119895
is 119874(119898119899119870+1
) [20] where we assume that ℎ and 119896 are 119874(119899)Therefore the time complexity for120590
ℎ119895119896119895119895[119901 119902]s is119874(1198983119899119870+1)
because 119901 and 119902 are 119874(119898)Next we analyze the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s In this computation we note that the
6 BioMed Research International
size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)
Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899
2) where we assume that 119861 is a constant Since the
number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is
119874(1198983119899119870+1
+ 11989821198993) (18)
It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898
3119899119870+1
+ 11989821198993) Although this complexity is not small
it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2
3 Heuristic Method
Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33
31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V
119894 we maintain119872 combinations of 119896 incoming
nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the
set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V
1198941 V
119894119896minus1 isin 1198781
119894cup1198782
119894cupsdot sdot sdotcup 119878
119896minus1
119894
where 1198941lt 1198942lt sdot sdot sdot lt 119894
119896minus1 we calculate the least squares error
for each V119895such that 119895 gt 119894
119896minus1is the 119896th incoming node to V
119894
The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878
119896
119894
32 Algorithm The following is the description of the algo-rithm to compute 120590
+
119896119894[119901 119902] in DPLSQ-HS where 120590
+
119896119894[119901 119902]
does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31
Step 1 For each period [119901 119902] repeat Steps 2ndash6
Step 2 Let 1198780119894= 0 for all 119894 = 1 119899
Step 3 For 119894 = 1 to 119899 do Steps 4ndash7
Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870
Step 5 For each combination V1198941 V
119894119896minus1 isin 119878
1
119894cup 1198782
119894cup
sdot sdot sdot cup 119878119896minus1
119894and each node V
119895such that 119895 gt 119894
119896minus1(119895 gt 0 if
119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V
119894119896minus1 V119894) (V119895 V119894)
Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896
119894
Step 7 Let 120590+
119896119894[119901 119902] be the minimum least squares error
among these top119872 combinations
The other parts of the algorithm are the same as in DPLSQ-TV
33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges
In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
As for the time complexity of computing 120590ℎ119895 119896119895 119895
[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is
well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590
ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since
the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590
ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because
119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are
BioMed Research International 7
k = 1 k = 2 k = 3
i i i
S1i
S1i S
1iS
2i
S2i
S3i
Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote
the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V
119894119896minus1 isin 1198781
119894cup 1198782
119894cup sdot sdot sdot cup 119878
119896minus1
119894 we calculate
the least squares error for each V119895such that 119895 gt 119894
119896minus1as a 119896th incoming node to V
119894
119874(11989821198993) as described in Section 24 Hence the total time
complexity of DPLSQ-HS is
119874(1198983119899 log 119899 + 119898
21198993) (19)
If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898
3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898
3119899119870+1
)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice
4 Results
We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod
41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866
1 1198662 and 119866
3generated by randomly adding and
deleting edges from the initial network In this method foreach node V
119894with ℎ input nodes we considered the following
model119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905) + 119887
119894120596)
(20)
where 119886119894
119895s and 119886
119894
119895119896s are constants selected uniformly at
random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886
119894
119895119896s is smaller than that for
119886119894
119895s is that nonlinear terms are not considered as strong as
linear terms It should also be noted that 119887119894120596 is a stochastic
term where 119887119894is a constant (we used 119887
119894= 005) and 120596 is
random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910
119894(119905) we used
119910119894(119905) = 119909
119894(119905) + 119900
119894120598 (21)
where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]
As for the time series data we generated an originaldataset with 30 time points including two change points 119888
1=
10 1198882= 20 which is generated by merging three datasets for
1198661 1198662 and 119866
3 Since the use of time series data beginning
from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896
0edges and deleting ℎ
0edges per node resulting in
1198661 1198662 and 119866
3 additionally we also conducted DPLSQ-HS
experiments in which the initial network 119866 was modified byrandomly adding 119896
0edges per node using the default values
of 1198960= ℎ0= 1 We evaluated the performance of this method
by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and
8 BioMed Research International
MMP-3
WNT5A
HADHB
RET-1
S100P
Synuclein
Pirin
MART-1
RHO-C
STC2
Figure 4 Structure of WNT5A network [26]
Table 1 Result on completion with synthetic data
(a) Using DPLSQ-TV
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 160Accuracy 0830 0685 0554 0433
CPU time (sec) 53974 58574 50499 61189
119899 = 20
Time points error 000 000 000 210Accuracy 0781 0637 0541 0299
CPU time (sec) 75898 85391 142903 124400
119899 = 30
Time points error 000 000 035 400Accuracy 0539 0524 0418 0310
CPU time (sec) 264190 244480 234377 276081
119899 = 40
Time points error 000 000 035 350Accuracy 0585 0498 0458 0241
CPU time (sec) 342065 333398 317420 337274
(b) Using DPLSQ-HS
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 685Accuracy 0627 0600 0533 0307
CPU time (sec) 32783 32030 35594 30765
119899 = 20
Time points error 000 000 000 870Accuracy 0488 0473 0413 0153
CPU time (sec) 52129 51348 56723 44699
119899 = 30
time points error 000 000 425 880Accuracy 0469 0386 0286 0097
CPU time (sec) 76852 86844 76313 74653
119899 = 40
Time points error 000 000 085 865Accuracy 0510 0413 0308 0093
CPU time (sec) 76360 98398 97657 102813
119899 = 60
Time points error 000 000 000 075Accuracy 0411 0375 0382 0355
CPU time (sec) 371635 395596 372192 391110
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
6 BioMed Research International
size of table119863[ℎ 119896 119894 119901 119902] is 119874(11989821198993) Furthermore in orderto compute the minimum value for each entry in the DPprocedure we need to examine (119867+1)(119870+1) combinationswhich is119874(1) Hence the time complexity for119863[ℎ 119896 119894 119901 119902]sis 119874(11989821198993)
Finally we analyze the time complexity required for com-puting 119864[ℎ 119896 119887 119902]s We note that the size of table 119864[ℎ 119896 119887 119902]is 119874(119898119899
2) where we assume that 119861 is a constant Since the
number of combinations for computing the minimum valueusing DP is 119874(119898119899) per entry the computation time requiredfor computing 119864[ℎ 119896 119887 119902]s is 119874(11989821198993) Hence the total timecomplexity is
119874(1198983119899119870+1
+ 11989821198993) (18)
It is to be noted that if we use119873 time series datasets eachof which consists of 119898 points the time complexity becomes119874(119873119898
3119899119870+1
+ 11989821198993) Although this complexity is not small
it is allowable in practice if 119870 le 2 and 119899 and 119898 are not toolarge Indeed as shown in Section 42 DPLSQ-TV works forthe completion and inference of time-varying networks witha few tens of genes if 119870 = 2
3 Heuristic Method
Although our previous algorithm DPLSQ-TV is guaranteedto find an optimal solution in polynomial time the degreeof the polynomial is not low preventing the method frombeing applied to the completion of large-scale networksTherefore we propose a heuristic algorithm DPLSQ-HSto significantly improve the computational efficiency byrelaxing the optimality condition The reason why DPLSQ-TV requires a large amount of CPU time is that the leastsquares errors are calculated for each node by consideringall possible combinations of incoming nodes and taking theminimum value of these In order to significantly improvethe computational efficiency we introduce an upper limit onthe number of combinations of incoming nodes AlthoughDPLSQ-HS does not guarantee an optimal solution it allowsus to speed up the calculation of the minimum least squaresin the case of adding edges A schematic illustration of leastsquares computation is given in Section 31 The DPLSQ-HSalgorithm is described in Section 32 and we analyze the timecomplexity of DPLSQ-HS in Section 33
31 Schematic Illustrations of DPLSQ-HS Although DPLSQ-HS can be applied to the addition and deletion of edges weconsider only additions of edges as modification operationsin this subsection We have developed DPLSQ-HS whichcontributes to reducing the time complexity by imposingrestrictions on the number of combinations of incomingnodes to each node In Figure 3 the diagram indicates thatfor each node V
119894 we maintain119872 combinations of 119896 incoming
nodes with119872 lowest errors at the 119896th step Let 119878119896119894denote the
set of 119872 combinations computed at the 119896th step At the 119896thstep for each combination V
1198941 V
119894119896minus1 isin 1198781
119894cup1198782
119894cupsdot sdot sdotcup 119878
119896minus1
119894
where 1198941lt 1198942lt sdot sdot sdot lt 119894
119896minus1 we calculate the least squares error
for each V119895such that 119895 gt 119894
119896minus1is the 119896th incoming node to V
119894
The calculated least squares errors are sorted in descendingorder the top 119872 values are selected and the correspondingcombinations are stored in 119878
119896
119894
32 Algorithm The following is the description of the algo-rithm to compute 120590
+
119896119894[119901 119902] in DPLSQ-HS where 120590
+
119896119894[119901 119902]
does not necessarily mean the minimum value and themeaning of ldquosteprdquo is different from that in Section 31
Step 1 For each period [119901 119902] repeat Steps 2ndash6
Step 2 Let 1198780119894= 0 for all 119894 = 1 119899
Step 3 For 119894 = 1 to 119899 do Steps 4ndash7
Step 4 Repeat Steps 5ndash7 for node V119894from 119896 = 1 to 119896 = 119870
Step 5 For each combination V1198941 V
119894119896minus1 isin 119878
1
119894cup 1198782
119894cup
sdot sdot sdot cup 119878119896minus1
119894and each node V
119895such that 119895 gt 119894
119896minus1(119895 gt 0 if
119896 = 1) calculate the least squares error for the 119896 edge set(V1198941 V119894) (V
119894119896minus1 V119894) (V119895 V119894)
Step 6 Sort the obtained least squares errors in descendingorder and select the top119872 combinations which are stored in119878119896
119894
Step 7 Let 120590+
119896119894[119901 119902] be the minimum least squares error
among these top119872 combinations
The other parts of the algorithm are the same as in DPLSQ-TV
33 Time Complexity Analysis In this subsection we analyzethe time complexity of DPLSQ-HS Since DPLSQ-HS can beapplied to additions and deletions of edges we consider thetime complexity of completion for adding and deleting edges
In our proposed method we assume that the numbers ofadding anddeleting edges in a given network are respectivelybounded by constants 119870 and 119867 In this case the timecomplexity for least squares fitting can be estimated as119874(119898)
As for the time complexity of computing 120590ℎ119895 119896119895 119895
[119901 119902] weassume that the addition of edges is operated only in the caseof adding edges to the nodes with respect to the top119872 of thesorted listTherefore the number of combinations of additionof 119896119895edges which is bounded by a constant119870 is119874(119872119870) It is
well known that the sorting of 119899 data can be done in119874(119899 log 119899)time Based on such an assumption the total time requiredfor the computation of 120590
ℎ119895 119896119895 119895[119901 119902] is 119874(119898119899 log 119899) [20] since
the 119874(119872119870) factor can be regarded as a constant Thereforethe time complexity for 120590
ℎ119895 119896119895119895[119901 119902] is 119874(1198983119899 log 119899) because
119901 and 119902 are 119874(119898)Furthermore for the time complexity required for com-
puting 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s the calculation pro-cess is the same as that in DPLSQ-TV Therefore the com-putation time for both 119863[ℎ 119896 119894 119901 119902]s and 119864[ℎ 119896 119887 119902]s are
BioMed Research International 7
k = 1 k = 2 k = 3
i i i
S1i
S1i S
1iS
2i
S2i
S3i
Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote
the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V
119894119896minus1 isin 1198781
119894cup 1198782
119894cup sdot sdot sdot cup 119878
119896minus1
119894 we calculate
the least squares error for each V119895such that 119895 gt 119894
119896minus1as a 119896th incoming node to V
119894
119874(11989821198993) as described in Section 24 Hence the total time
complexity of DPLSQ-HS is
119874(1198983119899 log 119899 + 119898
21198993) (19)
If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898
3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898
3119899119870+1
)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice
4 Results
We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod
41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866
1 1198662 and 119866
3generated by randomly adding and
deleting edges from the initial network In this method foreach node V
119894with ℎ input nodes we considered the following
model119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905) + 119887
119894120596)
(20)
where 119886119894
119895s and 119886
119894
119895119896s are constants selected uniformly at
random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886
119894
119895119896s is smaller than that for
119886119894
119895s is that nonlinear terms are not considered as strong as
linear terms It should also be noted that 119887119894120596 is a stochastic
term where 119887119894is a constant (we used 119887
119894= 005) and 120596 is
random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910
119894(119905) we used
119910119894(119905) = 119909
119894(119905) + 119900
119894120598 (21)
where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]
As for the time series data we generated an originaldataset with 30 time points including two change points 119888
1=
10 1198882= 20 which is generated by merging three datasets for
1198661 1198662 and 119866
3 Since the use of time series data beginning
from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896
0edges and deleting ℎ
0edges per node resulting in
1198661 1198662 and 119866
3 additionally we also conducted DPLSQ-HS
experiments in which the initial network 119866 was modified byrandomly adding 119896
0edges per node using the default values
of 1198960= ℎ0= 1 We evaluated the performance of this method
by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and
8 BioMed Research International
MMP-3
WNT5A
HADHB
RET-1
S100P
Synuclein
Pirin
MART-1
RHO-C
STC2
Figure 4 Structure of WNT5A network [26]
Table 1 Result on completion with synthetic data
(a) Using DPLSQ-TV
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 160Accuracy 0830 0685 0554 0433
CPU time (sec) 53974 58574 50499 61189
119899 = 20
Time points error 000 000 000 210Accuracy 0781 0637 0541 0299
CPU time (sec) 75898 85391 142903 124400
119899 = 30
Time points error 000 000 035 400Accuracy 0539 0524 0418 0310
CPU time (sec) 264190 244480 234377 276081
119899 = 40
Time points error 000 000 035 350Accuracy 0585 0498 0458 0241
CPU time (sec) 342065 333398 317420 337274
(b) Using DPLSQ-HS
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 685Accuracy 0627 0600 0533 0307
CPU time (sec) 32783 32030 35594 30765
119899 = 20
Time points error 000 000 000 870Accuracy 0488 0473 0413 0153
CPU time (sec) 52129 51348 56723 44699
119899 = 30
time points error 000 000 425 880Accuracy 0469 0386 0286 0097
CPU time (sec) 76852 86844 76313 74653
119899 = 40
Time points error 000 000 085 865Accuracy 0510 0413 0308 0093
CPU time (sec) 76360 98398 97657 102813
119899 = 60
Time points error 000 000 000 075Accuracy 0411 0375 0382 0355
CPU time (sec) 371635 395596 372192 391110
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 7
k = 1 k = 2 k = 3
i i i
S1i
S1i S
1iS
2i
S2i
S3i
Figure 3 Schematic illustrations for definition of the top119872 combinationsThis is an example for the cases of119872 = 3 and 119896 le 3 Let 119878119896119894denote
the set of119872 combinations computed at the 119896th step At the 119896th step for each combination V1198941 V
119894119896minus1 isin 1198781
119894cup 1198782
119894cup sdot sdot sdot cup 119878
119896minus1
119894 we calculate
the least squares error for each V119895such that 119895 gt 119894
119896minus1as a 119896th incoming node to V
119894
119874(11989821198993) as described in Section 24 Hence the total time
complexity of DPLSQ-HS is
119874(1198983119899 log 119899 + 119898
21198993) (19)
If we use119873 time series datasets each of which consists of119898 points the time complexity becomes119874(119873119898
3 log 119899+11989821198993)DPLSQ-HS requires less time complexity than DPLSQ-TV because 119874(1198983119899 log 119899) is much smaller than 119874(119898
3119899119870+1
)Indeed as shown in Section 42 DPLSQ-HS is much fasterthan DPLSQ-TV in practice
4 Results
We performed computational experiments using both artifi-cial data and real data All experiments were performed ona PC with an Intel Core(TM)2 Quad CPU (30GHz) Weemployed the liblsq library (httpwww2nictgojpaeristsstmgK5VSSPinstall lsqhtml) for the least squares fittingmethod
41 Completion Using Artificial Data In order to evaluatethe potential effectiveness of DPLSQ-TV andDPLSQ-HS webegin with network completion for time-varying networksusing artificial data We demonstrate that our proposedmethods can determine change time points quite accuratelywhen the network structure changes We employed thestructure of the real biological network WNT5A (Figure 4)[26] as the initial network 119866 and those of three differentnetworks 119866
1 1198662 and 119866
3generated by randomly adding and
deleting edges from the initial network In this method foreach node V
119894with ℎ input nodes we considered the following
model119909119894(119905 + 1)
= 119909119894(119905) + Δ119905(119886
119894
0+
ℎ
sum
119895=1
119886119894
119895119909119894119895+ sum
119895lt119896
119886119894
119895119896119909119894119895(119905) 119909119894119896(119905) + 119887
119894120596)
(20)
where 119886119894
119895s and 119886
119894
119895119896s are constants selected uniformly at
random from [minus05 05] and [minus005 005] respectively Thereason why the domain of 119886
119894
119895119896s is smaller than that for
119886119894
119895s is that nonlinear terms are not considered as strong as
linear terms It should also be noted that 119887119894120596 is a stochastic
term where 119887119894is a constant (we used 119887
119894= 005) and 120596 is
random noise taken uniformly at random from [minus1 1] Forthe artificial generation of the observed data 119910
119894(119905) we used
119910119894(119905) = 119909
119894(119905) + 119900
119894120598 (21)
where 119900119894 is a constant denoting the level of observation errorsand 120598 is random noise taken uniformly at random from[005 minus005]
As for the time series data we generated an originaldataset with 30 time points including two change points 119888
1=
10 1198882= 20 which is generated by merging three datasets for
1198661 1198662 and 119866
3 Since the use of time series data beginning
from only one set of initial values easily resulted in numericalcalculation errors we generated additional time series databeginning from 200 sets of initial values that were obtainedby slightly perturbing the original data Under the abovemodel we conducted computational experiments byDPLSQ-TV in which the initial network119866was modified by randomlyadding 119896
0edges and deleting ℎ
0edges per node resulting in
1198661 1198662 and 119866
3 additionally we also conducted DPLSQ-HS
experiments in which the initial network 119866 was modified byrandomly adding 119896
0edges per node using the default values
of 1198960= ℎ0= 1 We evaluated the performance of this method
by measuring the accuracy of modified edges the time pointerrors for time intervals and the computational time forcompletion (CPU time) Furthermore in order to examinehow CPU time changes as the size of the network grows wegenerated networks with 20 genes 30 genes and 40 genes bymaking 2 3 and 4 copies of the original networks We tookthe average time point errors accuracies and CPU time over10 random modifications with several 119900119894s In addition weperformed computational experiments on DPLSQ-TV and
8 BioMed Research International
MMP-3
WNT5A
HADHB
RET-1
S100P
Synuclein
Pirin
MART-1
RHO-C
STC2
Figure 4 Structure of WNT5A network [26]
Table 1 Result on completion with synthetic data
(a) Using DPLSQ-TV
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 160Accuracy 0830 0685 0554 0433
CPU time (sec) 53974 58574 50499 61189
119899 = 20
Time points error 000 000 000 210Accuracy 0781 0637 0541 0299
CPU time (sec) 75898 85391 142903 124400
119899 = 30
Time points error 000 000 035 400Accuracy 0539 0524 0418 0310
CPU time (sec) 264190 244480 234377 276081
119899 = 40
Time points error 000 000 035 350Accuracy 0585 0498 0458 0241
CPU time (sec) 342065 333398 317420 337274
(b) Using DPLSQ-HS
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 685Accuracy 0627 0600 0533 0307
CPU time (sec) 32783 32030 35594 30765
119899 = 20
Time points error 000 000 000 870Accuracy 0488 0473 0413 0153
CPU time (sec) 52129 51348 56723 44699
119899 = 30
time points error 000 000 425 880Accuracy 0469 0386 0286 0097
CPU time (sec) 76852 86844 76313 74653
119899 = 40
Time points error 000 000 085 865Accuracy 0510 0413 0308 0093
CPU time (sec) 76360 98398 97657 102813
119899 = 60
Time points error 000 000 000 075Accuracy 0411 0375 0382 0355
CPU time (sec) 371635 395596 372192 391110
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
8 BioMed Research International
MMP-3
WNT5A
HADHB
RET-1
S100P
Synuclein
Pirin
MART-1
RHO-C
STC2
Figure 4 Structure of WNT5A network [26]
Table 1 Result on completion with synthetic data
(a) Using DPLSQ-TV
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 160Accuracy 0830 0685 0554 0433
CPU time (sec) 53974 58574 50499 61189
119899 = 20
Time points error 000 000 000 210Accuracy 0781 0637 0541 0299
CPU time (sec) 75898 85391 142903 124400
119899 = 30
Time points error 000 000 035 400Accuracy 0539 0524 0418 0310
CPU time (sec) 264190 244480 234377 276081
119899 = 40
Time points error 000 000 035 350Accuracy 0585 0498 0458 0241
CPU time (sec) 342065 333398 317420 337274
(b) Using DPLSQ-HS
Observation error level01 03 05 07
119899 = 10
Time points error 000 000 000 685Accuracy 0627 0600 0533 0307
CPU time (sec) 32783 32030 35594 30765
119899 = 20
Time points error 000 000 000 870Accuracy 0488 0473 0413 0153
CPU time (sec) 52129 51348 56723 44699
119899 = 30
time points error 000 000 425 880Accuracy 0469 0386 0286 0097
CPU time (sec) 76852 86844 76313 74653
119899 = 40
Time points error 000 000 085 865Accuracy 0510 0413 0308 0093
CPU time (sec) 76360 98398 97657 102813
119899 = 60
Time points error 000 000 000 075Accuracy 0411 0375 0382 0355
CPU time (sec) 371635 395596 372192 391110
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 9
DPLSQ-HS using 60 genes where additional time series databeginning from 100 sets (in place of 200 sets) of initial valueswere used and119866
11198662 and119866
3were obtained by addition and
deletion of edges However DPLSQ-TV took too long time(more than 1000 sec per execution) and thus the result couldnot be included in Table 1
The accuracy is defined as follows
ℎ + 119896 + sum119861
119894=0(10038161003816100381610038161003816119864119894cap 1198641015840
119894
10038161003816100381610038161003816minus1003816100381610038161003816119864119894
1003816100381610038161003816)
ℎ + 119896 (22)
where 119864119894and 119864
1015840
119894(119894 = 0 1 119861) are respectively the sets of
edges in the original network and the completed network ineach time interval This value is 1 if all the added and deletededges are correct and 0 if none of the added and deleted edgesare correct If we regard a correctly (resp incorrectly) addedor deleted edge as a true (resp false) positive sum119861
119894=0(|119864119894| minus
|119864119894cap 1198641015840
119894|) corresponds to the number of false positives and
ℎ + 119896 + sum119861
119894=0(|119864119894cap 1198641015840
119894| minus |119864
119894|) corresponds to the number of
true positives The time point error is the average differencebetween the original and estimated values for change timepoints and is defined as
1
119861
119861
sum
119894=1
10038161003816100381610038161003816119888119894minus 1198881015840
119894
10038161003816100381610038161003816 (23)
where 1198881015840119894(119894 = 1 2 119861) are the estimated change points As
for the computation time we show the average CPU timeThe results of the two methods are shown in Table 1
It can be seen from this table that the change time pointerrors are quite small regardless of the size of the networkwith a low level of observation errors In addition it is alsoseen that the time point errors with DPLSQ-TV are close tothose with DPLSQ-HS with the exception of high levels ofobservation errorsWe observe that CPU time using DPLSQ-TV increases rapidly as the size of the network grows Onthe other hand CPU time by DPLSQ-HS increases graduallyas the size of the network grows It is also observed thatthe DPLSQ-HS algorithm is about 4 times faster than theDPLSQ-TV algorithm in case of 40 genes while maintaininggood accuracy Hence these results suggest that DPLSQ-TVand DPLSQ-HS can correctly identify the change time pointsif the error levels are not very large and that it can completethe initial network by modifying the edges with relativelygood accuracy if the observation error is small
It is also observed that DPLSQ-HS worked reasonablyfast even for 119899 = 60 although DPLSQ-TV took more than1000 seconds per execution and thus the result could not beincluded in Table 1 However the accuracy on DPLSQ-HSbecame around 04 even if the observation error level was low(ie 119900119894 = 01) Therefore the applicability of DPLSQ-HS isalso limited in terms of the accuracy although it may still beuseful for networks with 119899 = 60 if the purpose is to identifychange time points
Since DPLSQ-HS is a heuristic method the results maybe greatly influenced by data Therefore we evaluated thestability of DPLSQ-HS by comparing the variance of theaccuracy with that for DPLSQ-TV where 119899 = 20 The
variances for DPLSQ-TV were 000602 and 000446 for 119900119894 =03 and 119900
119894= 05 respectively The variances for DPLSQ-
HS were 001188 and 000732 for 119900119894= 03 and 119900
119894= 05
respectivelyThis result suggests that DPLSQ-HS is less stablethan DPLSQ-TV However the variances of DPLSQ-HS wereless than twice those of DPLSQ-TVTherefore this result alsosuggests that DPLSQ-HS has some stability
In order to examine the effect of the number of changepoints 119861 and the maximum number of added and deletededges per nodes 119870 and 119867 on the least squares error weperformed computational experiments with varying theseparameters (one experiment per parameter)Then the result-ing least squares errors (ie 119864[sdot sdot sdot ]s) for DPLSQ-TV are5495 7016 7875 3886 and 3799 for (119861 119870119867) = (2 1 1)(3 1 1) (4 1 1) (2 2 2) and (2 4 2) respectively It is seenthat use of larger119870119867 resulted in smaller least squares errorsIt is reasonable that more parameters resulted in better leastsquares fitting However use of larger 119861 did not result insmaller least squares errors It may be because addition ofunnecessary change points increases the error if an enoughnumber of edges are not added It is to be noted that althoughthe least squares errors are reduced use of larger 119870119867 is notalways appropriate because it needs much longer CPU timeand may cause overfitting
We also compared our results with those obtained bythe ARTIVA algorithm [15] It is to be noted that most ofthe other tools for the inference of time-varying networksare unavailable This model is based on a combination ofDBN and RJMCMC sampling where RJMCMC is used forapproximating the posterior distribution and DBN is usedfor inferring simultaneously the change points and resultingnetwork structures We applied ARTIVA to the syntheticdatasets that were generated in the same way as for ourproposed methods We used the default parameter settingsfor ARTIVA and evaluated the results by inferring the changepoints As the result of the comparative experiment there aretwo change time points in the synthetic datasets but ARTIVAcan only infer one change point regardless of the observationerror level as shown in Figure 5 where ARTIVA does notuniquely determine change points but output probabilities ofchange points
42 Inference Using Real Data We examined two types ofproposed methods for the inference of change time pointsusing gene expression microarray data and also comparedour results with those obtained using the ARTIVA algorithmWe applied ourmethods to two real gene expression datasetsmeasured during the life cycle ofD melanogaster and the cellcycle of S cerevisiae
The first microarray dataset is the gene expression timeseries collected by Spellman et al [28] We employed partof the cell cycle network of S cerevisiae extracted from theKEGG database [29] shown in Figure 6 As for time seriesdata we combined and employed four sets of time series data(alpha cdc15 cdc28 and elu) in [28] that were obtained infour different experiments We adopted the datasets of 10genes with 71 time points including three change time pointsSince there were several expression values that were far from
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
10 BioMed Research International
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 01
(a)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 03
(b)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 05
(c)
0
005
01
015
02
025
5 10 15 20 25 30
Poste
rior p
roba
bilit
y
Time point
Observation error level = 07
(d)
Figure 5 Results on inference of change point using ARTIVA algorithm with synthetic data
YOX1CLN1
MCM1
CDC28
SWI4
SWI6CLN3
MBP1
CLB5
YHP1
Figure 6 Structure of part of the yeast cell cycle network
the average in the cdc15 dataset these values were discardedAs a result the alpha cdc15 cdc28 and elu datasets consistof 18 23 17 and 13 time points of gene expression datarespectively
The second microarray dataset is the gene expressiontime series from experiments by Arbeitman et al [30] Thisdata set includes the expression levels of 4028 genes with 67
time points spanning four distinct stages embryonic (31 timepoints) larval (10 time points) pupal (18 time points) andadulthood (8 time points) in the D melanogaster life cycle
We used the expression datasets of 30 genes selected fromthis microarray data with 67 time points which include threechange time points
In this computational analysis with regard to applying thetwo different types of microarray datasets we generated 200
datasets that were obtained by slightly perturbing the originaldata in order to avoid numerical calculation errors Sincethe correct time-varying networks are not known we onlyevaluated the time point errors and the average CPU timewhere 119870 = 3 and 119867 = 0 were used with the S cerevisiae
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 11
Table 2 Result on inference of change points in S cerevisiae data
119888119894(correct answer) 119888
1015840
119894(DPLSQ-TV) 119888
1015840
119894(DPLSQ-HS) ARTIVA
119894 = 1 25 23 23 24119894 = 2 40 40 40 mdash119894 = 3 56 58 58 60
CPU time (sec) 1414718 90880 mdash
Table 3 Result on inference of change points in D melanogasterdata
119888119894
(correct answer)1198881015840
119894
(DPLSQ-TV)1198881015840
119894
(DPLSQ-HS)119894 = 1 31 19 19119894 = 2 41 31 31119894 = 3 60 42 42
CPU time (sec) 12156082 262096
dataset and 119870 = 2 and 119867 = 0 were used with the Dmelanogaster dataset
The results are shown in Tables 2 and 3 119888119894s are the
values of the change point in the original data and 1198881015840
119894s are
the estimated values In the experimental analysis with Scerevisiae data as for the change time points there seems tobe almost no difference between the results of DPLSQ-TVand DPLSQ-HS which can correctly identify the time pointswhere the network topology changes It is also observedfrom Table 2 that the CPU time required for DPLSQ-HSis about 15 times faster than that needed for DPLSQ-TVIn the experiments using data from D melanogaster it isseen from Table 3 that both methods can determine exactlythe same three change points At first glance readers maythink that the errors are large at all change point positionsHowever both methods could precisely identify two timepoints when topology of the network changes excluding thecase of 119894 = 3 From the point of view of computationaltime DPLSQ-HS performs significantly better than DPLSQ-TV DPLSQ-HS runs about 46 times faster than DPLSQ-TVTherefore DPLSQ-HS allows us to significantly decrease thecomputational timeThese results suggest that inmany caseswe can expect DPLSQ-HS to find a near-optimal solutionat least for change time points while also speeding up thecalculation
Furthermore for the ARTIVA analysis we employedboth the above-mentioned S cerevisiae and D melanogastermicroarray datasets which consist of 71 measurements of10 genes and 67 measurements of 30 genes respectivelyand tried to identify the change time points Computationalexperiments on ARTIVA were performed under the samecomputational environment as that used in our methods
The results from the yeast microarray data are shown inTable 2 There are three change time points as described inthis table It is seen from this table that two of them 24 and60 can be determined precisely by ARTIVA but the third isnot In contrast our proposed methods demonstrate goodperformance for inferring the change points at which thenetwork topology changes Lebre et al [15] demonstrated the
Table 4 Comparative experiment for inference of change points
119888119894
(correct answer)1198881015840
119894
(DPLSQ-HS) ARTIVA
119894 = 1 mdash 19 18-19119894 = 2 31 31 31ndash33119894 = 3 40 42 41ndash43119894 = 4 60 55 59ndash61
CPU time (sec) 260640 mdash
number of identified change points withDmelanogaster datausing the ARTIVA algorithm According to this validationit has been observed that the time intervals 18-19 31ndash33 41ndash43 and 59ndash61 contain more than 40 of all change points Inorder to compare with the ARTIVA results we attempted toidentify four change points using our proposedmethodsTheresults of the comparative experiment using D melanogastermicroarray data are shown in Table 4 119888
119894s are three change
time points in original data Although DPLSQ-HS identifiedchange time points similar to those identified byARTIVA theresults of ARTIVA appear to be slightly better This suggeststhat the ARTIVA algorithm shows slightly better perfor-mance with respect to the inference of change points than ourproposed methods However ARTIVA does not determinechange time positions but determines time intervals at whichthe network topologymight changeTherefore DPLSQ-HS ismore suited for identifying change time positions at all-timepoints (Since the comparative experiment byDPLSQ-TVdidnot finish within 3 weeks the results of DPLSQ-TV are notgiven in Table 4)
5 Conclusion
In this paper we have proposed two novel network com-pletion methods for time-varying networks by extendingour previous method DPLSQ [20] In order to identify thechange time points and sets of modified edges in networkcompletion we developed two different types of doubleDP algorithms The first algorithm DPLSQ-TV is intendedto complete and precisely infer time-varying networksAlthough DPLSQ-TV allows us to guarantee the optimalityof its solution it requires a large amount of computationaltime as the size of the network grows
To improve the computational efficiency of DPLSQ-TVwe developed an effective heuristic method DPLSQ-HS byspeeding up the calculation of the minimum least squareserror by posing restrictions to the number of combinationsof incoming nodes We showed that each of these two
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
12 BioMed Research International
methods works in polynomial time if the maximum indegreeis bounded by a constant
The results of computational experiments reveal thatthe two proposed methods can identify change time pointsrather accurately and can infer edges to be deleted andadded with good accuracy DPLSQ-TV provided a widerange of applications not only in network completion butalso in network inference with good accuracy AdditionallyDPLSQ-HS allowed us to identify change time points ratherprecisely while reducing the computational time for bothsynthetic data and microarray data This result suggests thatin many cases DPLSQ-HS can be expected to find near-optimal solutions while speeding up the calculation
Although DPLSQ-HS is much faster than DPLSQ-TV ithas a drawback the accuracy and time point error were worsethan those by DPLSQ-TV especially when the observationerror level was large Therefore we need to improve theaccuracy of DPLSQ-HS without significantly underminingits efficiency In our experiments we specified the numberof change time points and the number of edges to beadded and deleted In real use we may examine severalvalues and select the best one (eg the values with theminimum least squares errors) However as discussed inSection 41 it may lead to overfitting In order to avoidoverfitting use of AIC (Akaikersquos Information Criterion) orother information criteria is useful as demonstrated in [27]for network inference However since network completion ismore complex than network inference the method in [27]cannot be directly applied Therefore incorporation of anappropriate information criterion into network completionis important future work Another issue to be tackled isto take into account the relationship between 119866
119894and 119866
119894+1
Although 119866119894and 119866
119894+1are inferred independently from the
original network 119866 by the proposed method there should besome strong relationship between them Therefore such anextension is also important future work
Conflict of Interests
The authors declare that they have no conflict of interests
Acknowledgments
The authors would like to thank Professor Hideo Matsuda inOsakaUniversity andTakanoriHasegawa inKyotoUniversityfor helpful discussions This work was partially supported byJSPS Japan (Grants-in-Aid 22240009 and 22650045)
References
[1] K-H Cho S-M Choo S H Jung J-R Kim H-S Choi andJ Kim ldquoReverse engineering of gene regulatory networksrdquo IETSystems Biology vol 1 no 3 pp 149ndash163 2007
[2] H Hache H Lehrach and R Herwig ldquoReverse engineering ofgene regulatory networks a comparative studyrdquo Eurasip Journalon Bioinformatics and Systems Biology vol 2009 Article ID617281 2009
[3] M Hecker S Lambecka S Toepferb E van Somerenc and RGuthkea ldquoGene regulatory network inference data integration
in dynamic models a reviewrdquo BioSystems vol 96 pp 86ndash1032009
[4] S Liang S Fuhrman andR Somogyi ldquoReveal a general reverseengineering algorithm for inference of genetic network archi-tecturesrdquo Proceedings of the Pacific Symposium on Biocomputingvol 3 pp 18ndash29 1998
[5] T Akutsu S Miyano and S Kuhara ldquoInferring qualitativerelations in genetic networks and metabolic pathwaysrdquo Bioin-formatics vol 16 no 8 pp 727ndash734 2000
[6] N Friedman M Linial I Nachman and D Persquoer ldquoUsingBayesian networks to analyze expression datardquo Journal ofComputational Biology vol 7 no 3-4 pp 601ndash620 2000
[7] S Imoto S Kim T Goto et al ldquoBayesian network and nonpara-metric heteroscedastic regression for nonlinear modeling ofgenetic networkrdquo Journal of Bioinformatics and ComputationalBiology vol 1 no 2 pp 231ndash252 2003
[8] TThorne andM PH Stumpf ldquoInference of temporally varyingBayesian networksrdquoBioinformatics vol 28 pp 3298ndash3305 2012
[9] P DrsquoHaeseleer S Liang and R Somogyi ldquoGenetic networkinference from co-expression clustering to reverse engineer-ingrdquo Bioinformatics vol 16 no 8 pp 707ndash726 2000
[10] Y Wang T Joshi X Zhang D Xu and L Chen ldquoInferringgene regulatory networks from multiple microarray datasetsrdquoBioinformatics vol 22 no 19 pp 2413ndash2420 2006
[11] H Toh and K Horimoto ldquoInference of a genetic network by acombined approach of cluster analysis and graphical Gaussianmodelingrdquo Bioinformatics vol 18 no 2 pp 287ndash297 2002
[12] R Yoshida S Imoto and T Higuchi ldquoEstimating time-dependent gene networks from time series microarray data bydynamic linear models with Markov switchingrdquo in Proceedingsof the 2005 IEEE Computational Systems Bioinformatics Confer-ence (CSB rsquo05) pp 289ndash298 August 2005
[13] A Fujita J R Sato H M Garay-Malpartida P A MorettinM C Sogayar and C E Ferreira ldquoTime-varying modeling ofgene expression regulatory networks using the wavelet dynamicvector autoregressivemethodrdquoBioinformatics vol 23 no 13 pp1623ndash1630 2007
[14] J W Robinson and A J Hartemink ldquoNon-stationary dynamicBayesian networksrdquo in Proceedings of the 22nd Annual Confer-ence on Neural Information Processing Systems (NIPS rsquo08) pp1369ndash1376 December 2008
[15] S Lebre J Becq F Devaux M P H Stumpf and G LelandaisldquoStatistical inference of the time-varying structure of gene-regulation networksrdquo BMC Systems Biology vol 4 p 130 2010
[16] YW Teh andM I Jordan ldquoHierarchical Bayesian nonparamet-ric models with applicationsrdquo in Bayesian Nonparametrics pp158ndash207 Cambridge University Press Cambridge UK 2010
[17] G Rassol and N Bouaynaya ldquoInference of time-varying genenetworks using constrained and smoothed Kalman filteringrdquoin Proceedings of the International Workshop on Genomic SignalProcessing and Statistics pp 172ndash175 2012
[18] A Ahmed L Song and E P Xing ldquoTime-varying networksrecovering temporally rewiring genetic networks during the lofecycle of drosophilardquo SCS Technical Report Collection CMU-ML-08-118 2008
[19] T Akutsu T Tamura and K Horimoto ldquoCompleting networksusing observed datardquo in Proceedings of the 20th InternationalConference on Algorithmic Learning Theory pp 126ndash140 2009
[20] N Nakajima T Tamura Y Yamanishi K Hiromoto and TAkutsu ldquoNetwork completion using dynamic programmingand least-squares fittingrdquoThe ScientificWorld Journal vol 2012Article ID 957620 8 pages 2012
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
BioMed Research International 13
[21] A Clauset C Moore and M E J Newman ldquoHierarchicalstructure and the prediction of missing links in networksrdquoNature vol 453 no 7191 pp 98ndash101 2008
[22] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Academy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[23] M Kim and J Leskovec ldquoThe network completion probleminferring missing nodes and edges in networksrdquo in Proceedingsof the 2011 SIAM International Conference on Data Mining pp47ndash58 2011
[24] S Hanneke and E P Xing ldquoNetwork completion and surveysamplingrdquo Journal of Machine Learning Research vol 5 pp209ndash215 2009
[25] N Nakajima and T Akutsu ldquoNetwork completion for time-varying genetic networksrdquo in Proceedings of the 7th Interna-tional Conference on Complex Intelligent and Software IntensiveSystems vol 2013 pp 553ndash558 2013
[26] S Kim H Li E R Dougherty et al ldquoCanMarkov chainmodelsmimic biological regulationrdquo Journal of Biological Systems vol10 no 4 pp 337ndash357 2002
[27] N Noman L Palafox and H Iba ldquoOn model selection criteriain reverse engineering gene networks using RNN modelrdquo inProceedings of International Conference on Convergence andHybrid Information Technology pp 155ndash164 2012
[28] P T Spellman G Sherlock M Q Zhang et al ldquoComprehensiveidentification of cell cycle-regulated genes of the yeast Sac-charomyces cerevisiae by microarray hybridizationrdquoMolecularBiology of the Cell vol 9 no 12 pp 3273ndash3297 1998
[29] M Kanehisa S Goto M Furumichi M Tanabe and MHirakawa ldquoKEGG for representation and analysis of molecularnetworks involving diseases and drugsrdquoNucleic Acids Researchvol 38 no 1 Article ID gkp896 pp D355ndashD360 2009
[30] M N Arbeitman E E M Furlong F Imam et al ldquoGeneexpression during the life cycle of Drosophila melanogasterrdquoScience vol 297 no 5590 pp 2270ndash2275 2002
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Anatomy Research International
PeptidesInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
International Journal of
Volume 2014
Zoology
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Molecular Biology International
GenomicsInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioinformaticsAdvances in
Marine BiologyJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Signal TransductionJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
BioMed Research International
Evolutionary BiologyInternational Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Biochemistry Research International
ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Genetics Research International
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Virolog y
Hindawi Publishing Corporationhttpwwwhindawicom
Nucleic AcidsJournal of
Volume 2014
Stem CellsInternational
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Enzyme Research
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Microbiology