

Automatica 49 (2013) 3520–3530

Contents lists available at ScienceDirect

Automatica

journal homepage: www.elsevier.com/locate/automatica

Efficient Bayesian spatial prediction with mobile sensor networks using Gaussian Markov random fields✩

Yunfei Xu a, Jongeun Choi a,b,1, Sarat Dass c,d, Tapabrata Maiti d

a Department of Mechanical Engineering, Michigan State University, East Lansing, MI 48824, USA
b Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
c Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar, 31750 Tronoh, Perak, Malaysia
d Department of Statistics & Probability, Michigan State University, East Lansing, MI 48824, USA

Article info

Article history:
Received 30 January 2012
Received in revised form 19 February 2013
Accepted 20 August 2013
Available online 15 October 2013

Keywords:
Gaussian Markov random fields
Mobile sensor networks

Abstract

In this paper, we consider the problem of predicting a large scale spatial field using successive noisy measurements obtained by mobile sensing agents. The physical spatial field of interest is discretized and modeled by a Gaussian Markov random field (GMRF) with uncertain hyperparameters. From a Bayesian perspective, we design a sequential prediction algorithm to exactly compute the predictive inference of the random field. The main advantages of the proposed algorithm are: (1) the computational efficiency due to the sparse structure of the precision matrix, and (2) the scalability as the number of measurements increases. Thus, the prediction algorithm correctly takes into account the uncertainty in hyperparameters in a Bayesian way and is also scalable to be usable for mobile sensor networks with limited resources. We also present a distributed version of the prediction algorithm for a special case. An adaptive sampling strategy is presented for mobile sensing agents to find the most informative locations in taking future measurements in order to minimize the prediction error and the uncertainty in hyperparameters simultaneously. The effectiveness of the proposed algorithms is illustrated by numerical experiments.

© 2013 Published by Elsevier Ltd

1. Introduction

In recent years, there has been an increasing exploitation of navigation of mobile sensor networks and robotic sensors interacting with uncertain environments (Choi, Oh, & Horowitz, 2009; Le Ny & Pappas, 2013; Leonard et al., 2007; Lynch, Schwartz, Yang, & Freeman, 2008; Stanković & Stipanović, 2010; Xu, Choi, & Oh, 2011; Zhang, Siranosian, & Krstić, 2007). A necessity in such scenarios is to design algorithms to process collected observations from environments (e.g., distributed estimators) for robots such that either the local information about the environment can be used for local control actions or the global information can be estimated asymptotically. The approach to designing such algorithms takes two different paths depending on whether it uses an environmental

✩ This work has been supported by the National Science Foundation through CAREER Award CMMI-0846547. This support is gratefully acknowledged. The material in this paper was partially presented at the 2012 American Control Conference (ACC12), June 27–29, 2012, Montreal, Canada. This paper was recommended for publication in revised form by Associate Editor Brett Ninness under the direction of Editor Torsten Söderström.

E-mail addresses: [email protected] (Y. Xu), [email protected] (J. Choi), [email protected] (S. Dass), [email protected] (T. Maiti).
1 Tel.: +1 517 432 3164; fax: +1 517 353 1750.

0005-1098/$ – see front matter © 2013 Published by Elsevier Ltd
http://dx.doi.org/10.1016/j.automatica.2013.09.008

model in space and time or not. Without environmental models, extremum seeking control has been proven to be very effective for finding the source of a signal (chemical, electromagnetic, etc.) (Stanković & Stipanović, 2010; Zhang et al., 2007). The drawback of extremum seeking control is that it limits its task to finding the maximum (or minimum) point of the environmental field. A unifying framework of distributed stochastic gradient algorithms that can deal with coverage control, spatial partitioning, and dynamic vehicle routing problems in the absence of a priori knowledge of the event location distribution has been presented in Le Ny and Pappas (2013). However, to tackle a variety of useful tasks such as exploration, estimation, prediction and maximum seeking of a scalar field, etc., it is essential for robots to have a spatial (and temporal) field model (Choi et al., 2009; Cortés, 2009; Graham & Cortés, 2009, 2012; Krause, Singh, & Guestrin, 2008; Leonard et al., 2007; Lynch et al., 2008; Varagnolo, Pillonetto, & Schenato, 2012; Xu & Choi, 2012a,b; Xu, Choi, Dass, & Maiti, 2012b; Xu et al., 2011). Although control algorithms for mobile robots have been developed based on computationally demanding, physics-based field models (e.g., atmospheric dispersion modeling), recently phenomenological and statistical modeling techniques such as kriging, Gaussian process regression, and kernel regression have gained much attention for resource-constrained mobile robots. Among phenomenological spatial models, adaptive control of multiple robotic sensors based on a parametric approach needs


a persistent excitation (PE) condition for convergence of parameters (Choi et al., 2009), while control strategies based on Bayesian spatial models do not require such conditions (e.g., by utilizing a priori distributions as in Kalman filtering (Lynch et al., 2008) or Gaussian process regression (Xu et al., 2011)). Hence, control engineers have become more aware of the usefulness of nonparametric Bayesian approaches such as Gaussian processes (defined by mean and covariance functions) (Cressie, 1986; Rasmussen & Williams, 2006) to statistically model physical phenomena for the navigation of mobile sensor networks, e.g., Cortés (2009); Graham and Cortés (2009, 2012); Krause et al. (2008); Leonard et al. (2007); Xu and Choi (2012a); Xu et al. (2012b, 2011). Other, more data-driven approaches have also been developed (without the statistical structure used in Gaussian processes), such as kernel regression (Xu & Choi, 2012b) and regression in reproducing kernel Hilbert spaces (Varagnolo et al., 2012). However, such an approach without a statistical structure in a random field (as in Xu and Choi (2012b), Varagnolo et al. (2012)) usually requires more observations than one with a statistical structure for a decent prediction quality.

The significant computational complexity in Gaussian process regression due to the growing number of observations (and hence the size of the covariance matrix) has been tackled in different ways (Herbrich, Lawrence, & Seeger, 2002; Seeger, 2003; Smola & Bartlett, 2001; Tresp, 2000; Williams & Seeger, 2001). In Xu et al. (2011), the authors analyzed the conditions under which near-optimal prediction can be achieved using only truncated observations when the covariance function is known a priori. On the other hand, unknown hyperparameters in the covariance function can be estimated by a maximum likelihood (ML) estimator or a maximum a posteriori (MAP) estimator and then be used in the prediction as true hyperparameters, as in an empirical Bayes approach (Xu & Choi, 2011). However, the point estimate (ML or MAP estimate) itself needs to be identified using a sufficient amount of measurements, and it fails to fully incorporate the uncertainty in the estimated hyperparameters into the prediction from a fully Bayesian perspective.

The advantage of a fully Bayesian approach is that the uncertainty in the model parameters is incorporated in the prediction (Bishop, 2006). However, the solution often requires Markov Chain Monte Carlo (MCMC) methods, which greatly increases the computational complexity. In Graham and Cortés (2009), an iterative prediction algorithm that does not resort to MCMC methods has been developed based on analytical closed-form solutions from results in Gaudard, Karson, Linder, and Sinha (1999), by assuming that the bandwidths in the covariance function of the spatio-temporal Gaussian random field are known a priori. In Xu, Choi, Dass, and Maiti (2011), the authors designed a sequential Bayesian prediction algorithm to deal with unknown bandwidths by using a compactly supported kernel and selecting a subset of collected measurements. In contrast to Xu et al. (2011), in this paper, we seek a fully Bayesian approach over a discretized surveillance region such that the Bayesian spatial prediction utilizes all collected measurements in a scalable fashion.

Recently, there have been efforts to fit a computationally efficient Gaussian Markov random field (GMRF) on a discrete lattice to a Gaussian random field on a continuum space (Cressie & Verzelen, 2008; Hartman & Hössjer, 2008; Rue & Tjelmeland, 2002). It has been demonstrated that GMRFs with small neighborhoods can approximate Gaussian fields surprisingly well (Rue & Tjelmeland, 2002). This approximated GMRF and its regression are very attractive for resource-constrained mobile sensor networks due to their computational efficiency and scalability (Le Ny & Pappas, 2010) as compared to the standard Gaussian process and its regression. Fast kriging of large data sets by using a GMRF as an approximation of a Gaussian field has been proposed in Hartman and Hössjer (2008). In Xu and Choi (2012a), the authors provided a new class of Gaussian processes that builds on a GMRF and derived formulas for predictive statistics. However, both works assume that the precision matrix, which is the inverse of the covariance matrix, is given or estimated a priori.

The contributions of the paper are as follows. First, we model the physical spatial field as a GMRF with uncertain hyperparameters and formulate the estimation problem from a Bayesian point of view. Second, we design a sequential Bayesian estimation algorithm to effectively and efficiently compute the exact predictive inference of the spatial field. The proposed algorithm often takes only seconds to run even for a very large spatial field, as will be demonstrated in this paper. Moreover, the algorithm is scalable in the sense that the running time does not grow as the number of observations increases. In particular, the scalable prediction algorithm does not rely on a subset of samples to obtain scalability (as was done in Xu et al. (2011)), correctly fusing all collected measurements. Thus, in contrast to previous works (Xu & Choi, 2012a; Xu et al., 2011), our prediction algorithm correctly takes into account the uncertainty in hyperparameters in a Bayesian way and is also scalable to be usable for mobile sensor networks with limited resources. We then propose a distributed prediction algorithm for a special case such that sensing agents make a prediction by exchanging only local information with their neighbors. An adaptive sampling strategy for mobile sensor networks is proposed at the end to possibly improve the quality of prediction and to reduce the uncertainty in hyperparameter estimation simultaneously. A preliminary version of this paper has been reported in Xu, Choi, Dass, and Maiti (2012a).

The paper is organized as follows. In Section 2, we introduce a spatial field model based on a GMRF, and a mobile sensor network. In Section 3, we introduce a Bayesian inference approach to estimate the spatial field of interest. A sequential Bayesian prediction algorithm is proposed in Section 4 to deal with the computational complexity. A distributed prediction algorithm is then proposed in Section 5. An adaptive sampling algorithm is proposed in Section 6 for mobile sensing agents in order to minimize the prediction error and the uncertainty in hyperparameters simultaneously. We demonstrate the effectiveness of the proposed algorithms through a simulation study in Section 7.

Standard notation will be used throughout the paper. Let R, R_{>0}, and Z_{>0} denote, respectively, the sets of real, positive real, and positive integer numbers. The positive definiteness of a matrix A is denoted by A ≻ 0. Let E and Corr denote the operators of expectation and correlation, respectively. A random vector x, which has a multivariate normal distribution with mean vector µ and covariance matrix Σ, is denoted by x ∼ N(µ, Σ). For a set A, the absolute complement of a subset B ⊂ A is denoted by −B. Let ◦ denote the element-wise product. Let O(1) denote a running time of an algorithm that is constant. Other notation will be explained in due course.

2. Preliminaries

The objective of this paper is to design a sequential Bayesian prediction algorithm and an adaptive sampling algorithm for mobile sensing agents to consistently and accurately predict a spatial field of interest. In what follows, we specify the models for the spatial field and the mobile sensor network.

2.1. Spatial field model

Fig. 1. Example of a spatial field. Much finer grids will be used for real applications.

Fig. 2. Elements of the precision matrix Q related to a single location.

Let F* ⊂ R^d denote the spatial field of interest. We discretize the field into n* square areas, whose centers are denoted by S* := {s_1, ..., s_{n*}}. Let z* = (z_1, ..., z_{n*})^T ∈ R^{n*} be the collection of values of the field (e.g., the temperature) on the spatial sites in S*. Due to the irregular shape a spatial field may have, we extend the field to F such that n ≥ n* sites denoted by S := {s_1, ..., s_n} are on a regular grid, as can be seen in Fig. 1. Notice that we have S* ⊆ S. The latent variable z_i = z(s_i) ∈ R is modeled by

z_i = µ(s_i) + η_i,  ∀ 1 ≤ i ≤ n,   (1)

where s_i ∈ S ⊂ R^d is the i-th site location. The mean function µ : R^d → R is defined as

µ(s_i) = f(s_i)^T β,

where f(s_i) = (f_1(s_i), ..., f_p(s_i))^T ∈ R^p is a known regression function, and β = (β_1, ..., β_p)^T ∈ R^p is an unknown vector of regression coefficients. We define η = (η_1, ..., η_n)^T ∈ R^n as a zero-mean Gaussian Markov random field (GMRF) (Rue & Held, 2005) denoted by

η ∼ N(0, Q_{η|θ}^{-1}),

where the inverse covariance matrix (or precision matrix) Q_{η|θ} ∈ R^{n×n} is a function of a hyperparameter vector θ ∈ R^m.

There exist many different choices of the GMRF (i.e., of the precision matrix Q_{η|θ}) (Rue & Held, 2005). For instance, Fig. 2 displays one choice of the elements of the precision matrix related to a single location. The associated full conditionals are shown in (2) given in Box I (with obvious notation as shown in Rue and Held (2005)). The hyperparameter vector is defined as θ = (κ, α)^T ∈ R²_{>0}, where α = a − 4. The resulting GMRF accurately represents a Gaussian random field with the Matérn covariance function (Lindgren, Rue, & Lindström, 2011)

C(r) = σ_f² (2^{1−ν} / Γ(ν)) (√(2ν) r / ℓ)^ν K_ν(√(2ν) r / ℓ),

where K_ν(·) is a modified Bessel function (Rasmussen & Williams, 2006), with order ν = 1, a bandwidth ℓ = 1/√α, and vertical scale σ_f² = 1/(4πακ). The hyperparameter α > 0 guarantees the positive definiteness of the precision matrix Q_{η|θ}. In the case where α = 0, the resulting GMRF is a second-order polynomial intrinsic GMRF (Rue & Held, 2005; Rue, Martino, & Chopin, 2009). Notice that the precision matrix is sparse, containing only a small number of non-zero elements. This property will be exploited for fast computation in the following sections.

Fig. 3. Numerically generated spatial fields defined in (1) with µ(s_i) = β = 20, and Q_{η|θ} constructed using (2) (see Box I) with hyperparameters being (a) θ = (4, 0.0025)^T, (b) θ = (1, 0.01)^T, and (c) θ = (0.25, 0.04)^T.

Example 1. Consider a spatial field of interest F* = (0, 100) × (0, 50). We first divide the spatial field into a 100 × 50 regular grid with equal areas 1, which makes n* = 5000. We then extend the field such that 120 × 70 grids (i.e., n = 8400) are constructed on the extended field F = (−10, 110) × (−10, 60). The precision matrix Q_{η|θ} introduced above is chosen with the regular lattices wrapped on a torus (Rue & Held, 2005). (The grid is wrapped on a torus so that the initial prediction error variances are the same among all grid points.) In this case, only 0.15% of the elements in the sparse matrix Q_{η|θ} are non-zero. The numerically generated fields with the mean function µ(s_i) = β = 20 and different values of the hyperparameter vector θ = (κ, α)^T are shown in Fig. 3.
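To make the construction concrete, the stencil of Box I can be assembled directly. The coefficients below are our reading of Fig. 2 and the full conditionals in (2) — with a = α + 4, center κ(4 + a²), nearest neighbors −2aκ, diagonal neighbors 2κ, and second-order neighbors κ — so this is a sketch under that assumption, using dense storage for clarity (a field of the paper's size would require scipy.sparse):

```python
import numpy as np

def gmrf_precision(nx, ny, kappa, alpha):
    """Precision matrix of the torus-wrapped GMRF, per our reading of Box I.

    Stencil around each site (a = alpha + 4):
      center kappa*(4 + a^2), cardinal neighbors -2*a*kappa,
      diagonal neighbors 2*kappa, second-order cardinal neighbors kappa.
    """
    n = nx * ny
    a = alpha + 4.0
    Q = np.zeros((n, n))
    idx = lambda i, j: (i % nx) * ny + (j % ny)   # torus wrapping
    for i in range(nx):
        for j in range(ny):
            c = idx(i, j)
            Q[c, c] = kappa * (4.0 + a * a)
            for di, dj in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
                Q[c, idx(i + di, j + dj)] += -2.0 * a * kappa   # cardinal
            for di, dj in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
                Q[c, idx(i + di, j + dj)] += 2.0 * kappa        # diagonal
            for di, dj in [(2, 0), (-2, 0), (0, 2), (0, -2)]:
                Q[c, idx(i + di, j + dj)] += 1.0 * kappa        # second order
    return Q

# Sample a field z = mu + eta with eta ~ N(0, Q^{-1}) via Cholesky of Q:
# if Q = L L^T, then eta = L^{-T} w with w ~ N(0, I) has covariance Q^{-1}.
rng = np.random.default_rng(0)
Q = gmrf_precision(12, 10, kappa=1.0, alpha=0.01)
L = np.linalg.cholesky(Q)
eta = np.linalg.solve(L.T, rng.standard_normal(Q.shape[0]))
z = 20.0 + eta    # mu(s_i) = beta = 20 as in Example 1
```

The torus wrapping makes every row of Q a shifted copy of the same stencil, which is why the initial prediction error variances coincide at all grid points.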

2.2. Mobile sensor network

Consider N spatially distributed mobile sensing agents indexed by i ∈ I = {1, ..., N} sampling from the n* spatial sites in S*. Agents are equipped with identical sensors and sample at time t ∈ Z_{>0}. At time t, agent i takes a noise-corrupted measurement at its current location q_{t,i} ∈ S*, i.e.,

y_{t,i} = z(q_{t,i}) + ϵ_{t,i},  ϵ_{t,i} i.i.d.∼ N(0, σ_w²),

where the measurement errors are assumed to be independent and identically distributed (i.i.d.). The noise level σ_w² > 0 is assumed to be known. For notational simplicity, we denote all agents' locations at time t by q_t = (q_{t,1}^T, ..., q_{t,N}^T)^T and the observations made by all agents at time t by y_t = (y_{t,1}, ..., y_{t,N})^T. Furthermore, we denote the collection of agents' locations and the collective observations from time 1 to t by q_{1:t} = (q_1^T, ..., q_t^T)^T and y_{1:t} = (y_1^T, ..., y_t^T)^T, respectively.

3. Bayesian predictive inference

In this section, we propose a Bayesian inference approach to make predictive inferences of the spatial field z* ∈ R^{n*}.


E(η_i | η_{−i}, θ) = (1/(4 + a²)) [ 2a S1(i) − 2 S2(i) − 1 S3(i) ],   (2)

Var(η_i | η_{−i}, θ) = 1/((4 + a²) κ),

where S1(i), S2(i), and S3(i) denote the sums of η_j over the neighbors of site i marked by • in the three stencils below (site i at the center):

S1:  ◦ ◦ ◦ ◦ ◦     S2:  ◦ ◦ ◦ ◦ ◦     S3:  ◦ ◦ • ◦ ◦
     ◦ ◦ • ◦ ◦          ◦ • ◦ • ◦          ◦ ◦ ◦ ◦ ◦
     ◦ • ◦ • ◦          ◦ ◦ ◦ ◦ ◦          • ◦ ◦ ◦ •
     ◦ ◦ • ◦ ◦          ◦ • ◦ • ◦          ◦ ◦ ◦ ◦ ◦
     ◦ ◦ ◦ ◦ ◦          ◦ ◦ ◦ ◦ ◦          ◦ ◦ • ◦ ◦

Box I.

First, we assign the vector of regression coefficients β ∈ R^p a Gaussian prior, namely β ∼ N(0, T^{-1}), where the precision matrix T ∈ R^{p×p} is often chosen as a diagonal matrix with small diagonal elements when no prior information is available. Hence, the distribution of the latent variables z given β and the hyperparameter vector θ is Gaussian, i.e.,

z | β, θ ∼ N(Fβ, Q_{η|θ}^{-1}),

where F = (f(s_1), ..., f(s_n))^T ∈ R^{n×p}. For notational simplicity, we denote the full latent field of dimension n + p by x = (z^T, β^T)^T. Then, for a given hyperparameter vector θ, the distribution π(x|θ) is Gaussian, obtained by

π(x|θ) = π(z|β, θ) π(β)
       ∝ exp( −(1/2)(z − Fβ)^T Q_{η|θ} (z − Fβ) − (1/2) β^T T β )
       = exp( −(1/2) x^T Q_{x|θ} x ),

where the precision matrix Q_{x|θ} ∈ R^{(n+p)×(n+p)} is defined by

Q_{x|θ} = [ Q_{η|θ}         −Q_{η|θ} F
            −F^T Q_{η|θ}    F^T Q_{η|θ} F + T ].

By the matrix inversion lemma, the covariance matrix Σ_{x|θ} ∈ R^{(n+p)×(n+p)} can be obtained by

Σ_{x|θ} = Q_{x|θ}^{-1} = [ Q_{η|θ}^{-1} + F T^{-1} F^T    F T^{-1}
                           (F T^{-1})^T                   T^{-1} ].

At time t ∈ Z_{>0}, we have a collection of observational data y_{1:t} ∈ R^{Nt} obtained by the mobile sensing agents over time. Let A_{1:t} = (A_1, ..., A_t) ∈ R^{(n+p)×Nt}, where A_τ ∈ R^{(n+p)×N} is defined by

(A_τ)_{ij} = 1, if i ≤ n and s_i = q_{τ,j},
             0, otherwise.

Then the covariance matrix of y_{1:t} can be obtained by

R_{1:t} = A_{1:t}^T Σ_{x|θ} A_{1:t} + P_{1:t},

where P_{1:t} = σ_w² I ∈ R^{Nt×Nt}. By Gaussian process regression (Rasmussen & Williams, 2006), the full conditional distribution of x is also Gaussian, i.e.,

x | θ, y_{1:t} ∼ N(µ_{x|θ,y_{1:t}}, Σ_{x|θ,y_{1:t}}),   (3)

where

Σ_{x|θ,y_{1:t}} = Σ_{x|θ} − Σ_{x|θ} A_{1:t} R_{1:t}^{-1} A_{1:t}^T Σ_{x|θ},
µ_{x|θ,y_{1:t}} = Σ_{x|θ} A_{1:t} R_{1:t}^{-1} y_{1:t}.   (4)

The posterior distribution of the hyperparameter vector θ can be obtained via

π(θ | y_{1:t}) ∝ π(y_{1:t} | θ) π(θ),

where the log likelihood function is defined by

log π(y_{1:t} | θ) = −(1/2) y_{1:t}^T R_{1:t}^{-1} y_{1:t} − (1/2) log det R_{1:t} − (Nt/2) log 2π.   (5)

If a discrete prior on the hyperparameter vector θ is chosen with a support Θ = {θ_1, ..., θ_L}, the posterior predictive distribution π(x | y_{1:t}) can be obtained by

π(x | y_{1:t}) = Σ_{ℓ=1}^{L} π(x | θ_ℓ, y_{1:t}) π(θ_ℓ | y_{1:t}).   (6)

The predictive mean and variance then follow as

µ_{x_i|y_{1:t}} = Σ_{ℓ=1}^{L} µ_{x_i|θ_ℓ,y_{1:t}} π(θ_ℓ | y_{1:t}),
σ²_{x_i|y_{1:t}} = Σ_{ℓ=1}^{L} σ²_{x_i|θ_ℓ,y_{1:t}} π(θ_ℓ | y_{1:t}) + Σ_{ℓ=1}^{L} (µ_{x_i|θ_ℓ,y_{1:t}} − µ_{x_i|y_{1:t}})² π(θ_ℓ | y_{1:t}),   (7)

where µ_{x_i|θ_ℓ,y_{1:t}} is the i-th element of µ_{x|θ_ℓ,y_{1:t}}, and σ²_{x_i|θ_ℓ,y_{1:t}} is the i-th diagonal element of Σ_{x|θ_ℓ,y_{1:t}}.

Remark 2. The discrete prior π(θ) greatly reduces the computational complexity in that it enables the summation in (6) instead of the numerical integration that would have to be performed for a continuous prior distribution. However, the computation of the full conditional distribution π(x|θ, y_{1:t}) in (4) and the likelihood π(y_{1:t}|θ) in (5) still requires the inversion of the covariance matrix R_{1:t}, whose size grows as time t increases. Thus, the running time grows as new observations are collected, and it will soon become intractable. This problem will be addressed in the following sections thanks to the sparsity structure of the precision matrix brought by the GMRF model.
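Once the per-θ_ℓ conditional moments and log likelihoods are available, the mixture in (6)–(7) reduces to a weighted average with a stabilized softmax over the log posterior. A minimal sketch with toy inputs (the function name and data are ours, not the authors' code):

```python
import numpy as np

def bayesian_mixture(mu, var, loglik, logprior):
    """Combine per-hyperparameter predictions as in (6)-(7).

    mu, var:  (L, n) arrays of conditional means/variances per theta_ell.
    loglik:   (L,) values of log pi(y | theta_ell).
    logprior: (L,) values of log pi(theta_ell).
    """
    logw = loglik + logprior
    w = np.exp(logw - logw.max())               # stabilized unnormalized weights
    w /= w.sum()                                # posterior pi(theta_ell | y)
    mu_mix = w @ mu                             # predictive mean, shape (n,)
    var_mix = w @ var + w @ (mu - mu_mix) ** 2  # law of total variance, (7)
    return mu_mix, var_mix, w

# Toy usage: L = 2 hyperparameter values, a field of n = 3 sites.
mu = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
var = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
mu_mix, var_mix, w = bayesian_mixture(mu, var,
                                      loglik=np.array([0.0, 0.0]),
                                      logprior=np.log([0.5, 0.5]))
```

Note the second term of (7): sites where the θ_ℓ-conditional means disagree pick up extra predictive variance, which is exactly the hyperparameter uncertainty an empirical Bayes point estimate would discard.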

4. Sequential Bayesian inference

In this section, we exploit the sparsity of the precision matrix, and derive a sequential Bayesian prediction algorithm which can be performed in constant time as the number of observations increases.

4.1. Sequential update on full conditional distribution

First, we derive the update rule for the full conditional distribution in (3). From here on, we will use Q_{t|θ} := Q_{x|θ,y_{1:t}} and µ_{t|θ} := µ_{x|θ,y_{1:t}} for notational simplicity. Moreover, we define u_{t,i} ∈ R^{n+p} as

(u_{t,i})_j = 1, if j ≤ n and s_j = q_{t,i},
              0, otherwise.

We have the following propositions.

Proposition 3. The full conditional distribution in (3) can be obtained by

x | θ, y_{1:t} ∼ N(Q_{t|θ}^{-1} b_t, Q_{t|θ}^{-1}),   (8)

where

Q_{t|θ} = Q_{t−1|θ} + (1/σ_w²) Σ_{i=1}^{N} u_{t,i} u_{t,i}^T,
b_t = b_{t−1} + (1/σ_w²) Σ_{i=1}^{N} u_{t,i} y_{t,i},   (9)

Page 5: Automatica ...xuyunfei/resources/Xu_Automatica2013.pdf · 3522 Y.Xuetal./Automatica49(2013)3520–3530 Fig.1. Exampleofaspatialfield.Muchfinergridswillbeusedforrealapplications. Fig

3524 Y. Xu et al. / Automatica 49 (2013) 3520–3530

with initial conditions

Q_{0|θ} := Q_{x|θ,y_{1:0}} = Q_{x|θ}, and b_0 = 0.

Proof. The result follows from the Woodbury identity and simple algebra.

Proposition 4. The full conditional mean and variance can be updated by

µ_{t|θ} = Q_{t|θ}^{-1} b_t,
diag(Q_{t|θ}^{-1}) = diag(Q_{t−1|θ}^{-1}) − Σ_{i=1}^{N} (h_{t,i|θ} ◦ h_{t,i|θ}) / (σ_w² + u_{t,i}^T h_{t,i|θ}),   (10)

where

h_{t,i|θ} = B_{t,i|θ}^{-1} u_{t,i},
B_{t,i|θ} = Q_{t−1|θ} + (1/σ_w²) Σ_{j=1}^{i−1} u_{t,j} u_{t,j}^T.

Proof. The result follows from the Sherman–Morrison formula and simple algebra.

We then have the following lemma.

Lemma 5. For a given θ ∈ Θ, the full conditional mean and variance, i.e., µ_{t|θ} and diag(Q_{t|θ}^{-1}) in (10), can be updated in O(1) given Q_{t−1|θ} and b_{t−1}.

Proof. The updates of Q_{t|θ} and b_t can obviously be computed in constant time. Then µ_{t|θ} in (10) can be obtained by solving the linear equation Q_{t|θ} µ_{t|θ} = b_t. Due to the sparse structure of Q_{t|θ}, this operation can be done in a very short time. Moreover, notice that Q_{t|θ} and Q_{t−1|θ} have the same sparsity structure and hence the computational complexity remains fixed. Similarly, h_{t,i|θ} can be obtained in O(1) and hence diag(Q_{t|θ}^{-1}) in (10) can be obtained in O(1).
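The recursions in Propositions 3 and 4 can be sketched as follows, using dense linear algebra for clarity (the paper's point is that the solves with Q_{t|θ} are cheap because of sparsity; the function and variable names here are ours):

```python
import numpy as np

def sequential_update(Q_prev, b_prev, d_prev, U, y, sigma2_w):
    """One time step of (9)-(10): N rank-one updates via Sherman-Morrison.

    Q_prev, b_prev: Q_{t-1|theta} and b_{t-1}.
    d_prev: diag(Q_{t-1|theta}^{-1}).
    U: (n+p, N) matrix whose columns are u_{t,i};  y: (N,) vector y_t.
    """
    Q, b, d = Q_prev.copy(), b_prev.copy(), d_prev.copy()
    for i in range(U.shape[1]):
        u, yi = U[:, i], y[i]
        h = np.linalg.solve(Q, u)                 # h_{t,i|theta} = B^{-1} u
        d = d - (h * h) / (sigma2_w + u @ h)      # diagonal update in (10)
        Q = Q + np.outer(u, u) / sigma2_w         # B accumulates toward Q_t
        b = b + u * yi / sigma2_w
    mu = np.linalg.solve(Q, b)                    # mu_{t|theta} = Q_t^{-1} b_t
    return Q, b, mu, d

# Toy check on a 5-site field with 2 agents sampling sites 1 and 3.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)); Q0 = A @ A.T + 5 * np.eye(5)
U = np.zeros((5, 2)); U[1, 0] = 1.0; U[3, 1] = 1.0
Qt, bt, mu, d = sequential_update(Q0, np.zeros(5),
                                  np.diag(np.linalg.inv(Q0)),
                                  U, np.array([0.7, -0.2]), sigma2_w=0.1)
```

Solving with Q before applying the i-th rank-one update is what makes B_{t,i|θ} contain exactly the first i − 1 measurements of the time step, as required by the Sherman–Morrison telescoping.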

4.2. Sequential update on likelihood

Next, we derive the update rule for the log likelihood function. We have the following proposition.

Proposition 6. The log likelihood function log π(y_{1:t}|θ) in (5) can be obtained by

log π(y_{1:t} | θ) = c_t + g_{t|θ} + (1/2) b_t^T µ_{t|θ} − (Nt/2) log(2πσ_w²),   (11)

where

c_t = c_{t−1} − (1/(2σ_w²)) Σ_{i=1}^{N} y_{t,i}²,  c_0 = 0,
g_{t|θ} = g_{t−1|θ} − (1/2) Σ_{i=1}^{N} log(1 + (1/σ_w²) u_{t,i}^T h_{t,i|θ}),  g_{0|θ} = 0,

with h_{t,i|θ} defined in (10).

Proof. See Appendix.

The computation of the likelihood function is scalable as follows.

Lemma 7. For a given θ ∈ Θ, the log likelihood function log π(y_{1:t}|θ) can be updated in O(1).

Proof. The result follows directly from Proposition 6.
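To see (11) at work, the recursion can be checked against a direct evaluation of (5) on a small zero-mean example (taking p = 0 so that Σ_{x|θ} = Q_{x|θ}^{-1}; a sketch with illustrative names, one rank-one step per observation):

```python
import numpy as np

def direct_loglik(Q0, U, y, s2):
    """Direct evaluation of (5): y ~ N(0, U^T Q0^{-1} U + s2 I)."""
    R = U.T @ np.linalg.solve(Q0, U) + s2 * np.eye(len(y))
    return (-0.5 * y @ np.linalg.solve(R, y)
            - 0.5 * np.linalg.slogdet(R)[1]
            - 0.5 * len(y) * np.log(2.0 * np.pi))

def recursive_loglik(Q0, U, y, s2):
    """Evaluation via the scalable recursion (11)."""
    Q, b = Q0.copy(), np.zeros(Q0.shape[0])
    c = g = 0.0
    for i in range(len(y)):
        u, yi = U[:, i], y[i]
        h = np.linalg.solve(Q, u)                 # h = B^{-1} u before update
        g -= 0.5 * np.log(1.0 + (u @ h) / s2)     # det update (g_t term)
        c -= 0.5 * yi ** 2 / s2                   # quadratic update (c_t term)
        Q += np.outer(u, u) / s2
        b += u * yi / s2
    mu = np.linalg.solve(Q, b)
    return c + g + 0.5 * b @ mu - 0.5 * len(y) * np.log(2.0 * np.pi * s2)

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6)); Q0 = A @ A.T + 6 * np.eye(6)
U = np.eye(6)[:, [0, 2, 4]]                       # observations at sites 0, 2, 4
y = rng.standard_normal(3)
```

The agreement follows from the Woodbury identity applied to y^T R^{-1} y and the matrix determinant lemma applied one rank-one update at a time to log det R.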

4.3. Sequential update on predictive distribution

Combining the results in Lemmas 5 and 7 with (6) and (7), we summarize our results in the following theorem.

Theorem 8. The predictive distribution in (6) (or the predictive mean and variance in (7)) can be obtained in O(1) as time t increases.

The proposed sequential Bayesian prediction algorithm is summarized in Table 1.

5. Distributed prediction algorithm

In this section, we first briefly review distributed computation algorithms. We then propose a distributed approach to effectively implement the sequential Bayesian predictive algorithm for a special case.

5.1. Distributed computation

In this subsection, we briefly review two useful methods for converting the centralized sequential prediction algorithm in Table 1 into a distributed algorithm for a mobile sensor network.

5.1.1. Jacobi over-relaxation method

The Jacobi over-relaxation (JOR) method (Bertsekas & Tsitsiklis, 1999) provides an iterative solution of a linear system Ax = b, where A ∈ R^{N×N} is a nonsingular matrix and x, b ∈ R^N. If agent i knows row_i(A) ∈ R^N and b_i, and a_ij = (A)_ij = 0 whenever agent i and agent j are not neighbors, then the recursion is given by

x_i^{(k+1)} = (1 − h) x_i^{(k)} + (h / a_ii) ( b_i − Σ_{j∈N_i} a_ij x_j^{(k)} ).   (12)

This JOR algorithm converges to the solution of Ax = b from any initial condition if h < 2/N (Cortés, 2009). At the end of the algorithm, agent i knows the i-th element of x = A^{-1}b.

In the case where the size of the matrix A is much larger thanthe number of agents N , and each agent knows multiple rows of A,a similar algorithm can be derived.
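A centralized simulation of recursion (12) might look as follows; the system below is a small symmetric, diagonally dominant example whose sparsity pattern plays the role of the communication graph, and the step size h = 0.5 < 2/N is an illustrative choice:

```python
import numpy as np

def jor_solve(A, b, h, iters=500):
    """Jacobi over-relaxation (12): agent i updates x_i using only its own
    row of A, b_i, and its neighbors' current values x_j."""
    x = np.zeros_like(b)
    D = np.diag(A)
    for _ in range(iters):
        # b_i - sum_{j in N_i} a_ij x_j  (the diagonal term is excluded)
        r = b - (A @ x - D * x)
        x = (1.0 - h) * x + h * r / D
    return x

# Diagonally dominant system on a 4-agent cycle graph.
A = np.array([[4.0, 1.0, 0.0, 1.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [1.0, 0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
x = jor_solve(A, b, h=0.5)   # h < 2/N with N = 4
```

Each update uses only row i of A and the neighbors' values, which is why the same loop can run with one component per agent.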

5.1.2. Discrete-time average consensus

The discrete-time average consensus (DAC) algorithm provides a way to compute the arithmetic mean of the elements of a vector c ∈ R^N. Assume the communication graph is connected. If agent i knows the i-th element of c, the network can compute the arithmetic mean via the following recursion (Olfati-Saber, Fax, & Murray, 2007)

x_i^{(k+1)} = x_i^{(k)} + ϵ Σ_{j∈N_i} a_ij (x_j^{(k)} − x_i^{(k)}),   (13)

with initial condition x^{(0)} = c, where a_ij = 1 if j and i are neighbors and 0 otherwise, 0 < ϵ < 1/∆, and ∆ = max_i (Σ_{j≠i} a_ij) is the maximum degree of the network. After the algorithm converges, all nodes in the network know the average of c, i.e., Σ_{i=1}^{N} c_i / N.

In the case where the size of c is much larger than the number of agents N, and each agent knows multiple elements of c, a summation within each agent can be performed beforehand.
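A minimal sketch of the DAC recursion (13), simulated centrally for a small path graph; the particular graph, the value of `eps`, and the iteration count are illustrative assumptions.

```python
import numpy as np

def dac_step(x, neighbors, eps):
    """One synchronous round of the DAC recursion (13) on an undirected graph."""
    x_new = x.copy()
    for i, nbrs in enumerate(neighbors):
        x_new[i] = x[i] + eps * sum(x[j] - x[i] for j in nbrs)
    return x_new

# Path graph 0-1-2-3: maximum degree Delta = 2, so pick eps < 1/Delta = 0.5.
neighbors = [[1], [0, 2], [1, 3], [2]]
c = np.array([4.0, 0.0, 2.0, 6.0])
x = c.copy()
for _ in range(2000):
    x = dac_step(x, neighbors, eps=0.4)
# Every agent's value approaches the average of c, here 3.0.
```

Each agent only reads its neighbors' states, so the same update runs unchanged on a physically distributed network.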

5.2. Distributed implementation

To design a distributed version of the proposed algorithm, we make the following assumptions for a robotic sensor network with limited resources:

A1. The mean values of the field are known to be μ(s_i) = 0, i.e., p = 0 and Q_{x|θ} = Q_{η|θ} ∈ R^{n×n}.

A2. The extended spatial field containing n spatial sites is partitioned into N subregions. Each agent is in charge of one subregion as shown in Fig. 4. Notice that each agent can only move within the part of its subregion that belongs to the real spatial field F∗.


Y. Xu et al. / Automatica 49 (2013) 3520–3530 3525

Table 1. Sequential Bayesian predictive algorithm.

A3. Agent i knows certain rows of Q and elements of b, denoted by Q[i] and b[i] respectively, the indices of which correspond to the spatial sites within its subregion. If agent i and agent j are not neighbors, then the sub-matrix Q[i][j] is zero. The N agents collectively have knowledge of all rows of Q and all n elements of b.

A4. Each agent can only communicate with its neighbors.

The distributed prediction algorithm can be obtained similarly to the centralized algorithm (Table 1). In the distributed algorithm, the linear equations in Steps 7 and 14 can be solved in a distributed fashion by using the Jacobi over-relaxation method (Section 5.1.1) under assumption A4. In Step 15, the third term in the expression of the likelihood function can be easily evaluated using the discrete-time average consensus algorithm described in Section 5.1.2. At the end of the algorithm (Steps 18 and 19), each agent obtains the mean and variance of the sites within its own subregion.

Remark 9. By assumptions A2 and A3, each agent is in charge of the computation for the spatial sites within its own subregion.

Fig. 4. Partition of the extended field Q.

Therefore, the partition of the field determines the computational power required of each agent. The partitioning could be created according to the agents' computational capabilities, so that all agents finish each step in roughly the same time. Such an intelligent partitioning could be computed by a load balancing algorithm in a distributed manner, for example, the one shown in Durham, Carli, Frasca, and Bullo (2009). In this case, if an agent is equipped with a more powerful computer than the others, a larger subregion would be assigned to that agent.

6. Adaptive sampling

In the previous section, we designed a sequential Bayesian prediction algorithm and its distributed version for estimating the scalar field at time t. In this section, we propose an adaptive sampling strategy for finding the most informative sampling locations at time t + 1 for mobile sensing agents, in order to improve the quality of prediction and reduce the uncertainty in the hyperparameters simultaneously.

In our previous work (Xu et al., 2011), we proposed using the conditional entropy H(z∗|θ = θ_t, y_{1:t+1}) as an optimality criterion, where θ_t = argmax_θ π(θ|y_{1:t}) is the maximum a posteriori (MAP) estimate based on the cumulative observations up to the current time t. Although this approach greatly simplifies the computation, it does not account for the uncertainty in estimating the hyperparameter vector θ.

In this paper, we propose to use the conditional entropy H(z∗, θ|y, y_{1:t}), which represents the uncertainty remaining in both random vectors z∗ and θ after knowing the future measurements in the random vector y. Notice that the measurements y_{1:t} have already been observed and are treated as constants. It can be obtained by

H(z∗, θ|y, y_{1:t}) = H(z∗|θ, y, y_{1:t}) + H(θ|y, y_{1:t})
 = H(z∗|θ, y, y_{1:t}) + H(y|θ, y_{1:t}) + H(θ|y_{1:t}) − H(y|y_{1:t}).

Notice that we have the following Gaussian distributions (the means will not be exploited and hence are not shown here):

z∗|θ, y, y_{1:t} ∼ N(·, Σ_{z∗|θ,y,y_{1:t}}),
y|θ, y_{1:t} ∼ N(·, Σ_{y|θ,y_{1:t}}),
y|y_{1:t} ≈ N(·, Σ_{y|y_{1:t}}),

in which the last one is approximated by a Gaussian distribution with mean and variance computed as in (7). Moreover, the entropy H(θ|y_{1:t}) = c is a constant since y_{1:t} is known. Since the entropy of a multivariate Gaussian distribution has a closed-form expression (Cover & Thomas, 2006), we have

H(z∗, θ|y, y_{1:t}) = (1/2) Σ_ℓ log[ (2πe)^{n∗} det(Σ_{z∗|θ_ℓ,y,y_{1:t}}) ] π(θ_ℓ|y_{1:t})
 + (1/2) Σ_ℓ log[ (2πe)^N det(Σ_{y|θ_ℓ,y_{1:t}}) ] π(θ_ℓ|y_{1:t})
 − (1/2) log[ (2πe)^N det(Σ_{y|y_{1:t}}) ] + c.
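The closed-form Gaussian entropy used in the terms above can be sketched as follows; `gaussian_entropy` is a hypothetical helper name, and `slogdet` is used instead of `det` for numerical stability.

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy (1/2) log((2*pi*e)^n det(Sigma)) of N(mu, Sigma).
    slogdet avoids overflow/underflow of det for larger covariances."""
    Sigma = np.asarray(Sigma, dtype=float)
    n = Sigma.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

# For a 1-D Gaussian this reduces to the familiar 0.5 * log(2*pi*e*sigma^2).
Sigma = np.array([[2.0]])
h = gaussian_entropy(Sigma)
```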



Fig. 5. Posterior distributions of θ, i.e., π(θ|y_{1:t}), at (a) t = 1, (b) t = 5, and (c) t = 20 using H(z∗, θ|y, y_{1:t}) as the optimality criterion.

It can also be shown that

log det(Σ_{z∗|θ_ℓ,y,y_{1:t}}) = log det(M^{−1}(S∗)) = log det(M(−S∗)) − log det(M),

where M = Q_{t+1|θ_ℓ} and M(S∗) denotes the submatrix of M formed by the first 1 to n∗ rows and columns (recall that S∗ = {s_1, . . . , s_{n∗}}). Notice that the term log det((Q_{t+1|θ_ℓ})(−S∗)) is a constant since agents only sample at S∗. Hence, the optimal sampling locations at time t + 1 can be determined by solving the following optimization problem

q_{t+1} = argmin_{q_i ∈ R_{t,i}} H(z∗, θ|y, y_{1:t})
 = argmin_{q_i ∈ R_{t,i}} { − Σ_ℓ log det(Q_{t+1|θ_ℓ}) π(θ_ℓ|y_{1:t}) + Σ_ℓ log det(Σ_{y|θ_ℓ,y_{1:t}}) π(θ_ℓ|y_{1:t}) − log det(Σ_{y|y_{1:t}}) },

where R_{t,i} = {s | ‖s − q_{t,i}‖ ≤ r, s ∈ S∗} (in which r ∈ R_{>0} is the maximum distance an agent can move between time instances) is the reachable set of agent i at time t. This combinatorial optimization problem can be solved using a greedy algorithm, i.e., finding sub-optimal sampling locations for the agents in sequence.
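The greedy strategy can be sketched as below. The entropy criterion itself is abstracted into a placeholder `cost` function (an assumption of this sketch, since evaluating H(z∗, θ|y, y_{1:t}) requires the full GMRF machinery); only the sequential selection and the reachable-set constraint are shown.

```python
import itertools
import numpy as np

def greedy_sampling(current_pos, sites, r, cost):
    """Greedy surrogate for the combinatorial problem: agents choose their
    next locations in sequence rather than jointly. `cost` maps a list of
    chosen locations to a scalar criterion value to be minimized."""
    chosen = []
    for q in current_pos:
        # Reachable set R_{t,i}: candidate sites within distance r of agent i.
        reachable = [s for s in sites
                     if np.linalg.norm(np.subtract(s, q)) <= r]
        # Pick the reachable site minimizing the criterion given earlier picks.
        best = min(reachable, key=lambda s: cost(chosen + [s]))
        chosen.append(best)
    return chosen

# Toy example: two agents on a 5x5 grid; the placeholder criterion simply
# rewards mutually distant picks (it stands in for the entropy criterion).
sites = list(itertools.product(range(5), range(5)))
agents = [(0, 0), (4, 4)]

def cost(locs):
    if len(locs) < 2:
        return 0.0
    return -sum(np.linalg.norm(np.subtract(a, b))
                for a, b in itertools.combinations(locs, 2))

picks = greedy_sampling(agents, sites, r=2.0, cost=cost)
```

Each greedy step scans only one agent's reachable set, so the cost grows linearly in the number of agents instead of combinatorially.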

7. Simulation

In this section, we demonstrate the effectiveness of the proposed sequential Bayesian inference algorithm and the adaptive sampling strategy through a numerical experiment.

Consider the spatial field introduced in Example 1. The mean function is a constant β = 20. We choose the precision matrix Q_{x|θ} with hyperparameters α = 0.01, equivalent to a bandwidth ℓ = 1/√α = 10, and κ = 1, equivalent to a vertical scale σ_f^2 = 1/(4πακ) ≈ 8. The numerically generated field is shown in Fig. 6(a). The precision T of β is chosen to be 10^{−4}. The measurement noise level σ_w = 0.2 is assumed to be known. The support of the prior distribution is chosen to be α ∈ {0.000625, 0.0025, 0.01, 0.04, 0.16} and κ ∈ {0.0625, 0.25, 1, 4, 16}. Notice that the support for α is chosen such that the bandwidth is contained within 2.5 and 40, which is very likely for a field F of size 100 × 50, and the selected κ together with α cover a wide range of σ_f^2. A discrete uniform prior distribution is chosen. N = 5 mobile sensing agents take measurements at times t ∈ Z_{>0}, starting from the locations shown in Fig. 6(b) (as white dots). The maximum distance each agent can travel between time instances is chosen to be r = 5.

Fig. 6 shows the predicted fields and the prediction error variances at times t = 1, 5, 20. It can be seen that the agents try to cover the field of interest as time evolves. The predicted field (the predictive mean) gets closer to the true field (see Fig. 6(a)) and the prediction error variances become smaller as more observations are collected. Fig. 5 shows the posterior distribution of the hyperparameters in θ. Clearly, as more measurements are obtained, this posterior distribution becomes peaked at the true value (1, 0.01). Fig. 7(a) shows the predicted distribution of the estimated mean β as time evolves. In Fig. 7(b), we can see that the RMS error computed via

rms(t) = sqrt( (1/n∗) Σ_{i=1}^{n∗} ( μ_{z_i|y_{1:t}} − z_i )^2 )

decreases as time increases, which shows the effectiveness of the proposed scheme.
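The RMS error above is a direct root-mean-square over the n∗ sites of interest; a one-function transcription with hypothetical argument names:

```python
import numpy as np

def rms_error(pred_mean, true_field):
    """RMS prediction error rms(t) over the n* sites of interest:
    sqrt((1/n*) sum_i (mu_{z_i|y_{1:t}} - z_i)^2)."""
    pred_mean = np.asarray(pred_mean, dtype=float)
    true_field = np.asarray(true_field, dtype=float)
    return np.sqrt(np.mean((pred_mean - true_field) ** 2))

# e.g. rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]) equals sqrt(4/3)
```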

Remark 10. For comparison, we also ran the same simulation under the same conditions except using H(z∗|θ_t, y, y_{1:t}) as the optimality criterion instead. The estimated β and the RMS error are shown in Fig. 9. The estimated θ at time t = 5 is shown in Fig. 8. Compared to Fig. 5(b), we can see that the results using H(z∗, θ|y, y_{1:t}) as the optimality criterion are slightly better in the sense that θ converges to the true value faster. The RMS errors for the two criteria are not much different for our simulation setup, as illustrated by Figs. 7(b) and 9(b). These observations from the two simulation results are expected since both criteria minimize the uncertainty in z∗, while the proposed criterion H(z∗, θ|y, y_{1:t}) minimizes the uncertainty in θ in addition. Using H(z∗, θ|y, y_{1:t}) could potentially be more useful when the uncertainty level in θ is large.

To show the robustness of the proposed algorithm with respect to the selection of the support (the discrete prior distribution), we implemented the algorithm using exactly the same support for the hyperparameters. Notice that this time the field is generated with α = 0.02 and κ = 0.5, which are not on the discrete support. The root mean square error is shown in Fig. 10. As can be seen, even if the true hyperparameter vector is not included in the discrete prior, the prediction is comparable.

Next, we applied our algorithm to a general case in which the true field is generated according to

z(j) = 20 + 10 exp( −‖s(j) − [30, 20]^T‖ / 20 ) + 8 exp( −‖s(j) − [80, 40]^T‖ / 15 ).


Fig. 6. The true field is shown in (a). Predicted fields at (b) t = 1, (d) t = 5, and (f) t = 20. Prediction error variances at (c) t = 1, (e) t = 5, and (g) t = 20. This result is obtained by using the optimality criterion H(z∗, θ|y, y_{1:t}). The trajectories of the agents are shown as white circles with the current locations shown as white dots.


Fig. 7. (a) Estimated β, and (b) root mean square error, using H(z∗, θ|y, y_{1:t}) as the optimality criterion.

As we can see in Fig. 11, the prediction is reasonably good with the same discrete prior distribution.

The most important contribution is that the computation time at each time step does not grow as the number of measurements


Fig. 8. Posterior distributions of θ, i.e., π(θ|y_{1:t}), at t = 5 using H(z∗|θ_t, y, y_{1:t}) as the optimality criterion.

increases. For this illustrative simulation example, this fixed running time using Matlab R2009b (MathWorks) on a Mac (2.4 GHz Intel Core 2 Duo processor) is about 30 s, which is fast enough for real-world implementation.

8. Conclusion

We have discussed the problem of predicting a large scale spatial field using successive noisy measurements obtained by a multi-agent system. We modeled the spatial field of interest using a GMRF and designed a sequential prediction algorithm for computing the exact predictive inference from a Bayesian point of view. The proposed algorithm is computationally efficient and scalable as the number of measurements increases. A distributed prediction algorithm was also developed for a special case such that the prediction can be computed without a central station. We designed an adaptive sampling algorithm for agents to find sub-optimal sampling locations in order to minimize the prediction error and reduce the uncertainty in the hyperparameters simultaneously. The illustrative simulation results suggest that our computationally efficient algorithms can be used by robotic sensors under realistic conditions, e.g., large surveillance regions. Future work will consider approximate Bayesian inference with a continuous prior on the hyperparameter vector.

Fig. 10. The root mean square error for the robustness test, using H(z∗|θ_t, y, y_{1:t}) as the optimality criterion.

Acknowledgments

The authors thank the associate editor and the anonymous reviewers for their valuable comments and suggestions.

Appendix. Proof of Proposition 6

Proof. The inverse of the covariance matrix R_{1:t} can be obtained by

R_{1:t}^{−1} = ( A_{1:t}^T Q_{0|θ}^{−1} A_{1:t} + P_{1:t} )^{−1}
 = P_{1:t}^{−1} − P_{1:t}^{−1} A_{1:t}^T ( Q_{0|θ} + A_{1:t} P_{1:t}^{−1} A_{1:t}^T )^{−1} A_{1:t} P_{1:t}^{−1}
 = P_{1:t}^{−1} − P_{1:t}^{−1} A_{1:t}^T Q_{t|θ}^{−1} A_{1:t} P_{1:t}^{−1}.
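The matrix inversion lemma step above can be checked numerically on toy matrices; all names and sizes below are stand-ins for the paper's quantities, not the actual A_{1:t}, Q_{0|θ}, and P_{1:t}.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                               # toy sizes: field sites, measurements
A = rng.standard_normal((n, m))           # stand-in for A_{1:t}
Q0 = np.eye(n) + 0.1 * np.ones((n, n))    # SPD stand-in for Q_{0|theta}
sigma_w2 = 0.04
P = sigma_w2 * np.eye(m)                  # stand-in for the noise covariance P_{1:t}

# Direct inverse of R = A^T Q0^{-1} A + P ...
R_inv_direct = np.linalg.inv(A.T @ np.linalg.inv(Q0) @ A + P)

# ... versus the matrix-inversion-lemma form used in the proof:
# P^{-1} - P^{-1} A^T (Q0 + A P^{-1} A^T)^{-1} A P^{-1}
P_inv = np.linalg.inv(P)
middle = np.linalg.inv(Q0 + A @ P_inv @ A.T)
R_inv_lemma = P_inv - P_inv @ A.T @ middle @ A @ P_inv
```

The lemma trades an m × m inverse for an n × n one; in the paper's setting the payoff comes from the sparsity of the precision matrix on the n-site field.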

Similarly, the log determinant of the covariance matrix R_{1:t} can be obtained by

log det R_{1:t} = log det( A_{1:t}^T Q_{0|θ}^{−1} A_{1:t} + P_{1:t} )
 = log det( I + (1/σ_w^2) A_{1:t}^T Q_{0|θ}^{−1} A_{1:t} ) + Nt log σ_w^2
 = log det( Q_{0|θ} + (1/σ_w^2) Σ_{τ=1}^t Σ_{i=1}^N u_{τ,i} u_{τ,i}^T ) − log det(Q_{0|θ}) + Nt log σ_w^2
 = Σ_{τ=1}^t log( 1 + u_τ^T Q_{τ−1|θ}^{−1} u_τ ) + Nt log σ_w^2.
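The last equality rests on the matrix determinant lemma, det(Q + u u^T) = det(Q)(1 + u^T Q^{−1} u), applied one rank-one update at a time. The following sketch verifies the resulting telescoping sum on toy data (the σ_w scaling is omitted for simplicity; names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 5, 3
Q = 2.0 * np.eye(n)                # SPD stand-in for Q_{0|theta}
us = rng.standard_normal((t, n))   # toy measurement vectors u_tau

# Accumulate log det(Q + sum_tau u u^T) one rank-one update at a time:
# det(Q_tau + u u^T) = det(Q_tau) * (1 + u^T Q_tau^{-1} u).
logdet0 = np.linalg.slogdet(Q)[1]
Q_tau = Q.copy()
increments = 0.0
for u in us:
    increments += np.log(1.0 + u @ np.linalg.solve(Q_tau, u))
    Q_tau = Q_tau + np.outer(u, u)

# Direct evaluation for comparison: us.T @ us equals sum_tau u_tau u_tau^T.
direct = np.linalg.slogdet(Q + us.T @ us)[1]
```

Each increment needs only a linear solve against the current precision matrix, which is what keeps the sequential update cheap.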


Fig. 9. (a) Estimated β, and (b) root mean square error, using H(z∗|θ_t, y, y_{1:t}) as the optimality criterion.



Fig. 11. The simulation results for a general field. The true field is shown in (a). Predicted fields at (b) t = 1, (d) t = 5, and (f) t = 40. Prediction error variances at (c) t = 1, (e) t = 5, and (g) t = 40. This result is obtained by using the optimality criterion H(z∗, θ|y, y_{1:t}). The trajectories of the agents are shown as white circles with the current locations shown as white dots.

Hence, we have

log π(y_{1:t}|θ) = −(1/2) y_{1:t}^T R_{1:t}^{−1} y_{1:t} − (1/2) log det R_{1:t} − (Nt/2) log 2π
 = −(1/2) y_{1:t}^T P_{1:t}^{−1} y_{1:t} + (1/2) b_t^T μ_{t|θ} − (1/2) Σ_{τ=1}^t Σ_{i=1}^N log( 1 + u_{τ,i}^T B_{τ,i|θ}^{−1} u_{τ,i} ) − (Nt/2) log(2πσ_w^2),

which completes the proof.

References

Bertsekas, D. P., & Tsitsiklis, J. N. (1999). Parallel and distributed computation: numerical methods. Englewood Cliffs: Prentice Hall.

Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.

Choi, J., Oh, S., & Horowitz, R. (2009). Distributed learning and cooperative control for multi-agent systems. Automatica, 45, 2802–2814.

Cortés, J. (2009). Distributed kriged Kalman filter for spatial estimation. IEEE Transactions on Automatic Control, 54(12), 2816–2827.

Cover, T. M., & Thomas, J. A. (2006). Elements of information theory (2nd ed.). Hoboken, New Jersey: John Wiley & Sons, Inc.

Cressie, N. (1986). Kriging nonstationary data. Journal of the American Statistical Association, 81(395), 625–634.

Cressie, N., & Verzelen, N. (2008). Conditional-mean least-squares fitting of Gaussian Markov random fields to Gaussian fields. Computational Statistics & Data Analysis, 52(5), 2794–2807.

Durham, J. W., Carli, R., Frasca, P., & Bullo, F. (2009). Discrete partitioning and coverage control with gossip communication. In ASME dynamic systems and control conference (pp. 225–232).

Gaudard, M., Karson, M., Linder, E., & Sinha, D. (1999). Bayesian spatial prediction. Environmental and Ecological Statistics, 6(2), 147–171.

Graham, R., & Cortés, J. (2009). Cooperative adaptive sampling of random fields with partially known covariance. International Journal of Robust and Nonlinear Control, 1, 1–2.

Graham, R., & Cortés, J. (2012). Adaptive information collection by robotic sensor networks for spatial estimation. IEEE Transactions on Automatic Control, 57(6), 1404–1419.

Hartman, L., & Hössjer, O. (2008). Fast kriging of large data sets with Gaussian Markov random fields. Computational Statistics & Data Analysis, 52(5), 2331–2349.

Herbrich, R., Lawrence, N. D., & Seeger, M. (2002). Fast sparse Gaussian process methods: the informative vector machine. Advances in Neural Information Processing Systems, 609–616.


Krause, A., Singh, A., & Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 9, 235–284.

Le Ny, J., & Pappas, G. J. (2010). On trajectory optimization for active sensing in Gaussian process models. In Proceedings of the 48th IEEE conference on decision and control (pp. 6286–6292).

Le Ny, J., & Pappas, G. J. (2013). Adaptive deployment of mobile robotic networks. IEEE Transactions on Automatic Control, 58(3), 654–666.

Leonard, N. E., Paley, D. A., Lekien, F., Sepulchre, R., Fratantoni, D. M., & Davis, R. E. (2007). Collective motion, sensor networks, and ocean sampling. Proceedings of the IEEE, 95(1).

Lindgren, F., Rue, H., & Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B, 73(4), 423–498.

Lynch, K. M., Schwartz, I. B., Yang, P., & Freeman, R. A. (2008). Decentralized environmental modeling by mobile sensor networks. IEEE Transactions on Robotics, 24(3), 710–724.

Olfati-Saber, R., Fax, J. A., & Murray, R. M. (2007). Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1), 215–233.

Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Cambridge, Massachusetts: The MIT Press.

Rue, H., & Held, L. (2005). Gaussian Markov random fields: theory and applications. Chapman & Hall.

Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319–392.

Rue, H., & Tjelmeland, H. (2002). Fitting Gaussian Markov random fields to Gaussian fields. Scandinavian Journal of Statistics, 29(1), 31–49.

Seeger, M. (2003). Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations (Ph.D. thesis). School of Informatics, University of Edinburgh.

Smola, A. J., & Bartlett, P. (2001). Sparse greedy Gaussian process regression. Advances in Neural Information Processing Systems, 13.

Stanković, M. S., & Stipanović, D. M. (2010). Extremum seeking under stochastic noise and applications to mobile sensors. Automatica, 46(8), 1243–1251.

Tresp, V. (2000). A Bayesian committee machine. Neural Computation, 12(11), 2719–2741.

Varagnolo, D., Pillonetto, G., & Schenato, L. (2012). Distributed parametric and nonparametric regression with on-line performance bounds computation. Automatica, 48(10), 2468–2481.

Williams, C. K. I., & Seeger, M. (2001). Using the Nyström method to speed up kernel machines. Advances in Neural Information Processing Systems, 13.

Xu, Y., & Choi, J. (2011). Adaptive sampling for learning Gaussian processes using mobile sensor networks. Sensors, 11(3), 3051–3066.

Xu, Y., & Choi, J. (2012a). Spatial prediction with mobile sensor networks using Gaussian processes with built-in Gaussian Markov random fields. Automatica, 48(8), 1735–1740.

Xu, Y., & Choi, J. (2012b). Stochastic adaptive sampling for mobile sensor networks using kernel regression. International Journal of Control, Automation and Systems, 10(4), 778–786.

Xu, Y., Choi, J., Dass, S., & Maiti, T. (2011). Bayesian prediction and adaptive sampling algorithms for mobile sensor networks. In Proceedings of the 2011 American control conference (ACC) (pp. 4095–4200). San Francisco, CA.

Xu, Y., Choi, J., Dass, S., & Maiti, T. (2012a). Efficient Bayesian spatial prediction with mobile sensor networks using Gaussian Markov random fields. In American control conference (ACC), 2012 (pp. 2171–2176).

Xu, Y., Choi, J., Dass, S., & Maiti, T. (2012b). Sequential Bayesian prediction and adaptive sampling algorithms for mobile sensor networks. IEEE Transactions on Automatic Control, 57(8), 2078–2084.

Xu, Y., Choi, J., & Oh, S. (2011). Mobile sensor network navigation using Gaussian processes with truncated observations. IEEE Transactions on Robotics, 27(6), 1118–1131.

Zhang, C., Siranosian, A., & Krstić, M. (2007). Extremum seeking for moderately unstable systems and for autonomous vehicle target tracking without position measurements. Automatica, 43(10), 1832–1839.

Yunfei Xu received his Ph.D. in Mechanical Engineering from Michigan State University in 2011. He also received his B.S. and M.S. degrees in Automotive Engineering from Tsinghua University, Beijing, China, in 2004 and 2007, respectively. Currently, he is a Research Fellow with the Department of Mechanical Engineering at Michigan State University. His current research interests include environmental adaptive sampling algorithms, Gaussian processes, and statistical learning algorithms with applications to robotics and mobile sensor networks. His papers were finalists for the Best Student Paper Award at the Dynamic Systems and Control Conference (DSCC) 2011 and 2012.

Jongeun Choi received his Ph.D. and M.S. degrees in Mechanical Engineering from the University of California at Berkeley in 2006 and 2002, respectively. He also received a B.S. degree in Mechanical Design and Production Engineering from Yonsei University, Seoul, Republic of Korea, in 1998. He is currently an Associate Professor with the Departments of Mechanical Engineering and Electrical and Computer Engineering at Michigan State University. His current research interests include systems and control, system identification, and Bayesian methods, with applications to mobile robotic sensors, environmental adaptive sampling, engine control, neuromusculoskeletal systems, and biomedical problems. He is an Associate Editor for the Journal of Dynamic Systems, Measurement and Control. His papers were finalists for the Best Student Paper Award at the 24th American Control Conference (ACC) 2005 and the Dynamic Systems and Control Conference (DSCC) 2011 and 2012. He was a recipient of an NSF CAREER Award in 2009. Dr. Choi is a member of ASME.

Sarat Dass received his Ph.D. and M.S. degrees in Statistics from Purdue University at West Lafayette, Indiana, US, in 1998 and 1995, respectively. He also received a B.Stat. (Hons) degree in Statistics from the Indian Statistical Institute in 1993. His current research interests include statistical inference for dynamical systems, statistical pattern recognition and image processing, and Bayesian methods with applications to various fields of engineering. He is an Associate Editor for Sankhya B, the Journal of the Indian Statistical Institute. He received the Outstanding Statistical Application Award in 2012 from the American Statistical Association (ASA) for his paper entitled ''Assessing Fingerprint Individuality: A Case Study in Spatially Dependent Marked Processes''. The paper also received the Wilcoxon Award from Technometrics. Dr. Dass is a member of ASA and ISBA.

Tapabrata Maiti is a world class statistician and a fellow of the American Statistical Association and the Institute of Mathematical Statistics. He has published research articles in top tier statistics journals such as the Journal of the American Statistical Association, Annals of Statistics, the Journal of the Royal Statistical Society, Series B, Biometrika, Biometrics, etc. He has also published research articles in engineering, economics, genetics, medicine, and the social sciences. His research has been supported by the National Science Foundation and the National Institutes of Health. He has presented his work at numerous national and international meetings and in academic departments. Prof. Maiti has served on the editorial boards of several statistics journals, including the Journal of the American Statistical Association and the Journal of Agricultural, Biological and Environmental Statistics. He has also served on several professional committees. Currently, he is a professor and the graduate director in the Department of Statistics and Probability, Michigan State University. Prior to MSU, he was a tenured faculty member in the Department of Statistics, Iowa State University. Professor Maiti has supervised several Ph.D. students and regularly teaches statistics and non-statistics-major graduate students.