Page 1:

Near-Optimal Sensor Placements in Gaussian Processes

Carlos Guestrin, Andreas Krause, Ajit Singh

Carnegie Mellon University

Page 2:

Sensor placement applications

Monitoring of spatial phenomena: temperature, precipitation, drilling oil wells, ...

Active learning, experimental design, ...

Results today are not limited to 2 dimensions

Precipitation data from Pacific NW

[Figure: office floor plan with 54 numbered sensor locations (server room, lab, kitchen, offices, ...)]

Temperature data from sensor network

Page 3:

Deploying sensors

[Figure: office floor plan with 54 numbered sensor locations]

This deployment: evenly distributed sensors

But what are the optimal placements? I.e., solving a combinatorial (non-myopic) optimization

Chicken-and-egg problem: with no data or assumptions about the distribution, we don't know where to place the sensors.

Placement under strong assumptions considered in:

Computer science(c.f., [Hochbaum & Maass ’85])

Spatial statistics(c.f., [Cressie ’91])

Page 4:

Strong assumption – Sensing radius

Node predicts values of positions within some radius

Becomes a covering problem

[Figure: office floor plan covered by circular sensing regions]

Problem is NP-complete, but there are good approximation algorithms (PTAS) [Hochbaum & Maass '85]

Unfortunately, this approach is usually not useful… the assumption is wrong on real data!

For example…

Page 5:

Spatial correlation

[Figure: office floor plan with 54 numbered sensor locations]

Precipitation data from Pacific NW

Non-local, Non-circular correlations

[Contour plot of spatial correlations on the temperature data, levels 0.5 to 1.0]

Complex positive and negative correlations

[Contour plots of precipitation correlations over the Pacific NW (latitude 43-48, longitude -124 to -117), levels -0.15 to 0.4]

Page 6:

Complex, noisy correlations

Complex, uneven sensing “region”

Actually, noisy correlations, rather than sensing region

Page 7:

Combining multiple sources of information

Individually, sensors are bad predictors
Combined information is more reliable
How do we combine information?

Focus of spatial statistics

Temp here?

Page 8:

Gaussian process (GP) - Intuition

[Figure: GP over temperature; x-axis: position, y-axis: temperature]

GP – non-parametric; represents uncertainty; complex correlation functions (kernels)

less sure here

more sure here

Uncertainty after observations are made

Page 9:

Gaussian processes

[Figure: office floor plan with 54 sensors; posterior mean temperature and posterior variance surfaces]

Kernel function: Prediction after observing set of sensors A:
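For reference, a standard way to write the GP prediction the slide refers to (assuming a zero-mean GP; Σ denotes covariances built from the kernel K, and x_A the observed values at the sensors A):

$$\mu_{y \mid \mathcal{A}} = \Sigma_{y\mathcal{A}} \, \Sigma_{\mathcal{A}\mathcal{A}}^{-1} \, x_{\mathcal{A}},
\qquad
\sigma^2_{y \mid \mathcal{A}} = \mathcal{K}(y, y) - \Sigma_{y\mathcal{A}} \, \Sigma_{\mathcal{A}\mathcal{A}}^{-1} \, \Sigma_{\mathcal{A}y}$$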

Page 10:

Gaussian processes for sensor placement

[Figure: office floor plan with 54 sensors; posterior mean temperature and posterior variance surfaces]

Goal: Find sensor placement with least uncertainty after observations

Problem is still NP-complete – need approximation

Page 11:

Non-myopic placements

Consider myopically selecting: the most uncertain location first, then the most uncertain given A1, ..., then the most uncertain given A1 ... Ak-1.

This can be seen as an attempt to non-myopically maximize
H(A1) + H(A2 | {A1}) + ... + H(Ak | {A1 ... Ak-1})

This is exactly the joint entropy H(A) = H({A1 ... Ak})

Page 12:

Entropy criterion (c.f., [Cressie ’91])

A ← ∅
For i = 1 to k:
  Add location Xi to A, s.t. Xi = argmax_X H(X | A)

Entropy: high uncertainty given current set A – X is different
"Wasted" information, as observed by [O'Hagan '78]

Temperature data placements: Entropy

Uncertainty (entropy) plot


Entropy places sensors along borders

Entropy criterion wastes information [O'Hagan '78]; it is indirect, doesn't consider the sensing region, and has no formal non-myopic guarantees

Page 13:

Proposed objective function: Mutual information

Locations of interest V; find locations A ⊆ V maximizing mutual information:

Intuitive greedy rule:

High uncertainty given A – X is different
Low uncertainty given rest – X is informative

Uncertainty of uninstrumented locations after sensing
Uncertainty of uninstrumented locations before sensing

Intuitive criterion – locations that are both different and informative

We give formal non-myopic guarantees
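Written out (a reconstruction of the formulas shown on this slide, in the same notation), the objective and the greedy selection rule are:

$$MI(\mathcal{A}) = I(\mathcal{A};\, \mathcal{V} \setminus \mathcal{A}) = H(\mathcal{V} \setminus \mathcal{A}) - H(\mathcal{V} \setminus \mathcal{A} \mid \mathcal{A})$$

$$X^* = \arg\max_{X \notin \mathcal{A}} \; H(X \mid \mathcal{A}) - H\big(X \mid \mathcal{V} \setminus (\mathcal{A} \cup \{X\})\big)$$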

Temperature data placements: Entropy vs. mutual information

Page 14:

An important observation

[Figure: five sensor locations T1-T5]

Selecting T1 tells us something about T2 and T5
Selecting T3 tells us something about T2 and T4
Now adding T2 would not help much

In many cases, new information is worth less if we know more (diminishing returns)!

Page 15:

Submodular set functions

Submodular set functions are a natural formalism for this idea:
f(A ∪ {X}) − f(A) ≥ f(B ∪ {X}) − f(B)   for A ⊆ B

Maximization of SFs is NP-hard. But…

Page 16:

Theorem [Nemhauser et al. '78]: The greedy algorithm guarantees a (1 − 1/e) OPT approximation for monotone SFs, i.e.,
F(A_greedy) ≥ (1 − 1/e) max_{|A| ≤ k} F(A)   (~ 63% of optimal)

How can we leverage submodularity?
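For concreteness, a minimal sketch (in Python, with a hypothetical set function `f`) of the greedy algorithm the theorem analyzes:

    def greedy_max(f, ground_set, k):
        """Greedy maximization of a monotone submodular set function f.

        Nemhauser et al. '78: f(A_greedy) >= (1 - 1/e) * max_{|A| <= k} f(A).
        """
        A = set()
        for _ in range(k):
            # Pick the element with the largest marginal gain f(A + x) - f(A).
            best = max((x for x in ground_set if x not in A),
                       key=lambda x: f(A | {x}) - f(A))
            A.add(best)
        return A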


Page 18:

Mutual information and submodularity

Mutual information is submodular: F(A) = I(A; V\A)
So, we should be able to use Nemhauser et al.

Mutual information is not monotone!!!
Initially, adding a sensor increases MI; later, adding sensors decreases MI
F(∅) = I(∅; V) = 0,  F(V) = I(V; ∅) = 0,  F(A) ≥ 0

[Figure: mutual information vs. number of sensors; F = 0 at A = ∅ and at A = V]

Even though MI is submodular, can't apply Nemhauser et al.

Or can we…

Page 19:

Approximate monotonicity of mutual information

If H(X|A) − H(X|V\A) ≥ 0, then MI is monotonic
Solution: add a grid Z of unobservable locations
If H(X|A) − H(X|Z ∪ V\A) ≥ 0, then MI is monotonic

[Figure: candidate location X, observed set A, remaining locations V\A, unobservable grid Z]

H(X|A) << H(X|V\A): MI not monotonic
For a sufficiently fine Z: H(X|A) > H(X|Z ∪ V\A) − ε, so MI is approximately monotonic

Page 20:

Theorem: Mutual information sensor placement

Greedy MI algorithm provides a constant-factor approximation: placing k sensors, ∀ε > 0:

MI(A_greedy) ≥ (1 − 1/e) (OPT − k ε)

where OPT is the optimal non-myopic solution, A_greedy is the result of our algorithm, and (1 − 1/e) is the constant factor.

Approximate monotonicity holds for a sufficiently fine discretization: poly(1/ε, k, σ, L, M), where σ – sensor noise, L – Lipschitz constant of the kernels, M – max_X K(X,X)

Page 21:

Different costs for different placements

Theorem 1: Constant-factor approximation of optimal locations – select k sensors

Theorem 2 (cost-sensitive placements): In practice, different locations may have different costs
Corridor versus inside wall
Have a budget B to spend on placing sensors
Constant-factor approximation – same constant (1 − 1/e)
Slightly more complicated than the greedy algorithm [Sviridenko / Krause, Guestrin]

Page 22:

Deployment results

"True" temp. prediction
"True" temp. variance

Used initial deployment to select 22 new sensors; learned a new GP on test data using just these sensors

Posterior mean
Posterior variance

Entropy criterion vs. mutual information criterion

Mutual information gives 3 times lower variance than the entropy criterion

Model learned from 54 sensors

Page 23:

Comparing to other heuristics

[Figure: mutual information achieved by each method vs. number of sensors placed; higher is better]

Greedy – the algorithm we analyze
Random placements
Pairwise exchange (PE): start with some placement, swap locations while improving the solution

Our bound enables a posteriori analysis for any heuristic

Assume algorithm TUAFSPGP gives results that are 10% better than the results obtained from the greedy algorithm. Then we immediately know TUAFSPGP is within 70% of optimum!
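The arithmetic behind the 70% figure: combining the assumed 10% improvement with the greedy guarantee,

$$F(\text{TUAFSPGP}) \;\ge\; 1.1 \, F(\text{greedy}) \;\ge\; 1.1\,(1 - 1/e)\, OPT \;\approx\; 0.70\, OPT.$$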

Page 24:

Precipitation data

[Figure: precipitation data results, entropy criterion vs. mutual information; higher is better]

Page 25:

Computing the greedy rule

Exploit sparsity in kernel matrix

At each iteration, for each candidate position i ∈ {1, …, N}, we must compute the greedy MI gain.

This requires inversion of an N×N matrix – about O(N³)

Total running time for k sensors: O(kN⁴). Polynomial! But very slow in practice.
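A minimal sketch of this naive computation (hypothetical helper names; assumes the kernel has been evaluated into an N×N covariance matrix K over the discretized locations, with the sensor-noise variance already added to its diagonal):

    import numpy as np

    def cond_var(K, x, S):
        """Variance of location x given observations at index set S (Schur complement)."""
        if len(S) == 0:
            return K[x, x]
        S = list(S)
        K_SS = K[np.ix_(S, S)]
        K_xS = K[x, S]
        # O(|S|^3) solve, repeated for every candidate x at every iteration.
        return K[x, x] - K_xS @ np.linalg.solve(K_SS, K_xS)

    def greedy_mi_placement(K, k):
        """Naive greedy mutual-information placement over N candidate locations."""
        N = K.shape[0]
        A = []
        for _ in range(k):
            best_x, best_gain = None, -np.inf
            for x in range(N):
                if x in A:
                    continue
                rest = [j for j in range(N) if j != x and j not in A]
                # MI gain (up to a factor 1/2): log var(x|A) - log var(x|rest).
                gain = np.log(cond_var(K, x, A)) - np.log(cond_var(K, x, rest))
                if gain > best_gain:
                    best_x, best_gain = x, gain
            A.append(best_x)
        return A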

Page 26:

Local kernels

Covariance matrix may have many zeros!

Each sensor location correlated with a small number of other locations

Exploiting locality: if each location is correlated with at most d others, a sparse representation and a priority-queue trick reduce the complexity from O(kN⁴) to only about O(N log N)
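One way to read the priority-queue trick: by submodularity, marginal gains only shrink as A grows, so gains cached in a max-heap are valid upper bounds and most candidates never need re-evaluation. A hypothetical sketch (the actual implementation also exploits the sparse kernel), assuming a marginal-gain function `gain(x, A)`:

    import heapq

    def lazy_greedy(gain, ground_set, k):
        """Lazy greedy selection using stale gains as upper bounds (valid by submodularity)."""
        A = []
        # Max-heap via negated gains; the stamp records when each gain was last computed.
        heap = [(-gain(x, A), x, 0) for x in ground_set]
        heapq.heapify(heap)
        for it in range(1, k + 1):
            while True:
                neg_g, x, stamp = heapq.heappop(heap)
                if stamp == it:
                    # Gain is current w.r.t. A and beats all (upper-bound) stale gains.
                    A.append(x)
                    break
                heapq.heappush(heap, (-gain(x, A), x, it))  # recompute the stale gain
        return A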


Usually, matrix is only almost sparse

Page 27:

Approximately local kernels

Covariance matrix may have many elements close to zero (e.g., Gaussian kernel), but the matrix is not sparse

What if we set them to zero? Sparse matrix, approximate solution

Theorem: Truncating small entries has only a small effect on solution quality
If |K(x,y)| ≤ ε, set it to 0. Then the quality of the placements is only O(ε) worse.
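A small sketch of the truncation step (hypothetical helper; `eps` plays the role of ε above):

    import numpy as np
    from scipy import sparse

    def truncate_kernel(K, eps):
        """Zero out entries with |K(x, y)| <= eps and store the result as a sparse matrix."""
        K_trunc = np.where(np.abs(K) <= eps, 0.0, K)
        return sparse.csr_matrix(K_trunc)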

Page 28:

Effect of truncated kernels on solution – Rain data

Improvement in running time


Effect on solution quality


About 3 times faster, minimal effect on solution quality

Page 29:

Summary

Mutual information criterion for sensor placement in general GPs
Efficient algorithms with strong approximation guarantees: (1 − 1/e) OPT − ε
Exploiting local structure improves efficiency
Superior prediction accuracy for several real-world problems
Related ideas in discrete settings presented at UAI and IJCAI this year

Effective algorithm for sensor placement and experimental design; basis for active learning

Page 30:

A note on maximizing entropy

Entropy is submodular [Ko et al. '95], but…
Function F is monotonic iff adding X cannot hurt: F(A ∪ X) ≥ F(A)

Remark: Entropy in GPs is not monotonic (not even approximately)
H(A ∪ X) − H(A) = H(X|A); as the discretization becomes finer, H(X|A) → −∞

Nemhauser et al.'s analysis for submodular functions is not applicable directly to entropy

Page 31:

How do we predict temperatures at unsensed locations?

[Figure: temperature vs. position at sensed locations]

Interpolation? Overfits
What about far-away points?

Page 32:

How do we predict temperatures at unsensed locations?

[Figure: regression fit; x-axis: position, y-axis: temperature]

Regression: y = a + bx + cx² + dx³ – few parameters, less overfitting

But, regression function has no notion of uncertainty!!!

How sure are we about the prediction?

less sure here

more sure here