Sequential Importance Sampling Resampling

Arnaud Doucet
Departments of Statistics & Computer Science
University of British Columbia

A.D. () 1 / 30


Sequential Importance Sampling

We use a structured IS distribution

q_n(x_{1:n}) = q_{n-1}(x_{1:n-1}) \, q_n(x_n \mid x_{1:n-1}) = q_1(x_1) \, q_2(x_2 \mid x_1) \cdots q_n(x_n \mid x_{1:n-1}),

so if X_{1:n-1}^{(i)} \sim q_{n-1}(x_{1:n-1}) then we only need to sample

X_n^{(i)} \mid X_{1:n-1}^{(i)} \sim q_n(x_n \mid X_{1:n-1}^{(i)})

to obtain X_{1:n}^{(i)} \sim q_n(x_{1:n}).

The importance weights are updated according to

w_n(x_{1:n}) = \frac{\gamma_n(x_{1:n})}{q_n(x_{1:n})} = w_{n-1}(x_{1:n-1}) \underbrace{\frac{\gamma_n(x_{1:n})}{\gamma_{n-1}(x_{1:n-1}) \, q_n(x_n \mid x_{1:n-1})}}_{\alpha_n(x_{1:n})}.
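As a sanity check, the recursive weight update can be run on a toy example; everything below (the Gaussian random-walk proposal and the product-of-standard-normals target γ_n) is an assumed illustration, not a model from the slides. Weights are propagated in log space to avoid underflow:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_steps, sigma_q = 1000, 5, 1.0
log_norm_const = np.log(sigma_q * np.sqrt(2.0 * np.pi))

def log_gamma(paths):
    # Assumed unnormalized target: independent standard normals on each coordinate.
    return -0.5 * np.sum(paths**2, axis=1)

# n = 1: X_1 ~ q_1 = N(0, sigma_q^2), w_1 = gamma_1 / q_1 (in log space).
paths = rng.normal(0.0, sigma_q, size=(N, 1))
log_q1 = -0.5 * (paths[:, 0] / sigma_q) ** 2 - log_norm_const
log_w = log_gamma(paths) - log_q1

for n in range(2, n_steps + 1):
    x_new = rng.normal(paths[:, -1], sigma_q)            # X_n ~ q_n(. | x_{n-1})
    log_q_inc = -0.5 * ((x_new - paths[:, -1]) / sigma_q) ** 2 - log_norm_const
    log_gamma_prev = log_gamma(paths)
    paths = np.column_stack([paths, x_new])
    # alpha_n = gamma_n(x_{1:n}) / (gamma_{n-1}(x_{1:n-1}) q_n(x_n | x_{1:n-1}))
    log_alpha = log_gamma(paths) - log_gamma_prev - log_q_inc
    log_w += log_alpha                                   # w_n = w_{n-1} * alpha_n

W = np.exp(log_w - log_w.max())
W /= W.sum()                                             # normalized weights W_n^{(i)}
```

Note that only the newest coordinate is sampled at each step; all earlier coordinates of each path are kept fixed.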


Sequential Importance Sampling

At time n = 1, sample X_1^{(i)} \sim q_1(\cdot) and set w_1(X_1^{(i)}) = \frac{\gamma_1(X_1^{(i)})}{q_1(X_1^{(i)})}.

At time n \geq 2,

sample X_n^{(i)} \sim q_n(\cdot \mid X_{1:n-1}^{(i)}),

compute w_n(X_{1:n}^{(i)}) = w_{n-1}(X_{1:n-1}^{(i)}) \, \alpha_n(X_{1:n}^{(i)}).

It follows that

\widehat{\pi}_n(dx_{1:n}) = \sum_{i=1}^{N} W_n^{(i)} \, \delta_{X_{1:n}^{(i)}}(dx_{1:n}), \qquad \widehat{Z}_n = \frac{1}{N} \sum_{i=1}^{N} w_n(X_{1:n}^{(i)}).
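The two estimators above are then one-liners given the particles and unnormalized weights; this sketch uses hypothetical weights and the log-sum-exp trick for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
x_last = rng.normal(size=N)            # hypothetical final coordinates X_n^{(i)}
log_w = -0.5 * rng.random(N)           # hypothetical unnormalized log-weights

# Z_n estimate: (1/N) sum_i w_n(X^{(i)}), computed stably via log-sum-exp.
m = log_w.max()
Z_hat = np.exp(m) * np.mean(np.exp(log_w - m))

# pi_n estimate: weighted empirical measure with normalized weights W_n^{(i)}.
W = np.exp(log_w - m)
W /= W.sum()
posterior_mean = np.sum(W * x_last)    # E_{pi_n}[X_n] under the particle approximation
```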


Sequential Importance Sampling for State-Space Models

State-space models:

Hidden Markov process: X_1 \sim \mu, \; X_k \mid (X_{k-1} = x_{k-1}) \sim f(\cdot \mid x_{k-1}),

Observation process: Y_k \mid (X_k = x_k) \sim g(\cdot \mid x_k).

Having received y_{1:n}, we are interested in sampling from

\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(y_{1:n})}

and in estimating p(y_{1:n}), where

\gamma_n(x_{1:n}) = p(x_{1:n}, y_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \prod_{k=1}^{n} g(y_k \mid x_k),

Z_n = p(y_{1:n}) = \int \cdots \int \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \prod_{k=1}^{n} g(y_k \mid x_k) \, dx_{1:n}.
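As a concrete (assumed) instance: μ = N(0, 1), f(·|x) = N(0.9x, 1), g(·|x) = N(x, 0.5²). Simulating the model and evaluating log γ_n(x_{1:n}) term by term:

```python
import numpy as np

rng = np.random.default_rng(2)
a, sigma_v, sigma_w, n = 0.9, 1.0, 0.5, 20

def log_normal_pdf(z, mean, sd):
    return -0.5 * ((z - mean) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))

# Simulate the hidden Markov process and the observation process.
x = np.zeros(n)
y = np.zeros(n)
x[0] = rng.normal(0.0, 1.0)                   # X_1 ~ mu = N(0, 1)
y[0] = rng.normal(x[0], sigma_w)              # Y_1 | X_1 ~ g
for k in range(1, n):
    x[k] = rng.normal(a * x[k - 1], sigma_v)  # X_k | X_{k-1} ~ f
    y[k] = rng.normal(x[k], sigma_w)          # Y_k | X_k ~ g

# log gamma_n(x_{1:n}) = log mu(x_1) + sum_k log f(x_k | x_{k-1}) + sum_k log g(y_k | x_k)
log_gamma_n = (log_normal_pdf(x[0], 0.0, 1.0)
               + np.sum(log_normal_pdf(x[1:], a * x[:-1], sigma_v))
               + np.sum(log_normal_pdf(y, x, sigma_w)))
```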


Locally Optimal Importance Distribution

The optimal IS distribution q_n(x_n \mid x_{1:n-1}) at time n, minimizing the variance of w_n(x_{1:n}), is given by

q_n^{\mathrm{opt}}(x_n \mid x_{1:n-1}) = \pi_n(x_n \mid x_{1:n-1})

and yields an incremental importance weight of the form

\alpha_n(x_{1:n}) = \frac{\gamma_n(x_{1:n-1})}{\gamma_{n-1}(x_{1:n-1})}.

For state-space models, we have

q_n^{\mathrm{opt}}(x_n \mid x_{1:n-1}) = p(x_n \mid y_n, x_{n-1}) = \frac{g(y_n \mid x_n) \, f(x_n \mid x_{n-1})}{p(y_n \mid x_{n-1})}, \qquad \alpha_n(x_{1:n}) = p(y_n \mid x_{n-1}).
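For an assumed linear-Gaussian model f(x_n|x_{n-1}) = N(a x_{n-1}, σ_v²) and g(y_n|x_n) = N(x_n, σ_w²) (my example, not from the slides), both quantities are available in closed form: the product of two Gaussians in x_n is Gaussian, and p(y_n|x_{n-1}) = N(y_n; a x_{n-1}, σ_v² + σ_w²):

```python
import numpy as np

def optimal_proposal_params(x_prev, y, a=0.9, sigma_v=1.0, sigma_w=0.5):
    # Product of two Gaussians in x_n: precisions add, means are precision-weighted.
    s2 = 1.0 / (1.0 / sigma_v**2 + 1.0 / sigma_w**2)
    m = s2 * (a * x_prev / sigma_v**2 + y / sigma_w**2)
    return m, np.sqrt(s2)

def log_incremental_weight(x_prev, y, a=0.9, sigma_v=1.0, sigma_w=0.5):
    # alpha_n = p(y_n | x_{n-1}) = N(y_n; a x_{n-1}, sigma_v^2 + sigma_w^2)
    var = sigma_v**2 + sigma_w**2
    return -0.5 * (y - a * x_prev) ** 2 / var - 0.5 * np.log(2.0 * np.pi * var)

rng = np.random.default_rng(3)
m, s = optimal_proposal_params(x_prev=1.0, y=0.3)
x_new = rng.normal(m, s)                      # X_n ~ q_opt(. | y_n, x_{n-1})
log_alpha = log_incremental_weight(1.0, 0.3)  # independent of the sampled x_new
```

Note that α_n does not depend on the newly sampled x_n, which is exactly why this choice minimizes the variance of the incremental weight.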


Summary

Sequential Importance Sampling is a special case of Importance Sampling.

Importance Sampling only works decently for moderate-size problems.

Today, we discuss how to partially fix this problem.


Resampling

Intuitive KEY idea: as the time index n increases, the variance of the unnormalized weights \{w_n(X_{1:n}^{(i)})\} tends to increase and all the mass concentrates on a few random samples/particles. We propose to reset the approximation by discarding, in a principled way, the particles with low weights W_n^{(i)} (relative to 1/N) and multiplying the particles with high weights W_n^{(i)} (relative to 1/N).

The main reason is that if a particle has a low weight at time n then typically it will still have a low weight at time n + 1 (though one can easily construct a counterexample).

You want to focus your computational effort on the "promising" parts of the space.


Multinomial Resampling

At time n, IS provides the following approximation of \pi_n(x_{1:n}):

\widehat{\pi}_n(dx_{1:n}) = \sum_{i=1}^{N} W_n^{(i)} \, \delta_{X_{1:n}^{(i)}}(dx_{1:n}).

The simplest resampling scheme consists of sampling N times \widetilde{X}_{1:n}^{(i)} \sim \widehat{\pi}_n(dx_{1:n}) to build the new approximation

\widetilde{\pi}_n(dx_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\widetilde{X}_{1:n}^{(i)}}(dx_{1:n}).

The new resampled particles \{\widetilde{X}_{1:n}^{(i)}\} are approximately distributed according to \pi_n(x_{1:n}) but statistically dependent, which makes them theoretically more difficult to study.
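Multinomial resampling is a one-line draw of ancestor indices; a sketch with hypothetical particles and weights:

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Draw N ancestor indices i.i.d. from the weighted particle approximation."""
    N = len(weights)
    idx = rng.choice(N, size=N, p=weights, replace=True)
    return particles[idx]

rng = np.random.default_rng(4)
particles = rng.normal(size=(1000, 3))   # hypothetical paths X_{1:n}^{(i)}, n = 3
w = rng.random(1000)                     # unnormalized weights
W = w / w.sum()                          # normalized weights W_n^{(i)}
resampled = multinomial_resample(particles, W, rng)
# After resampling, every surviving path carries equal weight 1/N.
```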


Note that we can rewrite

\widetilde{\pi}_n(dx_{1:n}) = \sum_{i=1}^{N} \frac{N_n^{(i)}}{N} \, \delta_{X_{1:n}^{(i)}}(dx_{1:n})

where (N_n^{(1)}, \ldots, N_n^{(N)}) \sim \mathcal{M}(N; W_n^{(1)}, \ldots, W_n^{(N)}), thus

\mathbb{E}[N_n^{(i)}] = N W_n^{(i)}, \qquad \mathrm{var}[N_n^{(i)}] = N W_n^{(i)} (1 - W_n^{(i)}).

It follows that the resampling step is an unbiased operation,

\mathbb{E}[\widetilde{\pi}_n(dx_{1:n}) \mid \widehat{\pi}_n(dx_{1:n})] = \widehat{\pi}_n(dx_{1:n}),

but it clearly introduces some errors "locally" in time; that is, for any test function \varphi, we have

\mathrm{var}_{\widetilde{\pi}_n}[\varphi(X_{1:n})] \geq \mathrm{var}_{\widehat{\pi}_n}[\varphi(X_{1:n})].

Resampling is beneficial for future time steps (sometimes).
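The offspring-count moments above can be checked by simulation (my sketch, with an assumed weight vector; the Monte Carlo averages should match N W^{(i)} and N W^{(i)}(1 - W^{(i)}) up to sampling noise):

```python
import numpy as np

rng = np.random.default_rng(5)
W = np.array([0.5, 0.3, 0.15, 0.05])
W = W / W.sum()                          # assumed normalized weights
N = len(W)
reps = 200_000

# Offspring counts (N_n^{(1)}, ..., N_n^{(N)}) ~ Multinomial(N; W), repeated many times.
counts = rng.multinomial(N, W, size=reps)

mean_counts = counts.mean(axis=0)        # should approach N * W
var_counts = counts.var(axis=0)          # should approach N * W * (1 - W)
```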


Stratified Resampling

Better resampling steps can be designed such that \mathbb{E}[N_n^{(i)}] = N W_n^{(i)} but \mathrm{V}[N_n^{(i)}] < N W_n^{(i)} (1 - W_n^{(i)}).

A popular alternative to multinomial resampling consists of selecting U_1 \sim \mathcal{U}(0, \frac{1}{N}) and, for i = 2, \ldots, N,

U_i = U_1 + \frac{i - 1}{N} = U_{i-1} + \frac{1}{N}.

Then we set

N_n^{(i)} = \#\left\{ U_j : \sum_{m=1}^{i-1} W_n^{(m)} \leq U_j < \sum_{m=1}^{i} W_n^{(m)} \right\}

with the convention \sum_{m=1}^{0} = 0.

It is trivial to check that \mathbb{E}[N_n^{(i)}] = N W_n^{(i)}.
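The construction on this slide can be sketched with a histogram over the cumulative-weight bins (my implementation of the slide's scheme):

```python
import numpy as np

def stratified_counts(W, rng):
    """Offspring counts for the slide's scheme: U_1 ~ U(0, 1/N),
    U_i = U_1 + (i - 1)/N, and N^{(i)} counts the U_j falling between
    consecutive cumulative sums of the weights."""
    N = len(W)
    u = rng.uniform(0.0, 1.0 / N) + np.arange(N) / N
    edges = np.concatenate(([0.0], np.cumsum(W)))
    return np.histogram(u, bins=edges)[0]

rng = np.random.default_rng(6)
W = np.array([0.4, 0.4, 0.1, 0.1])
counts = stratified_counts(W, rng)
ancestors = np.repeat(np.arange(len(W)), counts)   # indices of the resampled particles
```

Because the U_j are evenly spaced, each N^{(i)} can only take the two integer values adjacent to N W^{(i)}, which is what shrinks the variance relative to multinomial resampling.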


An alternative approach to resampling

Assume \alpha_n(x_{1:n}) \leq 1 over E_n (rescale if necessary).

We have

\pi_n(x_{1:n}) = \frac{\alpha_n(x_{1:n}) \, q_n(x_n \mid x_{1:n-1}) \, \pi_{n-1}(x_{1:n-1})}{\int \alpha_n(x_{1:n}) \, q_n(dx_n \mid x_{1:n-1}) \, \pi_{n-1}(dx_{1:n-1})}
= \alpha_n(x_{1:n}) \, q_n(x_n \mid x_{1:n-1}) \, \pi_{n-1}(x_{1:n-1}) + \left( 1 - \int \alpha_n(x_{1:n}) \, q_n(dx_n \mid x_{1:n-1}) \, \pi_{n-1}(dx_{1:n-1}) \right) \times \frac{\alpha_n(x_{1:n}) \, q_n(x_n \mid x_{1:n-1}) \, \pi_{n-1}(x_{1:n-1})}{\int \alpha_n(x_{1:n}) \, q_n(dx_n \mid x_{1:n-1}) \, \pi_{n-1}(dx_{1:n-1})}.

This looks like a measure-valued Metropolis-Hastings algorithm.


Probabilistic interpretation

We have

\pi_n(x_{1:n}) = \underbrace{\alpha_n(x_{1:n})}_{\text{accept with proba } \alpha_n} \, \underbrace{q_n(x_n \mid x_{1:n-1}) \, \pi_{n-1}(x_{1:n-1})}_{\text{trial distribution}} + \underbrace{\left( 1 - \int \alpha_n(x_{1:n}) \, q_n(dx_n \mid x_{1:n-1}) \, \pi_{n-1}(dx_{1:n-1}) \right)}_{\text{rejection probability}} \pi_n(x_{1:n}).

Say X_{1:n-1}^{(i)} \sim \pi_{n-1} and sample X_n^{(i)} \sim q_n(\cdot \mid X_{1:n-1}^{(i)}).

With probability \alpha_n(X_{1:n}^{(i)}), set \widetilde{X}_{1:n}^{(i)} = X_{1:n}^{(i)}; otherwise \widetilde{X}_{1:n}^{(i)} \sim \sum_{i=1}^{N} W_n^{(i)} \, \delta_{X_{1:n}^{(i)}}(dx_{1:n}).

Remark: this allows the variance to decrease if \alpha_n(x_{1:n}) is "flat" over E_n, e.g. filtering with large observation noise.
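The accept/reject mechanism can be sketched as follows (an assumed toy setting in which α_n ≤ 1 by construction; rejected particles are redrawn from the weighted particle approximation):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1000
particles = rng.normal(size=N)                       # proposed X_n^{(i)} from the trial distribution
alpha = np.exp(-0.5 * particles**2)                  # assumed incremental weights, alpha_n <= 1
W = alpha / alpha.sum()                              # normalized weights W_n^{(i)}

accept = rng.random(N) < alpha                       # keep X^{(i)} with probability alpha_n(X^{(i)})
replacement = particles[rng.choice(N, size=N, p=W)]  # rejected slots redrawn from the weighted mixture
new_particles = np.where(accept, particles, replacement)
```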


Degeneracy Measures

Resampling at each time step is harmful. We should resample only when necessary.

To measure the variation of the weights, we can use the Effective Sample Size (ESS) or the coefficient of variation (CV):

\mathrm{ESS} = \left( \sum_{i=1}^{N} (W_n^{(i)})^2 \right)^{-1}, \qquad \mathrm{CV} = \left( \frac{1}{N} \sum_{i=1}^{N} (N W_n^{(i)} - 1)^2 \right)^{1/2}.

We have ESS = N and CV = 0 if W_n^{(i)} = 1/N for every i.

We have ESS = 1 and CV = \sqrt{N - 1} if W_n^{(i)} = 1 for some i and W_n^{(j)} = 0 for j \neq i.
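Both measures are direct to compute, and the boundary cases on this slide make convenient unit tests (a sketch):

```python
import numpy as np

def ess(W):
    # Effective Sample Size: (sum_i (W^{(i)})^2)^(-1)
    return 1.0 / np.sum(W**2)

def cv(W):
    # Coefficient of variation of the normalized weights
    N = len(W)
    return np.sqrt(np.mean((N * W - 1.0) ** 2))

N = 100
uniform_W = np.full(N, 1.0 / N)        # all weights equal: ESS = N, CV = 0
degenerate_W = np.zeros(N)
degenerate_W[0] = 1.0                  # one weight carries all mass: ESS = 1, CV = sqrt(N - 1)
```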


We can also use the entropy

\mathrm{Ent} = -\sum_{i=1}^{N} W_n^{(i)} \log_2 \left( W_n^{(i)} \right).

We have Ent = \log_2(N) if W_n^{(i)} = 1/N for every i. We have Ent = 0 if W_n^{(i)} = 1 for some i and W_n^{(j)} = 0 for j \neq i.

Dynamic Resampling: if the variation of the weights, as measured by ESS, CV, or Ent, is too high, then resample the particles.
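The entropy criterion and the resulting dynamic-resampling rule can be sketched as follows; the "resample when ESS < N/2" threshold is a common convention, not a prescription from the slides:

```python
import numpy as np

def entropy(W):
    # Ent = -sum_i W^{(i)} log2 W^{(i)}, with the convention 0 log 0 = 0
    Wp = W[W > 0]
    return -np.sum(Wp * np.log2(Wp))

def should_resample(W, threshold=0.5):
    # Assumed rule: resample when ESS falls below a fraction of N (here N/2).
    return 1.0 / np.sum(W**2) < threshold * len(W)

N = 100
uniform_W = np.full(N, 1.0 / N)        # Ent = log2(N); no resampling needed
skewed_W = np.full(N, 0.5 / (N - 1))   # half the mass on one particle
skewed_W[0] = 0.5
```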


Generic Sequential Monte Carlo Scheme

At time n = 1,

sample X_1^{(i)} \sim q_1(\cdot) and set w_1(X_1^{(i)}) = \frac{\gamma_1(X_1^{(i)})}{q_1(X_1^{(i)})},

resample \{X_1^{(i)}, W_1^{(i)}\} to obtain new particles also denoted \{X_1^{(i)}\}.

At time n \geq 2,

sample X_n^{(i)} \sim q_n(\cdot \mid X_{1:n-1}^{(i)}),

compute w_n(X_{1:n}^{(i)}) = \alpha_n(X_{1:n}^{(i)}),

resample \{X_{1:n}^{(i)}, W_n^{(i)}\} to obtain new particles also denoted \{X_{1:n}^{(i)}\}.
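Putting the steps together, here is a generic SISR loop with pluggable model functions; the concrete instance below (a bootstrap filter for an assumed linear-Gaussian state-space model with synthetic observations) is my illustration:

```python
import numpy as np

def smc(sample_q1, log_w1, sample_qn, log_alpha, N, n_steps, rng):
    """Generic SISR: at each time, sample, weight, then resample.
    Returns the particle paths and the log normalizing-constant estimate."""
    paths = sample_q1(N, rng)[:, None]
    log_w = log_w1(paths[:, 0])
    log_Z = 0.0
    for n in range(1, n_steps + 1):
        if n > 1:
            x_prev = paths[:, -1]
            x_new = sample_qn(x_prev, rng)          # X_n ~ q_n(. | x_{n-1})
            log_w = log_alpha(x_prev, x_new, n)     # w_n = alpha_n (weights reset by resampling)
            paths = np.column_stack([paths, x_new])
        m = log_w.max()
        log_Z += m + np.log(np.mean(np.exp(log_w - m)))   # log of the Z_n / Z_{n-1} estimate
        W = np.exp(log_w - m)
        W /= W.sum()
        paths = paths[rng.choice(N, size=N, p=W)]         # multinomial resampling
    return paths, log_Z

# Hypothetical instance: bootstrap filter for X_n = 0.9 X_{n-1} + V_n, Y_n = X_n + W_n.
rng = np.random.default_rng(8)
a, sigma_w, n_steps = 0.9, 0.5, 10
y = rng.normal(size=n_steps + 1)       # synthetic observations; y[n] is observed at time n

def loglik(x, n):
    return -0.5 * ((y[n] - x) / sigma_w) ** 2 - np.log(sigma_w * np.sqrt(2.0 * np.pi))

paths, log_Z = smc(
    sample_q1=lambda N, r: r.normal(0.0, 1.0, size=N),   # bootstrap: q_1 = mu
    log_w1=lambda x1: loglik(x1, 1),                     # w_1 = g(y_1 | x_1)
    sample_qn=lambda xp, r: r.normal(a * xp, 1.0),       # q_n = f(. | x_{n-1})
    log_alpha=lambda xp, xn, n: loglik(xn, n),           # alpha_n = g(y_n | x_n)
    N=2000, n_steps=n_steps, rng=rng)
```

Because the scheme resamples at every step, the weight at time n reduces to the incremental weight α_n, exactly as on the slide.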


At any time n, we have two approximations of \pi_n(x_{1:n}):

\widehat{\pi}_n(dx_{1:n}) = \sum_{i=1}^{N} W_n^{(i)} \, \delta_{X_{1:n}^{(i)}}(dx_{1:n}) \quad \text{(before resampling)},

\widetilde{\pi}_n(dx_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_{1:n}^{(i)}}(dx_{1:n}) \quad \text{(after resampling)}.

We also have

\widehat{\frac{Z_n}{Z_{n-1}}} = \frac{1}{N} \sum_{i=1}^{N} w_n(X_{1:n}^{(i)}).


Page 51: Sequential Importance Sampling Resampling · Sequential Importance Sampling Resampling Arnaud Doucet Departments of Statistics & Computer Science University of British Columbia A.D

Sequential Monte Carlo for Hidden Markov Models

At time n = 1, sample $X_1^{(i)} \sim q_1(\cdot)$ and set

$$w_1\!\left(X_1^{(i)}\right) = \frac{\mu\!\left(X_1^{(i)}\right) g\!\left(y_1 \,\middle|\, X_1^{(i)}\right)}{q\!\left(X_1^{(i)} \,\middle|\, y_1\right)}.$$

Resample $\left\{X_1^{(i)}, W_1^{(i)}\right\}$ to obtain new particles, also denoted $\left\{X_1^{(i)}\right\}$.

At time $n \geq 2$, sample $X_n^{(i)} \sim q\!\left(\cdot \,\middle|\, y_n, X_{n-1}^{(i)}\right)$ and compute

$$w_n\!\left(X_{1:n}^{(i)}\right) = \frac{f\!\left(X_n^{(i)} \,\middle|\, X_{n-1}^{(i)}\right) g\!\left(y_n \,\middle|\, X_n^{(i)}\right)}{q\!\left(X_n^{(i)} \,\middle|\, y_n, X_{n-1}^{(i)}\right)}.$$

Resample $\left\{X_{1:n}^{(i)}, W_n^{(i)}\right\}$ to obtain new particles, also denoted $\left\{X_{1:n}^{(i)}\right\}$.
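A direct transcription of this algorithm as a Python sketch. The model functions ($\mu$, $g$, the proposal $q$) are supplied by the caller; the instantiation below uses the bootstrap choice $q = f$ on an illustrative AR(1)-plus-noise model, in which case the incremental weight reduces to $g(y_n \mid x_n)$:

```python
import numpy as np

def hmm_smc(y, sample_q1, weight1, sample_q, weight, N, seed=0):
    """SMC for hidden Markov models: propose, weight, resample (a sketch)."""
    rng = np.random.default_rng(seed)
    # n = 1: X_1^(i) ~ q_1(.), w_1 = mu(x) g(y_1|x) / q_1(x|y_1).
    x = sample_q1(y[0], N, rng)
    w = weight1(x, y[0])
    means = [np.average(x, weights=w)]      # estimate of E[X_n | y_{1:n}]
    for yn in y[1:]:
        # Resample, then propagate X_n^(i) ~ q(. | y_n, X_{n-1}^(i)).
        x_prev = x[rng.choice(N, size=N, p=w / w.sum())]
        x = sample_q(yn, x_prev, rng)
        # w_n = f(x_n | x_{n-1}) g(y_n | x_n) / q(x_n | y_n, x_{n-1}).
        w = weight(x_prev, x, yn)
        means.append(np.average(x, weights=w))
    return np.array(means)

# Illustrative bootstrap instantiation: X_1 ~ N(0,1), X_n = a X_{n-1} + V_n,
# Y_n = X_n + W_n.  With q = f, the f/q ratio cancels and only g remains.
a, N, T = 0.9, 2000, 50
g = lambda y, x: np.exp(-0.5 * (y - x) ** 2)       # g(y|x) up to a constant
sample_q1 = lambda y1, n, rng: rng.normal(size=n)  # q_1 = mu = N(0, 1)
weight1 = lambda x, y1: g(y1, x)
sample_q = lambda yn, xp, rng: a * xp + rng.normal(size=len(xp))
weight = lambda xp, x, yn: g(yn, x)

rng = np.random.default_rng(1)
x_true = [rng.normal()]
for _ in range(T - 1):
    x_true.append(a * x_true[-1] + rng.normal())
x_true = np.array(x_true)
y = x_true + rng.normal(size=T)
means = hmm_smc(y, sample_q1, weight1, sample_q, weight, N)
```

Since only $g(y_n \mid x_n)$ survives in the weight, this particular instantiation is exactly the bootstrap filter; the filtering means track the hidden state.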



Example: Linear Gaussian model

$$X_1 \sim \mathcal{N}(0, 1), \quad X_n = \alpha X_{n-1} + \sigma_v V_n, \quad Y_n = X_n + \sigma_w W_n,$$

where $V_n \sim \mathcal{N}(0, 1)$ and $W_n \sim \mathcal{N}(0, 1)$.

We know that $p(x_{1:n} \mid y_{1:n})$ is Gaussian and its parameters can be computed using Kalman techniques. In particular, $p(x_n \mid y_{1:n})$ is also Gaussian and can be computed using the Kalman filter.

We apply the SMC method with $q(x_n \mid y_n, x_{n-1}) = f(x_n \mid x_{n-1}) = \mathcal{N}(x_n; \alpha x_{n-1}, \sigma_v^2)$.
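The exact Gaussian filter mentioned above fits in a few lines. A sketch of the Kalman recursions for this scalar model (the parameter values in the demo are illustrative):

```python
import numpy as np

def kalman_filter(y, alpha, sv2, sw2, m1=0.0, P1=1.0):
    """Means and variances of p(x_n | y_{1:n}) for X_1 ~ N(0,1),
    X_n = alpha X_{n-1} + sigma_v V_n, Y_n = X_n + sigma_w W_n."""
    means, variances = [], []
    m_pred, P_pred = m1, P1                          # predictive for X_1
    for yn in y:
        S = P_pred + sw2                             # innovation variance
        K = P_pred / S                               # Kalman gain
        m = m_pred + K * (yn - m_pred)               # update with y_n
        P = (1.0 - K) * P_pred
        means.append(m)
        variances.append(P)
        m_pred, P_pred = alpha * m, alpha ** 2 * P + sv2   # predict X_{n+1}
    return np.array(means), np.array(variances)

# Illustrative demo: simulate from the model, then filter exactly.
rng = np.random.default_rng(0)
alpha, sv, sw, T = 0.9, 1.0, 0.5, 100
x = [rng.normal()]
for _ in range(T - 1):
    x.append(alpha * x[-1] + sv * rng.normal())
x = np.array(x)
y = x + sw * rng.normal(size=T)
m, P = kalman_filter(y, alpha, sv ** 2, sw ** 2)
```

These exact filtering means and variances are the reference against which the SMC approximation of $p(x_n \mid y_{1:n})$ is judged on the following slides.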


[Figure: Histograms of the base-10 logarithm of $W_n^{(i)}$ for n = 1 (top), n = 50 (middle) and n = 100 (bottom); x-axis: importance weights (base-10 logarithm).]

By itself this graph does not mean that the procedure is efficient!


This SMC strategy performs remarkably well for estimating the marginals $p(x_k \mid y_{1:k})$. Fortunately, in many applications this is all that is needed.

However, the joint distribution $p(x_{1:k} \mid y_{1:k})$ is poorly estimated when k is large; in the previous example we have

$$\widehat{p}(x_{1:11} \mid y_{1:24}) = \delta_{X_{1:11}}(x_{1:11}).$$

The same conclusion holds for most sequences of distributions $\pi_k(x_{1:k})$.

Resampling only partially solves our problems.
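The collapse of the path approximation is easy to reproduce: track the time-1 ancestor of each particle and watch repeated resampling coalesce the ancestry. A small illustrative sketch (weights drawn at random just to drive the resampling; all values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 300
ancestors = np.arange(N)          # time-1 ancestor index of each particle
unique_counts = []
for n in range(T):
    w = rng.random(N)             # stand-in importance weights
    idx = rng.choice(N, size=N, p=w / w.sum())   # multinomial resampling
    ancestors = ancestors[idx]    # resampling prunes ancestral lineages
    unique_counts.append(np.unique(ancestors).size)
```

After a few hundred steps only a handful of distinct time-1 ancestors survive (often a single one): the smoothed marginal at time 1 degenerates toward a point mass, exactly as in the $\widehat{p}(x_{1:11} \mid y_{1:24})$ example.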



Another Illustration of the Degeneracy Phenomenon

For the linear Gaussian state-space model described before, we can compute in closed form

$$S_n = \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\left[X_k^2 \,\middle|\, Y_{1:n}\right]$$

using Kalman techniques.

We compute the SMC estimate of this quantity, given by

$$\widehat{S}_n = \frac{1}{n} \sum_{k=1}^{n} \sum_{i=1}^{N} W_n^{(i)} \left(X_k^{(i)}\right)^2.$$

This estimate can be updated sequentially using our SMC approximation.
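Updating $\widehat{S}_n$ sequentially only requires each particle to carry its running sum $\sum_{k \leq n} (x_k)^2$, resampled along with the particle (which is also why the estimate inherits the path degeneracy). A sketch on the bootstrap filter for the linear Gaussian model (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, sv, sw, N, T = 0.9, 1.0, 1.0, 1000, 100

# Simulate data from the model.
x_true = [rng.normal()]
for _ in range(T - 1):
    x_true.append(alpha * x_true[-1] + sv * rng.normal())
y = np.array(x_true) + sw * rng.normal(size=T)

x = rng.normal(size=N)            # time-1 particles, mu = N(0, 1)
s = x ** 2                        # per-particle running sum of x_k^2
w = np.exp(-0.5 * ((y[0] - x) / sw) ** 2)
S_hat = [np.average(s, weights=w)]               # n = 1: divide by 1
for n in range(1, T):
    idx = rng.choice(N, size=N, p=w / w.sum())   # resample x and s together
    x, s = x[idx], s[idx]
    x = alpha * x + sv * rng.normal(size=N)      # bootstrap proposal q = f
    s = s + x ** 2                               # update the running sum
    w = np.exp(-0.5 * ((y[n] - x) / sw) ** 2)
    S_hat.append(np.average(s, weights=w) / (n + 1))
```

Comparing `S_hat` against the exact Kalman-smoother value exposes the degeneracy: the early terms of the sum are carried by ever fewer distinct ancestral paths.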


[Figure: Sufficient statistics computed exactly through the Kalman smoother (blue) and by the SMC method (red).]


Some Convergence Results for SMC

We will discuss convergence results for SMC later; see (Del Moral, 2004).

In particular, we have for any bounded function $\varphi$ and any $p > 1$

$$\mathbb{E}\left[\left|\int \varphi_n(x_{1:n}) \left(\widehat{\pi}_n(dx_{1:n}) - \pi_n(dx_{1:n})\right)\right|^p\right]^{1/p} \leq \frac{C_n \|\varphi\|_{\infty}}{\sqrt{N}}.$$

It looks like a nice result, but it is rather useless as $C_n$ increases polynomially/exponentially with time.

To achieve a fixed precision, this would require using a time-increasing number of particles N.



You cannot hope to estimate with fixed precision a target distribution of increasing dimension.

At best, you can expect results of the following form:

$$\mathbb{E}\left[\left|\int \varphi(x_{n-L+1:n}) \left(\widehat{\pi}_n(dx_{n-L+1:n}) - \pi_n(dx_{n-L+1:n})\right)\right|^p\right]^{1/p} \leq \frac{M_L \|\varphi\|_{\infty}}{\sqrt{N}}$$

if the model has nice forgetting/mixing properties, i.e.

$$\int \left|\pi_n(x_n \mid x_1) - \pi_n(x_n \mid x_1')\right| dx_n \leq 2\lambda^{n-1}$$

with $0 \leq \lambda < 1$.

In the HMM case, it means that

$$\int \left|p(x_n \mid y_{1:n}, x_1) - p(x_n \mid y_{1:n}, x_1')\right| dx_n \leq \lambda^{n-1}.$$



Central Limit Theorems

For SIS we have

$$\sqrt{N}\left(\mathbb{E}_{\widehat{\pi}_n}(\varphi_n(X_{1:n})) - \mathbb{E}_{\pi_n}(\varphi_n(X_{1:n}))\right) \Rightarrow \mathcal{N}\left(0, \sigma_{IS}^2(\varphi_n)\right)$$

where

$$\sigma_{IS}^2(\varphi_n) = \int \frac{\pi_n^2(x_{1:n})}{q_n(x_{1:n})} \left(\varphi_n(x_{1:n}) - \mathbb{E}_{\pi_n}(\varphi_n(X_{1:n}))\right)^2 dx_{1:n}.$$

We also have

$$\sqrt{N}\left(\widehat{Z}_n - Z_n\right) \Rightarrow \mathcal{N}\left(0, \sigma_{IS}^2\right)$$

where

$$\sigma_{IS}^2 = \int \frac{\pi_n^2(x_{1:n})}{q_n(x_{1:n})}\, dx_{1:n} - 1.$$



For SMC, we have

$$\begin{aligned}
\sigma_{SMC}^2(\varphi_n) ={} & \int \frac{\pi_n^2(x_1)}{q_1(x_1)} \left(\int \varphi_n(x_{1:n})\, \pi_n(x_{2:n} \mid x_1)\, dx_{2:n} - \mathbb{E}_{\pi_n}(\varphi_n(X_{1:n}))\right)^2 dx_1 \\
& + \sum_{k=2}^{n-1} \int \frac{\pi_n(x_{1:k})^2}{\pi_{k-1}(x_{1:k-1})\, q_k(x_k \mid x_{k-1})} \left(\int \varphi_n(x_{1:n})\, \pi_n(x_{k+1:n} \mid x_k)\, dx_{k+1:n} - \mathbb{E}_{\pi_n}(\varphi_n(X_{1:n}))\right)^2 dx_{1:k} \\
& + \int \frac{\pi_n(x_{1:n})^2}{\pi_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{n-1})} \left(\varphi_n(x_{1:n}) - \mathbb{E}_{\pi_n}(\varphi_n(X_{1:n}))\right)^2 dx_{1:n}
\end{aligned}$$

and

$$\sigma_{SMC}^2 = \int \frac{\pi_n^2(x_1)}{q_1(x_1)}\, dx_1 + \sum_{k=2}^{n} \int \frac{\pi_n(x_{1:k})^2}{\pi_{k-1}(x_{1:k-1})\, q_k(x_k \mid x_{k-1})}\, dx_{1:k} - n.$$


Back to our toy example

Consider the case where the target is defined on $\mathbb{R}^n$ and

$$\pi(x_{1:n}) = \prod_{k=1}^{n} \mathcal{N}(x_k; 0, 1), \quad \gamma(x_{1:n}) = \prod_{k=1}^{n} \exp\left(-\frac{x_k^2}{2}\right), \quad Z = (2\pi)^{n/2}.$$

We select the importance distribution

$$q(x_{1:n}) = \prod_{k=1}^{n} \mathcal{N}\left(x_k; 0, \sigma^2\right).$$
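The variance calculations for this example reduce to one Gaussian integral per coordinate, worth recording explicitly (valid for $\sigma^2 > \tfrac{1}{2}$, which is where the finiteness condition comes from):

```latex
\int \frac{\mathcal{N}(x;0,1)^2}{\mathcal{N}(x;0,\sigma^2)}\,dx
  = \frac{\sigma}{\sqrt{2\pi}} \int \exp\!\left(-x^2\,\frac{2\sigma^2-1}{2\sigma^2}\right) dx
  = \frac{\sigma}{\sqrt{2\pi}}\sqrt{\frac{2\pi\sigma^2}{2\sigma^2-1}}
  = \left(\frac{\sigma^4}{2\sigma^2-1}\right)^{1/2}.
```

Multiplying this factor over all n coordinates gives the $(\sigma^4/(2\sigma^2-1))^{n/2}$ term in the SIS variance, while for SMC it enters once per time step, which is the source of the linear-in-n behaviour on the next slide.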



For SMC, the asymptotic variance is finite only when $\sigma^2 > \frac{1}{2}$, and

$$\frac{\mathbb{V}_{SMC}\left[\widehat{Z}_n\right]}{Z_n^2} \approx \frac{1}{N}\left[\int \frac{\pi_n^2(x_1)}{q_1(x_1)}\,dx_1 - 1 + \sum_{k=2}^{n}\left(\int \frac{\pi_n^2(x_k)}{q_k(x_k)}\,dx_k - 1\right)\right] = \frac{n}{N}\left[\left(\frac{\sigma^4}{2\sigma^2-1}\right)^{1/2} - 1\right],$$

compared to

$$\frac{\mathbb{V}_{IS}\left[\widehat{Z}_n\right]}{Z_n^2} = \frac{1}{N}\left[\left(\frac{\sigma^4}{2\sigma^2-1}\right)^{n/2} - 1\right]$$

for SIS.

If we select $\sigma^2 = 1.2$, then we saw that it is necessary to employ $N \approx 2 \times 10^{23}$ particles in order to obtain $\mathbb{V}_{IS}[\widehat{Z}_n]/Z_n^2 = 10^{-2}$ for $n = 1000$.

To obtain the same performance, $\mathbb{V}_{SMC}[\widehat{Z}_n]/Z_n^2 = 10^{-2}$, SMC requires just $N \approx 10^4$ particles: an improvement by 19 orders of magnitude.
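The two displayed formulas can be solved directly for the particle number needed to hit a target relative variance. A small check (note: the slide's headline figures are reproduced when the mismatch is read as a proposal standard deviation of 1.2, i.e. $\sigma^2 = 1.44$; the function itself simply evaluates the formulas for whatever $\sigma^2$ is passed):

```python
import math

def required_N(sigma2, n, target=1e-2):
    """Particles needed for V[Z_hat]/Z^2 = target under the SIS and SMC
    variance formulas of the slide."""
    r = sigma2 ** 2 / (2.0 * sigma2 - 1.0)     # sigma^4 / (2 sigma^2 - 1)
    N_is = (r ** (n / 2.0) - 1.0) / target     # SIS: exponential in n
    N_smc = n * (math.sqrt(r) - 1.0) / target  # SMC: linear in n
    return N_is, N_smc

N_is, N_smc = required_N(sigma2=1.44, n=1000)
print(f"SIS: N ~ {N_is:.1e}; SMC: N ~ {N_smc:.1e}")
```

With these inputs the script returns roughly $2 \times 10^{23}$ for SIS against a few thousand for SMC, the 19-orders-of-magnitude gap quoted above.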



If you have nice mixing properties, then you can obtain

$$\sigma_{SMC}^2(\varphi) \leq \frac{C}{N}$$

for $\varphi$ depending only on $X_{n-L+1:n}$.

Under the same assumptions, you can also obtain

$$\sigma_{SMC}^2 \leq \frac{D \cdot T}{N}.$$



Summary

Resampling can drastically improve the performance of SIS in models having "good" mixing properties, e.g. state-space models; this can be verified experimentally and theoretically.

Resampling does not solve all our problems: only the SMC approximations of the most recent marginals $\pi_n(x_{n-L+1:n})$ are reliable, i.e. we can have uniform-in-time convergence bounds for them.

The SMC approximation of $\pi_n(x_{1:n})$ is only reliable for "small" n.
