Sampling strategies for Sequential Monte Carlo methods

Arnaud Doucet¹, Stephane Senecal²

¹ Department of Engineering, University of Cambridge
² The Institute of Statistical Mathematics

2004

Thanks to the Japanese Ministry of Education and the Japan Society for the Promotion of Science
Overview
– Introduction : state space models, Monte Carlo methods
– Sequential Importance Sampling/Resampling
– Strategies for sampling
– Examples, applications
– References
Estimation of state space models
x_t = f_t(x_{t-1}, u_t)    y_t = g_t(x_t, v_t)

p(x_{0:t}|y_{1:t}) → p(x_t|y_{1:t}) = ∫ p(x_{0:t}|y_{1:t}) dx_{0:t-1}

distribution of x_{0:t} ⇒ computation of estimates x̂_{0:t} :

x̂_{0:t} = ∫ x_{0:t} p(x_{0:t}|y_{1:t}) dx_{0:t} → E_{p(·|y_{1:t})}{f(x_{0:t})}

x̂_{0:t} = arg max_{x_{0:t}} p(x_{0:t}|y_{1:t})
Computation of the estimates
p(x_{0:t}|y_{1:t}) ⇒ multidimensional, non-standard distributions :

→ analytical, numerical approximations
→ integration, optimisation methods

⇒ Monte Carlo techniques
Monte Carlo approach
compute estimates for distribution π(·) → samples x_1, …, x_N ∼ π

[Figure : samples x_1, …, x_N drawn from π(x)]

⇒ distribution π_N = (1/N) ∑_{i=1}^N δ_{x_i} approximates π(·)
Monte Carlo estimates
S_N(f) = (1/N) ∑_{i=1}^N f(x_i) → ∫ f(x) π(x) dx = E_π{f(x)}

arg max_{(x_i), 1≤i≤N} π_N(x_i) approximates arg max_x π(x)

⇒ sampling x_i ∼ π can be difficult
→ importance sampling techniques
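To make the estimate above concrete, here is a minimal Python sketch; the target π = N(0, 1) and the test function f(x) = x² are illustrative assumptions, not from the talk:

```python
import numpy as np

# Minimal sketch: Monte Carlo estimation of E_pi{f(x)} when pi can be sampled
# directly. The target pi = N(0, 1) and f(x) = x^2 are illustrative choices.
rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(0.0, 1.0, size=N)   # x_i ~ pi
S_N = np.mean(x**2)                # S_N(f) = (1/N) * sum_i f(x_i)
print(S_N)                         # close to E_pi{x^2} = 1
```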
Importance Sampling
x_i ∼ π → candidate/proposal distribution x_i ∼ g

[Figure : proposal g(x) and target π(x), samples x_1, …, x_N drawn from g]
Importance Sampling
x_i ∼ g ≠ π → (x_i, w_i) weighted sample

⇒ weight w_i = π(x_i) / g(x_i)

[Figure : proposal g(x), target π(x), weighted samples x_1, …, x_N]
Estimation
importance sampling → computation of Monte Carlo estimates

e.g. expectations E_π{f(x)} :

∫ f(x) (π(x)/g(x)) g(x) dx = ∫ f(x) π(x) dx

∑_{i=1}^N w_i f(x_i) → ∫ f(x) π(x) dx = E_π{f(x)}

dynamic model (x_t, y_t) ⇒ recursive estimation x_{0:t-1} → x_{0:t}

Monte Carlo techniques ⇒ sampling sequences x_{0:t-1}^{(i)} → x_{0:t}^{(i)}
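A minimal Python sketch of the self-normalized importance sampling estimate above; the target π = N(0, 1) and proposal g = N(0, 2²) are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch: self-normalized importance sampling. The target pi = N(0, 1)
# and proposal g = N(0, 2^2) are illustrative choices, not from the talk.
rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(0.0, 2.0, size=N)                    # x_i ~ g
w = norm.pdf(x, 0.0, 1.0) / norm.pdf(x, 0.0, 2.0)   # w_i = pi(x_i)/g(x_i)
w /= w.sum()                                        # normalize: sum_i w_i = 1
print(np.sum(w * x**2))                             # approximates E_pi{x^2} = 1
```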
Sequential simulation
sampling sequences x_{0:t}^{(i)} ∼ π_t(x_{0:t}) recursively :

[Figure : target distribution p(x,t) over time t and state variable x, with marginals p(x,t1) and p(x,t2) at times t1 < t2]
Sequential simulation : importance sampling
samples x_{0:t}^{(i)} ∼ π_t(x_{0:t}) approximated by weighted particles (x_{0:t}^{(i)}, w_t^{(i)})_{1≤i≤N}

[Figure : weighted particles approximating the target distribution p(x,t) at times t1 and t2]
Sequential importance sampling
diffusing particles x_{0:t1}^{(i)} → x_{0:t2}^{(i)}

[Figure : particles diffused from p(x,t1) to p(x,t2)]

⇒ sampling scheme x_{0:t-1}^{(i)} → x_{0:t}^{(i)}
Sequential importance sampling
updating weights w_{t1}^{(i)} → w_{t2}^{(i)}

[Figure : particle weights updated between p(x,t1) and p(x,t2)]

⇒ updating rule w_{t-1}^{(i)} → w_t^{(i)}
Sequential Importance Sampling
x_{0:t} ∼ π_t(x_{0:t}) ⇒ (x_{0:t}^{(i)}, w_t^{(i)})_{1≤i≤N}

Simulation scheme t-1 → t :

– Sampling step x_t^{(i)} ∼ q_t(x_t|x_{0:t-1}^{(i)})

– Updating weights

w_t^{(i)} ∝ w_{t-1}^{(i)} × π_t(x_{0:t-1}^{(i)}, x_t^{(i)}) / [ π_{t-1}(x_{0:t-1}^{(i)}) q_t(x_t^{(i)}|x_{0:t-1}^{(i)}) ]    (incremental weight, iw)

normalizing ∑_{i=1}^N w_t^{(i)} = 1
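A minimal Python sketch of one SIS step; the densities π_t, π_{t-1}, q_t and the sampler are assumed user-supplied callables (hypothetical interfaces, not the authors' code):

```python
import numpy as np

# Minimal sketch of one SIS step t-1 -> t. The callables sample_qt, pi_t,
# pi_tm1 and q_t are assumed interfaces operating on arrays of N particles.
def sis_step(paths, w, sample_qt, pi_t, pi_tm1, q_t, rng):
    """paths: (N, t) array of x_{0:t-1}^{(i)}; w: (N,) normalized weights."""
    x_new = sample_qt(paths, rng)               # x_t^(i) ~ q_t(.|x_{0:t-1}^(i))
    iw = pi_t(paths, x_new) / (pi_tm1(paths) * q_t(x_new, paths))
    w = w * iw                                  # w_t ∝ w_{t-1} × iw
    return np.column_stack([paths, x_new]), w / w.sum()  # normalize to 1
```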
Sequential Importance Sampling
x_{0:t} ∼ π_t(x_{0:t}) ⇒ (x_{0:t}^{(i)}, w_t^{(i)})_{1≤i≤N}

proposal + reweighting →

[Figure : weighted particles approximating π(x_t)]
Sequential Importance Sampling
proposal + reweighting → var{(w_t^{(i)})_{1≤i≤N}} increases with t

[Figure : degenerate weighted particle approximation of π(x_t)]

→ w_t^{(i)} ≈ 0 for all i except one
⇒ Resampling
[Figure : particles x_t^{(i)} resampled 0, 1, 2 or 3 times according to their weights under π(x_t)]

→ draw N particle paths from the set (x_{0:t}^{(i)})_{1≤i≤N} with probabilities (w_t^{(i)})_{1≤i≤N}
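A minimal Python sketch of the multinomial version of this resampling step (the talk later also mentions stratified/deterministic schemes, which reduce variance):

```python
import numpy as np

# Minimal sketch of multinomial resampling: draw N ancestor indices with
# probabilities (w_t^(i)), then reset all weights to 1/N.
def resample(paths, w, rng):
    N = len(w)
    idx = rng.choice(N, size=N, p=w)     # ancestors drawn with probability w
    return paths[idx], np.full(N, 1.0 / N)
```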
Sequential Importance Sampling/Resampling
Simulation scheme t-1 → t :

– Sampling step x_t'^{(i)} ∼ q_t(x_t'|x_{0:t-1}^{(i)})

– Updating weights

w_t^{(i)} ∝ w_{t-1}^{(i)} × π_t(x_{0:t-1}^{(i)}, x_t'^{(i)}) / [ π_{t-1}(x_{0:t-1}^{(i)}) q_t(x_t'^{(i)}|x_{0:t-1}^{(i)}) ]

→ parallel computing

– ⇒ Resampling step : sample N paths from (x_{0:t-1}^{(i)}, x_t'^{(i)})_{1≤i≤N}

→ particles interacting : computation at least O(N)
SISR for recursive estimation of state space models
x_t = f_t(x_{t-1}, u_t) → p(x_t|x_{t-1})
y_t = g_t(x_t, v_t) → p(y_t|x_t)

Usual SISR : Bootstrap filter (Gordon et al. 93, Kitagawa 96) :

– Sampling step x_t^{(i)} ∼ p(x_t|x_{t-1}^{(i)})

– Updating weights : incremental weight w_t^{(i)} ∝ w_{t-1}^{(i)} × iw, with iw ∝ p(y_t|x_t^{(i)})

– Stratified/Deterministic resampling

efficient, easy, fast for a wide class of models
→ tracking, time series
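A minimal bootstrap-filter sketch in Python, written for the linear Gaussian model used in the examples later (x_t = αx_{t-1} + u_t, y_t = x_t + v_t); the parameter names alpha and sigma are illustrative:

```python
import numpy as np
from scipy.stats import norm

# Minimal bootstrap filter sketch for the linear Gaussian model of the later
# examples: x_t = alpha*x_{t-1} + u_t, u_t ~ N(0,1); y_t = x_t + v_t,
# v_t ~ N(0, sigma^2). Resampling at every step, as in the bootstrap filter.
def bootstrap_filter(y, N=100, alpha=0.9, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=N)                  # x_0 ~ N(0, 1)
    estimates = []
    for yt in y:
        x = alpha * x + rng.normal(0.0, 1.0, size=N)  # x_t ~ p(x_t|x_{t-1})
        w = norm.pdf(yt, x, sigma)                    # iw ∝ p(y_t|x_t)
        w /= w.sum()
        estimates.append(np.sum(w * x))               # estimate of E{x_t|y_{1:t}}
        x = x[rng.choice(N, size=N, p=w)]             # multinomial resampling
    return np.array(estimates)
```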
Overview - Break

– Introduction :
→ state space models
→ estimation, computing estimates via Monte Carlo methods
→ importance sampling

– recursive estimation → sequential simulation
⇒ Sequential Importance Sampling/Resampling

– ⇒ Strategies for sampling :
→ designing/sampling an “optimal” candidate distribution
→ considering blocks of variables : reweighting, sampling

– Examples and applications
Improving simulation
sampling multimodal, multidimensional distributions
model with informative observation → peaky likelihood
→ prior dynamics to diffuse particles : poor approximation results
→ efficient propagation for a finite number of particles N
⇒ need for good sampling proposals
Improving simulation
Optimal proposal distribution q_t(x_t|x_{0:t-1}^{(i)})
→ minimizing the variance of the incremental weight (w_t^{(i)} ∝ w_{t-1}^{(i)} × iw)

iw = π_t(x_{0:t-1}^{(i)}, x_t^{(i)}) / [ π_{t-1}(x_{0:t-1}^{(i)}) q_t(x_t^{(i)}|x_{0:t-1}^{(i)}) ]

⇒ 1-step ahead predictive :

π_t(x_t|x_{0:t-1}) = p(x_t|x_{t-1}, y_t)

⇒ incremental weight :

iw → π_t(x_{0:t-1}) / π_{t-1}(x_{0:t-1}) = p(x_{0:t-1}|y_{1:t}) / p(x_{0:t-1}|y_{1:t-1})
   ∝ p(y_t|x_{t-1}) = ∫ p(y_t|x_t) p(x_t|x_{t-1}) dx_t
Approximations
Sampling the predictive distribution π_t(x_t|x_{0:t-1}) = p(x_t|x_{t-1}, y_t) :

– expansions of the p.d.f. or log(p.d.f.), Taylor
– mixture models : Gaussian ∑_i π_i N(µ_i, σ_i²)
– Accept/Reject schemes
– Markov chain schemes : Metropolis-Hastings, Gibbs sampler
– dynamic stochastic simulation (Hybrid Monte Carlo)
– augmented sampling spaces :
→ slice samplers
→ auxiliary variables
Auxiliary variables
Pitt and Shephard 99 : approximating the predictive p(x_t|x_{t-1}^{(k)}, y_t) via an augmented sampling space → p(x_t, k|x_{t-1}^{(k)}, y_t)

[Figure : particles x_{t-1}^{(i)} with offspring counts 0, 1, 2, 3 under p(x_{t-1}|y_{1:t-1}), and the resulting particles x_t under p(x_t|y_{1:t})]

index of particle k (→ number of offspring(s) of particle x_{t-1}^{(k)}) ∼ ·|y_t

⇒ boost particles with high likelihood
Auxiliary variables
→ importance sampling for p(x_t, k|x_{t-1}^{(k)}, y_t) :

candidate distribution :

g(x_t, k|x_{t-1}, y_t) ∝ p(y_t|µ_t^{(k)}) p(x_t|x_{t-1}^{(k)})

where µ_t^{(k)} = mean, mode or draw from x_t|x_{t-1}^{(k)}

[Figure : transition density p(x_t|x_{t-1}^{(k)}) with µ_t^{(k)} = mean or maximum]
Auxiliary variables
– sample (x_t^{(j)}, k_j)_{1≤j≤R} from g(x_t, k|x_{t-1}^{(k)}, y_t) :

k ∼ g(k|x_{t-1}, y_t) ∝ ∫ p(y_t|µ_t^{(k)}) p(x_t|x_{t-1}^{(k)}) dx_t = p(y_t|µ_t^{(k)})

x_t ∼ p(x_t|x_{t-1}^{(k)})

– reweighting (x_t^{(j)}, k_j) with

w_j ∝ p(y_t|x_t^{(j)}) / p(y_t|µ_t^{(k_j)})

– resample N paths from (x_{0:t-1}^{(k_j)}, x_t^{(j)})_{1≤j≤R} with second stage weights w_j
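A minimal Python sketch of one auxiliary particle filter step for the same linear Gaussian model, taking µ_t^{(k)} to be the prior mean αx_{t-1}^{(k)} (one of the choices above); R = N and the parameter names are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

# Minimal auxiliary particle filter step (Pitt and Shephard 99) for the
# linear Gaussian model, with mu_t^(k) = alpha*x_{t-1}^(k) (prior mean) and
# R = N for simplicity; illustrative parameter names.
def apf_step(x_prev, yt, alpha=0.9, sigma=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    N = len(x_prev)
    mu = alpha * x_prev                             # mu_t^(k)
    g = norm.pdf(yt, mu, sigma)                     # g(k|.) ∝ p(y_t|mu_t^(k))
    k = rng.choice(N, size=N, p=g / g.sum())        # first stage: indices k_j
    x = alpha * x_prev[k] + rng.normal(size=N)      # x_t^(j) ~ p(x_t|x_{t-1}^(k_j))
    w = norm.pdf(yt, x, sigma) / norm.pdf(yt, mu[k], sigma)  # second stage weights
    return x, w / w.sum(), k
```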
Improving simulation
sampling/approximating the predictive π_t(x_t|x_{0:t-1}) may not be sufficient for diffusing particles efficiently, e.g. when the discrepancy between successive targets (π_t)_{t>0} is high :

⇒ consider a block of variables x_{t-L:t} for a fixed lag L
Approaches using a block of variables
– discrete distributions : Meirovitch 85

– reweighting before resampling :
auxiliary variables Pitt and Shephard 99,
Wang et al. 02

⇒ discrete distribution → analytical form for the proposal

x_t ∼ π_{t+L}(x_t|x_{0:t-1}) = ∫ π_{t+L}(x_{t:t+L}|x_{0:t-1}) dx_{t+1:t+L}

Meirovitch 85 : growing a polymer, random walk in discrete space
→ complexity ♯X^L for lag L
Reweighting idea

[Figure : reweighting idea]

Reweighting technique

[Figure : reweighting technique]

Reweighting + resampling

[Figure : particle paths resampled 0, 1 or 2 times after reweighting]
Approaches using a block of variables
→ auxiliary variables : Pitt and Shephard 99

proposal distribution :

p(x_t, k|x_{t-1}^{(k)}, y_{t:t+L}) ∝ ∫ p(x_{t:t+L}, k|x_{t-1}^{(k)}, y_{t:t+L}) dx_{t+1:t+L}

approximated with importance sampling :

g(x_t, k|x_{t-1}^{(k)}, y_{t:t+L}) = p(y_{t+L}|µ_{t+L}^{(k)}) … p(y_t|µ_t^{(k)}) p(x_t|x_{t-1}^{(k)})

→ sample (x_t^{(j)}, k_j)_{1≤j≤R}

k_j ∼ g(k|y_{t:t+L}) ∝ p(y_{t+L}|µ_{t+L}^{(k)}) … p(y_t|µ_t^{(k)})

x_t^{(j)} ∼ p(x_t|x_{t-1}^{(k_j)})
Approaches using a block of variables
auxiliary variables → resampling from (x_t^{(j)})_{1≤j≤R} :

→ propagate/sample x_{t+1}^{(j)} → x_{t+L}^{(j)} with prior transitions p(x_t|x_{t-1})

→ use second stage weights : w_t^{(j)} ∝ w_{t-1}^{(j)} × iw

iw ∝ [ p(y_{t+L}|x_{t+L}^{(j)}) … p(y_t|x_t^{(j)}) ] / [ p(y_{t+L}|µ_{t+L}^{(k_j)}) … p(y_t|µ_t^{(k_j)}) ]

for resampling N paths (x_{0:t}^{(i)})_{1≤i≤N}
Approaches using a block of variables
→ reweighting before resampling : Wang et al. 02

[Figure : particle x_t^{(i)} with weight w_t^{(i)} propagated over t to t+L into paths x_{t+L}^{(i_j)} with weights a_t^{(i_j)}]

propagate particles x_t^{(i)} → x_{t+1:t+L}^{(i_j)} for j = 1, …, R

compute weights a_t^{(i_j)}, particle path x_{0:t}^{(i)} reweighted with e.g.

a_t^{(i)} = ( ∑_{j=1}^R a_t^{(i_j)} )^α, resampling from the set (x_{0:t}^{(i)}, a_t^{(i)})_{i=1,…,N}
Reweighting
→ need to sample/propagate x_t from a block of variables :

π_{t+L}(x_t|x_{0:t-1}) = ∫ π_{t+L}(x_{t:t+L}|x_{0:t-1}) dx_{t+1:t+L}

⇒ sampling a block of variables
→ design a proposal/candidate distribution
Sampling recursively a block of variables
[Figure : timeline t-L, t-L+1, …, t-1, t]

x_{t-L:t-1} → x_{t-L+1:t} : imputing x_t and re-imputing x_{t-L+1:t-1}
Sampling a block of variables
[Figure : timeline t-L, …, t with old path x_{0:t-1} and new block x'_{t-L+1:t}]

direct sampling :

x'_{t-L+1:t} ∼ q_t(x'_{t-L+1:t}|x_{0:t-1})
Sampling a block of variables
[Figure : timeline t-L, …, t with path x_{0:t-L}, old block x_{t-L+1:t-1} and new block x'_{t-L+1:t}]

proposal/candidate distribution for the block :

(x_{0:t-L}, x'_{t-L+1:t}) ∼ ∫ π_{t-1}(x_{0:t-1}) q_t(x'_{t-L+1:t}|x_{0:t-1}) dx_{t-L+1:t-1}
Sampling a block of variables
⇒ Idea : consider the extended block of variables

(x_{0:t-L}, x'_{t-L+1:t}) → (x_{0:t-L}, x_{t-L+1:t-1}, x'_{t-L+1:t}) = (x_{0:t-1}, x'_{t-L+1:t})

[Figure : timeline t-L, …, t with path x_{0:t-L} and blocks x_{t-L+1:t-1}, x'_{t-L+1:t}]
Sampling a block of variables
candidate distribution for the extended block (x_{0:t-L}, x_{t-L+1:t-1}, x'_{t-L+1:t}) = (x_{0:t-1}, x'_{t-L+1:t}) :

(x_{0:t-1}, x'_{t-L+1:t}) ∼ π_{t-1}(x_{0:t-1}) q_t(x'_{t-L+1:t}|x_{0:t-1})

direct sampling :

(x_{0:t-L}, x'_{t-L+1:t}) ∼ ∫ π_{t-1}(x_{0:t-1}) q_t(x'_{t-L+1:t}|x_{0:t-1}) dx_{t-L+1:t-1}
Sampling a block of variables
target distribution for the block (x_{0:t-L}, x'_{t-L+1:t}) :

π_t(x_{0:t-L}, x'_{t-L+1:t})

⇒ auxiliary target distribution for the extended block (x_{0:t-1}, x'_{t-L+1:t}) = (x_{0:t-L}, x_{t-L+1:t-1}, x'_{t-L+1:t}) :

π_t(x_{0:t-L}, x'_{t-L+1:t}) r_t(x_{t-L+1:t-1}|x_{0:t-L}, x'_{t-L+1:t})

with r_t = any conditional distribution

⇒ proposal + target distributions → importance sampling
Sequential Importance Block Sampling/Resampling
Simulation scheme t-1 → t (index (i) dropped) :

– Proposal sampling step

x'_{t-L+1:t} ∼ q_t(x'_{t-L+1:t}|x_{0:t-1})

– Updating weights

w_t ∝ w_{t-1} × [ π_t(x_{0:t-L}, x'_{t-L+1:t}) r_t(x_{t-L+1:t-1}|x_{0:t-L}, x'_{t-L+1:t}) ] / [ π_{t-1}(x_{0:t-1}) q_t(x'_{t-L+1:t}|x_{0:t-1}) ]

– Resampling step
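A minimal Python sketch of this block weight update, with the four distributions passed in as callables (hypothetical interfaces, not the authors' code); `head` denotes x_{0:t-L}:

```python
# Minimal sketch of the block-SISR weight update above. pi_t, pi_tm1, q_t and
# r_t are assumed user-supplied density callables (hypothetical interfaces).
# head = x_{0:t-L}, old_block = x_{t-L+1:t-1}, new_block = x'_{t-L+1:t}.
def block_weight(w_prev, head, old_block, new_block, pi_t, pi_tm1, q_t, r_t):
    num = pi_t(head, new_block) * r_t(old_block, head, new_block)
    den = pi_tm1(head, old_block) * q_t(new_block, head, old_block)
    return w_prev * num / den    # w_t ∝ w_{t-1} × incremental weight
```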
Sampling techniques for a block of variables
sampling the block x_{t-L+1:t} ∼ q_t(x_{t-L+1:t}|x_{0:t-1}) :

→ forward-backward recursion : e.g. Carter and Kohn 94

[Figure : timeline t-L, …, t]

x_{t-L:t-1} → x_{t-L+1:t} : imputing x_t and re-imputing x_{t-L+1:t-1}
Sampling techniques for a block of variables
→ forward-backward recursion :

[Figure : timeline t-L, …, t]

x_{t-L:t-1} → x_{t-L+1:t} : imputing x_t and re-imputing x_{t-L+1:t-1}

→ approximations : expansions, mixture models, MCMC, …
Improving simulation
Optimal proposal distribution q_t(x'_{t-L+1:t}|x_{0:t-1}) :
→ minimizing the variance of the incremental weight w_t^{(i)} ∝ w_{t-1}^{(i)} × iw :

iw = [ π_t(x_{0:t-L}, x'_{t-L+1:t}) r_t(x_{t-L+1:t-1}|x_{0:t-L}, x'_{t-L+1:t}) ] / [ π_{t-1}(x_{0:t-1}) q_t(x'_{t-L+1:t}|x_{0:t-1}) ]

⇒ q_t = L-step ahead predictive

π_t(x'_{t-L+1:t}|x_{0:t-L}) = p(x'_{t-L+1:t}|x_{t-L}, y_{t-L+1:t})

For one variable : optimal q_t = 1-step ahead predictive
π_t(x_t|x_{0:t-1}) = p(x_t|x_{t-1}, y_t)
Improving simulation
→ block of variables ⇒ optimal proposal and target distribution

minimizing the variance of the incremental weight w_t^{(i)} ∝ w_{t-1}^{(i)} × iw

iw = [ π_t(x_{0:t-L}, x'_{t-L+1:t}) r_t(x_{t-L+1:t-1}|x_{0:t-L}, x'_{t-L+1:t}) ] / [ π_{t-1}(x_{0:t-1}) q_t(x'_{t-L+1:t}|x_{0:t-1}) ]

→ optimal conditional distribution r_t(x_{t-L+1:t-1}|x_{0:t-L}, x'_{t-L+1:t})

⇒ r_t = (L-1)-step ahead predictive

π_{t-1}(x_{t-L+1:t-1}|x_{0:t-L}) = p(x_{t-L+1:t-1}|x_{t-L}, y_{t-L+1:t-1})
Improving simulation
For optimal q_t and r_t, the incremental weight w_t^{(i)} ∝ w_{t-1}^{(i)} × iw :

iw → π_t(x_{0:t-L}) / π_{t-1}(x_{0:t-L}) = p(x_{0:t-L}|y_{1:t}) / p(x_{0:t-L}|y_{1:t-1}) ∝ p(y_t|x_{t-L}, y_{t-L+1:t-1})

∝ ∫ p(y_t, x_{t-L+1:t}|x_{t-L}, y_{t-L+1:t-1}) dx_{t-L+1:t}

SISR for one variable with optimal proposal q_t :

iw → π_t(x_{0:t-1}) / π_{t-1}(x_{0:t-1}) ∝ p(y_t|x_{t-1}) = ∫ p(y_t|x_t) p(x_t|x_{t-1}) dx_t

Bootstrap filter : iw = p(y_t|x_t)
Approximations for block sampling
Sampling the (sub-)optimal q_t(x'_{t-L+1:t}|x_{0:t-1}) :

→ exact/approximated forward-backward recursions
→ approximations : expansions, mixture models, MCMC, …

For approximated optimal q_t and r_t, the incremental weight :

[ π_t(x_{0:t-L}, x'_{t-L+1:t}) q_{t-1}(x_{t-L+1:t-1}|x_{0:t-L}) ] / [ π_{t-1}(x_{0:t-1}) q_t(x'_{t-L+1:t}|x_{0:t-L}) ]

= [ p(x_{0:t-L}, x'_{t-L+1:t}|y_{1:t}) q(x_{t-L+1:t-1}|x_{t-L}, y_{t-L+1:t-1}) ] / [ p(x_{0:t-1}|y_{1:t-1}) q(x'_{t-L+1:t}|x_{t-L}, y_{t-L+1:t}) ]
Overview - Break

– Introduction : state space models, Monte Carlo methods

– Sequential Importance Sampling/Resampling

– Strategies for sampling :
→ “optimal” candidate distribution
sampling with e.g. auxiliary variables
→ considering a block of variables : reweighting
⇒ sampling a block of variables :
definition of importance sampling for a block
performing sampling → “optimal” candidate distribution

– ⇒ Examples, applications :
→ simple, complex models
→ why the sampling strategy for particles can be crucial
Example
Linear and Gaussian state space model :

x_t = α x_{t-1} + u_t    x_0, u_t ∼ N(0, 1)
y_t = x_t + v_t          v_t ∼ N(0, σ²)

Sequential Monte Carlo methods :

– Bootstrap filter, proposal p(x_t|x_{t-1})
– SISR with optimal proposal p(x_t|x_{t-1}, y_t)
– SISR for blocks with optimal proposal p(x_{t-L+1:t}|x_{t-L}, y_{t-L+1:t}) computed by exact forward-backward recursions

⇒ estimates compared with Kalman filter results
⇒ approximation of target distribution p(x_t|y_{1:t})
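For this model the optimal proposal p(x_t|x_{t-1}, y_t) and the incremental weight p(y_t|x_{t-1}) are Gaussian and available in closed form; a minimal Python sketch of the corresponding SISR step (the parameter names are illustrative):

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of the SISR step with the optimal proposal for this model:
# p(x_t|x_{t-1}, y_t) = N(m, s2) with s2 = sigma^2/(1 + sigma^2) and
# m = s2*(alpha*x_{t-1} + y_t/sigma^2);
# iw = p(y_t|x_{t-1}) = N(y_t; alpha*x_{t-1}, 1 + sigma^2).
def optimal_sisr_step(x_prev, w, yt, alpha=0.9, sigma=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    s2 = sigma**2 / (1.0 + sigma**2)                     # proposal variance
    m = s2 * (alpha * x_prev + yt / sigma**2)            # proposal mean
    x = m + np.sqrt(s2) * rng.normal(size=len(x_prev))   # x_t ~ p(x_t|x_{t-1}, y_t)
    w = w * norm.pdf(yt, alpha * x_prev, np.sqrt(1.0 + sigma**2))
    return x, w / w.sum()
```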
Estimation
[Plot : x(t) and estimates vs. time index]

model (α, σ) = (0.9, 0.1), x̂_t = ∑_{i=1}^N w_t^{(i)} x_t^{(i)}, N = 100
Approximation of the target distribution
⇒ Effective Sample Size :

ESS = 1 / ∑_{i=1}^N [w_t^{(i)}]²

w^{(i)} = 1/N : ESS = N

[Figure : uniformly weighted particles under π(x_t)]

w^{(i)} ≈ 0 for all i except one : ESS = 1

[Figure : degenerate weighted particles under π(x_t)]

⇒ Resampling performed for ESS ≤ N/2, N/10
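A minimal Python sketch of the ESS computation and the resampling trigger:

```python
import numpy as np

# Minimal sketch: effective sample size from normalized weights, with
# resampling triggered when ESS drops below N/2.
def ess(w):
    return 1.0 / np.sum(w**2)

w = np.full(100, 1.0 / 100)        # uniform weights -> ESS = N = 100
print(ess(w), ess(w) <= 100 / 2)   # 100.0 False : no resampling needed
```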
Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100

[Plot : ESS vs. time index]

ESS for the Bootstrap filter, SISR and SISR for blocks of 2 variables, with optimal proposals
Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100, 100 time steps

algorithm        | ESS   | resampling steps | CPU time
Bootstrap        | 11.19 | 99               | 0.84
optimal SISR     | 77.1  | 2                | 0.12
Block-SISR L = 2 | 99.1  | 1                | 0.23
Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100, ∞ time steps

algorithm        | ESS | resampling steps | CPU time
Bootstrap        | 10  | 100%             | ∝ 0.84
optimal SISR     | 75  | 0.04%            | ∝ 0.12
Block-SISR L = 2 | 99  | 0%               | ∝ 0.23
Approximation of the target distribution
Resampling for ESS ≤ N/2, various N

algorithm        | ESS     | resampling steps
Bootstrap        | 10% ·N  | 100%
optimal SISR     | 75% ·N  | 0.04%
Block-SISR L = 2 | 99% ·N  | 0%

computational complexity : resampling O(N) → CPU time
CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps

[Plot : CPU time vs. number of particles N]

Bootstrap, SISR and SISR for blocks of 2 variables, with optimal proposals
CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps

[Plot : CPU time vs. number of particles N]

SISR and SISR for blocks of 2 variables, with optimal proposals
Sequential Monte Carlo methods
for this model :

– Same estimation results as Kalman filtering
– different samplings ⇒ different approximations of the target distribution
– N ≤ 500 : computational complexity, CPU time
→ SISR with optimal proposal p(x_t|x_{t-1}, y_t)
– N ≥ 500 : → block SISR with optimal proposal p(x_{t-L+1:t}|x_{t-L}, y_{t-L+1:t})
Sampling strategies
x_t = α x_{t-1} + u_t    x_0, u_t ∼ N(0, σ_u²)
y_t = x_t + v_t          v_t ∼ N(0, σ_v²)

– σ_v = 0.1 → observation y_t very informative relative to the prior (σ_u = 1.0)
⇒ take y_t into account for diffusing particles
p(x_t|x_{t-1}) → p(x_t|x_{t-1}, y_t) ⇒ ESS increases

– α = 0.9 → variables (x_t)_t correlated
⇒ sampling by block x_{t-L+1:t}
a block of observations y_{t-L+1:t} is more informative than a single one y_t
p(x_t|x_{t-1}, y_t) → p(x_{t-L+1:t}|x_{t-L}, y_{t-L+1:t}) ⇒ ESS increases
Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100

[Plots : ESS vs. time index, four panels]

ESS for Bootstrap, SISR with optimal proposals for 1, 2 and 10 variables
σ_u = 1.0, left/right : σ_v = 0.1/1.0, top/bottom : α = 0.9/0.5
CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps

[Plots : CPU time vs. number of particles N, four panels]

Bootstrap, SISR with optimal proposals for 1, 2 and 10 variables
σ_u = 1.0, left/right : σ_v = 0.1/1.0, top/bottom : α = 0.9/0.5
CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps

[Plots : CPU time vs. number of particles N, four panels]

SISR with optimal proposals for 1, 2 and 10 variables
σ_u = 1.0, left/right : σ_v = 0.1/1.0, top/bottom : α = 0.9/0.5
Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100

[Plots : ESS vs. time index, four panels]

ESS for Bootstrap, SISR with optimal proposals for 1, 2 and 10 variables
σ_u = 0.1, left/right : σ_v = 0.1/1.0, top/bottom : α = 0.9/0.5
CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps

[Plots : CPU time vs. number of particles N, four panels]

Bootstrap, SISR with optimal proposals for 1, 2 and 10 variables
σ_u = 0.1, left/right : σ_v = 0.1/1.0, top/bottom : α = 0.9/0.5
CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps

[Plots : CPU time vs. number of particles N, four panels]

SISR with optimal proposals for 1, 2 and 10 variables
σ_u = 0.1, left/right : σ_v = 0.1/1.0, top/bottom : α = 0.9/0.5
Overview - Break

– Introduction : state space models, Monte Carlo methods
– Sequential Importance Sampling/Resampling
– Strategies for sampling
– Applications : linear and Gaussian model
⇒ sampling strategy :
→ approximation of the target distribution, CPU time
→ information in the observation, dynamics of the state variable
→ nonlinear non-Gaussian models
Example
Nonlinear state space model :

x_t = α(x_{t-1} + β x_{t-1}³) + u_t    x_0, u_t ∼ N(0, σ_u²)
y_t = x_t + v_t                        v_t ∼ N(0, σ_v²)

Sequential Monte Carlo methods :

– Bootstrap filter, proposal p(x_t|x_{t-1})
– SISR with optimal proposal p(x_t|x_{t-1}, y_t) approximated by KF/EKF
– SISR for blocks with optimal proposal p(x_{t-L+1:t}|x_{t-L}, y_{t-L+1:t}) approximated by forward-backward recursions with KF/EKF

Parameter values α = 0.9, β = 0.2, σ_u = 0.1 and σ_v = 0.05

⇒ approximation of target distribution p(x_t|y_{1:t})
Simulation results
algorithm  | MSE    | ESS  | RS     | CPU
Bootstrap  | 0.0021 | 36.8 | 70.3 % | 0.68
SISR-KF    | 0.0019 | 64.7 | 19.3 % | 0.44
SISR-EKF   | 0.0019 | 65.8 | 19.2 % | 0.48
BSISR-KF   | 0.0018 | 72.3 | 0.9 %  | 0.21
BSISR-EKF  | 0.0018 | 73.5 | 0.8 %  | 0.24

N = 100 particles, 100 runs of particle filters for a single and for a block of L = 2 variables (MSE from KF/EKF = 0.0034).
Approximation of the target distribution
Resampling for ESS ≤ N/2, N = 100

[Plot : approximated Effective Sample Size vs. time index]

Approximated ESS vs. time index for a realization of the Bootstrap filter (dotted), the SISR with Kalman filter proposal for a single variable (dashdotted) and for a block of L = 2 variables (straight).
Simulation results
block size L | N = 100 | N = 500 | N = 1000 | RS
2            | 74      | 370     | 715      | 0.9 %
3            | 96      | 493     | 985      | 0.9 %
4            | 99      | 496     | 989      | 1 %
5            | 98      | 494     | 988      | 1 %
10           | 97      | 486     | 972      | 2.5 %

Approximated ESS averaged over 100 runs of particle filters for blocks of L variables, considering N particles.
CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps

[Plot : CPU time vs. number of particles N]

CPU time vs. N for the bootstrap filter (dotted), SISR with KF proposal for a single variable (KF : dashed, EKF : dashdotted) and for a block of L = 2 variables (straight), 100 realizations.
CPU time / number of particles N
Resampling for ESS ≤ N/2, 1,000 time steps

[Plot : computational time vs. number of particles N]

Computational time vs. N for the block sampling scheme with lags from L = 2 (bottom), 3, 4, 5, 10 (top), 100 realizations.
Sequential Monte Carlo methods
for this model :

– Good approximations of the target distribution
– different samplings ⇒ different approximations of the target distribution
– even for small N, block SISR with the approximated optimal proposal p(x_{t-L+1:t}|x_{t-L}, y_{t-L+1:t}) is efficient for L = 3, 4, 5
– → information in the observation : σ_u = 0.1, σ_v = 0.05
Conclusion
⇒ Importance of the proposal/candidate distribution for Sequential Monte Carlo simulation methods

design of the proposal :
→ information in the observation, dynamics of the state variable :
p(x_t|x_{t-1}) ←→ p(x_t|y_t, x_{t-1}) ←→ p(x_t|y_t)

→ sampling a block/fixed lag of variables can be useful :
– for intermittent/informative observations, correlated variables
– applications ⇒ tracking, radar, navigation, positioning …
References - SISR, Sequential Monte Carlo
– N. Gordon, D. Salmond, and A. F. M. Smith, “Novel approach to nonlinear and non-Gaussian Bayesian state estimation,” IEE Proceedings F, vol. 140, pp. 107–113, 1993.
– G. Kitagawa, “Monte Carlo filter and smoother for non-Gaussian nonlinear state space models,” J. Comput. Graph. Statist., vol. 5, pp. 1–25, 1996.
– A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Statistics for Engineering and Information Science. Springer, 2001.
References - block/fixed lag approaches
– H. Meirovitch, “Scanning method as an unbiased simulation technique and its application to the study of self-avoiding random walks,” Phys. Rev. A, vol. 32, pp. 3699–3708, 1985.
– M. K. Pitt and N. Shephard, “Filtering via simulation: auxiliary particle filters,” J. Am. Stat. Assoc., vol. 94, pp. 590–599, 1999.
– X. Wang, R. Chen, and D. Guo, “Delayed-pilot sampling for mixture Kalman filter with application in fading channels,” IEEE Trans. Sig. Proc., vol. 50, pp. 241–253, 2002.
References - block/fixed lag sampling methods
– A. Doucet and S. Senecal, “Fixed-Lag Sequential Monte Carlo,” accepted at EUSIPCO 2004.
– S. Senecal and A. Doucet, “An example of sequential Monte Carlo block sampling method,” AIC2003 Science of Modeling, pp. 418–419, 2003.
– C. K. Carter and R. Kohn, “On Gibbs sampling for state space models,” Biometrika, vol. 81, pp. 541–553, 1994.