Particle Filters in High Dimensions - IRISA · 2018-06-20 · Bain, A., DC, Fundamentals of Stochastic Filtering, Series: Stochastic Modelling and Applied Probability, Vol. 60, Springer

Particle Filters in High Dimensions

Dan Crisan

Imperial College London

Workshop - Simulation and probability: recent trends
The Henri Lebesgue Center for Mathematics

5-8 June 2018, Rennes

Dan Crisan (Imperial College London) Particle Filters in High Dimensions 7-8 June 2018 1 / 55

Part 1: Theoretical Considerations

Stochastic Filtering

Particle filters/ Sequential Monte Carlo methods

Convergence Result

Final remarks

◦ DC, Particle Filters. A Theoretical Perspective, Sequential Monte Carlo Methods in Practice, 2001.

◦ DC, A. Doucet, A survey of convergence results on particle filtering methods for practitioners, IEEE Transactions on Signal Processing, 2002.

◦ A. Doucet, A.M. Johansen, A tutorial on particle filtering and smoothing: Fifteen years later, The Oxford Handbook of Nonlinear Filtering, 2011.

◦ P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer, 2004.

◦ A. Bain, DC, Fundamentals of Stochastic Filtering, Springer, 2009.

◦ DC, B. Rozovskii, The Oxford Handbook of Nonlinear Filtering, Oxford University Press, 2011.


What is stochastic filtering?

Stochastic Filtering: The process of using partial observations and a stochastic model to make inferences about an evolving dynamical system.

X: the signal process, the “hidden component”
Y: the observation process, “the data”

The filtering problem: Find the conditional distribution of the signal $X_t$ given $\mathcal{Y}_t = \sigma(Y_s, s \in [0, t])$, i.e.,

\[ \pi_t(A) = P(X_t \in A \mid \mathcal{Y}_t), \quad t \ge 0, \quad A \in \mathcal{B}(\mathbb{R}^d). \]

Discrete framework: {Xt , Yt}t≥0 Markov process

The signal process

• $\{X_t\}_{t \ge 0}$ Markov chain, $X_0 \sim \pi_0(dx_0)$

• $P(X_t \in dx_t \mid X_{t-1} = x_{t-1}) = K_t(x_{t-1}, dx_t) = f_t(x_t \mid x_{t-1})\,dx_t$

• Example: $X_t = b(X_{t-1}) + \sigma(X_{t-1}) B_t$, $B_t \sim N(0, 1)$ i.i.d.

The observation process

• $P(Y_t \in dy_t \mid X_{[0,t]} = x_{[0,t]}, Y_{[0,t-1]} = y_{[0,t-1]}) = P(Y_t \in dy_t \mid X_t = x_t) = g_t(y_t \mid x_t)\,dy_t$

• Example: $Y_t = h(X_t) + V_t$, $V_t \sim N(0, 1)$ i.i.d.

where $X_{[0,t]} \triangleq (X_0, \dots, X_t)$ and $x_{[0,t]} \triangleq (x_0, \dots, x_t)$.



Notation:

• posterior measure: the conditional distribution of the signal $X_t$ given $\mathcal{Y}_t$

\[ \pi_t(A) = P(X_t \in A \mid \mathcal{Y}_t), \quad t \ge 0, \quad A \in \mathcal{B}(\mathbb{R}^d). \]

• predictive measure: the conditional distribution of the signal $X_t$ given $\mathcal{Y}_{t-1}$

\[ p_t(A) = P(X_t \in A \mid \mathcal{Y}_{t-1}), \quad t \ge 0, \quad A \in \mathcal{B}(\mathbb{R}^d). \]

• If $\mu$ is a measure and $f$ is a function, then $\mu(f) \triangleq \int f(x)\,\mu(dx)$.

• If $f$ is a function and $k$ is a kernel, then $kf(x) \triangleq \int f(y)\,k(x, dy)$.

• If $\mu$ is a measure and $k$ is a kernel, then $\mu k(A) \triangleq \int \mu(dx)\,k(x, A)$.

Bayes’ recursion.

Prediction: $p_t = \pi_{t-1} K_t$
Updating: $\pi_t = g_t \star p_t$

In other words, $\frac{d\pi_t}{dp_t} = C_t^{-1} g_t$, where $C_t \triangleq \int_{\mathbb{R}^d} g_t(y_t, x_t)\,p_t(dx_t)$.
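As an illustration only, the prediction and updating steps can be written out on a finite state space, where measures are probability vectors and the kernel is a stochastic matrix. The two-state kernel `K`, the prior `pi0` and the likelihood values `g1` below are made-up numbers, not taken from the talk.

```python
def predict(pi_prev, K):
    """Prediction p_t = pi_{t-1} K_t on a finite state space."""
    n_states = len(K[0])
    return [sum(pi_prev[i] * K[i][j] for i in range(len(pi_prev)))
            for j in range(n_states)]


def update(p, g):
    """Updating pi_t = g_t * p_t: reweight by the likelihood, divide by C_t."""
    unnormalised = [g_j * p_j for g_j, p_j in zip(g, p)]
    C = sum(unnormalised)  # C_t = integral of g_t against p_t
    return [u / C for u in unnormalised], C


# Hypothetical two-state example (all numbers are assumptions).
K = [[0.9, 0.1],
     [0.2, 0.8]]    # K[i][j] = P(X_t = j | X_{t-1} = i)
pi0 = [0.5, 0.5]    # initial distribution
g1 = [0.7, 0.1]     # g_1(y_1 | x) for the observed y_1, per state x

p1 = predict(pi0, K)
pi1, C1 = update(p1, g1)
print(p1, pi1, C1)
```

The normalising constant `C1` is returned alongside the posterior because it is the quantity $C_t$ appearing in the density $d\pi_t/dp_t$.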


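Particle filters

Particle filters/Sequential Monte Carlo Methods:

1. Class of approximations:

\[ \text{SMC:} \quad \Big( \underbrace{a_j(t)}_{\text{weight}},\ \underbrace{v_j^1(t), \dots, v_j^d(t)}_{\text{position}} \Big)_{j=1}^{N}, \qquad \pi_t \approx \pi_t^N = \sum_{j=1}^{N} a_j(t)\,\delta_{v_j(t)} \]

2. The law of evolution of the approximation:

\[ \text{SMC:} \quad \pi_{t-1}^N \xrightarrow{\ \text{mutation}\ } p_t^N \xrightarrow{\ \text{selection}\ } \pi_t^N \]

3. The measure of the approximating error:

\[ \sup_{\varphi \in B(\mathbb{R}^d)} E\big[\,|\pi_t^N(\varphi) - \pi_t(\varphi)|\,\big], \qquad \hat{\pi}_t - \hat{\pi}_t^N. \]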


The classical/standard/bootstrap/garden-variety particle filter

$\pi^n = \{\pi^n(t), t \ge 0\}$: the occupation measure of a system of weighted particles

\[ \pi^n(0) = \sum_{i=1}^{n} \frac{1}{n}\,\delta_{x_i^n} \longrightarrow \pi^n(t) = \sum_{i=1}^{n} \bar{a}_i^n(t)\,\delta_{V_i^n(t)}. \]

• DC, Particle Filters. A Theoretical Perspective, Sequential Monte Carlo Methods in Practice, 2001.
• P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer, 2004.


The Filtering Problem Framework: discrete/continuous time

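1. Initialisation [t = 0].

For $i = 1, \dots, N$, sample $x_0^{(i)}$ from $\pi_0$,

\[ \pi_0^N = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_0^{(i)}}. \]

2. Iteration [t − 1 to t].

Let $x_{t-1}^{(i)}$, $i = 1, \dots, N$, be the positions of the particles at time $t - 1$.

\[ \pi_{t-1}^N = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_{t-1}^{(i)}}. \]

Step 1.

For $i = 1, \dots, N$, sample $\bar{x}_t^{(i)}$ from $f_t(x_t \mid x_{t-1}^{(i)})\,dx_t$.

\[ p_t^N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\bar{x}_t^{(i)}}. \]

Compute the (normalized) weight $\bar{a}_t^{(i)} = g_t(\bar{x}_t^{(i)}) \big/ \sum_{j=1}^{N} g_t(\bar{x}_t^{(j)})$.

\[ \bar{\pi}_t^N = \sum_{i=1}^{N} \bar{a}_t^{(i)}\,\delta_{\bar{x}_t^{(i)}} = g_t \star p_t^N. \]

Step 2.

Replace each particle by $\xi_t^{(i)}$ offspring such that $\sum_{i=1}^{N} \xi_t^{(i)} = N$.

[Sample with replacement $N$ times from the $\bar{x}_t^{(i)}$.]

Denote the positions of the particles by $x_t^{(i)}$, $i = 1, \dots, N$.

\[ \pi_t^N = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_t^{(i)}}. \]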

Further details in:

Bain, A., DC, Fundamentals of Stochastic Filtering, Series: Stochastic Modelling and Applied Probability, Vol. 60, Springer Verlag, 2009.
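The initialisation, mutation and selection steps can be sketched for a toy scalar model. Everything model-specific here is an assumption made for illustration: the linear signal $X_t = a X_{t-1} + \sigma B_t$, the observation $Y_t = X_t + V_t$, the choice $\pi_0 = N(0,1)$, and the use of plain multinomial resampling for Step 2.

```python
import math
import random


def bootstrap_filter(ys, n_particles, a=0.9, sigma=1.0, seed=0):
    """Bootstrap particle filter for the assumed toy model
    X_t = a X_{t-1} + sigma B_t,  Y_t = X_t + V_t  (B_t, V_t ~ N(0, 1)).
    Returns the weighted-particle estimates of E[X_t | Y_1, ..., Y_t]."""
    rng = random.Random(seed)
    # Initialisation: x_0^(i) ~ pi_0, taken here to be N(0, 1).
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Step 1 (mutation): move each particle with the signal kernel.
        xbar = [a * x + sigma * rng.gauss(0.0, 1.0) for x in xs]
        # Normalised weights proportional to g_t(y | xbar) = exp(-(y - xbar)^2 / 2).
        logw = [-0.5 * (y - x) ** 2 for x in xbar]
        top = max(logw)  # subtract the maximum for numerical stability
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * xi for wi, xi in zip(w, xbar)))
        # Step 2 (selection): sample N times with replacement.
        xs = rng.choices(xbar, weights=w, k=n_particles)
    return means


# Simulate a short trajectory and its observations from the same toy model.
sim = random.Random(42)
x, ys = 0.0, []
for _ in range(50):
    x = 0.9 * x + sim.gauss(0.0, 1.0)
    ys.append(x + sim.gauss(0.0, 1.0))

estimates = bootstrap_filter(ys, n_particles=500)
print(estimates[:5])
```

The estimate is computed from the weighted measure before resampling, which avoids adding the resampling noise to the reported mean.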



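Theorem

$\pi^N$ converges to $\pi$. Moreover

\[ \sup_{t \in [0,T]}\ \sup_{\{\|\varphi\|_\infty \le 1\}} E_Y\big[\,|\pi_t^N(\varphi) - \pi_t(\varphi)|\,\big] \le \frac{c_T}{\sqrt{N}} \]

and $\sqrt{N}(\pi^N - \pi)$ converges to a measure-valued process $\bar{u} = \{\bar{u}_t, t \ge 0\}$.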



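Notation:

• $\mathrm{Error}(\pi, T, N) = \sup_{t \in [0,T]} \sup_{\{\|\varphi\|_\infty \le 1\}} E_Y\big[\,|\pi_t^N(\varphi) - \pi_t(\varphi)|\,\big]$

• $\mathrm{Error}(p, T, N) = \sup_{t \in [0,T]} \sup_{\{\|\varphi\|_\infty \le 1\}} E_Y\big[\,|p_t^N(\varphi) - p_t(\varphi)|\,\big]$

\[ \sup_{t \in [0,T]}\ \sup_{\{\|\varphi\|_\infty \le 1\}} E_Y\big[\,|\pi_t^N(\varphi) - \pi_t(\varphi)|\,\big] \le \frac{c_T}{\sqrt{N}}. \]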

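Theorem

For all $T > 0$, there exists $c_T$ such that

\[ \mathrm{Error}(\pi, T, N) \le \frac{c_T}{\sqrt{N}}, \qquad \mathrm{Error}(p, T, N) \le \frac{c_T}{\sqrt{N}} \]

if and only if $\mathrm{Error}(\pi, 0, N) \le \frac{c_0}{\sqrt{N}}$ and, for all $T > 0$, there exists $c_T$ such that

\[ \sup_{t \in [0,T]}\ \sup_{\{\|\varphi\|_\infty \le 1\}} E_Y\big[\,|p_t^N(\varphi) - \pi_{t-1}^N K_t(\varphi)|\,\big] \le \frac{c_T}{\sqrt{N}}, \]

\[ \sup_{t \in [0,T]}\ \sup_{\{\|\varphi\|_\infty \le 1\}} E_Y\big[\,|\pi_t^N(\varphi) - \bar{\pi}_t^N(\varphi)|\,\big] \le \frac{c_T}{\sqrt{N}}. \]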



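Proof.

“⇒” Immediate from the following two inequalities

\[ \big|p_t^N \varphi - \pi_{t-1}^N K_t \varphi\big| \le \big|p_t^N \varphi - p_t \varphi\big| + \big|\pi_{t-1}(K_t \varphi) - \pi_{t-1}^N (K_t \varphi)\big|, \]

\[ \big|\pi_t^N \varphi - \bar{\pi}_t^N \varphi\big| \le \big|\pi_t^N \varphi - \pi_t \varphi\big| + \big|\pi_t \varphi - \bar{\pi}_t^N \varphi\big|, \]

where we used the fact that $p_t = \pi_{t-1} K_t$.

“⇐” Induction. The case $t = 0$ is assumed. The induction step is obtained as follows. Since $p_t = \pi_{t-1} K_t$, by the triangle inequality

\[ |p_t^N \varphi - p_t \varphi| \le |p_t^N \varphi - \pi_{t-1}^N K_t \varphi| + |\pi_{t-1}^N K_t \varphi - \pi_{t-1} K_t \varphi|. \]

Also

\[ \bar{\pi}_t^N \varphi - \pi_t \varphi = \frac{p_t^N(\varphi g_t)}{p_t^N g_t} - \frac{p_t(\varphi g_t)}{p_t g_t} = -\frac{p_t^N(\varphi g_t)}{p_t^N g_t \times p_t g_t}\,\big(p_t^N g_t - p_t g_t\big) + \left( \frac{p_t^N(\varphi g_t)}{p_t g_t} - \frac{p_t(\varphi g_t)}{p_t g_t} \right), \]

and as $|p_t^N(\varphi g_t)| \le \|\varphi\|_\infty\, p_t^N g_t$,

\[ \big|\bar{\pi}_t^N \varphi - \pi_t \varphi\big| \le \frac{\|\varphi\|_\infty}{p_t g_t}\,\big|p_t^N g_t - p_t g_t\big| + \frac{1}{p_t g_t}\,\big|p_t^N(\varphi g_t) - p_t(\varphi g_t)\big|. \]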
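As a sanity check on the algebra, the decomposition of $\bar{\pi}_t^N \varphi - \pi_t \varphi$ and the resulting bound can be verified numerically for discrete measures. The random measures `p` and `pN`, the positive function `g` and the test function `phi` below are arbitrary stand-ins for $p_t$, $p_t^N$, $g_t$ and $\varphi$.

```python
import random


def integrate(measure, f):
    """mu(f) = sum_x f(x) mu({x}) for a measure on a finite set."""
    return sum(m * v for m, v in zip(measure, f))


def random_probability(k, rng):
    """A random probability vector of length k."""
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


rng = random.Random(1)
k = 6
p = random_probability(k, rng)      # stands in for p_t
pN = random_probability(k, rng)     # stands in for p_t^N
g = [0.1 + rng.random() for _ in range(k)]        # strictly positive likelihood
phi = [rng.uniform(-1.0, 1.0) for _ in range(k)]  # bounded test function
phig = [a * b for a, b in zip(phi, g)]

A, B = integrate(pN, phig), integrate(pN, g)   # p_t^N(phi g_t), p_t^N g_t
D, C = integrate(p, phig), integrate(p, g)     # p_t(phi g_t),   p_t g_t

lhs = A / B - D / C                            # bar-pi_t^N(phi) - pi_t(phi)
rhs = -A / (B * C) * (B - C) + (A / C - D / C)
sup_phi = max(abs(v) for v in phi)
bound = sup_phi / C * abs(B - C) + abs(A - D) / C

print(lhs, rhs, bound)
```

The two printed values `lhs` and `rhs` agree up to rounding, and `abs(lhs)` does not exceed `bound`.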



Remarks:

Particle filters are recursive algorithms: The approximation for $\pi_t$ and $Y_{t+1}$ are the only information used in order to obtain the approximation for $\pi_{t+1}$. In other words, the information gained from $Y_1, \dots, Y_t$ is embedded in the current approximation.

The generic SMC method involves sampling from the prior distribution of the signal and then using a weighted bootstrap technique (or equivalent) with weights defined by the likelihood of the most recent observation data.

Step 2 can be done by means of sampling with replacement (SIR algorithm), stratified sampling, Bernoulli sampling, the Carpenter-Clifford-Fearnhead-Whitley genetic algorithm, or the Crisan-Lyons TBBA algorithm. All these methods satisfy the convergence requirement.

If d is small to moderate, then the standard particle filter can perform very well in the time parameter n.

Under certain conditions, the Monte Carlo error of the estimate of the filter can be uniform with respect to the time parameter.
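Two of the selection schemes named above can be sketched as functions that return offspring numbers summing to the number of particles. This is a minimal illustration of sampling with replacement and of stratified sampling only; the function names and the example weights are assumptions.

```python
import random
from itertools import accumulate


def multinomial_offspring(weights, rng):
    """Sampling with replacement: draw N indices with probabilities `weights`."""
    n = len(weights)
    counts = [0] * n
    for i in rng.choices(range(n), weights=weights, k=n):
        counts[i] += 1
    return counts


def stratified_offspring(weights, rng):
    """Stratified sampling: one uniform draw in each stratum [k/N, (k+1)/N)."""
    n = len(weights)
    cum = list(accumulate(weights))
    counts = [0] * n
    j = 0
    for k in range(n):
        u = (k + rng.random()) / n * cum[-1]
        # The draws are increasing in k, so the pointer j only moves forward.
        while j < n - 1 and cum[j] < u:
            j += 1
        counts[j] += 1
    return counts


rng = random.Random(7)
weights = [0.30, 0.25, 0.20, 0.10, 0.05, 0.05, 0.03, 0.02]
multi = multinomial_offspring(weights, rng)
strat = stratified_offspring(weights, rng)
print(multi, strat)
```

Both schemes keep the total number of particles fixed. With stratified sampling each offspring number stays within two of its expected value $N \bar{a}^{(i)}$, which is why it typically adds less noise than sampling with replacement.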



Remarks:

The function $x_k \mapsto g(x_k, y_k)$ can convey a lot of information about the hidden state, especially so in high dimensions. If this is the case, using the prior transition kernel $f(x_{k-1}, x_k)$ as proposal will be ineffective.

It is then known that the standard particle filter will typically perform poorly in this context, often requiring that $N = O(\kappa^d)$.

Figure: Computational cost per time step to achieve a predetermined RMSE versus model dimension, for standard particle filter (PF) and STPF. Horizontal axis: dimension (5 to 30). Vertical axis: wall-clock time per time step in seconds (logarithmic scale, $10^{-3.5}$ to $10^{-2}$).


The Filtering Problem Application to high-dimensional problems

Why is the high-dimensional filtering problem hard ?

A running example

Using particle filters to solve high-dimensional filtering problems

Final remarks

Research partially supported by EPSRC grant EP/N023781/1.
Numerical work done by Wei Pan (Imperial College London).

◦ A. Beskos, DC, A. Jasra, K. Kamatani, Y. Zhou, A stable particle filter for a class of high-dimensional state-space models, Adv. in Appl. Probab. 49 (2017).

◦ A. Beskos, DC, A. Jasra, On the stability of sequential Monte Carlo methods in high dimensions, Ann. Appl. Probab. 24 (2014).

◦ C.J. Cotter, DC, D.D. Holm, W. Pan, I. Shevchenko, Numerically Modelling Stochastic Lie Transport in Fluid Dynamics, https://arxiv.org/abs/1801.09729

◦ C.J. Cotter, DC, D.D. Holm, W. Pan, I. Shevchenko, Sequential Monte Carlo for Stochastic Advection by Lie Transport (SALT): A case study for the damped and forced incompressible 2D stochastic Euler equation, in preparation.


The Filtering Problem Why is the high-dimensional problem hard ?

Consider

$\Pi_0 = N(0, 1)$ (mean 0 and variance 1).
$\Pi_1 = N(1, 1)$ (mean 1 and variance 1).
$\Pi_d = N(d, 1)$ (mean d and variance 1).

$d(\Pi_0, \Pi_1)_{TV} = 2P[\,|X| \le 1/2\,]$, $X \sim N(0, 1)$.
$d(\Pi_0, \Pi_d)_{TV} = 2P[\,|X| \le d/2\,]$, $X \sim N(0, 1)$.

As d increases, the two measures get further and further apart, becoming singular w.r.t. each other.

As d increases, it becomes increasingly harder to use standard importance sampling, that is, to construct a sample from $\Pi_d$ by using a proposal from $\Pi_0$, weighting it using $\frac{d\Pi_d}{d\Pi_0}$, and (possibly) resampling from it.
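The total variation distances quoted here can be evaluated directly, since $P[\,|X| \le a\,] = \operatorname{erf}(a/\sqrt{2})$ for $X \sim N(0,1)$. The helper name below is an assumption; the convention is the one used on the slide, under which the distance takes values in $[0, 2]$.

```python
import math


def tv_gaussian_shift(shift):
    """2 P(|X| <= shift / 2) for X ~ N(0, 1): the distance between
    N(0, 1) and N(shift, 1) in the [0, 2]-valued convention."""
    return 2.0 * math.erf(shift / (2.0 * math.sqrt(2.0)))


for d in (1, 2, 5, 10):
    print(d, tv_gaussian_shift(d))
```

The printed values increase towards the maximal value 2 as the shift grows, which is the sense in which the two measures become mutually singular.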


The Filtering Problem Why is the high-dimensional problem hard ?

Consider

$\Pi_0 = N((0, \dots, 0), I_d)$ (mean $(0, \dots, 0)$ and covariance matrix $I_d$).
$\Pi_d = N((1, \dots, 1), I_d)$ (mean $(1, \dots, 1)$ and covariance matrix $I_d$).

$d(\Pi_0, \Pi_d)_{TV} = 2P[\,|X| \le \sqrt{d}/2\,]$, $X \sim N(0, 1)$, since the two means are at Euclidean distance $\sqrt{d}$.

As d increases, the two measures get further and further apart, becoming singular w.r.t. each other exponentially fast.

It becomes increasingly harder to use standard importance sampling to construct a sample from $\Pi_d$ by using a proposal from $\Pi_0$.

‘Moving’ from $\Pi_0$ to $\Pi_d$ is equivalent to moving from a standard normal distribution $N(0, 1)$ to a normal distribution $N(\sqrt{d}, 1)$ (the total variation distance between $N(0, 1)$ and $N(\sqrt{d}, 1)$ is the same as that between $\Pi_0$ and $\Pi_d$).
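The difficulty with importance sampling can be seen in a small experiment: sample from $\Pi_0$, weight by $d\Pi_d/d\Pi_0(x) = \exp(\sum_i x_i - d/2)$, and look at the effective sample size as a fraction of the sample size. The sample size, the seeds and the use of $1/\sum_i w_i^2$ as the effective sample size are choices made for this sketch.

```python
import math
import random


def ess_fraction(d, n_samples, rng):
    """Effective sample size divided by n_samples when importance sampling
    N((1, ..., 1), I_d) with proposal N(0, I_d)."""
    logw = []
    for _ in range(n_samples):
        s = sum(rng.gauss(0.0, 1.0) for _ in range(d))
        logw.append(s - 0.5 * d)  # log of the density ratio dPi_d / dPi_0
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    return sum(w) ** 2 / sum(x * x for x in w) / n_samples


ess_low = ess_fraction(1, 2000, random.Random(0))
ess_high = ess_fraction(30, 2000, random.Random(0))
print(ess_low, ess_high)
```

In dimension 1 a sizeable fraction of the sample carries weight, while in dimension 30 almost all of the weight sits on a handful of draws.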

Add-on techniques:

• Tempering *
• Optimal transport prior ↦ posterior
• Sequential DA in space *
• Jittering *
• Model Reduction (High ↦ Low Res) *
• Nudging
• Hybrid models
• Hamiltonian Monte Carlo
• Informed priors
• Localization


The Filtering Problem What is DA ?

State estimation in Numerical Weather Prediction

Data Assimilation at the UK Met Office

• set of methodologies that combines past knowledge of a system in the form of a numerical model with new information about that system in the form of observations of that system.
• designed to improve forecasting, reduce model uncertainties and adjust model parameters.
• term used mainly in the computational geoscience community
• major component of Numerical Weather Prediction

Variational DA: combines the model and the data through the optimisation of a given criterion (minimisation of a so-called cost-function).

Ensemble based DA: uses a set of model trajectories/possible scenarios that are intermittently updated according to data and are used to infer the past, current or future position of a system.

Hurricane Irma forecast: a. ECMWF, b. USA Global Forecast


The Filtering Problem A stochastic transport model

Consider a two-dimensional incompressible fluid flow $u$ defined on the 2D torus $\Omega = [0, L_x] \times [0, L_y]$, modelled by the two-dimensional Euler equations with forcing and damping. Let $q = \hat{z} \cdot \operatorname{curl} u$ denote the vorticity of $u$, where $\hat{z}$ denotes the unit vector along the z-axis. For a scalar field $g : \Omega \to \mathbb{R}$, we write $\nabla^\perp g = (-\partial_y g, \partial_x g)^T$. Let $\psi : \Omega \times [0, \infty) \to \mathbb{R}$ denote the stream function.

\[ \partial_t q + (u \cdot \nabla) q = Q - rq, \qquad u = \nabla^\perp \psi, \qquad \Delta \psi = q. \]

Q is the forcing term given by $Q = 0.1 \sin(8\pi x)$.

r is a positive constant, the large-scale dissipation time scale.

We consider the slip flow boundary condition $\psi|_{\partial\Omega} = 0$.

Evolution of Lagrangian fluid parcels:

\[ \frac{dx_t}{dt} = u(x_t, t). \]



Domain is $[0, 1]^2$

PDE System | SPDE System
$\partial_t \omega + u \cdot \nabla \omega = Q - r\omega$ | $dq + \bar{u} \cdot \nabla q\,dt + \sum_i \xi_i \cdot \nabla q \circ dW_t^i = (Q - rq)\,dt$
$u = \nabla^\perp \psi$ | $\bar{u} = \nabla^\perp \tilde{\psi}$
$\Delta \psi = \omega$ | $\Delta \tilde{\psi} = q$

$Q = 0.1 \sin(8\pi x)$, $r = 0.01$. Boundary conditions: $\psi|_{\partial\Omega} = 0$ and $\tilde{\psi}|_{\partial\Omega} = 0$.

Setting | PDE | SPDE
Grid resolution | 512x512 | 64x64
Numerical Δt | 0.0025 | 0.01

Spin-up: 40 ett (ett: eddy turnover time, $L/u_L \approx 2.5$ time units).
Numerical scheme: a mixed continuous and discontinuous Galerkin finite element scheme + an optimal third order strong stability preserving Runge-Kutta, [Bernsen et al 2006, Gottlieb 2005].



Initial configuration for the vorticity

\[ \omega_{\text{spin}} = \sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) + 0.02\sin(2\pi y) + 0.02\sin(2\pi x) \tag{1} \]

from which we spin up the system until an energy equilibrium state seems to have been reached. This equilibrium state, denoted by $\omega_{\text{initial}}$, is then chosen as the initial condition.



Plot of the numerical PDE solution at the initial time $t_{\text{initial}}$ and its coarse-grained version done via spatial averaging and projection of the fine grid stream function to the coarse grid.



Plot of the numerical PDE solution at the final time $t = t_{\text{initial}} + 146$ large eddy turnover times (ett). The coarse-graining is done via spatial averaging and projection of the fine grid stream function to the coarse grid.



Observations:

u is observed on a subgrid of the signal grid (9 × 9 points)

\[ Y_t(x) = \begin{cases} u_t^{\text{SPDE}}(x) + \alpha z_x, \ z_x \sim N(0, 1) & \text{(Experiment 1)} \\ u_t^{\text{PDE}}(x) + \alpha z_x, \ z_x \sim N(0, 1) & \text{(Experiment 2)} \end{cases} \]

α is calibrated to the standard deviation of the true solution over a coarse grid cell.


The Filtering Problem Initial condition

Initial Condition

A good choice of the initial condition is essential for the successful implementation of the filter.

In practice it is a reflection of the level of uncertainty of the estimate of the initial position of the dynamical system.

We use the initial condition to obtain an ensemble which contains particles that are reasonably ‘close’ to the truth.

Choice for the running example

Deformation: physically consistent with the system, Casimirs preserved. We take a nominal value $\omega_{t_0}$ and deform it using the following ‘modified’ Euler equation:

\[ \partial_t \omega + \beta_i\, u(\tau_i) \cdot \nabla \omega = 0 \tag{2} \]

where $\beta_i \sim N(0, \epsilon)$, $i = 1, \dots, N_p$, are centered Gaussian weights with an a priori variance parameter $\epsilon$, and $\tau_i \sim U(t_{\text{initial}}, t_0)$, $i = 1, \dots, N_p$, are uniform random numbers. Thus each $u(\tau_i)$ corresponds to a PDE solution in the time period $[t_{\text{initial}}, t_0)$.



Alternative choices

$q + \zeta$, where $\zeta$ is a Gaussian random field: doable but not physical. It only works for $q$ because it is the least smooth of the three fields of interest; the other fields are spatially smooth. It also breaks the SPDE well-posedness theorem (function space regularity). Figure (ux, uy)



Directly perturb $\psi$ by $\psi + \bar{\psi}$, where $\bar{\psi} = (I - \kappa\Delta)^{-1} \zeta$: invert the elliptic operator with boundary condition $\bar{\psi} = 0$. Figure (ux, uy)


The Filtering Problem Add-on techniques

Model Reduction (High ↦ Low Res)*

Model reduction is a valuable methodology that can lead to substantial computational savings: for the current numerical example we perform a state space order reduction from $2 \times 513^2 \approx 0.5 \times 10^6$ to $2 \times 65^2 = 8450$.

However, if applied erroneously, order reduction can introduce large errors.

Recall the recursion formula for the conditional distribution of the signal:

\[ p_t = \pi_{t-1} K_t, \qquad \pi_t = g_t \star p_t, \]

where $\frac{d\pi_t}{dp_t} = C_t^{-1} g_t$, with $C_t \triangleq \int_{\mathbb{R}^d} g_t(y_t, x_t)\,p_t(dx_t)$.

Remark. The conditional distribution of the signal is a continuous function of $(\pi_0, g_1, \dots, g_t, K_1, \dots, K_t)$. In other words, if

\[ \lim_{\varepsilon \to 0} (\pi_0^\varepsilon, g_1^\varepsilon, \dots, g_t^\varepsilon, K_1^\varepsilon, \dots, K_t^\varepsilon) = (\pi_0, g_1, \dots, g_t, K_1, \dots, K_t) \]

in a suitably chosen topology and

\[ p_t^\varepsilon \triangleq \pi_{t-1}^\varepsilon K_t^\varepsilon, \qquad \pi_t^\varepsilon \triangleq g_t^\varepsilon \star p_t^\varepsilon, \tag{3} \]

then $\lim_{\varepsilon \to 0} \pi_t^\varepsilon = \pi_t$ and $\lim_{\varepsilon \to 0} p_t^\varepsilon = p_t$ (again, in a suitably chosen topology).


The Filtering Problem Add-on techniques

NB. Note that $\pi_t^\varepsilon$ is no longer the solution of a filtering problem, but simply the solution of the iteration (3).

Order reduction can be theoretically justified through the continuity of the conditional distribution of the signal on $(\pi_0, g_1, \dots, g_t, K_1, \dots, K_t)$. This is the case when the order reduction is performed through a coarsening of the grid used for the numerical algorithm that approximates the dynamical system.

Example: We use a stochastic PDE defined on a coarser grid:

\[ \partial_t q + (u \cdot \nabla) q + \sum_{k=1}^{\infty} (\xi_k \cdot \nabla) q \circ dB_t^k = Q - rq, \qquad u = \nabla^\perp \psi, \qquad \Delta \psi = q. \]

• $\xi_k$ are given divergence-free vector fields
• $\xi_k$ are computed from the true solution by using an empirical orthogonal functions (EOFs) procedure
• $B_t^k$ are scalar independent Brownian motions

\[ dx_t = u(x_t, t)\,dt + \sum_i \xi_i(x_t) \circ dW_i(t). \]


The Filtering Problem Methodology to calibrate the noise

The reason for this “stochastic parametrization” is grounded in solid physical considerations, see

D.D. Holm, Variational principles for stochastic fluids, Proc. Roy. Soc. A, 2015.

\[ dx_t = u_t^f(x_t)\,dt \]

\[ dx_t = u_t^c(x_t)\,dt + \sum_i \xi_i(x_t) \circ dW_i(t) \]

For each $m = 0, 1, \dots, M - 1$:

1 Solve $dx_{ij}^f(t)/dt = u_t^f(x_{ij}^f(t))$ with initial condition $x_{ij}^f(m\Delta T) = x_{ij}$.

2 Compute $u_t^c$ by low-pass filtering $u_t^f$ along the trajectory.

3 Compute $x_{ij}^c(t)$ by solving $dx_{ij}^c(t)/dt = u_t^c(x_{ij}^f(t))$ with the same initial condition.

4 Compute the difference $\Delta x_{ij}^m = x_{ij}^f((m + 1)\Delta T) - x_{ij}^c((m + 1)\Delta T)$, which measures the error between the fine and coarse trajectory.



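Having obtained $\Delta x_{ij}^m$, we would like to extract the basis for the noise. This amounts to a Gaussian model of the form

\[ \frac{\Delta x_{ij}^m}{\sqrt{\delta t}} = \widetilde{\Delta x}_{ij} + \sum_{k=1}^{N} \xi_{ij}^k\, \Delta W_m^k, \]

where $\Delta W_m^k$ are i.i.d. Normal random variables with mean zero and variance 1. We estimate $\xi$ by minimising

\[ E\left[ \sum_{ijm} \left( \frac{\Delta x_{ij}^m}{\sqrt{\delta t}} - \widetilde{\Delta x}_{ij} - \sum_{k=1}^{N} \xi_{ij}^k\, \Delta W_m^k \right)^{2} \right], \]

where the choice of $N$ can be informed by using Empirical Orthogonal Functions (EOFs).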



Number of EOFs

decide on a case by case basis

too many will slow down the algorithm

Figure. On the left: number of EOFs, 90% variance vs 50% (no change). On the right: normalised spectrum of the Δx covariance operator, showing the number of EOFs required to capture 50%, 70% and 90% of the total variance.

Figure: Model reduction UQ pictures, systems 512x512 vs 128x128 vs 64x64. Panels: (a) ux, (b) uy; (a) psi, (b) q.



The aim of the calibration is to capture the statistical properties of the fastfluctuations, rather than the trajectory of the flow.

Validation of stochastic parameterisation in terms of uncertaintyquantification for the SPDE.

Performance of DA algorithm relies on the correct modelling of theunresolved scales.

Evaluating the uncertainty arising from the choice of EOFs is part of the particle filter implementation.



Ensemble distance from the “truth”

Velocity Field

\[ d\big(\{\hat{q}_i, i = 1, \dots, N_p\}, \omega, t\big) := \min_{i \in \{1, \dots, N_p\}} \frac{\|\omega(t) - \hat{q}_i(t)\|_{L^2(D)}}{\|\omega(t)\|_{L^2(D)}} \]
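For fields stored as values on a uniform grid, this relative distance can be sketched with a discrete $L^2$ norm. The flattening of each field into a list and the function names are assumptions for the example.

```python
import math


def l2_norm(field):
    """Discrete L2 norm of a field sampled on a uniform grid."""
    return math.sqrt(sum(v * v for v in field) / len(field))


def ensemble_distance(ensemble, truth):
    """Smallest relative L2 distance between the truth and any ensemble member."""
    denominator = l2_norm(truth)
    return min(
        l2_norm([a - b for a, b in zip(member, truth)]) for member in ensemble
    ) / denominator


# Hypothetical fields on a 4-point grid.
truth = [1.0, 2.0, 3.0, 4.0]
ensemble = [[0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.5, 4.0]]
print(ensemble_distance(ensemble, truth))
```

Because the grid is uniform, the cell area cancels between numerator and denominator, so only the number of grid points enters the norm.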



Implementation issues


The Filtering Problem Implementation issues

Number of Particles

decide on a case by case basis
too few will not give a reasonable solution
too many will slow down the algorithm

Picture: number of particles 225 (good) vs 500 (no change), 225 (good) vs 25 (less good). 25 seems OK, but we want as many as computationally feasible to tune the algorithm.

(a) psi 225 vs 500 (b) psi 225 vs 25


The Filtering Problem SIR fails

Classical Particle Filter fails !

Histogram of weights

Figure: example: loglikelihoods histogram, period 1 ett, 100 particles


The Filtering Problem Tempering

Framework:

$\{X_t\}_{t \ge 0}$ Markov chain, $P(X_t \in dx_t \mid X_{t-1} = x_{t-1}) = f_t(x_t \mid x_{t-1})\,dx_t$,

$\{X_t, Y_t\}_{t \ge 0}$, $P(Y_t \in dy_t \mid X_t = x_t) = g_t(y_t \mid x_t)\,dy_t$

A tempering procedure

For k = 1 to d

◦ reweight the particle using $g_t^{1/d}$ and (possibly) resample from it

◦ move particles using an MCMC that leaves $g_t^{k/d} f_t\, \pi_{[0,t-1]}$ invariant

Beskos, DC, Jasra, On the stability of SMC methods in high dimensions, 2014.
Kantas, Beskos, Jasra, Sequential Monte Carlo for inverse problems, 2014.



Initialisation t = 0: For $n = 1, \dots, N$, sample $q_0^n$ from $\pi_0$.

Iteration $(t_{i-1}, t_i]$: Given the ensemble $\{X_n(t_{i-1})\}_{n=1,\dots,N}$,

1 Evolve $X_n(t_{i-1})$ using the SPDE to obtain $X_n(t_i)$.

2 Given $X := \{X_n(t_i)\}_{n=1,\dots,N}$, define normalised tempered weights

\[ \bar{\lambda}_{n,i}(\phi, X) := \frac{\exp(-\phi \Lambda_{n,i})}{\sum_m \exp(-\phi \Lambda_{m,i})} \]

where the dependence on $X$ means the $\Lambda_{n,i}$ are computed using $X$. Define the effective sample size

\[ \mathrm{ESS}_i(\phi, X) := \big\| \bar{\lambda}_i(\phi, X) \big\|_{l^2}^{-1}. \]

Set $\phi = 1$.

3 ... While $\mathrm{ESS}_i(\phi, X) < N_{\text{threshold}}$ do:

(a) Find $1 - \phi < \phi' < 1$ such that $\mathrm{ESS}_i(\phi' - (1 - \phi), X) \approx N_{\text{threshold}}$. Resample according to $\bar{\lambda}_{n,i}(\phi' - (1 - \phi), X)$ and apply MCMC if required (i.e. when there are duplicated particles), to obtain a new set of particles $X(\phi')$. Set $\phi = 1 - \phi'$ and $X = X(\phi')$.

(b) If $\mathrm{ESS}_i \ge N_{\text{threshold}}$ then STOP and go to the next filtering step with $\{(X_n(t_i), \bar{\lambda}_{n,i})\}_{n=1,\dots,N}$.
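The search for a temperature at which the effective sample size meets the threshold can be sketched with a bisection. This is a simplified variant, not the exact bookkeeping of the algorithm above: it only finds the largest admissible tempering increment for a fixed set of values $\Lambda_n$, uses the common form $1/\sum_n \bar{\lambda}_n^2$ for the effective sample size, and leaves out resampling and the MCMC move. The function names and the test values are assumptions.

```python
import math
import random


def normalised_weights(lams, phi):
    """Tempered weights proportional to exp(-phi * Lambda_n)."""
    base = min(lams)  # shift for numerical stability; cancels on normalising
    w = [math.exp(-phi * (lam - base)) for lam in lams]
    total = sum(w)
    return [x / total for x in w]


def ess(weights):
    """Effective sample size, here taken to be 1 / sum_n w_n^2."""
    return 1.0 / sum(w * w for w in weights)


def next_increment(lams, remaining, threshold, tol=1e-6):
    """Bisection for a step phi in (0, remaining] with ESS(phi) >= threshold."""
    if ess(normalised_weights(lams, remaining)) >= threshold:
        return remaining  # the rest of the likelihood can be applied at once
    lo, hi = 0.0, remaining
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(normalised_weights(lams, mid)) >= threshold:
            lo = mid
        else:
            hi = mid
    return lo


rng = random.Random(5)
lams = [50.0 * rng.random() for _ in range(100)]  # hypothetical values of Lambda_n
step = next_increment(lams, remaining=1.0, threshold=50.0)
print(step, ess(normalised_weights(lams, step)))
```

In a full tempering loop this increment would be followed by resampling and, if needed, an MCMC move, after which the values $\Lambda_n$ are recomputed for the new particles and the search is repeated on the remaining temperature.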



MCMC Mutation

Algorithm

Given the ensemble $\{X_{n,k}(t_i)\}_{n=1,\dots,N}$ corresponding to the k’th tempering step with temperature $\phi_k$, and proposal step size $\rho \in [0, 1]$, repeat the following steps.

Propose

\[ \tilde{X}_n(t_i) = G\Big( X_n(t_{i-1}),\ \rho\, W(t_{i-1} : t_i; \omega) + \sqrt{1 - \rho^2}\, Z(t_{i-1} : t_i; \omega) \Big) \]

where $X_n(t_i) = G(X_n(t_{i-1}), W(t_{i-1} : t_i; \omega))$, and $W \perp Z$.

Accept $\tilde{X}_n(t_i)$ with probability

\[ 1 \wedge \frac{\lambda(\phi_k, \tilde{X}_n(t_i))}{\lambda(\phi_k, X_n(t_i))} \]

where $\lambda(\phi, x) = \exp(-\phi \Lambda(x))$ is the unnormalised weight function.
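A scalar toy version of this move shows the mechanics. The solution map `G`, the coefficient `A`, and the Gaussian form of $\Lambda$ are placeholders chosen for the sketch; in the application above, `G` would be the SPDE solver driven by the Brownian increments.

```python
import math
import random

A = 0.9  # hypothetical model coefficient for the toy solution map


def G(x_prev, w):
    """Toy solution map: one step x = A * x_prev + w driven by the noise w."""
    return A * x_prev + w


def neg_log_lik(x, y):
    """Lambda(x) = (y - x)^2 / 2, a unit-variance Gaussian observation."""
    return 0.5 * (y - x) ** 2


def mcmc_move(x_prev, w, y, phi, rho, rng):
    """One accept/reject move on the driving noise w at temperature phi."""
    z = rng.gauss(0.0, 1.0)  # Z independent of W
    w_new = rho * w + math.sqrt(1.0 - rho * rho) * z
    x_cur, x_new = G(x_prev, w), G(x_prev, w_new)
    # log of lambda(phi, x_new) / lambda(phi, x_cur)
    log_ratio = -phi * (neg_log_lik(x_new, y) - neg_log_lik(x_cur, y))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return w_new, x_new, True
    return w, x_cur, False


rng = random.Random(3)
w, accepted = rng.gauss(0.0, 1.0), 0
for _ in range(1000):
    w, x_now, ok = mcmc_move(0.5, w, y=1.0, phi=0.7, rho=0.9, rng=rng)
    accepted += ok
acceptance_rate = accepted / 1000
print(acceptance_rate)
```

The proposal $\rho W + \sqrt{1 - \rho^2} Z$ has the same $N(0, 1)$ law as $W$, so the prior on the noise cancels in the acceptance ratio and only the tempered likelihood ratio remains. Values of `rho` close to 1 give small moves and high acceptance.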


The Filtering Problem Resampling Intervals

Resampling Intervals

small resampling intervals lead to an unreasonable increase in the computational effort

large resampling intervals make the algorithm fail

the ESS can be used as criterion for choosing the resampling time

adapted resampling time can be used

ESS evolution in time/observation noise


The Filtering Problem Results

DA Solution for DA periods: 1 ETT and 0.2 ETT

(a) ux (b) uy (c) psi (d) q

Figure: DA: obs spde, period 0.2 ett




Figure: DA: obs pde, period 0.2 ett


Figure: DA: obs pde, period 1 ett



Number of tempering steps/Average MCMC steps


Sequential DA

Space-Time Particle Filter

Assume that there exists an increasing sequence of sets $\{A_{k,j}\}_{j=1}^{\tau_{k,d}}$, with $A_{k,1} \subset A_{k,2} \subset \cdots \subset A_{k,\tau_{k,d}} = \{1 : d\}$, for some integer $0 < \tau_{k,d} \le d$, such that we can factorize:

\[ g(x_k, y_k)\, f(x_{k-1}, x_k) = \prod_{j=1}^{\tau_{k,d}} \alpha_{k,j}\big(y_k, x_{k-1}, x_k(A_{k,j})\big), \]

for appropriate functions $\alpha_{k,j}(\cdot)$, where $x_k(A) = \{x_k(j) : j \in A\} \in \mathbb{R}^{|A|}$.

Example:

\[ X_n(j) = \sum_{i=1}^{j-1} \beta_{d-j+i+1}(X_n(i)) + \sum_{i=j}^{d} \bar{\beta}_{i-j+1}(X_{n-1}(i)) + \epsilon_n(j) \]

\[ Y_n(j) = X_n(j) + \xi_n(j) \]

where $\epsilon_n(j) \overset{\text{i.i.d.}}{\sim} N(0, \sigma_x^2)$ and $\xi_n(j) \overset{\text{i.i.d.}}{\sim} N(0, \sigma_y^2)$, $j \in \{1, \dots, d\}$.

Beskos, DC, Jasra, Kamatani, Zhou, A Stable Particle Filter in High-Dimensions, 2017.


Sequential DA

Within a sequential Monte Carlo context, one can think of augmenting the sequence of distributions of increasing dimension

\[ X_{1:k} \mid Y_{1:k}, \quad 1 \le k \le n, \]

moving from $\mathbb{R}^{d(k-1)}$ to $\mathbb{R}^{dk}$, with intermediate laws on $\mathbb{R}^{d(k-1) + |A_{k,j}|}$, for $j = 1, \dots, \tau_{k,d}$.

This holds when:
• one can obtain a factorization for the prior term $f(x_{k-1}, x_k)$ by marginalising over subsets of co-ordinates.
• the likelihood component $g(x_k, y_k)$ can be factorized when the model assumes a local dependence structure for the observations.


Sequential DA

For j = 1 to $\tau_d - 1$

◦ Move particle according to $q_{k+1,j}\big(x_{k+1}(A_{k+1,j}) \mid x_k, x_{k+1}(A_{k+1,j-1})\big)$.

◦ Weight the particle using

\[ \frac{\alpha_{k+1,j}\big(y_{k+1}, x_k, x_{k+1}(A_{k+1,j})\big)}{q_{k+1,j}\big(x_{k+1}(A_{k+1,j}) \mid x_k, x_{k+1}(A_{k+1,j-1})\big)} \]

and (possibly) resample from it.

Remarks.
• Since particle filters work well with regard to the time parameter (they are sequential), we exploit the model structure to build up a particle filter in space and time.
• We break the k-th time-step of the particle filter into $\tau_{k,d}$ space-steps and run a system of N independent particle filters for these steps.
• It is necessary that the factorisation allows for a gradual introduction of the ‘full’ likelihood term $g(x_k, y_k)$ along the $\tau_{k,d}$ steps. For instance, trivial choices like

\[ \alpha_{k,j} = \frac{\int f(x_{k-1}, x_k)\,dx_k(j + 1 : d)}{\int f(x_{k-1}, x_k)\,dx_k(j : d)}, \quad 1 \le j \le d - 1, \]

and

\[ \alpha_{k,d} = \left( \frac{f(x_{k-1}, x_k)}{\int f(x_{k-1}, x_k)\,dx_k(d)} \right) g(x_k, y_k) \]

are ineffective, as they only introduce the complete likelihood term in the last step.


Sequential DA

Numerical Test

Let $X_n \in \mathbb{R}^d$ be such that we have $X_0 = 0_d$ (the d-dimensional vector of zeros) and

\[ X_n(j) = \sum_{i=1}^{j-1} \beta_{d-j+i+1} X_n(i) + \sum_{i=j}^{d} \beta_{i-j+1} X_{n-1}(i) + \epsilon_n \]

where $\epsilon_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma_x^2)$ and $\beta_{1:d}$ are known static parameters. For the observations, we set

\[ Y_n = X_n + \xi_n \]

where $\xi_n(j) \overset{\text{i.i.d.}}{\sim} N(0, \sigma_y^2)$, $j \in \{1, \dots, d\}$.

Comparison between the SIR algorithm and the STPF. Parameters:

• $\sigma_x^2 = \sigma_y^2 = 1$, $n = 1000$, d-dimensional observations, $d \in \{16, 128, 1024\}$.
• Both filters use the model transitions as the proposal and the likelihood function as the potential, with adaptive resampling.
• STPF: $N = 100$ and $M_d = d$
• SIR algorithm: $N M_d$ particles.


Sequential DA

The averages of estimators for the posterior mean of the first co-ordinate $X_n(1)$ given all data up to time n are illustrated below. The SIR collapses when the dimension becomes moderate or large. No meaningful estimates when d = 1024 (the estimates completely lose track of the observations and the analytical mean). The STPF performs reasonably well in all three cases.

Figure: Mean of estimators of $X_n(1)$ across 100 runs. Panels: Standard Particle Filter and Space-Time Particle Filter, for d = 16, d = 128 and d = 1024. Horizontal axis: time (0 to 1000). Vertical axis: mean of estimator for X(1). Legend: Observation, Kalman Filter, Estimator.

Sequential DA

The ESS (calculated over the global filter for STPF) scaled by the number of particles for each time step of the two algorithms. The standard filter struggles significantly even in the case d = 16 and collapses when d = 128. The performance of the STPF deteriorates (but does not collapse) when the dimension increases, due to the path degeneracy effect. However, even for d = 1024, it still retains an acceptable level of ESS.

Figure: Effective Sample Size plots from a single run. Panels: Standard Particle Filter and Space-Time Particle Filter, for d = 16, d = 128 and d = 1024. Horizontal axis: time (0 to 1000). Vertical axis: ESS scaled by the number of particles (0 to 0.75).


Sequential DA

The variance per time step for the estimators of the posterior mean of the first co-ordinate $X_n(1)$ (given the data up to time n) across 100 runs:

Figure: Logarithmic scale variance. Panels: Standard Particle Filter and Space-Time Particle Filter, for d = 16, d = 128 and d = 1024. Horizontal axis: time (0 to 1000). Vertical axis: variance of estimator for X(1) (logarithmic scale, 0.01 to 1.00).


Jittering

Procedure employed to reduce sample degeneracy.

The particles are moved using a suitably chosen kernel.

The moves are controlled so that the size of the (stochastic) perturbation remains of the same order as the particle filter error ($1/\sqrt{N}$).

D. C., Joaquin Miguez, Nested particle filters for online parameter estimation in discrete-time state-space Markov models, http://arxiv.org/abs/1308.1883.

D. C., Joaquin Miguez, Uniform convergence over time of a nested particle filtering scheme for recursive parameter estimation in state–space Markov models, https://arxiv.org/abs/1603.09005.
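A minimal sketch of a jittering step for scalar particles, assuming a Gaussian kernel whose standard deviation is `scale / sqrt(N)`. The Gaussian kernel and the `scale` parameter are illustrative choices, not the specific kernels analysed in the papers above.

```python
import math
import random


def jitter(particles, scale=1.0, rng=None):
    """Perturb each particle by Gaussian noise of standard deviation scale / sqrt(N)."""
    rng = rng or random.Random(0)
    step = scale / math.sqrt(len(particles))
    return [x + step * rng.gauss(0.0, 1.0) for x in particles]


# After resampling, many particles coincide; jittering separates them again.
resampled = [0.3] * 100
jittered = jitter(resampled, scale=1.0, rng=random.Random(11))
print(len(set(resampled)), len(set(jittered)))
```

With N = 100 the perturbation has standard deviation 0.1, so the duplicated particles are spread out without moving far from their resampled positions.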


Final Remarks

Particle filters/sequential Monte Carlo methods are theoretically justified algorithms for approximating the state of dynamical systems partially (and noisily) observed.

Standard particle filters do NOT work in high dimensions.

Properly calibrated and modified, particle filters can be used to solve high dimensional problems (see also the work of Peter Jan van Leeuwen, Roland Potthast, Hans Kunsch).

Important parameters: initial condition, number of particles, number of observations, correction times, observation error, etc.

One can use methodology to assess the reliability of ensemble forecasting systems.

