
Particle Filters in High Dimensions - IRISA · 2018-06-20 · Bain, A., DC, Fundamentals of Stochastic Filtering, Series: Stochastic Modelling and Applied Probability, Vol. 60, Springer


  • Particle Filters in High Dimensions

    Dan Crisan

    Imperial College London

    Workshop - Simulation and probability: recent trends

    The Henri Lebesgue Center for Mathematics

    5-8 June 2018, Rennes

    Dan Crisan (Imperial College London) Particle Filters in High Dimensions 7-8 June 2018 1 / 55

  • Part 1: Theoretical Considerations

    Stochastic Filtering

    Particle filters/Sequential Monte Carlo methods

    Convergence Result

    Final remarks

    ◦ DC, Particle Filters. A Theoretical Perspective, Sequential Monte Carlo Methods in Practice, 2001.

    ◦ DC, A. Doucet, A survey of convergence results on particle filtering methods for practitioners, IEEE Transactions on Signal Processing, 2002.

    ◦ A. Doucet, A.M. Johansen, A tutorial on particle filtering and smoothing: Fifteen years later, The Oxford Handbook of Nonlinear Filtering, 2011.

    ◦ P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer, 2004.

    ◦ A. Bain, DC, Fundamentals of Stochastic Filtering, Springer, 2009.

    ◦ DC, B. Rozovskii, The Oxford Handbook of Nonlinear Filtering, Oxford University Press, 2011.

  • What is stochastic filtering?

    Stochastic Filtering: The process of using partial observations and a stochastic model to make inferences about an evolving dynamical system.

    X: the signal process - “hidden component”

    Y: the observation process - “the data”

  • What is stochastic filtering?

    The filtering problem: Find the conditional distribution of the signal $X_t$ given $\mathcal{Y}_t = \sigma(Y_s, s \in [0,t])$, i.e.,

    $$\pi_t(A) = P(X_t \in A \mid \mathcal{Y}_t), \quad t \ge 0, \ A \in \mathcal{B}(\mathbb{R}^d).$$

    Discrete framework: $\{X_t, Y_t\}_{t \ge 0}$ Markov process

    The signal process

    • $\{X_t\}_{t \ge 0}$ Markov chain, $X_0 \sim \pi_0(dx_0)$

    • $P(X_t \in dx_t \mid X_{t-1} = x_{t-1}) = K_t(x_{t-1}, dx_t) = f_t(x_t \mid x_{t-1})\,dx_t$

    • Example: $X_t = b(X_{t-1}) + \sigma(X_{t-1}) B_t$, $B_t \sim N(0,1)$ i.i.d.

    The observation process

    • $P(Y_t \in dy_t \mid X_{[0,t]} = x_{[0,t]}, Y_{[0,t-1]} = y_{[0,t-1]}) = P(Y_t \in dy_t \mid X_t = x_t) = g_t(y_t \mid x_t)\,dy_t$

    • Example: $Y_t = h(X_t) + V_t$, $V_t \sim N(0,1)$ i.i.d.

    where $X_{[0,t]} \triangleq (X_0, \dots, X_t)$, $x_{[0,t]} \triangleq (x_0, \dots, x_t)$.

  • What is stochastic filtering?

    Notation:

    • posterior measure: the conditional distribution of the signal $X_t$ given $\mathcal{Y}_t$

    $$\pi_t(A) = P(X_t \in A \mid \mathcal{Y}_t), \quad t \ge 0, \ A \in \mathcal{B}(\mathbb{R}^d).$$

    • predictive measure: the conditional distribution of the signal $X_t$ given $\mathcal{Y}_{t-1}$

    $$p_t(A) = P(X_t \in A \mid \mathcal{Y}_{t-1}), \quad t \ge 0, \ A \in \mathcal{B}(\mathbb{R}^d).$$

    • If $\mu$ is a measure and $f$ is a function, then $\mu(f) \triangleq \int f(x)\,\mu(dx)$.

    • If $f$ is a function and $k$ is a kernel, then $kf(x) \triangleq \int f(y)\,k(x, dy)$.

    • If $\mu$ is a measure and $k$ is a kernel, then $\mu k(A) \triangleq \int \mu(dx)\,k(x, A)$.

    Bayes’ recursion.

    Prediction: $p_t = \pi_{t-1} K_t$

    Updating: $\pi_t = g_t \star p_t$

    In other words, $\frac{d\pi_t}{dp_t} = C_t^{-1} g_t$, where $C_t \triangleq \int_{\mathbb{R}^d} g_t(y_t, x_t)\,p_t(dx_t)$.
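    The prediction and updating steps of Bayes’ recursion can be illustrated on a finite state space, where measures are probability vectors and the kernel is a transition matrix. This is only a sketch: the two-state chain, the likelihood values and the function names `predict` and `update` are assumptions made for illustration and are not part of the slides.

```python
def predict(post, K):
    """Prediction step p_t = pi_{t-1} K_t on a finite state space.

    post[i] is pi_{t-1}({i}); K[i][j] is the transition probability i -> j.
    """
    n = len(post)
    return [sum(post[i] * K[i][j] for i in range(n)) for j in range(n)]


def update(pred, lik):
    """Updating step: reweight p_t by the likelihood and normalise.

    lik[j] plays the role of g_t(y_t | x_t = j); the normalising
    constant is C_t = p_t(g_t).
    """
    unnorm = [p * g for p, g in zip(pred, lik)]
    c_t = sum(unnorm)
    return [u / c_t for u in unnorm]


# Hypothetical two-state example (all numbers chosen for illustration only).
K = [[0.9, 0.1], [0.2, 0.8]]
pi_prev = [0.5, 0.5]
lik = [0.3, 0.6]
p_t = predict(pi_prev, K)
pi_t = update(p_t, lik)
```

    In this toy case the predictive vector is a mixture of the rows of `K`, and the update moves mass towards the state with the larger likelihood.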

  • Particle filters

    Particle filters/Sequential Monte Carlo Methods:

    1. Class of approximations:

    $$\mathrm{SMC}: \ \big(\underbrace{a_j(t)}_{\text{weight}}, \underbrace{v_j^1(t), \dots, v_j^d(t)}_{\text{position}}\big)_{j=1}^N, \qquad \pi_t \approx \pi_t^N = \sum_{j=1}^N a_j(t)\,\delta_{v_j(t)}$$

    2. The law of evolution of the approximation:

    $$\mathrm{SMC}: \ \pi_{t-1}^N \xrightarrow{\text{mutation}} p_t^N \xrightarrow{\text{selection}} \pi_t^N$$

    3. The measure of the approximating error:

    $$\sup_{\varphi \in B(\mathbb{R}^d)} E\big[|\pi_t^n(\varphi) - \pi_t(\varphi)|\big], \qquad \hat\pi_t - \hat\pi_t^n.$$

  • The classical/standard/bootstrap/garden-variety particle filter

    $\pi^n = \{\pi^n(t), t \ge 0\}$: the occupation measure of a system of weighted particles

    $$\pi^n(0) = \sum_{i=1}^n \frac{1}{n}\,\delta_{x_i^n} \longrightarrow \pi^n(t) = \sum_{i=1}^n \bar a_i^n(t)\,\delta_{V_i^n(t)}.$$

    • DC, Particle Filters. A Theoretical Perspective, Sequential Monte Carlo Methods in Practice, 2001.

    • P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer, 2004.

  • The Filtering Problem Framework: discrete/continuous time

    1. Initialisation [$t = 0$].

    For $i = 1, \dots, N$, sample $x_0^{(i)}$ from $\pi_0$,

    $$\pi_0^N = \frac{1}{N} \sum_{i=1}^N \delta_{x_0^{(i)}}.$$

    2. Iteration [$t-1$ to $t$].

    Let $x_{t-1}^{(i)}$, $i = 1, \dots, N$, be the positions of the particles at time $t-1$.

    $$\pi_{t-1}^N = \frac{1}{N} \sum_{i=1}^N \delta_{x_{t-1}^{(i)}}.$$

    Step 1.

    For $i = 1, \dots, N$, sample $\bar x_t^{(i)}$ from $f_t(x_t \mid x_{t-1}^{(i)})\,dx_t$.

    $$p_t^N = \frac{1}{N} \sum_{i=1}^N \delta_{\bar x_t^{(i)}}.$$

  • The Filtering Problem Framework: discrete/continuous time

    Compute the (normalized) weight $\bar a_t^{(i)} = g_t(\bar x_t^{(i)}) \big/ \sum_{j=1}^N g_t(\bar x_t^{(j)})$.

    $$\bar\pi_t^N = \sum_{i=1}^N \bar a_t^{(i)}\,\delta_{\bar x_t^{(i)}} = g_t \star p_t^N.$$

    Step 2.

    Replace each particle by $\xi_t^{(i)}$ offspring such that $\sum_{i=1}^N \xi_t^{(i)} = N$.

    [Sample with replacement $N$ times from the $\bar x_t^{(i)}$.]

    Denote the positions of the particles by $x_t^{(i)}$, $i = 1, \dots, N$.

    $$\pi_t^N = \frac{1}{N} \sum_{i=1}^N \delta_{x_t^{(i)}}.$$

    Further details in:

    A. Bain, DC, Fundamentals of Stochastic Filtering, Series: Stochastic Modelling and Applied Probability, Vol. 60, Springer Verlag, 2009.
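    The initialisation, mutation (Step 1) and selection (Step 2) steps can be sketched in a few lines of Python. The sketch assumes a toy scalar model, $X_t = a X_{t-1} + B_t$ and $Y_t = X_t + V_t$ with standard normal noises and $X_0 \sim N(0,1)$; the model, the value of `a` and the function name `bootstrap_filter` are illustrative assumptions, not taken from the slides.

```python
import math
import random


def bootstrap_filter(ys, n_particles, a=0.9, seed=0):
    """Bootstrap particle filter for the toy model
    X_t = a X_{t-1} + B_t, Y_t = X_t + V_t, with B_t, V_t ~ N(0, 1)
    and X_0 ~ N(0, 1). Returns the estimated posterior means."""
    rng = random.Random(seed)
    # Initialisation: sample the particles from pi_0.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # Step 1 (mutation): move each particle with the signal kernel.
        xbar = [a * x + rng.gauss(0.0, 1.0) for x in xs]
        # Normalised weights, proportional to the likelihood of y.
        logw = [-0.5 * (y - x) ** 2 for x in xbar]
        m = max(logw)
        w = [math.exp(l - m) for l in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        # Posterior mean estimate under the weighted measure.
        means.append(sum(wi * x for wi, x in zip(w, xbar)))
        # Step 2 (selection): sample with replacement N times.
        xs = rng.choices(xbar, weights=w, k=n_particles)
    return means


# Simulate a short data record from the same toy model and filter it.
data_rng = random.Random(1)
x, ys = 0.0, []
for _ in range(50):
    x = 0.9 * x + data_rng.gauss(0.0, 1.0)
    ys.append(x + data_rng.gauss(0.0, 1.0))
est = bootstrap_filter(ys, n_particles=200)
```

    The log-weights are shifted by their maximum before exponentiation, a standard precaution against underflow that does not change the normalised weights.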

  • The Filtering Problem Framework: discrete/continuous time

    Theorem

    $\pi^n$ converges to $\pi$. Moreover

    $$\sup_{t \in [0,T]} \ \sup_{\{\|\varphi\|_\infty \le 1\}} E^Y\big[|\pi_t^N(\varphi) - \pi_t(\varphi)|\big] \le \frac{c_T}{\sqrt{N}}$$

    and $\sqrt{N}(\pi^N - \pi)$ converges to a measure valued process $\bar u = \{\bar u_t, t \ge 0\}$.

  • The Filtering Problem Framework: discrete/continuous time

    Notation:

    • $\mathrm{Error}(\pi, T, N) = \sup_{t \in [0,T]} \sup_{\{\|\varphi\|_\infty \le 1\}} E^Y\big[|\pi_t^N(\varphi) - \pi_t(\varphi)|\big]$

    • $\mathrm{Error}(p, T, N) = \sup_{t \in [0,T]} \sup_{\{\|\varphi\|_\infty \le 1\}} E^Y\big[|p_t^N(\varphi) - p_t(\varphi)|\big]$

    Theorem

    For all $T > 0$, there exists $c_T$ such that

    $$\mathrm{Error}(\pi, T, N) \le \frac{c_T}{\sqrt{N}}, \qquad \mathrm{Error}(p, T, N) \le \frac{c_T}{\sqrt{N}}$$

    if and only if $\mathrm{Error}(\pi, 0, N) \le \frac{c_0}{\sqrt{N}}$ and, for all $T > 0$, there exists $c_T$ such that

    $$\sup_{t \in [0,T]} \ \sup_{\{\|\varphi\|_\infty \le 1\}} E^Y\big[|p_t^N(\varphi) - \pi_{t-1}^N K_t(\varphi)|\big] \le \frac{c_T}{\sqrt{N}},$$

    $$\sup_{t \in [0,T]} \ \sup_{\{\|\varphi\|_\infty \le 1\}} E^Y\big[|\pi_t^N(\varphi) - \bar\pi_t^N(\varphi)|\big] \le \frac{c_T}{\sqrt{N}}.$$

  • The Filtering Problem Framework: discrete/continuous time

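    Proof.

    “⇒” Immediate from the following two inequalities

    $$\big|p_t^N\varphi - \pi_{t-1}^N K_t\varphi\big| \le \big|p_t^N\varphi - p_t\varphi\big| + \big|\pi_{t-1}(K_t\varphi) - \pi_{t-1}^N(K_t\varphi)\big|,$$

    $$\big|\pi_t^N\varphi - \bar\pi_t^N\varphi\big| \le \big|\pi_t^N\varphi - \pi_t\varphi\big| + \big|\pi_t\varphi - \bar\pi_t^N\varphi\big|,$$

    where we used the fact that $p_t = \pi_{t-1}K_t$.

    “⇐” Induction. The case $t = 0$ is assumed. The induction step is obtained as follows: Since $p_t = \pi_{t-1}K_t$, by the triangle inequality

    $$|p_t^N\varphi - p_t\varphi| \le |p_t^N\varphi - \pi_{t-1}^N K_t\varphi| + |\pi_{t-1}^N K_t\varphi - \pi_{t-1}K_t\varphi|.$$

    Also

    $$\bar\pi_t^N\varphi - \pi_t\varphi = \frac{p_t^N(\varphi g_t)}{p_t^N g_t} - \frac{p_t(\varphi g_t)}{p_t g_t} = -\frac{p_t^N(\varphi g_t)}{p_t^N g_t \times p_t g_t}\big(p_t^N g_t - p_t g_t\big) + \left(\frac{p_t^N(\varphi g_t)}{p_t g_t} - \frac{p_t(\varphi g_t)}{p_t g_t}\right),$$

    and as $|p_t^N(\varphi g_t)| \le \|\varphi\|_\infty\, p_t^N g_t$,

    $$\big|\bar\pi_t^N\varphi - \pi_t\varphi\big| \le \frac{\|\varphi\|_\infty}{p_t g_t}\big|p_t^N g_t - p_t g_t\big| + \frac{1}{p_t g_t}\big|p_t^N(\varphi g_t) - p_t(\varphi g_t)\big|.$$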

  • The Filtering Problem Framework: discrete/continuous time

    Remarks:

    Particle filters are recursive algorithms: The approximation for $\pi_t$ and $Y_{t+1}$ are the only information used in order to obtain the approximation for $\pi_{t+1}$. In other words, the information gained from $Y_1, \dots, Y_t$ is embedded in the current approximation.

    The generic SMC method involves sampling from the prior distribution of the signal and then using a weighted bootstrap technique (or equivalent) with weights defined by the likelihood of the most recent observation data.

    Step 2 can be done by means of sampling with replacement (SIR algorithm), stratified sampling, Bernoulli sampling, the Carpenter-Clifford-Fearnhead-Whitley genetic algorithm, or the Crisan-Lyons TBBA algorithm. All these methods satisfy the convergence requirement.

    If $d$ is small to moderate, then the standard particle filter can perform very well in the time parameter $n$.

    Under certain conditions, the Monte Carlo error of the estimate of the filter can be uniform with respect to the time parameter.
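    As an illustration of one of the selection schemes listed for Step 2, here is a minimal sketch of stratified sampling. It assumes the usual construction (one uniform draw per stratum of width $1/N$, mapped through the cumulative weights); the function name `stratified_resample` is hypothetical and the other schemes named in the remarks are not covered.

```python
import random


def stratified_resample(weights, rng):
    """Return offspring counts xi[i] with sum(xi) == len(weights).

    One uniform is drawn in each stratum [k/N, (k+1)/N) and mapped
    through the cumulative distribution of the normalised weights.
    """
    n = len(weights)
    counts = [0] * n
    cum, i = weights[0], 0
    for k in range(n):
        u = (k + rng.random()) / n
        # The draws are increasing in k, so one forward pass suffices.
        while u > cum and i < n - 1:
            i += 1
            cum += weights[i]
        counts[i] += 1
    return counts


rng = random.Random(0)
counts = stratified_resample([0.5, 0.25, 0.25, 0.0], rng)
```

    When every $N \bar a^{(i)}$ is an integer, as in this small example, each particle receives exactly that many offspring, which shows the variance reduction relative to plain sampling with replacement.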

  • The Filtering Problem Framework: discrete/continuous time

    Remarks:

    The function $x_k \mapsto g(x_k, y_k)$ can convey a lot of information about the hidden state, especially so in high dimensions. If this is the case, using the prior transition kernel $f(x_{k-1}, x_k)$ as proposal will be ineffective.

    It is then known that the standard particle filter will typically perform poorly in this context, often requiring that $N = O(\kappa^d)$.

    Figure: Computational cost per time step (wall-clock time per time step, in seconds) to achieve a predetermined RMSE versus model dimension, for standard particle filter (PF) and STPF.

  • The Filtering Problem Application to high-dimensional problems

    Why is the high-dimensional filtering problem hard?

    A running example

    Using particle filters to solve high-dimensional filtering problems

    Final remarks

    Research partially supported by EPSRC grant EP/N023781/1. Numerical work done by Wei Pan (Imperial College London).

    ◦ A. Beskos, DC, A. Jasra, K. Kamatani, Y. Zhou, A stable particle filter for a class of high-dimensional state-space models, Adv. in Appl. Probab. 49 (2017).

    ◦ A. Beskos, DC, A. Jasra, On the stability of sequential Monte Carlo methods in high dimensions, Ann. Appl. Probab. 24 (2014).

    ◦ C.J. Cotter, DC, D.D. Holm, W. Pan, I. Shevchenko, Numerically Modelling Stochastic Lie Transport in Fluid Dynamics, https://arxiv.org/abs/1801.09729

    ◦ C.J. Cotter, DC, D.D. Holm, W. Pan, I. Shevchenko, Sequential Monte Carlo for Stochastic Advection by Lie Transport (SALT): A case study for the damped and forced incompressible 2D stochastic Euler equation, in preparation.

  • The Filtering Problem Why is the high-dimensional problem hard ?

    Consider

    $\Pi_0 = N(0, 1)$ (mean 0 and variance 1).

    $\Pi_1 = N(1, 1)$ (mean 1 and variance 1).

    $\Pi_d = N(d, 1)$ (mean $d$ and variance 1).

    $d(\Pi_0, \Pi_1)_{TV} = 2P[\,|X| \le 1/2\,]$, $X \sim N(0,1)$.

    $d(\Pi_0, \Pi_d)_{TV} = 2P[\,|X| \le d/2\,]$, $X \sim N(0,1)$.

    As $d$ increases, the two measures get further and further apart, becoming singular w.r.t. each other.

    As $d$ increases, it becomes increasingly hard to use standard importance sampling to construct a sample from $\Pi_d$ by using a proposal from $\Pi_0$, weighting it using $\frac{d\Pi_d}{d\Pi_0}$ and (possibly) resampling from it.

  • The Filtering Problem Why is the high-dimensional problem hard ?

    Consider

    $\Pi_0 = N((0, \dots, 0), I_d)$ (mean $(0, \dots, 0)$ and covariance matrix $I_d$).

    $\Pi_d = N((1, \dots, 1), I_d)$ (mean $(1, \dots, 1)$ and covariance matrix $I_d$).

    $d(\Pi_0, \Pi_d)_{TV} = 2P[\,|X| \le \sqrt{d}/2\,]$, $X \sim N(0,1)$, since the two means are at Euclidean distance $\sqrt{d}$.

    As $d$ increases, the two measures get further and further apart, becoming singular w.r.t. each other exponentially fast.

    It becomes increasingly hard to use standard importance sampling to construct a sample from $\Pi_d$ by using a proposal from $\Pi_0$.

    ‘Moving’ from $\Pi_0$ to $\Pi_d$ is equivalent to moving from a standard normal distribution $N(0, 1)$ to a normal distribution $N(\sqrt{d}, 1)$ (the total variation distance between $N(0, 1)$ and $N(\sqrt{d}, 1)$ is the same as that between $\Pi_0$ and $\Pi_d$).
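    The difficulty of importance sampling from $\Pi_0$ to $\Pi_d$ can be seen numerically through the effective sample size of the importance weights. The sketch below is illustrative only: the sample size, the dimensions tried and the function name `importance_ess` are assumptions, and the effective sample size is taken as $(\sum_i w_i)^2 / \sum_i w_i^2$, a common definition.

```python
import math
import random


def importance_ess(d, n_samples, rng):
    """Effective sample size of the weights dPi_d/dPi_0, evaluated at
    n_samples draws from the proposal Pi_0 = N(0, I_d)."""
    logw = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # log dPi_d/dPi_0 (x) = sum_j x_j - d/2 for unit-mean target.
        logw.append(sum(x) - 0.5 * d)
    m = max(logw)
    w = [math.exp(l - m) for l in logw]
    return sum(w) ** 2 / sum(wi * wi for wi in w)


rng = random.Random(0)
ess_by_dim = {d: importance_ess(d, 1000, rng) for d in (1, 10, 100)}
```

    With a fixed number of samples, the effective sample size in the highest dimension is typically close to one, meaning that a single draw carries almost all of the weight.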

    Add-on techniques:

    • Tempering *
    • Optimal transport prior ↦ posterior
    • Sequential DA in space *
    • Jittering *
    • Model Reduction (High ↦ Low Res) *
    • Nudging
    • Hybrid models
    • Hamiltonian Monte Carlo
    • Informed priors
    • Localization

  • The Filtering Problem What is DA ?

    State estimation in Numerical Weather Prediction

    Data Assimilation at the UK Met Office

    • set of methodologies that combines past knowledge of a system, in the form of a numerical model, with new information about that system, in the form of observations of that system.
    • designed to improve forecasting, reduce model uncertainties and adjust model parameters.
    • term used mainly in the computational geoscience community.
    • major component of Numerical Weather Prediction.

    Variational DA: combines the model and the data through the optimisation of a given criterion (minimisation of a so-called cost function).

    Ensemble based DA: uses a set of model trajectories/possible scenarios that are intermittently updated according to data and are used to infer the past, current or future position of a system.

    Hurricane Irma forecast: a. ECMWF, b. USA Global Forecast

  • The Filtering Problem A stochastic transport model

    Consider a two-dimensional incompressible fluid flow $u$ defined on the 2D torus $\Omega = [0, L_x] \times [0, L_y]$, modelled by the two-dimensional Euler equations with forcing and damping. Let $q = \hat z \cdot \operatorname{curl} u$ denote the vorticity of $u$, where $\hat z$ denotes the z-axis. For a scalar field $g : \Omega \to \mathbb{R}$, we write $\nabla^\perp g = (-\partial_y g, \partial_x g)^T$. Let $\psi : \Omega \times [0, \infty) \to \mathbb{R}$ denote the stream function.

    $$\partial_t q + (u \cdot \nabla) q = Q - rq$$

    $$u = \nabla^\perp \psi$$

    $$\Delta\psi = q.$$

    $Q$ is the forcing term given by $Q = 0.1 \sin(8\pi x)$.

    $r$ is a positive constant - the large scale dissipation time scale.

    We consider the slip flow boundary condition $\psi|_{\partial\Omega} = 0$.

    Evolution of Lagrangian fluid parcels:

    $$\frac{dx_t}{dt} = u(x_t, t).$$

  • The Filtering Problem A stochastic transport model

    Domain is $[0, 1]^2$

    PDE System | SPDE System
    $\partial_t\omega + u \cdot \nabla\omega = Q - r\omega$ | $dq + \bar u \cdot \nabla q\,dt + \sum_i \xi_i \cdot \nabla q \circ dW_t^i = (Q - rq)\,dt$
    $u = \nabla^\perp\psi$ | $\bar u = \nabla^\perp\tilde\psi$
    $\Delta\psi = \omega$ | $\Delta\tilde\psi = q$

    $Q = 0.1 \sin(8\pi x)$, $r = 0.01$. Boundary condition $\psi|_{\partial\Omega} = 0$ and $\tilde\psi|_{\partial\Omega} = 0$.

    Setting | PDE | SPDE
    Grid resolution | 512x512 | 64x64
    Numerical $\Delta t$ | 0.0025 | 0.01

    Spin-up: 40 ett (ett: eddy turnover time, $L/u_L \approx 2.5$ time units).

    Numerical scheme: a mixed continuous and discontinuous Galerkin finite element scheme + an optimal third order strong stability preserving Runge-Kutta, [Bernsen et al 2006, Gottlieb 2005].

  • The Filtering Problem A stochastic transport model

    Initial configuration for the vorticity

    $$\omega_{\mathrm{spin}} = \sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) + 0.02\sin(2\pi y) + 0.02\sin(2\pi x) \qquad (1)$$

    from which we spin up the system until an energy equilibrium state seems to have been reached.

    This equilibrium state, denoted by $\omega_{\mathrm{initial}}$, is then chosen as the initial condition.

  • The Filtering Problem A stochastic transport model

    Plot of the numerical PDE solution at the initial time $t_{\mathrm{initial}}$ and its coarse-grained version, done via spatial averaging and projection of the fine grid stream function to the coarse grid.

  • The Filtering Problem A stochastic transport model

    Plot of the numerical PDE solution at the final time $t = t_{\mathrm{initial}} + 146$ large eddy turnover times (ett). The coarse-graining is done via spatial averaging and projection of the fine grid stream function to the coarse grid.

  • The Filtering Problem A stochastic transport model

    Observations:

    $u$ is observed on a subgrid of the signal grid (9 × 9 points)

    $$Y_t(x) = \begin{cases} u_t^{\mathrm{SPDE}}(x) + \alpha z_x, \ z_x \sim N(0,1) & \text{Experiment 1} \\ u_t^{\mathrm{PDE}}(x) + \alpha z_x, \ z_x \sim N(0,1) & \text{Experiment 2} \end{cases}$$

    $\alpha$ is calibrated to the standard deviation of the true solution over a coarse grid cell.

  • The Filtering Problem Initial condition

    Initial Condition

    A good choice of the initial condition is essential for the successful implementation of the filter.

    In practice it is a reflection of the level of uncertainty of the estimate of the initial position of the dynamical system.

    We use the initial condition to obtain an ensemble which contains particles that are reasonably ‘close’ to the truth.

    Choice for the running example

    Deformation: physically consistent with the system, Casimirs preserved. We take a nominal value $\omega_{t_0}$ and deform it using the following ‘modified’ Euler equation:

    $$\partial_t\omega + \beta_i\, u(\tau_i) \cdot \nabla\omega = 0 \qquad (2)$$

    where $\beta_i \sim N(0, \epsilon)$, $i = 1, \dots, N_p$, are centered Gaussian weights with an a priori variance parameter $\epsilon$, and $\tau_i \sim U(t_{\mathrm{initial}}, t_0)$, $i = 1, \dots, N_p$, are uniform random numbers. Thus each $u(\tau_i)$ corresponds to a PDE solution in the time period $[t_{\mathrm{initial}}, t_0)$.

  • The Filtering Problem Initial condition

    Alternative choices

    $q + \zeta$, where $\zeta$ is a Gaussian random field: doable but not physical. It only works for $q$ because it is the least smooth of the three fields of interest; the other fields are spatially smooth. It also breaks the SPDE well-posedness theorem (function space regularity).

    Figure: $(u_x, u_y)$

  • The Filtering Problem Initial condition

    Directly perturb $\psi$ by $\psi + \bar\psi$, where $\bar\psi = (I - \kappa\Delta)^{-1}\zeta$ (invert the elliptic operator with boundary condition $\bar\psi = 0$).

    Figure: $(u_x, u_y)$

  • The Filtering Problem Add-on techniques

    Model Reduction (High ↦ Low Res) *

    Model reduction is a valuable methodology that can lead to substantial computational savings: for the current numerical example we perform a state space order reduction from $2 \times 513^2 \approx 0.5 \times 10^6$ to $2 \times 65^2 = 8450$.

    However, if applied erroneously, order reduction can introduce large errors.

    Recall the recursion formula for the conditional distribution of the signal:

    $$p_t = \pi_{t-1}K_t, \qquad \pi_t = g_t \star p_t,$$

    where $\frac{d\pi_t}{dp_t} = C_t^{-1} g_t$ and $C_t \triangleq \int_{\mathbb{R}^d} g_t(y_t, x_t)\,p_t(dx_t)$.

    Remark. The conditional distribution of the signal is a continuous function of $(\pi_0, g_1, \dots, g_t, K_1, \dots, K_t)$. In other words, if

    $$\lim_{\varepsilon \to 0}\,(\pi_0^\varepsilon, g_1^\varepsilon, \dots, g_t^\varepsilon, K_1^\varepsilon, \dots, K_t^\varepsilon) = (\pi_0, g_1, \dots, g_t, K_1, \dots, K_t)$$

    in a suitably chosen topology and

    $$p_t^\varepsilon \triangleq \pi_{t-1}^\varepsilon K_t^\varepsilon, \qquad \pi_t^\varepsilon \triangleq g_t^\varepsilon \star p_t^\varepsilon, \qquad (3)$$

    then $\lim_{\varepsilon \to 0} \pi_t^\varepsilon = \pi_t$ and $\lim_{\varepsilon \to 0} p_t^\varepsilon = p_t$ (again, in a suitably chosen topology).

  • The Filtering Problem Add-on techniques

    NB. Note that $\pi_t^\varepsilon$ is no longer the solution of a filtering problem, but simply the solution of the iteration (3).

    Order reduction can be theoretically justified through the continuity of the conditional distribution of the signal on $(\pi_0, g_1, \dots, g_t, K_1, \dots, K_t)$. This is the case when the order reduction is performed through a coarsening of the grid used for the numerical algorithm that approximates the dynamical system.

    Example: We use a Stochastic PDE defined on a coarser grid:

    $$\partial_t q + (u \cdot \nabla) q + \sum_{k=1}^\infty (\xi_k \cdot \nabla) q \circ dB_t^k = Q - rq$$

    $$u = \nabla^\perp\psi$$

    $$\Delta\psi = q.$$

    • $\xi_k$ are given divergence-free vector fields
    • $\xi_k$ are computed from the true solution by using an empirical orthogonal functions (EOFs) procedure
    • $B_t^k$ are scalar independent Brownian motions

    $$dx_t = u(x_t, t)\,dt + \sum_i \xi_i(x_t) \circ dW_i(t).$$

  • The Filtering Problem Methodology to calibrate the noise

    The reason for this “stochastic parametrization” is grounded in solid physical considerations, see

    D.D. Holm, Variational principles for stochastic fluids, Proc. Roy. Soc. A, 2015.

    $$dx_t = u_t^f(x_t)\,dt$$

    $$dx_t = u_t^c(x_t)\,dt + \sum_i \xi_i(x_t) \circ dW_i(t)$$

    For each $m = 0, 1, \dots, M-1$:

    1 Solve $dx_{ij}^f(t)/dt = u_t^f(x_{ij}^f(t))$ with initial condition $x_{ij}^f(m\Delta T) = x_{ij}$.

    2 Compute $u_t^c$ by low-pass filtering $u_t^f$ along the trajectory.

    3 Compute $x_{ij}^c(t)$ by solving $dx_{ij}^c(t)/dt = u_t^c(x_{ij}^f(t))$ with the same initial condition.

    4 Compute the difference $\Delta x_{ij}^m = x_{ij}^f((m+1)\Delta T) - x_{ij}^c((m+1)\Delta T)$, which measures the error between the fine and coarse trajectory.

  • The Filtering Problem Methodology to calibrate the noise

    Having obtained $\Delta x_{ij}^m$, we would like to extract the basis for the noise. This amounts to a Gaussian model of the form

    $$\frac{\Delta x_{ij}^m}{\sqrt{\delta t}} = \widetilde{\Delta x}_{ij} + \sum_{k=1}^N \xi_{ij}^k\,\Delta W_m^k,$$

    where $\Delta W_m^k$ are i.i.d. Normal random variables with mean zero, variance 1. We estimate $\xi$ by minimising

    $$E\left[\sum_{ijm}\left(\frac{\Delta x_{ij}^m}{\sqrt{\delta t}} - \widetilde{\Delta x}_{ij} - \sum_{k=1}^N \xi_{ij}^k\,\Delta W_m^k\right)^2\right],$$

    where the choice of $N$ can be informed by using Empirical Orthogonal Functions (EOFs).

  • The Filtering Problem Methodology to calibrate the noise

    Number of EOFs

    decide on a case by case basis

    too many will slow down the algorithm

    On the left: Number of EOFs, 90% variance vs 50% (no change).

    On the right: Normalised spectrum of the $\Delta x$ covariance operator, showing the number of EOFs required to capture 50%, 70% and 90% of the total variance.
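    Choosing the number of EOFs from the normalised spectrum amounts to a cumulative-sum rule on the eigenvalues of the $\Delta x$ covariance operator. The sketch below assumes the eigenvalues have already been computed; the geometrically decaying spectrum and the function name `n_eofs_for_variance` are hypothetical and serve only to illustrate the rule.

```python
def n_eofs_for_variance(eigenvalues, fraction):
    """Smallest number of leading EOFs whose eigenvalues sum to at
    least `fraction` of the total variance."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running >= fraction * total:
            return k
    return len(lam)


# Hypothetical, geometrically decaying spectrum (illustration only).
spectrum = [0.5 ** k for k in range(1, 21)]
eof_counts = {f: n_eofs_for_variance(spectrum, f) for f in (0.5, 0.7, 0.9)}
```

    For a slowly decaying spectrum the count grows quickly with the variance fraction, which is the trade-off against computational cost noted in the remarks above.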

  • The Filtering Problem Methodology to calibrate the noise

    Model reduction UQ pictures: systems 512x512 vs 128x128 vs 64x64

    (a) ux (b) uy

  • The Filtering Problem Methodology to calibrate the noise

    (a) psi (b) q


  • The Filtering Problem Methodology to calibrate the noise

    The aim of the calibration is to capture the statistical properties of the fast fluctuations, rather than the trajectory of the flow.

    Validation of stochastic parameterisation in terms of uncertainty quantification for the SPDE.

    Performance of the DA algorithm relies on the correct modelling of the unresolved scales.

    Evaluating the uncertainty arising from the choice of EOFs is part of the particle filter implementation.

  • The Filtering Problem Methodology to calibrate the noise

    Ensemble Distance from the “truth”

    Velocity Field

    $$d\big(\{\hat q_i, i = 1, \dots, N_p\}, \omega, t\big) := \min_{i \in \{1, \dots, N_p\}} \frac{\|\omega(t) - \hat q_i(t)\|_{L^2(D)}}{\|\omega(t)\|_{L^2(D)}}$$

  • The Filtering Problem Methodology to calibrate the noise

    Implementation issues


  • The Filtering Problem Implementation issues

    Number of Particles

    decide on a case by case basis

    too few will not give a reasonable solution

    too many will slow down the algorithm

    Figure: Number of particles: 225 (good) vs 500 (no change); 225 (good) vs 25 (less good). 25 seems OK, but we want as many as computationally feasible to tune the algorithm.

    (a) psi 225 vs 500 (b) psi 225 vs 25

  • The Filtering Problem SIR fails

    Classical Particle Filter fails !

    Histogram of weights

    Figure: example: loglikelihoods histogram, period 1 ett, 100 particles


  • The Filtering Problem Tempering

    Framework:

    $\{X_t\}_{t \ge 0}$ Markov chain, $P(X_t \in dx_t \mid X_{t-1} = x_{t-1}) = f_t(x_t \mid x_{t-1})\,dx_t$,

    $\{X_t, Y_t\}_{t \ge 0}$, $P(Y_t \in dy_t \mid X_t = x_t) = g_t(y_t \mid x_t)\,dy_t$

    A tempering procedure

    For $k = 1$ to $d$

    ◦ reweight the particle using $g_t^{1/d}$ and (possibly) resample from it

    ◦ move particles using an MCMC that leaves $g_t^{k/d} f_t\, \pi_{[0,t-1]}$ invariant

    Beskos, DC, Jasra, On the stability of SMC methods in high dimensions, 2014.

    Kantas, Beskos, Jasra, Sequential Monte Carlo for inverse problems, 2014.

  • The Filtering Problem Tempering

    Initialisation $t = 0$: For $i = 1, \dots, n$, sample $q_0^i$ from $\pi_0$.

    Iteration $(t_{i-1}, t_i]$: Given the ensemble $\{X_n(t_{i-1})\}_{n=1,\dots,N}$,

    1 Evolve $X_n(t_{i-1})$ using the SPDE to obtain $X_n(t_i)$.

    2 Given $X := \{X_n(t_i)\}_{n=1,\dots,N}$, define normalised tempered weights

    $$\bar\lambda_{n,i}(\phi, X) := \frac{\exp(-\phi\Lambda_{n,i})}{\sum_m \exp(-\phi\Lambda_{m,i})}$$

    where the dependence on $X$ means the $\Lambda_{n,i}$ are computed using $X$. Define the effective sample size

    $$ESS_i(\phi, X) := \big\|\bar\lambda_i(\phi, X)\big\|_{\ell^2}^{-1}.$$

    Set $\phi = 1$.

    3 While $ESS_i(\phi, X) < N_{\mathrm{threshold}}$ do:

    (a) Find $1 - \phi < \phi' < 1$ such that $ESS_i(\phi' - (1 - \phi), X) \approx N_{\mathrm{threshold}}$. Resample according to $\bar\lambda_{n,i}(\phi' - (1 - \phi), X)$ and apply MCMC if required (i.e. when there are duplicated particles), to obtain a new set of particles $X(\phi')$. Set $\phi = 1 - \phi'$ and $X = X(\phi')$.

    (b) If $ESS_i \ge N_{\mathrm{threshold}}$ then STOP and go to the next filtering step with $\{(X_n(t_i), \bar\lambda_{n,i})\}_{n=1,\dots,N}$.
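    The search for a temperature at which the effective sample size meets the threshold can be sketched with a bisection. This is a simplified illustration of that sub-step only: resampling and the MCMC move are omitted, the effective sample size is taken as $1/\sum_n \bar\lambda_n^2$ (a common definition, assumed here), and the names `tempered_weights`, `ess` and `find_temperature` as well as the values of $\Lambda_n$ are hypothetical.

```python
import math


def tempered_weights(neg_loglik, phi):
    """Normalised weights proportional to exp(-phi * Lambda_n)."""
    m = min(neg_loglik)
    w = [math.exp(-phi * (lam - m)) for lam in neg_loglik]
    total = sum(w)
    return [wi / total for wi in w]


def ess(weights):
    """Effective sample size, taken here as 1 / sum(w^2)."""
    return 1.0 / sum(w * w for w in weights)


def find_temperature(neg_loglik, ess_target, tol=1e-8):
    """Largest phi in (0, 1] with ESS(phi) >= ess_target, by bisection.

    Relies on the ESS being non-increasing in phi and on
    ess_target not exceeding the number of particles.
    """
    if ess(tempered_weights(neg_loglik, 1.0)) >= ess_target:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(tempered_weights(neg_loglik, mid)) >= ess_target:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical negative log-likelihoods for 50 particles.
lams = [2.0 * n for n in range(50)]
phi = find_temperature(lams, 25.0)
```

    In a full tempering loop this search would be repeated on the remaining temperature increment after each resampling and MCMC move, until the total temperature reaches one.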

  • The Filtering Problem Tempering

    MCMC Mutation

    Algorithm

    Given the ensemble $\{X_{n,k}(t_i)\}_{n=1,\dots,N}$ corresponding to the $k$'th tempering step with temperature $\phi_k$, and proposal step size $\rho \in [0, 1]$, repeat the following steps.

    Propose

    $$\tilde X_n(t_i) = G\Big(X_n(t_{i-1}),\ \rho\,W(t_{i-1} : t_i; \omega) + \sqrt{1 - \rho^2}\,Z(t_{i-1} : t_i; \omega)\Big)$$

    where $X_n(t_i) = G(X_n(t_{i-1}), W(t_{i-1} : t_i; \omega))$, and $W \perp Z$.

    Accept $\tilde X_n(t_i)$ with probability

    $$1 \wedge \frac{\bar\lambda(\phi_k, \tilde X_n(t_i))}{\bar\lambda(\phi_k, X_n(t_i))}$$

    where $\lambda(\phi, x) = \exp(-\phi\Lambda(x))$ is the unnormalised weight function.
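    The proposal mixes the current driving noise with fresh independent noise, which keeps the noise standard Gaussian, and the acceptance step only involves the tempered weights. A minimal sketch on a finite-dimensional noise vector follows; the toy forward map inside `neg_loglik`, the observation value and the function name `pcn_step` are assumptions made for illustration and stand in for the SPDE solver $G$.

```python
import math
import random


def pcn_step(w, neg_loglik, phi, rho, rng):
    """One accept/reject step on the driving noise w (list of N(0,1) draws).

    Proposal: w_new = rho * w + sqrt(1 - rho^2) * z, with z independent
    N(0, I). The move is accepted with probability
    min(1, exp(-phi * (Lambda(w_new) - Lambda(w)))).
    """
    s = math.sqrt(1.0 - rho * rho)
    w_new = [rho * wi + s * rng.gauss(0.0, 1.0) for wi in w]
    log_ratio = -phi * (neg_loglik(w_new) - neg_loglik(w))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return w_new, True
    return w, False


rng = random.Random(0)
x_prev, y = 0.0, 1.0  # hypothetical previous state and observation


def neg_loglik(w):
    # Toy forward map: the state moves by the scaled sum of the noise.
    x_new = x_prev + sum(w) / math.sqrt(len(w))
    return 0.5 * (y - x_new) ** 2


w = [rng.gauss(0.0, 1.0) for _ in range(10)]
accepted = 0
for _ in range(2000):
    w, ok = pcn_step(w, neg_loglik, phi=1.0, rho=0.9, rng=rng)
    accepted += ok
acc_rate = accepted / 2000
```

    Values of `rho` close to one give small moves and high acceptance rates, while smaller values give bolder proposals that are rejected more often.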

  • The Filtering Problem Resampling Intervals

    Resampling Intervals

    small resampling intervals lead to an unreasonable increase in the computational effort

    large resampling intervals make the algorithm fail

    the ESS can be used as a criterion for choosing the resampling time

    adapted resampling time can be used

    ESS evolution in time/observation noise

  • The Filtering Problem Results

    DA Solution for DA periods: 1 ETT and 0.2 ETT

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs spde, period 0.2 ett


  • The Filtering Problem Results

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs pde, period 0.2 ett

    (a) ux (b) uy (c) psi (d) q

    Figure: DA: obs pde, period 1 ett


  • The Filtering Problem Results

    Number of tempering steps/Average MCMC steps


  • Sequential DA

    Space-Time Particle Filter

    Assume that there exists an increasing sequence of sets $\{A_{k,j}\}_{j=1}^{\tau_{k,d}}$, with $A_{k,1} \subset A_{k,2} \subset \dots \subset A_{k,\tau_{k,d}} = \{1 : d\}$, for some integer $0 < \tau_{k,d} \le d$, such that we can factorize:

    $$g(x_k, y_k)\,f(x_{k-1}, x_k) = \prod_{j=1}^{\tau_{k,d}} \alpha_{k,j}\big(y_k, x_{k-1}, x_k(A_{k,j})\big),$$

    for appropriate functions $\alpha_{k,j}(\cdot)$, where $x_k(A) = \{x_k(j) : j \in A\} \in \mathbb{R}^{|A|}$.

    Example:

    $$X_n(j) = \sum_{i=1}^{j-1} \beta_{d-j+i+1}(X_n(i)) + \sum_{i=j}^{d} \bar\beta_{i-j+1}(X_{n-1}(i)) + \epsilon_n(j)$$

    $$Y_n(j) = X_n(j) + \xi_n(j)$$

    where $\epsilon_n(j) \overset{\text{i.i.d.}}{\sim} N(0, \sigma_x^2)$ and $\xi_n(j) \overset{\text{i.i.d.}}{\sim} N(0, \sigma_y^2)$, $j \in \{1, \dots, d\}$.

    Beskos, DC, Jasra, Kamatani, Zhou, A Stable Particle Filter in High-Dimensions, 2017.

  • Sequential DA

    Within a sequential Monte Carlo context, one can think of augmenting the sequence of distributions of increasing dimension

    $$X_{1:k} \mid Y_{1:k}, \quad 1 \le k \le n,$$

    moving from $\mathbb{R}^{d(k-1)}$ to $\mathbb{R}^{dk}$, with intermediate laws on $\mathbb{R}^{d(k-1) + |A_{k,j}|}$, for $j = 1, \dots, \tau_{k,d}$.

    This holds when:

    • one can obtain a factorization for the prior term $f(x_{k-1}, x_k)$ by marginalising over subsets of co-ordinates.

    • the likelihood component $g(x_k, y_k)$ can be factorized when the model assumes a local dependence structure for the observations.

  • Sequential DA

    For $j = 1$ to $\tau_d - 1$

    ◦ Move the particle according to $q_{k+1,j}\big(x_{k+1}(A_{k+1,j}) \mid x_k, x_{k+1}(A_{k+1,j-1})\big)$.

    ◦ Weight the particle using $\dfrac{\alpha_{k+1,j}\big(y_{k+1}, x_k, x_{k+1}(A_{k+1,j})\big)}{q_{k+1,j}\big(x_{k+1}(A_{k+1,j}) \mid x_k, x_{k+1}(A_{k+1,j-1})\big)}$ and (possibly) resample from it.

    Remarks.

    • Since particle filters work well with regards to the time parameter (they are sequential), we exploit the model structure to build up a particle filter in space and time.

    • We break the $k$-th time-step of the particle filter into $\tau_{k,d}$ space-steps and run a system of $N$ independent particle filters for these steps.

    • It is necessary that the factorisation allows for a gradual introduction of the ‘full’ likelihood term $g(x_k, y_k)$ along the $\tau_{k,d}$ steps. For instance, trivial choices like

    $$\alpha_{k,j} = \int f(x_{k-1}, x_k)\,dx_k(j+1 : d) \Big/ \int f(x_{k-1}, x_k)\,dx_k(j : d), \quad 1 \le j \le d-1,$$

    and

    $$\alpha_{k,d} = \left(f(x_{k-1}, x_k) \Big/ \int f(x_{k-1}, x_k)\,dx_k(d)\right) g(x_k, y_k)$$

    are ineffective, as they only introduce the complete likelihood term in the last step.

  • . Sequential DA

    Numerical Test

    Let Xn ∈ Rd be such that we have X0 = 0d (the d-dimensional vector of zeros)and

    Xn(j) =j−1∑

    i=1

    βd−j+i+1Xn(i) +d∑

    i=j

    βi−j+1Xn−1(i) + �n

    where �ni.i.d.∼ N (0, σ2x ) and β1:d are known static parameters. For the

    observations, we setYn = Xn + ξn

    where ξn(j)i.i.d.∼ N (0, σ2y ), j ∈ {1, . . . , d}.

    Comparison between the SIR algorithm and the STPF. Parameters:

    • $\sigma_x^2 = \sigma_y^2 = 1$, $n = 1000$, $d$-dimensional observations, $d \in \{16, 128, 1024\}$.

    • Both filters use the model transitions as the proposal, the likelihood function as the potential, and adaptive resampling.

    • STPF: $N = 100$ and $M_d = d$.

    • SIR algorithm: $N M_d$ particles.


  • . Sequential DA

    The averages of estimators for the posterior mean of the first co-ordinate $X_n(1)$ given all data up to time $n$ are illustrated below. The SIR collapses when the dimension becomes moderate or large: it gives no meaningful estimates when $d = 1024$ (the estimates completely lose track of the observations and the analytical mean). The STPF performs reasonably well in all three cases.

    [Figure: two columns of panels (Standard Particle Filter, Space-Time Particle Filter) and three rows (d = 16, d = 128, d = 1024); x-axis: Time, 0 to 1000; y-axis: Mean of estimator for X(1); legend: Observation, Kalman Filter, Estimator.]

    Figure: Mean of estimators of $X_n(1)$ across 100 runs.

  • . Sequential DA

    The ESS (calculated over the global filter for the STPF), scaled by the number of particles, is shown for each time step of the two algorithms. The standard filter struggles significantly even in the case $d = 16$ and collapses when $d = 128$. The performance of the STPF deteriorates (but does not collapse) as the dimension increases, due to the path degeneracy effect. However, even for $d = 1024$, it still retains an acceptable level of ESS.

    [Figure: three panels (d = 16, d = 128, d = 1024); x-axis: Time, 0 to 1000; y-axis: ESS (scaled by the number of particles), 0 to 0.75; legend: Standard Particle Filter, Space-Time Particle Filter.]

    Figure: Effective Sample Size plots from a single run.
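For reference, the scaled ESS of a weighted particle system is commonly computed as $(\sum_i w_i)^2 / (N \sum_i w_i^2)$. The sketch below uses this standard formula on log-weights; it is a generic illustration, not the exact global-filter computation used for the STPF in the slides, and the name `scaled_ess` is hypothetical.

```python
import numpy as np


def scaled_ess(log_weights):
    """Effective sample size divided by the number of particles (in (0, 1])."""
    lw = np.asarray(log_weights, dtype=float)
    # Subtract the maximum before exponentiating to avoid overflow/underflow.
    w = np.exp(lw - lw.max())
    w /= w.sum()
    return 1.0 / (len(w) * np.sum(w ** 2))
```

Equal weights give a scaled ESS of 1, while a single dominant weight gives a value close to $1/N$, which is the collapse regime described above.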


  • . Sequential DA

    The variance per time step of the estimators of the posterior mean of the first co-ordinate $X_n(1)$ (given the data up to time $n$) across 100 runs:

    [Figure: three panels (d = 16, d = 128, d = 1024); x-axis: Time, 0 to 1000; y-axis: Variance of estimator for X(1), logarithmic scale with ticks at 0.01 and 1.00; legend: Standard Particle Filter, Space-Time Particle Filter.]

    Figure: Logarithmic scale variance.


  • . Jittering

    Jittering

    Procedure employed to reduce sample degeneracy.

    The particles are moved using a suitably chosen kernel.

    The moves are controlled so that the size of the (stochastic) perturbation remains of the same order as the particle filter error ($1/\sqrt{N}$).
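One simple way to realise such a controlled move is sketched below. The Gaussian kernel and the constant `scale` are assumptions made for illustration and are not taken from the slides; the only property the sketch aims to reflect is that the perturbation has standard deviation of order $1/\sqrt{N}$. The name `jitter` is hypothetical.

```python
import numpy as np


def jitter(particles, scale=1.0, rng=None):
    """Perturb each particle with Gaussian noise of std scale / sqrt(N).

    particles : (N, d) array. The Gaussian kernel is an assumed choice.
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = np.asarray(particles, dtype=float)
    n = particles.shape[0]
    step = scale / np.sqrt(n)  # same order as the particle filter error
    return particles + step * rng.standard_normal(particles.shape)
```

With this scaling, increasing the number of particles automatically shrinks the perturbation, so the jittering noise does not dominate the Monte Carlo error.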

    D. C., Joaquin Miguez, Nested particle filters for online parameter estimation in discrete-time state-space Markov models, http://arxiv.org/abs/1308.1883.

    D. C., Joaquin Miguez, Uniform convergence over time of a nested particle filtering scheme for recursive parameter estimation in state–space Markov models, https://arxiv.org/abs/1603.09005.


  • . Final Remarks

    Particle filters/sequential Monte Carlo methods are theoretically justified algorithms for approximating the state of dynamical systems partially (and noisily) observed.

    Standard particle filters do NOT work in high dimensions.

    Properly calibrated and modified, particle filters can be used to solve high-dimensional problems (see also the work of Peter Jan van Leeuwen, Roland Potthast, Hans Künsch).

    Important parameters: initial condition, number of particles, number of observations, correction times, observation error, etc.

    One can use the methodology to assess the reliability of an ensemble forecasting system.
