Upload
others
View
18
Download
0
Embed Size (px)
Citation preview
Data assimilation – Schrödinger’s perspective
Sebastian Reich (www.sfb1294.de)
Universität Potsdam/ University of Reading
IMS NUS, August 30, 2018
Universität Potsdam/ University of Reading 1
Core components of DA
Interacting Particle Systems
Coupling of Measures
Mean Field Equations
Universität Potsdam/ University of Reading 2
Entropic couplings
Universität Potsdam/ University of Reading 3
Ensemble prediction I
Ensemble prediction system with M members:d
dtZit
= f (Zit) , Zi0 ∼ π0 , i = 1, . . . ,M .
Source: ECMWF
Universität Potsdam/ University of Reading 4
Ensemble prediction II
Source: The quiet revolution of numerical weather prediction, Nature, 2015
Universität Potsdam/ University of Reading 5
Ensemble–based data assimilation
Continuous–in–time assimilation of precipitation data yt:
d
dtZit
= f (Zit) + α1Qt(Z
it− Zt) + α2Kt(yt − h(Zi
t))
Additional terms:
É Inflation: α1 > 0, Qt ∈ RNz×Nz spd,
Zt =1
M
∑
i
Zit
É Nudging: α2 > 0, gain matrix Kt ∈ RNz×Ny , forward operator h.
Universität Potsdam/ University of Reading 6
SDEs & inflation I
Forward SDEdZ+
t = ft(Z+t )dt + γ1/2dW+
t ,
X+0 ∼ π0, t ∈ [0,T], W+
t standard Brownian motion forward in time.
Generates probability measure P[0,T] over C([0,T],RNz ) with marginaldensities πt, i.e. Zt ∼ πt.
The same measure is generated by backward SDE
dZ−t
= bt(Z−t
)dt + γ1/2dW−t,
W−t Brownian motion backward in time, X−T ∼ πT .
It holds thatbt(z) = ft(z)− γ∇z logπt(z) .
Universität Potsdam/ University of Reading 7
SDEs & inflation II
Fokker-Planck equation for marginals:
∂tπt = −∇z · (πtft) +γ
2∆zπt = −∇z · (πtbt) −
γ
2∆zπt
= −∇z · (πtut)
with
ut(z) =1
2(ft(z) + bt(z)) = ft(z)−
γ
2∇z logπt(z) .
Replace forward SDE by mean field equation
d
dtZt = ft(Zt)−
γ
2∇z logπt(Zt) , Z0 ∼ π0 .
Remark. Generates path measure Q[0,T] which is different from SDEmeasure P[0,T]; only marginals πt agree!
Universität Potsdam/ University of Reading 8
SDEs & inflation III
Lagrangian interacting particles (Gaussian approximation to πt):
d
dtZit
= ft(Zit) +
γ
2(Pt)
−1(Zit− Zt) ,
Zi0 ∼ π0, i = 1, . . . ,M, empirical covariance matrix
Pt =1
M− 1
∑
i
(Zit− Zt)(Zi
t− Zt)T .
Connection to inflation:
Qt = (Pt)−1, α1 = γ/2 .
Universität Potsdam/ University of Reading 9
References
É Daley, R., Atmospheric Data Analysis, Cambridge University Press,1991.
É Nelson, E., Dynamical Theories of Brownian Motion, PrincetonUniversity Press, 1967.
É del Moral, P., Mean field simulation for Monte Carlo integration, CRCPress, 2013.
É SR, Cotter, J., Probabilistic Forecasting and Bayesian DataAssimilation, Cambridge University Press, 2015.
É Law, K. et al., Data Assimilation – A Mathematical Introduction,Springer, 2015.
É Qiang Liu, Stein Variational Gradient Descent as Gradient Flow,arXiv:1704.07520v2, 2017.
Universität Potsdam/ University of Reading 10
Recap: Assimilation of precipitation data
Continuous–in–time assimilation of precipitation data yt:
d
dtZit
= f (Zit) + α1Qt(Z
it− Zt) + α2Kt(yt − h(Zi
t))
É Inflation: α1 > 0, Qt ∈ RNz×Nz spd,
Zt =1
M
∑
i
Zit
É Nudging: α2 > 0, gain matrix Kt ∈ RNz×Ny , forward operator h.
Universität Potsdam/ University of Reading 11
SDEs & nudging I
Given a likelihood function
L(z[0,T]) := exp
�
−∫ T
0Vt(zt)dt
�
.
For example
Vt(z) =β
2‖h(z)− yt‖2 .
Bayes theorem (Radon–Nikodym):
dbP[0,T]
dP[0,T](z[0,T]) :=
L(z[0,T])
P[0,T][L].
The measure bP[0,T] solves the filtering/smoothing problem of SDEinference.
Universität Potsdam/ University of Reading 12
SDEs & nudging II
Mean–field formulation:
dbZt =�
ft(bZt) + Pt∇zψt(bZt)
dt +p
γdWt
with the potential ψt satisfying the elliptic PDE
∇z · (bπtPt∇zψt) = bπt(Vt − Vt)
bZ0 ∼ bπ0 = π0, Vt = bπt[Vt], Pt = cov (bZt).
Universität Potsdam/ University of Reading 13
SDEs & nudging III
If bπt Gaussian and h(z) = Hz in Vt, then
Pt∇zψt(z) = βPtHT
yt −Hz +HZt
2
!
.
Compare to nudging scheme:
α2Kt(yt − Hz) ,
i.e., Kt = PtHT, β = α2, but innovation different.
Universität Potsdam/ University of Reading 14
Combining nudging and inflation
Ensemble Kalman–Bucy filter (Gaussian approximation to πt):
d
dtZit
= ft(Zit) +
γ
2(Pt)
−1(Zit− Zt) + βKt
yt −h(Zi
t) + ht
2
!
Zi0 ∼ π0, i = 1, . . . ,M, empirical covariance matrices
Pt =1
M− 1
∑
i
(Zit− Zt)(Zi
t− Zt)T ,
Kt =1
M− 1
∑
i
(Zit− Zt)(h(Zi
t)− ht)T .
Remark. Feedback particle filter for likelihood with
Vt(z)dt⇒1
2‖h(z)‖2dt − h(z)Tdyt .
Universität Potsdam/ University of Reading 15
References
É Moser, J., On the volume elements on a manifold, Trans. Amer. Math.Soc., 1965.
É SR, A dynamical systems framework for intermittent dataassimilation, BIT, 2010.
É Daum, F., Huang, J., Particle filter for nonlinear filters, ICASSP, IEEE,2011.
É Bergemann, K, SR, An ensemble Kalman–Bucy filter forcontinuous–time data assimilation, Meteorolog. Zeitschrift, 2012.
É Yang, T. et al., Feedback particle filter, IEEE Trans. Autom. Control,2013.
É Taghvaei, A. et al., Kalman filter and its modern extensions for thecontinuous–time nonlinear filtering problem, J. Dyn. Sys., Meas., andControl, 2018.
É de Wiljes, et al., Long–time stability and accuracy of the ensembleKalman–Bucy filter for fully observed processes and smallmeasurement noise, SIAM J. Appl. Dyn. Sys., 2018.
Universität Potsdam/ University of Reading 16
Discrete–time observations I
StateState StateModel Model
Data
Assimilation
Data
Assimilation
Data
Assimilation
time
Universität Potsdam/ University of Reading 17
Discrete–time observations II
Discrete–time observations:
ytn = h(Ztn) +R1/2 Ξtn , n = 1, . . . ,N .
Likelihood function:
L(z[0,T]) := exp
�
−1
2
∑
n
(ytn − h(ztn))TR−1(ytn − h(ztn))
�
.
Bayes:dbP[0,T]
dP[0,T](Z+
[0,T]) :=L(Z+
[0,T])
P[0,T][L].
Universität Potsdam/ University of Reading 18
Discrete–time observations III
For simplicity: single observation, i.e.
N = 1, R = I, t1 = T, L(z) =1
2‖yT − h(z)‖2 .
But keep recursive nature of sequential DA in mind!
Four main players:
É last analysis: π0
É forecast based on last analysis: πTÉ new analysis at time t = T (Bayes, filtering distribution): bπTÉ smoothing distribution at t = 0: bπ0
Universität Potsdam/ University of Reading 19
Discrete–time observations IV
⇡0 ⇡1Prediction
Filtering
b⇡1b⇡0
Smoothing
Optimal Control
Data
Timet = 0 t = 1
Schrödinger
y1
Universität Potsdam/ University of Reading 20
Example: Gaussian mixture I
π1 is a Gaussian mixture, Gaussian likelihood, bπ1 weighted Gaussianmixture.
prior
-1 -0.5 0 0.5 1
space
0
0.02
0.04
0.06
0.08
0.1
-4 -2 0 2 4
space
0
0.1
0.2
0.3
0.4
0.5prediction
smoothing
-1 -0.5 0 0.5 1
space
0
0.05
0.1
0.15
0.2
-4 -2 0 2 4
space
0
0.5
1
1.5filtering
Universität Potsdam/ University of Reading 21
Example: Gaussian mixture II
-2 -1.5 -1 -0.5 0 0.5 1 1.5
space
0
0.5
1
1.5
2
2.5optimal control proposals
-2 -1.5 -1 -0.5 0 0.5 1 1.5
space
0
0.5
1
1.5
2
2.5Schroedinger proposal
Applied to smoothing PDF bπ0. Applied to π0 directly!
Universität Potsdam/ University of Reading 22
Example: Brownian dynamics
Scalar Brownian dynamics under a double well potential (γ = 0.5):
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
variable z
0
2
4
6
8
10
po
ten
tia
l
initial mean
observation
-5 0 5
0
0.5
1
1.5
210
4 last analysis
-5 0 5
0
1
2
3
410
4 forecast
-5 0 5
0
1
2
3
410
4 smoother
-5 0 5
0
2
4
610
4new analysis
The forecast and the new analysis at T = 0.5 are nearly singular withrespect to each other.
The relation between the last analysis (π0) and the smoother (bπ0) issomewhat better.
Universität Potsdam/ University of Reading 23
Discrete–time observations III
Forward-backward smoother iteration:
É Forward:dZ+
t = f (Z+t )dt +
p
γW+t ,
Z+0 ∼ π0. Yields πt.
É Backward:
dbZ−t
= f (bZ−t
)dt − γ∇z logπt(bZ−t )dt +p
γW−t,
with bZ−T ∼ bπT andbπT(z) ∝ L(z)πT(z) .
Yields bπt.
Smoother:
dbZ+t = f (bZ+
t )dt + γ∇z logbπt
πt(bZ+
t )dt +p
γW+t
bZ+0 ∼ bπ0, bZ+
T ∼ bπT .
Universität Potsdam/ University of Reading 24
Example: Brownian dynamics II
Scalar Brownian dynamics under a double well potential (γ = 0.5):
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
variable z
0
2
4
6
8
10
po
ten
tia
l
initial mean
observation
-5 0 5
0
0.5
1
1.5
210
4 last analysis
-5 0 5
0
1
2
3
410
4 forecast
-5 0 5
0
1
2
3
410
4 smoother
-5 0 5
0
2
4
610
4new analysis
The forward smoother SDE links the smoother measure bπ0 with bπT .
Still requires transforming π0 into bπ0 (but now at t = 0).
Universität Potsdam/ University of Reading 25
Schrödinger problem of DA I
A different perspective on sequential DA:
Schrödinger problem. Find the measureeP[0,T] which minimises the Kullback-Leiblerdivergence
eP[0,T] = arg infQ�P
KL(Q[0,T]||P[0,T])
subject to the constraints
eπ0 = q0 = π0, eπT = qT = bπT .
-5 0 5
0
0.5
1
1.5
210
4 last analysis
-5 0 5
0
1
2
3
410
4 forecast
-5 0 5
0
1
2
3
410
4 smoother
-5 0 5
0
2
4
610
4new analysis
The measure eP[0,T] is generated by a controlled SDE
deZ+t = f (eZ+
t )dt + ut(eZ+t )dt +
p
γdW+t .
Universität Potsdam/ University of Reading 26
Schrödinger problem of DA II
Find an initial distribution ϕ+0 and its evolution ϕ+
t under the forward SDE
dZ+t = f (Z+
t ) dt +p
γdW+t
such that the associated backward SDE
dZ−t
=�
f (Z−t
)− γ∇z logϕ+t (Z−
t)�
dt +p
γdW−t
with final condition ϕ−T := bπT leads to marginals ϕ−t such that
ϕ−0 = π0 .
Then the control
ut = γ∇z logϕ−t
ϕ+t
solves the Schrödinger problem.
Universität Potsdam/ University of Reading 27
Example I: Lorenz–63
Starting point:
(i) Euler-Maruyama discretization
Zn+1 = Zn + f (Zn)∆t + (γ∆t)1/2Ξn
giving rise to transition kernel
q+(zn+1|zn) .
(ii) Given M samples zjn from Zn and M samples zi
n+1 from Zn+1 definethe M× M matrix Q by
qij = q+(zin+1|z
jn
) .
(iii) Observations at time tn+1 lead to importance weights win+1.
Universität Potsdam/ University of Reading 28
Example II: Lorenz–63
Schrödinger problem:
P∗ = argminP∈Π
KL(P||Q) , KL(P||Q) :=M∑
i,j=1
pij logpij
qij,
subject to
Π =
(
P ∈ RM×M : pij ≥ 0,M∑
i=1
pij = 1/M,M∑
j=1
pij = wi
)
Application to DA
a) Schrödinger resample: Use P∗ to sample from bπn+1
P[bZjn+1] = zi
n+1] = Mp∗ij.
b) Schrödinger transform: Set
bzjn+1 = M
M∑
i=1
zin+1p
∗ij
+ (γ∆t)Ξj .
Universität Potsdam/ University of Reading 29
Example III: Lorenz–63
0.05 0.1 0.15 0.2
1/M
10-3
10-2
RM
SE
RMS errors as a function M
particle filter
Schrodinger transform
Schrodinger resample
EnKBF
Stochastic Lorenz–63, observations at every time–step, observation errorscaled with inverse time–step.
Universität Potsdam/ University of Reading 30
References
É Schrödinger, E., Über die Umkehrung der Naturgesetze,Sitzungsberichte der Preuss. Phys. Math. Klasse, 1932.
É Yongxin Chen et al., On the relation between optimal transport andSchrödinger bridges: A stochastic control viewpoint, J. Optim. Theoryand Appl., 2016.
É Fearnhead, P., Künsch, H.R., Particle filters and data assimilation,Annual Review of Statistics and Its Application, 2018.
É Guarniero et al., The iterated auxiliary particle filter, J. AmericanStatist. Assoc., 2017.
É van Leeuwen, P.-J., Nonlinear data assimilation, Frontiers in AppliedDynamical Systems, Vol. 2, 2015.
É Peyre, G., Cuturi, M., Computational Optimal Transport,arXiv:1803.00567, 2018.
É SR, Data assimilation, Acta Numerica, 2019, preliminary version onarXiv 1807.08351.
Universität Potsdam/ University of Reading 31
OT meets IS I
Importance sampling (IS): Available realisations ZiT∼ πT with
importance weights
wi ∝bπT
πT(Zi
T) .
Optimal transport (OT): Instead of resampling, findcoupling/transformation
bZT = ∇zψ(ZT) ,
ZT ∼ πT and bZ ∼ bπT .
More abstractly,
bZT(a) =
∫
ZT(a′)δ(a′ − ∇aψ(a))da′ ,
where A is some random reference variable. For example, A = ZT .
Universität Potsdam/ University of Reading 32
OT meets IS II
Replace the integral by a sum and formally write
bZjT =
M∑
i=1
ZiTdij
RequiresM∑
i=1
dij = 1,1
M
∑
j
dij = wi .
Select an "optimal" transformation through maximising correlation
V(D) =1
M
∑
ij
dijZiT· ZjT =
1
M
∑
j
bZjT · Z
jT .
In addition, either dij ≥ 0 (Ensemble Transform Particle Filter) or
1
M− 1
M∑
i=1
(bZiT− bz̄T)(bZi
T− bz̄T)T =
M∑
i=1
wi(ZiT− bz̄T)(Zi
T− bz̄T)T
(Nonlinear Ensemble Transform Filter).
Universität Potsdam/ University of Reading 33
Numerical example I
Lorenz-63 model, first component observed infrequently (∆t = 0.12) andwith large measurement noise (R = 8):
Figure: RMSEs for various second-order accurate LETFs compared to the ETPF, theESRF, and the SIR PF as a function of the sample size, M.
Universität Potsdam/ University of Reading 34
Numerical example II
Hybrid filter: P := PESRF(α) PETPF(1− α).
Figure: RMSEs for hybrid ESRF (α = 0) and 2nd-order corrected NETF/ETPF (α = 1)as a function of the sample size, M.
Universität Potsdam/ University of Reading 35
References
É Evensen, G., Data assimilation – the ensemble Kalman filter,Springer, 2009.
É SR, A nonparametric ensemble transform method for Bayesianinference, SIAM J. Sci. Comput., 2013.
É SR, Cotter, J., Probabilistic Forecasting and Bayesian DataAssimilation, Cambridge University Press, 2015.
É Tödter J., Ahrens, B., A second–order exact ensemble square rootfilter for nonlinear data assimilation, Month. Weather Rev., 2015.
É Chustagulprom, N. et al., A hybrid ensemble transform particle filterfor nonlinear and spatially extended systems, SIAM/ASA J. UQ, 2016.
É Acevedo, W., et al., Second–order accurate ensemble transformparticle filters, SIAM J. Sci. Comput., 2017.
É Fearnhead, P., Künsch, H.R., Particle filters and data assimilation,Annual Review of Statistics and Its Application, 2018.
Universität Potsdam/ University of Reading 36
Summary
É Continuous–in–time DA naturally leads to various interacting particlesystems.
É Schrödinger problem provides an "optimal" mathematical frameworkfor sequential DA with discrete–in–time observations.
É Numerical implementation nontrivial; good drift corrections can bederived using Gaussian approximations or kernel methods.
É Coupling arguments are central to derivation of interacting particlesystems.
É Relevant to rare event simulations, optimal control problems andderivative–free optimization.
Universität Potsdam/ University of Reading 37
The End
(source: J.B. Handelsman, New Yorker, 05/31/1969)
Universität Potsdam/ University of Reading 38
Collaborators
É Walter Acevedo
É Kay Bergemann
É Yuan Cheng
É Nawinda Chustagulprom
É Colin Cotter
É Jana de Wiljes
É Prashant Mehta
É Wilhelm Stannat
É Amari Taghvaei
Universität Potsdam/ University of Reading 39