Issues of nonlinearity and non-gaussianity
A brief tour in non-Gaussian data assimilation
with a view to large geophysical systems
Marc Bocquet([email protected])
CEREA, École des Ponts ParisTech / EDF R&D, Université Paris-Est and INRIA
Thanks to Lin Wu for his valuable suggestions.
M. Bocquet 4D-VAR and EnKF inter-comparisons workshop, Buenos-Aires, 10-13 November 2008 1 / 28
Why not non-Gaussian (from the start) ?
Outline
1 Why not non-Gaussian (from the start) ?
2 Dealing with non-Gaussianity in a Gaussian framework
3 Bridging the gap between Gaussian and non-Gaussian data assimilation
4 Conclusions
Why not non-Gaussian (from the start) ?
Nonlinear statistical estimation: discrete approach
Dynamics, observation and statistics
x_{k+1} = M_k(x_k) + w_k  and  y_k = H_k(x_k) + v_k

p(x_{k+1}|x_k) = p_W(x_{k+1} - M_k(x_k))  (transition kernel),  p(y_k|x_k) = p_V(y_k - H_k(x_k))  (likelihood)

are known.

Smoothing approach: given X_K = {x_0, x_1, ..., x_K} and Y_K = {y_1, y_2, ..., y_K}, recursive application of Bayes' rule and the transition rule leads to

p(X_K|Y_K) ∝ ∏_{k=1}^{K} [ p_V(y_k - H_k(x_k)) p_W(x_{k+1} - M_k(x_k)) ] p(x_0)

The maximum a posteriori of ln p(X_K|Y_K) defines the variational cost function.

Sequential approach (filtering problem):

Forecast (Chapman-Kolmogorov): p(x_{k+1}|Y_k) = ∫ dx_k p_W(x_{k+1} - M_k(x_k)) p(x_k|Y_k).

Analysis (Bayes): p(x_k|Y_k) = p_V(y_k - H_k(x_k)) p(x_k|Y_{k-1}) / ∫ dx_k p_V(y_k - H_k(x_k)) p(x_k|Y_{k-1}).
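The forecast/analysis recursion above can be sketched numerically for a 1-D state discretized on a grid; the model, noise levels and observation below are illustrative choices, not from the talk:

```python
import numpy as np

# Toy 1-D problem: the state pdf lives on a grid, the forecast applies the
# Chapman-Kolmogorov integral with a Gaussian transition kernel p_W, and the
# analysis applies Bayes' rule with a Gaussian likelihood p_V.
grid = np.linspace(-5.0, 5.0, 401)
dx = grid[1] - grid[0]

def gauss(z, sigma):
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def forecast(p, model, sigma_w):
    # p(x_{k+1}|Y_k) = ∫ dx_k p_W(x_{k+1} - M(x_k)) p(x_k|Y_k)
    kernel = gauss(grid[:, None] - model(grid)[None, :], sigma_w)
    return kernel @ p * dx

def analysis(p, y, obs, sigma_v):
    # p(x_k|Y_k) ∝ p_V(y_k - H(x_k)) p(x_k|Y_{k-1}), then renormalize
    post = gauss(y - obs(grid), sigma_v) * p
    return post / (post.sum() * dx)

# One forecast/analysis cycle with a mildly nonlinear model M and H = identity.
p0 = gauss(grid - 1.0, 0.5)                            # prior p(x_0)
pf = forecast(p0, lambda x: x + 0.1 * np.sin(x), sigma_w=0.3)
pa = analysis(pf, y=1.5, obs=lambda x: x, sigma_v=0.4)
posterior_mean = float((grid * pa).sum() * dx)
```

The grid-based recursion is exact up to discretization, which is precisely what becomes unaffordable in high dimension.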
Why not non-Gaussian (from the start) ?
Nonlinear statistical estimation: Fokker-Plank and Zakai equations
Continuous time, state space discretized as x = (x_1, x_2, ..., x_N)†. Model equation

dx_t = f(x_t, t) dt + g(x_t, t) · dw_t .

Fokker-Planck equation for the relative probability density function (Q = g_t g_t†):

∂p_t/∂t = -∇·(f(x, t) p_t) + (1/2) Σ_{i,j} ∂²/(∂x_i ∂x_j) ([Q]_{ij} p_t) = L_FP(p_t) .

Adding the observation equation dy_t = h(x_t, t) dt + √R dv_t leads to the Zakai equation (the Kushner equation once normalized):

dp_t = L_FP(p_t) dt + p_t h_t† R_t^{-1} dy_t .

From R^N to P(R)^{⊗N}: the maths exist, but the complexity is too high!

Similar to the passage from classical to quantum physics ... [Miller et al. 1999]
Why not non-Gaussian (from the start) ?
Numerics: particle filter
Monte Carlo approaches to solving these nonlinear filtering equations are called particle filters. The most intuitive one is the bootstrap filter.

Particles {x^1, x^2, ..., x^I} sample the pdf p_t(x): p_t(x) ≃ Σ_{i=1}^{I} w_i δ(x - x^i_t).

Propagation of the particles through the model: p_{t+1}(x) ≃ Σ_{i=1}^{I} w_i δ(x - x^i_{t+1}).

Analysis (weights altered by the likelihood): w^i_{t+1} ∝ w^i_t p(y_{t+1}|x^i_{t+1}).

When necessary, resampling of the ensemble, using the unbalanced weights w_i.

[Figure: particles carried from p_t to p_{t+2}; at t+1 an observation reweights the ensemble and triggers resampling.]
[Handschin et al. 1969, Gordon et al. 1993, Van Leeuwen 2002, Zhou 1996]
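A minimal sketch of the bootstrap filter above, for a hypothetical scalar model (the model M(x) = 0.9x, the noise levels and the observations are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

I = 500                                   # number of particles
particles = rng.normal(0.0, 1.0, I)       # sample of p_0
weights = np.full(I, 1.0 / I)

def step(particles, weights, y, sigma_w=0.3, sigma_v=0.5):
    # Propagation through the model (sampling the transition kernel p_W)
    particles = 0.9 * particles + rng.normal(0.0, sigma_w, particles.size)
    # Analysis: weights altered by the likelihood p_V(y - H(x)), with H = identity
    weights = weights * np.exp(-0.5 * ((y - particles) / sigma_v) ** 2)
    weights /= weights.sum()
    # Resampling when the effective ensemble size collapses
    if 1.0 / np.sum(weights ** 2) < particles.size / 2:
        idx = rng.choice(particles.size, particles.size, p=weights)
        particles = particles[idx]
        weights = np.full(particles.size, 1.0 / particles.size)
    return particles, weights

for y in [0.5, 0.4, 0.6, 0.5]:            # synthetic observations
    particles, weights = step(particles, weights, y)
estimate = float(np.sum(weights * particles))
```

The effective-ensemble-size criterion 1/Σ w_i² is a common resampling trigger; the slide only says "when necessary".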
Why not non-Gaussian (from the start) ?
Curse of dimensionality
Problem: particle filters work fine up to N ∼ 4-8. When the state space and/or the observation space gets bigger, the weights degenerate/collapse: only one particle remains likely. This implies a failure of the filter as a modal estimator. Resampling helps, but does not solve the issue.
[Figure: frequency of the maximum weight for a Lorenz-96 particle filter: balanced for N = 40 variables, degenerate for N = 80 variables.]
The required ensemble size scales exponentially with the state-space size N, the observation-space size, or the innovation variance. [Snyder, Bengtsson et al. 2007-2008]
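The collapse can be illustrated with plain importance weights under a standard-Gaussian likelihood (a toy setting, not the Lorenz-96 experiment of the figure): the maximum normalized weight approaches 1 as the dimension N grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Importance weights of I prior particles under an N-dimensional Gaussian
# likelihood centred on the observation; returns the largest normalized weight.
def max_weight(N, I=100):
    particles = rng.normal(size=(I, N))            # prior draws
    y = np.zeros(N)                                # observation at the origin
    logw = -0.5 * np.sum((particles - y) ** 2, axis=1)
    w = np.exp(logw - logw.max())                  # stabilized in log space
    return float(w.max() / w.sum())

w_small = np.mean([max_weight(2) for _ in range(50)])   # low dimension: balanced
w_large = np.mean([max_weight(80) for _ in range(50)])  # high dimension: degenerate
```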
Why not non-Gaussian (from the start) ?
Gaussian as makeshift ?
Lorenz-96 (N = 10 variables, F = 8) experiment
[Figure: analysis rms error versus number of particles (10 to 10^5) for the bootstrap particle filter and the ensemble Kalman filter.]
Still, Gaussian estimation leads to a complexity of P(R) ⊗ P(R).

Mathematically tractable (if the full covariance matrix is not made explicit).

Supported by the central limit theorem.

The least committed distribution when only the first and second moments are known.
Dealing with non-Gaussianity in a Gaussian framework
Sources of non-Gaussianity in a geophysical context
[Figure: in the (x1, x2) plane, a nonlinear transition maps a Gaussian pdf into a non-Gaussian pdf.]
Nonlinearities in models generate non-Gaussian pdfs
Nonlinearity of Navier-Stokes leading to chaos, thresholds (cloud, rain), chemistry, increase in resolution (precipitation at convective scale), etc.
Observation operator model.
Non-Gaussian priors sometimes more adequate description
Background information in state/control space: humidity (Gaussian anamorphosis, though), emission inventories in atmospheric chemistry.
Observation error prior: Huber norm, combination of l1 and l2 (Gaussian +account for gross errors), l∞ norm, log-normal and multiplicative errors, etc.
Advanced model error prior.
Dealing with non-Gaussianity in a Gaussian framework
Dealing with nonlinearity in a Gaussian framework
The priors can be assumed Gaussian, but the models remain nonlinear [Gauthier 1992,
Stensrud et al. 1992, Miller et al. 1994, Pires et al. 1996], and it must be dealt with . . .
4D-Var solutions to deal with nonlinearity
Risk: Gaussian-based Bayesian estimation may rigorously lead to a multimodal distribution whenever nonlinear operators are involved.

Fixes: an outer loop to enforce the full (high-resolution) nonlinear model; an inner loop to guarantee fast optimization (conjugate gradient) and (local) uniqueness of the minimum.
Dealing with non-Gaussianity in a Gaussian framework
Dealing with nonlinearity in a Gaussian framework
The priors can be assumed Gaussian, but the models remain nonlinear . . .
EnKF solutions to deal with nonlinearities
The ensemble encodes all the statistics. The ensemble is propagated by the model without proxy.

Fixes: the ensemble statistics are assumed Gaussian (a priori and a posteriori, even though they may not be), so as to keep the ensemble coherent.
[Figure: Gaussian ensemble filter: the ensemble is carried from p_t to p_t+2; at t+1 a Gaussian proxy of the statistics (mean and covariance) replaces p_t+1^- before the analysis.]
Dealing with non-Gaussianity in a Gaussian framework
Measuring non-Gaussianity: how much do we lose being Gaussian ?
Relative entropy
◮ Fundamental measure of the discrepancy between two pdfs: the relative entropy

K(p, q) = ∫ dp ln(p/q) .

Geophysical applications in predictability [Kleeman 2002], in the statistics of geophysical dynamical systems [Majda], in inverse modelling [Bocquet 2005], in the modelling of prior pdfs [Eyink et al., Pires et al., 2004-2008].
◮ Difficult to handle in high-dimensional systems.
◮ p = prediction or analysis uncertainty pdf.
◮ q = Gaussian proxy of the pdf with the same first- and second-order moments.
Gram-Charlier/Edgeworth expansions of K
A Gram-Charlier/Edgeworth expansion of p/q leads to (at skewness and kurtosis order)

K(p, q) ≃_{G.-C.} (1/12) Σ_{i,j,k} (κ_{i,j,k})² + (1/48) Σ_{i,j,k,l} (κ_{i,j,k,l})²

K(p, q) ≃_{Edg.} (1/12) Σ_{i,j,k} (κ_{i,j,k})² + O(1/I^{3/2})

where κ_{i1,i2,...,in} are the standardized cumulants of p of order n.
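In the univariate case the expansion reduces to K ≈ κ_3²/12 + κ_4²/48, with κ_3 the skewness and κ_4 the excess kurtosis; a minimal sample-based check (the Gamma test distribution is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)

def gram_charlier_K(sample):
    # Univariate version of the expansion: K(p, q) ≈ κ3²/12 + κ4²/48,
    # with κ3, κ4 the standardized cumulants (skewness, excess kurtosis).
    z = (sample - sample.mean()) / sample.std()
    k3 = np.mean(z ** 3)
    k4 = np.mean(z ** 4) - 3.0
    return k3 ** 2 / 12.0 + k4 ** 2 / 48.0

gaussian_K = gram_charlier_K(rng.normal(size=200_000))       # ≈ 0: no loss
skewed_K = gram_charlier_K(rng.gamma(shape=4.0, size=200_000))  # > 0
```

For the Gamma(4) sample, skewness 1 and excess kurtosis 1.5 give K ≈ 0.13, the information lost by replacing the pdf with its Gaussian proxy.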
Dealing with non-Gaussianity in a Gaussian framework
Measuring non-Gaussianity: how much do we lose being Gaussian ?
Multivariate test of normality
◮ Numerous tests of normality in the univariate case: Kolmogorov-Smirnov, Anderson-Darling, the Shapiro-Wilk test.
◮ Multivariate case: few tests, difficult to handle for large sample sizes and a large number of degrees of freedom.
◮ Necessary but insufficient tests: comparing the Mahalanobis norm of the members to a χ² law, using a univariate null-hypothesis test, marginals of the pdf, ...
[Figure: Lorenz-63 free run over time 0-2 of an initially Gaussian ensemble with σ = 0.1.]
Dealing with non-Gaussianity in a Gaussian framework
Measuring non-Gaussianity: how much do we lose being Gaussian ?
[Figure: relative entropy K versus time (0-2) for the Lorenz-63 run: K from the full pdf, K from the Edgeworth expansion O(I^{-3/2}), K from the univariate marginals, and K from the bivariate marginals.]
Dealing with non-Gaussianity in a Gaussian framework
Reducing nonlinearity impact: divide and conquer
With finer discretizations, nature becomes Gaussian (insofar as it becomes linear) ...
Adaptive data assimilation
◮ Assimilation could adapt to the varying instability of the flow. For instance, the efficient variational assimilation window length τ_eff(x) ∝ λ^{-1}(x) [Pires et al. 1996], where λ(x) is the typical local Lyapunov exponent: a smaller delay between analyses is required.
◮ Identify a low-dimensional manifold on which to deploy particle filters [Berliner & Wickle 2007].
Localizing strategies for particle filters
◮ A smaller number of particles for smaller areas.
◮ But contrary to the localized EnKF, the gluing of the subsequent local estimates from the analysis is not trivial ... [van Leeuwen, 2004-2008]
Gaussian mixtures
◮ A mixture with many components is ultimately as difficult as particle filters [Bengtsson et al., 2003].
◮ Can be used to estimate, with a finite number of components, a non-Gaussian pdf with analytically tractable estimation equations.
Bridging the gap between Gaussian and non-Gaussian data assimilation
Gaussian on non-Gaussian grounds: deviation from climatology
Maximum entropy filter [Eyink & Kim 2006]
◮ The pdf of an ensemble should be, given its mean and variance, the closest to the climatology pdf q, the distance being measured by the relative entropy:

K(p, q) = ∫ dp ln(p/q) .

Ensemble second-order statistics:

ȳ = (1/I) Σ_{i=1}^{I} H x_i  and  Y = (1/I) Σ_{i=1}^{I} H x_i (H x_i)† .

Generic form of the pdf: p(x, λ, Λ) ∝ q(x) exp( λ† H x - (1/2) x† H† Λ H x ) .

Dual parameter estimation:

(λ, Λ) = argmin ( ln Z(λ, Λ) - λ† ȳ + (1/2) Tr(Y Λ) )

◮ Assuming a linear observation operator H and Gaussian errors (observation y with error statistics R), the pdf is updated using Bayes' rule, within a dual framework. Dual parameter update:

λ+ = λ− + R^{-1} y  and  Λ+ = Λ− + R^{-1} .
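When the climatology q is itself Gaussian, the dual parameters reduce to information-filter variables, so the dual update can be checked against the standard Kalman analysis; a scalar sketch with illustrative numbers (q = N(0, b), H = 1):

```python
# Scalar sketch: with q = N(0, b) and H = 1, the pdf
# p(x, λ, Λ) ∝ q(x) exp(λx - Λx²/2) is N(m, v) with 1/v = 1/b + Λ, m = vλ.
b = 2.0          # climatological variance (illustrative)
r = 0.5          # observation-error variance (illustrative)
y = 1.2          # observation

lam, Lam = 0.0, 0.0                      # prior: the climatology itself
lam, Lam = lam + y / r, Lam + 1.0 / r    # dual update: λ+ = λ- + R⁻¹y, Λ+ = Λ- + R⁻¹

v_post = 1.0 / (1.0 / b + Lam)
m_post = v_post * lam

# Standard Kalman analysis from the same prior N(0, b):
k_gain = b / (b + r)
m_kalman = k_gain * y
v_kalman = (1.0 - k_gain) * b
```

Both routes give the same posterior N(0.96, 0.4); the interest of the dual form is that q need not be Gaussian.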
Bridging the gap between Gaussian and non-Gaussian data assimilation
Gaussian on non-Gaussian grounds: deviation from climatology
Maximum entropy filter [Eyink & Kim 2006]
◮ Resampling (as in a deterministic analysis filter).
◮ In essence: a dual ensemble Kalman filter built upon a reference pdf given by the climatology. It is efficient on Lorenz-63.
[Figure: maximum entropy filter: the ensemble is carried from p_t to p_t+2 over the pdf of the climatology; the analysis at t+1 turns p_t+1^- into p_t+1^+.]
Lorenz-63 analysis error r.m.s. [Eyink and Kim, J. Stat. Phys., 2006]

∆t    EnKF     MEF
1/6   1.0457   1.7846
1/3   1.5034   1.6041
2/3   1.1548   1.0200
4/3   0.7212   0.6529
◮ Degraded version (first moments only): mean-field filter.
Bridging the gap between Gaussian and non-Gaussian data assimilation
Non-Gaussian on Gaussian grounds: importance filtering
Main ideas of importance sampling
◮ Empirical representation with a mix of particle trajectories and weights:

p_t(X_t|Y_t) ≃ Σ_{i=1}^{N} w^i_t δ(X_t - X^i_t) ,

where the particle trajectories are drawn from a known proposal pdf q. This is possible if the weights are of the form

w^i_t ∝ p_t(Y_t|X^i_t) p(X^i_t) / q_t(X^i_t|Y_t) .

◮ Sequential filtering version:

w^i_t ∝ w^i_{t-1} p_t(y_t|x^i_t) p_t(x^i_t|x^i_{t-1}) / q_t(x^i_t|X^i_{t-1}, Y_t) .

◮ If the proposal is q_t(x^i_t|X^i_{t-1}, Y_t) = p_t(x_t|x_{t-1}), then this is a bootstrap filter!
◮ To avoid too unlikely trajectories, particles should be drawn from a proposal making use of y_t, but this is not considered easy, unless one practices ensemble-based Kalman filters ...
Bridging the gap between Gaussian and non-Gaussian data assimilation
Non-Gaussian on Gaussian grounds: importance sampling
Observation-dependent proposal: Gaussian filters [van der Merwe et al. 2000, Papadakis 2007]
If x̄^i_t and P^i_t are the mean and covariance of an ensemble-based Gaussian filter (EKF, UKF, EnKF, ETKF, etc.), then

q(x^i_t|X^i_{t-1}, Y_t) = N(x̄^i_t, P^i_t)   or   q(x^i_t|X^i_{t-1}, Y_t) = N(x̄_t, P_t) .

The second one is a kind of weighted EnKF, but it is a particle filter!

Lorenz-96 experiment (N = 5 variables, F = 8):

w^i_t ∝ w^i_{t-1} p_t(y_t|x^i_t) p_t(x^i_t|x^i_{t-1}) / N(x̄_t, P_t) .

Fewer particles are wasted! It has the right asymptotics!
[Figure: analysis rms error versus number of particles (10 to 10^4) for the bootstrap particle filter, the ensemble Kalman filter, and the EnKF-based particle filter.]
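A sketch of one analysis step with the N(x̄_t, P_t) proposal, for a hypothetical scalar model (M(x) = 0.9x); the "EnKF analysis" mean and covariance are stand-in numbers, not an actual EnKF run:

```python
import numpy as np

rng = np.random.default_rng(3)

def logpdf_norm(x, mean, var):
    return -0.5 * ((x - mean) ** 2 / var + np.log(2.0 * np.pi * var))

I = 2000
x_prev = rng.normal(0.0, 1.0, I)          # particles at t-1
w_prev = np.full(I, 1.0 / I)
y, sig_w2, sig_v2 = 1.0, 0.09, 0.25       # observation and noise variances

m_prop, p_prop = 0.7, 0.3                 # stand-in EnKF analysis mean/covariance
x_new = rng.normal(m_prop, np.sqrt(p_prop), I)   # draw from the proposal

# w_t ∝ w_{t-1} p(y_t|x_t) p(x_t|x_{t-1}) / N(m, P), computed in log space
logw = (np.log(w_prev)
        + logpdf_norm(y, x_new, sig_v2)              # likelihood, H = identity
        + logpdf_norm(x_new, 0.9 * x_prev, sig_w2)   # transition density
        - logpdf_norm(x_new, m_prop, p_prop))        # proposal density
w = np.exp(logw - logw.max())
w /= w.sum()
posterior_mean = float(np.sum(w * x_new))
```

Because the proposal is centred near the observation, far fewer particles receive negligible weight than with the blind bootstrap proposal.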
Bridging the gap between Gaussian and non-Gaussian data assimilation
Non-Gaussian prior construction
Measuring innovation non-Gaussianity [Pires & Talagrand 2004-2008]
◮ Compute the deviation from Gaussianity of the innovation q = y - H(x_b).

Statistics: skewness s = E[(q - q̄)³] / E[(q - q̄)²]^{3/2} and kurtosis k = E[(q - q̄)⁴] / E[(q - q̄)²]² - 3.

◮ Deviation from Gaussianity estimated by a Gram-Charlier expansion (1-d case).
◮ Compute the least committed pdf consistent with the skewness and kurtosis of the innovations to construct a joint prior ν(ε_o, ε_b), using the maximum entropy principle.
[Figure (HIRS channel 4): left, innovation pdf fit [Pires & Talagrand, 2008]: normalized histogram of innovations (K) with Gaussian and maximum-entropy fits; right, a priori pdfs from maximum entropy: Gaussian observation error and non-Gaussian background error.]
Bridging the gap between Gaussian and non-Gaussian data assimilation
Linear models acting on non-Gaussian priors
Linear models
◮ System driven by a forcing field or an initial condition x ∈ R^N, with model/observation error e ∈ R^p:

y = H x + e ,

with H the (up to 4D) model/observation Jacobian.
◮ Statistical modelling: prior pdf on the controls and errors ν(x, e); posterior pdf p(x, e).

Bayesian inference + maximum a posteriori [Bocquet 2007]

◮ Primal cost function (≡ 4D-Var in a Gaussian context):

L(x) = - ln ν(x, y - H x) .

◮ If convexity is proven, dual cost function (≡ PSAS in a Gaussian context):

L(λ) = (- ln ν)*(H†λ, λ) - y†λ ,

where * denotes the Legendre-Fenchel conjugate.
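As a toy instance of a non-Gaussian prior acting through a linear model (an illustrative choice, not the talk's setting): a scalar x with a Laplace prior ν(x) ∝ exp(-|x|/b) and a Gaussian observation error gives a primal cost minimized in closed form by soft thresholding:

```python
import numpy as np

# Toy MAP estimate with a non-Gaussian (Laplace) prior and H = 1:
# L(x) = (y - x)² / (2r) + |x| / b, minimized by soft thresholding at r/b.
def map_laplace(y, r, b):
    thr = r / b
    return float(np.sign(y) * max(abs(y) - thr, 0.0))

def cost(x, y, r, b):
    return (y - x) ** 2 / (2.0 * r) + abs(x) / b

x_map = map_laplace(y=1.0, r=0.2, b=0.5)   # threshold 0.4, so x_map = 0.6
```

Unlike the Gaussian case, the non-Gaussian prior can set the analysis exactly to zero for small innovations, which is the behaviour exploited for sparse sources.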
Bridging the gap between Gaussian and non-Gaussian data assimilation
Linear models acting on non-Gaussian priors
Thanks to nonlinear convex analysis . . .
Maximum entropy on the mean [Bocquet 2005-2008]
◮ Fully non-Gaussian generalization of 4D-Var / PSAS when the models are linear.

Level-2 primal:  L = K(p, ν) + λ† E_p[μ - H x - e]
  — (log-Laplace) →  dual:  L = ν(H†λ, λ) - λ†μ
  — (Legendre-Fenchel) →  level-1 primal:  L = ν*(x, e) + λ†(μ - H x - e)

(the contraction links the level-2 and level-1 primal cost functions directly)
◮ Equivalence of all cost functions thanks to convexity.
Bridging the gap between Gaussian and non-Gaussian data assimilation
Linear models acting on non-Gaussian priors
Example of a forecast of the ETEX-I plume (10³ observations used, 2×10⁵ control variables).
[Figure: maps (45°N-65°N, 10°W-30°E) of the forecast plume at +3 h, +12 h, +24 h, +48 h and +72 h; colour scale from 0.01 to 50.0.]
Reference knowing the release / Gaussian assimilation / non-Gaussian assimilation.
Conclusions
Summary
Fully non-Gaussian numerical solutions of the estimation problem are still not affordable.
Mathematical tools exist that can objectively measure the departure fromGaussianity.
Expansions (more or less affordable) around Gaussian filtering are possible.
In specific cases, and sometimes in high dimensions, non-perturbative methods arepossible.
Conclusions
Comments
So do we need non-Gaussian modelling after all ?
Nonlinearity of models: nothing that cannot ultimately be dealt with locally in space and/or time ?
Non-Gaussian approaches: just refinements (deviations from Gaussianity) ?
Still need to model non-Gaussian priors (that may result from the nonlinearity ofmodels).
How do we measure the deviations from Gaussianity: criteria based on the flow(singular vectors, breeding modes) or uncertainty based (relative entropy,statistical tests, validation) ?
So far, very much orientated towards getting the best estimator. What about really getting the pdf (or higher-order moments) ? This may become a strong issue when passing from the best estimate obtained from data assimilation to the best ensemble estimate obtained from data assimilation (≃ calibration of an ensemble by data assimilation).
Conclusions
Thank you !