Issues of nonlinearity and non-gaussianity
A brief tour in non-Gaussian data assimilation
with a view to large geophysical systems
Marc Bocquet([email protected])
CEREA, École des Ponts ParisTech / EDF R&D, Université Paris-Est and INRIA
Thanks to Lin Wu for his valuable suggestions.
M. Bocquet 4D-VAR and EnKF inter-comparisons workshop, Buenos-Aires, 10-13 November 2008 1 / 28
Why not non-Gaussian (from the start) ?
Outline
1 Why not non-Gaussian (from the start) ?
2 Dealing with non-Gaussianity in a Gaussian framework
3 Bridging the gap between Gaussian and non-Gaussian data assimilation
4 Conclusions
Why not non-Gaussian (from the start) ?
Nonlinear statistical estimation: discrete approach
Dynamics, observation and statistics
x_{k+1} = M_k(x_k) + w_k  and  y_k = H_k(x_k) + v_k

p(x_{k+1}|x_k) = p_W(x_{k+1} - M_k(x_k))  (transition kernel),  p(y_k|x_k) = p_V(y_k - H_k(x_k))  (likelihood)

are known.

Smoothing approach: given X_K = {x_0, x_1, ..., x_K} and Y_K = {y_1, y_2, ..., y_K}, recursive application of Bayes' rule and the transition rule leads to

p(X_K|Y_K) ∝ ∏_{k=1}^{K} [ p_V(y_k - H_k(x_k)) p_W(x_{k+1} - M_k(x_k)) ] p(x_0)

The maximum a posteriori of ln p(X_K|Y_K) defines the variational cost function.

Sequential approach (filtering problem):

Forecast (Chapman-Kolmogorov): p(x_{k+1}|Y_k) = ∫ dx_k p_W(x_{k+1} - M_k(x_k)) p(x_k|Y_k).

Analysis (Bayes): p(x_k|Y_k) = p_V(y_k - H_k(x_k)) p(x_k|Y_{k-1}) / ∫ dx_k p_V(y_k - H_k(x_k)) p(x_k|Y_{k-1}).
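The forecast/analysis recursion above can be sketched numerically for a 1-D state discretized on a grid; the model, noise levels and observation below are illustrative choices, not from the talk:

```python
import numpy as np

# Toy 1-D problem: the state pdf lives on a grid, the forecast applies the
# Chapman-Kolmogorov integral with a Gaussian transition kernel p_W, and the
# analysis applies Bayes' rule with a Gaussian likelihood p_V.
grid = np.linspace(-5.0, 5.0, 401)
dx = grid[1] - grid[0]

def gauss(z, sigma):
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def forecast(p, model, sigma_w):
    # p(x_{k+1}|Y_k) = ∫ dx_k p_W(x_{k+1} - M(x_k)) p(x_k|Y_k)
    kernel = gauss(grid[:, None] - model(grid)[None, :], sigma_w)
    return kernel @ p * dx

def analysis(p, y, obs, sigma_v):
    # p(x_k|Y_k) ∝ p_V(y_k - H(x_k)) p(x_k|Y_{k-1}), then renormalize
    post = gauss(y - obs(grid), sigma_v) * p
    return post / (post.sum() * dx)

# One forecast/analysis cycle with a mildly nonlinear model M and H = identity.
p0 = gauss(grid - 1.0, 0.5)                            # prior p(x_0)
pf = forecast(p0, lambda x: x + 0.1 * np.sin(x), sigma_w=0.3)
pa = analysis(pf, y=1.5, obs=lambda x: x, sigma_v=0.4)
posterior_mean = float((grid * pa).sum() * dx)
```

The grid-based recursion is exact up to discretization, which is precisely what becomes unaffordable in high dimension.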
Why not non-Gaussian (from the start) ?
Nonlinear statistical estimation: Fokker-Plank and Zakai equations
Continuous time, state space discretized as x = (x_1, x_2, ..., x_N)†. Model equation

dx_t = f(x_t, t) dt + g(x_t, t) · dw_t .

Fokker-Planck equation for the relative probability density function (Q = g_t g_t†):

∂p_t/∂t = -∇·(f(x, t) p_t) + (1/2) Σ_{i,j} ∂²/(∂x_i ∂x_j) ([Q]_{ij} p_t) = L_FP(p_t) .

Adding the observation equation dy_t = h(x_t, t) dt + √R dv_t leads to the Zakai equation (the Kushner equation once normalized):

dp_t = L_FP(p_t) dt + p_t h_t† R_t^{-1} dy_t .

From R^N to P(R)^{⊗N}: the maths exist, but the complexity is too high!

Similar to the passage from classical to quantum physics ... [Miller et al. 1999]
Why not non-Gaussian (from the start) ?
Numerics: particle filter
Monte Carlo approaches to solving these nonlinear filtering equations are called particle filters. The most intuitive one is the bootstrap filter.

Particles {x^1, x^2, ..., x^I} sample the pdf p_t(x): p_t(x) ≃ Σ_{i=1}^{I} w_i δ(x - x^i_t).

Propagation of the particles through the model: p_{t+1}(x) ≃ Σ_{i=1}^{I} w_i δ(x - x^i_{t+1}).

Analysis (weights altered by the likelihood): w^i_{t+1} ∝ w^i_t p(y_{t+1}|x^i_{t+1}).

When necessary, resampling of the ensemble, using the unbalanced weights w_i.

[Figure: particles carried from p_t to p_{t+2}; at t+1 an observation reweights the ensemble and triggers resampling.]
[Handschin et al. 1969, Gordon et al. 1993, Van Leeuwen 2002, Zhou 1996]
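A minimal sketch of the bootstrap filter above, for a hypothetical scalar model (the model M(x) = 0.9x, the noise levels and the observations are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

I = 500                                   # number of particles
particles = rng.normal(0.0, 1.0, I)       # sample of p_0
weights = np.full(I, 1.0 / I)

def step(particles, weights, y, sigma_w=0.3, sigma_v=0.5):
    # Propagation through the model (sampling the transition kernel p_W)
    particles = 0.9 * particles + rng.normal(0.0, sigma_w, particles.size)
    # Analysis: weights altered by the likelihood p_V(y - H(x)), with H = identity
    weights = weights * np.exp(-0.5 * ((y - particles) / sigma_v) ** 2)
    weights /= weights.sum()
    # Resampling when the effective ensemble size collapses
    if 1.0 / np.sum(weights ** 2) < particles.size / 2:
        idx = rng.choice(particles.size, particles.size, p=weights)
        particles = particles[idx]
        weights = np.full(particles.size, 1.0 / particles.size)
    return particles, weights

for y in [0.5, 0.4, 0.6, 0.5]:            # synthetic observations
    particles, weights = step(particles, weights, y)
estimate = float(np.sum(weights * particles))
```

The effective-ensemble-size criterion 1/Σ w_i² is a common resampling trigger; the slide only says "when necessary".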
Why not non-Gaussian (from the start) ?
Curse of dimensionality
Problem: particle filters work fine up to N ∼ 4-8. When the state space and/or the observation space gets bigger, the weights degenerate/collapse: only one particle remains likely. This implies a failure of the filter as a modal estimator. Resampling helps, but does not solve the issue.
[Figure: frequency of the maximum weight for a Lorenz-96 particle filter: balanced for N = 40 variables, degenerate for N = 80 variables.]
The required ensemble size scales exponentially with the state-space size N, the observation-space size, or the innovation variance. [Snyder, Bengtsson et al. 2007-2008]
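The collapse can be illustrated with plain importance weights under a standard-Gaussian likelihood (a toy setting, not the Lorenz-96 experiment of the figure): the maximum normalized weight approaches 1 as the dimension N grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Importance weights of I prior particles under an N-dimensional Gaussian
# likelihood centred on the observation; returns the largest normalized weight.
def max_weight(N, I=100):
    particles = rng.normal(size=(I, N))            # prior draws
    y = np.zeros(N)                                # observation at the origin
    logw = -0.5 * np.sum((particles - y) ** 2, axis=1)
    w = np.exp(logw - logw.max())                  # stabilized in log space
    return float(w.max() / w.sum())

w_small = np.mean([max_weight(2) for _ in range(50)])   # low dimension: balanced
w_large = np.mean([max_weight(80) for _ in range(50)])  # high dimension: degenerate
```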
Why not non-Gaussian (from the start) ?
Gaussian as makeshift ?
Lorenz-96 (N = 10 variables, F = 8) experiment
[Figure: analysis rms error versus number of particles (10 to 10^5) for the bootstrap particle filter and the ensemble Kalman filter.]
Still, Gaussian estimation leads to a complexity of P(R) ⊗ P(R).

Mathematically tractable (if the full covariance matrix is not made explicit).

Supported by the central limit theorem.

The least committed distribution when only the first and second moments are known.
Dealing with non-Gaussianity in a Gaussian framework
Sources of non-Gaussianity in a geophysical context
[Figure: in the (x1, x2) plane, a nonlinear transition maps a Gaussian pdf into a non-Gaussian pdf.]
Nonlinearities in models generate non-Gaussian pdfs
Nonlinearity of Navier-Stokes leading to chaos, thresholds (cloud, rain), chemistry, increase in resolution (precipitation at convective scale), etc.
Observation operator model.
Non-Gaussian priors sometimes more adequate description
Background information in state/control space: humidity (Gaussian anamorphosis, though), emission inventories in atmospheric chemistry.
Observation error prior: Huber norm, combination of l1 and l2 (Gaussian +account for gross errors), l∞ norm, log-normal and multiplicative errors, etc.
Advanced model error prior.
Dealing with non-Gaussianity in a Gaussian framework
Dealing with nonlinearity in a Gaussian framework
The priors can be assumed Gaussian, but the models remain nonlinear [Gauthier 1992,
Stensrud et al. 1992, Miller et al. 1994, Pires et al. 1996], and it must be dealt with . . .
4D-Var solutions to deal with nonlinearity
Risk: Gaussian-based Bayesian estimation may rigorously lead to a multimodal distribution whenever nonlinear operators are involved.

Fixes: an outer loop to enforce the full (high-resolution) nonlinear model; an inner loop to guarantee fast optimization (conjugate gradient) and (local) uniqueness of the minimum.
Dealing with non-Gaussianity in a Gaussian framework
Dealing with nonlinearity in a Gaussian framework
The priors can be assumed Gaussian, but the models remain nonlinear . . .
EnKF solutions to deal with nonlinearities
The ensemble encodes all the statistics. The ensemble is propagated by the model without proxy.

Fixes: the ensemble statistics are assumed Gaussian (a priori and a posteriori, even though they may not be), so as to keep the ensemble coherent.
[Figure: Gaussian ensemble filter: the ensemble is carried from p_t to p_t+2; at t+1 a Gaussian proxy of the statistics (mean and covariance) replaces p_t+1^- before the analysis.]
Dealing with non-Gaussianity in a Gaussian framework
Measuring non-Gaussianity: how much do we lose being Gaussian ?
Relative entropy
◮ Fundamental measure of the discrepancy between two pdfs: the relative entropy

K(p, q) = ∫ dp ln(p/q) .

Geophysical applications in predictability [Kleeman 2002], in the statistics of geophysical dynamical systems [Majda], in inverse modelling [Bocquet 2005], in the modelling of prior pdfs [Eyink et al., Pires et al., 2004-2008].
◮ Difficult to handle in high-dimensional systems.
◮ p = prediction or analysis uncertainty pdf.
◮ q = Gaussian proxy of the pdf with the same first- and second-order moments.
Gram-Charlier/Edgeworth expansions of K
A Gram-Charlier/Edgeworth expansion of p/q leads to (at skewness and kurtosis order)

K(p, q) ≃_{G.-C.} (1/12) Σ_{i,j,k} (κ_{i,j,k})² + (1/48) Σ_{i,j,k,l} (κ_{i,j,k,l})²

K(p, q) ≃_{Edg.} (1/12) Σ_{i,j,k} (κ_{i,j,k})² + O(1/I^{3/2})

where κ_{i1,i2,...,in} are the standardized cumulants of p of order n.
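In the univariate case the expansion reduces to K ≈ κ_3²/12 + κ_4²/48, with κ_3 the skewness and κ_4 the excess kurtosis; a minimal sample-based check (the Gamma test distribution is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)

def gram_charlier_K(sample):
    # Univariate version of the expansion: K(p, q) ≈ κ3²/12 + κ4²/48,
    # with κ3, κ4 the standardized cumulants (skewness, excess kurtosis).
    z = (sample - sample.mean()) / sample.std()
    k3 = np.mean(z ** 3)
    k4 = np.mean(z ** 4) - 3.0
    return k3 ** 2 / 12.0 + k4 ** 2 / 48.0

gaussian_K = gram_charlier_K(rng.normal(size=200_000))       # ≈ 0: no loss
skewed_K = gram_charlier_K(rng.gamma(shape=4.0, size=200_000))  # > 0
```

For the Gamma(4) sample, skewness 1 and excess kurtosis 1.5 give K ≈ 0.13, the information lost by replacing the pdf with its Gaussian proxy.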
Dealing with non-Gaussianity in a Gaussian framework
Measuring non-Gaussianity: how much do we lose being Gaussian ?
Multivariate test of normality
◮ Numerous tests of normality in the univariate case: Kolmogorov-Smirnov, Anderson-Darling, the Shapiro-Wilk test.
◮ Multivariate case: few tests, difficult to handle for large sample sizes and a large number of degrees of freedom.
◮ Necessary but insufficient tests: comparing the Mahalanobis norm of the members to a χ² law, using a univariate null-hypothesis test, marginals of the pdf, ...
[Figure: Lorenz-63 free run over time 0-2 of an initially Gaussian ensemble with σ = 0.1.]
Dealing with non-Gaussianity in a Gaussian framework
Measuring non-Gaussianity: how much do we lose being Gaussian ?
[Figure: relative entropy K versus time (0-2) for the Lorenz-63 run: K from the full pdf, K from the Edgeworth expansion O(I^{-3/2}), K from the univariate marginals, and K from the bivariate marginals.]
Dealing with non-Gaussianity in a Gaussian framework
Reducing nonlinearity impact: divide and conquer
With finer discretizations, nature becomes Gaussian (insofar as it becomes linear) ...
Adaptive data assimilation
◮ Assimilation could adapt to the varying instability of the flow. For instance, the efficient variational assimilation window length τ_eff(x) ∝ λ^{-1}(x) [Pires et al. 1996], where λ(x) is the typical local Lyapunov exponent: a smaller delay between analyses is required.
◮ Identify a low-dimensional manifold on which to deploy particle filters [Berliner & Wickle 2007].
Localizing strategies for particle filters
◮ A smaller number of particles for smaller areas.
◮ But contrary to the localized EnKF, the gluing of the subsequent local estimates from the analysis is not trivial ... [van Leeuwen, 2004-2008]
Gaussian mixtures
◮ A mixture with many components is ultimately as difficult as particle filters [Bengtsson et al., 2003].
◮ Can be used to estimate, with a finite number of components, a non-Gaussian pdf with analytically tractable estimation equations.
Bridging the gap between Gaussian and non-Gaussian data assimilation
Gaussian on non-Gaussian grounds: deviation from climatology
Maximum entropy filter [Eyink & Kim 2006]
◮ The pdf of an ensemble should be, given its mean and variance, the closest to the climatology pdf q, the distance being measured by the relative entropy:

K(p, q) = ∫ dp ln(p/q) .

Ensemble second-order statistics:

ȳ = (1/I) Σ_{i=1}^{I} H x_i  and  Y = (1/I) Σ_{i=1}^{I} H x_i (H x_i)† .

Generic form of the pdf: p(x, λ, Λ) ∝ q(x) exp( λ† H x - (1/2) x† H† Λ H x ) .

Dual parameter estimation:

(λ, Λ) = argmin ( ln Z(λ, Λ) - λ† ȳ + (1/2) Tr(Y Λ) )

◮ Assuming a linear observation operator H and Gaussian errors (observation y with error statistics R), the pdf is updated using Bayes' rule, within a dual framework. Dual parameter update:

λ+ = λ− + R^{-1} y  and  Λ+ = Λ− + R^{-1} .
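When the climatology q is itself Gaussian, the dual parameters reduce to information-filter variables, so the dual update can be checked against the standard Kalman analysis; a scalar sketch with illustrative numbers (q = N(0, b), H = 1):

```python
# Scalar sketch: with q = N(0, b) and H = 1, the pdf
# p(x, λ, Λ) ∝ q(x) exp(λx - Λx²/2) is N(m, v) with 1/v = 1/b + Λ, m = vλ.
b = 2.0          # climatological variance (illustrative)
r = 0.5          # observation-error variance (illustrative)
y = 1.2          # observation

lam, Lam = 0.0, 0.0                      # prior: the climatology itself
lam, Lam = lam + y / r, Lam + 1.0 / r    # dual update: λ+ = λ- + R⁻¹y, Λ+ = Λ- + R⁻¹

v_post = 1.0 / (1.0 / b + Lam)
m_post = v_post * lam

# Standard Kalman analysis from the same prior N(0, b):
k_gain = b / (b + r)
m_kalman = k_gain * y
v_kalman = (1.0 - k_gain) * b
```

Both routes give the same posterior N(0.96, 0.4); the interest of the dual form is that q need not be Gaussian.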
Bridging the gap between Gaussian and non-Gaussian data assimilation
Gaussian on non-Gaussian grounds: deviation from climatology
Maximum entropy filter [Eyink & Kim 2006]
◮ Resampling (as in a deterministic analysis filter).
◮ In essence: a dual ensemble Kalman filter built upon a reference pdf given by the climatology. It is efficient on Lorenz-63.
[Figure: maximum entropy filter: the ensemble is carried from p_t to p_t+2 over the pdf of the climatology; the analysis at t+1 turns p_t+1^- into p_t+1^+.]
Lorenz-63 analysis error r.m.s. [Eyink and Kim, J. Stat. Phys., 2006]

∆t    EnKF     MEF
1/6   1.0457   1.7846
1/3   1.5034   1.6041
2/3   1.1548   1.0200
4/3   0.7212   0.6529
◮ Degraded version (first moments only): mean-field filter.
Bridging the gap between Gaussian and non-Gaussian data assimilation
Non-Gaussian on Gaussian grounds: importance filtering
Main ideas of importance sampling
◮ Empirical representation with a mix of particle trajectories and weights:

p_t(X_t|Y_t) ≃ Σ_{i=1}^{N} w^i_t δ(X_t - X^i_t) ,

where the particle trajectories are drawn from a known proposal pdf q. This is possible if the weights are of the form

w^i_t ∝ p_t(Y_t|X^i_t) p(X^i_t) / q_t(X^i_t|Y_t) .

◮ Sequential filtering version:

w^i_t ∝ w^i_{t-1} p_t(y_t|x^i_t) p_t(x^i_t|x^i_{t-1}) / q_t(x^i_t|X^i_{t-1}, Y_t) .

◮ If the proposal is q_t(x^i_t|X^i_{t-1}, Y_t) = p_t(x_t|x_{t-1}), then this is a bootstrap filter!
◮ To avoid too unlikely trajectories, particles should be drawn from a proposal making use of y_t, but this is not considered easy, unless one practices ensemble-based Kalman filters ...
Bridging the gap between Gaussian and non-Gaussian data assimilation
Non-Gaussian on Gaussian grounds: importance sampling
Observation-dependent proposal: Gaussian filters [van der Merwe et al. 2000, Papadakis 2007]
If x̄^i_t and P^i_t are the mean and covariance of an ensemble-based Gaussian filter (EKF, UKF, EnKF, ETKF, etc.), then

q(x^i_t|X^i_{t-1}, Y_t) = N(x̄^i_t, P^i_t)   or   q(x^i_t|X^i_{t-1}, Y_t) = N(x̄_t, P_t) .

The second one is a kind of weighted EnKF, but it is a particle filter!

Lorenz-96 experiment (N = 5 variables, F = 8):

w^i_t ∝ w^i_{t-1} p_t(y_t|x^i_t) p_t(x^i_t|x^i_{t-1}) / N(x̄_t, P_t) .

Fewer particles are wasted! It has the right asymptotics!
[Figure: analysis rms error versus number of particles (10 to 10^4) for the bootstrap particle filter, the ensemble Kalman filter, and the EnKF-based particle filter.]
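A sketch of one analysis step with the N(x̄_t, P_t) proposal, for a hypothetical scalar model (M(x) = 0.9x); the "EnKF analysis" mean and covariance are stand-in numbers, not an actual EnKF run:

```python
import numpy as np

rng = np.random.default_rng(3)

def logpdf_norm(x, mean, var):
    return -0.5 * ((x - mean) ** 2 / var + np.log(2.0 * np.pi * var))

I = 2000
x_prev = rng.normal(0.0, 1.0, I)          # particles at t-1
w_prev = np.full(I, 1.0 / I)
y, sig_w2, sig_v2 = 1.0, 0.09, 0.25       # observation and noise variances

m_prop, p_prop = 0.7, 0.3                 # stand-in EnKF analysis mean/covariance
x_new = rng.normal(m_prop, np.sqrt(p_prop), I)   # draw from the proposal

# w_t ∝ w_{t-1} p(y_t|x_t) p(x_t|x_{t-1}) / N(m, P), computed in log space
logw = (np.log(w_prev)
        + logpdf_norm(y, x_new, sig_v2)              # likelihood, H = identity
        + logpdf_norm(x_new, 0.9 * x_prev, sig_w2)   # transition density
        - logpdf_norm(x_new, m_prop, p_prop))        # proposal density
w = np.exp(logw - logw.max())
w /= w.sum()
posterior_mean = float(np.sum(w * x_new))
```

Because the proposal is centred near the observation, far fewer particles receive negligible weight than with the blind bootstrap proposal.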
Bridging the gap between Gaussian and non-Gaussian data assimilation
Non-Gaussian prior construction
Measuring innovation non-Gaussianity [Pires & Talagrand 2004-2008]
◮ Compute the deviation from Gaussianity of the innovation q = y - H(x_b).

Statistics: skewness s = E[(q - q̄)³] / E[(q - q̄)²]^{3/2} and kurtosis k = E[(q - q̄)⁴] / E[(q - q̄)²]² - 3.

◮ Deviation from Gaussianity estimated by a Gram-Charlier expansion (1-d case).
◮ Compute the least committed pdf consistent with the skewness and kurtosis of the innovations to construct a joint prior ν(ε_o, ε_b), using the maximum entropy principle.
[Figure (HIRS channel 4): left, innovation pdf fit [Pires & Talagrand, 2008]: normalized histogram of innovations (K) with Gaussian and maximum-entropy fits; right, a priori pdfs from maximum entropy: Gaussian observation error and non-Gaussian background error.]
Bridging the gap between Gaussian and non-Gaussian data assimilation
Linear models acting on non-Gaussian priors
Linear models
◮ System driven by a forcing field or an initial condition x ∈ R^N, with model/observation error e ∈ R^p:

y = H x + e ,

with H the (up to 4D) model/observation Jacobian.
◮ Statistical modelling: prior pdf on the controls and errors ν(x, e); posterior pdf p(x, e).

Bayesian inference + maximum a posteriori [Bocquet 2007]

◮ Primal cost function (≡ 4D-Var in a Gaussian context):

L(x) = - ln ν(x, y - H x) .

◮ If convexity is proven, dual cost function (≡ PSAS in a Gaussian context):

L(λ) = (- ln ν)*(H†λ, λ) - y†λ ,

where * denotes the Legendre-Fenchel conjugate.
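As a toy instance of a non-Gaussian prior acting through a linear model (an illustrative choice, not the talk's setting): a scalar x with a Laplace prior ν(x) ∝ exp(-|x|/b) and a Gaussian observation error gives a primal cost minimized in closed form by soft thresholding:

```python
import numpy as np

# Toy MAP estimate with a non-Gaussian (Laplace) prior and H = 1:
# L(x) = (y - x)² / (2r) + |x| / b, minimized by soft thresholding at r/b.
def map_laplace(y, r, b):
    thr = r / b
    return float(np.sign(y) * max(abs(y) - thr, 0.0))

def cost(x, y, r, b):
    return (y - x) ** 2 / (2.0 * r) + abs(x) / b

x_map = map_laplace(y=1.0, r=0.2, b=0.5)   # threshold 0.4, so x_map = 0.6
```

Unlike the Gaussian case, the non-Gaussian prior can set the analysis exactly to zero for small innovations, which is the behaviour exploited for sparse sources.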
Bridging the gap between Gaussian and non-Gaussian data assimilation
Linear models acting on non-Gaussian priors
Thanks to nonlinear convex analysis . . .
Maximum entropy on the mean [Bocquet 2005-2008]
◮ Fully non-Gaussian generalization of 4D-Var / PSAS when the models are linear.

Level-2 primal:  L = K(p, ν) + λ† E_p[μ - H x - e]
  — (log-Laplace) →  dual:  L = ν(H†λ, λ) - λ†μ
  — (Legendre-Fenchel) →  level-1 primal:  L = ν*(x, e) + λ†(μ - H x - e)

(the contraction links the level-2 and level-1 primal cost functions directly)
◮ Equivalence of all cost functions thanks to convexity.
Bridging the gap between Gaussian and non-Gaussian data assimilation
Linear models acting on non-Gaussian priors
Example of a forecast of the ETEX-I plume (10³ observations used, 2×10⁵ control variables).
[Figure: maps (45°N-65°N, 10°W-30°E) of the forecast plume at +3 h, +12 h, +24 h, +48 h and +72 h; colour scale from 0.01 to 50.0.]
Reference knowing the release / Gaussian assimilation / non-Gaussian assimilation.
Conclusions
Summary
Fully non-Gaussian numerical solutions of the estimation problem are still not affordable.
Mathematical tools exist that can objectively measure the departure fromGaussianity.
Expansions (more or less affordable) around Gaussian filtering are possible.
In specific cases, and sometimes in high dimensions, non-perturbative methods arepossible.
Conclusions
Comments
So do we need non-Gaussian modelling after all ?
Nonlinearity of models: nothing that cannot ultimately be dealt with locally in space and/or time ?
Non-Gaussian approaches: just refinements (deviations from Gaussianity) ?
Still need to model non-Gaussian priors (that may result from the nonlinearity ofmodels).
How do we measure the deviations from Gaussianity: criteria based on the flow(singular vectors, breeding modes) or uncertainty based (relative entropy,statistical tests, validation) ?
So far, very much orientated towards getting the best estimator. What about really getting the pdf (or higher-order moments) ? This may become a strong issue when passing from the best estimate obtained from data assimilation to the best ensemble estimate obtained from data assimilation (≃ calibration of an ensemble by data assimilation).
Conclusions
Thank you !