PowerPoint at Readingdarc/nceo_training/reading2011/ensemble_da.… · NCEO 2 Day Intensive Course

NCEO 2 Day Intensive Course on Data Assimilation

Ensemble methods for data assimilation

Stefano Migliorini

National Centre for Earth Observation

University of Reading


Contents

• The weather prediction problem

• Need for data assimilation

• The Kalman filter

• Data assimilation techniques for NWP

• Ensemble methods: features and issues

• Examples with idealized models


Numerical weather prediction

• The mass conservation equation, the momentum equation, the energy equation, the concentration equation for humidity and the equation of state give expressions for the rates of change of x(r,t) = (v, T, p, ρ, q)

• All the properties of the system can in principle be calculated from nonlinear partial differential equations (PDEs) together with boundary and initial conditions:

$\mathbf{x}_{t_1} = M(\mathbf{x}_{t_0})$


Predictability

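• Consider two initial atmospheric states $\mathbf{x}_{t_0}$ and $\mathbf{x}'_{t_0}$ which differ by an infinitesimally small quantity (or “error”) $\boldsymbol{\varepsilon}_0$:

$\mathbf{x}'_{t_0} = \mathbf{x}_{t_0} + \boldsymbol{\varepsilon}_0$

• An initial difference (or “error”) between two solutions (or “analogs”) to the equations for the evolution of the (atmospheric) state may grow with time:

$\mathbf{x}_{t_1} = M(\mathbf{x}_{t_0})$

$\mathbf{x}'_{t_1} = M(\mathbf{x}'_{t_0}) = M(\mathbf{x}_{t_0} + \boldsymbol{\varepsilon}_0) \approx M(\mathbf{x}_{t_0}) + \mathbf{M}\,\boldsymbol{\varepsilon}_0 = \mathbf{x}_{t_1} + \boldsymbol{\varepsilon}_1$

where $\mathbf{M}$ is the derivative of $M$ with respect to the state, evaluated at $\mathbf{x}_{t_0}$, so that $\boldsymbol{\varepsilon}_1 \approx \mathbf{M}\,\boldsymbol{\varepsilon}_0$. After n time steps:

$\boldsymbol{\varepsilon}_n = M_{t_0 \to t_n}(\mathbf{x}'_{t_0}) - M_{t_0 \to t_n}(\mathbf{x}_{t_0}) \approx \mathbf{M}(t_0 : t_n)\,\boldsymbol{\varepsilon}_0$

[Figure: two trajectories starting from $\mathbf{x}_{t_0}$ and $\mathbf{x}'_{t_0} = \mathbf{x}_{t_0} + \boldsymbol{\varepsilon}_0$ at time t0 and reaching $\mathbf{x}_{t_1} = M(\mathbf{x}_{t_0})$ and $\mathbf{x}'_{t_1} = \mathbf{x}_{t_1} + \boldsymbol{\varepsilon}_1$ at time t1]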
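
The growth of a small initial difference can be illustrated numerically. The sketch below is not from the slides: it assumes the three-variable Lorenz (1963) system as a stand-in for the model M, a fourth-order Runge-Kutta integrator, and an arbitrary perturbation of 1e-8 on the first component.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Tendency of the Lorenz (1963) system (assumed toy model, standard parameters)."""
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def rk4_step(f, x, dt):
    """One fourth-order Runge-Kutta step of size dt."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def error_growth(x0, eps0, n_steps=2000, dt=0.01):
    """Integrate x and x' = x + eps0 side by side and return |x' - x| at each step."""
    x = np.array(x0, dtype=float)
    xp = x + eps0
    errors = [np.linalg.norm(xp - x)]
    for _ in range(n_steps):
        x = rk4_step(lorenz63, x, dt)
        xp = rk4_step(lorenz63, xp, dt)
        errors.append(np.linalg.norm(xp - x))
    return np.array(errors)

errors = error_growth([1.0, 1.0, 1.0], np.array([1e-8, 0.0, 0.0]))
print(f"initial error {errors[0]:.1e}, final error {errors[-1]:.1e}")
```

Plotting `errors` on a logarithmic axis would show the roughly exponential growth followed by saturation that limits predictability.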


The need for data assimilation

• Weather and climate forecasts are uncertain due to uncertainty in

– Initial and boundary conditions

– Models (e.g., hydrostatic assumption or shallow water approximation; discretization)

– Parameters (e.g., CO2 sources and sinks)

• We need not only accurate models but also good knowledge of initial (and boundary) conditions

• It is essential that good quality, widespread and frequent measurements (e.g., from satellites) are assimilated in NWP models in order to achieve good quality forecasts


Data coverage at ECMWF

[Figure panels: surface stations, buoys, aircraft, radiosondes. Courtesy ECMWF]


The global satellite operational observing system


Data coverage at ECMWF

[Figure panels: infrared AMVs, microwave imagers, IASI, bending angle. Courtesy ECMWF]


Observation model

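• Discrete stochastic observation model:

$\mathbf{y}_k = H(\mathbf{x}_k) + \boldsymbol{\varepsilon}_k$

• Where we assume:

• $E\{\boldsymbol{\varepsilon}_k\} = \mathbf{0}$ for all k, and $E\{\boldsymbol{\varepsilon}_k \boldsymbol{\varepsilon}_j^T\} = \mathbf{R}_k$ if $j = k$, $\mathbf{0}$ if $j \neq k$

• $E\{\mathbf{x}_0 \boldsymbol{\varepsilon}_k^T\} = \mathbf{0}$

• $E\{\boldsymbol{\varepsilon}_k \boldsymbol{\eta}_j^T\} = \mathbf{0}$ (observation error uncorrelated with the model error $\boldsymbol{\eta}_j$)

[Figure: timeline t0, t1, …, tk, … showing the observation $\mathbf{y}_k$, the state $\mathbf{x}_k$ and the observation error $\boldsymbol{\varepsilon}_k$]
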

Observational issues

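• Bias monitoring

[Figure: HIRS channel 5 on NOAA-14 satellite; HIRS channel 5 on NOAA-16 satellite. Courtesy Dick Dee, ECMWF]

• Quality control

Since $\mathrm{cov}(\mathbf{y} - \mathbf{H}\mathbf{x}_b) = \mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T$, in the scalar case obs rejected when

$|y - h x_b| > \alpha\,(\sigma_o^2 + h^2 \sigma_b^2)^{1/2}$

for a chosen threshold $\alpha$

• Correlated observation error

Whitening filter:

$\mathbf{R}^{-1/2}\mathbf{y} = \mathbf{R}^{-1/2}\mathbf{H}\mathbf{x}_t + \mathbf{R}^{-1/2}\boldsymbol{\varepsilon}_o$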
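
As an illustration of the quality-control check and of the whitening filter, here is a minimal sketch. The function names, the default threshold of 3 innovation standard deviations and the use of the symmetric inverse square root of R are assumptions made for this example, not choices taken from the slides.

```python
import numpy as np

def background_check(y, h, xb, var_o, var_b, alpha=3.0):
    """Return True if a scalar observation passes the background check.

    The innovation y - h*xb has variance var_o + h**2 * var_b; the observation
    is rejected when |innovation| exceeds alpha standard deviations
    (alpha = 3 is an assumed, commonly used threshold).
    """
    return abs(y - h * xb) <= alpha * np.sqrt(var_o + h**2 * var_b)

def whiten(y, H, R):
    """Transform y and H by R^{-1/2} so that the transformed obs error has identity covariance."""
    w, V = np.linalg.eigh(R)                         # R is symmetric positive definite
    R_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return R_inv_sqrt @ y, R_inv_sqrt @ H, R_inv_sqrt

# Two observations with correlated errors (illustrative numbers)
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])
H = np.eye(2)
y = np.array([1.0, 2.0])
y_w, H_w, R_inv_sqrt = whiten(y, H, R)
whitened_cov = R_inv_sqrt @ R @ R_inv_sqrt.T        # should be the identity matrix
```

After whitening, the observations can be treated as having uncorrelated, unit-variance errors, which is what allows them to be processed one at a time.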


Stochastic-dynamic model

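• In the presence of model error the dynamical system is described by

$\mathbf{x}_k = M(\mathbf{x}_{k-1}) + \boldsymbol{\eta}_k$

• Where $\mathbf{x}_0$ is random with $E\{\mathbf{x}_0\} = \mathbf{m}_0$ and $E\{(\mathbf{x}_0 - \mathbf{m}_0)(\mathbf{x}_0 - \mathbf{m}_0)^T\} = \mathbf{P}_0$

• We assume (simplifying assumptions!):

• $E\{\boldsymbol{\eta}_k\} = \mathbf{0}$ for all k, and $E\{\boldsymbol{\eta}_k \boldsymbol{\eta}_j^T\} = \mathbf{Q}_k$ if $j = k$, $\mathbf{0}$ if $j \neq k$

• Model error uncorrelated with the initial state: $E\{\boldsymbol{\eta}_k \mathbf{x}_0^T\} = \mathbf{0}$

• This means $\mathbf{x}_k$ is random with probability density $p(\mathbf{x}_k)$

• $\mathbf{x}_k$ (or, equivalently, $p(\mathbf{x}_k)$) is to be determined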
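
A stochastic-dynamic model together with a stochastic observation model can be simulated directly, which is how synthetic ("twin") experiments are usually set up. The sketch below is only illustrative: the damped-rotation model, the covariances Q and R and the choice of observing the first state component are all assumptions made for this example.

```python
import numpy as np

def simulate_truth_and_obs(M, H, x0, Q, R, n_steps, rng):
    """Draw one realization of x_k = M(x_{k-1}) + eta_k and y_k = H(x_k) + eps_k."""
    x = np.asarray(x0, dtype=float)
    states, obs = [], []
    for _ in range(n_steps):
        eta = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)   # model error, N(0, Q)
        x = M(x) + eta
        eps = rng.multivariate_normal(np.zeros(R.shape[0]), R)   # observation error, N(0, R)
        states.append(x)
        obs.append(H(x) + eps)
    return np.array(states), np.array(obs)

# Assumed toy setup: a damped rotation in two dimensions, first component observed
theta = 0.1
A = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
rng = np.random.default_rng(1)
states, obs = simulate_truth_and_obs(lambda x: A @ x, lambda x: x[:1],
                                     [1.0, 0.0], Q, R, 5000, rng)
```

The difference `obs[:, 0] - states[:, 0]` is a sample of the observation error, so its variance should be close to the prescribed R.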


The data assimilation problem

• Consider a set of realizations of observations $Y_l = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_l\}$

• When k≤l (i.e., for tk≤tl), the conditional probability density $p(\mathbf{x}_k|Y_l)$ provides the solution of the data assimilation (or filtering) problem

• From $p(\mathbf{x}_k|Y_l)$ we could calculate $E\{\mathbf{x}_k|Y_l\}$ (the estimate of $\mathbf{x}_k$ at time tk, or analysis) and the covariance matrix


Data assimilation for NWP

• Is data assimilation (or probabilistic forecasting) a solved problem? Certainly not for NWP applications!

• If we sample the probability to find a random

variable between 0 and 1 with a 0.01 resolution,

we need 101 sample points (say 100 for simplicity)

• How many sample points do we need to map the

probability space in the case of a random vector

with 7 components? (think of x(r,t) = (v, T, p, ρ,

q))


The “curse of dimensionality”

• We need, you guessed it, $100^7 = 10^{14}$ sample points. And to store that many values on a computer we need a hundred terabytes of memory space, for each grid point! This is a lot of hard disks…

• Note that the current Met Office operational global NWP model considers 1024 x 769 grid points over 70 vertical levels (~55 million grid points) at a given time!

• This means that, by today's standards, it is not possible to sample the pdf of the state vector used in NWP
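
The arithmetic behind these figures is easy to check. The snippet below assumes, for illustration only, one byte of storage per sample point, which is what reproduces the quoted hundred terabytes.

```python
samples_per_component = 100     # 0.01 resolution on [0, 1], rounded from 101
n_components = 7                # x(r,t) = (v, T, p, rho, q), with v a 3-vector
n_samples = samples_per_component ** n_components

bytes_per_sample = 1            # assumption made for this estimate
storage_terabytes = n_samples * bytes_per_sample / 1e12

grid_points = 1024 * 769 * 70   # horizontal grid times vertical levels

print(n_samples, storage_terabytes, grid_points)
```

Storing a probability value in single or double precision would multiply the storage estimate by 4 or 8.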


The Kalman filter

• The good news is: under certain conditions we may not need to keep track of the whole joint state pdf

• If all relevant errors (i.e., model, initial condition and observation errors) are Gaussian, their pdfs are completely known if their means and covariances are known

• When the model and the observation operator are linear, optimal estimates of the mean and the covariance of $p(\mathbf{x}_k|Y_l)$ are given by the Kalman filter, which is the solution of the data assimilation problem.

• In the presence of (moderate) nonlinearities (typical case in NWP): extended Kalman filter


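The extended Kalman filter

• Let us now consider the nonlinear case with Gaussian errors

$\mathbf{x}_{k+1} = M(\mathbf{x}_k) + \boldsymbol{\eta}_{k+1}, \qquad \boldsymbol{\eta}_{k+1} \sim N(\mathbf{0}, \mathbf{Q}_{k+1})$

$\mathbf{y}_k = H(\mathbf{x}_k) + \boldsymbol{\varepsilon}_k, \qquad \boldsymbol{\varepsilon}_{k+1} \sim N(\mathbf{0}, \mathbf{R}_{k+1})$

$\mathbf{x}_k \sim N(\mathbf{x}^a_k, \mathbf{P}^a_k)$

• Forecast step

$\mathbf{x}^f_{k+1} = M(\mathbf{x}^a_k)$

$\mathbf{P}^f_{k+1} = \mathbf{M}\,\mathbf{P}^a_k\,\mathbf{M}^T + \mathbf{Q}_{k+1}$

• Data assimilation step

$\mathbf{x}^a_{k+1} = \mathbf{x}^f_{k+1} + \mathbf{K}\,(\mathbf{y}_{k+1} - H(\mathbf{x}^f_{k+1}))$

$\mathbf{K} = \mathbf{P}^f_{k+1}\mathbf{H}^T(\mathbf{H}\mathbf{P}^f_{k+1}\mathbf{H}^T + \mathbf{R}_{k+1})^{-1}$

$\mathbf{P}^a_{k+1} = (\mathbf{I} - \mathbf{K}\mathbf{H})\,\mathbf{P}^f_{k+1}$

where the model and the observation operator are linearized about the latest state estimate:

$\mathbf{M} = \left.\frac{\partial M(\mathbf{x}_k)}{\partial \mathbf{x}_k}\right|_{\mathbf{x}_k = \mathbf{x}^a_k}, \qquad \mathbf{H} = \left.\frac{\partial H(\mathbf{x}_{k+1})}{\partial \mathbf{x}_{k+1}}\right|_{\mathbf{x}_{k+1} = \mathbf{x}^f_{k+1}}$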
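
The forecast and data assimilation steps of the extended Kalman filter translate almost line by line into code. The following is a minimal sketch, assuming the user supplies the nonlinear operators `M` and `H` together with functions `M_jac` and `H_jac` returning their Jacobians (all four names are hypothetical); the usage example is a scalar linear case with arbitrary numbers.

```python
import numpy as np

def ekf_step(xa, Pa, y, M, H, M_jac, H_jac, Q, R):
    """One EKF cycle: forecast from t_k to t_{k+1}, then assimilate y at t_{k+1}."""
    # Forecast step: nonlinear model for the state, linearized model for the covariance
    Mk = M_jac(xa)                       # Jacobian of M at the analysis
    xf = M(xa)
    Pf = Mk @ Pa @ Mk.T + Q
    # Data assimilation step: observation operator linearized about the forecast
    Hk = H_jac(xf)
    S = Hk @ Pf @ Hk.T + R               # innovation covariance
    K = np.linalg.solve(S, Hk @ Pf).T    # K = Pf H^T S^{-1}, using symmetry of S and Pf
    xa_new = xf + K @ (y - H(xf))
    Pa_new = (np.eye(xf.size) - K @ Hk) @ Pf
    return xa_new, Pa_new

# Usage with a scalar linear model x_{k+1} = 1.2 x_k and a direct observation
xa_new, Pa_new = ekf_step(
    xa=np.array([50.0]), Pa=np.array([[0.5]]), y=np.array([61.0]),
    M=lambda x: 1.2 * x, H=lambda x: x,
    M_jac=lambda x: np.array([[1.2]]), H_jac=lambda x: np.array([[1.0]]),
    Q=np.array([[0.1]]), R=np.array([[0.5]]),
)
```

Solving a linear system with the innovation covariance, rather than forming its inverse explicitly, is a common design choice for numerical stability.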


Example: scalar and linear case

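• Updating at $t_{k+1}$ prior information valid at $t_k$ with $y_{k+1}$

$x_{k+1} = a x_k + \eta_{k+1}, \qquad y_{k+1} = x_{k+1} + \varepsilon_{k+1}$

$x^f_{k+1} = a x^f_k$

$(\sigma^f_{k+1})^2 = a^2 (\sigma^f_k)^2 + (\sigma^\eta)^2 \equiv P^f_{k+1}$

$K = \frac{(\sigma^f_{k+1})^2}{(\sigma^f_{k+1})^2 + (\sigma^\varepsilon)^2}$

$x^a_{k+1} = a x^f_k + \frac{(\sigma^f_{k+1})^2}{(\sigma^f_{k+1})^2 + (\sigma^\varepsilon)^2}\,(y_{k+1} - a x^f_k)$

$(\sigma^a_{k+1})^2 = \left(1 - \frac{(\sigma^f_{k+1})^2}{(\sigma^f_{k+1})^2 + (\sigma^\varepsilon)^2}\right)(\sigma^f_{k+1})^2 = \frac{(\sigma^\varepsilon)^2 (\sigma^f_{k+1})^2}{(\sigma^f_{k+1})^2 + (\sigma^\varepsilon)^2} \equiv P^a_{k+1}$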
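
The scalar update can be sketched in a few lines. The function name is hypothetical and the observation value and observation error variance in the usage example are assumed for illustration; a = 1.2, a prior variance of 0.5 and a model error variance of 0.1 are the values quoted for the figure on the next slide.

```python
def scalar_kf_update(x_prior, var_prior, y, a, var_eta, var_eps):
    """Propagate scalar prior information from t_k to t_{k+1} and update it with y_{k+1}."""
    x_f = a * x_prior                        # forecast mean
    var_f = a**2 * var_prior + var_eta       # forecast variance
    K = var_f / (var_f + var_eps)            # Kalman gain
    x_a = x_f + K * (y - x_f)                # analysis mean
    var_a = (1.0 - K) * var_f                # analysis variance
    return x_a, var_a, K, var_f

# Assumed observation y = 61 with error variance 1.0
x_a, var_a, K, var_f = scalar_kf_update(x_prior=50.0, var_prior=0.5, y=61.0,
                                        a=1.2, var_eta=0.1, var_eps=1.0)
```

Since a > 1 the forecast variance grows at every step, and the update brings the analysis variance below both the forecast variance and the observation error variance.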


Evolution of the pdf (with update)

[Figure parameters: $a = 1.2$, $x_0 = 50$, $(\sigma^f_0)^2 = 0.5$, $(\sigma^\eta)^2 = 0.1$]


Evolution of the pdf: cycling


Use of Kalman filters in NWP

• To compute (not just store) the covariance matrix update $\mathbf{P}^a_{k+1}$, it can be shown that we need ~$n^3$ Flops (floating-point operations). With $n \sim 10^7$ at the Met Office, this means ~$10^{21}$ Flops

• The IBM supercomputer at ECMWF has a maximal achieved performance of ~100 TFlops, i.e. ~$10^{14}$ Flops/s. We need ~$10^7$ s to compute $\mathbf{P}^a_{k+1}$, that is about four months!

• Clearly, we need to resort to some approximations and/or alternative strategies
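
The cost estimate follows from a short calculation, reproduced here with the order-of-magnitude figures quoted on the slide.

```python
n = 1e7                          # size of the state vector (order of magnitude)
flops_needed = n ** 3            # cost of the covariance update, ~n^3
flops_per_second = 1e14          # ~100 TFlops sustained performance

seconds = flops_needed / flops_per_second
days = seconds / 86400.0

print(f"{seconds:.1e} s = {days:.0f} days")
```

The result is about 116 days for a single covariance update, far longer than the assimilation window itself.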


Data assimilation techniques for NWP

• The most widespread data assimilation approach used for NWP is based on variational techniques

• 4D-Var calculates the optimal model trajectory over a given time window (typically 6h or 12h) by taking into account observations taken within the same window and some prior information.

• In the linear case, it is equivalent to a Kalman smoother initialized with the same prior information used in 4D-Var

• We now discuss an approximation of the Kalman filter based on a set of ensemble members (i.e., based on Monte Carlo techniques)
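
For reference, the variational approach mentioned above is usually written as the minimization of a cost function. The expression below is the standard strong-constraint (perfect model) form, given here as an assumption about the formulation since the slides do not spell it out; $\mathbf{x}_b$ and $\mathbf{B}$ denote the prior (background) state and its error covariance, and K is the number of observation times in the window.

```latex
J(\mathbf{x}_0) = \tfrac{1}{2}(\mathbf{x}_0 - \mathbf{x}_b)^T \mathbf{B}^{-1} (\mathbf{x}_0 - \mathbf{x}_b)
  + \tfrac{1}{2}\sum_{k=0}^{K} \left(\mathbf{y}_k - H(\mathbf{x}_k)\right)^T \mathbf{R}_k^{-1} \left(\mathbf{y}_k - H(\mathbf{x}_k)\right),
\qquad \mathbf{x}_k = M(\mathbf{x}_{k-1})
```

The first term measures the distance to the prior information and the second the distance to the observations within the window.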


Ensemble methods

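• Consider a random variable x distributed according to f(x), with unknown mean m and variance $\sigma^2$

• From f(x) we can generate an ensemble of N independent realizations of x: $x_1, x_2, \ldots, x_N$.

• We can estimate m and $\sigma^2$ via the sample (or ensemble) mean $\bar{x}$ and sample variance $s^2$:

$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$

$s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$

• They are both unbiased estimators and their variance is (the second expression holds for Gaussian x)

$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{N}$

$\mathrm{Var}(s^2) = \frac{2\sigma^4}{N-1}$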
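
These sampling properties can be checked with a quick Monte Carlo experiment. The sketch below assumes a standard Gaussian for f(x) and arbitrary values for the ensemble size and the number of repetitions.

```python
import numpy as np

def ensemble_stats(x):
    """Sample mean and unbiased sample variance of a 1-D ensemble."""
    x = np.asarray(x, dtype=float)
    N = x.size
    mean = x.sum() / N
    var = ((x - mean) ** 2).sum() / (N - 1)
    return mean, var

rng = np.random.default_rng(0)
N, trials = 25, 4000                      # ensemble size and number of repetitions (assumed)
stats = np.array([ensemble_stats(rng.standard_normal(N)) for _ in range(trials)])

var_of_mean = stats[:, 0].var(ddof=1)     # expected: sigma^2 / N = 0.04
var_of_s2 = stats[:, 1].var(ddof=1)       # expected: 2 sigma^4 / (N - 1) ~ 0.083

print(var_of_mean, 1.0 / N)
print(var_of_s2, 2.0 / (N - 1))
```

Both variances decrease only as 1/N, which is why small ensembles give noisy estimates of means and, even more so, of covariances.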


Standard deviation of the mean: empirical vs. expected values

• Gray dots: mean values of N ensemble members drawn from a Gaussian with zero mean and unit variance, for a total of 100 mean values for each N value

• Asterisks: sample mean of the 100 means and sample standard deviation of the mean

• Solid line: expected value of the mean (= 0); dashed line: expected value of the standard deviation of the mean (= 1/sqrt(N))


Ensemble Kalman filter

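• Replace $\mathbf{x}^t$ with $\overline{\mathbf{x}}^f$, the mean of an ensemble of forecasts

• Replace $\mathbf{P}^f$ and $\mathbf{P}^a$ with $\mathbf{P}^f_e$ and $\mathbf{P}^a_e$ calculated from an ensemble of forecasts (error ~1/N)

$\mathbf{P}^f = E\{(\mathbf{x}^f_i - \mathbf{x}^t)(\mathbf{x}^f_i - \mathbf{x}^t)^T\} \approx \overline{(\mathbf{x}^f_i - \overline{\mathbf{x}}^f)(\mathbf{x}^f_i - \overline{\mathbf{x}}^f)^T} = \mathbf{P}^f_e$

where the overbar denotes an average over the ensemble, with, e.g.,

$\mathbf{x}^f_i(t_{k+1}) = M(\mathbf{x}^a_i(t_k)) + \boldsymbol{\eta}_i(t_{k+1}), \qquad \boldsymbol{\eta}_i \sim N(\mathbf{0}, \mathbf{Q})$

• We define

$\mathbf{P}^a = E\{(\mathbf{x}^a_i - \mathbf{x}^t)(\mathbf{x}^a_i - \mathbf{x}^t)^T\} \approx \overline{(\mathbf{x}^a_i - \overline{\mathbf{x}}^a)(\mathbf{x}^a_i - \overline{\mathbf{x}}^a)^T} = \mathbf{P}^a_e$

with $\overline{\mathbf{x}}^a$ the mean of the analysis ensemble $(\mathbf{x}^a_1, \mathbf{x}^a_2, \ldots, \mathbf{x}^a_i, \ldots, \mathbf{x}^a_N)$

and, e.g.,

$\mathbf{x}^a_i = \mathbf{x}^f_i + (\mathbf{P}^f_e\mathbf{H}^T)\left((\mathbf{H}\mathbf{P}^f_e\mathbf{H}^T) + \mathbf{R}\right)^{-1}(\mathbf{y}^o_i - H(\mathbf{x}^f_i))$

$\mathbf{y}^o_i = \mathbf{y}^o + \boldsymbol{\varepsilon}_i, \qquad \boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R})$

$(\mathbf{P}^f_e\mathbf{H}^T) \equiv \overline{(\mathbf{x}^f_i - \overline{\mathbf{x}}^f)(H(\mathbf{x}^f_i) - \overline{H(\mathbf{x}^f)})^T} \approx \mathbf{P}^f\mathbf{H}^T$

$(\mathbf{H}\mathbf{P}^f_e\mathbf{H}^T) \equiv \overline{(H(\mathbf{x}^f_i) - \overline{H(\mathbf{x}^f)})(H(\mathbf{x}^f_i) - \overline{H(\mathbf{x}^f)})^T} \approx \mathbf{H}\mathbf{P}^f\mathbf{H}^T$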
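
The analysis step of the ensemble Kalman filter with perturbed observations can be sketched as follows. This is a minimal illustration rather than an operational implementation: the function name, the 1/(N-1) normalization of the ensemble averages and the toy setup (a three-variable state with only its first component observed) are assumptions.

```python
import numpy as np

def enkf_analysis(Xf, y_obs, H, R, rng):
    """Stochastic EnKF update.

    Xf    : (n, N) forecast ensemble, one member per column
    y_obs : (p,) observation vector
    H     : function mapping a state (n,) to observation space (p,), possibly nonlinear
    R     : (p, p) observation error covariance
    """
    n, N = Xf.shape
    HX = np.column_stack([H(Xf[:, i]) for i in range(N)])    # (p, N)
    Xp = Xf - Xf.mean(axis=1, keepdims=True)                 # state perturbations
    HXp = HX - HX.mean(axis=1, keepdims=True)                # obs-space perturbations
    PfHt = Xp @ HXp.T / (N - 1)                              # estimate of Pf H^T
    HPfHt = HXp @ HXp.T / (N - 1)                            # estimate of H Pf H^T
    # Perturbed observations: y_i = y_obs + eps_i, eps_i ~ N(0, R)
    eps = rng.multivariate_normal(np.zeros(y_obs.size), R, size=N).T
    Y = y_obs[:, None] + eps
    K = np.linalg.solve(HPfHt + R, PfHt.T).T                 # PfHt (HPfHt + R)^{-1}
    return Xf + K @ (Y - HX)

rng = np.random.default_rng(42)
Xf = rng.standard_normal((3, 200))             # forecast ensemble ~ N(0, I)
y_obs = np.array([1.0])
R = np.array([[0.1]])
Xa = enkf_analysis(Xf, y_obs, lambda x: x[:1], R, rng)
```

Note that the full covariance matrix is never formed: only its products with the observation operator are estimated, directly from the ensemble.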


EnKF features

• Unlike the EKF, the EnKF makes use of the full nonlinear model M: more accurate determination of error growth (and of error saturation)

• The Kalman gain K is computed without the need to calculate $\mathbf{P}^f$. This is particularly advantageous when observations are processed serially

• No need to linearize the observation operator H

• Each ensemble member can be propagated forward in time independently: highly parallel algorithm

• Model error term can be included as a perturbation of the deterministic forecast


Covariance localization

• When the forecast error covariance is misspecified (e.g., due to neglecting model error, or when N << n), it may include spurious correlations between very distant grid points

• A common solution is to multiply each element of $\mathbf{P}^f_e$ by an appropriate weight that reduces long-distance correlations

• This ensures that only the components of $\mathbf{P}^f_e$ believed to represent the corresponding components of $\mathbf{P}^f$ accurately are retained
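
Element-wise weighting of the ensemble covariance can be sketched as below. The slides do not specify the weighting function; this example assumes the widely used fifth-order compactly supported function of Gaspari and Cohn (1999) on a one-dimensional, non-periodic grid, with an arbitrary length scale c.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Compactly supported correlation function: 1 at zero distance, 0 beyond 2c."""
    r = np.abs(np.asarray(dist, dtype=float)) / c
    w = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    w[near] = -0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3 - (5.0 / 3.0) * rn**2 + 1.0
    w[far] = (rf**5 / 12.0 - 0.5 * rf**4 + 0.625 * rf**3 + (5.0 / 3.0) * rf**2
              - 5.0 * rf + 4.0 - 2.0 / (3.0 * rf))
    return w

def localize(P, positions, c):
    """Multiply each element of P by a weight depending on the grid-point separation."""
    d = np.abs(positions[:, None] - positions[None, :])
    return P * gaspari_cohn(d, c)

rng = np.random.default_rng(3)
n, N = 40, 10                                  # N << n: many spurious correlations
positions = np.arange(n, dtype=float)
X = rng.standard_normal((n, N))                # ensemble of uncorrelated grid-point values
Xp = X - X.mean(axis=1, keepdims=True)
Pe = Xp @ Xp.T / (N - 1)                       # noisy sample covariance
Pe_loc = localize(Pe, positions, c=5.0)
```

Here the true covariance is the identity matrix, so every off-diagonal element of `Pe` is sampling noise; localization leaves the variances untouched and removes all covariances between points more than 2c apart.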


Covariance localization: an example

(a) $\mathbf{P}^f_e$ (N=25)

(b) $\mathbf{P}^f_e$ (N=100)

(c) Correlation function with compact support

(d) localized $\mathbf{P}^f_e$ (N=25)

From Fig. 6.4 of Hamill, 2006


Other issues

• In the extra-tropics at large scale the atmosphere is in near-geostrophic balance. An inappropriate (e.g., too narrow) localization may give rise to unbalanced increments (e.g., between height gradient and wind increments)

• Analysis update equations are optimal only when

errors are Gaussian (not necessarily the case)

• Unless model error is properly taken into account,

forecast errors may be underestimated

• Computational expense is proportional to number of

observations: problems when obs are “too many”


EnKF variants

• To estimate $\mathbf{P}^a_e$ correctly, the EnKF algorithm requires the observations to be treated as random variables: stochastic or fully Monte Carlo algorithm

• An alternative strategy consists in requiring $\mathbf{P}^a_e$ to be consistent with the analysis error covariance prescribed by the Kalman filter: deterministic algorithms (nonuniqueness).

• An advantage of deterministic algorithms is to avoid misrepresentation of observation error statistics (due to finite sampling)

• Deterministic algorithms prescribe the expression of the matrix of the ensemble member perturbations (from the mean), that is a square root (or, more precisely, Cholesky factor) of $\mathbf{P}^a_e$: also called square-root algorithms (e.g., EnSRF)
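
One deterministic variant is simple enough to sketch in full: the serial ensemble square-root filter of Whitaker and Hamill (2002), which processes one scalar observation at a time and updates the perturbations with a reduced gain. The code below is an illustrative sketch under that assumption, with a linear observation operator given as a vector `h`; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def ensrf_single_obs(Xf, y, h, r):
    """EnSRF update for one scalar observation y = h @ x + eps, with var(eps) = r.

    Xf is the (n, N) forecast ensemble. No perturbed observations are used:
    the mean is updated with the Kalman gain K and the perturbations with alpha * K.
    """
    N = Xf.shape[1]
    xm = Xf.mean(axis=1)
    Xp = Xf - xm[:, None]
    hXp = h @ Xp                                     # obs-space perturbations, shape (N,)
    hPh = hXp @ hXp / (N - 1)                        # scalar h Pf h^T
    Ph = Xp @ hXp / (N - 1)                          # vector Pf h^T
    K = Ph / (hPh + r)                               # Kalman gain
    alpha = 1.0 / (1.0 + np.sqrt(r / (hPh + r)))     # reduction factor for the perturbations
    xm_a = xm + K * (y - h @ xm)
    Xp_a = Xp - alpha * np.outer(K, hXp)
    return xm_a[:, None] + Xp_a

rng = np.random.default_rng(7)
Xf = rng.standard_normal((4, 30))
h = np.array([1.0, 0.0, 0.0, 0.0])
Xa = ensrf_single_obs(Xf, y=0.5, h=h, r=0.2)

# The analysis ensemble covariance matches the Kalman filter expression (I - K h^T) Pf
Pf = np.cov(Xf)
K = Pf @ h / (h @ Pf @ h + 0.2)
Pa_expected = (np.eye(4) - np.outer(K, h)) @ Pf
Pa_ensemble = np.cov(Xa)
```

The agreement between `Pa_ensemble` and `Pa_expected` is exact up to rounding, which is the consistency requirement that defines deterministic algorithms.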


Conclusions

• Ensemble-based algorithms are establishing themselves as a viable alternative to the more established variational techniques for NWP data assimilation

• Their ease of implementation makes it easier for scientists (including PhD students!) to contribute to research in data assimilation

• They provide a natural framework to generate probabilistic forecasts, i.e. forecasts with a quantitative estimate of their errors
