
INFORMATION THEORY AND ENSEMBLE

DATA ASSIMILATION. PART I: THEORETICAL ASPECTS

Dusanka Zupanski1, Milija Zupanski1, Arthur Y. Hou2, Sara Q. Zhang2,

and Christian D. Kummerow1

1Colorado State University, Fort Collins, Colorado

2NASA Goddard Space Flight Center, Greenbelt, Maryland

Manuscript version 2, to be resubmitted to J. Atmos. Sci.

(2 tables, 5 figures)

Corresponding author address:

Dusanka Zupanski, Cooperative Institute for Research in the Atmosphere/Colorado State

University, Fort Collins, Colorado, 80523-1375; E-mail: [email protected]


Abstract

A general framework linking information theory and ensemble data assimilation is proposed. This framework can be used to estimate various information measures (e.g., degrees of freedom for signal and Shannon entropy reduction) by employing an information matrix defined in ensemble subspace. Defining the information matrix in ensemble subspace has at least two major advantages. The first is that the relatively small dimension of the information matrix (equal to the ensemble size) allows a straightforward and computationally inexpensive calculation of different information measures. The second, equally important, advantage is that the information matrix employs a flow-dependent forecast error covariance matrix, defined in terms of time-evolving ensemble perturbations. The flow-dependent forecast error covariance takes into account the impact of the model state time evolution on the information measures.

In this two-part study, we employ the Maximum Likelihood Ensemble Filter (MLEF) data assimilation approach in application to the Goddard Earth Observing System Single Column Model (GEOS-5 SCM). In Part I we define the theoretical background for the proposed general framework, and focus on the impact of ensemble size and covariance localization, and on the temporal evolution of the information measures. In Part II we evaluate the impact of different data assimilation approaches on the information measures by comparing Kalman filter and 3-dimensional variational solutions as two special cases of the MLEF solution.

The results of Part I indicate that it is possible to capture the essential character of the

information measures with a relatively small ensemble size. For example, in applications to models similar to the GEOS-5 SCM, 10 ensemble members might be sufficient. The results


additionally indicate that covariance localization generally increases the amount of information

and also improves the data assimilation results. The temporal evolution of the information

measures is found to be in agreement with the true model state evolution, thus indicating that the

information measures are meaningful.


1. Introduction

Novel probabilistic approaches to data assimilation and ensemble forecasting are often

referred to as ensemble data assimilation, or Ensemble Kalman Filter (EnKF) methods. These

methods are considered powerful because of their capability to address both data assimilation

and ensemble forecasting within a consistent mathematical approach. Like other advanced data assimilation methods, such as the Kalman Filter (KF) and variational methods, ensemble data

assimilation techniques provide an “optimal” estimate of the atmospheric state using information

from available observations. The “optimal” estimate of the atmospheric state is commonly

defined either as a minimum variance, or as a maximum likelihood solution (e.g., Lorenc 1986;

Cohn 1997). Most of the ensemble data assimilation methods seek a minimum variance solution

(Evensen 1994; Houtekamer and Mitchell 1998; Lermusiaux and Robinson 1999; Hamill and

Snyder 2000; Keppenne 2000; Mitchell and Houtekamer 2000; Anderson 2001; Bishop et al.

2001; van Leeuwen 2001; Reichle et al. 2002a,b; Whitaker and Hamill 2002; Tippett et al. 2003;

Zhang et al. 2004; Ott et al. 2005; Szunyogh et al. 2005; and Peters et al. 2005). There are also

ensemble data assimilation approaches seeking a maximum likelihood solution (e.g., Zupanski

2005; Zupanski and Zupanski 2006; and Fletcher and Zupanski 2006). As explained in Fletcher

and Zupanski (2006), the differences between the two solutions can become significant in data

assimilation problems with errors described by non-Gaussian Probability Density Functions

(PDFs). That paper also points out that cloud variables and their errors would likely follow log-normal (i.e., non-Gaussian) PDFs. Also, non-linearity of the forecast models and observation operators would cause the errors involved in data assimilation to depart from the Gaussian

distribution. Nevertheless, in this study we assume that the errors are Gaussian, and thus expect


that the results of this study will also be applicable to other ensemble-based data assimilation

methods, as long as the Gaussian assumption is valid.

It has been recognized that information theory (e.g., Shannon and Weaver 1949; Rodgers

2000) and predictability are inherently related (e.g., Schneider and Griffies 1999; Kleeman 2002;

Roulston and Smith 2002; DelSole 2004; Abramov et al. 2005). Information theory has also

come to the attention of data assimilation, where it has been used to calculate information

content of various observations (e.g., Wahba 1985; Purser and Huang 1993; Wahba et al. 1995;

Rodgers 2000; Rabier et al. 2002; Fisher 2003; Johnson 2003; Engelen and Stephens 2004;

L’Ecuyer et al. 2006).

Information theory has primarily been examined in application to other data

assimilation methods (e.g., variational, KF), while its application to ensemble data assimilation

has been rather limited so far. Some of the pioneering studies in this area are as follows. Bishop

and Toth (1999) and Wang and Bishop (2003) examined the eigenvalues and eigenvectors of the

Ensemble Transform Kalman Filter (ETKF, Bishop et al. 2001) transformation matrix and

demonstrated that these eigenvalues and eigenvectors define the amount and the direction of the

maximum forecast error reduction due to information from the observations. Patil et al. (2001),

Oczkowski et al. (2005), and Wei et al. (2006) used the eigenvalues of the ETKF transformation

matrix to define measures of information, referred to as “bred dimension”, “effective degrees of

freedom”, and “E dimension”, respectively. These studies have recognized that ensemble-based

methods have a potential to improve measures of information due to flow-dependent forecast

error covariance matrix, especially in applications to adaptive observations. Building upon the

previous studies, we link the ETKF transformation matrix with the so-called information or

observability matrix, defined in ensemble subspace, and demonstrate how this matrix can be


used to define standard measures of information theory, such as Degrees of Freedom (DOF) for

signal and Shannon entropy reduction (e.g., Rodgers 2000). Thus, we propose a general

framework to link together ensemble data assimilation and information theory in a similar

manner as in variational and KF methods. We evaluate this framework within an ensemble-based

data assimilation method, using a single column precipitation model and simulated observations.

In Part I (this paper) we focus on the impact of ensemble size and covariance localization. In Part

II (the following paper) we compare the results of the KF and the 3-dimensional variational (3d-

var) approaches, defined as special applications of the proposed framework.

Part I is organized as follows. In section 2 the general framework is described. The

experimental design is explained in section 3, and experimental results are presented in section 4.

Finally, in section 5, the conclusions are summarized and their relevance for future research is

discussed.

2. General framework

In this study we employ an ensemble data assimilation approach referred to as Maximum

Likelihood Ensemble Filter (MLEF, Zupanski 2005; Zupanski and Zupanski 2006; Zupanski et

al. 2006). Here we briefly describe the MLEF. The MLEF seeks a maximum likelihood state

solution employing an iterative minimization of a cost function. The solution for a state vector x

(also referred to as control variable), of dimension Nstate, is obtained by minimizing a cost

function J defined as

J(x) = (1/2) [x - x_b]^T P_f^{-1} [x - x_b] + (1/2) [y - H(x)]^T R^{-1} [y - H(x)] ,   (1)
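The cost function (1) can be sketched numerically as follows. This is a minimal NumPy illustration, not the MLEF code itself; the operator, dimensions, and numbers are hypothetical, and the inverse covariances are assumed precomputed.

```python
import numpy as np

def mlef_cost(x, xb, Pf_inv, y, Hop, R_inv):
    """Cost function J(x) of Eq. (1): background term plus observation term."""
    dxb = x - xb                  # departure from the background (prior) state
    dy = y - Hop(x)               # innovation, using the nonlinear operator H
    return 0.5 * dxb @ Pf_inv @ dxb + 0.5 * dy @ R_inv @ dy

# Tiny example: two-component state, one observation of the first component,
# identity error covariances (all numbers hypothetical).
xb = np.array([1.0, 2.0])
y = np.array([1.5])
Hop = lambda x: x[:1]             # observe the first state component only
J = mlef_cost(xb, xb, np.eye(2), y, Hop, np.eye(1))
```

Evaluated at x = xb, the background term vanishes and only the observation term contributes.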


where y is an observation vector of dimension equal to the number of observations (Nobs), and H

is a non-linear observation operator. Subscript b denotes a background (i.e., prior) estimate of x,

and superscript T denotes a transpose. The Nobs ×Nobs matrix R is a prescribed observation error

covariance, and it includes instrumental and representativeness errors (e.g., Cohn 1997). The

matrix Pf of dimension Nstate×Nstate is the forecast error covariance. Note that we employ the

common rank-reduced square-root formulation P_f = P_f^{1/2} (P_f^{1/2})^T, where P_f^{1/2} is an Nstate×Nens square-root matrix (Nens being the ensemble size).

Uncertainties of the optimal estimate of the state x are also calculated by the MLEF. The uncertainties are defined as square roots of the analysis error covariance (P_a^{1/2}) and the forecast error covariance (P_f^{1/2}), both defined in ensemble subspace. The square root of the analysis error covariance is obtained as

P_a^{1/2} = [p_a^1  p_a^2  ...  p_a^{Nens}] = P_f^{1/2} (I_ens + C)^{-1/2} ,   (2)

where I_ens is an identity matrix of dimension Nens×Nens, and p_a^i are column vectors representing

analysis perturbations in ensemble subspace. The square root in (2) is calculated via eigenvalue

decomposition of C. It is defined as a symmetric positive semi-definite square root, and therefore

it is unique (e.g., Horn and Johnson 1985, Theorem 7.2.6). This is one of the differences between

the ETKF (e.g., Bishop et al. 2001 and Wang and Bishop 2003) and the MLEF: in the ETKF a

non-symmetric, hence a non-unique square root is chosen [Bishop et al. 2001, Eq. (18b)], while

in the MLEF the unique symmetric square root is used [Zupanski 2005, Eq. (10)]. As a

consequence of using different eigenvectors in (2), the square roots of the analysis error


covariance in the ETKF and the MLEF are different, even for linear observation operators and

identical forecast error covariances. The eigenvalues are, however, identical under these

conditions. Note that, in application to non-linear forecast models (M), the final results of the two

approaches will be different in terms of both the eigenvalues and the eigenvectors, due to

different non-linear updates of P_f^{1/2}. Non-uniqueness of the square root filters, such as ensemble

square root filters, is discussed in Tippett et al. (2003).
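The symmetric square-root update (2) can be sketched as follows. This is a minimal NumPy illustration with hypothetical dimensions, not the MLEF code: the eigendecomposition of the symmetric positive semi-definite C yields the unique symmetric square root of (I_ens + C)^{-1}.

```python
import numpy as np

def analysis_sqrt(Pf_sqrt, C):
    """Eq. (2): P_a^{1/2} = P_f^{1/2} (I_ens + C)^{-1/2}, using the unique
    symmetric positive semi-definite square root of (I_ens + C)^{-1}."""
    lam, V = np.linalg.eigh(C)                           # C is symmetric PSD
    T = V @ np.diag(1.0 / np.sqrt(1.0 + lam)) @ V.T      # symmetric (I+C)^{-1/2}
    return Pf_sqrt @ T

# Sanity check on a small hypothetical ensemble: the implied analysis
# covariance must equal P_f^{1/2} (I_ens + C)^{-1} (P_f^{1/2})^T.
rng = np.random.default_rng(0)
Pf_sqrt = rng.standard_normal((6, 3))                    # Nstate=6, Nens=3
Z = rng.standard_normal((4, 3))                          # Nobs=4
C = Z.T @ Z
Pa_sqrt = analysis_sqrt(Pf_sqrt, C)
Pa = Pa_sqrt @ Pa_sqrt.T
Pa_ref = Pf_sqrt @ np.linalg.inv(np.eye(3) + C) @ Pf_sqrt.T
```

Choosing the symmetric factor, rather than a non-symmetric one as in the ETKF, is exactly the design choice discussed above.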

Matrix C of dimension Nens×Nens is defined as

C = Z^T Z ;   z^i = R^{-1/2} H(x + p_f^i) - R^{-1/2} H(x) ,   (3)

where vectors z^i are the columns of the matrix Z of dimension Nobs×Nens. Note that, when calculating z^i, the nonlinear operator H is applied to both the perturbed and the unperturbed state x. Vectors p_f^i are columns of the square root of the background error covariance matrix obtained via

ensemble forecasting employing a non-linear forecast model M:

P_f^{1/2} = [p_f^1  p_f^2  ...  p_f^{Nens}] ;   p_f^i = M(x + p_a^i) - M(x) .   (4)

Equations (1)-(3), referred to as analysis equations, are solved iteratively in each data

assimilation cycle, while equation (4), referred to as a forecast equation, is used to propagate in

time the columns of the forecast error covariance matrix P_f^{1/2}.
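The analysis equations (3) and the forecast equation (4) can be sketched as follows. This is a minimal NumPy sketch with hypothetical operators and dimensions; for a linear H the columns of Z reduce to R^{-1/2} H P_f^{1/2}.

```python
import numpy as np

def info_matrix(x, Pf_sqrt, Hop, R_sqrt_inv):
    """Eq. (3): columns z^i = R^{-1/2}[H(x + p_f^i) - H(x)]; C = Z^T Z."""
    Hx = R_sqrt_inv @ Hop(x)
    Z = np.column_stack([R_sqrt_inv @ Hop(x + p) - Hx for p in Pf_sqrt.T])
    return Z, Z.T @ Z

def forecast_sqrt(x, Pa_sqrt, M):
    """Eq. (4): p_f^i = M(x + p_a^i) - M(x), propagated with the nonlinear model M."""
    Mx = M(x)
    return np.column_stack([M(x + p) - Mx for p in Pa_sqrt.T])

# Hypothetical example: Nstate=3, Nens=2, Nobs=2, linear observation operator.
rng = np.random.default_rng(1)
x = rng.standard_normal(3)
Pf_sqrt = rng.standard_normal((3, 2))
Hmat = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
Z, C = info_matrix(x, Pf_sqrt, lambda v: Hmat @ v, np.eye(2))
```

For this linear case, Z equals Hmat applied to the ensemble perturbations, and C is symmetric of dimension Nens×Nens.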

A measure of information content of observations referred to as DOF for signal is often

used in information theory (e.g., Rodgers 2000). It is a number measuring the amount of


independent pieces of information in the observations that is above the noise. In data assimilation

applications, DOF for signal (here denoted d_s) is commonly defined in terms of the analysis and forecast error covariances, P_a and P_f (e.g., Wahba 1985; Purser and Huang 1993; Wahba et al. 1995; Rodgers 2000; Rabier et al. 2002; Fisher 2003; Johnson 2003; Engelen and Stephens 2004) as

d_s = tr [I_state - P_a P_f^{-1}] ,   (5a)

where tr denotes trace, and I_state is an identity matrix of dimension Nstate×Nstate. Wahba et al.

(1995) define d_s in terms of the so-called influence matrix A as

d_s = tr [R^{-1/2} H P_a H^T R^{-1/2}] = tr [A] ,   (5b)

which is equivalent to (5a), as pointed out by Fisher (2003).

Employing the definition of P_a in ensemble subspace (2) and using tr [x x^T] = tr [x^T x], we can write (5b) in ensemble subspace as

d_s = tr [(I_ens + C)^{-1} (P_f^{1/2})^T H^T (R^{-1/2})^T R^{-1/2} H P_f^{1/2}] .   (6)

Using the non-linear operator H instead of the linear operator H, we can write

R^{-1/2} H P_f^{1/2} ≈ R^{-1/2} H(x + p_f^i) - R^{-1/2} H(x) .   (7)


Finally, combining (3), (6), and (7) we have

d_s = tr [(I_ens + C)^{-1} Z^T Z] = tr [(I_ens + C)^{-1} C] .   (8)

Definition (8) is essentially the same as Eq. (2.61) of Rodgers (2000). The only difference is that here the trace is obtained employing matrix C of dimension Nens×Nens, while in the formulation of Rodgers (2000) the trace is obtained employing an information matrix of dimension Nstate×Nstate (the full-rank information matrix). We will refer to matrix C as the information matrix in ensemble subspace.

By introducing information matrix C, we have defined a link between information theory

and ensemble data assimilation. Having this link is of special importance for the following

reasons. When calculating information content measures such as ds, a flow-dependent

Pf obtained directly from ensemble data assimilation is used. In addition, eigen-decomposition

of C is easily accomplished due to the relatively small size of this matrix (Nens×Nens). Hence, it is

practical to calculate information content of numerous observations (large Nobs) in applications to

complex models with large state vectors (large Nstate). A possible disadvantage of this approach is

that a small ensemble size might not be sufficient to describe full variability of the forecast error

covariance matrix, which could potentially result in meaningless information measures. One of

the main focuses of this study is the impact of ensemble size on the information measures.

Once the information matrix C is available, various measures of information content can be calculated. It is especially useful to define these measures in terms of the eigenvalues λ_i^2 of C. Thus, as in Rodgers (2000), we can rewrite (8) in terms of λ_i^2 and calculate d_s as


d_s = Σ_i λ_i^2 / (1 + λ_i^2) .   (9)

Similarly, Shannon information content h, defined as the reduction of entropy due to

added information from the observations (Shannon and Weaver 1949; Rodgers 2000), can be

calculated using the following formula

h = (1/2) Σ_i ln (1 + λ_i^2) .   (10)

As explained in Rodgers (2000, Section 2.4), the values λ_i^2 ≥ 1 correspond to signal and, conversely, the values λ_i^2 < 1 correspond to noise. Eqs. (3) and (7) indicate that the eigenvalues λ_i^2 depend on the ratio between the forecast error covariance and the observation error covariance, both defined at the observation locations. Thus, for forecast errors larger than the observation errors we have λ_i^2 ≥ 1 (signal), and for forecast errors smaller than the observation errors we have λ_i^2 < 1 (noise). Therefore, it is important to properly estimate a flow-dependent forecast error covariance matrix (e.g., via ensemble-based or KF methods), since it brings the impact of changing atmospheric conditions on the eigenvalues λ_i^2 and consequently on the information measures.
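The eigenvalue formulas (9) and (10) can be sketched as follows. This is a minimal NumPy sketch; the spectrum below is hypothetical, chosen so that one mode is clearly signal (λ^2 ≥ 1) and the other two are near noise (λ^2 < 1).

```python
import numpy as np

def info_measures(C):
    """DOF for signal, Eq. (9), and Shannon entropy reduction, Eq. (10),
    from the eigenvalues lambda_i^2 of the ensemble-subspace matrix C."""
    lam2 = np.linalg.eigvalsh(C)              # eigenvalues of the symmetric PSD C
    ds = np.sum(lam2 / (1.0 + lam2))          # Eq. (9)
    h = 0.5 * np.sum(np.log(1.0 + lam2))      # Eq. (10)
    return ds, h

# Hypothetical spectrum: one strong signal mode, two near-noise modes.
C = np.diag([9.0, 0.25, 0.01])
ds, h = info_measures(C)
```

The signal mode contributes almost a full degree of freedom (9/10), while the near-noise modes contribute little, mirroring the signal-versus-noise interpretation above.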

We have explained how information theory and ensemble data assimilation, in

particular the MLEF approach, can be linked to produce a technique for calculation of various

information content measures, defined in ensemble subspace. An important characteristic of the


MLEF approach is that it can be made identical to KF or variational methods, under special

conditions explained below. This provides an opportunity to directly compare information

measures obtained using different data assimilation approaches.

a. Connection to KF

As shown in Zupanski (2005), a linear version of the MLEF is identical to the KF under

the assumptions of the classical linear KF (e.g., Jazwinski 1970): assuming Gaussian PDFs,

linear models M, and linear observation operators H. Under these assumptions, the solution that

minimizes (1) can be explicitly calculated using the following formula (e.g., Zupanski 2005,

Appendix A, Eq. A7):

x = x_b + α P_f H^T (H P_f H^T + R)^{-1} [y - H(x_b)] .   (11)

The solution (11) is identical to the KF solution, since the minimization step-size α is equal to 1

for quadratic cost functions (Gill et al. 1981). The MLEF solution will remain identical to the KF

solution through all data assimilation cycles, since the linear version of the forecast error

covariance update equation (4) is the same as the KF update equation. Note that the full-rank

MLEF (Nens=Nstate) is identical to the full-rank KF, while the reduced-rank MLEF (Nens<Nstate) is

related to the reduced-rank KF, under the assumptions of the classical linear KF.
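The explicit solution (11), with step size α = 1, can be sketched as the classical KF update. This is a minimal NumPy illustration with hypothetical scalar values, not the MLEF code itself.

```python
import numpy as np

def kf_analysis(xb, Pf, Hmat, R, y):
    """Eq. (11) with step size alpha = 1: the classical KF update,
    to which a linear MLEF reduces."""
    K = Pf @ Hmat.T @ np.linalg.inv(Hmat @ Pf @ Hmat.T + R)   # Kalman gain
    return xb + K @ (y - Hmat @ xb)

# Scalar sanity check: with Pf = R, the analysis lies halfway
# between the background and the observation.
xa = kf_analysis(np.array([0.0]), np.eye(1), np.eye(1), np.eye(1), np.array([2.0]))
```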

b. Connection to 3d-var

As explained before, the solution obtained by the MLEF is a maximum likelihood one,

and, in general, a non-linear one. These characteristics are shared with variational methods, thus

there is a connection to these methods as well. The full-rank non-linear MLEF solution without


the update of the forecast error covariance [i.e., using a prescribed covariance instead of Eq. (4)]

is identical to a 3d-var solution, since the same cost function (1) is minimized. To obtain

identical results, one can employ the same minimization method with the same Hessian

preconditioning in both the MLEF and the 3d-var (e.g., Zupanski 2005). In practice, however, it

is not always feasible to employ the perfect Hessian preconditioning in variational methods due

to large dimensions of the full-rank covariance matrices.

As explained before, the general framework proposed here should be directly applicable

to KF and 3d-var methods, as long as it remains practical to evaluate full-rank covariance

matrices. In cases when full-rank matrices are too large for practical evaluations, the information

matrix of reduced order, defined in ensemble subspace (Eq. 3), can be used as a more practical

tool to define information measures. Note that calculation of the information measures (9) and

(10) is straightforward within the ETKF and the MLEF ensemble-based approaches, since the

eigenvalues of (3) are explicitly calculated within these algorithms. In other EnKF approaches

the additional calculation of the information matrix C and its eigenvalues would have to be

included.

There are, however, some restrictions to the proposed general framework. For example,

when deriving information measures (e.g., DOF for signal and entropy reduction) we have

assumed, as in Rodgers 2000, that all errors are Gaussian. Therefore, we have implicitly

assumed weak nonlinearity in M and H, even though ensemble-based and variational methods do

not necessarily require this assumption. Consequently, the information measures obtained in

highly non-linear data assimilation problems, and also for variables that are typically non-

Gaussian (e.g., humidity and cloud microphysical variables) could be incorrect, or only

approximately correct. A theoretical framework for information measures employing non-


Gaussian ensembles is proposed in Majda et al. (2002) and Abramov and Majda (2004). They

have employed a different approach, based on the moment constraint optimization, to estimate

so-called “predictive utility”, which is an information measure derived from the Shannon

entropy. The framework proposed here could be further generalized following Majda et al.

(2002) and Abramov and Majda (2004). These generalizations are beyond the scope of this

study, and will be addressed in the future.

An additional potential restriction of the proposed approach is the possibility that the

calculated information measures could be underestimated in the experiments with small

ensemble size when assimilating many observations. To simulate this situation we examine

information measures in the experiments with a relatively small number of ensemble members

(10 ensemble members) and a relatively large number of observations per data assimilation cycle

(40 and 80 observations).

3. Experimental design

a. Forecast model

A single column version of the GEOS-5 Atmospheric General Circulation Model

(AGCM) is used in this study. Previous experience employing column versions of the GEOS-

series within a 1-dimensional variational data assimilation technique indicated that the 1-

dimensional framework could produce useful data assimilation results, especially in applications

to rainfall assimilation (Hou et al. 2000, 2001, 2004).


The GEOS-5 SCM consists of the model physics components of the GEOS-5 AGCM:

moist processes (convection and prognostic large-scale cloud condensation), turbulence,

radiation, land surface, and chemistry. The dynamic advection is driven by prescribed forcing

time series. The column model is capable of updating all the prognostic state variables and

evaluating a suite of additional observable quantities such as precipitation and cloud properties. The GEOS-5 SCM retains most of the non-linear complexities and interactions between physical processes of the full AGCM. At the same time, it has the advantage of reduced dimensions when used in ensemble data assimilation research experiments.

b. Control variable, observations

In the applications of this paper we focus on simulated observations of two

state variables: temperature (T) and specific humidity (q) vertical profiles. They are also the

control variables for data assimilation. In the experiments presented, 40 model levels are used.

Thus, the dimension of the control vector is 80. The column model only updates temperature and

specific humidity during a data assimilation interval. The remaining state variables, along with the advection forcing, are prescribed by the Atmospheric Radiation Measurement (ARM) data time series. The Tropical Western Pacific site (130E, 15N) in the ARM observation program is chosen for

the application discussed in this paper. The assimilation experiments cover the period from 7

May 1998 to 24 May 1998 (17 days).

A data assimilation interval of 6 hours is used in the experiments, and simulated

observations of temperature and specific humidity are assimilated at the end of each data

assimilation interval. Simulated observations are created using the “true” state, defined by the


GEOS-5 SCM, and by adding Gaussian white random noise to the “true” state. Thus, the

observation error covariance matrix R is assumed diagonal and constant in time. We use the

same version of the model to perform data assimilation and to create observations, thus we

assume that the model is perfect. In experiments with real observations the perfect model

assumption might not be justified. In order to relax this assumption one can use some of the

recently proposed model error estimation approaches (e.g., Heemink et al. 2001; Mitchell et al.

2002; Reichle et al. 2002a; Zupanski and Zupanski 2006).

The observations are created assuming instrumental error for T of 0.2 K at all model

levels (R_inst^{1/2} = 0.2 K). The instrumental errors for q vary between R_inst^{1/2} = 6.1×10^{-8} and R_inst^{1/2} = 7.9×10^{-4}; the errors are defined to decrease from the lowest to the highest model level. The total observation errors are defined as R^{1/2} = α R_inst^{1/2}, where an empirical parameter α > 1 is

employed to approximately account for representativeness errors. To approximately account for

the reduced variability in the forecast error covariance due to small ensemble size the parameter

α increases with decreasing ensemble size. The values of the parameter α are tuned to the

ensemble size to approximately satisfy the expected chi-square innovation statistic, calculated for

optimized innovations and normalized by the analysis error (e.g., Dee et al. 1995; Menard et

al. 2000; Zupanski 2005). The instrumental errors and the values of the parameter α used in data

assimilation experiments of this study are listed in Table 1.
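The construction of the simulated observations can be sketched as follows. This is a minimal NumPy sketch; the α value below is hypothetical and is not one of the tuned values listed in Table 1.

```python
import numpy as np

def simulate_observations(truth, r_inst_sqrt, alpha, rng):
    """Simulated observations: truth plus Gaussian white noise, with total
    error std R^{1/2} = alpha * R_inst^{1/2} (diagonal R, constant in time)."""
    sigma = alpha * r_inst_sqrt                        # total observation error std
    obs = truth + sigma * rng.standard_normal(truth.shape)
    R = np.diag(sigma ** 2)                            # diagonal observation error covariance
    return obs, R

# 40-level temperature profile with 0.2 K instrumental error; alpha = 1.5
# is a hypothetical tuning value.
rng = np.random.default_rng(42)
truth_T = 280.0 + np.linspace(0.0, 20.0, 40)
obs_T, R_T = simulate_observations(truth_T, 0.2 * np.ones(40), 1.5, rng)
```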

Initial conditions for T and q at the beginning of the first data assimilation cycle are from

ARM observations of T and q at the time (0000 UTC 07 May 1998), interpolated from

observation levels to the model levels. With this configuration the errors in initial conditions are

simulated by the difference between ARM observations and the “true” states defined by the

model simulation (started from 1800 UTC 06 May 1998 and integrated for 6 hours to 0000 UTC


07 May 1998). This has resulted in Root Mean Square (RMS) errors of 0.46 K for Tb and 4.8×10^{-4} for qb in the first data assimilation cycle (recall that subscript b denotes background

values). In all subsequent cycles, 6-h forecast of T and q from the previous cycle is used to

define the background for the current cycle.

c. Ensemble perturbations

Ensemble perturbations p_f^i that are used to define the forecast error covariance P_f^{1/2} are prescribed in the first data assimilation cycle (cold start); in the subsequent cycles the data assimilation scheme updates p_f^i by using the analysis perturbations p_a^i and by running

ensembles of forecasts (4). The cold start ensemble perturbations are defined using Gaussian

white noise with a prescribed standard deviation of magnitude comparable to the observation

errors. A compactly supported second-order correlation function of Gaspari and Cohn (1999),

with decorrelation length of 3 vertical layers, is applied to the random perturbations to define a

correlated random noise (e.g., Zupanski et al. 2006). The decorrelation length of 3 vertical layers

was determined empirically, based on overall best data assimilation performance of all

experiments of this two-part study.

d. Minimization

A conjugate gradient minimization algorithm (e.g., Luenberger 1984), with line-search

defined as in Navon et al. (1992), and with Hessian preconditioning (Zupanski 2005) is used in

the experiments of this paper. In all data assimilation experiments, only a single iteration of the


minimization is performed, which is sufficient for linear observation operators (Zupanski 2005).

Note that non-linearity of the forecast model M, even though it influences the final data

assimilation results, does not influence the minimization results within a filter formulation.

This would be, however, different for a smoother application, since the non-linear model would

influence the minimization results.

e. Covariance localization

Covariance localization is often used in ensemble data assimilation applications to better

constrain the data assimilation problems with either insufficient observations or insufficient

ensemble size (e.g., Houtekamer and Mitchell 1998; Hamill et al. 2001; Whitaker and Hamill

2002). Localization was also found beneficial in full-rank KF applications due to

spurious loss of variance in the discrete KF covariance evolution equation (e.g., Menard et al.

2000). Since covariance localizations are typically achieved by employing arbitrary covariance

functions (e.g., Gaspari and Cohn 1999) it is important to evaluate if such localizations could

unrealistically change information measures.

Covariance localization is applied in a set of data assimilation experiments of this paper

to assess its impact on the information content measures. We use a common localization

technique (e.g., Houtekamer and Mitchell 1998; Hamill et al. 2001; Whitaker and Hamill 2002)

based on the Schur (element-wise) product between the covariance matrix and a compactly

supported covariance function. Since the localization increases the number of degrees of

freedom, the Nens leading eigenvalues and eigenvectors of the localized forecast error covariance

are selected after localization. We have employed the compactly supported second-order


correlation function of Gaspari and Cohn (1999), with decorrelation length of 3 vertical layers.

Recall that in data assimilation experiments without localization we also employ the same

correlation function, with the same parameters, but to define correlated random noise in the cold

start, as explained in sub-section 3c. Using the same correlation function ensures maximum

compatibility between different data assimilation experiments.
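The Schur-product localization described above can be sketched as follows. This is a minimal Python illustration, not the code used in the experiments; the function names, the use of layer-index separation as the vertical distance, and the support scale are our assumptions:

```python
import numpy as np

def gaspari_cohn(r, c):
    """Compactly supported 5th-order piecewise rational correlation
    function of Gaspari and Cohn (1999); zero beyond a distance of 2c."""
    z = np.abs(r) / c
    f = np.zeros_like(z, dtype=float)
    m1 = z <= 1.0
    m2 = (z > 1.0) & (z <= 2.0)
    z1 = z[m1]
    f[m1] = -0.25 * z1**5 + 0.5 * z1**4 + 0.625 * z1**3 - (5.0 / 3.0) * z1**2 + 1.0
    z2 = z[m2]
    f[m2] = (z2**5 / 12.0 - 0.5 * z2**4 + 0.625 * z2**3
             + (5.0 / 3.0) * z2**2 - 5.0 * z2 + 4.0 - 2.0 / (3.0 * z2))
    return f

def localize(P, c):
    """Schur (element-wise) product of a covariance matrix with the
    compactly supported correlation function, using vertical-level
    index separation as the distance."""
    n = P.shape[0]
    levels = np.arange(n)
    dist = np.abs(levels[:, None] - levels[None, :])
    rho = gaspari_cohn(dist, c)
    return P * rho  # element-wise (Schur) product
```

With c = 3 vertical layers, the Schur product leaves the diagonal (variances) unchanged and tapers covariances to exactly zero beyond 6 layers of separation, which is what removes spurious long-range correlations.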

4. Results

a. Verification summary

We have performed extensive verifications of all data assimilation experiments listed in

Table 1 in terms of analysis and background errors and the chi-square innovation statistic tests (e.g., Dee 1995; Menard et al. 2000; Zupanski 2005). The verification summary is

given in Table 2. The RMS errors of the analysis and the 6-h forecast (background) are

calculated with respect to the truth as mean values over 70 consecutive data assimilation cycles.

The mean values and the standard deviations of the chi-square statistic are calculated over 70

data assimilation cycles from the chi-square statistic values obtained in the individual data

assimilation cycles. Note that an ergodic hypothesis was made when calculating the mean chi-square values: the sample mean was replaced by the time mean, calculated over 70 data assimilation cycles.

The results in Table 2 indicate that the RMS errors increase as the ensemble size decreases (from 80 to 10) and as the number of observations decreases (from 80 to 40), which is the expected behavior. The analysis and background errors of all experiments are


smaller than the errors of the experiment without data assimilation (no_obs), thus indicating a

positive impact of data assimilation. The analysis errors of the experiments with 80 observations are within the estimated total observation errors (note that the total observation errors also include empirical representativeness errors).

Table 2 also indicates that covariance localization generally reduces analysis and

background errors. Exceptions are some experiments with 20 ensemble members (e.g., RMS Ta=0.63 K vs. RMS Ta=0.54 K) and 40 ensemble members (e.g., RMS qa=3.98×10^-4 vs. RMS qa=3.74×10^-4). It is not surprising that covariance localization can sometimes have an adverse impact on data assimilation results, due to the arbitrary decorrelation length imposed on the forecast error covariance (also discussed in Houtekamer and Mitchell 2001; Zhang et al. 2006).

The mean values of the chi-square statistic indicate that the experiments without localization are within 20% of the expected value of 1, with standard deviations of 15%-31%. Larger departures from the expected chi-square statistic are obtained in the experiments with covariance localization: the mean chi-square values differ by 21%-51% from the expected value of 1, and the standard deviations are in the range of 13%-38%.

Note that the mean chi-square values larger (smaller) than 1 indicate an underestimation

(overestimation) of the forecast error variance, which would result in underestimation

(overestimation) of the information measures. One should, however, expect departures from the expected chi-square statistic, since the Gaussian assumption is not strictly valid due to the non-linearity of the forecast model. The chi-square values calculated in individual data assimilation cycles showed no increasing or decreasing trend in time, meaning that all data assimilation experiments had stable filter performance (figure not included). Based on the stable filter


performance, we will assume that chi-square values in Table 2 are acceptable, which is

admittedly a subjective assumption.
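The chi-square innovation statistic used in these verifications can be sketched as follows. This is a minimal illustration with synthetic Gaussian innovations; the function name and the synthetic covariance S are our own constructions, not the paper's assimilation system:

```python
import numpy as np

def chi_square_statistic(innov, S):
    """Normalized chi-square innovation statistic (e.g., Menard et al.
    2000): d^T S^{-1} d / Nobs, where d = y - H(x_b) is the innovation
    and S = H Pf H^T + R is the innovation covariance. Its expected
    value is 1 when the prescribed error covariances are consistent
    with the data."""
    return float(innov @ np.linalg.solve(S, innov)) / innov.size

# Illustration: draw innovations from N(0, S) over many "cycles"; the
# mean statistic should be close to the expected value of 1.
rng = np.random.default_rng(0)
nobs = 40
L = rng.standard_normal((nobs, nobs))
S = L @ L.T / nobs + np.eye(nobs)  # plays the role of H Pf H^T + R
chol = np.linalg.cholesky(S)
chi2 = [chi_square_statistic(chol @ rng.standard_normal(nobs), S)
        for _ in range(500)]
mean_chi2 = float(np.mean(chi2))
```

A mean persistently above (below) 1 in such a diagnostic signals an underestimated (overestimated) forecast error variance, which is how the values in Table 2 are interpreted.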

b. DOF for signal and entropy reduction

1) IMPACT OF ENSEMBLE SIZE

Information measures ds (DOF for signal) and h (entropy reduction), calculated in data

assimilation experiments with 80 observations, are shown as functions of data assimilation

cycles in Figs. 1a and 1b. Comparison of Figs. 1a and 1b indicates that both information

measures have similar variability with time; however, the amplitude of variability of h is larger.

Note that, by definition, ds cannot exceed the ensemble size Nens, since the matrix C has Nens eigenvalues. The entropy reduction h, however, can be greater than Nens. As seen in Figs. 1a and

1b, the experiments with larger ensemble size typically have larger values of the information

measures, and vice versa. Assuming that the full-rank experiment produces the information

measures close to the truth, we notice that the true information content is underestimated in the reduced-rank experiments. An important observation, however, is that all experiments show similar time variability of the information measures (the lines approximately follow each other). Thus, even though small ensemble sizes can result in underestimation of the true information measures, comparing information measures within the same ensemble size can still produce meaningful results. This is an indication that even a small ensemble size (e.g., 10 ensemble members) is sufficient to capture the basic variability of the information measures.
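Both measures can be computed directly from the eigenvalues of the ensemble-subspace information matrix C. The sketch below uses the standard definitions of DOF for signal and entropy reduction (e.g., Rodgers 2000); the function name is ours, and reporting h in bits (base-2 logarithm) is our choice for illustration:

```python
import numpy as np

def information_measures(C):
    """DOF for signal ds and entropy reduction h from the ensemble-
    subspace information matrix C (Nens x Nens, symmetric positive
    semi-definite). lam2 holds the eigenvalues of C, written
    lambda_i^2 in the text."""
    lam2 = np.linalg.eigvalsh(C)
    lam2 = np.clip(lam2, 0.0, None)        # guard tiny negative roundoff
    ds = float(np.sum(lam2 / (1.0 + lam2)))       # DOF for signal, <= Nens
    h = float(0.5 * np.sum(np.log2(1.0 + lam2)))  # entropy reduction (bits)
    return ds, h
```

Because each eigenvalue contributes at most 1 to ds, ds is bounded by Nens, whereas each term of h is unbounded, which is why h can exceed the ensemble size.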

2) IMPACT OF COVARIANCE LOCALIZATION


In this subsection we evaluate the impact of covariance localization on the information

measures, focusing on the experiments with an insufficient number of observations (40

observations) and with a relatively small ensemble size (10 ensemble members). In Fig. 2, the DOF for signal obtained in the experiments with and without localization is plotted as a function of

data assimilation cycles. Both Fig. 2a (with 10 ensemble members) and Fig. 2b (with 80

ensemble members) indicate, in general, an increased amount of information due to localization.

This is not surprising, since covariance localization introduces extra DOF to the data assimilation

system (e.g., Hamill et al. 2001), but the total number of DOF cannot exceed 40, which is the

maximum number of independent pieces of information in the example in Fig. 2a,b. An

important observation is that the localization does not change the essential character of the

information measures (the lines with and without covariance localization are approximately

parallel). It is important to note that 10 ensemble members capture the essential temporal variability of the information measures obtained in the experiment with 80 ensemble members,

which we consider close to the truth. There are some cases, however, with large disagreements

between the two experiments (e.g., there is a notable departure between the two lines around

cycle 56 in Fig. 2a). In such cases the experiment with localization (ds_10ens_loc) is in better

agreement with the “true” information content obtained with 80 ensemble members (Fig. 2b).

Note, however, that because verifications are performed over a single column, the maxima and minima are likely to be shifted by a single point, even under similar experimental conditions.

3) TEMPORAL VARIABILITY OF THE INFORMATION MEASURES


As seen in Figs. 1 and 2, the information measures have peculiar time variability. One

can observe that the information measures have a maximum in the first data assimilation cycle,

and there is a steep decrease in the following data assimilation cycles. After the initial period

(lasting up to five data assimilation cycles), the information measures vary with time in a

seemingly random way. There are, however, two pronounced local maxima in the later cycles, located around cycles 40 and 50 (the exact locations of the maxima vary between different

experiments). Interestingly enough, our experience from other data assimilation applications,

using different forecast models and observations, has also indicated that the information

measures have very similar time variability in the initial data assimilation cycles, while in the

later cycles the variability was dependent on a particular application. This is an indication that

the local maximum in the first cycle is likely a consequence of the initially prescribed forecast

error covariance matrix, which is not dependent on the model state evolution. Conversely, the

local maxima or minima in the later cycles are likely influenced by the evolving model state. In

the following text, we examine whether there is a correlation between the information measures in Figs. 1 and 2 and the model state evolution.

True T, true q, observed T, and observed q are shown in Figs. 3a, 3b, 3c, and 3d, respectively, as functions of data assimilation cycles and model vertical levels. One can observe rapid, front-

like, time-tilted changes in both temperature and humidity around cycles 40 and 50. Comparison

with Figs. 1 and 2 indicates that the two local maxima in the information measures are also

observed around the same data assimilation cycles. One can also observe correlations between

additional smaller local maxima in Figs. 1 and 2 and rapid changes in Fig. 3, though the rapid changes are more pronounced in the humidity field than in the temperature field. It is, therefore,

evident that the time evolution of the information measures in the later cycles is in agreement


with the true model state time evolution. It is not obvious, however, if the maximum in the

information measures in the first data assimilation cycle is correlated with the true model state

evolution. We will examine this issue further in Part II of this study.

One can also observe in Fig. 3 more variability in the observations than in the

corresponding model generated true fields, especially for the specific humidity field (Figs. 3b

and 3d). This is a manifestation of the representativeness error, introduced by randomly perturbing the model state variables when creating the simulated observations. Recall that we have approximately accounted for the impact of the representativeness error through the empirical parameter α (Table 1).

Another way to look at the information measures is to compare the time evolution of the

information measures with the time evolution of the errors obtained in the experiments with and

without data assimilation. In the example shown in Fig. 4, we can compare the analysis errors of

the best data assimilation experiment (the full-rank experiment with 80 observations) with the

errors of the experiment without assimilation (no_obs). As the figure indicates, the largest errors

in both T and q of the experiment without data assimilation are associated with the abrupt

changes in the true model state around cycles 40 and 50 and also with the local maxima in the

information measures. The largest errors are reduced by the greatest amount in the analysis,

which indicates a highly efficient use of the observed information (e.g., Daley 1991; Wang and Bishop 2003), and also confirms that the information measures are meaningful.

c. Eigenvalue spectrum of the information matrix


In the previous subsections we have examined information measures defined in terms of

a single parameter, such as ds or h. Since the full spectrum of the eigenvalues of C is also

available, this spectrum can also be evaluated as a more detailed information measure than a

single parameter. Here, we examine the eigenvalues of the matrix (I_ens + C)^(-1/2). This particular matrix is chosen because it measures the impact of observations on the analysis error reduction [see the definition of P_a^(1/2) in (2)]. In addition, the eigenvalues of (I_ens + C)^(-1/2) lie in the interval [0,1], which is a convenient property when comparing different experiments. Note that eigenvalues equal to 0 indicate the maximum possible information, while eigenvalues equal to 1 indicate no information.

The eigenvalues (1 + λ_i^2)^(-1/2) of the matrix (I_ens + C)^(-1/2), calculated in the full-rank experiments

with 80 observations and with 40 observations without localization are plotted in Fig. 5 as

functions of the eigenvalue rank. We focus here on comparing the eigenvalue spectra for data assimilation cycles with similar values of ds, to examine whether the spectrum could potentially offer additional information. Thus, we have selected cycles 3, 4, 12, and 37 as similar

cycles, according to Figs. 1a and 2b. Let us now compare the eigenvalue spectrum in Fig. 5a for

cycles 4 and 12. Note that the two cycles have comparable values of ds (e.g., ds =11.77 in cycle 4

and ds =11.58 in cycle 12 for experiment ds_80ens in Fig. 1a). Fig. 5a indicates that, even for the

values of ds that are reasonably close to each other, one can obtain notably different distributions

of the eigenvalues. For example, the eigenvalue spectrum in cycle 4 is flatter over a larger

portion of the spectral domain, compared to the eigenvalue spectrum in cycle 12. On the other

hand, the eigenvalue spectra of cycles 3 and 4 in Fig. 5a are both flat, and thus more similar, even though there is a larger difference in the values of ds (e.g., ds =15.28 in cycle 3 and

ds =11.77 in cycle 4). Thus, the similarity between cycles 3 and 4 is due not so much to similar values of ds as to similar eigenvalue spectra. Our experience is that typically the

first five data assimilation cycles have a flatter spectrum than the later cycles. When eigenvalues

are close together, the corresponding computed eigenvectors can become nearly linearly dependent (Golub and van Loan 1989), which is an indication that the ensemble members are not used effectively in the

initial cycles. This is not surprising, since the adjustment of the prescribed forecast error

covariance is taking place during the initial data assimilation cycles.

By comparing Figs. 5a and 5b, we can observe similar spectra for the corresponding data assimilation cycles, with the difference that the experiment with 40 observations produces more eigenvalues equal to 1 (carrying no information) than the experiment with 80 observations, reflecting the fact that 40 observations cannot bring more than 40 pieces of independent information. The upper limit of 40 pieces of information does not imply that the experiment with 40 observations should necessarily always have a smaller amount of information than the experiment with 80

observations. Note that the information content depends on the ratio between the forecast error

and the observation error covariance, not only on the number of observations. Finally, we can

conclude that evaluations of the eigenvalue spectrum of the information matrix can provide

additional information not present in the parameters ds and h.
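The point that the full spectrum carries information beyond ds can be illustrated numerically. In this hypothetical sketch (our own construction, not the paper's data), two diagonal information matrices are tuned to have the same ds yet clearly different eigenvalue spectra:

```python
import numpy as np

def analysis_spectrum(C):
    """Eigenvalues (1 + lambda_i^2)^(-1/2) of (I + C)^(-1/2), sorted by
    rank: 0 means maximum possible information, 1 means no information."""
    lam2 = np.clip(np.linalg.eigvalsh(C), 0.0, None)
    return np.sort(1.0 / np.sqrt(1.0 + lam2))

def dof_signal(C):
    """DOF for signal: sum of lambda_i^2 / (1 + lambda_i^2)."""
    lam2 = np.clip(np.linalg.eigvalsh(C), 0.0, None)
    return float(np.sum(lam2 / (1.0 + lam2)))

# Two hypothetical information matrices with identical ds = 2.2:
C_peaked = np.diag([9.0, 9.0, 0.25, 0.25])  # two dominant directions
C_flat = np.diag([11.0 / 9.0] * 4)          # four equal directions

ds1, ds2 = dof_signal(C_peaked), dof_signal(C_flat)
s1, s2 = analysis_spectrum(C_peaked), analysis_spectrum(C_flat)
```

Here ds1 and ds2 agree, while s1 is steep (two highly informative directions) and s2 is flat, mirroring the cycle 4 versus cycle 12 comparison in Fig. 5a.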


5. Conclusions

In Part I of this two-part study, we have proposed a general framework to link together

information theory and ensemble data assimilation. We have evaluated this framework in

application to the GEOS-5 SCM and simulated observations, employing ARM observations as

forcing. In this part of the study, we have focused on the impact of ensemble size, covariance

localization, and on the temporal evolution of the information measures.

Experimental results indicated that, even though larger ensemble size is desirable for

improved data assimilation results, the essential character of the information measures could still

be captured with a relatively small ensemble size (10 ensemble members in our experiments).

This follows from the fact that the information measures showed similar trends of increase or decrease with time in the experiments with different ensemble sizes. The information matrix in ensemble subspace can therefore be used in cases when the full-rank information matrix is impractical to evaluate.

Experimental results also indicated that the temporal evolution of the information measures is in agreement with the true model state evolution, which is an indication that the flow-dependent forecast error covariance played a proper role in the definition of the flow-dependent information matrix.

The impact of covariance localization was found beneficial: it generally improved data

assimilation results and also increased the information content of data, without introducing

unrealistic changes to the temporal evolution of the information measures.


The encouraging results of this study indicate that it is indeed advantageous to have a unified framework involving information theory and ensemble data assimilation. Availability of the eigenvalue spectrum of the information matrix is an additional benefit, since it can provide more detailed information content measures. Further evaluations of the proposed approach, employing complex

atmospheric models and various observations are still needed, and are planned for the near

future.

The proposed framework is applicable to different data assimilation approaches,

including the classical KF and 3D-Var approaches. The impact of different data assimilation approaches on the information measures is examined in Part II of this study.


Acknowledgements

We thank Chris Snyder and two anonymous reviewers for their comments that helped to

significantly clarify and improve this paper. The first author would also like to thank Graeme

Stephens, Christine Johnson, and Stephane Vannitsem for constructive discussions regarding

information content measures. This research was supported by NASA grants: 621-15-45-78,

NAG5-12105, and NNG04GI25G.


References:

Abramov, R., and A. Majda, 2004: Quantifying uncertainty for non-Gaussian ensembles in

complex systems. SIAM J. Sci. Stat. Comp., 26, 411-447.

Abramov, R., A. Majda and R. Kleeman, 2005: Information theory and predictability for low-

frequency variability. J. Atmos. Sci., 62, 65–87.

Anderson, J. L., 2001: An ensemble adjustment filter for data assimilation. Mon. Wea. Rev., 129,

2884–2903.

Bishop, C. H., B. J. Etherton, and S. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.

Bishop, C. H., and Z. Toth, 1999: Ensemble transformation and adaptive observations. J. Atmos.

Sci., 56, 1748–1765.

Cohn, S. E., 1997: An introduction to estimation theory. J. Meteor. Soc. Japan, 75, 257–288.

Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.

Dee, D., 1995: On-line estimation of error covariance parameters for atmospheric data

assimilation. Mon. Wea. Rev., 123, 1128–1145.

DelSole, T., 2004: Predictability and information theory. Part I: Measures of predictability. J.

Atmos. Sci., 61, 2425–2440.

Engelen, R. J., and G. L. Stephens, 2004: Information Content of Infrared Satellite Sounding

Measurements with Respect to CO2. J. Appl. Meteor. 43, 373–378.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using

Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, (C5), 10143-

10162.

Fisher, M., 2003: Estimation of entropy reduction and degrees of freedom for signal for large

variational analysis systems. ECMWF Tech. Memo. No. 397. 18 pp.


Fletcher, S.J., and M. Zupanski, 2006: A data assimilation method for lognormally distributed

observational errors. Q. J. Roy. Meteor. Soc. (in press).

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three

dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.

Gill, P. E., W. Murray, and M. H. Wright, 1981: Practical Optimization. Academic Press, 401

pp.

Golub, G. H., and C. F. van Loan, 1989: Matrix Computations. 2d ed. The Johns Hopkins

University Press, 642 pp.

Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter/3D-variational analysis

scheme. Mon. Wea. Rev., 128, 2905–2919.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background

error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–

2790.

Heemink, A. W., M. Verlaan, and A. J. Segers, 2001: Variance reduced ensemble Kalman

filtering. Mon. Wea. Rev., 129, 1718–1728.

Horn, R. A., and C. R. Johnson, 1985: Matrix Analysis. Cambridge University Press, 561 pp.

Hou, A. Y., S. Q. Zhang, A. da Silva and W. Olson, 2000: Improving assimilated global datasets

using TMI rainfall and columnar moisture observations. J. Climate., 13, 4180–4195.

Hou, A. Y., S. Q, Zhang, A. da Silva, W. Olson, C. Kummerow, and J. Simpson, 2001:

Improving global analysis and short-range forecast using rainfall and moisture

observations derived from TRMM and SSM/I passive microwave sensors. Bull. Amer.

Meteor. Soc., 81, 659–679.

Hou, A. Y., S. Q. Zhang, and O. Reale, 2004: Variational continuous assimilation of TMI and


SSM/I rain rates: Impact on GEOS-3 hurricane analyses and forecasts. Mon. Wea. Rev.,

132, 2094–2109.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter

technique. Mon. Wea. Rev., 126, 796–811.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for

atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.

Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.

Johnson, C., 2003: Information content of observations in variational data assimilation. Ph.D.

thesis, Department of Meteorology, University of Reading, 218 pp. [Available from

University of Reading, Whiteknights, P.O. Box 220, Reading, RG6 2AX, United

Kingdom.]

Keppenne, C., 2000: Data assimilation into a primitive-equation model with a parallel ensemble

Kalman filter. Mon. Wea. Rev., 128, 1971–1981.

Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. J. Atmos. Sci.,

59, 2057–2072.

Lermusiaux, P. F. J., and A. R. Robinson, 1999: Data assimilation via error subspace statistical

estimation. Part I: Theory and schemes. Mon. Wea. Rev., 127, 1385–1407.

L’Ecuyer, T. S., P. Gabriel, K. Leesman, S. J. Cooper, and G. L. Stephens, 2006: Objective assessment of the information content of visible and infrared radiance measurements for cloud microphysical property retrievals over the global oceans. Part I: Liquid clouds. J. Appl. Meteor. Climat., 45, 20–41.

Lorenc, A. C., 1986: Analysis methods for numerical weather prediction. Quart. J. Roy. Meteor.

Soc., 112, 1177–1194.

Luenberger, D. L., 1984: Linear and Non-linear Programming. 2d ed. Addison-Wesley, 491 pp.


Menard, R., S. E. Cohn, L.-P. Chang, and P. M. Lyster, 2000: Assimilation of stratospheric

chemical tracer observations using a Kalman filter. Part I: Formulation. Mon. Wea. Rev.,

128, 2654–2671.

Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea.

Rev., 128, 416–433.

Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-

error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.

Navon, I. M., X. Zou, J. Derber, and J. Sela, 1992: Variational data assimilation with an

adiabatic version of the NMC spectral model. Mon. Wea. Rev., 120, 1433–1446.

Oczkowski, M., I. Szunyogh, and D. J. Patil, 2005: Mechanism for the development of locally

low-dimensional atmospheric dynamics. J. Atmos. Sci., 62, 1135-1156.

Ott, E., B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. J. Patil, and J. A. Yorke, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 273–277.

Patil, D. J., B. R. Hunt, E. Kalnay, J. A. Yorke, and E. Ott, 2001: Local low dimensionality of atmospheric dynamics. Phys. Rev. Lett., 86, 5878–5881.

Peters, W., J.B. Miller, J. Whitaker, A.S. Denning, A. Hirsch, M.C. Krol, D. Zupanski, L.

Bruhwiler, and P.P. Tans, 2005: An ensemble data assimilation system to estimate

CO2 surface fluxes from atmospheric trace gas observations. J. Geophys. Res. 110,

D24304, doi:10.1029/2005JD006157.

Purser, R.J., and H.-L. Huang, 1993: Estimating effective data density in a satellite retrieval or an

objective analysis. J. Appl. Meteorol., 32, 1092–1107.

Rabier F., N. Fourrie, C. Djalil, and P. Prunet, 2002: Channel selection methods for Infrared


Atmospheric Sounding Interferometer radiances. Quart. J. Roy. Meteor. Soc., 128, 1011–

1027.

Reichle, R. H., D. B. McLaughlin, D. Entekhabi, 2002a: Hydrologic data assimilation with the

ensemble Kalman filter. Mon. Wea. Rev., 130, 103–114.

Reichle, R.H., J.P. Walker, R.D. Koster, and P.R. Houser, 2002b: Extended versus ensemble

Kalman filtering for land data assimilation. J. Hydrometorology, 3, 728-740.

Rodgers, C. D., 2000: Inverse Methods for Atmospheric Sounding: Theory and Practice. World

Scientific, 238 pp.

Roulston, M., and L. Smith, 2002: Evaluating probabilistic forecasts using information theory.

Mon. Wea. Rev., 130, 1653–1660.

Schneider, T., and S. Griffies, 1999: A conceptual framework for predictability studies. J.

Climate., 12, 3133–3155.

Shannon, C. E., and W. Weaver, 1949: The Mathematical Theory of Communication. University

of Illinois Press, 144 pp.

Szunyogh, I., E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, and J. A.

Yorke, 2005: Assessing a local ensemble Kalman filter: Perfect model experiments with the NCEP global model. Tellus, 57A, 528–545.

Tippett, M., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble

square-root filters. Mon. Wea. Rev., 131, 1485–1490.

van Leeuwen, P. J., 2001: An ensemble smoother with error estimates. Mon. Wea. Rev., 129,

709–728.

Wahba, G., 1985: Design criteria and eigensequence plots for satellite-computed tomography. J.

Atmos. Oceanic Technol., 2, 125–132.


Wahba, G., D. R. Johnson, F. Gao, and J. Gong, 1995: Adaptive tuning of numerical weather

prediction models: Randomized GCV in three- and four-dimensional data assimilation.

Mon. Wea. Rev., 123, 3358–3370.

Wang, X., and C. H. Bishop, 2003: A comparison of breeding and ensemble transform Kalman

filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158.

Wei, M., Z. Toth, R.Wobus, Y. Zhu, C.H. Bishop, and X. Wang, 2006: Ensemble Transform

Kalman Filter-based ensemble perturbations in an operational global prediction system at

NCEP, Tellus, 58A, 28-44.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed

observations. Mon. Wea. Rev., 130, 1913–1924.

Zhang, F., Z. Meng, and A. Aksoy, 2006: Tests of an ensemble Kalman filter for mesoscale and

regional-scale data assimilation. Part I: perfect model experiments. Mon. Wea. Rev., 134,

722–736.

Zhang, F., C. Snyder, and J. Sun, 2004: Impacts of initial estimate and observation availability on convective-scale data assimilation with an ensemble Kalman filter. Mon. Wea. Rev., 132, 1238–1253.

Zupanski, D., and M. Zupanski, 2006: Model error estimation employing an ensemble data

assimilation approach. Mon. Wea. Rev., 134, 1337-1354.

Zupanski, D., M. Zupanski, M. DeMaria, L. Grasso, A. Y. Hou, S. Zhang, and D. Lindsey, 2005: Ensemble data assimilation and information theory. Extended abstracts of the AMS 21st Conference on Weather Analysis and Forecasting and AMS 17th Conference on Numerical Weather Prediction, 1–5 August 2005, Washington, D.C., 4 pp.

Zupanski, M., 2005: Maximum Likelihood Ensemble Filter: Theoretical Aspects. Mon. Wea.

Rev., 133, 1710–1726.


Zupanski, M., S. J. Fletcher, I. M. Navon, B. Uzunoglu, R. P. Heikes, D. A. Randall, T. D. Ringler, and D. Daescu, 2006: Initiation of ensemble data assimilation. Tellus, 58A, 159–170.


Table Captions List

Table 1. List of data assimilation experiments discussed in this paper. Nobs indicates the number of observations per data assimilation cycle. The empirical parameter α, varying with ensemble size, is employed to approximately account for an unknown representativeness error. In the experiments with suffix “_loc”, localization is applied to the forecast error covariance using the compactly supported second-order correlation function of Gaspari and Cohn (1999) with a decorrelation length of 3 vertical layers. Note that all experiments with localization employ 40 observations, while the experiments without localization employ either 40 or 80 observations. The experiment denoted no_obs is an experiment without data assimilation.

Table 2. Total RMS errors of the analysis and the background solution, calculated with respect to

the truth over 70 data assimilation cycles, for the experiments listed in Table 1. The RMS

analysis and background errors are shown for temperature (denoted RMS Ta and RMS Tb) and

for specific humidity (denoted RMS qa and RMS qb). The RMS errors are smallest for the

experiment with Nens=80 and Nobs=80, and are largest for the experiment without data

assimilation (no_obs). The smallest RMS errors are highlighted in bold, and the largest RMS

errors are highlighted in bold italic. Also shown are the mean values and standard deviations of

the chi-square statistic, calculated over 70 data assimilation cycles.


Table 1. List of data assimilation experiments discussed in this paper. Nobs indicates the number of observations per data assimilation cycle. The empirical parameter α, varying with ensemble size, is employed to approximately account for an unknown representativeness error. In the experiments with suffix “_loc”, localization is applied to the forecast error covariance using the compactly supported second-order correlation function of Gaspari and Cohn (1999) with a decorrelation length of 3 vertical layers. Note that all experiments with localization employ 40 observations, while the experiments without localization employ either 40 or 80 observations. The experiment denoted no_obs is an experiment without data assimilation.

Experiment     Nens (T and q   Nobs (T and q   Rinst^1/2 for T   Rinst^1/2 for q in kg/kg   Parameter α   Localization
               estimated)      observed)       in degrees K      (Min; Max errors)

10ens_80obs    10              80              0.2               6.1*10^-8; 7.9*10^-4       2.1           NO
20ens_80obs    20              80              0.2               6.1*10^-8; 7.9*10^-4       1.7           NO
40ens_80obs    40              80              0.2               6.1*10^-8; 7.9*10^-4       1.4           NO
80ens_80obs    80              80              0.2               6.1*10^-8; 7.9*10^-4       1.15          NO
10ens_40obs    10              40              0.2               6.1*10^-8; 7.9*10^-4       2.1           NO
20ens_40obs    20              40              0.2               6.1*10^-8; 7.9*10^-4       1.7           NO
40ens_40obs    40              40              0.2               6.1*10^-8; 7.9*10^-4       1.4           NO
80ens_40obs    80              40              0.2               6.1*10^-8; 7.9*10^-4       1.15          NO
10ens_loc      10              40              0.2               6.1*10^-8; 7.9*10^-4       2.1           YES
20ens_loc      20              40              0.2               6.1*10^-8; 7.9*10^-4       1.7           YES
40ens_loc      40              40              0.2               6.1*10^-8; 7.9*10^-4       1.4           YES
80ens_loc      80              40              0.2               6.1*10^-8; 7.9*10^-4       1.15          YES
no_obs         -               0               -                 -                          -             -
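For reference, the compactly supported correlation function of Gaspari and Cohn (1999) used for localization in the “_loc” experiments can be sketched as follows. This is the standard fifth-order piecewise rational form; the exact parameterization of the decorrelation length (here measured in vertical layers) is an assumption of this sketch, not taken from the paper.

```python
def gaspari_cohn(dist, c):
    """Compactly supported Gaspari-Cohn (1999) correlation function.

    dist : separation distance (e.g., in vertical layers)
    c    : length scale; the correlation vanishes for dist >= 2*c
    Returns a weight in [0, 1] used to taper forecast error covariances.
    """
    r = abs(dist) / float(c)
    if r <= 1.0:
        # inner fifth-order polynomial branch; equals 1 at r = 0
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r <= 2.0:
        # outer branch; decays smoothly to 0 at r = 2
        return (r**5) / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2 \
               - 5.0 * r + 4.0 - (2.0 / 3.0) / r
    return 0.0  # compact support: exactly zero beyond 2*c
```

Multiplying each forecast error covariance element by gaspari_cohn(d, c), with d the vertical separation between the two grid points, suppresses the spurious long-range correlations produced by small ensembles.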


Experiment     RMS Ta (K)   RMS Tb (K)   RMS qa (kg/kg)   RMS qb (kg/kg)   Chi-square (mean)   Chi-square (stddev)

10ens_80obs    0.45         0.49         3.77*10^-4       3.97*10^-4       1.11                0.27
20ens_80obs    0.28         0.35         2.65*10^-4       3.08*10^-4       0.95                0.20
40ens_80obs    0.23         0.32         2.26*10^-4       2.91*10^-4       0.92                0.15
80ens_80obs    0.21         0.31         2.04*10^-4       2.57*10^-4       1.06                0.20
10ens_40obs    0.64         0.68         4.93*10^-4       5.08*10^-4       1.16                0.31
20ens_40obs    0.54         0.57         4.07*10^-4       4.27*10^-4       1.03                0.31
40ens_40obs    0.51         0.55         3.74*10^-4       4.14*10^-4       0.84                0.22
80ens_40obs    0.38         0.40         3.38*10^-4       3.42*10^-4       0.81                0.20
10ens_loc      0.57         0.58         4.35*10^-4       4.51*10^-4       1.21                0.34
20ens_loc      0.63         0.58         3.85*10^-4       3.87*10^-4       1.26                0.38
40ens_loc      0.50         0.49         3.98*10^-4       3.97*10^-4       0.78                0.17
80ens_loc      0.29         0.38         2.52*10^-4       3.23*10^-4       0.59                0.13
no_obs         0.82         0.82         6.56*10^-4       6.56*10^-4       -                   -

Table 2. Total RMS errors of the analysis and the background solution, calculated with respect to

the truth over 70 data assimilation cycles, for the experiments listed in Table 1. The RMS

analysis and background errors are shown for temperature (denoted RMS Ta and RMS Tb) and

for specific humidity (denoted RMS qa and RMS qb). The RMS errors are smallest for the

experiment with Nens=80 and Nobs=80, and are largest for the experiment without data

assimilation (no_obs). The smallest RMS errors are highlighted in bold, and the largest RMS

errors are highlighted in bold italic. Also shown are the mean values and standard deviations of

the chi-square statistic, calculated over 70 data assimilation cycles.
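The chi-square statistic in Table 2 tests whether the innovations are statistically consistent with the assumed error covariances. A minimal sketch, assuming the common innovation-based definition chi-square = d^T (H Pf H^T + R)^{-1} d / Nobs; the exact normalization used in the paper may differ.

```python
import numpy as np

def chi_square(innov, S):
    """Innovation-based chi-square statistic.

    innov : innovation vector d = y_obs - H(x_b)
    S     : predicted innovation covariance, H Pf H^T + R
    Returns chi-square normalized by the number of observations;
    a time mean near 1 suggests consistent error covariances.
    """
    return float(innov @ np.linalg.solve(S, innov)) / innov.size
```

In cycling experiments, the statistic is evaluated once per data assimilation cycle, and its mean and standard deviation over all cycles are reported, as in Table 2.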


Figure Captions List

Fig. 1. (a) Degrees of Freedom (DOF) for signal, denoted ds, and (b) entropy reduction, denoted h, obtained in the experiments with 80 observations (first four experiments in Table 2), shown as functions of data assimilation cycles. Note that both information measures have similar time variability; however, the amplitude of the variability of h is larger. By definition, ds cannot exceed the ensemble size; conversely, h can be larger than the ensemble size.

Fig. 2. DOF for signal, obtained in the experiments employing 40 observations, with and without

localization, are plotted as functions of data assimilation cycles. The results with 10 ensemble

members are given in (a), and with 80 ensemble members in (b).

Fig. 3. (a) True temperature, (b) true specific humidity, (c) observed temperature, and (d) observed specific humidity, shown as functions of data assimilation cycles and model vertical levels. Observations defined at each grid point (80 observations) are used in Fig. 3c,d. Units for temperature are degrees K, and for specific humidity g/kg. Note the rapid time-tilted changes in both temperature and humidity around cycles 40 and 50.

Fig. 4. Analysis errors, calculated with respect to the truth, are shown as functions of data assimilation cycles and model vertical levels. The results from the best data assimilation experiment in Table 2 (80ens_80obs) are shown in (a) for temperature (in degrees K) and in (b) for specific humidity (in g/kg). The corresponding errors of the experiment without data assimilation (no_obs) are given in (c) and (d). The numbers in the upper right corners are the total RMS errors from Table 2.

Fig. 5. Eigenvalue spectrum of the information matrix C_ens(I + C_ens)^-1, calculated in the experiments with 80 ensemble members, without localization, using in (a) 80 observations and in (b) 40 observations. The eigenvalues are shown as functions of the eigenvalue rank. The eigenvalues equal to 1 indicate maximum information, and the eigenvalues equal to 0 indicate no information.




Fig. 1. (a) Degrees of Freedom (DOF) for signal, denoted ds, and (b) entropy reduction, denoted h, obtained in the experiments with 80 observations (first four experiments in Table 2), shown as functions of data assimilation cycles. Note that both information measures have similar time variability; however, the amplitude of the variability of h is larger. By definition, ds cannot exceed the ensemble size; conversely, h can be larger than the ensemble size.
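Both measures can be computed from the eigenvalues λ_i of the ensemble-subspace information matrix. A minimal sketch, assuming the standard definitions ds = Σ λ_i²/(1 + λ_i²) and h = (1/2) Σ ln(1 + λ_i²), which also shows why ds is bounded by the ensemble size while h is not:

```python
import numpy as np

def dof_signal(lam):
    """Degrees of freedom for signal. Each term lies in [0, 1),
    so ds is strictly less than the number of eigenvalues
    (i.e., the ensemble size)."""
    lam = np.asarray(lam, dtype=float)
    return float(np.sum(lam**2 / (1.0 + lam**2)))

def entropy_reduction(lam):
    """Shannon entropy reduction. Individual terms are unbounded,
    so h may exceed the ensemble size for large eigenvalues."""
    lam = np.asarray(lam, dtype=float)
    return float(0.5 * np.sum(np.log(1.0 + lam**2)))
```

For example, with ten eigenvalues all equal to 100, dof_signal returns about 9.999 (below the ensemble size of 10), while entropy_reduction returns about 46, illustrating the contrast noted in the caption.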


Fig. 2. DOF for signal, obtained in the experiments employing 40 observations, with and without

localization, are plotted as functions of data assimilation cycles. The results with 10 ensemble

members are given in (a), and with 80 ensemble members in (b).



Fig. 3. (a) True temperature, (b) true specific humidity, (c) observed temperature, and (d) observed specific humidity, shown as functions of data assimilation cycles and model vertical levels. Observations defined at each grid point (80 observations) are used in Fig. 3c,d. Units for temperature are degrees K, and for specific humidity g/kg. Note the rapid time-tilted changes in both temperature and humidity around cycles 40 and 50.

[Figure 3 panels: (a) T true, (b) q true, (c) T obs, (d) q obs; axes: data assimilation cycles (horizontal) vs. vertical levels (vertical).]


Fig. 4. Analysis errors, calculated with respect to the truth, are shown as functions of data assimilation cycles and model vertical levels. The results from the best data assimilation experiment in Table 2 (80ens_80obs) are shown in (a) for temperature (in degrees K) and in (b) for specific humidity (in g/kg). The corresponding errors of the experiment without data assimilation (no_obs) are given in (c) and (d). The numbers in the upper right corners are the total RMS errors from Table 2.

[Figure 4 panels: (a) T, 80 ens, 80 obs (RMS=0.21 K); (b) q, 80 ens, 80 obs (RMS=2.04*10^-4); (c) T no_obs (RMS=0.82 K); (d) q no_obs (RMS=6.56*10^-4); axes: data assimilation cycles vs. vertical levels.]


Fig. 5. Eigenvalue spectrum of the information matrix C_ens(I + C_ens)^-1, calculated in the experiments with 80 ensemble members, without localization, using in (a) 80 observations and in (b) 40 observations. The



eigenvalues are shown as functions of the eigenvalue rank. The eigenvalues equal to 1 indicate

maximum information, and the eigenvalues equal to 0 indicate no information.