


Page 1

Introduction to Data Assimilation and Applications within NOAA – Part II

NOAA Educational Partnership Program with Minority Serving Institutions
9th Biennial Science and Education Forum
Howard University, Washington, DC
March 20, 2018

Karina Apodaca
Colorado State University / Cooperative Institute for Research in the Atmosphere, Ft. Collins, Colorado
NOAA/OAR/AOML/HRD and Global Observing Systems Analysis Group

Page 2

Given a background field xb before performing the analysis, there is only one vector of errors that separates it from the true state: εb = xb - xt

If we repeat each analysis a large number of times, keeping conditions constant, but with different realizations of errors generated by unknown circumstances, εb will be different each time

εb will depend solely on physical processes responsible for the errors

We can find error statistics: the best information about εb is given by its histogram, normalized so that its integral equals 1. This normalized histogram is the probability density function (PDF) of εb.

From this PDF we can derive statistics, such as the average (expectation) of εb and the variances.

A popular PDF is the Gaussian function

Uncertainty and Probability Density Functions
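The idea above can be sketched numerically: repeat the "analysis" many times with different error realizations, build the normalized histogram (the empirical PDF) of εb, and read off its mean and variance. The Gaussian parameters below are purely illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Many hypothetical realizations of the background error eps_b = xb - xt
# (scalar here; mean 0.5 and std dev 2.0 are purely illustrative).
eps_b = rng.normal(loc=0.5, scale=2.0, size=100_000)

# Statistics derived from the samples: the average (expectation) and variance.
mean_b = eps_b.mean()
var_b = eps_b.var()

# Histogram normalized so that it integrates to 1: an empirical PDF of eps_b.
counts, edges = np.histogram(eps_b, bins=80, density=True)
integral = np.sum(counts * np.diff(edges))  # ~1 by construction
```

With enough realizations, the sample mean and variance converge to the parameters of the underlying Gaussian, which is the statistical picture the slide describes.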

Page 3

• Necessary to account for the uncertainty in the observations, background, and analysis in DA

• Errors between the true state vector and the other vectors need to be modeled

• To do this, we need to assume some probability density function (PDF) for each type of error

• For a more rigorous mathematical description of PDFs, please refer to: http://mathworld.wolfram.com/ProbabilityDensityFunction.html

A PDF gives the relative likelihood that a random variable (here, an error) will take a certain value, or fall within a range between a and b.

Note: in mathematics, a PDF need not have an average or a variance, but in geophysical problems all the PDFs we use do. We will assume this throughout the course.

Uncertainty and Probability Density Functions
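The "range between a and b" interpretation can be made concrete: the probability that the error lies in [a, b] is the integral of its PDF over that interval. A minimal sketch for a standard Gaussian, using simple trapezoidal integration (the function names are illustrative):

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # Gaussian PDF: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def prob_between(a, b, mu=0.0, sigma=1.0, n=10_000):
    # Trapezoidal integration of the PDF over [a, b].
    h = (b - a) / n
    s = 0.5 * (gaussian_pdf(a, mu, sigma) + gaussian_pdf(b, mu, sigma))
    for i in range(1, n):
        s += gaussian_pdf(a + i * h, mu, sigma)
    return s * h

# P(-1 <= error <= 1) for a standard Gaussian is about 0.683.
p_one_sigma = prob_between(-1.0, 1.0)
```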

Page 4

We can find the error statistics: the best information about εb is given by its histogram, normalized so that its integral equals 1, i.e., the PDF of εb.

From this PDF we can derive statistics, such as, the average (expectation) and the variances.

A popular PDF is the Gaussian function

Uncertainty and Probability Density Functions


Page 5

Other, non-Gaussian PDFs:

- Laplacian (wind-speed errors, sharp-gradient fields such as fronts)

- Skewed: Gamma (precipitation), Lognormal (humidity, cloud variables)

Uncertainty and Probability Density Functions
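To see why a skewed PDF matters, compare the sample skewness of Gaussian draws with lognormal draws: lognormal errors pile up near zero with a long right tail, as for humidity-like variables. The distribution parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
gauss = rng.normal(0.0, 1.0, size=200_000)
lognorm = rng.lognormal(mean=0.0, sigma=0.5, size=200_000)

def skewness(x):
    # Third standardized moment: ~0 for symmetric PDFs, > 0 for a right tail.
    m, s = x.mean(), x.std()
    return float(np.mean(((x - m) / s) ** 3))

skew_gauss = skewness(gauss)      # close to 0: symmetric
skew_lognorm = skewness(lognorm)  # clearly positive: long right tail
```

A Gaussian assumption applied to such a variable would misplace both the most likely value and the tails, which is why DA systems often transform these variables first.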

Page 6

Background errors

These are the estimation errors of the background state, defined as the difference between the background state vector and the true state:

εb = xb – xt

The average is:

ε̄b = E[εb]

The covariance is:

B = E[(εb − ε̄b)(εb − ε̄b)ᵀ]

Modeling of Error Variables
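A minimal sketch of these two formulas, estimating the background-error average and the covariance matrix B from a sample of error realizations (the 3-variable "true" statistics below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative "true" statistics for a 3-variable background error.
true_mean = np.array([0.1, -0.2, 0.0])
true_B = np.array([[1.0, 0.6, 0.2],
                   [0.6, 2.0, 0.3],
                   [0.2, 0.3, 0.5]])

# Many realizations of eps_b = xb - xt.
eps_b = rng.multivariate_normal(true_mean, true_B, size=50_000)

# The average: mean over realizations.
mean_b = eps_b.mean(axis=0)

# The covariance: B = E[(eps_b - mean)(eps_b - mean)^T].
dev = eps_b - mean_b
B = dev.T @ dev / (eps_b.shape[0] - 1)
```

The sample estimates recover the prescribed mean and covariance, and B comes out symmetric, as the definition requires.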

Page 7

Observation errors εo = y – H(xt)

These errors arise from the observation process (instrument error: reported values differ from reality), from an observation operator that may not be well designed, and from representativeness errors that prevent H(xt) from matching the observation exactly.

The average is:

ε̄o = E[εo]

The covariance is:

R = E[(εo − ε̄o)(εo − ε̄o)ᵀ]

Modeling of Error Variables

Page 8

Analysis errors

Errors related to the estimation of the analysis

The goal of DA is to minimize those errors

A nonzero average is called a bias, and it is a bad sign: it indicates systematic errors in the DA system, observation biases, or errors in the way observations are used.

The analysis error is defined as:

εa = xa − xt

A measure of the overall analysis error is the trace of the analysis error covariance matrix A:

Tr(A) = E[‖εa − ε̄a‖²]

Modeling of Error Variables
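The trace formula can be checked directly: Tr(A) is the sum of the per-variable error variances, i.e., the mean squared departure of εa from its average. A small sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative diagonal analysis-error covariance for a 3-variable state.
A_true = np.diag([0.5, 0.2, 0.1])
eps_a = rng.multivariate_normal(np.zeros(3), A_true, size=100_000)

dev = eps_a - eps_a.mean(axis=0)
A = dev.T @ dev / (eps_a.shape[0] - 1)

# Tr(A) = sum of variances = mean of |eps_a - mean|^2 over realizations.
trace_A = float(np.trace(A))
mean_sq_norm = float(np.mean(np.sum(dev ** 2, axis=1)))
```

Minimizing Tr(A), as the slide says DA aims to do, is therefore the same as minimizing the total analysis error variance.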

Page 9

Source: Data assimilation concepts and methods, Meteorological Training Course Lecture Series, ECMWF, 2002.

It is important to understand the algebraic nature of the statistics. Biases are vectors of the same kind as the model state or observation vectors, so their interpretation is straightforward. Linear transforms that are applied to model state or observation vectors (such as spectral transforms) can be applied to bias vectors.

3.3 Using error covariances

Error covariances are more subtle; we will illustrate this with the background errors (all remarks apply to observation errors too). In a scalar system, the background error covariance is simply the variance, i.e. the root-mean-square (r.m.s., or quadratic) average of departures from the mean:

B = var(εb) = E[(εb − ε̄b)²]

In a multidimensional system, the covariances form a square symmetric matrix. If the model state vector has dimension n, then the covariances form an n × n matrix. The diagonal of the matrix contains the variances of each variable of the model (the square roots of the variances are called standard deviations, or standard errors); the off-diagonal terms are cross-covariances between each pair of variables of the model. The matrix is positive: this does not mean that all its elements are positive, but that its eigenvalues, which are the variances in the directions of the eigenvectors, are positive. Unless some variances are zero, which happens only in the rather special case where one believes some features of the background are perfect, the error covariance matrix is positive definite. For instance, if the model state is three-dimensional and the background errors (minus their average) are denoted (ε1, ε2, ε3), then

B = | var(ε1)      cov(ε1, ε2)  cov(ε1, ε3) |
    | cov(ε1, ε2)  var(ε2)      cov(ε2, ε3) |
    | cov(ε1, ε3)  cov(ε2, ε3)  var(ε3)     |

The off-diagonal terms can be transformed into error correlations (if the corresponding variances are nonzero):

ρ(i, j) = cov(εi, εj) / √(var(εi) var(εj))

Finally, linear transformations of the model state vector can only be applied to covariances as full matrix transforms. In particular, it is not possible to directly transform the fields of variances or standard deviations. If one defines a linear transformation by a matrix P (a matrix whose rows are the coordinates of the new basis vectors in terms of the old ones, so that the new coordinates of x are Px), then the covariance matrix in terms of the new variables is PBPᵀ.

3.4 Estimating statistics in practice

The error statistics (biases and covariances) are functions of the physical processes governing the meteorological situation and the observing network. They also depend on our a priori knowledge of the errors. Error variances in particular reflect our uncertainty in features of the background or the observations. In general, the only way to estimate statistics is to assume that they are stationary over a period of time and uniform over a domain.

Background (forecast) error covariance: In a scalar system it is the variance or root-mean-square (RMS) average of the departures from the mean:

In a model with multiple dimensions (n), the covariance(s) is a n x n square symmetric matrix.

The diagonal contains variances for each model variable

The other terms in the matrix are cross-covariances between each pair of variables in the model

For a 3D model state with background errors (minus their average) ε1, ε2, and ε3, B will be equal to:

B = | var(ε1)      cov(ε1, ε2)  cov(ε1, ε3) |
    | cov(ε1, ε2)  var(ε2)      cov(ε2, ε3) |
    | cov(ε1, ε3)  cov(ε2, ε3)  var(ε3)     |

Importance of error covariances

B = var(εb) = E[(εb − ε̄b)²]
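The claim that B is positive (definite) can be checked numerically via its eigenvalues, which are the variances in the directions of the eigenvectors. The matrix entries below are illustrative:

```python
import numpy as np

# Illustrative 3 x 3 background-error covariance matrix.
B = np.array([[1.0, 0.6, 0.2],
              [0.6, 2.0, 0.3],
              [0.2, 0.3, 0.5]])

# Square and symmetric by construction.
is_symmetric = bool(np.allclose(B, B.T))

# Positive definite: every eigenvalue is strictly positive.
eigvals = np.linalg.eigvalsh(B)
is_positive_definite = bool(np.all(eigvals > 0))
```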

Page 10

The off-diagonal terms can be transformed into error correlations (if the corresponding variances are nonzero):

ρ(i, j) = cov(εi, εj) / √(var(εi) var(εj))

Linear transformations of the model state vector can only be applied as full matrix transforms

Not possible to directly transform the fields of variances or standard deviations

Analysis update is a linear combination of forecast error covariance singular vectors

Analysis increments are defined in the subspace spanned by forecast error covariance singular vectors

Importance of error covariances
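Both points on this slide can be sketched in a few lines: converting covariances to correlations, and transforming B under a linear change of variables as a full matrix product P B Pᵀ (the particular transform matrix P is an illustrative assumption):

```python
import numpy as np

# Illustrative background-error covariance matrix.
B = np.array([[1.0, 0.6, 0.2],
              [0.6, 2.0, 0.3],
              [0.2, 0.3, 0.5]])

# Off-diagonal covariances -> correlations: rho_ij = cov_ij / sqrt(var_i var_j).
std = np.sqrt(np.diag(B))
corr = B / np.outer(std, std)

# Linear change of variables x' = P x: the covariance transforms as P B P^T.
# Transforming the variances (the diagonal) alone would be wrong.
P = np.array([[1.0, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0,  1.0]])
B_new = P @ B @ P.T
```

Note how the new variance B_new[0, 0] = var(ε1 + ε2) picks up the cross-covariance term (1 + 2 + 2·0.6 = 4.2), which no diagonal-only transform could produce.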



Page 11

• A way to approach the analysis problem is by using conditional probabilities

• From Bayes' theorem we can compute updated (posterior) probabilities from prior probabilities

• P(x) is the a priori PDF of the model state before the observations are considered (background PDF)

• P(y) is the PDF of the observations

• The goal of the analysis is to find the maximum of P(x|y): the most probable model state given the observations

Prior and conditional probabilities

Page 12

From Bayes' theorem, the conditional (posterior) PDF of x given y is:

P(x|y) = P(y|x) P(x) / P(y)

It is the probability of the state x given that the observation y was made.

In the analysis procedure we know that the observation was made and that its value is y, so P(y) acts only as a normalizing constant and can be set to P(y) = 1; therefore:

P(x|y) = P(y|x) P(x)

The analysis PDF equals the background PDF times the observation PDF P(yn|x), where yn denotes the new observations.

The observation PDF P(y|x) reaches its maximum where y = H(x)

Prior and conditional probabilities
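A minimal sketch of this product of PDFs on a grid of candidate states (scalar x, identity observation operator H(x) = x; all numbers are illustrative). The posterior peaks between the background and the observation, weighted by their precisions:

```python
import numpy as np

x = np.linspace(10.0, 30.0, 2001)  # grid of candidate (scalar) states

# Background (prior) PDF P(x): Gaussian around xb.
xb, sb = 22.0, 2.0
prior = np.exp(-0.5 * ((x - xb) / sb) ** 2)

# Observation PDF P(y|x): Gaussian in y - H(x), with H(x) = x here.
y, so = 19.0, 1.0
likelihood = np.exp(-0.5 * ((y - x) / so) ** 2)

# Analysis (posterior) PDF: P(x|y) proportional to P(y|x) P(x); normalize it.
post = prior * likelihood
post /= post.sum() * (x[1] - x[0])

x_map = float(x[np.argmax(post)])  # maximum of P(x|y)

# In the Gaussian case the maximum is the precision-weighted mean of xb and y.
x_expected = (xb / sb**2 + y / so**2) / (1 / sb**2 + 1 / so**2)
```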

Page 13

Least-squares analysis: statistical approach in a probabilistic framework

• The goal is to estimate a scalar value x

• yi is a realization of a random variable Yi

• We look for an estimator X̂ satisfying the following:
 - linear: X̂ = α1 Y1 + α2 Y2 (to be simple)
 - unbiased: E(X̂) = x (to be reasonable)
 - of minimum variance: Var(X̂) minimum (for optimal accuracy)

This estimator is the Best Linear Unbiased Estimator (BLUE)

The BLUE can be simplified/modified to obtain the most common algorithms used in DA and in numerical weather prediction
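For the two-observation case, the BLUE weights follow from the three requirements: unbiasedness forces α1 + α2 = 1, and minimizing the variance under that constraint gives weights proportional to 1/σᵢ². A sketch with illustrative error standard deviations, checked against simulated observations:

```python
import numpy as np

rng = np.random.default_rng(4)

x_true = 20.0      # illustrative true scalar value
s1, s2 = 1.0, 2.0  # illustrative error std devs of the two observations

# Unbiasedness: a1 + a2 = 1. Minimizing a1^2 s1^2 + a2^2 s2^2 under that
# constraint gives weights proportional to the precisions 1/sigma^2.
precision = 1 / s1**2 + 1 / s2**2
a1 = (1 / s1**2) / precision
a2 = (1 / s2**2) / precision

# Empirical check: the BLUE is unbiased and beats the better single obs.
n = 200_000
y1 = x_true + rng.normal(0.0, s1, n)
y2 = x_true + rng.normal(0.0, s2, n)
blue = a1 * y1 + a2 * y2

bias = float(blue.mean() - x_true)
var_blue = float(blue.var())
var_theory = 1 / precision  # = 0.8 here, < min(s1^2, s2^2) = 1.0
```

The combined estimate has a smaller error variance than either observation alone, which is the whole point of the BLUE.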

Model problem: statistical approach

Reformulation in a probabilistic framework:
• the goal is to estimate a scalar value x
• yi is a realization of a random variable Yi
• one looks for an estimator (i.e., a random variable) X̂ that is
 - linear: X̂ = α1 Y1 + α2 Y2 (in order to be simple)
 - unbiased: E(X̂) = x (it seems reasonable)
 - of minimal variance: Var(X̂) minimum (optimal accuracy)

→ BLUE (Best Linear Unbiased Estimator)

E. Blayo, An introduction to data assimilation, Ecole GDR Egrin, 2014


Page 14

Least-squares analysis

We want to find the best estimate Ta of the analysis temperature.

For any T, the distance between T and the observation To, and between T and the background Tb, can be measured through this quadratic relationship:

J(T) = ½ [ (T − To)² / σo² + (T − Tb)² / σb² ]

Page 15

Least-squares analysis

This equation represents the squared misfit of the variable T to each source of information, weighted by the precision of each estimate. The best estimate of T is Ta, the value that minimizes:

J(T) = ½ [ (T − To)² / σo² + (T − Tb)² / σb² ]

Cost Function or Penalty Function

Minimizing the analysis error variance by finding optimal weights
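Setting dJ/dT = 0 gives the optimal analysis in closed form: Ta = (To/σo² + Tb/σb²) / (1/σo² + 1/σb²), the precision-weighted mean. A sketch with illustrative numbers, cross-checked by brute-force minimization on a grid:

```python
import numpy as np

To, so = 19.0, 1.0  # illustrative observation and its error std dev
Tb, sb = 22.0, 2.0  # illustrative background and its error std dev

def J(T):
    # Squared misfit to each source, weighted by its precision.
    return 0.5 * ((T - To) ** 2 / so ** 2 + (T - Tb) ** 2 / sb ** 2)

# Analytic minimum from dJ/dT = 0: the precision-weighted mean.
Ta = (To / so**2 + Tb / sb**2) / (1 / so**2 + 1 / sb**2)

# Brute-force check on a fine grid.
T = np.linspace(10.0, 30.0, 200_001)
Ta_grid = float(T[np.argmin(J(T))])
```

Because σo < σb here, the analysis lands closer to the observation than to the background: the more precise source gets the larger weight.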

Page 16

Model problem: least squares approach

Two different measurements of a single quantity are available. Which estimate of its true value? → least squares approach

Example: two observations y1 = 19 °C and y2 = 21 °C of the (unknown) present temperature x.
• Let J(x) = ½ [ (x − y1)² + (x − y2)² ]
• Minimizing J(x) gives x̂ = (y1 + y2)/2 = 20 °C

E. Blayo, An introduction to data assimilation, Ecole GDR Egrin, 2014

Least-squares approach

Suppose there are two different measurements of a quantity: y1 = 19 °C and y2 = 21 °C. Which estimate is closest to the truth?

Both values are estimates of the unknown present temperature x.

Let J(x) = ½ [ (x − y1)² + (x − y2)² ]

Minimizing J(x) over x gives x̂ = (y1 + y2)/2 = 20 °C
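With equal (implicitly unit) observation errors the two measurements get equal weight, so minimizing J returns the plain average. A tiny check of the worked example:

```python
y1, y2 = 19.0, 21.0

def J(x):
    # Equal error variances -> equal weights on the two observations.
    return 0.5 * ((x - y1) ** 2 + (x - y2) ** 2)

# dJ/dx = (x - y1) + (x - y2) = 0  ->  x_hat = (y1 + y2) / 2
x_hat = (y1 + y2) / 2

# x_hat is indeed a minimum: J increases a little to either side.
assert J(x_hat) < J(x_hat - 0.1) and J(x_hat) < J(x_hat + 0.1)
```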
