Sundermeyer, MAR 550, Spring 2020
Laboratory in Oceanography:
Data and Methods
MAR550, Spring 2020
Miles A. Sundermeyer
Empirical Orthogonal Functions
Basic idea of Empirical Orthogonal Functions
EOFs are an orthogonal linear transformation into a new coordinate system such
that the greatest variance of any projection of the data lies along the
first EOF (also called the first principal component), the second greatest
variance along the second EOF, and so on.
• Distinguish patterns/noise
• Reduce dimensionality
• Prediction
• Smoothing
• Also known as:
• Principal Component Analysis (PCA)
• Discrete Karhunen-Loève Transform (KLT)
• The Hotelling Transform
• Proper Orthogonal Decomposition (POD)
• Singular Value Decomposition (SVD) …
Empirical Orthogonal Functions (EOFs)
Principal Component Analysis (PCA)
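As an illustration of the variance-maximizing property (a NumPy sketch, not part of the original slides; the spatial pattern and noise level here are made up for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic [time x space] data: one dominant spatial pattern plus weak noise.
t = np.linspace(0, 10, 200)
pattern = np.array([1.0, 0.8, 0.5, 0.2])              # hypothetical spatial pattern
X = np.outer(np.sin(t), pattern) + 0.1 * rng.standard_normal((200, 4))
X = X - X.mean(axis=0)                                # demean each column

U, s, Vt = np.linalg.svd(X, full_matrices=False)
expl_var = s**2 / np.sum(s**2)                        # variance fraction per EOF
# The first EOF captures the dominant pattern, and the variance
# fractions decrease monotonically from there.
```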
SVD: Singular Value Decomposition
• Decomposes any [n x p] matrix X into the form:
X = U S V’
where:
• U is an [n x n] orthonormal matrix
• V is a [p x p] orthonormal matrix
• S is a diagonal [n x p] matrix with elements si,i on the diagonal; the
elements, s, are called the singular values.
• The columns of U and V contain the (left and right) singular vectors of X.
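These properties are easy to verify numerically (a NumPy sketch; any random matrix will do):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 4
X = rng.standard_normal((n, p))

U, s, Vt = np.linalg.svd(X)               # full SVD: U is [n x n], Vt = V' is [p x p]
S = np.zeros((n, p))                      # diagonal [n x p] matrix of singular values
np.fill_diagonal(S, s)

assert np.allclose(X, U @ S @ Vt)         # X = U S V'
assert np.allclose(U.T @ U, np.eye(n))    # U is orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(p))  # V is orthonormal
```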
Empirical Orthogonal Functions: EOFs via SVD
• X is the demeaned data matrix as before.
1) Cx = X’ X = (U S V’)’ (U S V’) = V S’ U’ U S V’ = V S’ S V’
2) Cx = EOFs L EOFs’ (the eigenvalue problem rewritten in matrix form)
• Comparing 1) & 2):
– EOFs = V (up to sign and ordering conventions)
– L = S’ S: the squared singular values are the eigenvalues
– The columns of V contain the eigenvectors of Cx = X’ X; these are our EOFs.
– The columns of U contain the eigenvectors of X X’; scaled by the singular
values, they give the expansion coefficient (normalized) time series.
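This equivalence can be checked numerically (a NumPy sketch; Cx is formed here only to verify the algebra, never in practice):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 5))
X = X - X.mean(axis=0)                 # demeaned data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Only to verify the algebra -- in practice Cx is never formed:
Cx = X.T @ X
evals = np.linalg.eigvalsh(Cx)[::-1]   # eigenvalues of Cx, sorted descending

# L = S'S: the squared singular values are the eigenvalues of Cx
assert np.allclose(evals, s**2)
```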
1. Use SVD to find U, S, and V such that X = U S V’
2. Compute the eigenvalues of Cx as the squared singular values (the diagonal of S’ S).
3. The eigenvectors of Cx are the column vectors of V.
• Note: we never have to actually compute Cx!
How to in Matlab:
1. Shape your data, e.g., into [time x space]
2. Demean the data: >> X = detrend(X,0);
3. Perform SVD: >> [U,S,V] = svd(X);
4. Compute eigenvalues: >> EVal = diag(S.^2);
5. Compute explained variance: >> expl_var = EVal/sum(EVal);
6. EOFs are the column vectors of V: >> EOFs = V;
7. Compute expansion coefficients: >> EC = U*S;
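The same recipe in NumPy (a sketch mirroring the Matlab steps; `eof_svd` is a hypothetical helper name, and the economy-size SVD is used so the shapes follow the [time x space] convention):

```python
import numpy as np

def eof_svd(X):
    """EOF analysis of a [time x space] data matrix via SVD."""
    Xd = X - X.mean(axis=0)                            # step 2: demean each column
    U, s, Vt = np.linalg.svd(Xd, full_matrices=False)  # step 3: X = U S V'
    evals = s**2                                       # step 4: eigenvalues
    expl_var = evals / evals.sum()                     # step 5: explained variance
    eofs = Vt.T                                        # step 6: EOFs = columns of V
    ec = U * s                                         # step 7: EC = U S
    return eofs, ec, expl_var

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 6))
eofs, ec, expl_var = eof_svd(X)

# The expansion coefficients and EOFs reconstruct the demeaned data:
assert np.allclose(ec @ eofs.T, X - X.mean(axis=0))
```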
• There are basically two techniques:
• Computing eigenvectors and eigenvalues of the covariance matrix
• Singular Value Decomposition (SVD) of the data
• Both methods give similar results; however:
• There are some differences in dimensionality.
• SVD is much faster, especially for data larger than ~1000 x 1000 points,
since the covariance matrix never has to be formed explicitly.
Options for dealing with data gaps:
• Ignore them; leave them be.
• Introduce randomly generated data to fill the gaps, and test the sensitivity of the results over M realizations
• Fill gaps, e.g., using optimal interpolation
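One minimal option (a hedged sketch, not from the slides): fill each gap with its column mean before the SVD. This is far cruder than optimal interpolation, but it shows the mechanics; `fill_gaps_with_column_means` is a hypothetical helper:

```python
import numpy as np

def fill_gaps_with_column_means(X):
    """Replace NaN gaps with each column's mean (a simple stand-in for
    fancier gap filling such as optimal interpolation)."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)          # per-column mean, ignoring NaNs
    gaps = np.isnan(X)
    X[gaps] = np.take(col_means, np.where(gaps)[1])
    return X

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
Xf = fill_gaps_with_column_means(X)
# Gaps are replaced by the column means: column 0 -> 2.0, column 1 -> 3.0
```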
A word about removing the mean
• Removing the mean has nothing to do with the process of finding
eigenvectors, but it allows us to interpret Cx as a covariance matrix, and
hence to interpret our results. Strictly speaking, one can find EOFs
without removing the mean.
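A small check of why demeaning matters for interpretation: after removing the column means, X’X/(n-1) is exactly the sample covariance matrix (a NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3)) + np.array([5.0, -2.0, 0.0])  # nonzero means
Xd = X - X.mean(axis=0)                                         # demean

n = X.shape[0]
Cx = Xd.T @ Xd / (n - 1)
# Matches the sample covariance matrix only because the mean was removed:
assert np.allclose(Cx, np.cov(X, rowvar=False))
```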
• Example: Coastal Mixing and Optics shipboard velocity
[Figure slides: velocity time series vs. time (days)]
North et al. (1982) proposed a “rule of thumb” for deciding whether an EOF
is likely subject to large sampling fluctuations.
For large sample size, N, an approximate 95% confidence interval for the
i-th eigenvalue of the sample covariance matrix is:
Confidence Interval = λi ± 1.96 λi √(2/N)
Rule: if the confidence interval is comparable to the spacing between
neighboring eigenvalues, then the corresponding EOFs will be
strongly affected by sampling fluctuations.
Since the confidence interval scales with λi, the CIs have equal width on
a log scale.
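The rule of thumb can be coded directly (a sketch; `north_test` is a hypothetical helper, and N should really be the number of independent samples, not just the record length):

```python
import numpy as np

def north_test(evals, N):
    """Flag eigenvalues whose North et al. (1982) ~95% sampling error
    1.96*lam*sqrt(2/N) is smaller than the gap to the next eigenvalue.
    (A fuller test also compares against the gap to the eigenvalue above.)"""
    evals = np.asarray(evals, dtype=float)     # assumed sorted descending
    err = 1.96 * evals * np.sqrt(2.0 / N)      # ~95% sampling error of each lambda
    gap_below = -np.diff(evals)                # spacing to the next (smaller) eigenvalue
    separated = err[:-1] < gap_below
    return err, separated

evals = np.array([10.0, 5.0, 1.0, 0.9])
err, sep = north_test(evals, N=100)
# The first two eigenvalues are well separated; the 1.0/0.9 pair is not.
```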
Useful Tidbits:
• pca.m - Matlab’s Principal Component Analysis function
References
• R. W. Preisendorfer: Principal Component Analysis in Meteorology and
Oceanography. Elsevier Science, 1988.
• Hans von Storch and Francis W. Zwiers: Statistical Analysis in Climate
Research. Cambridge University Press, 2002.
• North, G. R., T. L. Bell, R. F. Cahalan, and F. J. Moeng: Sampling errors in the
estimation of empirical orthogonal functions. Mon. Wea. Rev., 110, 699-706,
1982.
• Hannachi, A., I. T. Jolliffe, and D. B. Stephenson: Empirical orthogonal
functions and related techniques in atmospheric science: A review.
International Journal of Climatology, 27, 1119-1152, 2007.
Cautionary note from von Storch and Navarra (2002):
“I have learned the following rule to be useful when dealing with
advanced methods. Such methods are often needed to find a
signal in a vast noisy space, i.e. the needle in the haystack. But
after having the needle in our hand, we should be able to identify
the needle by simply looking at it. Whenever you are unable to do
so there is a good chance that something is rotten in the
analysis.”