
Novel Estimation Aspects for the Application of Maximum

Likelihood Estimation to Time Series of Interferometric Phase

Observations

Jochgem Gunneman

November 23, 2010


Abstract

Radar interferometry is a technique that differences radar images of the Earth. The formed interferogram has magnitude and phase. The phase thereby measures topography, surface displacement and possibly other signals simultaneously.

The multiplicity in which the parameters are present per phase observation results in an underdetermined estimation problem. Moreover, phase observations have the characteristic that they can only be measured modulo 2π, and need to be “unwrapped” in order to generate a continuous signal. The phase ambiguity is therefore an extra parameter to estimate, making the problem even more underdetermined. This underdetermined problem forms the challenge to solve.

One way of solving this challenge is to apply Maximum Likelihood Estimation (MLE) to time series of phase observations. Within MLE a likelihood function is formed by making use of a-priori information and a time-dependent observation model. Depending on the observation model, the induced redundancy can be sufficient to solve the underdetermined problem.

Two independent research groups have already shown that the method is viable for the estimation of the topographical height. They also introduced two different reliability estimators to assess the quality of the parameter estimates.

My research studies several aspects of the estimation process using MLE: quality assessment, hypothesis testing, and outlier detection and identification.

First I generalize the mathematical framework of MLE in order to estimate an arbitrary set of multiple parameters. Deformation models can therefore easily be incorporated within the observation model.

Since the likelihood function is a function of the variables within the observation model, I study their influence on the shape of the likelihood function. I find some limits and characteristics that the observation model variables impose, such as the occurrence of beat frequency-like phenomena and the ambiguity lengthening within the solution space.

I also discuss parameter and reliability estimation based on likelihood functions. Although parameter estimation within MLE is done using the global maximum of a likelihood function, I show that parameter estimates can also be based on other characteristics of the likelihood function. My conclusion is that both parameter and reliability estimation can be based on the same characteristics of a likelihood function, resulting in four pairs of parameter and reliability estimators.

In order to investigate the validity of a proposed observation model, I introduce hypothesis testing for the application of MLE. I show not only that hypothesis testing is important in order to reduce the biases within the parameter estimates, but also that it is easy to apply.

Furthermore, I show that it can be important to detect and identify outliers and that the removal of outliers improves the parameter estimates. I also show that outlier detection and identification are difficult to apply.

The focus of my research that follows is on outlier identification rather than detection. I created five different outlier identification algorithms. I discuss their working principles, after which I statistically analyze their performance. My conclusion is that under certain circumstances all the outlier identification algorithms are successful: outlier identification and removal improve the parameter estimates.


Preface

This is my final Thesis for the Master of Science Track of Earth Observation at the Faculty of Aerospace Engineering, Delft University of Technology.

In my Thesis I study the estimation process using maximum likelihood estimation (MLE) applied to time series of interferometric phase observations. I initially found it a rather difficult topic, but it also intrigued me as my research expanded. Although it is too early to state at this stage, the estimation process using MLE seems to be very promising and could very well play a prominent role in radar interferometry in the near future. I suspect that the technique could give additional information in those areas where not enough persistent scatterers of sufficient quality are present.

The Thesis gives insight into the estimation process using MLE, and is best read in complete form. Anyone who is familiar with the InSAR technique can skip Chap. 2.

Furthermore, an armada of definitions and concepts is introduced throughout the Thesis. The chapters become increasingly difficult to read, and without knowledge of certain definitions and concepts the reader might feel lost. I therefore strongly recommend reading the chapters in chronological order.

In all cases I recommend reading the shorthand notation at the beginning of the Thesis.

It has been a turbulent time for me during my Thesis, and I take this opportunity to thank some people who have helped me with my Thesis and my personal development.

I started my Thesis at the INGV in Rome, a very interesting institute that focuses its research on volcanoes and earthquakes. For the experience that I gained there, I would like to thank Dr. Fabrizia Buongiorno for giving me this opportunity. Not only did I experience the working environment of an institute, living in Rome also gave me a wealth of experiences. I have tasted the Italian culture, and understand much better now how Italian society functions.

I would also like to thank Dr. Salvatore Stramondo and Dr. Christian Bignami for their advice, and for the many lunches we have had together.

I wish them, and their families, all the best in their lives. Unfortunately I had to return early to the Netherlands for personal reasons, where I have continued my Thesis.


I would like to thank Dr. Andy Hooper for being my tutor and for working together constructively. Many times a few words from him provided me with the ideas and solutions to solve my problems.

I would also like to thank Professor Ramon Hanssen for being my tutor. He has watched my research with care, and provided feedback as well in this busy period for the radar group.

I also wish them, and their families, all the best in their lives.

And last but not least, I would like to thank Anke, Guido and Isolde, my family, and Marina, my girlfriend, for their understanding and financial support during my Thesis. Without you I could not have done it!

Enjoy the reading!

Jochgem Gunneman
Delft, 4 November 2010.


Contents

1 Introduction 12

2 A-Priori 15

2.1 SAR geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.1.1 Derivation of the relationship between phase and parameters . . . . . . . . . 15

2.1.2 Side looking effects of a SAR antenna . . . . . . . . . . . . . . . . . . . . . . 19

2.2 On the estimation of parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2.1 The height ambiguity and height variance defined . . . . . . . . . . . . . . . 21

2.2.2 Achieving maximum precision . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.2.3 Resolving phase ambiguities using spatial phase unwrapping . . . . . . . . . . 24

2.2.4 The potential of the application of MLE . . . . . . . . . . . . . . . . . . . . . 25

3 The Application of MLE 26

3.1 The mathematical framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.1.1 Observation models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.1.2 The likelihood of parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.1.3 The computation and assessment of a likelihood function . . . . . . . . . . . 32

3.1.4 On the uncertainty of a reference pixel . . . . . . . . . . . . . . . . . . . . . . 33

3.1.5 An example of the application of MLE . . . . . . . . . . . . . . . . . . . . . . 34

3.2 The interferometric phase PDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.3 An interferometric system overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.4 Estimation of the coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.4.1 Coherence estimation based on ergodicity . . . . . . . . . . . . . . . . . . . . 41

3.4.2 Decomposition of the magnitude of coherence . . . . . . . . . . . . . . . . . . 43

4 Parameter and Reliability Estimation 46

4.1 A theoretical likelihood function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.2 Parameter and reliability estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.2.1 Estimators based on single points . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.2.2 Estimators based on regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.2.3 On the choice of estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5 Algorithms to Apply MLE 56

5.1 Computation of the expected phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.2 Simulation of phase observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.3 On the propagation of numerical errors . . . . . . . . . . . . . . . . . . . . . . . . . . 58


6 Effects of Variables on a Likelihood function 63

6.1 Influence of the phase PDF-dependent variables . . . . . . . . . . . . . . . . . . . . 64

6.2 Influence of the parameter ambiguities . . . . . . . . . . . . . . . . . . . . . . . . . . 64

6.2.1 Ambiguity lengthening within the solution space . . . . . . . . . . . . . . . . 65

6.2.2 Beat frequency-like phenomena within the likelihood function . . . . . . . . . 66

6.2.3 Propagation of an ambiguity estimation error . . . . . . . . . . . . . . . . . . 68

7 Hypothesis Testing and Outlier Detection and Identification 71

7.1 Hypothesis testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.1.1 Introduction into hypothesis testing . . . . . . . . . . . . . . . . . . . . . . . 72

7.1.2 On the importance of hypothesis testing: an example . . . . . . . . . . . . . 73

7.2 Outlier detection and identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

7.2.1 On the impact of outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

7.2.2 Outlier detection and identification based on the B-method of testing . . . . 79

7.2.3 Outlier detection and identification based on the laws of conservation . . . . 81

7.2.4 On the difficulty of outlier identification . . . . . . . . . . . . . . . . . . . . . 83

7.2.5 Strategies for identifying outliers . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.2.6 Settings of the simulations of time series of phase observations . . . . . . . . 87

7.2.7 The working principles of the outlier identification algorithms . . . . . . . . . 88

7.2.8 Performance of the outlier identification algorithms . . . . . . . . . . . . . . . 94

7.2.9 The future for outlier detection and identification . . . . . . . . . . . . . . . . 100

8 The MLE Applied to Real Data 102

9 Conclusions and Recommendations 108


Shorthand notation

Before the Thesis starts, it is convenient to introduce some shorthand notation. The shorthand notation is introduced to avoid long sentences in the Thesis, and gives information about the state of the variables.

Randomness
Random variables are indicated by underlining the variable. An example of a random variable used in the Thesis is the interferometric phase observation φ.

Estimates
Estimates are indicated by a caret (hat) above the variable. An example of an estimated variable used in the Thesis is the topographical height estimate ĥ.

Expected values
Expected values are indicated by the expectation operator E{·}. For the frequently used expected interferometric phase, the subscript zero is used as well to further shorten the notation. The expected phase can therefore be indicated by either E{φ} or φ0.

Identification within a group of variables
Several variables in the Thesis are available in multiplicity. To identify single or multiple members of a group of variables, numerical superscripts are used. In the Thesis x is frequently used for the parameter of interest. x2 may therefore either refer to the second parameter, or to a single parameter squared. From the context it will be clear which of the two situations applies. x1−M on the other hand refers to the whole set of M parameters of interest. The superscript 1−M therefore needs to be read as 1 to M, and not 1 minus M. The indication of multiple but not all members of a set of variables is distinguished by superscripting the numbers of those members, i.e. φ1,2,7 signifies the first, second and seventh phase observation.

In the Thesis the variables M and N are frequently used as the available number of parameters of interest and the available number of observations respectively.

Division between single and multiple sets of variables
To indicate multiple values of a variable, the variable is typeset in boldface. Boldface h therefore indicates the domain of the values of the topographical height. The boldface typesetting is also used for a group of variables. Here the boldface typesetting means that every member of the group has a domain of multiple values; in the Thesis the term multiple sets of variables is used. Boldface x1−M therefore refers to the whole multi-dimensional domain of all the M parameters of interest, while x1−M indicates only the M parameters of interest. Eventually a single set of values can be assigned to the variables of x1−M.


Identification of the set of expected phase values induced by any parameters of interest
To identify a specific set of expected phase values induced by any parameters of interest, the parameters of interest that induce those expected phase values are indicated in the subscript. The expected phase φ0^1−N induced by a single set of parameter estimates x1−M is for example notated as φ0,x1−M.

Alternatively, in pseudocode a specific set of expected phase values is indicated by an at symbol @, after which the single set of parameter values follows.

Defined notation of a resolution cell common within a stack of interferograms
As the long term common resolution cell within a stack of interferograms is frequently used in the Thesis, its notation is shortened to RC.

Defined notation of the application of Maximum Likelihood Estimation (MLE) to time series of interferometric phase observations
In the Thesis, MLE is always applied to time series of interferometric phase observations. To improve the readability of the Thesis, the term application of Maximum Likelihood Estimation (MLE) to time series of interferometric phase observations is frequently shortened to application of MLE or MLE.


Nomenclature

Acronyms

CDF Cumulative Distribution Function

CEW Coherence Estimation Window

DEM Digital Elevation Model

InSAR SAR interferometry

ML Maximum Likelihood

MLE Maximum Likelihood Estimation

PDF Probability Density Function

PS Persistent Scatterer

PSI Persistent Scatterer Interferometry

QA Quality Assessment

RC Resolution Cell

RMS Root-Mean-Square

SAR Synthetic Aperture Radar

SLC Single Look Complex

SNR Signal-to-Noise Ratio

Symbols

α baseline tilt [deg]

βx height-to-phase conversion factor [rad/m]

D deformation acceleration [mm/year²]

∆R difference between R1 and R2 [m]

D deformation rate per year [mm/year]

γ complex coherence [-]

κ2π general parameter-to-phase conversion factor [-]


λ radar wavelength [m]

R reliability of the estimate [-]

φ0 expected interferometric phase [rad]

ψ SAR phase [-]

σ²h height variance [m²]

σ²φ phase variance [rad²]

θ looking angle of the satellite [deg]

θinc incidence angle at the reference surface [deg]

φ interferometric phase observation [rad]

r random number [-]

ϑ expected reference phase [rad]

ζ terrain slope [deg]

A design matrix [·]

a integer ambiguity number [-]

B baseline [m]

BR range bandwidth [Hz]

B⊥,crit critical perpendicular baseline [m]

B⊥ perpendicular baseline [m]

c correlated part of the two SAR signals [-]

c speed of light [m/s]

D2π deformation rate ambiguity [m]

fDC Doppler centroid frequency [Hz]

h topographical height [m]

h2π height ambiguity [m]

K2π general parameter ambiguity [-]

L multi-looking factor [-]

l length of boxcar distribution [-]

N number of interferograms [-]

n interferometric phase noise [rad]

Pn received thermal noise power [W]


Pr received signal power [W]

R1 distance between the SAR antenna and the target area during the first pass of the satellite [m]

R2 distance between the SAR antenna and the target area during the second pass of the satellite [m]

RCDF distribution function ranging from 0 to 1 according to the CDF [-]

s step size [·]

t time between the SAR acquisitions [s]

X change in slant range caused by a difference in time delays [m]

x ground range direction [·]

x parameter of interest [·]

x scene reflectivity [-]

y SAR signal [-]

y azimuth direction [·]

y observation [·]

z interferometric signal [-]

z zenith direction [·]


Chapter 1

Introduction

Radar interferometry (InSAR) is a technique that combines Synthetic Aperture Radar (SAR) images of the Earth to form interferometric images, better known as interferograms.

For a resulting interferometric phase observation there are several parameters to estimate, such as the topographical height, deformation, atmospheric noise and the orbit errors of the satellite. Since the interferometric phase can only be measured modulo 2π, the observations need to be “unwrapped” to obtain a continuous signal. The phase ambiguities therefore also need to be estimated.

The multiplicity of the parameters to estimate forms an underdetermined problem. It is a challenge to solve this problem.

To facilitate the estimation of the parameters, the redundancy needs to be increased. This can be done by assuming certain relations between the phase observations, often by assuming spatial ergodicity. Using this assumption the phase observations can be spatially unwrapped, after which the parameters can be estimated using the unwrapped phase.

Phase unwrapping algorithms based on the ergodic assumption have only a single phase observation available per phase ambiguity estimate. In some cases phase noise, possibly in combination with a high phase variation, thereby causes phase unwrapping errors, resulting in a loss of local information.

The redundancy can also be increased by using a time series of phase observations. A likelihood function of the parameters can be constructed on the basis of the time series of phase observations, after which Maximum Likelihood Estimation (MLE) becomes possible. MLE simultaneously unwraps the phase observations and estimates the parameters. Hereby the redundancy for the estimation of the phase ambiguities is in general higher than the zero redundancy of spatial phase unwrapping techniques, resulting in a lower likelihood of phase unwrapping errors.
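To make this idea concrete, the following minimal sketch illustrates a grid-search MLE of the topographical height from a time series of wrapped phase observations. It uses a simplified wrapped-Gaussian stand-in for the phase PDF (the actual phase PDF is introduced in Chap. 3), and the height-to-phase conversion factors, noise level and height grid are assumed example values, not values taken from this Thesis.

import numpy as np

def wrap(phase):
    # wrap phase values to the interval (-pi, pi]
    return np.angle(np.exp(1j * phase))

def likelihood_of_height(phi_obs, beta, h_grid, sigma_phi=0.6):
    # Evaluate a likelihood function of the height h for a time series of
    # wrapped phase observations phi_obs [rad].
    #   beta      : height-to-phase conversion factors, one per interferogram [rad/m]
    #   h_grid    : candidate heights, i.e. the solution space [m]
    #   sigma_phi : phase std. dev. of the wrapped-Gaussian stand-in PDF [rad] (assumed)
    likelihood = np.ones_like(h_grid)
    for phi_i, beta_i in zip(phi_obs, beta):
        # wrapped residual between the observation and the expected phase beta_i * h
        residual = wrap(phi_i - beta_i * h_grid)
        # simplified (unnormalized) wrapped-Gaussian PDF of that residual
        likelihood *= np.exp(-0.5 * (residual / sigma_phi) ** 2)
    return likelihood

# toy example: five interferograms observing a true height of 35 m
rng = np.random.default_rng(1)
beta = np.array([0.05, 0.11, 0.18, 0.26, 0.33])            # [rad/m], assumed values
phi_obs = wrap(beta * 35.0 + rng.normal(0.0, 0.4, beta.size))
h_grid = np.linspace(-100.0, 100.0, 2001)

likelihood = likelihood_of_height(phi_obs, beta, h_grid)
h_ml = h_grid[np.argmax(likelihood)]                        # ML height estimate
print(f"ML height estimate: {h_ml:.1f} m")

The grid search over h effectively estimates the phase ambiguities and the height at the same time, which is the simultaneous unwrapping and estimation described above.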

This approach was introduced two decades ago. However, Ferretti et al. [6] and Eineder and Adam [5] are the only ones who have applied this approach to real data. They showed that the application of MLE to time series of phase observations is viable. Both groups focused their research on the reconstruction of Digital Elevation Models (DEMs), and in particular on the feasibility of the MLE. They also introduced two different reliability estimators.


Although Ferretti et al. and Eineder and Adam showed that the application of MLE to time series of phase observations is viable, some aspects of the estimation process are still unclear and triggered several questions for me, both before the start of my research and during it:

1. Ferretti et al. and Eineder and Adam showed that the application of MLE to time series of phase observations is viable for the estimation of the topographical height. Is it possible to estimate other parameters as well? And what are the conditions for an observation model to apply MLE? Is it possible to test the validity of the observation model?

2. Ferretti et al. and Eineder and Adam introduced two different reliability estimators and used the global maximum of a likelihood function to estimate the parameters. On what other characteristics can the parameter and reliability estimators be based? And can parameter and reliability estimation be based on the same characteristics?

3. What would happen to the likelihood function if one of the phase observations within a time series is omitted? What impact would it have on the parameter estimate? Is it possible to improve the parameter estimates by omitting one or more phase observations within the time series?

My research attempts to answer these questions.

Chap. 2 introduces the measurement technique of InSAR and its complications. It gives insight into the different aspects that influence the precision, and briefly introduces the problem of phase unwrapping. Most of the knowledge discussed here is assumed to be known a-priori by anyone familiar with InSAR. Anyone not familiar with InSAR is encouraged to read this chapter.

The mathematical framework for the application of MLE to a time series of phase observations is discussed in Chap. 3. The formulation of MLE is generalized for the usage of any time-dependent observation model. Chap. 3 also provides the background information needed to understand the basics of the estimation process of MLE. The phase Probability Density Function (PDF) and the estimation of the magnitude of coherence, a variable of the phase PDF, are thereby discussed, and an overview of the interferometric system is given.

Chap. 4 introduces several parameter and reliability estimators based on certain characteristics of likelihood functions. As it turns out, any characteristic of a likelihood function can be used for both parameter and reliability estimation. Chap. 4 introduces the concept and presents four pairs of parameter and reliability estimators.

In Chap. 5 the simulation of phase observations is discussed. The simulation of time series of phase observations is needed in Chap. 7 to perform analyses. Chap. 5 also discusses the smoothness criteria that need to be satisfied in order to numerically estimate the Maximum Likelihood (ML) parameter estimates.

Chap. 6 discusses several effects that variables may impose on a likelihood function. The constraints that are needed in order to apply MLE successfully are thereby revealed. The prominent role of the observation model-dependent rank variables thereby becomes apparent, as these are a source of some interesting phenomena, such as the occurrence of beat frequency-like phenomena and the ambiguity lengthening within the solution space.

Chap. 7 returns to one of the fundamentals of science: the confrontation of a hypothesis with the observations, also known as hypothesis testing. Since the observation model is deduced from a hypothesis, hypothesis testing tests the validity of an observation model. First, hypothesis testing for the application of MLE to a time series of phase observations is introduced.

Then within Chap. 7 our focus shifts to those phase observations that reject the hypothesis, the so-called outliers. Since they have a negative impact on the parameter estimates, it is best to detect, identify and remove the outliers. After the discussion of two different theories about outlier detection and identification, the attention of our discussion focuses on the identification rather than the detection part. Five different outlier identification algorithms are introduced and their working principles are discussed. Moreover, the performance of the five outlier identification algorithms is statistically analyzed. Potential improvements and constraints of the identification algorithms are discussed further.

Chap. 8 holds a short discussion about the application of MLE to time series of phase observationsusing real data.

Finally, Chap. 9 presents the conclusions and recommendations of my research: novel estimation aspects for the application of MLE to time series of interferometric phase observations.


Chapter 2

A-Priori

In my research MLE is applied to time series of phase observations. Prior to its proper introduction in Chapter 3, several aspects of the estimation process using the technique of InSAR need to be discussed. In this chapter the emphasis lies on parameter estimation using a single interferogram. Due to the complexity of the estimation of topography and the generally prominent presence of the topographical signal within interferograms, the estimation of topography is discussed in more detail.

Readers that are familiar with parameter estimation using InSAR can skip this chapter.

Sec. 2.1 shows the derivation that is needed in order to estimate the parameters of interest. The estimation of topography is based on the SAR geometry and is an important part of the derivation. Sec. 2.1 also gives a short overview of the side-looking effects of the SAR geometry.

Sec. 2.2 characterizes the performance of topographical height estimation. It discusses the optimization issues and presents the motivation for studying MLE based on phase statistics.

2.1 SAR geometry

InSAR preserves both the amplitude and the interferometric phase of the reflected radar beam [15]. The measurement of the interferometric phase is important for deriving relationships with the parameters of interest.

To observe topography, use is made of the SAR geometry. The derivation of the relationship between the interferometric phase and the parameters, including topography, is done in Sec. 2.1.1.

The slant range direction of the SAR viewing technique also has its drawbacks. The distortions that arise from a side-looking SAR antenna are discussed in Sec. 2.1.2.

2.1.1 Derivation of the relationship between phase and parameters

Topographical information can be acquired using InSAR techniques [30]. The difference in time delay of two consecutive SAR signals also contains information, such as surface displacement and atmospheric distortions. In this section the relations between the topographical position of a point and its expected interferometric phase are first derived, after which the model is expanded to include the time delay-dependent signals.

Since the model incorporates only point scatterers, effects of wavenumber shifts of the ground reflectivity spectrum are not taken into account [8].


Figure 2.1: The formation of a SAR image. (See the text for an elaborative explanation.)

Before starting the derivation, I underline here that a relationship is constructed between the expected interferometric phase E{φ} and the parameters, and not directly between the phase observations and the parameters. A phase observation φ includes (stochastic) interferometric phase noise and can therefore not be directly related to any parameters.

In the Thesis the underlining of a variable is used to indicate the stochastic behavior of that variable. To simplify notation, the term phase will frequently be used for the interferometric phase observation and the term expected phase will be used for the expected interferometric phase. E{φ} is also notated here as φ0.

Moreover, the interferometric phase can only be measured modulo 2π and therefore needs to be “unwrapped” in order to generate a continuous signal. The interferometric phase ambiguities therefore in general need to be estimated. To keep things simple, the assumption is made that the interferometric phase does not need to be unwrapped, i.e. it is assumed that the phase ambiguities are known: φ0,unw ∈ R. The spatial unwrapping of the interferometric phase is discussed separately in Sec. 2.2.3.

First consider the formation of a SAR image in Fig. 2.1. A side-looking SAR satellite is located in the upper left corner, together with a simple model of its footprint. Each RC in the footprint has a range and an azimuth coordinate. The slant range direction is the direction in which the pulses are sent and measured, while the azimuth direction is the direction in which the SAR instrument is orbiting. The horizontal component of the slant range direction, known as the ground range direction, is indicated as well. As can be seen for the red pixel in the figure, the distance to every pixel is modelled in the slant range direction, with a looking angle θ.

To derive the relations between the topography and φ0,unw, use has been made of the geometrical relationships of a dual-pass configuration shown in Fig. 2.2. For the moment it is assumed that only the topography is observed.


In Fig. 2.2 the points indicated as “Master” and “Slave” are the positions where the Master and Slave image were taken. The terms Master and Slave are defined by the InSAR pixel coregistration procedure, as the Slave image is adapted to the Master image. In this process the Slave image is interpolated to the pixel domain of the Master. P is the point observed from both satellite positions. B is the distance between the (dual-pass) satellite antennas, also known as the baseline. R1 and R2 are the distances between the SAR antennas and point P, and their range difference is indicated as ∆R. α is the baseline tilt, running from the horizontal towards the baseline, and θ is the looking angle of the SAR antenna, running from the vertical towards R1. Hsat is the altitude of the satellite, while h is the topographical height of point P with respect to a reference surface. x, y, and z are the ground range, azimuth and zenith directions.

Figure 2.2: A dual-pass geometrical configuration for obtaining the height. (See the text for an elaborative explanation.)

To estimate the topographical coordinates of point P, consider the following geometrical relationships from Fig. 2.2:

h = zP = Hsat − R1 cos θ (2.1)

xP = R1 sin θ (2.2)

in which xP and zP are the topographical position of P in the ground range and zenith direction respectively.

Differentiating equations (2.1) and (2.2) with respect to θ results in:

∂h = R1 sin θ∂θ (2.3)

∂xP = R1 cos θ∂θ (2.4)


Equations (2.3) and (2.4) are very important for the derivation of the relation between the phase and the topographical position of a point: they represent the sensitivity with which a position can be measured for a change in looking angle θ. Thus if the difference in looking angle can be resolved, the topographical position can also be found.

The difference in looking angle cannot be measured directly, however. Consider therefore the following derivations.

Under the assumption that the looking angles of the first and second pass are parallel, i.e. using a far-field approximation [30], ∆R can be obtained from Fig. 2.2:

∆R = B cos((π/2 − θ) + α) = B sin(θ − α)    (2.5)

By differentiating equation (2.5), a relationship can be made between a change in the range difference and the difference in looking angle:

∂∆R = B cos(θ − α)∂θ (2.6)

∆R is in reality estimated by the (noise-free) expected phase¹:

∆R = (λ/2) · (φ0,unw / 2π)    (2.7)

in which λ is the radar wavelength. The derivative of equation (2.7) is:

∂∆R = (λ/2) · (∂φ0,unw / 2π)    (2.8)

Using equations (2.6) and (2.8), it is possible to relate the difference in looking angle with φ0,unw through ∂∆R:

∂θ = λ ∂φ0,unw / (4π B cos(θ − α)) = λ ∂φ0,unw / (4π B⊥)    (2.9)

in which B⊥ is also known as the perpendicular baseline.

The topographical information can now be directly obtained from the expected phase. This is done by inserting equation (2.9) into equations (2.3) and (2.4), followed by integration with respect to φ0,unw:

zP = h = (λ R1 sin θ / (4π B⊥)) φ0,unw + href    (2.10)

xP = (λ R1 cos θ / (4π B⊥)) φ0,unw + xref    (2.11)

in which href and xref are constants that appear due to the integration. The point (xref, href) is the reference point and shows that InSAR is a relative measurement technique.
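As a small numerical illustration of equation (2.10) as reconstructed above, the sketch below converts an unwrapped phase into a height. The ERS-like values for the wavelength, slant range, looking angle and perpendicular baseline are assumed example numbers, not values taken from this Thesis.

import numpy as np

def height_from_unwrapped_phase(phi_unw, wavelength, r1, theta, b_perp, h_ref=0.0):
    # Topographical height from the unwrapped expected phase, following Eq. (2.10):
    #   h = (lambda * R1 * sin(theta) / (4 * pi * B_perp)) * phi_unw + h_ref
    return wavelength * r1 * np.sin(theta) / (4.0 * np.pi * b_perp) * phi_unw + h_ref

# illustrative ERS-like values (assumed)
wavelength = 0.0566                # radar wavelength [m], C-band
r1 = 850e3                         # slant range R1 [m]
theta = np.deg2rad(23.0)           # looking angle [rad]
b_perp = 150.0                     # perpendicular baseline [m]

# one fringe (2*pi) of unwrapped phase corresponds to the height ambiguity h_2pi of Eq. (2.15)
print(height_from_unwrapped_phase(2.0 * np.pi, wavelength, r1, theta, b_perp))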

It needs to be stressed that equations (2.10) and (2.11) were derived under the assumption that only the topography was observed. The measured difference in range ∆R is, however, also a result of any difference in the time delays of the two received SAR signals. The difference in time delays may have different sources, such as a change in the atmospheric conditions in which the SAR images were taken, or the occurrence of deformation. If such phenomena occur, they are all observed simultaneously, i.e. within the same phase observation.

¹ Here the convention is used that a positive change in range results in a positive change in the expected phase. An alternative convention is explained in Sec. 2.3.4 of [15].

By reordering equation (2.8) and accounting for a difference in time delays, equation (2.9) can be rewritten to incorporate a time delay-dependent parameter:

∂φ0,unw = (4π/λ) ∂∆R = (4π/λ)(B⊥ ∂θ − ∂X)    (2.12)

in which ∂X is a change in range caused by a difference in time delays in the slant direction. The sign of ∂X is positive for a time delay decrease, e.g. uplift (in the slant direction). ∂X can, if applicable, be further specified into the individual sources that cause the difference in time delay, e.g. deformation, atmospheric distortions, orbit errors, etc.

It was already shown above that InSAR is a relative measurement technique, and it therefore needs a reference. The reference can be obtained in various ways.

One approach, for instance if one is interested in the estimation of the topography of a scene on the Earth, is to model the Earth as an ellipsoid. The topography is then related to the ellipsoid. In that case an artificial reference phase ϑellipsoid induced by the ellipsoid needs to be subtracted from φ0,unw. The reference phase ϑellipsoid can be computed with the following equation:

ϑellipsoid = (4π/λ) B sin(θ − α)    (2.13)

Alternatively, for the estimation of deformation, a reference phase of a DEM ϑDEM can be computed and subtracted.

Another approach is to relate an RC to another RC within the interferogram. These relations are known as arcs. As both RCs are located with respect to an equivalent reference, the difference of the two expected phases results in a difference between the parameter estimates of the RCs, i.e. a topographical height difference or deformation difference within the arc.

Notice that both approaches can be applied simultaneously, depending on the parameters to estimate.

Notice that both the ground range and the height of equations (2.10) and (2.11) are based on the same foundation, namely that they are a function of the difference in looking angle ∂θ. To prevent doing work twice, in the remainder of the Thesis only the topographical height h will be considered for further analysis and evaluation. From now on the term height will frequently be used as a short notation for the topographical height.

2.1.2 Side looking effects of a SAR antenna

Implicitly, in Sec. 2.1.1 the assumption was made that the looking angle θ can be retrieved one-to-one with respect to the topographical position, which means that the topographical positions can be represented as individual single scatterers.

In the reality of the SAR viewing geometry, the SAR antenna receives a superposition of multiple scatterers, resulting in foreshortening, layover and shadow, see Fig. 2.3.


Fig. 2.3 shows objects with positive and negative topographical slopes. The topographical slopes are defined positive towards the satellite. The Cartesian coordinates of the objects are then projected onto the cylindrical coordinate system, from which several phenomena can be seen:

• In the left figure foreshortening can be seen. Foreshortening coarsens the resolution due to the topographical slope. This implies that in general the number of scatter sources within the RC is reduced;

• The middle figure shows layover. Layover is caused by the simultaneous reception of radar waves bounced by scatterers from positions of different elevations. Here the SAR antenna first receives the scattering of the scatter sources located farther away in the ground range direction. The result is a mixture of elevation information mapped into one single RC. Therefore the phase information of pixels in layover may not represent the correct topography;

• Shadow occurs due to steep terrain that blocks the view of the SAR system, as can be seen in the right figure. No information can be acquired in the shadow zone of a SAR viewing geometry.

Foreshortening, layover and shadow cause problems in observing the phenomena, as the SAR signals are distorted or (partially) lost. It might, however, still be possible to obtain topographical information of the areas in foreshortening, layover or shadow.

Due to the rotation of the Earth, the SAR satellite is repeatedly ascending and descending over a scene. If the target area can be captured in both the ascending and the descending mode of the satellite, then the satellite observes the same area from two different perspectives. This gives access to more information about the areas that are subject to foreshortening, layover or shadow.

Figure 2.3: Implications due to the SAR viewing geometry: foreshortening (left), layover (in the middle) and shadow (right). This image has been taken from www.crwr.utexas.edu on the seventh of January 2009.

2.2 On the estimation of parameters

After the derivation of the relation between the expected phase and the parameters in Sec. 2.1.1, it is now possible to estimate the parameters. This section gives more insight into the estimation of the parameters using the phase observations within a single interferogram. The factors that have a significant impact on the estimation process are discussed here. Of all the parameters, the height estimation is subject to all of those factors, and it is therefore discussed in more detail. The performance of spatial phase unwrapping is also briefly touched upon.


Sec. 2.2.1 introduces the definitions of the height ambiguity and the height variance. These are needed to measure the performance of height estimation. The performance of height estimation is further discussed in Sec. 2.2.2, where, amongst others, the influence of the geometrical configuration and the radar wavelength is discussed.

Additionally, incorrectly unwrapping the phase observations introduces estimation errors too. This is explained in Sec. 2.2.3.

Sec. 2.2.4 finally presents the motivation for studying the application of MLE to a time series of phase observations.

2.2.1 The height ambiguity and height variance defined

The relationship between the variation in phase and the variation in height can be expressed in two different ways.

The height sensitivity δφ/δh can be approximated as the variation of the expected phase with respect to the height [2], or:

δφ/δh = 4π B⊥ / (λ R1 sin θ)    (2.14)

in which λ is the radar wavelength, R1 is the distance between the position of the SAR antenna at the first pass and the scatterer, θ is the looking angle and B⊥ is the perpendicular baseline. Note that this approximation is only valid for small variations of the height.

The height ambiguity h2π can be used as an alternative to δφ/δh. h2π is the height difference that one fringe, being one phase cycle of 2π [ibid], represents within an interferogram and is defined as:

h2π = λ R1 sin θ / (2 B⊥)    (2.15)

To discuss the height precision in topographical mapping, the definition of the height variance σ²h is introduced as a function of the phase variance σ²φ and the height ambiguity h2π:

σ²h = (λ R1 sin θ / (4π B⊥))² σ²φ = (h2π / 2π)² σ²φ    (2.16)

2.2.2 Achieving maximum precision

In general, to achieve maximum precision for a parameter within a resolution cell, advantage is taken of the redundancy with which the parameter is measured. The redundancy of the phase observations in the observation equations will in general increase the precision, provided no significant outliers are present in the measurements. Hypothesis testing can determine which observation model is best to use to achieve a maximum unbiased precision [26].

To gain more insight into the achievement of maximum precision, the maximization of the height precision is studied in more detail. This is because, of all parameters, the height estimation is subject to all the factors discussed here that have a significant impact on the estimation process. Only the maximization of the height precision of a single interferogram is discussed here.

Maximizing the height precision signifies minimizing σ²h, which, according to Eq. (2.16), is equivalent to minimizing h2π and σ²φ.

As can be seen from Eq. (2.15), h2π is governed by the radar wavelength λ and the perpendicular baseline B⊥, since the distance between the scatterer and the satellite position at the first pass, R1, and the looking angle θ are the expressions that map the topographical position into the cylindrical coordinate system. The radar wavelength is a characteristic of the design of the SAR system, while the perpendicular baseline is part of the geometrical configuration, see Fig. 2.2. Equation (2.15) shows that using a small wavelength reduces h2π and therefore increases the sensitivity to observe topography. Using a large perpendicular baseline has the same effect.

Although a smaller wavelength increases the height sensitivity, it may also increase the phase noise in an interferogram, enlarging σ²φ. This can be explained by the fact that the scattering of signals of smaller wavelengths is more dispersive, according to physical laws that dictate that the rate of dispersive scattering depends strongly on the ratio between the wavelength and the size of a scatterer. Another source of noise might be present because the attenuation of the signal caused by vegetation is (significantly) larger for small wavelengths than for large wavelengths. The soil type and moisture content of a scene are other major players in the selection of an optimal wavelength.

Depending on the scatter behavior within a scene, a smaller wavelength might therefore not yield the desired increase in the height precision. Care therefore needs to be taken when selecting a wavelength.

Presently SAR systems using X-, C- and L-band are available, resulting in wavelengths of about 3, 6 and 24 cm respectively. An optimization of the height precision with respect to the radar wavelength is therefore largely constrained by the available SAR systems developed by the space agencies.

Moreover, the scene that one would like to study needs of course to be available, and this depends on the research programmes of the space agencies. Maximization of the height precision with respect to the radar wavelength is therefore severely limited, although there might be a choice available between different SAR systems.

While the choice in wavelengths is very limited, the selection of an optimal B⊥ is only dependent on the data availability. Minimizing σ²h therefore depends strongly on the selection of the available SAR images.

A simple optimization tool for finding a SAR image pair with an optimal B⊥ is the baseline plot. The baseline plot makes use of the two most prominent decorrelation sources: geometrical decorrelation and temporal decorrelation.

Temporal decorrelation is the change in scene scattering as a function of time, which is discussed in Sec. 3.4.2. Usually one can assume that the temporal decorrelation becomes larger over time.

Geometrical decorrelation is caused by a difference in the looking angle of the SAR antenna, causing a difference in the scattering of the scene [8]. It is induced by the length of the perpendicular baseline. Although a larger perpendicular baseline increases the height sensitivity, see Eq. (2.15), the signal experiences a larger geometrical decorrelation as well.

Geometrical decorrelation is also a function of the critical baseline B⊥,crit, and depends on the ratio between the perpendicular baseline and the critical baseline. The geometrical correlation can be estimated as:

|γ|geom = 1 − B⊥ / B⊥,crit,   for |B⊥| ≤ B⊥,crit
|γ|geom = 0,                  for |B⊥| > B⊥,crit    (2.17)

where the critical baseline B⊥,crit is defined as:

B⊥,crit = λ(BR/c)R1 tan(θinc − ζ) (2.18)

in which BR is the range bandwidth, c the speed of light, θinc the incidence angle at the reference surface, which, similarly to the looking angle θ in Fig. 2.2, runs from the vertical towards R1, and ζ the terrain slope, defined positive towards the satellite.
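A minimal sketch of Eqs. (2.17) and (2.18) follows: it computes the critical baseline and the geometrical coherence for a few perpendicular baselines. The C-band system values (wavelength, range bandwidth, slant range, incidence angle) are assumed ERS-like example numbers, not values taken from this Thesis.

import numpy as np

def critical_baseline(wavelength, range_bandwidth, r1, theta_inc, terrain_slope=0.0):
    # critical perpendicular baseline B_perp,crit of Eq. (2.18)
    c = 299792458.0                                   # speed of light [m/s]
    return wavelength * (range_bandwidth / c) * r1 * np.tan(theta_inc - terrain_slope)

def geometric_coherence(b_perp, b_crit):
    # geometrical coherence |gamma|_geom of Eq. (2.17)
    b_perp = np.abs(b_perp)
    return np.where(b_perp <= b_crit, 1.0 - b_perp / b_crit, 0.0)

# illustrative ERS-like values (assumed): C-band, 15.55 MHz range bandwidth
wavelength, range_bw, r1, theta_inc = 0.0566, 15.55e6, 850e3, np.deg2rad(23.0)
b_crit = critical_baseline(wavelength, range_bw, r1, theta_inc)
print(f"critical baseline: {b_crit:.0f} m")
for b in (100.0, 500.0, 1200.0):
    g = float(geometric_coherence(b, b_crit))
    print(f"B_perp = {b:6.0f} m  ->  |gamma|_geom = {g:.2f}")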


In a baseline plot, first a reference image is chosen and set to the origin. Then the SAR images are simply visualized according to their perpendicular baseline and their time difference with respect to the reference image.

Using time constraints and the lengths of the perpendicular baselines, fair SAR image pair candidates can be sought in order to find an optimal height precision. See Sec. 2.5.1 of [15] for a more elaborate explanation of the baseline plot.

In theory, if one had an infinite set of interferograms available, one could simply compute the optimal perpendicular baseline.

Fig. 2.4 shows the behavior of the phase and height standard deviation (std. dev.), σφ and σh, with respect to the perpendicular baseline B⊥ for the Cosmo-Skymed (X-band), ERS (C-band) and ALOS (L-band) satellite configurations. Here ζ is equal to zero.

Figure 2.4: Height std. dev. (top) and phase std. dev. (bottom) with respect to the perpendicular baseline for several satellite configurations. See the text for an elaborative explanation.

Here σφ is computed by making use of the phase PDF of [27], see also Sec. 3.2. The decorrelation parameter |γ| has been modelled as a linear function of the perpendicular baseline according to Eq. (2.17).

σφ increases with increasing B⊥ for all satellite configurations until the critical baseline B⊥,crit is met. In that situation the phase PDF is uniform. This is true for all satellite configurations, including ALOS, which has a B⊥,crit of almost seven kilometers.

While σφ increases for an increasing B⊥, σh reduces for an increasing B⊥. This is the effect of the height ambiguity h2π. Notice that the reduction rate in σh for small B⊥ is very high, while for medium and large B⊥ it is not significant.

Larger perpendicular baselines therefore increase the precision in topographical mapping, as can be seen in Fig. 2.4. Although on the one hand a larger perpendicular baseline is accompanied by higher geometrical decorrelation noise, enlarging σφ, on the other hand the change in looking angle ∂θ is enlarged too, resulting in a higher capability to observe topography.

It needs to be stressed that wavelength-dependent scatter behavior, phase unwrapping errors in the spatial domain (to be discussed in Sec. 2.2.3), and other decorrelation sources (see Sec. 3.3) are not taken into account here.

Fig. 2.4 was based on a terrain slope ζ of zero. However, if the terrain slope changes locally, the critical baseline changes locally too, and with it the local geometrical noise level, see Eqs. (2.17) and (2.18). This of course complicates the search for an optimal perpendicular baseline even further.

Hagberg and Urlander [14] showed that an optimum baseline can be found depending on the histogram of topographical slopes within a scene. Their optimization was based on a minimization of the Root Mean Square (RMS) height error and resulted in a scene-dependent optimal perpendicular baseline.

2.2.3 Resolving phase ambiguities using spatial phase unwrapping

Another source of errors within the parameter estimates is phase unwrapping errors. In a statistical sense, phase unwrapping errors therefore lower the precision of a parameter estimate. We will see in this section that the occurrence of phase unwrapping errors affects the choice of an optimal perpendicular baseline or radar wavelength.

The interferometric phase can only be measured modulo 2π and therefore needs to be unwrapped to acquire a signal within the interferogram such that the unwrapped phase resembles the signal of the parameters. Here the phase unwrapping procedure within the spatial domain is discussed.

At first the field of phase observations needs to be checked for any discontinuities caused by the cyclical behavior of the phase measurements. The field should therefore be checked for any stepwise shifts. The check can be performed by setting the rotation of the phase gradient equal to zero:

∇×∇φ = 0 (2.19)

If the condition of (2.19) is not fulfilled, positive or negative residues are found [12]. After all the residues are found, they are connected with each other such that the positive and negative residues cancel. While the residues are connected, phase cycles of 2π are added to or subtracted from the phase values, attempting to reveal the continuous signal.
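A minimal sketch of the discrete residue check behind condition (2.19) is given below: the wrapped phase differences are summed around every 2×2 loop of pixels, and a nonzero loop sum marks a positive or negative residue. The grid conventions and the toy input are illustrative assumptions, not the algorithms used in this Thesis.

import numpy as np

def wrap(phase):
    # wrap phase differences to the interval (-pi, pi]
    return np.angle(np.exp(1j * phase))

def residues(phi):
    # Discrete version of the check of Eq. (2.19): integrate the wrapped phase
    # gradient around every 2x2 loop of the wrapped interferogram phi.
    # A loop sum of +-2*pi marks a positive/negative residue; 0 means the loop is consistent.
    d1 = wrap(phi[:-1, 1:] - phi[:-1, :-1])   # top edge, left -> right
    d2 = wrap(phi[1:, 1:] - phi[:-1, 1:])     # right edge, top -> bottom
    d3 = wrap(phi[1:, :-1] - phi[1:, 1:])     # bottom edge, right -> left
    d4 = wrap(phi[:-1, :-1] - phi[1:, :-1])   # left edge, bottom -> top
    loop_sum = d1 + d2 + d3 + d4
    return np.rint(loop_sum / (2.0 * np.pi)).astype(int)   # +1, 0 or -1 per loop

# toy example: a smooth wrapped phase ramp contains no residues
ramp = wrap(np.outer(np.linspace(0.0, 6.0 * np.pi, 50), np.ones(50)))
print("number of residues:", np.count_nonzero(residues(ramp)))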

There exist many phase unwrapping techniques based on the condition of (2.19) [9], mainly because of the complications of unwrapping the phase. These complications exist due to phase noise and the SAR viewing geometry discussed in Sec. 2.1.2, possibly combined with a high phase variation.

Although phase unwrapping techniques in the spatial domain have achieved high maturity [5], they are all based on the condition of (2.19). Stochastic phase noise and a high fringe rate, caused for instance by steep terrain, might lead to undersampling, and might therefore unwrap the phase incorrectly due to the (non-)existence of phase residues [10]. Furthermore, due to the usage of only a single interferogram, no redundancy exists to resolve the phase unwrapping, which makes it impossible to check for errors.

Results of phase unwrapping might contain catastrophic errors for the parameter estimates, and in some cases it is impossible to check whether the parameter estimates are strongly biased by phase unwrapping errors, or whether the unwrapped phase represents the real signal. For the maximization of the precision of the height estimate, or any other parameter estimate, this complicates the search for an optimal perpendicular baseline or radar wavelength.


A large perpendicular baseline or a small radar wavelength results in a higher fringe rate within the interferogram. A high fringe rate therefore requires more phase unwrapping and thus has a higher potential to generate phase unwrapping errors. On the other hand, we have seen that a large perpendicular baseline would imply a high precision in height, according to Fig. 2.4. Fig. 2.4, however, is based on the absence of phase unwrapping errors.

It is difficult to know the optimal values of the wavelength or perpendicular baseline that maximize the precision of a parameter estimate. The impact of phase unwrapping errors (or the lack thereof) is in general not known, and therefore the choice of an optimal wavelength or perpendicular baseline remains unknown.

2.2.4 The potential of the application of MLE

In a nutshell, Secs. 2.2.2 and 2.2.3 showed that there exist many factors that influence the precision with which any parameter can be estimated.

Furthermore, from Sec. 2.2.2 it could be understood that the height estimation depends mainly on two parameters: the radar wavelength λ and the perpendicular baseline B⊥.

The dependence on these two parameters can also be used to resolve the parameter ambiguities. The phase ambiguities can be estimated using a multi-baseline or multi-frequency approach [10] [28], i.e. using multiple interferograms based on different baselines or different radar wavelengths respectively.

Eineder and Adam have shown that the multi-frequency approach is viable for estimating the height [5]. I will not investigate the multi-frequency approach further, although some of my results will probably be of value for research into it.

Within the multi-baseline or multi-frequency approach a likelihood function is computed, using amongst others a-priori data, after which the parameters are estimated from the global maximum of the likelihood function. The multi-baseline or multi-frequency approach is therefore simply the application of MLE to a time series of phase observations.

Although the application of MLE to a time series of phase observations will only be properly introduced in Chap. 3, it should be noted that a minimum of two phase observations is needed to unwrap the phase. For a time series of more than two phase observations there is redundancy for the estimation of the phase ambiguities, resulting in a lower likelihood of phase unwrapping errors. This is a major advantage of the application of MLE to a time series of phase observations: phase unwrapping becomes more reliable.

Moreover, using spectral analysis Gatelli [8] has shown that it is possible to reduce layover effects. Gini et al. [11] showed that with multi-baseline interferometry layover errors can be reduced even further. In my research, however, I do not pay any attention to these spectral analyses.

The usage of PSs can further improve the estimation of parameters [7]. PSs are scatterers that behave similarly to point scatterers and can be found in a RC if their scattering behavior is dominant with respect to that of the other scatterers. They experience consistently low decorrelation and are therefore considered reliable observations. PSs can be identified in several ways [7] [16].

On the other hand, at locations where the PS density is low, the application of MLE to a time series of phase observations might provide solutions that PSI cannot.

In short, the potential benefits of the application of MLE to a time series of phase observations are a lower likelihood of phase unwrapping errors, a reduction of layover effects and the provision of information where PSI fails to give any.


Chapter 3

The Application of MLE

In this chapter the application of MLE to time series of phase observations is introduced.

Quite a few topics touch on the main argument; for the sake of completeness these are explained in this chapter as well.

Since the application of MLE in my research always refers to time series of phase observations, the term “application of MLE to time series of phase observations” is often shortened to “MLE” or “application of MLE”.

The main argument of the chapter is presented in Sec. 3.1, which introduces the application of MLE to time series of phase observations. The mathematical framework is discussed and a visual example of MLE is given.

In Sec. 3.2 the PDF of the interferometric phase is discussed. Knowledge of the phase PDF is needed since it is used in MLE.

The phase PDF discussed in Sec. 3.2 is a function of, amongst others, the magnitude of coherence, which is a measure of the decorrelation of the interferometric signal. Sec. 3.4 discusses the existing estimation techniques for obtaining the magnitude of coherence.

Before the estimation techniques for the magnitude of coherence are discussed, a proper introduction is needed to give insight into the different decorrelation sources that may contribute to phase noise. Sec. 3.3 elaborates on the decorrelation sources by introducing an interferometric system model.

3.1 The mathematical framework

To understand the working principle of MLE several steps need to be explained. From Sec. 3.1.1 to 3.1.4 the mathematical framework is expanded step by step until Sec. 3.1.5 is reached, where a visual example of the application of MLE is given.

Like any estimation methodology, MLE needs an observation model to estimate the parameters. Sec. 3.1.1 introduces several observation models that can be used for the application of MLE.

Sec. 3.1.2 discusses the computation of the likelihood of a single set of parameter values. Since InSAR is a relative measurement technique the parameters need to be related to a reference point. For the derivation of the application of MLE the resulting arcs are based on artificial, deterministic reference points.

Sec. 3.1.3 expands Sec. 3.1.2 to multiple sets of parameter values. The likelihood for an arc based on an artificial, deterministic reference point is computed over the whole multi-dimensional domain of the parameters, resulting in a likelihood function. The computation of ML estimates of the parameters and the reliability estimates known from the literature are introduced as well.

Sec. 3.1.2 and 3.1.3 introduce the application of MLE for arcs based on artificial, deterministic reference points, resulting in biases within the parameter estimates. Sec. 3.1.4 discusses MLE for arcs based on reference pixels within the interferograms. By taking reference pixels into account the biases can be cancelled. Reference pixels are not known deterministically however: their uncertainty has consequences for the parameter estimation. Sec. 3.1.4 discusses the two available options to deal with the uncertainty of a reference pixel.

Finally, Sec. 3.1.5 shows the application of MLE visually using reference pixels.

3.1.1 Observation models

In this research only linear observation models are used. The observation models are written in the form E{y} = Ax, in which E{·} is the expectation operator, y the observations, A the design matrix and x the parameters.

Sec. 2.1.1 derived in Eq. (2.12) the relationship between the expected phase and the combination of topography and the time delay-dependent parameters. Notice that φ0 is here observed with respect to an artificial reference point.

Rewriting Eq. (2.12) results in the following observation model:

$$
E\{\phi\} = \phi_0 = \phi_{0,ref} +
\begin{bmatrix} \dfrac{4\pi}{\lambda}\dfrac{B_\perp}{R_1\sin\theta} & -\dfrac{4\pi}{\lambda} & -2\pi \end{bmatrix}
\begin{bmatrix} h - h_{ref} \\ X - X_{ref} \\ a - a_{ref} \end{bmatrix}
\qquad (3.1)
$$

Here E{·} is the expectation operator. E{φ}, or φ0, is the wrapped expected phase, see Sec. 2.1.1. λ, R1 and θ are the radar wavelength, the distance between the antenna of the first acquisition and the ground target, and the looking angle of the SAR antenna respectively. B⊥ is the perpendicular component of the baseline between the satellite positions, also known as the perpendicular baseline. h, X and a are the parameters of interest: h is the topographic height, X the change in range caused by deformation and/or atmospheric differences between the SAR images, and a the unknown integer ambiguity number of the phase cycles (a ∈ Z). The sign convention used for the estimation of h and X has been explained in Sec. 2.1.1.

For the reference pixel the values of φ0,ref, href, Xref and aref are chosen to be zero. The notation of φ0,ref, href, Xref and aref will be omitted in the following.


Eq. (3.1) can be expanded for N ≥ 2 interferograms, as shown in (3.2).

$$
E\left\{\begin{bmatrix}\phi^{1}\\ \phi^{2}\\ \vdots\\ \phi^{N}\end{bmatrix}\right\}
=\begin{bmatrix}\phi_{0}^{1}\\ \phi_{0}^{2}\\ \vdots\\ \phi_{0}^{N}\end{bmatrix}
=\begin{bmatrix}
\beta_{x}^{1} & -\tfrac{4\pi}{\lambda} & & & -2\pi & & \\
\beta_{x}^{2} & & -\tfrac{4\pi}{\lambda} & & & -2\pi & \\
\vdots & & & \ddots & & & \ddots \\
\beta_{x}^{N} & & & -\tfrac{4\pi}{\lambda} & & & -2\pi
\end{bmatrix}
\begin{bmatrix} h\\ X^{1}\\ X^{2}\\ \vdots\\ X^{N}\\ a^{1}\\ a^{2}\\ \vdots\\ a^{N}\end{bmatrix}
\qquad (3.2)
$$

Here $\tfrac{4\pi}{\lambda}\tfrac{B_\perp^i}{R_1\sin\theta}$ is substituted by $\beta_x^i$, and $\phi_0^i$, $\beta_x^i$ and $X^i$ are the wrapped expected phase, the height-to-phase conversion factor and the change in range of a single common RC for the ith interferogram. The empty space of the design matrix is filled with zeros.

Notice that (3.2) is highly underdetermined. For the next observation models, time-dependent assumptions need to be made in order to increase the redundancy.

My research focuses on parameter estimation with MLE using a single Master image. The advantage of using a single Master image is obvious: all interferograms use the same reference image, so the parameters can be related temporally to one common epoch.

The usage of time series of phase observations based on a single Master image might, however, result in biased parameter estimates. The reason for these biases, and the measure needed to prevent them, can be explained as follows.

The biases induced by using a single Master image for the application of MLE can be explained by taking into account the correlation between the SAR images.

Any interferometric signal based on a single Master stack has uncorrelated and correlated parts. The uncorrelated parts are caused by the signals of the Slave images, while the correlated parts are caused by the signal of the Master image.

Phase noise induced by the Slave part of the interferometric signal can be averaged out: the excess phase noise of an individual Slave part within the stack of interferograms is compensated by the phase noise of the other Slave parts. The phase noise induced by the Master image is, however, always the same, present in every interferogram, and cannot be averaged out. The phase noise induced by the Master image can therefore form a bias within the parameter estimates.

Any bias is the result of imperfections within the observation model. Fortunately the validity of the observation models can be checked: using hypothesis testing any observation model can be either accepted or rejected, resulting in the detection and removal of biases.

Hypothesis testing is discussed in more detail in Sec. 7.1.

In order to discuss the application of MLE to any possible observation model, and to keep my research easier to comprehend, I abstract all the observation models on the one hand to a single generalized observation model, while on the other hand I use several tangible observation models.

First four tangible observation models are discussed, after which the generalized observation model is introduced.


The first observation model states that only the topography can be observed and that no bias induced by the Master image is present. Hence, (3.2) is reduced to:

$$
E\left\{\begin{bmatrix}\phi^{1}\\ \phi^{2}\\ \vdots\\ \phi^{N}\end{bmatrix}\right\}
=\begin{bmatrix}\phi_{0}^{1}\\ \phi_{0}^{2}\\ \vdots\\ \phi_{0}^{N}\end{bmatrix}
=\begin{bmatrix}
\beta_{x}^{1} & -2\pi & & \\
\beta_{x}^{2} & & -2\pi & \\
\vdots & & & \ddots \\
\beta_{x}^{N} & & & -2\pi
\end{bmatrix}
\begin{bmatrix} h\\ a^{1}\\ a^{2}\\ \vdots\\ a^{N}\end{bmatrix}
\qquad (3.3)
$$

The second observation model states that both the height and the bias induced by the Master image can be observed. (3.2) becomes:

$$
E\left\{\begin{bmatrix}\phi^{1}\\ \phi^{2}\\ \vdots\\ \phi^{N}\end{bmatrix}\right\}
=\begin{bmatrix}\phi_{0}^{1}\\ \phi_{0}^{2}\\ \vdots\\ \phi_{0}^{N}\end{bmatrix}
=\begin{bmatrix}
\beta_{x}^{1} & 1 & -2\pi & & \\
\beta_{x}^{2} & 1 & & -2\pi & \\
\vdots & \vdots & & & \ddots \\
\beta_{x}^{N} & 1 & & & -2\pi
\end{bmatrix}
\begin{bmatrix} h\\ \nabla\\ a^{1}\\ a^{2}\\ \vdots\\ a^{N}\end{bmatrix}
\qquad (3.4)
$$

in which ∇ is the bias induced by the Master image.

The third observation model states that both the height and a constant deformation rate can be observed. (3.2) becomes:

$$
E\left\{\begin{bmatrix}\phi^{1}\\ \phi^{2}\\ \vdots\\ \phi^{N}\end{bmatrix}\right\}
=\begin{bmatrix}\phi_{0}^{1}\\ \phi_{0}^{2}\\ \vdots\\ \phi_{0}^{N}\end{bmatrix}
=\begin{bmatrix}
\beta_{x}^{1} & -\tfrac{4\pi}{\lambda}t^{1} & -2\pi & & \\
\beta_{x}^{2} & -\tfrac{4\pi}{\lambda}t^{2} & & -2\pi & \\
\vdots & \vdots & & & \ddots \\
\beta_{x}^{N} & -\tfrac{4\pi}{\lambda}t^{N} & & & -2\pi
\end{bmatrix}
\begin{bmatrix} h\\ D\\ a^{1}\\ a^{2}\\ \vdots\\ a^{N}\end{bmatrix}
\qquad (3.5)
$$

in which D is the deformation rate per year and t^i the time, expressed in years, between the SAR image acquisitions of the ith interferogram. Other deformation models can be found in [3], [20] and [19].

The fourth observation model states that the height, a constant deformation rate and a bias induced by the Master image can be observed. (3.5) becomes:

$$
E\left\{\begin{bmatrix}\phi^{1}\\ \phi^{2}\\ \vdots\\ \phi^{N}\end{bmatrix}\right\}
=\begin{bmatrix}\phi_{0}^{1}\\ \phi_{0}^{2}\\ \vdots\\ \phi_{0}^{N}\end{bmatrix}
=\begin{bmatrix}
\beta_{x}^{1} & -\tfrac{4\pi}{\lambda}t^{1} & 1 & -2\pi & & \\
\beta_{x}^{2} & -\tfrac{4\pi}{\lambda}t^{2} & 1 & & -2\pi & \\
\vdots & \vdots & \vdots & & & \ddots \\
\beta_{x}^{N} & -\tfrac{4\pi}{\lambda}t^{N} & 1 & & & -2\pi
\end{bmatrix}
\begin{bmatrix} h\\ D\\ \nabla\\ a^{1}\\ a^{2}\\ \vdots\\ a^{N}\end{bmatrix}
\qquad (3.6)
$$

All these observation models exist only by virtue of time-dependent assumptions. The topographic height, for instance, is assumed to remain constant in time, while the deformation in the slant direction is assumed to be a linear function of time. These assumptions are needed to increase the redundancy of the problem. In general any time-dependent observation model can therefore be used, as long as the problem is not underdetermined.

An observation model can therefore be generalized as follows. Let x^{1-M} be the parameters to be estimated, including any biases. Applying MLE, the parameters x^{1-M} can only be estimated if they have a time-dependent behavior. A general form of the observation equations for MLE can be expressed mathematically as:

$$
E\left\{\begin{bmatrix}\phi^{1}\\ \phi^{2}\\ \vdots\\ \phi^{N}\end{bmatrix}\right\}
=\begin{bmatrix}\phi_{0}^{1}\\ \phi_{0}^{2}\\ \vdots\\ \phi_{0}^{N}\end{bmatrix}
=\begin{bmatrix}
\kappa_{2\pi}^{1,1} & \kappa_{2\pi}^{1,2} & \dots & \kappa_{2\pi}^{1,M} & -2\pi & & \\
\kappa_{2\pi}^{2,1} & \kappa_{2\pi}^{2,2} & \dots & \kappa_{2\pi}^{2,M} & & -2\pi & \\
\vdots & \vdots & \ddots & \vdots & & & \ddots \\
\kappa_{2\pi}^{N,1} & \kappa_{2\pi}^{N,2} & \dots & \kappa_{2\pi}^{N,M} & & & -2\pi
\end{bmatrix}
\begin{bmatrix} x^{1}\\ x^{2}\\ \vdots\\ x^{M}\\ a^{1}\\ a^{2}\\ \vdots\\ a^{N}\end{bmatrix}
\qquad (3.7)
$$

in which $\kappa_{2\pi}^{1-N,j}$ are the parameter-to-phase conversion factors for the jth parameter. The $\kappa_{2\pi}^{1-N,j}$ are also known as the rank variables. $\kappa_{2\pi}^{i,j}$ is equal to $\tfrac{2\pi}{K_{2\pi}^{i,j}}$, in which $K_{2\pi}^{i,j}$ is the parameter ambiguity.

Sec. 6.1 discusses the constraints that the rank variables need to satisfy in order to apply MLE successfully.
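As an illustration of how such an observation model can be assembled in practice, the sketch below (a minimal, hypothetical Python helper; the function name `design_matrix` and all numerical values are not from the Thesis) builds the design matrix of the height, linear-deformation and Master-bias model of Eq. (3.6).

```python
import numpy as np

def design_matrix(beta_x, t, wavelength, master_bias=True):
    """Assemble the design matrix A of Eq. (3.5)/(3.6);
    parameter order: [h, D, (nabla,) a^1 .. a^N]."""
    beta_x = np.asarray(beta_x, dtype=float)   # height-to-phase factors, one per interferogram
    t = np.asarray(t, dtype=float)             # temporal baselines in years
    N = beta_x.size
    cols = [beta_x, -4.0 * np.pi / wavelength * t]
    if master_bias:
        cols.append(np.ones(N))                # common Master-induced bias column
    A = np.column_stack(cols + [-2.0 * np.pi * np.eye(N)])  # integer-ambiguity columns
    return A

# Example with five interferograms (illustrative numbers only)
A = design_matrix(beta_x=[0.31, 0.26, 0.35, 0.22, 0.29],
                  t=[0.1, 0.4, 1.0, 1.5, 2.1],
                  wavelength=0.0566)
print(A.shape)   # (5, 8): h, D, bias and five integer ambiguities
```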

3.1.2 The likelihood of parameters

In this section the computation of the likelihood of a single set of parameter values is discussed. This means that multiple parameters x^{1-M} can be used within the computation, but that each parameter is assigned no more than one value.

The computation of the likelihood takes place on an arc based on time series of phase observations. Since the application of MLE is here always based on time series of phase observations, the term “arc based on time series of phase observations” is shortened simply to “arc”. Furthermore, the notation “arc_art.ref.” is used to indicate that the corresponding arc is based on an artificial reference point.

To compute the likelihood of any parameter values for an arc_art.ref., the probability P(x^{1-M}|φ^{1-N}) needs to be computed.

P(x^{1-M}|φ^{1-N}) cannot be computed directly however. Consequently Bayes' rule needs to be applied.

Bayes' rule states that:

$$
P(B|A) = \frac{P(A|B)\,P(B)}{P(A)} \qquad (3.8)
$$

in which A and B are events and P(·) and P(·|·) are the probability and the conditional probability respectively. The first argument of P(·|·) is conditioned on the second.

Bayes' rule applied to P(x^{1-M}|φ^{1-N}) gives:

$$
P(x^{1-M}|\phi^{1-N}) = \frac{P(\phi^{1-N}|x^{1-M})\,P(x^{1-M})}{P(\phi^{1-N})} \qquad (3.9)
$$


If independence can be assumed, the correlation between the different events does not need to be known, resulting in the following:

$$
P(A^{1-N}) = \prod_{i=1}^{N} P(A^{i}) \qquad (3.10)
$$

in which N is the number of events. Assuming independence of the phase observations results for P(φ^{1-N}|x^{1-M}) and P(φ^{1-N}) in the following:

$$
P(\phi^{1-N}|x^{1-M}) = \prod_{i=1}^{N} P(\phi^{i}|x^{1-M}), \qquad
P(\phi^{1-N}) = \prod_{i=1}^{N} P(\phi^{i}) \qquad (3.11)
$$

As long as the phase observations have been taken independently from each other, Eq. (3.10) can be applied.

Notice that for a single Master stack all the interferograms are correlated with the Master, and thus not independent of each other. However, if the bias induced by the Master image is selected as a parameter to estimate, independence can still be obtained.

Assuming independence of the phase observations results in:

$$
P(x^{1-M}|\phi^{1-N}) = \frac{\displaystyle\prod_{i=1}^{N} P(\phi^{i}|x^{1-M})\; P(x^{1-M})}{\displaystyle\prod_{i=1}^{N} P(\phi^{i})} \qquad (3.12)
$$

It might now be possible to compute P(x^{1-M}|φ^{1-N}). To do so, the right-hand side of Eq. (3.12) needs to be analyzed further.

P(φ^i|x^{1-M}) is the conditional phase probability for a RC^i. The parameter values x^{1-M} invoke a certain value of the expected phase φ_0^i via an observation model, after which the phase observation φ^i assigns a PDF value according to the shape of the phase PDF.

The product of the N phase PDF values results in a comparison of the expected phase behavior φ_0^{1-N} with all the available phase observations φ^{1-N}. This results in an assessment of the parameter values in the form of an assigned probability.

The individual probability P(φ^i) is easy to compute if the PDF(φ^i) is assumed to be uniform. P(φ^i) can therefore be replaced by the constant $\tfrac{1}{2\pi}$, and it follows that $\prod_{i=1}^{N} P(\phi^{i})$ can be replaced by $\tfrac{1}{(2\pi)^{N}}$.

The computation of P(x^{1-M}) might be difficult as it depends on a-priori data. A PDF of x^{1-M} needs to be set up in order to compute P(x^{1-M}).

The PDF(x^{1-M}) can be modelled in several ways according to the a-priori data and the nature of the parameters. If the parameters are independent of each other, the PDF(x^{1-M}) can be split up according to Eq. (3.10).

If little information is available about a parameter, pessimistic boundaries can be set according to common sense. In that case a boxcar PDF $\Pi_{a,b}$ can be assumed for any independent PDF(x^j), with a and b the boundaries of the distribution of the jth parameter.
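A minimal numerical sketch of the likelihood of a single set of parameter values, following Eq. (3.12), is given below. It assumes single-look observations and the single-look phase PDF that will be given in Eq. (3.21) of Sec. 3.2; the function names are hypothetical, and wrapping the residual phase absorbs the −2π·a^i terms of the observation model.

```python
import numpy as np

def phase_pdf_single_look(dphi, coh):
    """Single-look interferometric phase PDF (see Eq. (3.21)), dphi = phi - phi_0, coh < 1."""
    beta = coh * np.cos(dphi)
    return (1 - coh**2) / (2 * np.pi) * (
        1 / (1 - beta**2) + beta * (np.pi / 2 + np.arcsin(beta)) / (1 - beta**2) ** 1.5
    )

def likelihood_single_set(phi_obs, coh, A, x, prior=1.0):
    """Likelihood of one set of parameter values x, following Eq. (3.12):
    product of the N conditional phase PDFs times the prior, divided by
    prod P(phi^i) = (2*pi)^(-N). A contains only the parameter columns;
    the integer ambiguities are absorbed by wrapping the residual."""
    phi_expected = A @ np.asarray(x, float)
    dphi = np.angle(np.exp(1j * (np.asarray(phi_obs, float) - phi_expected)))
    N = len(phi_obs)
    return prior * np.prod(phase_pdf_single_look(dphi, np.asarray(coh, float))) * (2 * np.pi) ** N

# Hypothetical usage with a height-only model: A = beta_x[:, None], x = [h]
```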


The likelihood P(x^{1-M}|φ^{1-N}) of an arc_art.ref. can now be computed and expanded for multiple sets of parameter values.

3.1.3 The computation and assessment of a likelihood function

The computation of the likelihood of multiple sets of parameter values results in the likelihood function PDF(x^{1-M}|φ^{1-N}). The computation of the likelihood function is straightforward, since the computation of the likelihood P(x^{1-M}|φ^{1-N}) simply needs to be repeated over the domain of the parameters. The domain of the likelihood function has as many dimensions as there are parameters to estimate.

The assessment of the likelihood function can be done in several ways and consists of parameter and reliability estimation. Parameter estimation is usually done by MLE; here MLE and the reliability estimators known from the literature are discussed. Several additional parameter and reliability estimators are introduced in Chapter 4.

To differentiate between symbols that comprise single or multiple values, boldface notation is used in this Thesis for symbols that comprise multiple values.

In my research the multiple sets of parameter values are distributed according to the boxcar distributions of the M parameters of interest. To differentiate between single and multiple sets of parameter values, multiple sets of parameter values are written in boldface, i.e. **x**^{1-M}.

Ferretti et al. [6] and Eineder and Adam [5] used MLE to estimate the expected value of the height. The Maximum Likelihood (ML) height estimate is equal to:

$$
h_{ML} = \arg\max_{h} \mathrm{PDF}(h|\phi^{1-N}) \qquad (3.13)
$$

This can be generalized to any observation model:

$$
x^{1-M}_{ML} = \arg\max_{x^{1-M}} \mathrm{PDF}(x^{1-M}|\phi^{1-N}) \qquad (3.14)
$$

x^{1-M}_ML can only be a unique estimate if:

• the distribution PDF(x^{1-M}) of Eq. (3.12) has such small finite boundaries that φ_0^{1-N} does not need phase unwrapping; or

• the distribution PDF(x^{1-M}) of Eq. (3.12) has finite boundaries and the design matrix A satisfies certain conditions. The details of those conditions are not discussed here; Sec. 6.1 discusses them in more detail.

To assess the quality of an ML parameter estimate, Eineder and Adam [5] derived the height variance σ²_h as follows:

$$
\sigma_h^2 = \frac{\int_{h_a}^{h_b} \mathrm{PDF}(h|\phi^{1-N})\,(h - h_{ML})^2\, dh}{\int_{h_a}^{h_b} \mathrm{PDF}(h|\phi^{1-N})\, dh} \qquad (3.15)
$$

in which ha and hb are the finite boundaries of the height variable h.


Ferretti et al. [6] assessed the quality of the estimate in a different way. Their definition of the reliability R is based on the probability mass within an allowed height variation:

$$
R = \int_{h_{ML}-C}^{h_{ML}+C} \mathrm{PDF}(h|\phi^{1-N})\, dh \qquad (3.16)
$$

Here C is the allowed height variation.
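The sketch below illustrates, for a hypothetical two-interferogram example with height ambiguities of 20 and 24 meters, how the likelihood function over a bounded height domain and the estimators of Eq. (3.13)–(3.16) could be evaluated numerically. The phase noise is only crudely simulated with Gaussian perturbations, and all names and numbers are illustrative rather than taken from the Thesis.

```python
import numpy as np

def phase_pdf_single_look(dphi, coh):
    """Single-look interferometric phase PDF (Eq. (3.21))."""
    beta = coh * np.cos(dphi)
    return (1 - coh**2) / (2 * np.pi) * (
        1 / (1 - beta**2) + beta * (np.pi / 2 + np.arcsin(beta)) / (1 - beta**2) ** 1.5
    )

h2pi = np.array([20.0, 24.0])        # height ambiguities [m/cycle]
coh = np.array([0.3, 0.5])           # coherence magnitudes
h_true = 16.0
rng = np.random.default_rng(1)
phi_obs = np.angle(np.exp(1j * (2 * np.pi * h_true / h2pi + 0.5 * rng.standard_normal(2))))

# Likelihood function over a bounded height domain (boxcar prior)
h = np.linspace(-60.0, 60.0, 4001)
dphi = np.angle(np.exp(1j * (phi_obs[:, None] - 2 * np.pi * h[None, :] / h2pi[:, None])))
lik = np.prod(phase_pdf_single_look(dphi, coh[:, None]), axis=0)
lik /= np.trapz(lik, h)              # normalize over the height domain

# ML estimate (Eq. (3.14)), variance (Eq. (3.15)) and reliability (Eq. (3.16), C = 10 m)
h_ml = h[np.argmax(lik)]
var_h = np.trapz(lik * (h - h_ml) ** 2, h)
C = 10.0
mask = (h >= h_ml - C) & (h <= h_ml + C)
R = np.trapz(lik[mask], h[mask])
print(f"h_ML = {h_ml:.1f} m, sigma_h = {np.sqrt(var_h):.1f} m, R = {R:.2f}")
```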

More parameter and reliability estimators, generalized for any observation model that satisfies the conditions for applying MLE, are discussed in Chap. 4.

3.1.4 On the uncertainty of a reference pixel

Sec. 3.1.2 explained that in my research the values of the reference point are arbitrarily set to zero. The consequence of this choice is that all the arcs_art.ref. within the stack of interferograms have biased parameter estimates. Moreover, each parameter of each arc_art.ref. will be biased by the same value, since all the arcs_art.ref. within the stack of interferograms use the same artificial reference point.

There exist several ways to cancel the biases.

The best option to incorporate a real reference point is, in a qualitative sense, to relate all the information of two arcs_art.ref. to each other. This is done by computing the convolution of the two likelihood functions and results in a likelihood function of the two corresponding time series of phase observations [26].

Let **x**_1 and **x**_2 be multiple sets of a single parameter x for the first and second arc_art.ref., and let ∆x be the difference between the values of x_1 and x_2: ∆x = x_2 − x_1.

By assuming that the parameters x_1 and x_2 are mutually independent, the convolution of the two likelihood functions as a function of the variable ∆x can be defined as:

$$
\mathrm{PDF}_{arc}(\Delta x) = (\mathrm{PDF}_{arc_{art.ref.1}} \star \mathrm{PDF}_{arc_{art.ref.2}})(\Delta x)
= \int_{x_1} \mathrm{PDF}_{arc_{art.ref.1}}(x_1)\,\mathrm{PDF}_{arc_{art.ref.2}}(x_1 + \Delta x)\, dx_1 \qquad (3.17)
$$

Equation (3.17) can be generalized to the situation of M parameters:

$$
\mathrm{PDF}_{arc}(\Delta x^{1-M}) = (\mathrm{PDF}_{arc_{art.ref.1}} \star \mathrm{PDF}_{arc_{art.ref.2}})(\Delta x^{1-M}) \qquad (3.18)
$$
$$
= \int_{x_1^1}\!\int_{x_1^2}\!\cdots\!\int_{x_1^M}
\mathrm{PDF}_{arc_{art.ref.1}}\!\begin{pmatrix} x_1^1\\ x_1^2\\ \vdots\\ x_1^M\end{pmatrix}
\mathrm{PDF}_{arc_{art.ref.2}}\!\begin{pmatrix} x_1^1+\Delta x^1\\ x_1^2+\Delta x^2\\ \vdots\\ x_1^M+\Delta x^M\end{pmatrix}
dx_1^1\, dx_1^2 \cdots dx_1^M \qquad (3.19)
$$

The likelihood function PDF_arc allows the parameters to be estimated directly. The uncertainty of the reference pixel is also taken into account by the reliability estimators of (3.15) and (3.16), or by any other reliability estimator (see Chap. 4).

The convolution of the likelihood functions of the two arcs_art.ref. is easy to apply and, from a qualitative point of view, also the best way to deal with the uncertainty of a reference pixel: all the information of the likelihood functions of the two arcs_art.ref. is taken into account.
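Numerically, the convolution of Eq. (3.17) for two likelihood functions sampled on the same regular grid reduces to a cross-correlation. A minimal sketch is given below; the function name `convolve_likelihoods` and the usage names are hypothetical.

```python
import numpy as np

def convolve_likelihoods(pdf1, pdf2, dx):
    """Eq. (3.17): likelihood of the difference x2 - x1 between two arcs, given two
    likelihood functions sampled on the same regular grid with spacing dx."""
    # integral over x1 of pdf1(x1) * pdf2(x1 + lag) = cross-correlation of the two samples
    full = np.correlate(pdf2, pdf1, mode="full") * dx
    lags = (np.arange(full.size) - (pdf1.size - 1)) * dx   # values of the difference
    return lags, full / np.trapz(full, lags)               # normalized likelihood of the difference

# Hypothetical usage with two likelihood functions lik_P and lik_Q sampled on a height grid h:
# dh = h[1] - h[0]
# dlags, lik_arc = convolve_likelihoods(lik_P, lik_Q, dh)
# h_ml_arc = dlags[np.argmax(lik_arc)]
```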

The convolution of the likelihood functions of the two arcs_art.ref. can, however, be expensive to compute. Only by making heuristic assumptions can the computation time be decreased.

An example of such a heuristic approach is to compute the ML estimates of two arcs_art.ref. and then difference them, which yields the parameter estimates of an arc. In that case the bias is cancelled. On the other hand, the cross-correlation of the stochastic information of the two arcs_art.ref. is not taken into account, resulting in higher estimation noise than when the convolution is applied. The computation time is significantly reduced however.

Another approach is to relate all the arcs_art.ref. to those arcs_art.ref. that have a high reliability. The time series of phase observations of the reliable reference pixel is thereby considered to be known deterministically, ignoring the stochastic information of the reference pixel. The reliability of an arc is then based only on the likelihood function of an arc_art.ref.. This approach reduces the computation time even further.

The latter approach was taken by Ferretti et al. and by Eineder and Adam, who used the best available reference pixel and assumed that its estimate is known deterministically [6] [5].

Of course many more heuristic assumptions can be made. The choice of heuristic assumption depends on the desired quality of the estimates and on the available computation time, and can in general be adapted to taste.

3.1.5 An example of the application of MLE

In this section an example of the application of MLE is visualized.

Two interferograms are available that are formed from ERS data. Within both interferograms two pixels, pixel P and pixel Q, are studied that have the same row and column position in both images, see Fig. 3.1.

For the first and second interferogram the perpendicular baselines $B_\perp^{1,2}$ are such that the height ambiguities $h_{2\pi}^{1,2}$ are equal to 20 and 24 meters per phase cycle. For simplicity it is assumed that $B_\perp^{1,2}$ do not vary within the interferograms. For P the magnitude of coherence $|\gamma^{1,2}|$ has been set to 0.3 and 0.5 respectively, while for Q $|\gamma^{1,2}|$ equals 0.6 and 0.4 respectively. No multi-looking has been applied (L = 1).

Here we wish to estimate the height difference between points P and Q according to (3.3), and to estimate the reliability of the height estimate using the definition of (3.16). C is here set to 10 meters.

Furthermore, the a-priori data state that the PDF(h) for both P and Q can be modelled as boxcar distributions: PDF(h_P) = $\Pi_{450,550}$ and PDF(h_Q) = $\Pi_{470,570}$. It is assumed that the observation model is correct and that no biases were induced by the Master image. Realistic phase observations have been simulated; the simulation of phase observations is discussed in Chapter 5.

The reference height href and expected reference phase φ0,ref of an artificial reference point are set to zero. Both P and Q are related to this reference point.

Considering only P for the moment, two likelihood functions are computed, one per interferogram.

While $\phi^{1-2}$ are known directly from the interferograms, and $|\gamma^{1-2}|$ can be estimated directly from the interferograms, $\phi_0^{1-2}$ are set by the height domain and the artificial reference point. Following the observation model of (3.3) there exists a one-to-one relationship between the expected phase and the height.



Figure 3.1: Two interferograms with RCs P and Q. The focus lies on estimating the height difference between the two RCs. The height ambiguities $h_{2\pi}^{1,2}$ are 20 and 24 meters for the first and second interferogram respectively. The magnitude of coherence |γ| of P equals 0.3 in the first interferogram and 0.5 in the second; for Q, |γ| equals 0.6 and 0.4 respectively.

With φ and |γ| known and φ0 derived from the observation model, two likelihood functions for the height can be computed.

Fig. 3.2(a) and 3.2(b) show the two likelihood functions. Since the phase observations are known modulo 2π, the shape of the phase PDF is repeated in the height domain, resulting in multi-modal likelihood functions. In Fig. 3.2(a) and 3.2(b) this can be seen from the repetition intervals defined by the height ambiguities of 20 and 24 meters respectively. Notice that the shapes of the likelihood functions are governed by the shapes of the phase PDFs.

In both cases the ML estimate of the height cannot be determined since there is no unique global maximum. This could of course have been known beforehand, as the phase observations are only known modulo 2π.

From Fig. 3.2(a) and 3.2(b) it can only be concluded that a parameter, in this case the topographic height, cannot be estimated using a likelihood function based on a single phase observation, except in the trivial case in which the phase observations do not need to be unwrapped.

(a) Likelihood function of an arc_art.ref. of P based on the first interferogram. (b) Likelihood function of an arc_art.ref. of P based on the second interferogram.

Figure 3.2: In (a) the likelihood function for the height is shown for an arc_P,art.ref. based on pixel P within the first interferogram; $h_{2\pi}^{1}$ is equal to 20 meters per phase cycle and |γ| to 0.3. In (b) the likelihood function for the height is shown for an arc_P,art.ref. based on pixel P within the second interferogram; $h_{2\pi}^{2}$ is equal to 24 meters per phase cycle and |γ| to 0.5. No multi-looking has been applied. Both likelihood functions have been computed using the observation model of (3.3).


If the height can be assumed constant during the time intervals between the SAR images, time series of phase observations can be used for the estimation of the height. A time-dependent relationship, in this case the height being constant in time, will in general allow a parameter to be estimated unambiguously.

Notice that the assumption of a constant height is in general acceptable: only if major events like earthquakes or volcanic eruptions occur would the height change so considerably that this assumption becomes invalid.

Fig. 3.3(a) and 3.3(b) show the likelihood functions PDF_P(h|φ^{1-2}) and PDF_Q(h|φ^{1-2}). In Fig. 3.3(a) three likelihood functions for the height of P are shown: two based on only the first and only the second interferogram, and one based on both interferograms. Fig. 3.3(b) shows only the likelihood function of Q based on both interferograms.

Fig. 3.3(a) shows that the likelihood function based on both interferograms has its global maximum at the location where the product of the values of the two single-interferogram likelihood functions is highest.

The likelihood functions for P and Q shown in Fig. 3.3(a) and 3.3(b) have been computed with respect to an artificial reference point with $h_{ref}$ and $\phi_{0,ref}^{1-2}$ set to zero. The true $\phi_{0,ref}^{1-2}$ at $h_{ref}$ are not known however, and therefore the height PDFs are biased.

(a) Likelihood function of the height for P based on only the first, only the second and both interferograms. (b) Likelihood function of the height for Q based on both interferograms.

Figure 3.3: In (a) the likelihood function is shown for P based on only the first, only the second, and both the first and second interferogram. $h_{2\pi}^{1,2}$ is equal to 20 and 24 meters per phase cycle; |γ| is equal to 0.3 and 0.5 for P in the first and second interferogram respectively. In (b) the likelihood function for Q is shown based on both the first and second interferogram; |γ| is equal to 0.6 and 0.4 for Q in the first and second interferogram respectively. No multi-looking has been applied, and for both P and Q artificial reference points have been used.

To compute the height difference between P and Q, the convolution of the likelihood functions of P and Q has been computed. The resulting likelihood function for the arc P–Q is shown in Fig. 3.4.

Fig. 3.4 again shows a global maximum, now for the unbiased height parameter. Due to the uncertainty of both P and Q the likelihood function is broadened. As can be read from the figure, h_ML is equal to 16.4 meters. Its reliability as defined by (3.16) is equal to 19.3%, as indicated by the probability mass marked in red.


Figure 3.4: The likelihood function for an arc between P and Q. h_ML is here equal to 16.4 meters. Its reliability is equal to 19.3% and is indicated by the probability mass in red.

3.2 The interferometric phase PDF

Assuming circular Gaussian distributions of the complex SAR signals, the PDF of the interferometric phase has been derived and published in [27] as:

$$
\mathrm{PDF}_{InSAR}(\phi;\phi_0;|\gamma|;L) = \frac{(1-|\gamma|^2)^{L}}{2\pi}
\Bigg\{ \frac{\Gamma(2L-1)}{[\Gamma(L)]^{2}\,2^{2(L-1)}}
\bigg[ \frac{(2L-1)\beta}{(1-\beta^{2})^{L+\frac12}}\Big(\frac{\pi}{2}+\arcsin\beta\Big) + \frac{1}{(1-\beta^{2})^{L}} \bigg]
\qquad (3.20)
$$
$$
\qquad\qquad + \frac{1}{2(L-1)}\sum_{r=0}^{L-2}\frac{\Gamma(L-\frac12)}{\Gamma(L-\frac12-r)}\,
\frac{\Gamma(L-1-r)}{\Gamma(L-1)}\,\frac{1+(2r+1)\beta^{2}}{(1-\beta^{2})^{r+2}} \Bigg\}
$$

in which φ is the observed phase, φ0 the expected phase, L the effective number of looks and |γ| the magnitude of coherence. The difference between φ and φ0 is an important variable here and is denoted as ∆φ.

Furthermore, Γ denotes the Gamma function and β = |γ| cos(φ − φ0) = |γ| cos(∆φ). The observed phase φ is known up to an ambiguity of 2π and usually takes values on the interval [−π, π]; φ is therefore “wrapped”. Here φ0 is likewise parameterized on the interval [−π, π], resulting in a unimodal phase PDF.

The multi-looking factor L can be chosen and needs to be a positive integer (L ∈ N) corresponding to the number of samples averaged.

Notice that Lee et al. [21] used alternative expressions based on hypergeometric functions. Hypergeometric functions make use of infinite power series, which can be evaluated with good approximation for low multi-looking factors (L < 10) and/or low magnitudes of coherence (|γ| < 0.9). As numerical problems have been experienced for high multi-looking factors and/or high magnitudes of coherence, the PDF is evaluated here by Eq. (3.20).


The interferometric phase PDF for single looking (L = 1) is [27]:

$$
\mathrm{PDF}_{InSAR}(\phi;\phi_0;|\gamma|;L=1) = \frac{1-|\gamma|^{2}}{2\pi}
\left\{ \frac{\beta\left(\frac{\pi}{2}+\arcsin\beta\right)}{(1-\beta^{2})^{3/2}} + \frac{1}{1-\beta^{2}} \right\}
\qquad (3.21)
$$
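As a numerical sanity check on (3.20) and (3.21), the sketch below evaluates the multi-looked phase PDF with scipy's log-gamma function and verifies that it integrates to approximately one. The 1/(2π) normalization shown above is assumed, and the function name `insar_phase_pdf` is hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def insar_phase_pdf(phi, phi0, coh, L):
    """Multi-looked interferometric phase PDF of Eq. (3.20); reduces to Eq. (3.21) for L = 1."""
    beta = coh * np.cos(phi - phi0)
    first = np.exp(gammaln(2 * L - 1) - 2 * gammaln(L) - 2 * (L - 1) * np.log(2)) * (
        (2 * L - 1) * beta * (np.pi / 2 + np.arcsin(beta)) / (1 - beta**2) ** (L + 0.5)
        + 1.0 / (1 - beta**2) ** L
    )
    if L > 1:
        r = np.arange(L - 1)                                    # r = 0 .. L-2
        weights = np.exp(gammaln(L - 0.5) - gammaln(L - 0.5 - r)
                         + gammaln(L - 1 - r) - gammaln(L - 1))
        summation = np.sum(weights[:, None] * (1 + (2 * r[:, None] + 1) * beta**2)
                           / (1 - beta**2) ** (r[:, None] + 2), axis=0)
        first = first + summation / (2 * (L - 1))
    return (1 - coh**2) ** L / (2 * np.pi) * first

phi = np.linspace(-np.pi, np.pi, 20001)
for L in (1, 2, 5, 10):
    pdf = insar_phase_pdf(phi, 0.0, 0.7, L)
    print(L, np.trapz(pdf, phi))   # each value should be close to 1
```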

In Fig. 3.5(a) the phase PDF is shown for |γ| = 0, 0.5, 0.8 and 0.9 and L = 1, with ∆φ ranging from −π to π. The figure shows that for a pixel with zero coherence the PDF of the phase becomes uniform. For PDFs with a coherence larger than zero, in general the case, a peak is formed. The expected phase has here been set to zero.

As can be seen in the figure, an increase in coherence makes the peak of the phase PDF higher and narrower and thins the tails. For a magnitude of coherence of 1 the PDF loses its tails and becomes a Dirac delta function (not shown in the figure).

In Fig. 3.5(b) the phase PDF is shown for L = 1, 2, 5 and 10 with |γ| = 0.7. An increasing multi-looking factor L has essentially the same effect on the PDF as an increase in coherence, as can be seen in the figures.

This can also be seen in Fig. 3.6(a) and 3.6(b), in which the peak value and the phase variance of the PDFs are shown for |γ| < 0.9 and L ranging from 1 to 10.

(a) Effects of different coherence magnitudes. (b) Effects of different multi-looking factors.

Figure 3.5: In (a) several phase PDFs for different magnitudes of coherence are shown; the numbers correspond to the magnitude of coherence of each PDF, using a multi-looking factor of one (L = 1). (b) shows the phase PDFs for different multi-looking factors with |γ| = 0.7; the numbers indicate the corresponding multi-looking factors. In both panels the expected phase φ0 has been set to zero.


(a) Peak values of the phase PDFs. (b) Phase variance of the phase PDFs.

Figure 3.6: In (a) and (b) the maximum probability density and the variance, respectively, of phase PDFs for different combinations of the magnitude of coherence and the multi-looking factor are shown.

3.3 An interferometric system overview

The magnitude of coherence is one of the arguments of the interferometric phase PDF discussed in Sec. 3.2 and is used in radar interferometry as a measure of the (de)correlation between two SAR signals [31]. It is derived from the (complex) coherence, also known as the complex correlation coefficient, and ranges from zero to one: zero indicates no correlation and one full correlation.

Decorrelation between two SAR signals is both a gift and a nuisance for data interpretation: on the one hand it can be used as a source of information about the changes that have occurred over time within a scene, while on the other hand it contributes to estimation noise. This makes the magnitude of coherence, and its estimation, a main topic within the field of radar interferometry.

To gain more insight into radar interferometry and the magnitude of coherence, an interferometric system model is studied in this section.

Fig. 3.7 shows the interferometric system model in which several sources of decorrelation are introduced. Many sources of decorrelation exist [15] [31] [18]. In this system model the following decorrelation sources are considered:

• decorrelation caused by thermal or SAR receiver noise |γ|thermal;

• decorrelation caused by coregistration and interpolation errors in the resampling of the slave image |γ|coreg;

• temporal scene decorrelation, including volume scattering |γ|temporal;

• geometric decorrelation |γ|geom;

• Doppler centroid decorrelation |γ|DC .

The system model shows a repeat-pass interferometric system as a convolution of two independent SAR system processes. The system model has been partially adopted from [18]. x(t), the scene reflectivity during time epoch t, is a deterministic representation of the phenomena occurring within the scene. x(t) is a signal of topography, deformation, atmosphere and other parameters and is observed twice by a SAR system, at time epochs t1 and t2.


During the SAR acquisitions, thermal noise n_thermal is introduced by the transfer functions H11 and H21. At this stage both the resulting signals and the thermal noise are modelled as complex (circular) Gaussian distributions. They are assumed to be statistically independent, zero-mean random processes.

After the SAR acquisitions, the signals reach the processing systems H12 and H22 that represent the SAR image processing. Following H22 the slave image still needs to be resampled before an interferogram can be formed. This process, indicated by H23, introduces additional coregistration and interpolation errors n_coreg.

At this stage the signals have become realistic Single Look Complex (SLC) signals y1(t) and y2(t), prior to the convolution of the signals:

$$
y(t_1) = y_1 = |y_1|e^{j\psi_1}, \qquad y(t_2) = y_2 = |y_2|e^{j\psi_2} \qquad (3.22)
$$

where |y1| and |y2| are the magnitudes and ψ1 and ψ2 the phases of the two SAR signals; j is the imaginary unit.

The two SAR images are now almost ready to be processed into an interferogram. Due to differences in the acquisition process between the two SAR acquisitions, additional noise needs to be added to the system model. In this model these differences are due to (possible) temporal changes in the scattering characteristics of the phenomena, different viewing geometries and Doppler centroid differences, and they cause the temporal, geometric and Doppler noise n_temporal, n_geom and n_DC respectively during the convolution of the SAR signals².

The convolution results in the interferometric signal z:

$$
z = y_1 y_2^{*} = |y_1||y_2|\,e^{j(\psi_1-\psi_2)} = |z|e^{j\phi} \qquad (3.23)
$$

where |z| is the magnitude and φ the (random) phase of the interferometric signal. The symbol * denotes the complex conjugate.

In the final stage, after the different kinds of noise have been added, the SAR and interferometric signals can be compared with each other through the complex coherence γ:

$$
\gamma = \frac{E\{y_1 y_2^{*}\}}{\sqrt{E\{|y_1|^{2}\}\,E\{|y_2|^{2}\}}} = |\gamma|e^{j\phi_0} \qquad (3.24)
$$

in which E{·} is the mathematical expectation operator. The magnitude of coherence can be defined as the modulus of the complex coherence:

$$
|\gamma| = \frac{|E\{y_1 y_2^{*}\}|}{\sqrt{E\{|y_1|^{2}\}\,E\{|y_2|^{2}\}}} \qquad (3.25)
$$

or alternatively, as the product of the different sources of decorrelation [15] [31]:

|γ| = |γ|thermal × |γ|coreg × |γ|temporal × |γ|geom × |γ|DC (3.26)

²The geometric artifacts of foreshortening, layover and shadow are not considered here as noise in the interferometric system, but rather as results of the SAR viewing geometry.


Figure 3.7: An interferometric system model. (See the text for a detailed explanation.)

3.4 Estimation of the coherence

In this section the estimation of the magnitude of coherence |γ| is discussed. Coherence estimators based on the assumption of ergodicity are discussed in Sec. 3.4.1, while the theoretical decomposition of the magnitude of coherence is discussed in Sec. 3.4.2.

3.4.1 Coherence estimation based on ergodicity

In a RC of an interferogram the complex SAR signals measure a summation of scatterers. Even under the assumption that this summation can be perfectly described by a PDF, as is done here by a complex circular Gaussian distribution, the true complex coherence of the two signals cannot be estimated: only one realization of the PDF is available per pixel, which is not sufficient for any statistical measure.

Nonetheless, if the spatial observations are assumed to behave equivalently to ensemble observations, i.e. if ergodicity is assumed, |γ| can still be estimated. In that case the data in a spatial window near a RC are used to estimate the magnitude of coherence for the corresponding RC. Assuming spatial ergodicity and the correct removal of the reference fringes results in the following estimator of |γ| [24]:

$$
\hat{|\gamma|} = \frac{\left|\sum_{i=1}^{N} y_1^{(i)} y_2^{*(i)}\right|}{\sqrt{\sum_{i=1}^{N} |y_1^{(i)}|^{2}\,\sum_{i=1}^{N} |y_2^{(i)}|^{2}}} \qquad (3.27)
$$

in which $\hat{|\gamma|}$ is the estimate of the magnitude of coherence, N the number of pixels of the Coherence Estimation Window (CEW), i the pixel index, and y_{1,2} the SAR signals as defined in Eq. (3.22). The symbol * denotes the complex conjugate.

The estimation of the magnitude of coherence by Eq. (3.27) is biased due to noise. This noise is not the noise induced by the interferometric system of Sec. 3.3, but is caused by the estimator itself. Furthermore, since the CEW has a finite size, the variation of the expected phase φ0 within the CEW induces an additional bias. To compensate for the bias caused by the expected phase, the magnitude of coherence can be estimated as [13]:

$$
\hat{|\gamma|} = \frac{\left|\sum_{i=1}^{N} y_1^{(i)} y_2^{*(i)}\, e^{-j\hat\phi_0^{(i)}}\right|}{\sqrt{\sum_{i=1}^{N} |y_1^{(i)}|^{2}\,\sum_{i=1}^{N} |y_2^{(i)}|^{2}}} \qquad (3.28)
$$

in which $\hat\phi_0^{(i)}$ is the estimate of the expected phase at the ith RC within the CEW. Notice that Eq. (3.28) is still biased due to the finite number of samples within the CEW.
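A windowed implementation of the estimators (3.27) and (3.28) could look as follows. This is a minimal sketch using a uniform moving-average window; the function name `estimate_coherence` and the simulated-signal example are hypothetical. The example also illustrates the overestimation bias for a small CEW.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def estimate_coherence(y1, y2, window=5, phi0_hat=None):
    """Windowed coherence magnitude estimator, Eq. (3.27); if an estimate of the
    expected phase phi0_hat is given, the fringes are removed first as in Eq. (3.28)."""
    cross = y1 * np.conj(y2)
    if phi0_hat is not None:
        cross = cross * np.exp(-1j * phi0_hat)      # compensate the expected phase
    num = np.abs(uniform_filter(cross.real, window) + 1j * uniform_filter(cross.imag, window))
    den = np.sqrt(uniform_filter(np.abs(y1) ** 2, window) *
                  uniform_filter(np.abs(y2) ** 2, window))
    return num / np.maximum(den, 1e-12)

# Example with simulated correlated circular Gaussian SLC signals (true |gamma| = 0.6)
rng = np.random.default_rng(2)
shape = (200, 200)
c = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
n2 = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
g = 0.6
y1 = c
y2 = g * c + np.sqrt(1 - g**2) * n2
# expected to lie slightly above 0.6 because of the finite-window bias
print("mean estimated |gamma|:", estimate_coherence(y1, y2, 5).mean())
```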

The biases due to the estimation noise and to φ0 in the estimation of the magnitude of coherence can be explained well by taking into account the correlation model of [31]. In this model the SLC signals y_{1,2} are split into correlated parts c_{1,2} and uncorrelated noise parts n_{1,2}:

$$
y_1^{(i)} = c_1^{(i)} + n_1^{(i)}, \qquad y_2^{(i)} = c_2^{(i)} + n_2^{(i)} \qquad (3.29)
$$

in which i is the pixel index within a CEW of N pixels. Furthermore, c_2 can be modelled as a function of c_1 and the expected phase:

$$
c_2^{(i)} = c_1^{(i)}\, e^{-j\phi_0^{(i)}} \qquad (3.30)
$$

Substituting Eq. (3.29) and (3.30) into the numerator of Eq. (3.28) gives:

$$
\left|\sum_{i=1}^{N} y_1^{(i)} y_2^{*(i)}\right| =
\left|\sum_{i=1}^{N}\left( c_1^{(i)} c_1^{*(i)} e^{-j\phi_0^{(i)}} + c_1^{(i)} n_2^{*(i)} + c_1^{*(i)} e^{-j\phi_0^{(i)}} n_1^{(i)} + n_1^{(i)} n_2^{*(i)} \right)\right| \qquad (3.31)
$$

The presence of the sinusoidal factor in the first term of Eq. (3.31) causes |γ| to be underestimated, while the (random) noise in the remaining terms, which in general is not zero, causes |γ| to be overestimated. Practical computations of the magnitude of coherence yield an estimate that is higher than the true value [29].

Several estimators of the expected phase exist. For homogeneous targets the ML estimator of φ0 based on the phase values of distributed targets within an estimation window is given by [23]:

$$
\hat\phi_0 = \arctan\left[\frac{\mathrm{Im}\left\{\sum_{i=1}^{N} y_1^{(i)} y_2^{*(i)}\right\}}{\mathrm{Re}\left\{\sum_{i=1}^{N} y_1^{(i)} y_2^{*(i)}\right\}}\right] \qquad (3.32)
$$

in which Im{·} and Re{·} are the imaginary and real parts of the complex signals. Other methods estimate φ0 on the basis of a fringe rate within a CEW [29] [22]. The shape and size of the estimation window play an important role in the estimation of a fringe rate. If fringes are apparent, it is in general better to use adaptive window shaping and sizing according to the fringe direction and the locally varying fringe rate; if no fringes are apparent it is better to use adaptive window sizing according to the locally varying noise level [22].

If a (large) set of interferograms of the same scene is available, another, coherence-like, estimator becomes available that is mostly used for the identification of PSs [7] [19]. If temporal ergodicity is assumed, the coherence-like estimator can be defined as:

$$
|\hat\gamma^{*}| = \left|\frac{1}{T}\sum_{t=1}^{T} e^{j(\phi^{(t)}-\phi_0^{(t)})}\right| \qquad (3.33)
$$

in which $|\hat\gamma^{*}|$ is the coherence-like estimate, and $\phi^{(t)}$ and $\phi_0^{(t)}$ are the observed and expected phase at time t. It needs to be stressed that the coherence-like estimate $|\hat\gamma^{*}|$ is not the same as the coherence estimate $\hat{|\gamma|}$: the coherence-like estimator gives a measure of the temporal noise rather than the spatial noise, and can therefore not be used for the computation of the phase PDF.

3.4.2 Decomposition of the magnitude of coherence

Reviewing Eq. (3.26), the magnitude of coherence can also be estimated theoretically by estimating the individual decorrelation components:

$$
|\gamma| = |\gamma|_{thermal} \times |\gamma|_{coreg} \times |\gamma|_{temporal} \times |\gamma|_{geom} \times |\gamma|_{DC} \qquad (3.34)
$$

The individual sources of decorrelation were introduced in Sec. 3.3.

In this section an overview is given of the estimation of the individual sources of decorrelation. For a more thorough discussion of decorrelation sources the reader is referred to Section 4.4 of Hanssen [15].

The individual sources of decorrelation introduced in Sec. 3.3 are now discussed separately:


Decorrelation by thermal noise
Thermal noise is estimated using knowledge of the SAR system. In particular, the received signal power Pr and the thermal noise power of the receiver Pn need to be known. The Signal-to-Noise Ratio of the receiver system, SNR_thermal, is then estimated by dividing Pr by Pn:

$$
\mathrm{SNR}_{thermal} = \frac{P_r}{P_n} \qquad (3.35)
$$

Sec. 4.4.1 of [15] discusses an example of the computation of Pr and Pn.

If both SAR images are acquired by the same SAR system, the thermal decorrelation can then be estimated as:

$$
\hat{|\gamma|}_{thermal} = \frac{1}{1 + \mathrm{SNR}_{thermal}^{-1}} \qquad (3.36)
$$

If the SAR systems are different, the thermal decorrelation can be estimated as:

$$
\hat{|\gamma|}_{thermal} = \frac{1}{\sqrt{(1 + \mathrm{SNR}_{thermal,1}^{-1})(1 + \mathrm{SNR}_{thermal,2}^{-1})}} \qquad (3.37)
$$

in which SNR_thermal,1 and SNR_thermal,2 are the SNRs of the two different systems.

Decorrelation by coregistration and interpolation errors
Including interpolation and coregistration errors is expected to expand any model used to describe the magnitude of coherence considerably, while on the other hand they give only a minor contribution to the interferometric noise. Interpolation and coregistration errors are therefore not described here, and are furthermore neglected in the remainder of the Thesis. More information about coregistration and interpolation errors can be found in Section 4.4.5 of Hanssen [15].

Temporal decorrelation
Temporal decorrelation is caused by a change of the distribution of scatterers in time, which can result from:

• (permanent) deformation, e.g. subsidence or uplift, or the construction or destruction of buildings;

• seasonal changes of the scene, e.g. the presence of snow in winter, or the changing scattering properties of agricultural areas;

• changes in the electrical properties of soil and vegetation, e.g. caused by a different moisture content;

• changes in volume scattering due to the growth of vegetation.

Although the reflectivity of the phenomena corresponding to changes in the distribution of scatterers, and therefore also its temporal (de)correlation, is deterministic, its observations are not. Add to this that the sources that cause temporal decorrelation are difficult to identify, and it becomes rather difficult to model the temporal decorrelation.

Nonetheless some basic models can be derived using simple rules of thumb. An often-used assumption is that the temporal correlation decreases linearly in time, i.e. with increasing temporal baseline.


Another assumption is to use a certain height limit in mountainous areas as a model of the timber line. As the growth of vegetation is a decorrelation source, less temporal decorrelation is expected above this height limit.

Moreover, the appearance and disappearance of snow can decorrelate the interferograms as well. This behavior can be modelled by a step function of the season.

And so on. Numerous other heuristic assumptions can be made as well.
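Such heuristics are easily combined into a simple predictive model. The sketch below is a toy model only: all thresholds and rates are illustrative assumptions, not values from the Thesis. It combines a linear decay with the temporal baseline, a reduced decay above a timber line and a step-function penalty for snow cover.

```python
def temporal_coherence(dt_years, height_m, winter, tau_years=4.0,
                       timber_line_m=2200.0, snow_factor=0.5):
    """Toy temporal-decorrelation model combining the heuristics of the text:
    linear decay with temporal baseline, less decay above the timber line,
    and a multiplicative step when snow is present (all numbers illustrative)."""
    decay = 1.0 / tau_years if height_m < timber_line_m else 0.5 / tau_years
    coh = max(0.0, 1.0 - decay * dt_years)   # linear decrease with temporal baseline
    if winter:
        coh *= snow_factor                   # step-function penalty for snow cover
    return coh

print(temporal_coherence(dt_years=1.5, height_m=800.0, winter=False))   # ~0.62
print(temporal_coherence(dt_years=1.5, height_m=2600.0, winter=True))   # ~0.41
```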

Geometrical decorrelation
Geometrical decorrelation is the result of a difference in the angle of incidence at the Earth's surface between the two sensors [8] and is a function of the critical baseline B⊥,crit. The critical baseline was already defined in Sec. 2.2, but is repeated here:

$$
B_{\perp,crit} = \lambda\,\frac{B_R}{c}\,R_1 \tan(\theta_{inc} - \zeta)
$$

in which λ is the radar wavelength, B_R the range bandwidth, c the speed of light, R_1 the range between the satellite sensor and the target, θ_inc the incidence angle measured from the vertical towards R_1 (see Fig. 2.2), and ζ the terrain slope, defined positive towards the satellite.

A returned signal is completely decorrelated when the perpendicular baseline B⊥ reaches the critical baseline. The geometrical decorrelation depends linearly on the ratio of the perpendicular baseline to the critical baseline. The geometrical correlation can therefore be estimated as [15]:

$$
\hat{|\gamma|}_{geom} = \begin{cases} 1 - \dfrac{B_\perp}{B_{\perp,crit}}, & |B_\perp| \le B_{\perp,crit} \\[4pt] 0, & |B_\perp| > B_{\perp,crit} \end{cases}
$$

Doppler centroid decorrelation
The Doppler centroid decorrelation $\hat{|\gamma|}_{DC}$ can be estimated as follows [15]:

$$
\hat{|\gamma|}_{DC} = \begin{cases} 1 - \dfrac{\Delta f_{DC}}{B_A}, & |\Delta f_{DC}| \le B_A \\[4pt] 0, & |\Delta f_{DC}| > B_A \end{cases} \qquad (3.38)
$$

in which ∆f_DC is the difference in Doppler centroid frequencies and B_A the azimuth bandwidth, also known as the Doppler baseline.
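A minimal sketch of the full decomposition of Eq. (3.34), combining the thermal, geometric and Doppler terms above, is given below. The function name `coherence_budget` and the ERS-like numbers are illustrative assumptions, not values from the Thesis; the temporal and coregistration terms are simply passed in as factors.

```python
def coherence_budget(snr_db, b_perp, b_perp_crit, df_dc, b_azimuth,
                     coh_temporal=1.0, coh_coreg=1.0):
    """Decomposition of the coherence magnitude, Eq. (3.34)-(3.38):
    product of thermal, coregistration, temporal, geometric and Doppler terms."""
    snr = 10.0 ** (snr_db / 10.0)
    coh_thermal = 1.0 / (1.0 + 1.0 / snr)                              # Eq. (3.36)
    coh_geom = max(0.0, 1.0 - abs(b_perp) / b_perp_crit)               # geometric term
    coh_dc = max(0.0, 1.0 - abs(df_dc) / b_azimuth)                    # Eq. (3.38)
    return coh_thermal * coh_coreg * coh_temporal * coh_geom * coh_dc  # Eq. (3.34)

# Illustrative ERS-like numbers (hypothetical)
print(coherence_budget(snr_db=10.0, b_perp=250.0, b_perp_crit=1100.0,
                       df_dc=200.0, b_azimuth=1380.0, coh_temporal=0.8))
```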


Chapter 4

Parameter and Reliability Estimation

Although parameter estimation is obviously important, the estimation of the reliability of parameter estimates is at least equally important. Reliability estimates are a measure of the estimation noise, and therefore give vital information about the precision of the parameter estimates.

Furthermore, I stress here that reliability makes no statements about biases within the parameter estimates. The detection and removal of biases can be done using hypothesis testing and outlier detection and removal, and is discussed in Chap. 7.

Sec. 3.1.3 discussed that Eineder and Adam [5] and Ferretti et al. [6] used different definitions of reliability. Here those definitions are reviewed and generalized for any linear observation model. The reliability definitions can then be used further within the application of MLE.

Although MLE is the most common parameter estimator, we will see in Sec. 4.2 that several other parameter estimators can be devised as well. Both parameter and reliability estimation can be based on the same characteristics of a likelihood function, and they are therefore discussed in pairs. In Sec. 4.2 four different pairs of parameter and reliability estimators are introduced.

Although three of the four pairs of parameter and reliability estimators can easily be deduced from the existing literature, the remaining pair needs a more thorough introduction. It is based on the comparison between a likelihood function based on actual phase observations and a theoretical likelihood function. Sec. 4.1 introduces that theoretical likelihood function.


4.1 A theoretical likelihood function

Consider that the likelihood function PDF(x^{1-M}|φ^{1-N}) of an arc_art.ref. is computed D times, and that every dth likelihood function, d = 1···D, is computed under the following circumstances:

• The a-priori information PDF, PDF(x^{1-M}), needed for the computation of the likelihood function, see Eq. (3.12), remains the same;

• The N phase PDFs, PDF(φ^i|x^{1-M}) for i = 1···N phase observations, needed for the computation of the likelihood function, see Eq. (3.12), remain the same. This means that the values of the magnitude of coherence |γ^{1-N}| and the multi-looking factor L remain the same;

• The observation model remains the same. The set of rank variables within the observation model, $\kappa_{2\pi}^{1-N,1-M}$, therefore does not change either;

• The observation model does not induce biases within the parameter estimates;

• The parameters that induce the phase observations remain the same, although their values are unknown and still need to be estimated;

• The N phase observations do vary, but only according to the phase PDFs: the phase observations are simply realizations of the phase PDFs.

In short, only the N realizations of the phase observations vary, while all the other circumstances remain the same, and only the parameter values are unknown. The dth likelihood function for the dth time series of phase observations is denoted as PDF(x^{1-M}|φ_d^{1-N}), and its shape changes with d.

Let us now look at the construction of the dth likelihood function PDF(x^{1-M}|φ_d^{1-N}) in more detail.

Consider that N likelihood functions can be constructed from the N single phase observations available in the time series, resulting in N multi-modal PDF(φ_d^i|x^{1-M}) for i = 1···N. In general the nearest maximum of any PDF(φ_d^i|x^{1-M}) will be shifted with respect to the expected parameter values E{x^{1-M}} due to the phase noise in the phase observation. Fig. 4.1(c) shows three such likelihood functions.

All the phase observations of the likelihood function PDF(x^{1-M}|φ_d^i) are induced by the same parameter values. Since the phase noise differs per likelihood function PDF(φ_d^i|x^{1-M}), the resulting shift per PDF(φ_d^i|x^{1-M}) is in general different too.

The individual shift of each likelihood function PDF(φ_d^i|x^{1-M}) with respect to the expected parameter values could in theory be used for the estimation of the parameters or of their reliability. However, since the expected parameter values are unknown, this shift cannot be measured directly. The individual shifts of the likelihood functions PDF(φ_d^i|x^{1-M}) can be compared with each other though, resulting in a measure of the misalignment between the likelihood functions PDF(φ_d^i|x^{1-M}) or, as all the phase observations originate from the same time series on which the likelihood function PDF(φ_d^{1-N}|x^{1-M}) is based, a measure of the misalignment within the likelihood function PDF(φ_d^{1-N}|x^{1-M}).

Although we do not know the expected parameter values E{x^{1-M}}, it is possible to compare the misalignment within a likelihood function with a theoretical likelihood function in which no misalignment takes place.

The shape of such a theoretical likelihood function can easily be built by setting the phase observations equal to the expected phases φ_0^{1-N} induced by a single set of parameter values x^{1-M}.


(a) Three aligned likelihood functions PDF(φ^i|x), i = 1···3, based on three single phase observations. (b) The resulting likelihood function PDF(x|φ^{1-3} = φ_{0,E{x}}^{1-3}) based on all three phase observations. (c) Three misaligned likelihood functions PDF(φ_d^i|x), i = 1···3, based on three single phase observations. (d) The resulting likelihood function PDF(x|φ_d^{1-3}) based on all three phase observations.

Figure 4.1: In all panels E{x} is shown by the dashed line. Fig. 4.1(a) shows three aligned likelihood functions PDF(φ^i|x) for i = 1···3 based on three unreal single phase observations. Fig. 4.1(b) shows the resulting symmetrical theoretical likelihood function PDF(x|φ^{1-3} = φ_{0,E{x}}^{1-3}) based on all three unreal phase observations. Notice that in both panels all the likelihood functions are symmetrical with respect to the arbitrarily chosen x = E{x}. Fig. 4.1(c) shows three misaligned likelihood functions based on three real single phase observations. Fig. 4.1(d) shows the resulting likelihood function PDF(x|φ_d^{1-3}) for all three real phase observations. Notice the offset between x_ML and E{x} and the difference in shape between the real and the theoretical likelihood functions.

A theoretical likelihood function is therefore always perfectly symmetrical with respect to the parameter values of x_{1-M}, resulting in a perfect alignment within the theoretical likelihood function. The set of phase realizations that is used to construct a theoretical likelihood function is unlikely to occur and therefore not realistic. The phase observations used to construct a theoretical likelihood function are termed here unreal phase observations.

The theoretical likelihood functions are here notated as PDF(x_{1-M} | φ^{1-N} = φ^{1-N}_{0,x_{1-M}}), in which φ^{1-N}_{0,x_{1-M}} is equal to the expected phases φ^{1-N}_0 induced by the values of x_{1-M}. By setting the values of x_{1-M} to the parameter estimates, the reliability of the parameter estimates can be estimated: the comparison between the shapes of the real likelihood function and the theoretical likelihood function results in an assessment of the misalignment within the (real) likelihood function PDF(x_{1-M} | φ^{1-N}_d).

Notice that the shape of the theoretical likelihood function remains the same for all the D computations of the likelihood function PDF(x_{1-M} | φ^{1-N}_d).¹

An example of the construction of a PDF(x_{1-M} | φ^{1-N} = φ^{1-N}_{0,E{x_{1-M}}}) is shown in Fig. 4.1(a) and 4.1(b). Here x is one-dimensional and has been set equal to E{x}.
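As an aside, the construction of such a one-dimensional likelihood function can be sketched in a few lines of MATLAB. The sketch below is illustrative only: it assumes a hypothetical helper pdf_phase(dphi, gamma, L) that evaluates the phase PDF of Eq. (3.20), and hypothetical inputs phi, beta_x, coh and L; the product-and-normalize construction follows Eq. (3.12).

% Sketch: construction of a likelihood function PDF(h | phi) on a height grid,
% by transforming each phase PDF to the height domain, multiplying the
% transformed PDFs and normalizing (cf. Eq. (3.12)). Column vectors assumed.
wrap   = @(p) mod(p + pi, 2*pi) - pi;        % wrapping operator W{.}
h_grid = (400:0.1:600).';                    % boxcar a-priori domain [m]
dh     = h_grid(2) - h_grid(1);

L_h = ones(size(h_grid));                    % flat a-priori PDF (boxcar)
for i = 1:numel(phi)                         % phi: (un)real phase observations
    res = wrap(phi(i) - beta_x(i)*h_grid);   % phase residual per candidate h
    L_h = L_h .* pdf_phase(res, coh(i), L);  % transformed phase PDF, multiplied
end
L_h = L_h / (sum(L_h)*dh);                   % normalize to a PDF over h

Setting phi equal to the expected phases induced by a chosen height yields a theoretical likelihood function; inserting real observations yields a real one.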

4.2 Parameter and reliability estimators

In this section four parameter and reliability estimators are introduced. One parameter and reliability estimator is based on single points within the likelihood functions, see Sec. 4.2.1, while the others are based on regions within the likelihood functions, see Sec. 4.2.2. The differences between the estimators are illustrated by an example of a PDF(h | φ^{1-3}) in which the simulated height was set to 500 meters, see Fig. 4.2.

The illustrations only show the application of the parameter and reliability estimators in pairs. Each pair is based on the same characteristics of the likelihood function.

Sec. 4.2.3 briefly elaborates on the choice of estimators. Although I did not study the quality assessment of the estimators, insofar as that is possible at all, the estimators are compared to each other on the basis of, amongst others, their computational time.

4.2.1 Estimators based on single points

Parameter estimation based on single points has already been introduced in Sec. 3.1.3. It is known as MLE, and can be defined as:

\hat{x}_{1-M,\text{point}} = \hat{x}_{1-M,\text{ML}} = \arg\max \,\mathrm{PDF}(x_{1-M} \mid \phi^{1-N})    (4.1)

Reliability estimation can also be based on single points within the likelihood function. A reliability estimator can therefore be based on two points and is here defined as:

R_{\text{point}} = \frac{\max \,\mathrm{PDF}(x_{1-M} \mid \phi^{1-N})}{\max' \,\mathrm{PDF}(x_{1-M} \mid \phi^{1-N})}    (4.2)

in which max is the global maximum and max' the second largest (local) maximum within the PDF(x_{1-M} | φ^{1-N}). R_point is also known as the peak-to-peak ratio.

In Fig. 4.3 h_ML is found to be equal to 565.6 meters. R_point is here equal to 1.08/1.
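For a gridded one-dimensional likelihood, the two point-based estimators can be evaluated as in the following minimal sketch; h_grid and L_h are hypothetical names for the grid and the likelihood values (assumed to be column vectors).

% Sketch: point-based parameter and reliability estimation (Eqs. (4.1)/(4.2))
% from a gridded one-dimensional likelihood function L_h over h_grid.
[~, iML] = max(L_h);                  % global maximum
h_ML     = h_grid(iML);               % ML estimate, Eq. (4.1)

% local maxima: interior samples larger than both neighbours
isPeak = [false; L_h(2:end-1) > L_h(1:end-2) & L_h(2:end-1) > L_h(3:end); false];
peaks  = sort(L_h(isPeak), 'descend');

R_point = peaks(1) / peaks(2);        % peak-to-peak ratio, Eq. (4.2)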

4.2.2 Estimators based on regions

There exists essentially no difference between estimators based on regions within the likelihood function and estimators based on the whole shape of the likelihood function. The sole difference between these definitions is their boundaries: the estimators based on the whole shape have their boundaries specifically defined by the boundaries of PDF(x_{1-M}), while the boundaries of the estimators based on regions within the likelihood function can be set to arbitrary values.

¹Notice that the usage of the phase PDFs needed to compute the theoretical likelihood function implies that, from a statistical point of view, phase noise will be present within the phase observations. The unreal phase observations used for the theoretical likelihood function do not contain phase noise, however. The theoretical likelihood function is therefore a product of statistical lies.


Figure 4.2: Top: three likelihood functions PDF(h | φ^i) for i = 1···3 based on single phase observations and a simulated height of 500 meters. Bottom: the resulting likelihood function PDF(h | φ^{1-3}) based on the same three phase observations and a simulated height of 500 meters. The following settings were arbitrarily selected: |γ^{1-3}| are equal to 0.35, 0.20 and 0.25 respectively for a multi-looking factor of one (L = 1). B^{1-3}_⊥ have been randomly set to 897, 275 and 1022 meters respectively.

One of the reliability estimators was already introduced by Ferretti et al. and was shown in Eq. (3.16) in Sec. 3.1.3. It can be generalized as:

R_{\text{region,I}} = \int_{\hat{x}_1-C_1}^{\hat{x}_1+C_1}\int_{\hat{x}_2-C_2}^{\hat{x}_2+C_2}\cdots\int_{\hat{x}_M-C_M}^{\hat{x}_M+C_M}\mathrm{PDF}(x_{1-M}\mid\phi^{1-N})\,dx_1\,dx_2\cdots dx_M
             = P\left(\hat{x}_1-C_1 \le x_1 \le \hat{x}_1+C_1,\ \hat{x}_2-C_2 \le x_2 \le \hat{x}_2+C_2,\ \ldots,\ \hat{x}_M-C_M \le x_M \le \hat{x}_M+C_M\right)    (4.3)

in which C_{1-M} are the boundaries of the regions. R_region,I is therefore a measure of the reliability that the estimate can be found within the specified region. C_{1-M} can therefore also be regarded as the allowed deviations of the estimate.


Figure 4.3: A likelihood function PDF(h | φ^{1-3}) based on three phase observations and a simulated height of 500 meters. The parameter estimation is here based on ML and the reliability estimation is based on the peak-to-peak ratio. h_ML and R_point are equal to 565.6 meters and 1.08/1 respectively.

From Eq. (4.3) a parameter estimator based on regions within the PDF(x_{1-M} | φ^{1-N}) can be derived; here the region is centred on the candidate values x_{1-M} over which the maximization runs:

\hat{x}_{1-M,\text{region,I}} = \arg\max_{x_{1-M}} \int_{x_1-C_1}^{x_1+C_1}\cdots\int_{x_M-C_M}^{x_M+C_M}\mathrm{PDF}(x'_{1-M}\mid\phi^{1-N})\,dx'_1\cdots dx'_M    (4.4)

Notice that Eq. (4.4) actually selects the most probable region. The parameter estimates x_{1-M} are then derived by taking the middle of that most probable region.

In Fig. 4.4 h_region,I is found to be equal to 500.4 meters. R_region,I is here equal to 7.00%, while C was set to 5 meters.
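The region-based pair of Eq. (4.3) and (4.4) can be evaluated on the same grid; the sketch below is illustrative only and assumes the hypothetical h_grid and L_h from before, with L_h normalized such that sum(L_h)*dh equals one.

% Sketch: region-based estimation (Eqs. (4.3)/(4.4)) for one parameter h.
C  = 5;                                % allowed deviation C [m]
dh = h_grid(2) - h_grid(1);
w  = round(C/dh);                      % half window width in samples

P_region = zeros(size(L_h));           % probability mass of the region
for k = 1+w : numel(L_h)-w             % centred on every candidate value
    P_region(k) = sum(L_h(k-w:k+w)) * dh;
end

[R_regionI, iBest] = max(P_region);    % most probable region, Eq. (4.3)
h_regionI          = h_grid(iBest);    % its centre is the estimate, Eq. (4.4)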

Another reliability estimator was already discussed by Eineder and Adams [5] and introduced by Eq. (3.15) in Sec. 3.1.2. It is equal to the variance of the parameters:

R_{\text{region,II}} = \sigma^2_{\hat{x}_{1-M}} = \frac{\displaystyle\int_{x_{a1}}^{x_{b1}}\int_{x_{a2}}^{x_{b2}}\cdots\int_{x_{aM}}^{x_{bM}}\mathrm{PDF}(x_{1-M}\mid\phi^{1-N})\left\|\begin{matrix}x_1-\hat{x}_1\\ x_2-\hat{x}_2\\ \vdots\\ x_M-\hat{x}_M\end{matrix}\right\|^2 dx_1\,dx_2\cdots dx_M}{\displaystyle\int_{x_{a1}}^{x_{b1}}\int_{x_{a2}}^{x_{b2}}\cdots\int_{x_{aM}}^{x_{bM}}\mathrm{PDF}(x_{1-M}\mid\phi^{1-N})\,dx_1\,dx_2\cdots dx_M}    (4.5)

in which a_{1-M} and b_{1-M} are the boundaries defined by the user.


Figure 4.4: A likelihood function PDF(h | φ^{1-3}) based on three phase observations and a simulated height of 500 meters. The parameter estimation and reliability estimation are here based on Eq. (4.4) and (4.3). h_region,I and R_region,I are equal to 500.4 meters and 7.00% respectively. The allowed deviation of the estimates was set to 5 meters (C = 5 meters).

From Eq. (4.5) the parameter estimator based on the minimum of variance can be deduced:

\hat{x}_{1-M,\text{region,II}} = \arg\min_{x'_{1-M}} \frac{\displaystyle\int_{x_{a1}}^{x_{b1}}\int_{x_{a2}}^{x_{b2}}\cdots\int_{x_{aM}}^{x_{bM}}\mathrm{PDF}(x_{1-M}\mid\phi^{1-N})\left\|\begin{matrix}x_1-x'_1\\ x_2-x'_2\\ \vdots\\ x_M-x'_M\end{matrix}\right\|^2 dx_1\,dx_2\cdots dx_M}{\displaystyle\int_{x_{a1}}^{x_{b1}}\int_{x_{a2}}^{x_{b2}}\cdots\int_{x_{aM}}^{x_{bM}}\mathrm{PDF}(x_{1-M}\mid\phi^{1-N})\,dx_1\,dx_2\cdots dx_M}    (4.6)

in which x'_{1-M} are the arguments of the minimization in Eq. (4.6).

In Fig. 4.5 h_region,II is found to be equal to 499.6 meters. R_region,II is here equal to 3325 m², corresponding to a standard deviation of 57.7 meters.
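A corresponding sketch of the variance-based pair of Eq. (4.5) and (4.6), under the same assumptions on the hypothetical h_grid and L_h:

% Sketch: minimum-variance estimation (Eqs. (4.5)/(4.6)) for one parameter h.
dh     = h_grid(2) - h_grid(1);
norm_c = sum(L_h) * dh;                         % denominator of Eqs. (4.5)/(4.6)

V = zeros(size(h_grid));                        % variance around every candidate
for k = 1:numel(h_grid)
    V(k) = sum(L_h .* (h_grid - h_grid(k)).^2) * dh / norm_c;
end

[R_regionII, iMin] = min(V);                    % minimum variance, Eq. (4.5)
h_regionII         = h_grid(iMin);              % minimum-variance estimate, Eq. (4.6)
sigma_h            = sqrt(R_regionII);          % standard deviation [m]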

The final reliability estimator is based on the theoretical likelihood function PDF(x_{1-M} | φ^{1-N} = φ^{1-N}_{0,x_{1-M}}) and was introduced in Sec. 4.1. The shape of the real likelihood function PDF(x_{1-M} | φ^{1-N}) is compared to a theoretical likelihood function PDF(x_{1-M} | φ^{1-N} = φ^{1-N}_{0,x_{1-M}}). The unreal phase observations of the theoretical likelihood function are hereby set equal to the expected phases φ^{1-N}_{0,x_{1-M}} induced by the parameter estimates x_{1-M}.


Figure 4.5: A likelihood function PDF(h | φ^{1-3}) based on three phase observations and a simulated height of 500 meters. The parameter estimation and reliability estimation are based on Eq. (4.6) and (4.5). h_region,II and R_region,II are equal to 499.6 meters and a minimum variance of 3325 m² respectively, corresponding to a standard deviation of 57.7 meters.

The real likelihood function can be compared to the theoretical likelihood function in various ways. Here the comparison between the two likelihood functions is done by computing the Root-Mean-Square (RMS) difference between the two PDFs. The reliability estimator based on the RMS difference between the two likelihood functions is defined as:

R_{\text{region,III}} = \text{RMS difference} = \sqrt{\frac{\displaystyle\int_{x_{a1}}^{x_{b1}}\int_{x_{a2}}^{x_{b2}}\cdots\int_{x_{aM}}^{x_{bM}}\left(\mathrm{PDF}(x_{1-M}\mid\phi^{1-N})-\mathrm{PDF}(x_{1-M}\mid\phi^{1-N}=\phi^{1-N}_{0,\hat{x}_{1-M}})\right)^2 dx_1\,dx_2\cdots dx_M}{l_1 l_2\cdots l_M}}    (4.7)

in which l_{1-M} are the lengths of the boxcar distributions.

The parameter estimator follows from Eq. (4.7):

\hat{x}_{1-M,\text{region,III}} = \arg\min_{x'_{1-M}} \frac{\displaystyle\int_{x_{a1}}^{x_{b1}}\int_{x_{a2}}^{x_{b2}}\cdots\int_{x_{aM}}^{x_{bM}}\left(\mathrm{PDF}(x_{1-M}\mid\phi^{1-N})-\mathrm{PDF}(x_{1-M}\mid\phi^{1-N}=\phi^{1-N}_{0,x'_{1-M}})\right)^2 dx_1\,dx_2\cdots dx_M}{l_1 l_2\cdots l_M}    (4.8)

In Fig. 4.6 h_region,III is found to be equal to 565.6 meters. R_region,III is here equal to 0.019% difference per meter.
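Given the gridded real likelihood L_h and a theoretical likelihood L_h_theo built from the unreal phase observations (both hypothetical names), the RMS difference of Eq. (4.7) reduces to a few lines:

% Sketch: RMS difference between the real and the theoretical likelihood
% function (Eq. (4.7)), both sampled on the same grid h_grid.
l_h = h_grid(end) - h_grid(1);                  % length of the boxcar PDF of h
dh  = h_grid(2) - h_grid(1);
R_regionIII = sqrt( sum((L_h - L_h_theo).^2) * dh / l_h );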


Figure 4.6: The real likelihood function PDF(h | φ^{1-3}), in blue, is based on three phase observations and a simulated height of 500 meters. The theoretical likelihood function PDF(h | φ^{1-3} = φ^{1-3}_{0,h}), in green, is based on three unreal phase observations and a height estimate of 565.6 meters. The parameter and reliability estimation are based on Eq. (4.8) and (4.7). h_region,III and R_region,III are equal to 565.6 meters and an RMS difference of 0.019% per meter respectively.

4.2.3 On the choice of estimators

After the discussion of the various estimators, choices need to be made about which estimators to use. Since the quality assessment of the estimators is not investigated here, the choices need to be based on other constraints.

One of those constraints is the computational time of the estimators.

Parameter estimation by MLE is based on only one point and is therefore the fastest estimator, while parameter estimation based on the theoretical likelihood function is the slowest, as it needs to build the theoretical likelihood function as many times as there are values within the parameter domain. The other two have a computational time between that of the ML estimator and that of the estimator based on the theoretical likelihood function.

Here parameter estimation by MLE is preferred, as the MLE is based on only one point and therefore fast to compute. Experience shows that the application of MLE to time series of phase observations can be a rather time-consuming process.

The computational time of the reliability estimators based on regions within, or the whole shape of, the likelihood function depends strongly on the boundaries that are chosen. This makes any comparison between these estimators rather complicated.

Here one could choose the reliability estimator on the basis of terms that are easier to understand for laymen. In that case the estimator based on the probability within the region near the parameter estimate should be chosen.

The reliability in terms of the standard deviation would also be a rather good choice, as the term standard deviation is often used in science and therefore well known as well.

The reliability estimator based on the theoretical likelihood function seems to have high potential for future estimation. Besides reliability, the comparison between a theoretical likelihood function and a real likelihood function also gives information about possible outliers that may be present in the phase observations. Outliers in general generate severe misalignment within the likelihood function and, vice versa, misalignment can indicate the presence of outliers.

The potential of outlier detection and identification using the reliability estimator based on the theoretical likelihood function is not studied further. The potential to detect and identify outliers is therefore also not taken into account for the choice of a reliability estimator. I recommend studying this particular reliability estimator for any future application of outlier detection and identification. The reliability estimator based on the probability within a region is therefore still preferred.


Chapter 5

Algorithms to Apply MLE

Prior to the research of hypothesis testing and outlier detection and removal, knowledge about the construction and constraints of algorithms to apply MLE is needed. These are discussed in this chapter.

In any MLE algorithm the expected phases φ^{1-N}_0 need to be computed. In the first place, φ^{1-N}_0 needs to be generated in order to be able to compute the likelihood of the parameter values. In the second place, these values are needed to simulate the phase observations; in the latter case the phase observations φ^{1-N} are derived from φ^{1-N}_0. Sec. 5.1 and 5.2 discuss the computation of φ^{1-N}_0 and the simulation of the phase observations respectively.

Undersampling is a phenomenon that needs to be sufficiently taken care of in order to successfully apply MLE. Sec. 5.3 explains the smoothness criteria that need to be embedded in the algorithm.

5.1 Computation of the expected phase

The expected phases φ^{1-N}_0 can be computed from observation models. Here the observation models of (3.3) to (3.6) are used. The following relations hold for these observation models:

\phi^{1-N}_0 = W\{\beta^{1-N}_x h\}    (5.1)

\phi^{1-N}_0 = W\{\beta^{1-N}_x h + \nabla\}    (5.2)

\phi^{1-N}_0 = W\{\beta^{1-N}_x h - \tfrac{4\pi}{\lambda} t^{1-N} D\}    (5.3)

\phi^{1-N}_0 = W\{\beta^{1-N}_x h - \tfrac{4\pi}{\lambda} t^{1-N} D + \nabla\}    (5.4)

in which W{·} is the wrapping operator, β^{1-N}_x and t^{1-N} are the height-to-phase conversion factors and the temporal baselines (expressed in years) of the N interferograms, and λ is the radar wavelength. h, D and ∇ are the height, the deformation rate per year and the bias induced by the Master image.
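As an illustration, the expected phases of observation model (5.3) can be computed as in the sketch below; beta_x and t are assumed to be given N x 1 vectors, and the numerical values are example settings only.

% Sketch: expected phases phi0 according to the observation model of Eq. (5.3).
wrap   = @(p) mod(p + pi, 2*pi) - pi;   % wrapping operator W{.}
lambda = 0.0566;                        % approximate ERS radar wavelength [m]
h      = 500;                           % height [m]
D      = -0.010;                        % deformation rate [m/year]

phi0 = wrap(beta_x*h - (4*pi/lambda)*t*D);   % beta_x [rad/m], t [years]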


5.2 Simulation of phase observations

In my research the phase observations φ^{1-N} are simulated according to a Monte Carlo algorithm based on the distribution of the phase PDF. Sec. 3.2 discussed the phase PDF. Fig. 5.1 shows the flow diagram of the main steps of the simulation of the phase observations.

Figure 5.1: The flow diagram of the main steps of the simulation of the phase observations φ^{1-N}. See the text for an elaborate explanation.

To obtain the phase PDF, the multi-looking factor L needs to be known and the magnitudes of coherence |γ^{1-N}| need to be estimated. Sec. 3.4 discussed the estimation process of the magnitudes of coherence.

First a vector of phase differences ∆φ^{1-N} is initialized. In my algorithm this is done by discretizing the phase differences ∆φ^{1-N} from -π to π using a step size s of π/500, after which the phase PDF values are computed using Eq. (3.20).

From the resulting phase noise PDF (phase noise, since the expected value is unknown and only the phase noise is computed), the cumulative phase probability of a discretized phase element ∆φ_I can be computed as well:

P_{\text{InSAR,cum}}(\Delta\phi_I) = \sum_{i=1}^{I} s\, P(\Delta\phi_i)    (5.5)

in which i indicates the index of the discretized phase-difference element, s the step size and I the phase element in question.


Knowing the cumulative phase probabilities for all the 1000 discretized phase differences results in the phase CDF.

With the phase CDF known, a random number between zero and one is drawn according to the boxcar distribution Π_{0,1}. Interpolation between the discretized values of the phase CDF and the random number returns a number between zero and one that is distributed according to the phase CDF. For my algorithm the MATLAB function interp1q has been used, which is, according to the MATLAB help function, faster than the MATLAB function interp1.

∆φ^{1-N} can now be computed according to:

\Delta\phi^{1-N} = \mathrm{interp1q}(R^{1-N}_{\text{CDF}} \mid r^{1-N}_{\Pi_{0,1}}) \cdot 2\pi - \pi    (5.6)

in which interp1q(·|·) is the interpolation algorithm, where the first input parameter is the function to be interpolated and the second input parameter contains the selected boxcar-distribution-generated random values. r^{1-N}_{Π_{0,1}} are the random numbers between zero and one for an arc_{art.ref}, drawn according to the boxcar distribution Π_{0,1}, and R^{1-N}_{CDF} is the distribution according to the shape of the phase CDFs of arc_{art.ref}, also ranging from zero to one. interp1q(R^{1-N}_{CDF} | r^{1-N}_{Π_{0,1}}) is multiplied by 2π to transform the random values to the phase domain, and corrected by π to adapt ∆φ^{1-N} to the phase domain [-π, π].

In Fig. 5.2(a) the transformation between the random numbers r^{1-N}_{Π_{0,1}} and the distribution function R^{1-N}_{CDF} is shown. It can easily be seen that for the phase PDF with |γ| = 0.6 and L = 5 a uniformly random value will be interpolated such that the shape of the phase PDF is respected. For instance, if for a certain phase observation within the time series a random value of 0.2 is generated, the number would be transformed to a number of about 0.45. The phase difference ∆φ would then be equal to (0.45 · 2π - π =) -π/10.

Knowing φ^{1-N}_0, the phase observations can be simulated:

\phi^{1-N} = W\{\Delta\phi^{1-N} + \phi^{1-N}_0\}    (5.7)

Here ∆φ^{1-N} and φ^{1-N}_0 are merged and wrapped back to form the phase observations in the correct domain (φ^{1-N} ∈ [-π, π]).

To check the correct functioning of the phase observation simulator, 3000 phase observations have been simulated. The histogram of the phase observations can be compared to the defined phase PDF in Fig. 5.2(b).
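The simulator described above can be summarized in the following sketch, which again assumes the hypothetical helper pdf_phase(dphi, gamma, L) for Eq. (3.20) and coherence values coh; interp1 is used instead of interp1q for simplicity.

% Sketch: Monte Carlo simulation of N phase observations via the phase CDF.
s    = pi/500;                           % step size of the discretization
dphi = (-pi:s:pi).';                     % discretized phase differences

N   = numel(phi0);                       % phi0: expected phases, Sec. 5.1
phi = zeros(N, 1);
for i = 1:N
    p   = pdf_phase(dphi, coh(i), L);    % phase (noise) PDF of interferogram i
    cdf = cumsum(p) * s;                 % cumulative probability, Eq. (5.5)
    r   = cdf(1) + rand*(cdf(end)-cdf(1));   % boxcar-distributed number
    dph = interp1(cdf, dphi, r);         % inverse-CDF sampling of the noise
    phi(i) = mod(phi0(i) + dph + pi, 2*pi) - pi;   % Eq. (5.7), wrapped
end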

5.3 On the propagation of numerical errors

For the computation of the likelihood function PDF(x_{1-M} | φ^{1-N}) the choice of a stepsize is very important. Choosing a stepsize too large within the domain of a parameter introduces undersampling, causing unacceptably large numerical errors to propagate into the parameter estimates. Choosing the stepsize too small, on the other hand, results in long computation times.

Notice that the stepsize itself differs per parameter. A multi-dimensional parameter domain therefore has a number of stepsizes equal to the dimension of the solution space. All the stepsizes within a multi-dimensional solution space therefore need to be chosen carefully, such that on the one hand undersampling is unlikely to occur, while on the other hand the computation time does not become too long.


(a) Transformation of random numbers; (b) comparison between 3000 simulated phase observations and the phase PDF.

Figure 5.2: In (a) the transformation is shown between any potential random number according to the boxcar distribution Π_{0,1} and a random number according to the shape of the phase CDF. |γ| and L are set to 0.6 and 5 respectively. (b) shows the comparison between 3000 simulated phase observations, simulated according to a phase PDF with |γ| and L equal to 0.6 and 5 respectively, and that phase PDF (phase difference [rad] versus PDF).

The phase PDF, defined in Sec. 3.2, is used for the application of MLE to time series of phase observations. The shape of the phase PDF is a function of the magnitude of coherence |γ| and the multi-looking factor L, and can in my research vary from very flat functions to functions with very steep slopes.

Within the application of MLE the phase PDFs are transformed to a parameter domain, after which the product of the transformed phase PDFs is taken and the resulting PDF is normalized to form a likelihood function, see Eq. (3.12).

It is difficult to choose a stepsize within a dimension of the parameter domain. On the one hand this is caused by the different shapes that need to be transformed to the parameter domain, resulting in different constraints per phase PDF for a certain maximum allowable numerical error. On the other hand the constraints differ per parameter ambiguity, resulting in time-varying phase variations within the solution space.

In this section the influence of parameter ambiguities on numerical errors is shown for a set of constant stepsizes within a two-dimensional domain. Here the height h and the deformation rate D are the parameters to estimate, using the observation model of (3.5).

For ERS data with a maximum critical perpendicular baseline B_{⊥,crit} of 1100 meters, the minimum height ambiguity h_{2π} becomes 8.83 meters per phase cycle. In my research, using certain settings, a stepsize of 0.1 meters did not result in any significant numerical errors for phase PDFs based on a maximum magnitude of coherence of 0.999, a multi-looking factor L of 1 and a minimum height ambiguity of 8.83 meters.

The constant stepsize for the deformation rate has here been set to 2 mm/year. The deformation rate ambiguity D_{2π} is a function of the temporal baseline t, and is computed as:

D^{i}_{2\pi} = \frac{\lambda}{2 t^{i}}    (5.8)

in which i is only an indication of the ith interferogram, not the mathematical power operator, and λ is the radar wavelength.

ERS has a repetition period of 35 days. If all the SAR images for a certain scene are available for interferometry, Eq. (5.8) becomes, for the ith phase observation:

D^{i}_{2\pi} = \frac{\lambda}{2\cdot\frac{35\,i}{365.25}}    (5.9)

The deformation rate ambiguity D^i_{2π} for the ith phase observation thus becomes smaller per time epoch i. For a certain shape of a phase PDF, a constant stepsize and infinitely many phase observations available, some ith and all consecutive phase observations will therefore experience a numerical error that surpasses a certain maximum allowable value, resulting in undersampling.
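The shrinking deformation rate ambiguity and its consequence for a constant step size can be illustrated with the following sketch of Eq. (5.9), using the ERS settings mentioned in the text (values are illustrative only).

% Sketch: deformation rate ambiguity per ERS epoch according to Eq. (5.9).
lambda = 0.0566;                       % approximate ERS wavelength [m]
i      = 1:32;                         % epoch index (35-day repeat cycle)
t      = 35*i/365.25;                  % temporal baselines [years]
D2pi   = lambda ./ (2*t);              % deformation rate ambiguities [m/year]

% samples per ambiguity for a constant step size of 2 mm/year:
samples_per_cycle = D2pi / 0.002;      % decreases with i, so undersampling
                                       % eventually occurs for late epochs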

The occurrence of undersampling caused by parameter ambiguities is here demonstrated by showing several different likelihood functions for varying deformation rate ambiguities.

The likelihood functions used here are based on single phase observations. The reason for basing the likelihood functions on single phase observations is to keep the demonstration simple. A product of transformed phase PDF values, needed for the application of MLE, results in a product of numerical errors within the phase PDFs. However, the propagation of a product of numerical errors needs a more thorough numerical analysis than the propagation of a single numerical error, and will not be done here. I therefore simply assume, without doing a proper numerical analysis, that a numerical error for a single transformed phase PDF can have the same impact on a parameter estimate as a product of numerous numerical errors for multiple phase PDFs.

As has been discussed before, a likelihood function needs to be based on more than a single phase observation in order to unwrap the phase observations. Normally one would therefore not base a likelihood function on single phase observations. However, in this case such a likelihood function is useful to demonstrate the occurrence of undersampling.

Moreover, in the demonstration only theoretical likelihood functions are used. These have been introduced in Sec. 4.1, and normally have the advantage that a global maximum can easily be set within the solution space of a likelihood function. In this case the likelihood functions are based on single phase observations, resulting in multi-modal maxima within the solution space. Although there does not exist a unique maximum, a common maximum for any likelihood function could be set to


a height of 500 meters and a deformation rate of 0 mm/year.

Using these settings, the ith theoretical likelihood function PDF(x_{1-M} | φ^i = φ^i_{0,h=500,D=0}) is visualized in Fig. 5.3 for the 1st, 4th, 16th and 32nd phase observation (i = 1, 4, 16 and 32), based on ERS data. Here the simulated height and deformation rate have been set to 500 meters and 0 mm/year subsidence. One of the ambiguous maximum values can therefore be found at the location of 500 meters and 0 mm/year within all four solution spaces. Moreover, |γ| and L have been set to 0.2 and 1 respectively, and the step sizes for the height and deformation rate domain have been set to 0.1 meter and 2 mm/year respectively. B⊥ has been set in all cases to 500 meters.

Observing Fig. 5.3 shows that the ambiguous values of PDF(x_{1-M} | φ^i) repeat such that they form linear dike-like shapes within the likelihood functions. A different D_{2π} corresponds to a different angle of the dike-like shapes with respect to the height or deformation rate axis. This can clearly be seen in Fig. 5.3(a) and 5.3(b).

The occurrence of undersampling is shown in Fig. 5.3(c) and 5.3(d). In Fig. 5.3(c) undersampling is responsible for the fluctuation of the values on the lines within the figure. Although difficult to see, the linear behavior of the dike-like shapes can still be recognized. In Fig. 5.3(d) the dike-like shapes cannot be recognized anymore.

My conclusion is therefore that the likelihood function needs to be constructed with care. Either the settings for the application of MLE are restricted to a functional domain for which experience shows that no significantly large numerical errors occur, resulting in a low likelihood of undersampling, or a proper numerical analysis needs to be done for a certain time series of phase observations and observation model, resulting in a quality assessment of the propagation of products of numerical errors. The latter case is more time-consuming, but will also result in more knowledge about the propagation of numerical errors.


(a) t = t1 (35 days); (b) t = t4 (140 days); (c) t = t16 (560 days); (d) t = t32 (1120 days).

Figure 5.3: In Fig. 5.3(a) to 5.3(d) four theoretical likelihood functions can be seen for the estimation of the height h and the deformation rate D of an arc_{art.ref}, using ERS data. While the height ambiguity remains constant, the deformation rate ambiguity becomes smaller, resulting in higher fringe rates within the solution space of the likelihood functions. h and D have been discretized with constant step sizes of 0.1 meter and 2 mm/year. In Fig. 5.3(c) and 5.3(d) heavy undersampling can be seen. See the text for an elaborate explanation.


Chapter 6

Effects of Variables on a Likelihood Function

Several variables are used in the application of MLE to time series of phase observations. On the one hand there are the phase-PDF-dependent variables, the magnitudes of coherence |γ^{1-N}| and L, while on the other hand there are the parameter ambiguities κ^{1-N,1-M}_{2π} within the design matrix of an observation model. They all have influence on the shape of a likelihood function.

It is hard to investigate the influence of variables on the shape of a likelihood function. The random character of the phase observations makes it difficult to predict the shape of a likelihood function beforehand. Still, some insights can be gained from studying the influence of variables.

There exist several ways to study the influence of the variables on the shape of a likelihood function. One way is to do a statistical research using a large number of simulations. For a time series of random phase observations the variables can be varied, after which the impact of the variables on the shape of the likelihood functions can be studied and eventually categorized. By choosing a large number of different time series some statistical conclusions can be made. I would expect that the likelihood and significance of a certain change in the shape of likelihood functions with respect to a variation of a variable can be mapped in detail.

The statistical research can easily become large and computationally expensive. First of all because the solution space can have many dimensions. The computation of a certain likelihood function with a three-dimensional domain using the programming language MATLAB, for example, can easily take four hours for a single arc_{art.ref}¹, and that is just one simulation.

But even for likelihood functions for the estimation of a single parameter the statistical research can become too large; the number of variables can be large, and they all need to be varied independently. Take for instance only the parameter ambiguities. Any parameter ambiguity will have influence on the solution. Having multiple parameter ambiguities available, there is an enormous potential of variations. In other words, there exists a whole spectrum of variations that can be assessed. Taking into account the random character of the phase observations, the statistical research easily becomes an enormous amount of work.

Due to time constraints the statistical research is not done here. Instead the impact of variables, and only for certain cases, is studied for the theoretical likelihood function discussed in Sec. 4.1. The theoretical likelihood function is used here for its ease of predictability: the ML estimates can easily be set artificially.

¹This has been experienced in the application of MLE using real data, and will be discussed in Chap. 8.

The influence of variations of the phase-PDF-dependent variables |γ^{1-N}| and L is very briefly discussed in Sec. 6.1. The effects of the parameter ambiguities are discussed in more detail in Sec. 6.2. In the latter section the impact of estimation errors of the parameter ambiguities is explained as well.

6.1 Influence of the phase PDF-dependent variables

The influence of a magnitude of coherence |γ| and a multi-looking factor L on the shape of the phase PDF has already been explained in Sec. 3.2.

The influence of the magnitudes of coherence |γ^{1-N}| and the multi-looking factor L on the shape of a theoretical likelihood function PDF(x_{1-M} | φ^{1-N} = φ^{1-N}_{0,x_{1-M}}) is shown in an example in Fig. 6.1 and 6.2. Here no more than two phase observations are used, for visualization purposes, and the height is the only parameter. Two additional theoretical likelihood functions are shown here as well: these are based on only either the first or the second phase observation, from the first or second interferogram (ifg).

The theoretical likelihood function for the estimation of the height using two phase observations, PDF(h | φ^{1-2} = φ^{1-2}_{0,h=500}), is convenient to use here, as both phase PDFs are perfectly aligned with respect to an artificial height estimate, here set to 500 meters. At the specified height of 500 meters the global maximum occurs, irrespective of the values of |γ^{1-2}| or L. For the influence of variables on the likelihood-shape-dependent peak-to-peak ratio, a measure of reliability, the global maximum is here used as a stable reference point that cannot wander through the likelihood function.

The MLE variables are based on ERS acquisitions and are as follows: the perpendicular baselines B^{1,2}_⊥ are such that the height ambiguities h^{1,2}_{2π} are equal to 20 and 24 meters per phase cycle. In Fig. 6.1(a) and 6.1(b) |γ^1| increases from 0.1 to 0.5, while |γ^2| increases from 0.3 to 0.7. No multi-looking was applied (L = 1). In Fig. 6.2(a) and 6.2(b) L increases from 1 to 5 for |γ^1| equal to 0.3 and |γ^2| equal to 0.5.

The a-priori PDF has a boxcar distribution with boundaries of 400 and 600 meters. The ML height estimate has here been set to 500 meters.

Fig. 6.1 shows clearly that the peak-to-peak ratio increases for an increase in both |γ^1| and |γ^2|. The reliability has therefore improved. This is also the case for an increase in the multi-looking factor, see Fig. 6.2.

I suspect that, for an increase of the magnitudes of coherence or the multi-looking factor, the reliability of parameter estimates based on real likelihood functions is likely to improve as well. I also suspect that there will exist cases in which this is not true. One interesting case in which I suspect this does not always hold is the case in which the global maximum within the likelihood function jumps from one peak to another peak.

6.2 Influence of the parameter ambiguities

In this section the effects of parameter ambiguities on the shape of a likelihood function are discussed, as well as the propagation of the estimation errors of the parameter ambiguities. For convenience I only use theoretical likelihood functions here, although the influence of parameter ambiguities is also valid for real likelihood functions.


(a) |γ^1| = 0.1, |γ^2| = 0.3; (b) |γ^1| = 0.5, |γ^2| = 0.7.

Figure 6.1: The influence of a variation of the magnitudes of coherence |γ^{1,2}| on the theoretical likelihood function PDF(h | φ^{1-2} = φ^{1-2}_{0,500}) of an arc_{art.ref}. The magnitudes of coherence are indicated below the figures. The multi-looking factor L is equal to 1.

One of the effects of parameter ambiguity variation is the ambiguity lengthening within the parameter domain. Ambiguity lengthening can result in ambiguous parameter estimates, and is therefore an important topic. The computation of the ambiguity lengthening is explained in Sec. 6.2.1.

Another effect of parameter ambiguity variation is what I term here the beat frequency-like phenomenon. Beat frequency-like phenomena are discussed in Sec. 6.2.2.

Finally, the propagation of parameter ambiguity estimation errors is briefly touched upon in Sec. 6.2.3.

6.2.1 Ambiguity lengthening within the solution space

Let K^{1-N,j}_{2π} be the parameter ambiguities of parameter x_j (see Sec. 3.1.1 for the definition of K^{1-N,j}_{2π}), and let K^{i,j}_{2π} and K^{i',j}_{2π} be two individual parameter ambiguities within K^{1-N,j}_{2π}. If the ratio between K^{i,j}_{2π} and K^{i',j}_{2π} is equal to a ratio of integers q^i/q^{i'} (q ∈ N), x_j might only be estimated ambiguously, with an ambiguity lengthening equal to K^j_{k2π}. Only if another K^{i'',j}_{2π} is available that does not result in a ratio of integers q^{i''}/q^i or q^{i''}/q^{i'} does the MLE intrinsically have enough resolution power to estimate x_j unambiguously.

If such a K^{i'',j}_{2π} is not available, it can only be decided whether x_j can be estimated unambiguously by comparing the ambiguity interval K^j_{k2π} to the length of the boxcar distribution l of PDF(x_j) (l = b - a). Only if K^j_{k2π} is larger than l can x_j be estimated unambiguously.

The computation of K^j_{k2π} can be done using the following steps. First, K^{1-N,j}_{2π} need to be transformed into integers. The transformation takes place by multiplying K^{1-N,j}_{2π} by 10^c, in which c is an integer that is sufficiently large to transform the floating-point values of K^{1-N,j}_{2π} into integers.

Secondly, the integer numbers of phase cycles k = k^{1-N,j} need to be computed such that K^j_{k2π} = k^{i,j} · K^{i,j}_{2π}. The number of phase cycles k^{i,j} can be computed according to:

k^{i,j} = \prod_{i'=1}^{i-1} K^{i',j}_{2\pi}\cdot 10^{c} \;\prod_{i'=i+1}^{N} K^{i',j}_{2\pi}\cdot 10^{c}    (6.1)


(a) L = 1; (b) L = 5.

Figure 6.2: The influence of different multi-looking factors on the theoretical likelihood function PDF(h | φ^{1-2} = φ^{1-2}_{0,500}). The multi-looking factors are indicated below the figures. |γ^{1,2}| are equal to 0.3 and 0.5 respectively.

Thirdly, all the mutual prime factors of k^{1-N,j} need to be found, if these exist. If a prime factor is common to all k^{1-N,j}, all k^{1-N,j} are divided by this prime factor. This is repeated until the integers have no common prime factors anymore, resulting in the multiplication factors k'^{1-N,j}.

Finally, K^j_{k2π} can be computed. K^j_{k2π} is equal to:

K^{j}_{k2\pi} = \frac{k'^{\,i,j}}{10^{c}}\, K^{i,j}_{2\pi}    (6.2)

An example. Let the height ambiguities h^{1-3}_{2π} be equal to 20, 24 and 30 meters per phase cycle and let c be equal to zero (in this case h^{1-3}_{2π} are already integers). k^{1-3} follow from Eq. (6.1) and are equal to 360, 300 and 240 respectively. After finding the common prime factors, k^{1-3} can be reduced to k'^{1-3}, here equal to 6, 5 and 4 respectively. K_{k2π} is therefore equal to 120 meters. Therefore, for this case, only if the length of the boxcar distribution of PDF(h) is smaller than 120 meters is an unambiguous estimate of h possible.
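For integer-valued (or integerized) ambiguities, the procedure above amounts to taking their least common multiple, which gives a compact sketch; the values below reproduce the example and all variable names are hypothetical.

% Sketch: ambiguity lengthening (Eq. (6.2)) computed as the least common
% multiple of the integerized parameter ambiguities.
K2pi = [20 24 30];                     % height ambiguities [m per phase cycle]
c    = 0;                              % 10^c makes the ambiguities integer
Kint = round(K2pi * 10^c);

Kk2pi = Kint(1);
for n = 2:numel(Kint)
    Kk2pi = lcm(Kk2pi, Kint(n));       % running least common multiple
end
Kk2pi = Kk2pi / 10^c;                  % ambiguity lengthening: 120 m here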

6.2.2 Beat frequency-like phenomena within the likelihood function

In this section a certain phenomenon within a likelihood function is acknowledged. I term this phenomenon the beat frequency-like phenomenon, in analogy to the occurrence of beat frequencies in vibration theory, see Chapter 2 of [17].

Let K^{1-N,j}_{2π} be the parameter ambiguities of the parameter x_j and let K^{i,j}_{2π} and K^{i',j}_{2π} be two individual parameter ambiguities within K^{1-N,j}_{2π}. Furthermore, let the ratio between K^{i,j}_{2π} and K^{i',j}_{2π} approximate a ratio of 1/1.

If no other K^{i'',j}_{2π} is available within K^{1-N,j}_{2π} that does not approximate the ratio of 1/1, beat frequency-like phenomena can be observed within PDF(x_{1-M} | φ^{1-N}).


The beat frequency-like phenomenon is here illustrated by an example of a likelihood function PDF(h | φ^{1-2} = φ^{1-2}_{0,500}) using ERS parameters, see Fig. 6.3(a). |γ^{1,2}| are here equal to 0.7 and 0.4 respectively and L is equal to 1. Furthermore, the height ambiguities h^{1,2}_{2π} are here equal to 20 and 21 meters per phase cycle respectively, close to the ratio of small integers 1/1. Following the procedure of Sec. 6.2.1, the height ambiguity lengthening is equal to 420 meters. In this example the boundaries of the boxcar PDF of PDF(h) are equal to 0 and 1000 meters (a = 0, b = 1000).

In Fig. 6.3(a) the beat frequency-like phenomenon can be observed in the gradual increase and decrease of the subsequent maxima within the theoretical likelihood function. A single beat occurs within the ambiguity lengthening h_{k2π} of 420 meters. In Fig. 6.3(b) the phenomenon cannot be seen. Here exactly the same parameters are used as in Fig. 6.3(a), except that h^{1,2}_{2π} are here equal to 20 and 34 meters per phase cycle respectively. The ratio between h^1_{2π} and h^2_{2π} is 10/17, far from a ratio of 1/1. h_{k2π} is here equal to 340 meters.

(a) A beat frequency-like phenomenon; (b) no beat frequency-like phenomenon.

Figure 6.3: The occurrence of a beat frequency-like phenomenon in a PDF(h | φ^{1-2} = φ^{1-2}_{0,500}) can be seen in Fig. 6.3(a), while in Fig. 6.3(b) the phenomenon cannot be seen. In both figures no multi-looking has been applied (L = 1). |γ^{1,2}| are in both cases equal to 0.7 and 0.4 respectively. In Fig. 6.3(a) the height ambiguities are 20 and 21 meters per phase cycle, close to the ratio of 1/1. In Fig. 6.3(b) the height ambiguities are 20 and 34 meters per phase cycle, far from the ratio of 1/1. The ambiguity lengthenings are here equal to 420 and 340 meters respectively. The simulated height is in both figures 500 meters.

The occurrence of beat frequency-like phenomena is unlikely for a likelihood function that is based on more than two phase observations. This is due to the low probability that multiple K^{1-N,j}_{2π} have their ratios all approximating the ratio of 1/1. Features of the beat frequency-like phenomenon of individual K^{i,j}_{2π} might still be seen, however.

The beat frequency-like phenomenon might be useful for an optimization algorithm: it may shorten the processing time for finding the estimates x_{1-M}.

For time series of more than two phase observations, first a likelihood function can be constructed on the basis of only two phase observations, such that the beat frequency-like phenomenon occurs. Using a smart Monte Carlo algorithm the global peak can then easily be found, by simply following the gradual increase or decrease of the maxima within the likelihood function.

When the global maximum is found for the likelihood function based on only the two phase observations, it is likely that the global maximum for the likelihood function based on the whole time series of phase observations can also be found in the neighborhood of the previously found global maximum. The search for the global maximum within the likelihood function based on the whole time series of phase observations can then be narrowed down to a smaller domain: only the area around the two-phase-observation-based ML estimate needs to be computed, saving computation time.

6.2.3 Propagation of an ambiguity estimation error

Consider a general likelihood function PDF(x_{1-M} | φ^{1-N}). If the parameter ambiguities K^{1-N,j}_{2π} need to be estimated, estimation errors will propagate into PDF(x_{1-M} | φ^{1-N}). Here the effect of a rank variable estimation error on a PDF(x_{1-M} | φ^{1-N} = φ^{1-N}_{0,x_{1-M}}) is briefly studied.

If a rank variable estimation error occurs, this results in a stretching or shortening of the relevant K^i_{2π}. The stretching or shortening of K^i_{2π} results in transformation errors of the phase PDFs from the phase domain to the domain of the height h. The transformation error then propagates into the computation of the likelihood function.

The rank variable estimation error is here illustrated by an example. In this example ERS parameters are used, |γ^{1-3}| are here equal to 0.7, 0.4 and 0.3 respectively and no multi-looking is applied (L = 1). The boundaries of the boxcar PDF of the height are here set to -500 and 500 meters (a = -500, b = 500). The height has been set here to 100 meters with respect to a reference point of zero meters. Fig. 6.4 shows the PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) without baseline estimation errors. h^{1-3}_{2π} are here equal to 20, 50 and 28 meters per phase cycle.

Figure 6.4: A PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) without baseline estimation errors. The following settings have been used: |γ^{1-3}| are equal to 0.7, 0.4 and 0.3 and no multi-looking has been used (L = 1). The height has been set here to 100 meters. h^{1-3}_{2π} have been set here to 20, 50 and 28 meters respectively.

In Fig. 6.5(a) the PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) is shown again, except that here a single baseline estimation error has occurred and the same phase observations are used as in Fig. 6.4. Here h^{1-3}_{2π} are equal to the correct 20 and 50 and the erroneous 27.990 meters per phase cycle, introducing a realistic perpendicular baseline estimation error of about one centimeter for a perpendicular baseline of 347 meters.

The effect of the transformation error of one of the phase PDFs on the computation of PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) can be seen in Fig. 6.5(b). The error is here defined as the correct PDF minus the erroneous PDF. The transformation error results in an erroneous overlap between the phase PDFs. The resulting errors within the erroneous PDF vary, but are fortunately very low.


To illustrate the importance of the precision of the rank variables, Fig. 6.6(a) shows the PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) for an unrealistic perpendicular baseline estimation error of about 6 meters, such that h^3_{2π} is equal to 27.5 meters per phase cycle. The remainder of the settings are the same as in Fig. 6.4 and 6.5(a).

Observe that in Fig. 6.6(a) a global maximum does not appear at the simulated height of 100 meters. h_ML is therefore wrongly estimated. I therefore conclude that the rank variables need to be known with high precision in order not to have disastrous effects within the estimation process.

(a) A PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) with a baseline estimation error; (b) the errors within the PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) caused by a baseline estimation error.

Figure 6.5: In Fig. 6.5(a) the same settings have been used as in Fig. 6.4: |γ^{1-3}| are equal to 0.7, 0.4 and 0.3 and no multi-looking has been used (L = 1). The same phase observations have been used as in Fig. 6.4. The height has been set here to 100 meters. h^{1-3}_{2π} have been set to 20, 50 and 27.990 meters respectively, introducing a baseline estimation error of about one centimeter to a perpendicular baseline of 347 meters.

The effect of the transformation error of one of the phase PDFs on the computation of PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) can be seen in Fig. 6.6(b). The transformation error results in an erroneous overlap between the phase PDFs. The resulting errors within the erroneous PDF vary, and can locally be extremely large.


(a) A PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) with a baseline estimation error; (b) the errors within the PDF(h | φ^{1-3} = φ^{1-3}_{0,E{h}}) caused by a baseline estimation error.

Figure 6.6: In Fig. 6.6(a) the same settings have been used as in Fig. 6.4: |γ^{1-3}| are equal to 0.7, 0.4 and 0.3 and no multi-looking has been used (L = 1). The same phase observations have been used as in Fig. 6.4. The height has been set here to 100 meters. h^{1-3}_{2π} have been set to 20, 50 and 27.5 meters respectively, introducing an estimation error of about 6 meters of a perpendicular baseline of 347 meters.


Chapter 7

Hypothesis Testing and Outlier Detection and Identification

Chap. 4 discussed four different reliability estimators that can be used for quality assessment of the parameter estimates.

Reliability estimates as defined in Chap. 4 are very useful to add a statement about the confidence of the parameter estimates. For InSAR, reliability estimation is even more important, as the magnitudes of coherence throughout the interferograms in general vary strongly, resulting in varying qualities of parameter estimates.

The reliability estimates as defined in Chap. 4 are all a derivative of the likelihood function. The reliability estimators therefore implicitly assume that the observation model, used for the construction of a likelihood function, is correct. This assumption is not always valid, however.

Before reliability estimation, it is best to validate an observation model. This is also known as hypothesis testing. To test hypotheses, first two or more hypotheses need to be proposed. After that, the theoretical hypotheses need to be confronted with the only truth that can be measured: the (phase) observations. Using certain tests a number of test statistics are produced, equal to the number of proposed hypotheses, after which the test statistics of the hypotheses are compared to each other. The hypothesis with the best test statistic is finally chosen.

The outcome of hypothesis testing states which observation model performs best in the sense of having no detected bias. The other observation models subjected to testing, performing worse, will have biased parameter estimates: their parameter estimates have offsets with respect to the parameter estimates based on the best observation model.

Only if the observation model is valid do reliability estimates have a meaning. Hypothesis testing gives a statement about the (un)biasedness of parameter estimates, and should therefore be an intrinsic part of the estimation process. Hypothesis testing is briefly discussed in Sec. 7.1.

A special case of hypothesis testing is outlier detection. In that case one hypothesis is formed that takes the whole set of phase observations into account, while several other hypotheses are formed by omitting certain sets of phase observations from the whole set of phase observations. The detection of biases using hypothesis testing (in this case biases resulting from a certain set of phase observations) and the identification and removal of the source of the biases, the outlier(s), i.e. those phase observations responsible for the biases, are two important steps to reduce any biases within the parameter estimates. Outlier detection and identification will be thoroughly discussed in Sec. 7.2.


Notice that for the discussion about hypothesis testing and outlier detection and identification the application of MLE to time series of phase observations is used, as this is the estimation method considered in my research. However, the discussion is also valid for other estimation methods based on time series of interferometric phase observations.

7.1 Hypothesis testing

Hypothesis testing is considered a very important aspect of an estimation process. Hypothesis testing is also relatively easy, and therefore its discussion will be short. Here the approach of Teunissen [25] is followed.

Hypothesis testing is introduced in Sec. 7.1.1 and demonstrated by an illustrated example in Sec.7.1.2.

7.1.1 Introduction into hypothesis testing

Several tests for the application of hypothesis testing are already available and can be directly applied here. Amongst these are the Generalized Likelihood Ratio test (GLR), the Overall Model Test (OMT) and the Parameter Significance Test (PST).

The GLR test is the ratio between the maximum of a likelihood function based on the nominal hypothesis H_0 and the maximum of a likelihood function based on an alternative hypothesis H_a [ibid]. For the application of MLE to time series of phase observations the GLR test can be mathematically expressed as:

reject H_0 if \frac{\max\,\mathrm{PDF}_0(x_{1-M}\mid\phi^{1-N})}{\max\,\mathrm{PDF}_a(x_{1-M}\mid\phi^{1-N})} < a    (7.1)

in which PDF_0(x_{1-M} | φ^{1-N}) and PDF_a(x_{1-M} | φ^{1-N}) are the likelihood functions based on the nominal and the alternative hypothesis respectively, and a is a threshold.

Notice that the GLR test can easily be adapted by switching to other parameter estimators, such as those discussed in Chap. 4. The GLR test will not be discussed further here. Instead, the OMT will be used in the example in the next section.

The OMT, another test to test hypotheses, can mathematically be expressed as follows [25]:

reject H_0 if e_0^{T} Q_{yy}^{-1} e_0 > a    (7.2)

in which e_0 are the estimated errors in the observations based on the nominal hypothesis, Q_{yy} the variance-covariance matrix and a a threshold. ·^{-1} and ·^{T} are here the inverse and transpose operators for matrices.

The OMT and the PST are similar to each other. While the OMT tests H_0 versus H_a to decide whether the nominal hypothesis has sufficient parameters with respect to the available observations, in which case H_0 has more degrees of freedom than H_a, the PST does the opposite and tests whether H_0 does not have too many parameters to model reality. Hereby H_0 has fewer degrees of freedom than H_a.

Hypotheses can also be intrinsically different from each other and still have the same degrees of freedom. The measurements of deformation D, for example, can be modelled as a linear function of time, E{D} = a_1 t, or as a sine function, E{D} = a_2 sin t, in which a_{1,2} are unknown factors and t the time. Only after extensive testing can one decide which of the two hypotheses is the nominal one.

In my opinion it is better not to make any statements about the nominality of hypotheses prior to extensive hypothesis testing. Therefore I have chosen to give the hypotheses numbers, indicated in subscript. Also, I test hypotheses only relatively, that is against other hypotheses, and not against thresholds.

The OMT statistic T_{k,OMT} for the application of MLE to time series of phase observations for hypothesis H_k can therefore be expressed as follows:

T_{k,\mathrm{OMT}} = e^{T} Q_{yy}^{-1} e = \sum_{i=1}^{N} \frac{\left(W\{\phi^{i} - \phi^{i}_{0,k}\}\right)^{2}}{\sigma^{2}_{i}}    (7.3)

in which e are the estimated phase residuals or phase noise W{φ^i - φ^i_{0,k}}, with φ^i_{0,k} the phase estimate for hypothesis H_k and phase observation φ^i, and σ^2_i the estimated phase variance for that phase observation.

A better observation model induces, overall, lower phase noise while σ^2_i remains the same. A lower value of the test statistic therefore signifies a better observation model.

After having obtained all the test statistics of the various hypotheses, the test statistics can be compared to each other. For the OMT, the test statistic with the lowest value indicates the best observation model. Mathematically this can be expressed as:

T_{\mathrm{BEST,OMT}} = \arg\min\, T_{1-K,\mathrm{OMT}}    (7.4)

in which K is the number of hypotheses.
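A minimal sketch of Eq. (7.3) and (7.4), assuming the wrapped phase observations phi (N x 1), the estimated phases phi0_k per hypothesis (N x K) and the estimated phase variances sigma2 (N x 1) are available; all names are hypothetical.

% Sketch: Overall Model Test statistic per hypothesis (Eq. (7.3)) and the
% selection of the best hypothesis (Eq. (7.4)).
wrap = @(p) mod(p + pi, 2*pi) - pi;      % wrapping operator W{.}

K = size(phi0_k, 2);                     % number of hypotheses
T = zeros(1, K);
for k = 1:K
    e    = wrap(phi - phi0_k(:,k));      % estimated phase residuals
    T(k) = sum(e.^2 ./ sigma2);          % OMT statistic, Eq. (7.3)
end
[~, kBest] = min(T);                     % lowest statistic = best hypothesis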

7.1.2 On the importance of hypothesis testing: an example

The importance of hypothesis testing will now be demonstrated by an example in which the OMT is used, see Eq. (7.3) and (7.4).

Figure 7.1 shows the real deformation of an arc_{art.ref} with respect to time t in years. The deformation D is here computed according to D = D̈t², in which D̈ is the deformation acceleration, equal to -5 mm/year². The real topographical height h is assumed to be known deterministically and has been set to 500 meters.

The real deformation can also be approximated by a linear deformation rate according to D = Ḋt, in which Ḋ is the deformation rate, equal to -10 mm/year.

Both the real deformation and the approximated deformation model result in 20 mm of subsidence after two years.

To apply the OMT, first a number of phase observations is simulated according to the real deformation model. The phase observations have been simulated according to Sec. 5.2. For the simulation, ERS satellite parameters have been used to obtain phase measurements for approximately two years, here equal to 21 phase observations for the ERS repetition time of 35 days.



[Figure 7.1: deformation [mm] versus time [years]; curves: TRUE MODEL and APPROXIMATED MODEL.]

Figure 7.1: The real deformation and the approximated deformation of an RC. The real deformation is here computed according to D = D̈t², in which D̈ is equal to -5 mm/year². The approximated deformation is here computed according to D = Ḋt, in which Ḋ is equal to -10 mm/year.

The observation model to simulate φ^{1-21}_0, needed for the simulations, is defined as:

E\left\{ \begin{bmatrix} \phi^1 \\ \phi^2 \\ \vdots \\ \phi^{21} \end{bmatrix} \right\} =
\begin{bmatrix}
\beta^1_x & -\frac{4\pi}{\lambda} t_1^2 & -2\pi & & \\
\beta^2_x & -\frac{4\pi}{\lambda} t_2^2 & & -2\pi & \\
\vdots & \vdots & & & \ddots \\
\beta^{21}_x & -\frac{4\pi}{\lambda} t_{21}^2 & & & -2\pi
\end{bmatrix}
\begin{bmatrix} h \\ \ddot{D} \\ a^1 \\ a^2 \\ \vdots \\ a^{21} \end{bmatrix}    (7.5)

in which β^{1-21}_x are the height-to-phase conversion factors, dependent on the perpendicular baselines B^{1-21}_⊥, λ is the wavelength and a^{1-21} are the phase ambiguities.

Hereby B^{1-21}_⊥, λ and |γ^{1-21}| are set using ERS satellite parameters. B^{1-21}_⊥ are here randomly chosen according to the boxcar distribution Π_{10,1100}. The computation of |γ^{1-21}| takes place according to a decomposition of three decorrelation sources, namely the geometrical, temporal and thermal decorrelation, see Sec. 3.4.2. Here B_{⊥,crit}, T and the SNR have been set to 1100 meters, 5 years and 11.7.

Although in this example we know which observation model has been used to simulate the phase observations, in practice an observation model first needs to be validated by making use of the only measured truth that is available: the phase observations.

Two situations can occur here: in one case the real deformation model is chosen and the approximated deformation model rejected, while in the other case the approximated deformation model is chosen and the real deformation model rejected, both according to the outcome of the phase observations, or in other words the overall noise within the phase observations. The observation model that deals better with the noise results in better test statistics and should be chosen as the best observation model.

Here H_1 is the hypothesis that for all the observations the deformation D is a function of the deformation acceleration D̈, while hypothesis H_2 states that for all the observations the deformation D is a function of the deformation rate Ḋ. Eq. (3.5) shows the observation model used to define H_2.



To test the hypotheses using MLE, additional information is needed to construct the domains in which the solutions can be found. Here I chose the solutions of D̈ and Ḋ to lie between 0 and -20 mm/year² and between 0 and -20 mm/year respectively. The parameter PDFs, PDF(D̈) and PDF(Ḋ), are distributed according to boxcar distributions.

Fig. 7.2 shows the solutions of the two observation models based on a single set of 21 phase observations. Observe that the deformation function based on H_1 is overshooting: H_1 states that after two years the subsidence is equal to 22.7 mm. On the other hand, the deformation function based on H_1 shows much similarity with the shape of the real deformation, while H_2 shows only similarity in the middle region of the model, where the deformation rate of the real deformation can be well approximated by a linear deformation rate. H_2 states furthermore that after two years the subsidence is equal to 20.9 mm, which is relatively close to the -20 mm deformation after two years. Only by testing the hypotheses will we know which observation model is best to use.

Using Eq. 7.3 the test statistics T_{1,OMT} and T_{2,OMT} have been computed. They are equal to 20.47 and 19.51 respectively. The test of Eq. 7.4 decides that T_{2,OMT} is best, as it has the lowest value. Therefore the hypothesis that the deformation is a function of the deformation acceleration has been rejected on the basis of the set of phase observations used here. Hypothesis H_2, on the basis of the information of the two hypotheses, is here the best hypothesis to use.

The result of this example shows the importance of hypothesis testing. Although in this example the deformation behavior was known a-priori, according to the noise within the phase observations, the measured truth, the estimate based on the real deformation model was biased, resulting in the rejection of the real deformation model.

[Figure 7.2: deformation [mm] versus time [years]; curves: REAL MODEL, ESTIMATED MODEL BASED ON HYPOTHESIS 1, ESTIMATED MODEL BASED ON HYPOTHESIS 2.]

Figure 7.2: The real deformation and the approximated deformation of an arc, and the estimated deformation functions based on the deformation acceleration, H_1, and the deformation rate, H_2. Hypothesis testing shows that H_2 is better than H_1, resulting in the rejection of H_1.



7.2 Outlier detection and identification

In this section outlier detection and identification for the application of MLE to time series of phase observations are discussed. My research focuses on outlier identification rather than outlier detection, and compares five different outlier identification algorithms with each other.

In hindsight, the setup of my research was not chosen conveniently, resulting in difficulties in the statistical analysis of the performance of the five different outlier identification algorithms. Fortunately the core of the problem could be fixed, although the thread through my research might be difficult to follow. To make that thread easier to follow, I explain here the chronology of my research, including the problems that needed to be dealt with.

My original setup was to do a statistical analysis of the performance of the five different outlier identification algorithms using 3000 sets of 20 simulated phase observations for the estimation of a single parameter. Some of the phase observations were altered on purpose such that they no longer conformed to the observation model: they should become outliers. The settings were such that each phase observation had a 10% chance of being altered. After having simulated the phase observations, including the altered phase observations that should form outliers, the outlier identification algorithms, embedded within the application of MLE to the simulated time series of phase observations, were tested and their performance statistically analyzed.

The performance of the outlier identification algorithms measured at that time was very moderate. Although some of the outlier identification algorithms were successful and brought, from a statistical point of view, an improvement in the parameter estimates, the improvements were not significant. Since I was not satisfied with these results, I changed the setup to see if the results could be improved: I tried the same setup, but changed the number of observations, the multi-looking factor and/or the probability for a phase observation to be altered.

In some cases a certain outlier identification algorithm was successful and brought improvements to the parameter estimates, although still not significant ones, while in other cases it was not. Instead of bringing closure to the performance analysis of the outlier identification algorithms, my new results brought a new mystery into my research: how is it possible that the performance of the outlier identification algorithms is so sensitive to changes within the settings, such that in some cases an outlier identification algorithm improves the estimates, while in other cases it deteriorates them?

Fortunately, after a period of reflection, I found closure for the mystery. I had implicitly assumed, in the original setup and the setups that followed, that the alteration of the phase observations, based on a certain observation model, would automatically result in outliers. In hindsight this could not have been the case, however.

Conventionally an outlier is defined as an observation that results in biases within any parameter estimates. The observation therefore does not conform to the established observation model, and is recognized, most importantly, by the detection of a bias, after which the outlier needs to be identified.

For real data, the only means available to know whether an observation does not conform to a certain observation model are the bias detection and outlier identification algorithms. From the fact that a bias is (correctly) detected, followed by the (correct) identification of any outliers, it can easily be deduced that the identified outliers automatically do not conform to the established observation model. For real data, any observations that do not conform to a certain observation model can therefore automatically be defined as outliers.

This is not true for simulated data. For simulated data, it is known beforehand which phase observations do not conform to a certain observation model, because it is known which phase observations have been altered within a simulation. The altered phase observations do not always result in biases within any parameter estimates, and can therefore, conventionally, not always be defined as outliers, although they do not conform to the established observation model. This definition forced me to introduce the detection of biases within my simulations.

To solve the mystery, I first needed to ascertain that outliers were indeed formed amongst the altered phase observations. Since the presence of outliers always results, by definition, in biased parameter estimates, outlier detection needed to be performed before any outlier identification would take place.

After embedding outlier detection within the setup, I found out that in quite a number of cases, using a certain bias detector still to be discussed, no bias could be detected, while in other cases a bias was falsely detected even though no phase observations had been altered. Both cases resulted from parameter estimation noise that was too severely present within my parameter estimates.

To create biases within the simulations that could also be successfully detected, I had to change the setup of the simulations. Since I had to deal with the random behavior of the phase observations, it is impossible to determine, within a single simulation, whether a certain setup also results in detectable biases. However, I could establish, for a single simulation, a certain likelihood of occurrence of a detectable bias. Consequently I changed the setup such that in some cases a bias could be successfully detected, after which the outlier identification algorithms would be activated to identify the outliers.

Finally the mystery could be solved: I could now perform the statistical analysis of the five different outlier identification algorithms successfully.

In my research on outlier detection and identification, I first show the importance of detecting and identifying outliers in Sec. 7.2.1. The impact of outliers on the parameter estimates can be large and is shown in an example, with possibly severe consequences in real life.

Secs. 7.2.2 and 7.2.3 discuss two different theories to detect and identify outliers, namely the B-method of testing and outlier detection and identification based on the laws of conservation. While the algorithms here identify the outliers using essentially the same definition as in the B-method of testing, the outliers cannot easily be detected using the same theory. The reason is that the B-method of testing uses Gaussian PDFs, while the phase PDFs used here are not Gaussian, so that the important link between the criteria for outlier detection and the criteria for outlier identification is missing. To still detect the occurrence of biases, the detection takes place on the basis of the laws of conservation, discussed in Sec. 7.2.3.

Sec. 7.2.4 explains the difficulty of identifying outliers, while Sec. 7.2.5 discusses two different strategies to identify the outliers and explains that the five different outlier identification algorithms can be categorized according to those strategies. The settings in which the phase observations are simulated, a very important issue in this research, are discussed in Sec. 7.2.6. The working principles of the five different outlier identification algorithms are discussed in Sec. 7.2.7. Sec. 7.2.8 then discusses the performance of the identification algorithms.

Finally, the future of outlier detection and identification for the application of MLE to time series of phase observations is discussed in Sec. 7.2.9.

7.2.1 On the impact of outliers

In this section I show that outliers can have a significant impact on parameter estimates, and that their removal can significantly improve the parameter estimates. This will be illustrated in an example.



Sec. 7.1.2 showed an example in which the deformation was modelled. Two hypotheses were tested, after which the OMT showed that one hypothesis was better than the other in terms of the test statistics.

Here the same real deformation model has been used to simulate the phase observations, see Fig. 7.1. All the settings to simulate the phase observations have been exactly the same as in the example of Sec. 7.1.2.

To estimate the deformation, the linear deformation model of Sec. 7.1.2 is used as the observation model. It will be shown that the bias of the estimate can be significantly reduced by the detection, identification and removal of outliers.

Fig. 7.3(a) shows 20 estimates of the deformation rate using the observation model of (3.5). The approximation of the real deformation is shown as well by the blue line, corresponding to a deformation rate of -10 mm/year. It can clearly be seen that the estimates are biased, as the arithmetic mean of the estimates has an offset with respect to the deformation rate based on the approximated deformation model and is equal to -6.86 mm/year.

(a) Estimates with significant bias. (b) Estimates with only a small bias.

Figure 7.3: In Fig. 7.3(a) 20 estimates are made without removal of the outliers, while in Fig. 7.3(b) 20 estimates are made after removal of the outliers. The result is a significant reduction in the bias of the estimates, as can be clearly seen from the arithmetic mean of the estimates in both figures.

Fig. 7.3(b) shows 20 estimates of the deformation rate using exactly the same observation model, but now after outlier detection, identification and removal. The arithmetic mean of the estimates still shows that they are biased with respect to the approximated true deformation rate. However, the bias has been reduced by a significant 2.95 mm/year: the arithmetic mean of the estimates before the removal of the outliers has a value of -6.86 mm/year, but could be improved to a value of -9.81 mm/year, which is much closer to the approximated true -10 mm/year.

Whether the bias within the estimate is significant depends of course on the application, but it might be decisive for any further action.

For example: an early warning needs to be sent to the authorities when a critical value of deformation has been reached, say a deformation of -15 mm. After two years of measuring, the probability that a warning needs to be sent is very small: the arithmetic mean of the estimates without the removal of outliers is equal to -6.86 mm/year. A short calculation gives a deformation estimate of -13.72 mm, and would therefore give no reason to send a warning.

If the removal of outliers is successfully applied, however, there is a large probability that a warning needs to be sent: the arithmetic mean of the estimates after outlier removal is equal to -9.81 mm/year, resulting in a deformation estimate of -19.62 mm.

The consequence of not sending an early warning to the authorities could be that the authorities will not take further action, which may result in a disaster: people may not be evacuated to safety zones, or a pipeline might break, resulting in an oil leak.

The example shows that outliers may have a huge impact on estimates and that, depending on the application of the estimates, failing to remove them may result in additional costs or even a catastrophic disaster.

7.2.2 Outlier detection and identification based on the B-method of testing

Outlier detection and identification based on the B-method of testing is introduced here on the basis of Teunissen [25].

The detection of outliers used in Teunissen occurs by hypothesis testing and is applied to observations y^{1−N} that do not need to be unwrapped: y^{1−N} ∈ R.

To detect outliers, the test of (7.3) is used. Assuming a Gaussian distribution of the observations, the test statistic of (7.3) has a χ²-distribution: T_{0,OMT} ∼ χ²(q, 0), in which q is the number of degrees of freedom, equal to N − M, the number of observations minus the number of parameters of interest, and the subscript 0 indicates the nominal hypothesis.

The alternative hypothesis H_a is the hypothesis that does not impose any restrictions on the observations. It states the most relaxed conditions: E{y^{1−N}} ∈ R, i.e. no particular behavior of the observations is expected. The OMT statistic is here again χ²-distributed: T_{a,OMT} ∼ χ²(q, λ), in which λ is the non-centrality parameter. The distribution of T_{a,OMT} therefore has an offset with respect to that of T_{0,OMT}. H_0 will only be rejected if the following condition applies:

reject H_0 if T_{0,OMT} > χ²_α(q, 0)    (7.6)

in which α is a parameter that sets a critical region, or confidence region, within the distribution. α is the probability that H_0 is falsely rejected.

With the rejection of H_0 a bias of the hypothesis is detected. The source(s) of the bias, i.e. the outliers, need to be identified next in the identification step.

The identification step is done in [25] by the w-test. Hereby H_0 is expressed as follows: E{y} = Ax.

H_a on the other hand is expressed as: E{y} = Ax + c_y∇, in which ∇ is the bias induced by the i-th observation, and c_y an N × 1 vector of zeros except for the i-th position, which is equal to one. N is here the number of observations.

The w-test statistic can be computed as follows:

w = \frac{c_y^T Q_{yy}^{-1} e}{\sqrt{c_y^T Q_{yy}^{-1} Q_{ee} Q_{yy}^{-1} c_y}}    (7.7)

in which Q_{ee} is the variance-covariance matrix of the estimated errors.



Whether the model error can be identified as an outlier in the i-th observation is then tested according to:

observation i is an outlier only if |w| > N_{α_1/2}(0, 1)    (7.8)

in which α_1 is the level of significance, again a variable that expresses the confidence region, and N(0, 1) is the standard normal distribution. α_1 is, similar to α, a parameter that can be used to reject H_0.

Only if observation i results in a too large value of the w-test statistic is it accepted as an outlier. The outlier is then identified. α_1 needs to be set here according to the value of α within the χ²-distribution of the OMT by means of the non-centrality parameter λ_0:

λ_0 = λ(α, q = N − M, γ = γ_0) = λ(α_1, q = 1, γ = γ_0)    (7.9)

in which γ is the probability of detection, or the power of the test, equal to 1 − β, where β is the probability of the error that H_0 is accepted in cases where it should be rejected. γ needs to be equal in both tests, γ = γ_0, in order to apply the same criteria in both tests to accept or reject a hypothesis. If those criteria differ from each other, the bias that has been detected by a choice of α cannot be explained by the outliers identified in the identification step.

Due to the choice of α and α_1 by means of λ_0 and γ_0, the OMT and the w-test base their decisions on the same criteria to accept or reject a hypothesis.

This approach is also known as the B-method of testing and was introduced by Baarda [1]. For a thorough discussion of the B-method and of the detection and identification steps in [25], the reader is referred to [25] and [1].

The w-test, applied here to the i-th observation, can of course be applied to all N phase observations. This is also known as data snooping, and this approach will be used in working out one of the two strategies discussed in Sec. 7.2.5.
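For Gaussian-distributed observations, the w-test of Eq. (7.7) and the data snooping loop can be sketched as follows; the linear model, the noise level and the injected error are invented for the example.

```python
import numpy as np

def w_test(y, A, Qyy):
    """Compute the w-test statistic of Eq. (7.7) for every observation (data
    snooping), assuming the linear Gaussian model E{y} = A x."""
    Qyy_inv = np.linalg.inv(Qyy)
    Qxx = np.linalg.inv(A.T @ Qyy_inv @ A)          # covariance of the estimates
    e_hat = y - A @ (Qxx @ A.T @ Qyy_inv @ y)       # least-squares residuals
    Qee = Qyy - A @ Qxx @ A.T                       # covariance of the residuals
    w = np.empty(len(y))
    for i in range(len(y)):
        c = np.zeros(len(y)); c[i] = 1.0
        w[i] = (c @ Qyy_inv @ e_hat) / np.sqrt(c @ Qyy_inv @ Qee @ Qyy_inv @ c)
    return w

# Example: a constant signal with one corrupted observation.
rng = np.random.default_rng(1)
y = 5.0 + rng.normal(0.0, 0.1, 10)
y[3] += 1.0                                  # simulated outlier
A = np.ones((10, 1))
Qyy = 0.1**2 * np.eye(10)
w = w_test(y, A, Qyy)
print(np.where(np.abs(w) > 1.96)[0])         # flagged at the 5% level (N_{alpha1/2})
```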

After the detection and identification of outliers, another hypothesis needs to be formulated in order to adjust for or remove the outlier.

Here it is chosen not to adjust the outliers, but simply to remove them. The advantage of this approach is that the redundancy is lowered, as the phase observation is no longer used within the estimation process, which is not the case in data adjustment. With data adjustment no loss of redundancy is taken into account, as the phase observation, although adjusted, is still used in the estimation process.

Notice that the theory discussed here assumes that the PDFs of the observations have a Gaussian distribution. The phase PDF discussed in Sec. 3.2 assumes a homogeneous distribution of scattering within a resolution cell, and is therefore not Gaussian.

Eq. (7.9) is only valid if Gaussian distributions of the observations can be assumed. Only then are outlier detection and identification based on the same non-centrality parameter λ_0, and thus on equivalent confidence parameters α and α_1. The B-method of testing can therefore not simply be applied here for an estimation process using phase observations with a non-Gaussian distribution.

However, there are possibly two ways to still apply the B-method of testing. One way is to assume Gaussian distributions for the phase observations, such that the B-method of testing can be applied directly. The consequences of this assumption would then need to be studied extensively, and the error propagation would need to be well understood.



The other way is to find another PDF with which to detect the biases, and to connect a non-centrality parameter λ_0 of that PDF with the phase PDF discussed in Sec. 3.2, similar to the χ²-distribution of the B-method of testing. This would possibly include the estimation of phase ambiguity numbers.

Neither way has been studied here due to time constraints.

Within the outlier identification algorithms, only the identification step of the B-method of testing is used. As in the w-test of (7.7), outliers are identified here by defining a confidence region.

Here the confidence region is conveniently chosen to be the inner probability mass within the phase PDF, a definition that can easily be applied to all phase PDFs based on the varying |γ^{1−N}|. The outer probability mass defines the detection zones.

Figs. 7.4(a) and 7.4(b) show the confidence intervals based on 95% confidence for two different phase PDFs. The 95% probability mass is used here to define the confidence intervals and the detection zones. Using confidence intervals, both outliers and wouldbe-outliers, to be discussed in Sec. 7.2.5, can be detected in any phase PDF.

(a) A confidence interval for a phase PDF based on 95% confidence. (b) A confidence interval for another phase PDF based on 95% confidence.

Figure 7.4: In both figures a confidence interval of 95% has been taken. In Figs. 7.4(a) and 7.4(b) the phase PDFs based on |γ| = 0.7 and |γ| = 0.3 are shown. The regions in red show the detection zones in which phase observations are considered as outliers or wouldbe-outliers. Each detection zone is deduced from a 2.5% probability mass.
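The detection zones of Fig. 7.4 can be computed numerically from a phase PDF by integrating it into a CDF and cutting off the outer probability mass. The sketch below uses the closed-form single-look interferometric phase PDF as a stand-in; the thesis works with multi-looked phase PDFs, so the exact shapes differ, but the construction of the zones is the same.

```python
import numpy as np

def phase_pdf_single_look(phi, gamma, phi0=0.0):
    """Closed-form single-look interferometric phase PDF for coherence |gamma|
    and expected phase phi0 (stand-in for the multi-looked phase PDF of Sec. 3.2)."""
    b = gamma * np.cos(phi - phi0)
    return (1.0 - gamma**2) / (2.0 * np.pi * (1.0 - b**2)) * \
           (1.0 + b * np.arccos(-b) / np.sqrt(1.0 - b**2))

def detection_zones(gamma, confidence=0.95, phi0=0.0, n=4096):
    """Bounds of the confidence region; residuals outside them lie in the
    detection zones (each zone holds half of the outer probability mass)."""
    phi = np.linspace(-np.pi, np.pi, n, endpoint=False)
    cdf = np.cumsum(phase_pdf_single_look(phi, gamma, phi0)) * (2.0 * np.pi / n)
    tail = (1.0 - confidence) / 2.0              # e.g. 2.5% per detection zone
    return phi[np.searchsorted(cdf, tail)], phi[np.searchsorted(cdf, 1.0 - tail)]

print(detection_zones(0.7))   # narrow confidence region (cf. Fig. 7.4(a))
print(detection_zones(0.3))   # much wider confidence region (cf. Fig. 7.4(b))
```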

7.2.3 Outlier detection and identification based on the laws of conservation

For the detection of any bias prior to the testing of the outlier identification algorithms, the B-method of testing, as explained in Sec. 7.2.2, is not straightforward to apply here. Therefore the detection is done here by making use of the laws of conservation.

Since the outlier identification is done by the outlier identification algorithms based on the B-method of testing, the outlier identification based on the laws of conservation is only discussed briefly.



Using outlier detection based on the laws of conservation, a network of closed loops is generated by forming arcs between certain Resolution Cells (RCs) in the spatial domain. Here triangular relations between the RCs are chosen to detect any bias, although other forms of loops are possible as well. Fig. 7.5 shows part of such a network.

Figure 7.5: Part of a network of triangles. Here two triangles are shown in red that are connected with each other by the common arc 23. Of the triangles the triangle residuals are computed. If one of the triangle residuals is below the set threshold, and the other one above the set threshold, one time series of phase observations that contains at least one outlier can be identified. After that the outlier needs to be identified within the time series. In the figure, taking into account only the triangle residuals of the triangles shown in red, only the time series of phase observations located at the first or fourth RC can be identified.

Using the triangular loops, the parameter estimates within the arcs can be computed. The parameter estimation needs to be done for all the arcs within a triangle. Hereby I chose to use the convolution of the likelihood functions based on the arcs, as discussed in Sec. 3.1.4. In order to detect any outliers, the sum of the three estimates within a triangle needs to be computed, hereafter simply termed the triangle residual.

If no estimation noise occurred, any triangle residual would satisfy the laws of conservation: any residual should be zero in order to satisfy the physical geometry. In practice, however, the estimation noise is not zero, resulting in non-zero triangle residuals. In reality a threshold therefore needs to be used in order to prevent the triggering of bias detection by nominal parameter estimation noise.

A bias caused by any outlier can be detected by conditioning the triangle residuals: they need to be lower than a certain threshold. This can be explained as follows.

If the estimation noise is acceptable from a statistical point of view, that is, below a certain set level, the triangle residual should be lower than the threshold. In that case no outlier is detected.

If the triangle residual is larger than the threshold, this can only be the result of extreme noise present in at least one of the parameter estimates.

Notice that within the bias detection, errors will occur due to parameter estimation noise. The size and frequency of the errors depend of course on the set threshold and on the characteristics of the parameter estimation noise.

Subsequently, the single time series of phase observations that caused the bias within the triangle residual needs to be identified. The best way to identify the corresponding time series seems to be on the basis of exclusion.



To exclude time series of phase observations that do not have any outliers amongst themselves, two triangles are here connected with each other such that they have one arc in common. In Fig. 7.5 that common arc is arc 23.

If one of the triangles has a residual below the threshold, while the other one has its residual above the threshold, one of the four time series of phase observations can easily be identified: since one of the triangles has a residual below the threshold, all three time series of phase observations that connect that triangle can be excluded. Therefore the cause of the bias needs to be found within the RC that is not directly connected with the triangle that has a residual below the threshold.

Notice that in Fig. 7.5, of the two triangular relations shown in red, only the first and fourth time series of phase observations can be identified as the cause of a bias.
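A minimal sketch of the bias detection and the exclusion step for two triangles sharing an arc, as in Fig. 7.5. The arc labels, sign convention, estimates and threshold are all invented for the illustration.

```python
# Bias detection on two triangles sharing one arc, followed by exclusion.
# Hypothetical arc estimates (e.g. deformation rates in mm/year) with a sign
# convention such that each closed loop should sum to zero without noise.
arc = {"12": -0.1, "23": 0.3, "31": -0.1,     # triangle over RCs 1-2-3
       "34": 1.4, "42": -0.4}                 # triangle over RCs 2-3-4, common arc 23

threshold = 0.5                               # assumed tolerance for nominal noise

res_123 = arc["12"] + arc["23"] + arc["31"]   # triangle residuals
res_234 = arc["23"] + arc["34"] + arc["42"]

biased_123 = abs(res_123) > threshold
biased_234 = abs(res_234) > threshold

if biased_123 != biased_234:
    # The clean triangle excludes its three time series; the remaining RC
    # (1 or 4) must hold the time series that contains the outlier(s).
    print("bias detected; suspect time series at RC", 4 if biased_234 else 1)
else:
    print("no single time series can be identified")
```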

In my research the further identification of any outlier within the identified time series of phase observations occurs by using confidence regions, as in the B-method of testing. The identification process of every single outlier identification algorithm is explained in Sec. 7.2.7. However, the identification of outliers should also be possible on the basis of the laws of conservation. Its procedure is essentially simple.

When a time series of phase observations is identified as the source of a bias within the residual of a loop, a simple procedure of data snooping should be able to cope with the identification of the outlier. Omitting one observation at a time, the parameter estimates are recomputed using the remaining observations.

This can be done N times, N being the number of observations. The N triangle residuals can then easily be assessed: the case with the lowest triangle residual should no longer contain the outlier. The observation that was omitted in that case is then identified as the outlier.

Although it is difficult to say, since I did not do any research on the identification of outliers based on the laws of conservation, this approach seems to work well for simple cases, in which the time series of phase observations that contains the outlier can easily be identified.

If, however, no single time series of phase observations causing a bias can be identified, the identification of outliers will not be straightforward and seems difficult to do.

On the other hand, outlier detection and identification based on the laws of conservation is a very natural way of detecting and identifying outliers: this approach is based on physical laws, unlike the B-method of testing, which makes use of artificially set confidence regions.

In my research only the detection of biases is based on the laws of conservation, simply to detect biases prior to the testing of the outlier identification algorithms. Of course no link can then be found between the criteria to detect biases and the criteria to identify outliers, such as exists within the B-method of testing. The set confidence level remains constant and is only arbitrarily set within the settings of the simulations. However, the bias detection based on the laws of conservation is useful to show at least that the parameter estimates are biased, and therefore have the potential to be improved.

7.2.4 On the difficulty of outlier identification

Although outliers can in general be detected by measuring a bias, in this case using the level of significance α in the B-method of testing, it might be difficult to identify them.

A major source of unsuccessful outlier identification can be found in the estimation noise within the parameter estimates. This can be explained as follows.

Using the identification step of the B-method of testing, outliers are identified within the observation domain. Therefore, to identify outliers, one first needs to compute the phase estimates using the observation model. This results in phase estimates and phase observations, their difference being the interferometric phase difference or the phase residuals.

In outlier identification the values of the phase residuals are checked against a critical value. Here the critical value is a function of the confidence level, as explained in Sec. 7.2.2.

The estimation noise causes the phase estimates to differ from the expected phase values, resulting in a shift of the confidence regions. This can be seen in Fig. 7.6, in which the expected phase is equal to zero radians and the phase estimate equal to one radian.

Figure 7.6: Two phase PDFs with their identification zones. One is based on the expected phase, here equal to zero, and has its identification regions shown in red, while the other is based on the phase estimate, here equal to one, and has its identification regions colored in orange. Due to the lack of overlap between the two identification regions, some phase observations can be erroneously assessed to be outliers.

If the difference between the expected phase and the phase estimate is small, the identification regions of both observation PDFs still have a significant overlap. Although from a statistical point of view errors still occur in the assessment of the phase residuals, resulting in some phase observations being erroneously assessed to be outliers, the bulk of the observations will be assessed correctly.

If the difference between the expected observation value and the observation estimate is large, however, the identification zones of both observation PDFs have significantly smaller, or even no, overlap. This will in general result in large errors in the outlier identification step. Fig. 7.6 shows the case in which no overlap occurs.

In the following example it will be shown how parameter estimation noise can contribute to an unsuccessful identification of the outlier. In Figs. 7.7(a) and 7.7(b) 10 height measurements can be seen. The true height value is here 500 meters. Using these height observations, an estimate of the height is made based on unweighted least squares. These first height estimates are plotted as the dashed red lines.

Then all the observations are assessed for being outliers. Hereby it is assumed that a bias exists, and that the outliers can be identified by a critical value of 15 meters from the height estimate. The observations that have been identified as outliers are indicated in red.

After removal of the outliers, height estimates are made again. These are shown by the continuous red line. Fig. 7.7(a) shows an improvement of the height estimate, while Fig. 7.7(b) shows a deterioration of the height estimate due to the removal of the incorrectly identified outlier. The parameter estimation noise was responsible for the deterioration of the height estimate.

(a) Successful outlier identification. (b) Unsuccessful outlier identification.

Figure 7.7: In both figures an attempt has been made to identify an outlier using a critical value of 15 meters from the height estimate. Here the suspected outlier is shown in red. While the true height value is shown by the green line, the height estimate before the removal of the suspected outlier is shown by the dashed red line. The continuous red line shows the height estimate after the removal of the suspected outlier. In Fig. 7.7(a) the outlier identification was successful: an improvement of the height estimate can be seen. In Fig. 7.7(b) the outlier identification was unsuccessful: a deterioration of the height estimate can be seen. Both figures show the difficulty of identifying outliers: for the same criterion, in one case the outlier identification was successful, while in the other it was not.
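The mechanism behind Fig. 7.7 can be reproduced in outline: an unweighted least-squares height estimate (here simply the mean), flagging of observations more than 15 m from that estimate, and re-estimation after their removal. The noise level and the injected error are made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(2)

true_height = 500.0
heights = true_height + rng.normal(0.0, 8.0, 10)    # 10 noisy height observations
heights[2] += 40.0                                   # one deliberately corrupted value

estimate = heights.mean()                            # unweighted least-squares estimate
suspects = np.abs(heights - estimate) > 15.0         # critical value of 15 meters

cleaned = heights[~suspects].mean()                  # re-estimation after removal
print("before removal: %.1f m, after removal: %.1f m" % (estimate, cleaned))
# Whether the second estimate is closer to 500 m depends on which observations
# happen to be flagged; with strong estimation noise a good observation may be
# removed instead, which is exactly the situation of Fig. 7.7(b).
```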

7.2.5 Strategies for identifying outliers

In Sec. 7.2.2 the outlier was defined. Using the identification zones the outliers can be recognized, although the identification process itself has not yet been discussed.

In order to identify outliers, it is convenient to take the impact of an outlier into account. The impact of an outlier has important consequences for the outlier identification. Here two types of outlier identification algorithms are distinguished: those based on outliers that have only a small impact on the estimates, and those based on outliers that have a significant impact on the estimates.

For those outliers that have only a small impact on the estimates, that is, those that do not impose a significant shift on the estimates, one could simply use the estimate based on all the observations, including any outliers, in an attempt to identify the outliers.

The procedure to identify outliers is then as follows:

1. the likelihood function of an arc is computed;

2. the parameters are estimated. See Sec. 4 for several estimation procedures. For the outlier identification algorithms used here, MLE is used;

3. the phase estimates are computed by inserting the parameter estimates into the observation model;

4. the phase residuals are estimated by computing the difference between the phase estimates and the phase observations;



5. a comparison between the phase residuals and the identification zones will identify any outliers.

This assumption should theoretically lead to the identification of all the outliers that do not have a significant impact on the estimate. This procedure has been applied in outlier identification algorithm I.
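A condensed sketch of this consecutive procedure is given below. The MLE with the full observation model is replaced by a trivial single-parameter stand-in (a constant phase offset estimated as the circular mean), and the identification zone is passed in as fixed bounds; both are assumptions for the illustration only.

```python
import numpy as np

def wrap(phase):
    return np.angle(np.exp(1j * phase))

def identify_consecutively(phi_obs, lo, hi):
    """Strategy I (sketch): estimate from all remaining observations, compute
    wrapped residuals, flag residuals outside the identification zone [lo, hi],
    remove the worst flagged observation, and repeat until nothing is flagged.
    The circular mean acts as a stand-in for the MLE with the observation model."""
    keep = np.ones(len(phi_obs), dtype=bool)
    while True:
        offset = np.angle(np.mean(np.exp(1j * phi_obs[keep])))
        residuals = wrap(phi_obs - offset)
        flagged = keep & ((residuals < lo) | (residuals > hi))
        if not flagged.any():
            return np.where(~keep)[0], offset        # removed indices, final estimate
        keep[np.argmax(np.where(flagged, np.abs(residuals), -np.inf))] = False

rng = np.random.default_rng(3)
phi = wrap(0.5 + rng.normal(0.0, 0.2, 20))
phi[[4, 11]] = wrap(phi[[4, 11]] + np.pi)            # two altered observations
print(identify_consecutively(phi, -0.8, 0.8))
```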

If one assumes that the outlier does have a significant impact on the estimates, a cross-validation procedure needs to be applied that is better known as the jackknife procedure or data snooping [4] [25].

The data snooping procedure consists of two steps. First the effect of an observation on the parameter estimates is measured by computing the parameter estimates without the observation concerned. This is done for all the observations. Then cross-validation takes place. Depending on the predefined conditions, the outlier is then identified.

The procedure to identify outliers consists of two steps. The first step is applied to all the observations and is as follows:

1. the likelihood function of an arc is computed after the temporary removal of the phase observation concerned;

2. the parameters are estimated. See Sec. 4 for several estimation procedures. For the outlier identification algorithms, MLE is used;

3. the phase estimates are computed by inserting the parameter estimates into the observation model;

4. the phase residuals are estimated by computing the difference between the phase estimates and the phase observations;

5. a comparison between the phase residuals and the identification zones will identify any wouldbe-outliers. Here the wouldbe-outliers are defined as those observations that lie within the detection zones; the phase observations identified within the identification zones would be outliers if the omitted phase observation were (permanently) removed.

Notice the similarity with the w-test discussed in Sec. 7.2.2.

The second step consists of a cross-validation. The cross-validation can be carried out in different ways but always has one thing in common: it recognizes that the removal of an outlier will result in a sudden drop in the number of wouldbe-outliers, here also termed the wouldbe-outliers number.

The sudden drop of the wouldbe-outliers number can be explained as follows: if an outlier exists within the phase observations that has a significant impact on the estimate, the parameter estimates are shifted considerably. Due to this shift the phase estimates have significant biases, resulting in larger phase residuals. The outlier identification algorithms would therefore tend to identify outliers that in reality do not exist.

The removal of the outlier has the opposite effect: the phase residuals become smaller, and suddenly the outlier identification algorithm detects no outliers, or at least fewer outliers.

A simple second step might therefore consist of selecting the (omitted) phase observation that yields a single minimum wouldbe-outliers number. This is done in outlier identification algorithm II. An example of the identification procedure using a single minimum wouldbe-outliers number can be found in Sec. 7.2.7. Other forms of cross-validation have been applied in outlier identification algorithms III to V and are explained in Sec. 7.2.7.
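The two-step procedure of outlier identification algorithm II can be sketched in the same spirit, again with the circular-mean stand-in for the MLE and fixed identification-zone bounds; an outlier is only identified if exactly one omitted observation yields the minimum wouldbe-outliers number.

```python
import numpy as np

def wrap(phase):
    return np.angle(np.exp(1j * phase))

def wouldbe_outliers(phi_obs, keep, lo, hi):
    """Number of kept observations whose wrapped residuals fall in the detection
    zones when the parameters are estimated from the kept observations only
    (circular mean again stands in for the MLE)."""
    offset = np.angle(np.mean(np.exp(1j * phi_obs[keep])))
    residuals = wrap(phi_obs[keep] - offset)
    return int(np.sum((residuals < lo) | (residuals > hi)))

def algorithm_ii(phi_obs, lo, hi):
    """Identify one outlier only if exactly one omitted observation yields the
    minimum wouldbe-outliers number (the requirement of algorithm II)."""
    n = len(phi_obs)
    counts = np.empty(n, dtype=int)
    for i in range(n):
        keep = np.ones(n, dtype=bool)
        keep[i] = False                              # temporarily omit observation i
        counts[i] = wouldbe_outliers(phi_obs, keep, lo, hi)
    minima = np.where(counts == counts.min())[0]
    return (int(minima[0]) if len(minima) == 1 else None), counts

rng = np.random.default_rng(4)
phi = wrap(0.3 + rng.normal(0.0, 0.2, 20))
phi[5] = wrap(phi[5] + np.pi)                        # one altered observation
print(algorithm_ii(phi, -0.8, 0.8))                  # with this seed the 6th observation is identified
```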



7.2.6 Settings of the simulations of time series of phase observations

Five different outlier identification algorithms are consecutively tested on 3000 simulated time series of phase observations. During these runs the outlier identification algorithms are therefore subject to the same simulations. At the end of the 3000 simulations a statistical analysis of the performance of the outlier identification algorithms is done. This section explains the settings used to simulate the time series of phase observations for the identification of outliers amongst the time series of phase observations using MLE.

As explained at the beginning of Sec. 7.2, the simulations should be such that a bias is, overall, successfully detected. The likelihood of detectable biases should therefore be large when outliers are present, while the likelihood of a false bias detection when no outliers are present should be small. The attempts to identify outliers by the outlier identification algorithms occur here only after the detection of a bias.

In order to satisfy the constraints for bias detection, the overall parameter estimation noise within the simulations needed to be lowered such that the parameter noise is, overall, unlikely to trigger a bias detection. To have low overall parameter noise, the overall reliability of the parameter estimates has to be sufficiently large. This has been achieved here by choosing, on the one hand, time series with a large number of phase observations and, on the other hand, a multi-looking factor that is sufficiently large as well.

Also, the simulations need to have a certain likelihood that the altered phase observations can trigger a bias detection, after which the outlier identification can take place. The outlier detection part is here based on the laws of conservation, as explained in Sec. 7.2.3. Hereby phase observations can be altered in only one arc, possibly resulting in a bias, while in the other three arcs the phase observations are not altered, such that the time series of phase observations in which outliers are present is easily identified. To detect any biases successfully, the probability of altering a phase observation has been set such that a certain likelihood of bias detection is established.

In order to satisfy the constraints, 3000 sets of 70 phase observations are simulated using a certain decorrelation model and a multi-looking factor of 10, while each phase observation has a 50% chance of being altered. These constraints resulted in 1030 cases in which a bias could be detected, with an average of about 35 simulated outliers present in the estimates. The triangle residuals have been based on the difference of the parameter estimates within the triangle, based on the convolution of the arcs.

In all the simulations a single observation model is used, namely an observation model in which the height is assumed to be known deterministically and an unknown deformation rate needs to be estimated. This results in the observation model of (3.5).

Hereby the reference height and reference deformation rate have been set to zero. For simplicity no Master bias has been taken into account. The phase observations of all four arcs are based on a height equal to 500 meters and a deformation rate equal to -10 mm/year.

To incorporate a certain decorrelation model to simulate the phase observations, |γ^{1-70}|, L and B^{1-70}_⊥ need to be known. Here L has been set to 10, while B^{1-70}_⊥ and |γ^{1-70}| are set using ERS parameters, and the deformation rate domain is set equal to [-20, 0] mm/year. The stepsize within the deformation rate domain has been set to 0.2 mm/year, while the triangle residual needed to be equal to zero, or, taking the step size into account, below a threshold of 0.1 mm/year.

B^{1-70}_⊥ are here randomly chosen according to the boxcar distribution Π_{10,1100}. The computation of |γ^{1-70}| takes place according to a decomposition of three decorrelation sources, namely the geometrical, temporal and thermal decorrelation; see Sec. 3.4.2 for the explanation of the decorrelation sources. Here B_{⊥,crit}, T and the SNR have been set to 1100 meters, 5 years and 11.7.

To alter a phase observation, the value of π is simply added to the phase observation value, after which the value is wrapped back to the phase domain [−π, π]. The expected values of any resulting outliers are therefore equal to the expected phase values plus π.
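Written out, the alteration and wrapping step is no more than the following; the sample values are arbitrary.

```python
import numpy as np

def wrap(phase):
    """Wrap phase values back to the interval [-pi, pi)."""
    return np.mod(phase + np.pi, 2.0 * np.pi) - np.pi

def alter(phi, index):
    """Add pi to one phase observation and wrap the result, so that the expected
    value of the resulting outlier equals the expected phase plus pi."""
    phi = phi.copy()
    phi[index] = wrap(phi[index] + np.pi)
    return phi

phi = np.array([0.2, -1.4, 2.9, 0.0])
print(alter(phi, 2))   # 2.9 + pi wraps back to about -0.24
```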

The last setting to define is the confidence level. Here the confidence level has been arbitrarily set to 95%. Since the outliers are defined according to the confidence level, attempts can now be made to identify them. A table with the CDF values needed to evaluate the phase residuals is computed a-priori to speed up the processing time. Notice that no relation exists between the criteria to detect biases and the criteria to identify outliers, the latter here being a confidence level percentage.

Finally, the settings of the simulations can in short be characterized as producing reliable, and sometimes also biased, parameter estimates for the application of MLE, such that the constraints and conditions discussed in this section are satisfied. Outliers could therefore be successfully simulated amongst the altered phase observations.

7.2.7 The working principles of the outlier identification algorithms

On the basis of the outlier identification based on the B-method of testing, using an artificially set confidence interval, five different outlier identification algorithms have been created. The working principles of the identification algorithms are discussed here.

Outlier identification algorithm I

Sec. 7.2.5 discussed two different types of outlier identification algorithms. Outlier identification algorithm I is based on the identification of outliers that have only a small impact on the estimate, and is the only outlier identification algorithm based on this strategy. The pseudocode in Algorithm 1 shows the operations that are done by the algorithm.

Using outlier identification algorithm I, I chose to perform the identification of outliers consecutively instead of simultaneously. The choice to identify outliers consecutively is rather arbitrary. An effect of consecutive detection is that it allows the intermediate parameter estimates to wander.

Although outlier identification algorithm I can only detect outliers that have a small impact, resulting in only a small shift of the parameter estimates per consecutive identification, the identification of multiple outliers can cumulatively result in a large shift. It is unknown whether the wandering of the ML estimates has a positive or negative effect on the correct identification of outliers. It could however be studied through its effect on the phase residuals. Due to time constraints this is not done here.

Outlier identification algorithm II

Sec. 7.2.5 discussed two different types of outlier identification algorithms. Outlier identification algorithm II is based on the identification of outliers that do have a significant impact on the parameter estimates. The pseudocode in Algorithm 2 shows the operations that are done within outlier identification algorithm II.

Since the impact of a single outlier can overshadow the impact of other outliers, only consecutive outlier identification should be applied here.

Outlier identification algorithm II is a simple algorithm that basically functions as the w-test discussed in Sec. 7.2.2.



The identification of an outlier is illustrated here by an example. In this example 20 phase observations, including outliers, have been simulated. Following the procedure explained in Sec. 7.2.5, a single phase observation is omitted 20 times, each time a different one. The resulting wouldbe-outliers numbers per omitted phase observation are shown in Fig. 7.8.

Here the 6th phase observation has been identified as an outlier, as it is the only candidate that has a minimum of wouldbe-outliers. The 6th phase observation will therefore be permanently removed.

[Figure 7.8: wouldbe-outliers versus the phase observation that has been omitted.]

Figure 7.8: The wouldbe-outliers are shown per omitted phase observation. A single minimum wouldbe-outliers number is found at the 6th position, resulting in the identification of the outlier. Here the 6th phase observation is identified as an outlier and will be permanently removed.

Outlier identification algorithm III

In order to explain the need for other outlier identification algorithms, I need to anticipate the performance results of outlier identification algorithm II. The thorough discussion of the performance of all the outlier identification algorithms takes place in Sec. 7.2.8.

Outlier identification algorithm II can identify only a relatively low number of outliers. To identify more outliers, it needs to be recognized that the low number of outlier identifications is caused by how rarely the requirement of outlier identification algorithm II is satisfied. The single minimum in wouldbe-outliers numbers, needed to identify a single phase observation, is difficult to find within the simulations.

I found two approaches to possibly improve the performance of the identification of outliers that have a significant impact on the estimate. Here one approach is explained, while the other one is explained in Sec. 7.2.7.

In outlier identification algorithm III the performance of identifying those outliers is improved by relaxing the requirement of outlier identification algorithm II.

Here, if no single minimum in wouldbe-outliers numbers can be found when single phase observations are omitted, a combination of kk phase observations is temporarily omitted in order to observe the wouldbe-outliers number per combination of phase observations. A single minimum in wouldbe-outliers numbers then indicates, similar to outlier identification algorithm II, the combination of phase observations to be outliers.

If, after checking all the combinations of two phase observations, for example, no single minimum in wouldbe-outliers numbers occurs, all combinations of three phase observations are checked for wouldbe-outliers. The number kk is thus raised step-wise to expand the number of combinations based on kk phase observations. The maximum value of kk can depend either on the number of phase observations within a time series, or on a predefined maximum value of kk being reached.

Notice that the number of combinations can become extremely large for a set of phase observations. Fig. 7.9 shows the number of combinations versus the number of phase observations to be omitted for a time series of 20 phase observations. If, in this example, 10 phase observations need to be omitted, this results in about 180,000 combinations.

[Figure 7.9: number of combinations versus the number of omitted phase observations.]

Figure 7.9: The number of phase observations to be omitted versus the number of combinations to be assessed. Notice that a high number of combinations is easily reached.

In practice, from a computational point of view, a predefined maximum value of kk therefore needs to be set. For the 3000 simulations the maximum value of kk is set equal to two.
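The growth shown in Fig. 7.9 follows directly from the binomial coefficient, which is easily checked with the Python standard library; omitting 10 out of 20 phase observations indeed gives roughly 180,000 (184,756) combinations, while the cap kk ≤ 2 leaves only 190 pairs.

```python
from math import comb
from itertools import combinations

N = 20
print([comb(N, kk) for kk in (1, 2, 3, 10)])   # [20, 190, 1140, 184756]

# With the cap used in the simulations (kk <= 2) only the pairs are enumerated:
pairs = list(combinations(range(1, N + 1), 2))
print(len(pairs), pairs[:3])                   # 190 pairs, starting (1, 2), (1, 3), (1, 4)
```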

See Algs. 3 and 4 for the pseudocode of outlier identification algorithm III. Notice that Alg. 4 is invoked by Alg. 3.

The working principle of outlier identification algorithm III is best illustrated by an example. In the example five phase observations have been simulated. Fig. 7.10(a) shows the wouldbe-outliers number for all the cases in which a single phase observation is omitted. It can clearly be seen that no single minimum in wouldbe-outliers numbers can be found: all the phase observations except the third one are candidates to be an outlier. Outlier identification algorithm II would therefore not be able to identify any outlier and would simply return all five phase observations.

Outlier identification algorithm III however continues its search for outliers. The first step within outlier identification algorithm III is to compute all the possible combinations that can be made for omitting a combination of two phase observations (kk = 2). The 10 possible combinations are shown in Table 7.1.

After the setup of the combinations, all 10 combinations of two phase observations are temporarily omitted to observe the wouldbe-outliers. Fig. 7.10(b) shows the wouldbe-outliers number per combination. The number here refers to the combination number in Table 7.1.

In Fig. 7.10(b) a single minimum in wouldbe-outliers numbers can be found at the seventh location, while in Fig. 7.10(a) no single minimum in wouldbe-outliers numbers could be found. The observation of the single minimum in wouldbe-outliers numbers in Fig. 7.10(b) results in the permanent removal of the seventh combination, which is equal to the identification of two outliers.



(a) Wouldbe-outliers per omission of a single phase observation. (b) Wouldbe-outliers per omission of a combination of phase observations.

Figure 7.10: In Figs. 7.10(a) and 7.10(b) the wouldbe-outliers can be seen when, respectively, a single phase observation and a combination of two phase observations are omitted. See Table 7.1 for the combination that each location number represents. While in Fig. 7.10(a) no single minimum in wouldbe-outliers numbers could be found, resulting in no identification of an outlier for outlier identification algorithm II, in Fig. 7.10(b) outliers could be found, resulting in the identification of the second and the fifth phase observation as outliers for outlier identification algorithm III. Both the second and the fifth observation are then permanently removed.

The second and fifth phase observations, forming the seventh combination of two phase observations, are then permanently removed, see Table 7.1, and the procedure is continued until no more outliers can be identified.

Outlier identification algorithm IV

Sec. 7.2.7 explained that outlier identification algorithm II is notorious for identifying only a low number of outliers. The cause of the low number of outlier identifications lies in how rarely the requirement of outlier identification algorithm II is satisfied: in most cases no single minimum in wouldbe-outliers numbers can be found. To identify more outliers, two approaches exist. One has already been discussed in Sec. 7.2.7. Here the other approach is discussed further.

In outlier identification algorithm IV the performance of identifying those outliers is improved by attempting to identify the most appropriate outlier candidate if more than one candidate is available. This is done by measuring the potential of each outlier candidate to be an outlier. The use of the potential is similar to MLE: the outlier candidate with the maximum potential is selected.

combination number      1     2     3     4     5     6     7     8     9    10
phase observations     1,2   1,3   1,4   1,5   2,3   2,4   2,5   3,4   3,5   4,5

Table 7.1: All the possible combinations of selecting two out of five phase observations.


The procedure is as follows. The first step in outlier identification algorithm IV is to select the outlier candidates. This is done by temporarily omitting a single phase observation at a time and measuring the wouldbe-outliers. All the observations whose omission yields a minimum number of wouldbe-outliers are selected as candidates.

Then combinations of kk phase observations are formed, as in outlier identification algorithm III. After temporarily omitting each combination of kk phase observations, the wouldbe-outliers are counted again, resulting in a wouldbe-outliers number per omitted combination of phase observations.

After that the potential of all the outlier candidates is measured. This is done by selecting all the combinations whose omission yields a minimum number of wouldbe-outliers. Within those combinations the occurrence of every outlier candidate is counted. This is done for all candidates, resulting in a count of minima of wouldbe-outliers per outlier candidate.

Finally, if a single maximum of the potential of the outlier candidates can be found, the outlier is identified as the outlier candidate with the maximum potential. If no single maximum potential occurs, the number kk is raised by one to form combinations of kk phase observations, and the process starts all over again.

Algorithms 5 and 6 show the operations that take place within outlier identification algorithm IV. Algorithm 6 is invoked by Algorithm 5.

Notice here the difference between outlier identification algorithms III and IV. While both make use of the omission of combinations of phase observations, outlier identification algorithm III depends on the occurrence of a single minimum for a combination of phase observations, while outlier identification algorithm IV does not. On the other hand, outlier identification algorithm III is allowed to identify multiple outliers at once, while outlier identification algorithm IV is only able to identify single phase observations.

The working principle of outlier identification algorithm IV is best illustrated by an example. Here five phase observations have been simulated. Fig. 7.11 shows the wouldbe-outliers numbers for all the cases in which a single phase observation is omitted. It can clearly be seen that no single minimum of wouldbe-outliers can be found: all the phase observations except the second one are here candidates to be an outlier. Outlier identification algorithm II would therefore not be able to identify any outlier.

Then combinations of dual phase observations are omitted, after which the wouldbe-outliers numbers are observed in Fig. 7.12(a). The location number refers here to the combinations listed in Table 7.1. As can be seen, combinations 2, 5, 7, 8 and 9 have a minimum in wouldbe-outliers numbers, and those combinations are therefore used to measure the potential per outlier candidate.

The measurement of the potential per outlier candidate is shown in Fig. 7.12(b). Hereby the occurrences of all outlier candidates are counted within combinations 2, 5, 7, 8 and 9. Outlier candidate five, for example, occurs only two times, namely in combinations 7 and 9, and is absent in combinations 2, 5 and 8, see Table 7.1. Fig. 7.12(b) shows that outlier candidate 3 has the highest potential, and it is therefore identified as an outlier.
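The potential measurement of outlier identification algorithm IV reduces to a counting exercise. The following minimal Python fragment reproduces the example above, taking the combinations of Table 7.1 and assuming, as in Fig. 7.12(a), that combinations 2, 5, 7, 8 and 9 reached the minimum wouldbe-outliers number; the check that the maximum potential is unique is left out for brevity.

    from itertools import combinations
    from collections import Counter

    # Table 7.1: the 10 combinations of two out of five phase observations (1-based)
    combos = list(combinations(range(1, 6), 2))

    # Combinations 2, 5, 7, 8 and 9 reached the minimum number of wouldbe-outliers
    minimal = [combos[i - 1] for i in (2, 5, 7, 8, 9)]

    # The potential of a candidate is its number of occurrences in those combinations
    potential = Counter(obs for combo in minimal for obs in combo)
    candidate, count = potential.most_common(1)[0]
    print(candidate, count)   # prints "3 4": candidate 3 occurs in four of the five combinations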

Outlier identification algorithm V

Sec. 7.2.7 explained that outlier identification algorithm II identifies a relatively low number of outliers, the cause being the low number of occasions on which a single minimum of wouldbe-outliers could be found. It also explained that there are two approaches to identify more outliers, both of which have been discussed in Sec. 7.2.7.


Figure 7.11: The number of wouldbe-outliers after the temporary removal of single phase observations. No single minimum in wouldbe-outliers numbers can be found. All the phase observations except the second one are selected as candidates to be an outlier, as all these observations have a minimum in the number of wouldbe-outliers.

Figure 7.12: (a) The number of wouldbe-outliers after the temporary removal of all the combinations of dual phase observations; see Table 7.1 for the combination that each location number represents. No single minimum of wouldbe-outliers can be found; instead five minima can be found: combinations 2, 5, 7, 8 and 9 have a minimum of wouldbe-outliers. (b) The potential of the outlier candidates, obtained by counting the occurrence of each outlier candidate within those combinations that have a minimum in wouldbe-outliers numbers (using Table 7.1). Here phase observation 3 has the highest potential to be an outlier, and is therefore identified and removed as an outlier.

Outlier identification algorithm V makes use of both approaches. Within outlier identification algorithm V, first outlier identification algorithm III is applied to see whether a single minimum in wouldbe-outliers numbers allows a combination of kk phase observations to be identified as outliers. If that is not the case, outlier identification algorithm IV is applied in an attempt to identify a single outlier based on the potential of the outlier candidates. If no outlier(s) could be identified, the number kk is increased by one, after which the outlier identification continues.
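A minimal Python sketch of this control flow is given below; find_unique_combo and find_best_candidate are hypothetical helpers standing in for the data-snooping steps of algorithms III and IV, and the exact loop control is my own reading of the description above.

    def identify_outliers_v(n_obs, find_unique_combo, find_best_candidate, kk_max=2):
        # find_unique_combo(kept, kk):   combination of kk indices if a single minimum
        #                                of wouldbe-outliers exists, otherwise None
        # find_best_candidate(kept, kk): index with the unique highest potential,
        #                                otherwise None
        kept = set(range(n_obs))
        kk = 2
        while kk <= kk_max and len(kept) > kk:
            combo = find_unique_combo(kept, kk)        # step of algorithm III
            if combo is not None:
                kept -= set(combo)
                kk = 2
                continue
            single = find_best_candidate(kept, kk)     # step of algorithm IV
            if single is not None:
                kept.discard(single)
                kk = 2
                continue
            kk += 1                                    # neither step identified an outlier
        return sorted(set(range(n_obs)) - kept)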

The operations for outlier identification algorithm V are shown in Alg. 7 and 8. Alg. 8 is invoked by Alg. 7.

7.2.8 Performance of the outlier identification algorithms

To compare the outlier identification algorithms with each other, their performance needs to be measured. This is done by measuring the improvement or deterioration of the parameter estimates, and by comparing the simulated outliers with the identified outliers.

In 1030 of the 3000 simulations a bias was detected, after which the outlier identification algorithms were activated to identify outliers.

The following variables have been recorded:

• the triangle residuals. Only if one of the triangles is above the threshold and the other one is below the threshold does the identification step proceed; the threshold has been set to 0.1 (a minimal sketch of this gate follows the list);

• the deformation estimate before and after the removal of any identified outliers 1;

• the number of identified outliers per simulation1;

• the improvement or deterioration of the estimate per simulation;

• the number of real outliers that could not be identified per simulation1;

• the number of phase observations that have incorrectly been identified as outliers1.
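The gate on the triangle residuals mentioned in the first item is a simple exclusive test. A minimal sketch, assuming the residuals are expressed in the same units as Fig. 7.13, is:

    def proceed_to_identification(residual_a, residual_b, threshold=0.1):
        # Proceed only when exactly one of the two triangle residuals exceeds the
        # threshold, so that the time series containing the outliers can be
        # pointed out unambiguously.
        return (residual_a > threshold) != (residual_b > threshold)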

In App. B these recordings can be found for all the outlier identification algorithms, including the situation in which the outliers are correctly removed1.

The triangle residuals

Fig. 7.13 shows the distribution of the triangle residuals in those cases in which the other triangle had residuals below the threshold, resulting in the identification of the time series of phase observations that contains outliers. The identification of this time series is needed for the bias detection based on the laws of conservation, as has been explained in Sec. 7.2.3.

Notice the void just above the triangle residual value of 0.4, which divides the results into low residual values of 0.2 and 0.4 and higher residual values from about 2 and above. I cannot fully explain the cause of the void. Part of the explanation is that the triangle residuals of 0.2 and 0.4, occurring about 270 times, are caused by the incorrect detection of a bias: the precision of the parameter estimates results in some cases in a triangle residual above the threshold. We will see in the discussion of the parameter estimate improvements and deteriorations that this is only part of the explanation. The division between low and high residual values is prominently present throughout the results.

1Although the term “outlier” is used here, the term is not entirely correct: it suggests that all the simulated altered phase observations after a bias detection are outliers. Instead it is likely that only some of the altered phase observations are outliers, while others do not cause any bias at all.

In the simulations the bias detection occurs only once at the beginning, and not consecutively after every identified set of outliers. The bias detection should be embedded consecutively, just like the outlier identification processes, such that only outliers would be identified. Although this is a flaw in the setup of my simulations, we will see in the results that the removal of the identified altered phase observations, some of which are outliers, results in general in improvements of the parameter estimates, such that the outlier identification was in the end indeed successful. Due to time constraints this flaw has not been corrected.


Figure 7.13: Histogram of the triangle residuals that have values above the threshold [mm/year]. Note the void just above the residual value of 0.4.

The deformation estimates

In Fig. 7.14(a) and 7.14(b) the deformation estimates are shown before and after the removal of the true outliers. Since no errors are made here within the identification process, the deformation rate after removal of the true outliers is about -10 mm/year.

Notice the high precision of the deformation estimates that becomes apparent after the removal of the outliers.

Figure 7.14: ML deformation rate estimates [mm/year] (a) before outlier detection and (b) after removal of the true outliers.

Fig. 7.14(a) can be seen again in App. B as Fig. 9.1(a), and is shown there together with the deformation rate estimates after the application of outlier identification algorithms I to V in Fig. 9.1(b) to 9.1(f).


Observe that all outlier identification algorithms show an increase in the number of estimates equal to the value of -10 mm/year, the simulated value. This suggests that all the outlier identification algorithms function well for reliable but biased parameter estimates. The observation and interpretation of the improvements and deteriorations of the deformation rate estimates, to be discussed later in this section, will make it possible to make some final judgements about the performance of the five different outlier identification algorithms.

The number of identified outliers

Fig. 9.2 shows the number of outliers that have been identified per outlier identification algorithm versus the number of real outliers that have been simulated. For visual purposes a logarithmic scale has been applied.

Table 7.2 shows furthermore the mean number of identified outliers per simulation.

Fig. 9.2(a) and Table 7.2 show that the number of outliers that have been identified by outlier identification algorithm I is almost the same as the number of outliers that have been simulated.

Fig. 9.2(b) to 9.2(e) and Table 7.2, however, show that only a small number of outliers could be identified. This could mean that it is difficult to identify outliers with a large impact. It could also mean that only a low number of outliers with a large impact occurred. Or both.

Also the maximum number of phase observations used to form combinations, kk, set equal to 2 in the outlier identification algorithms, of course has a negative influence on the identification process of outlier identification algorithms III to V, resulting in a premature interruption of the identification process. Based on my experience working with outlier identification algorithms III to V, however, I suspect that although the number of identified outliers is influenced by the number kk, its influence on the performance diminishes for higher values of kk: high values of kk did occur, although they were not frequently reached.

Fig. 9.2(b) shows that the number of outliers that have been identified by outlier identification algorithm II is very small compared to the number of outliers that have been simulated, and in most cases even zero. The cause was already explained in Sec. 7.2.7: the low number of cases in which a single minimum of wouldbe-outliers numbers occurs is responsible for the low number of identified outliers.

By relaxing the conditions in outlier identification algorithms III to V more outliers can be identified. Two approaches that relax these conditions have been discussed in Sec. 7.2.7, and both identify more outliers, as can be seen in Fig. 9.2(c) and 9.2(d), although the number of identified outliers is still small.

Of outlier identification algorithms II to V, outlier identification algorithms IV and V identify the largest number of outliers. Apparently the second approach, based on the highest potential of an observation to be an outlier, relaxes the conditions more than the first approach, based on the simultaneous identification of multiple outliers.

Notice that the number of identified outliers does not allow any conclusions to be drawn about the correctness of the identified outliers.


Outlier identification algorithm I      31.43
Outlier identification algorithm II      0.14
Outlier identification algorithm III     0.23
Outlier identification algorithm IV      0.83
Outlier identification algorithm V       0.80
Real situation                          37.37

Table 7.2: The arithmetic mean of the number of identified outliers per simulation.

The improvements and deteriorations of the parameter estimates

Fig. 9.3 shows, per outlier identification algorithm and for the 1030 simulations, the distribution of the improvements and deteriorations of the parameter estimates after the identification and removal of the outliers. The improvements after the removal of the real outliers are shown here as well.

Notice in Fig. 9.3(a) that, after the removal of the real outliers, the void between the values of 0 to 0.2 and the higher values starting at about 2 appears again, and that in about 135 cases no bias seems to have occurred: in those cases no improvement can be seen.

By comparing the distribution after removal of all the real outliers with the distributions after the removal of the outliers identified by outlier identification algorithms I to V, it can be observed that large differences exist between the achieved improvements/deteriorations per simulation. While some simulations show improvements, others show deteriorations.

This is not surprising: 50% of the phase observations are altered, resulting in a large number of outliers to identify. Also, since the altered phase observations contribute extra estimation noise, it becomes more difficult to identify outliers correctly, as Sec. 7.2.4 explained.

Still, from Fig. 9.3(b) to 9.3(f) it can be concluded that more estimates have been improved than have been deteriorated.

Fig. 9.3(b) shows that most improvements and deteriorations are small. The lack of a significant number of large improvements or deteriorations can be explained well by the nature of outlier identification algorithm I, as it should only be able to identify outliers that have a small impact.

Fig. 9.3(e) and 9.3(f) instead show that a significant number of large improvements or deteriorations become apparent after their application, resulting from the strategy that these algorithms are based on. Fig. 9.3(c) and 9.3(d) show that outlier identification algorithms II and III are not very effective in identifying outliers, although outlier identification algorithm III performs better than II.

Observing the peak values of Fig. 9.3(c) to 9.3(f) at the value of zero improvement or deterioration, I conclude that outlier identification algorithms II to V are progressively effective in the identification of outliers: while outlier identification algorithm II still has about 900 cases of zero improvement, outlier identification algorithms III to V have about 860, 670 and 620 respectively.

From a statistical point of view, the application of each of the identification algorithms to the set of simulations has improved the estimates. This can also be seen in Table 7.3, which shows the mean improvement for the five identification algorithms and for the real situation. The application of all the identification algorithms thus results in a general improvement of the estimates.


Outlier identification algorithm I      0.158
Outlier identification algorithm II     0.133
Outlier identification algorithm III    0.206
Outlier identification algorithm IV     0.426
Outlier identification algorithm V      0.519
Real situation                          4.438

Table 7.3: The mean improvement of the deformation rate estimates after the removal of the identified/real outliers.

The number of outliers that have not been identified.

Fig. 9.4 shows the distributions of the number of real outliers that have not been identified using the outlier identification algorithms. For visual purposes a logarithmic scale has again been applied.

Fig. 9.4(a) shows the distribution of the number of outliers that have not been identified using outlier identification algorithm I. Notice that more than half of the real outliers could not be identified, although the number that was identified is still significant compared to outlier identification algorithms II to V.

Observe that in a large number of simulations 12 to 30 outliers were missed, while in a smaller number of simulations only 6 to 12 outliers were missed, which is clearly better. This can be clearly seen from the two clouds of outcomes present in Fig. 9.4(a).

The distinction between the clouds might be caused by the void shown before in the triangle residuals. Moreover, I suspect that the lower cloud in Fig. 9.4(a) represents those simulations that have their deformation estimates around -10 mm/year prior to outlier identification and removal. With deformation estimates close to -10 mm/year, a large overlap between confidence regions should have been achieved, resulting in a relatively good performance of outlier identification algorithm I for those simulations.

Fig. 9.4(b) to 9.4(e) show that most of the real outliers have been missed in the identification process. Due to the large number of simulations in which no outliers could be identified, it is difficult to make any statements about the performance of outlier identification algorithms II to V concerning the non-identification of the real outliers.

It can be stated, however, that outlier identification algorithms IV and V perform better than outlier identification algorithms II and III.

Table 7.4 shows the percentage of real outliers that could not be identified by outlier identification algorithms I to V. The percentage is based on all 1030 simulations.

Outlier identification algorithm I      46.36%
Outlier identification algorithm II     99.72%
Outlier identification algorithm III    99.55%
Outlier identification algorithm IV     98.47%
Outlier identification algorithm V      98.57%

Table 7.4: The percentage of real outliers that could not be identified.


The number of outliers that have been incorrectly identified

Fig. 9.5 shows the number of outliers that have been incorrectly identified versus the number of identified outliers. Table 7.5 shows the percentage of incorrectly identified outliers per outlier identification algorithm.

Outlier identification algorithm I      35.74%
Outlier identification algorithm II      3.20%
Outlier identification algorithm III     4.46%
Outlier identification algorithm IV     12.80%
Outlier identification algorithm V      14.69%

Table 7.5: The percentage of the incorrectly identified outliers.

In Fig. 9.5(a) again two clouds can be seen, of which one performs better than the other. Again, the lower cloud might be the result of all the simulations that have their deformation rate estimates near the expected -10 mm/year, resulting in a significantly better overlap of detection zones.

Table 7.5 shows that, although outlier identification algorithm I identifies more outliers, it also incorrectly identifies a relatively large number of phase observations as outliers.

Fig. 9.5(b) to 9.5(e) and Table 7.5 show that outlier identification algorithms II to V incorrectly identify a significantly smaller number of outliers compared to outlier identification algorithm I. Outlier identification algorithm II performs best in this respect, although it also identifies the lowest number of outliers.


7.2.9 The future for outlier detection and identification

This section is divided into two parts: one discusses the future of outlier detection and identification in general, while the other discusses the future of my developed outlier identification algorithms.

On outlier detection and identification

For outlier detection and identification two different methods have been discussed: the B-method of testing and the method based on the laws of conservation. Although it is not easy to compare these theories, as they are fundamentally different, their strengths and weaknesses have already been briefly discussed. I recommend studying both outlier detection and identification methods in more detail, as well as another method that has not been studied here.

The method based on the laws of conservation may have difficulty identifying the time series of phase observations that include outliers. Based on a network of triangles, some time series could be excluded from having outliers, such that other time series could be identified as having outliers amongst their phase observations. The parameter estimation noise made this difficult though.

I expect that it will be easier to identify the time series having outliers amongst their phase observations if loops are formed that are based on more than three RCs. Using such loops, I expect that the temporary omission of a time series will aid in the identification of the time series that has outliers amongst its phase observations. The process will be similar to the approach of data snooping that has also been used in the outlier identification algorithms. Moreover, the knowledge about the outlier identification algorithms can easily be adapted for algorithms that identify time series with outliers amongst their phase observations. I suspect that the number of RCs within the loops will be an important parameter for identifying the time series with outliers amongst their phase observations.

For the B-method of testing the criteria for outlier detection and the criteria for outlier identification are still not linked. I recommend, as mentioned earlier, studying both approaches in order to link the outlier detection with the outlier identification: to study the error propagation under the assumption of Gaussian phase PDFs, and to attempt to find a PDF for detecting outliers similar to the χ2-distribution of the B-method of testing.

My approach, far from optimal as has been explained earlier, has been a mix of the two methods. I would certainly not recommend mixing the two theories in any future research, although my research did show that the parameter estimates were biased, and that the estimates could be improved by identifying and removing the altered phase observations that do not conform to the nominal observation model, resulting in the indirect identification of outliers.

The systematic improvements of the parameter estimates made me confident that the outlier identification algorithms function well, although the degree to which they are successful is unknown with respect to the relation between the overall parameter noise and the bias magnitude induced by any outliers. I recommend studying the performance of the outlier identification algorithms in more detail as well, preferably combined with outlier detection based on the B-method of testing.

Furthermore, I would like to underline that the bias detection should be embedded before the outlier identification, and should be embedded consecutively, similar to any outlier identification algorithm. Although the identification of simulated phase observations that do not conform to the nominal observation model can give a measure of the performance of an outlier identification algorithm, the simulations should represent reality well: in reality only those phase observations that do not conform to the nominal observation model can be identified if they bias the parameter estimates.


Another form of outlier detection and identification can be studied further as well. During my research I came up with the idea that all the maxima within a likelihood function can, to a certain extent, be tracked while phase observations are temporarily omitted. Although it is natural that certain peaks shift, appear or disappear within a likelihood function, I believe that certain characteristics of these peak shifts, appearances or disappearances could identify outliers as well. I recommend studying this peak tracking approach as well.

On the outlier identification algorithms

Although I have already shown that my developed outlier identification algorithms perform well, some aspects can be further improved or studied.

For outlier identification algorithm I, I recommend studying the wandering effect of intermediate parameter estimates caused by the consecutive outlier identification, as mentioned before, compared to simultaneous outlier identification without the wandering effect.

Also, since outlier identification algorithm I is based on identifying outliers that have only a small impact on the estimate, unlike the other identification algorithms that are based on the identification of outliers with a large impact on the estimate, I recommend studying the effect of activating identification algorithm I after one of identification algorithms II to V has been activated. I expect that the parameter estimates would improve even more, from a statistical point of view.

I expect that outlier identification algorithms III to V perform better if higher numbers of kk can be reached than the value of 2 set within the algorithms. On the other hand, long processing times for the identification of outliers within a single time series need to be prevented. Other programming languages, such as C++, might be considered to reduce the computation time.

Also, although outlier identification algorithms III to V can positively identify outliers on the basis of the wouldbe-outliers numbers, a proper quality assessment of the chosen outlier candidate is still missing: a (combination of) phase observation(s) to be omitted is simply chosen, but the choice itself is not assessed.

One could think, for example, of requiring a minimum difference between the lowest wouldbe-outliers number and the second lowest wouldbe-outliers number, similar to the peak-to-peak ratio. Only if this condition is fulfilled would the outlier(s) be permanently removed; a minimal sketch of such a criterion follows below.
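The acceptance criterion suggested above could look as follows, with a hypothetical margin parameter; it has not been implemented or tested here.

    def accept_removal(wouldbe_counts, min_margin=1):
        # wouldbe_counts: the wouldbe-outliers number per omitted (combination of)
        # phase observation(s); accept the best candidate only when it is clearly
        # separated from the runner-up, similar to a peak-to-peak ratio.
        ordered = sorted(wouldbe_counts)
        return len(ordered) < 2 or (ordered[1] - ordered[0]) >= min_margin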

Furthermore, outlier identification algorithms III to V base their selection of candidates on the first simple cycle of data snooping, in which only one phase observation is omitted at a time. I expect that the performance of outlier identification algorithms III to V improves even more if this selection does not take place, but all phase observations are simply selected as candidates.

Outlier identification algorithms IV and V identify outliers by measuring the potential of phase observations to be outliers. The potential estimate has here been based on the maximum number of minima of wouldbe-outliers numbers. As with any parameter estimator, the potential estimate can also be based on other estimators. One could, for example, also consider the number of secondary minima of wouldbe-outliers numbers for the estimation of the potential. Perhaps even weights per wouldbe-outliers number could be considered.


Chapter 8

The MLE Applied to Real Data

In this chapter the MLE is applied to real data. Although I planned to estimate parameters for all the time series of phase observations available within the stack of interferograms, this turned out not to be feasible, mainly because of the processing time discussed below. Outlier detection and identification could also not be done here due to time constraints. Still, valuable insight and information could be gained from the experience of working with real data.

I wish to estimate the bias induced by the Master image, hereafter simply termed the Master bias, next to another parameter, here the topographical height. The first reason to estimate the Master bias is to satisfy the assumption of independence needed to compute the likelihood function, as has also been discussed in Sec. 3.1.2, while the second reason is to compute part of the atmospheric distortions. For the estimation of the height and the Master bias I use the observation model of 3.2.

In order to estimate the Master bias, I need to assume that the Master bias is only caused by atmospheric distortions. Moreover, I assume that the atmospheric distortions do not contribute to any scattering, such that the atmosphere is only a time-delaying medium for the radar waves. This would mean that the scattering behavior of the scene is not affected by atmospheric distortions, resulting in unbiased coherence magnitude estimates, and that only the phase observations of an interferometric signal are affected, not its magnitude.

Ferretti et al. did not estimate the Master bias, but used small arcs in order to cancel the low-frequency part of the atmospheric signal [6]. Eineder and Adam acknowledged that they did not use small arcs, and could therefore not cancel the low-frequency part of the atmospheric signal [5]. It is therefore the first time that likelihood functions are built with a two-dimensional domain in order to estimate the height and the Master bias.

The computation of the likelihood function with a two-dimensional domain for an arc_{art.ref} was, from a qualitative point of view, not difficult. The processing time was substantial, however: it took about three minutes per arc_{art.ref} to compute such a likelihood function on a computer with a Central Processing Unit (CPU) of 2.4 GHz.
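To give an impression of where the processing time goes, the sketch below evaluates a likelihood function on a two-dimensional grid of height and Master bias values for a single arc. It is only a sketch under my own assumptions: h2ph stands for the height-to-phase conversion factors of the interferograms, phase_pdf for the (coherence- and multi-looking-dependent) PDF of the wrapped phase residual, and the observation model is reduced to a linear height term plus a constant Master bias; none of these names come from the processing chain used here. The cost scales with the product of the grid sizes, which explains the jump from minutes to hours for a three-dimensional domain.

    import numpy as np

    def likelihood_grid(phases, h2ph, heights, biases, phase_pdf):
        # phases : wrapped interferometric phases of one arc (length N)
        # h2ph   : assumed height-to-phase factors per interferogram (length N)
        # heights, biases : 1-D arrays spanning the two parameter domains
        # phase_pdf(residual, k) : assumed PDF of the wrapped phase residual of
        #                          interferogram k, evaluated element-wise
        L = np.ones((heights.size, biases.size))
        for k, (phi, a) in enumerate(zip(phases, h2ph)):
            model = a * heights[:, None] + biases[None, :]     # modelled phase
            residual = np.angle(np.exp(1j * (phi - model)))    # wrap to [-pi, pi)
            L *= phase_pdf(residual, k)                        # independence assumed
        return L   # the ML height and Master bias follow from the grid maximum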

If it is desired to estimate a certain time-dependent deformation function variable as well, three parameters would have to be estimated, resulting in the computation of a likelihood function with a three-dimensional domain. I computed such a likelihood function and noticed that the processing time was about four hours.

Of course these problems can be solved, although time constraints prevented me from spending more time on accomplishing a significant reduction in processing time.

A reduction in processing time can be achieved by changing programming language. The programming language C++ should compute likelihood functions with multi-dimensional domains a lot faster.

Another way is to simply increase the computing power by using a cluster of computers or, alternatively, to use the Graphics Processing Units (GPUs) of the more advanced video cards within computers. Both approaches could probably accomplish a reduction in processing time, although the latter approach is relatively new and its drawbacks are therefore unknown.

Last but not least, optimization algorithms can speed up the process of finding the global maximum of likelihood functions.

Here, however, I show the results of the application of MLE to a 16 × 16 grid of arcs_{art.ref}, in which the ML height and ML Master bias are estimated. No reliability estimation has been done here due to time constraints.

For the formation of the interferograms the programme DORIS 4.02 has been used. The 20 interferograms have been provided by Dr. A. Hooper and cover part of Iceland. Only the reference phase of the flat Earth has been subtracted from these interferograms. Fig. 8.1 shows the interferometric phase of one of the interferograms combined with the intensity information of a SAR image.

Figure 8.1: The interferometric phase, flat Earth signal subtracted, in combination with intensity information for part of Iceland. The infamous volcano Eyjafjallajokull is shown on the middle-left of the image.

The interferograms were formed using TerraSAR-X images. TerraSAR-X has a wavelength of 3.1 cm and orbits at an altitude of 514 km. The SAR images have been formed with a looking angle of 28.7 degrees. 21 SAR images have been processed into 20 interferograms using a single Master stack. The perpendicular and temporal baselines of the interferograms are stated in Table 8.1. The multi-looking factor is here equal to 100, and the CEW has been set to a window of 10 × 10.

interferogram number           1      2      3      4      5      6       7       8      9     10
perpendicular baseline [m]   27.4   -5.3     69   81.2  -87.6  -93.9  -140.7  -238.7  -74.8   46.2
temporal baseline [days]      -77    -66    -55    -44    -33    -22      11      22     33     44

interferogram number          11     12     13     14      15     16     17     18     19     20
perpendicular baseline [m] -102.9  -27.1  133.3  -64.2  -167.2   16.9   16.7  -69.8  107.4  173.9
temporal baseline [days]      55     66     77    154     209    220    231    341    352    363

Table 8.1: The perpendicular and temporal baselines of the interferograms.

An ASTER DEM has been provided as well, and has been radar coded in order to know the DEM values in slant direction. These DEM values are used for the estimation of the height. The DEM values in slant direction for the segment are shown in Fig. 8.2(a).

Figure 8.2: (a) The ASTER DEM of the scene in Iceland in slant direction. The chosen segment, indicated by the cross section of the lines, has height values between 600 and 603 meters. (b) The magnitude of coherence of the scene for one of the interferograms. The position of the grid is indicated by the cross section of the lines.

The position of the grid of 16 × 16 arcs_{art.ref} has been chosen such that the overall magnitude of coherence is high: values around 0.8 and 0.9 are common here. This can also be seen in Fig. 8.2(b). The position of the grid is indicated by the cross section of the lines.


For the application of the MLE the domains of the DEM error and the bias need to be set. The maximum DEM error is here assumed to be 30 meters. The height domain is then set using the DEM values and the boundaries set by the DEM errors. The bias is estimated using the domain [-π, π].

As a reference point, the RC at the second column and seventh row (2,7) has been chosen. I assume that its estimates contain few errors, such that the RC can be used as a deterministic reference point.

Fig. 8.3 shows the DEM values for the grid of 16 × 16 arcs. The values are based on the difference between the corresponding RCs and the RC at position (2,7). Notice the small differences between the DEM values: a maximum difference of only three meters occurs here.

Fig. 8.3 can be used as a reference, although I underline that the DEM is not error free and can therefore not be used as an error-free reference.

Figure 8.3: The DEM values for the investigated segment.

Fig. 8.4(a) and 8.4(b) show the ML estimates of the height and the bias. Due to some significant changes within the scene, probably caused by behavior that we cannot explain by means of the used observation model, it is difficult to observe the overall behavior of the parameter estimates.

Due to time constraints no hypothesis testing or outlier detection, identification and removal has been applied here. Instead, simple spike removal has been obtained by setting certain limits within the figures. The first limits have been selected by only accepting the ML height estimates between -15 and 10 meters and the ML Master bias estimates between -1 and 1 radians. The results are shown in Fig. 8.5(a) and 8.5(b).
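The spike removal used here is a plain masking operation. A minimal sketch, assuming NumPy arrays for the two 16 × 16 grids and applying both limits jointly (whether the limits were applied jointly or per grid is not stated, so this is my own choice), is:

    import numpy as np

    def remove_spikes(height, bias, h_lim=(-15.0, 10.0), b_lim=(-1.0, 1.0)):
        # height, bias: grids of ML estimates; the default limits follow Fig. 8.5
        keep = (height > h_lim[0]) & (height < h_lim[1]) \
             & (bias > b_lim[0]) & (bias < b_lim[1])
        # masked cells are set to NaN so that they are left out of the figures
        return np.where(keep, height, np.nan), np.where(keep, bias, np.nan)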

Fig. 8.5(a) and 8.5(b) show some clear trends through the two grids. Most arcs in Fig. 8.5(a) show a similar trend in their height estimates as the ASTER DEM values in Fig. 8.3. Fig. 8.5(b) shows that most ML bias estimates vary slowly within the area.

Still, some estimate values in Fig. 8.5(a) and 8.5(b) cannot be easily explained. Some estimates show a consistent spatial behavior in both figures and therefore seem strongly correlated.

My guess is that this is the result of the use of a wrong observation model. Hypothesis testing could confirm this suspicion, but is, as explained earlier, not done here.

Fig. 8.6(a) and 8.6(b) show the results if the limits are further narrowed.


Figure 8.4: (a) The ML height estimates and (b) the ML bias estimates of the investigated scene. Due to spikes it is difficult to see the overall behavior of the estimates.

Figure 8.5: (a) The ML height estimates and (b) the ML bias estimates of the investigated scene after spike removal. Only those estimates have been selected that have their ML height estimates between -15 and 10 meters and their ML bias estimates between -1 and 1 radians.

Hereby only those parameter estimates have been selected that have their ML height estimates between -4 and 10 meters and their ML bias estimates between -1 and 1 radians.

By comparing Fig. 8.6(a) with the ASTER DEM values of Fig. 8.3, I observe that the North-South trend can overall be seen in both cropped images. In the West the trend is less present. Most parameter estimates therefore seem to be quite reliable, since the ASTER DEM values and the height estimates based on the TerraSAR-X data are independent of each other.

Furthermore it seems that a scaling error has occurred in either Fig. 8.6(a), Fig. 8.3 or both figures. It is not certain what the source of the scaling error of Fig. 8.6(a) could be. The variation of about 7 meters in Fig. 8.6(a) is relatively small with respect to the DEM error domain of 60 meters, and therefore seems to be quite acceptable.


Figure 8.6: (a) The ML height estimates and (b) the ML bias estimates of the investigated scene under the narrowed limits. Only those estimates have been selected that have their ML height estimates between -4 and 10 meters and their ML bias estimates between -1 and 1 radians.

The ML Master bias estimates show larger variations within the grid, although the variations seem to be quite acceptable.

I expect that the estimation of the ML Master bias, next to the height estimation, can in general be done successfully, provided that the redundancy is sufficiently high. My expectations are based on the results shown here, and although no reliability estimates were made, the results seem sufficiently good to draw this conclusion. The application of hypothesis testing and of outlier detection and identification should provide a definitive answer about the correctness of the ML Master bias estimates.


Chapter 9

Conclusions and Recommendations

In the introduction, Chap. 1, several research questions were defined. After the successful work of Ferretti et al. and Eineder and Adam, which showed the viability of the application of MLE to time series of phase observations, several aspects were still unclear or became apparent during my research, and an attempt has been made to answer all these questions.

In Chap. 1 the questions were formulated as follows:

1. Ferretti et al. and Eineder and Adam showed that the application of MLE to time series of phase observations is viable for the estimation of the topographical height. Is it possible to estimate other parameters as well? And what are the conditions for an observation model to apply MLE? Is it possible to test the validity of the observation model?

2. Ferretti et al. and Eineder and Adam introduced two different reliability estimators and used the global maximum of a likelihood function to estimate the parameters. On what other characteristics can the parameter and reliability estimators be based? And can parameter and reliability estimation be based on the same characteristics?

3. What would happen with the likelihood function if one of the phase observations within a time series is omitted? What impact would it have on the parameter estimate? Is it possible to improve the parameter estimates by omitting one or more phase observations within the time series?

The first group of questions could be answered well: I have presented a mathematical framework that in essence allows the use of any observation model, satisfying certain constraints, to estimate parameters other than the height. I also introduce the convolution within the formation of the likelihood function for an arc, such that parameter estimates can be made more precise.

The primary conditions for the acceptance of parameters within the observation model are fairly simple. First of all, the parameter to be included need only be a function of time. Secondly, the parameter ambiguities within the design matrix, caused by the linear relationship with the modulo-2π behavior of the interferometric phase, need to be different from each other, such that at least two different parameter ambiguities exist within the design matrix. The latter condition is in most cases automatically fulfilled.

Other, secondary conditions are in most cases automatically fulfilled, although in some unlikely cases they are not. I show that the occurrence of ambiguity lengthening within the solution space can still cause ambiguities within the parameter estimates. I also show that any ambiguity lengthening can easily be computed, and I conclude that as long as the ambiguity lengthening is larger than the parameter domain, the parameters cannot be ambiguously estimated.


During my research I encountered beat frequency-like phenomena. I have studied these phenomena in more detail, and conclude that they can occur when the ratio between two parameter ambiguities approaches one. The beat frequency-like phenomena could possibly contribute to the optimization of finding a global maximum.

I also show that the parameter ambiguities within the design matrix need to be known precisely. However, I have not yet found any case in reality in which this was not fulfilled.

The validation of observation models is also possible by making use of tools that were already available: the OMT and the GLR test can easily be applied for the purpose of hypothesis testing. Not only do I show that it is important to test hypotheses, I also show that hypothesis testing is easy to apply within the application of MLE to time series of phase observations.

The second group of questions could be answered satisfactorily as well: parameter and reliability estimation can be based on other characteristics of a likelihood function. I find that on the basis of the maxima, the variance, the probability within certain allowed deviations and the misalignment within a likelihood function, the estimation of parameters and of the reliability of parameter estimates can easily be done. Moreover, I show that parameter and reliability estimation can be based on the same characteristics.

The parameter and reliability estimator based on the misalignment within a likelihood function I invented myself, while the other parameter and reliability estimators already existed, or were simply adopted from estimation methods other than MLE. The misalignment within a likelihood function is compared to the perfect situation in which no misalignment takes place, and results in a comparison between the likelihood function based on the real phase observations and a perfectly aligned theoretical likelihood function based on artificial phase observations. This parameter and reliability estimator seems promising from a qualitative point of view, although it has not been subjected to any tests. I recommend further research of the parameter and reliability estimator based on the misalignment within likelihood functions.

The parameter estimators, also generalized for any observation model that can be used for the application of MLE to time series of phase observations, have been compared to each other on the basis of processing time. Since the parameter estimator based on global maxima, i.e. the parameter estimates from MLE, only needs to find the global maximum, it is the fastest. The reliability estimators have in some cases similar processing times, and can therefore not be compared well to each other, although some reliability estimators are easier to understand for laymen than others. Reliability estimation on the basis of the probability within certain allowed deviations seems thereby the easiest to understand, while reliability estimation on the basis of the variance or standard deviation can be easily understood within the scientific community.

The last group of questions could only be answered satisfactorily to a certain degree.

I did not find out what exactly happens within a likelihood function if any phase observations are omitted. The random character of the phase observations makes it very difficult to predict what would happen in such cases. However, I did study the effects on parameter estimates based on MLE after the omission of single and multiple phase observations.

Outliers, those phase observations that bias the parameter estimates, deteriorate the parameter estimates, and the removal of outliers from amongst the nominal phase observations results in improvements of the parameter estimates. I show that the impact of outliers on parameter estimates can be severe, and that outlier identification is difficult to perform.


My research focusses here on outlier identification rather than outlier detection, and five different outlier identification algorithms have been developed on the basis of either the small or the large impact of outliers.

For small impacts, the parameter estimates cannot change significantly, and the parameter estimate can therefore immediately be used for the identification of outliers, resulting in outlier identification algorithm I. For large impacts this is not the case, however: the parameter estimates are significantly shifted by outliers, and therefore data snooping needs to be applied, after which cross-validation follows.

The reduction of wouldbe-outliers, phase observations that would only be outliers if a certain phase observation is omitted permanently, can be used in order to identify outliers, resulting in outlier identification algorithm II. I found that in quite some cases the outliers could not be identified, caused by the multiplicity in which outlier candidates were still present. I could improve the outlier identification based on data snooping and cross-validation, though. Outlier identification algorithm III was developed to identify sets of combinations of outliers to counter this problem. Outlier identification algorithm IV, on the other hand, was developed to estimate the potential of outlier candidates to be outliers, and could counter this problem as well. Also outlier identification algorithm V, developed on the basis of both ideas, could counter the problem of outlier identification.

The performance of the outlier identification algorithms has been statistically analyzed based on a certain number of simulations. It was complicated to make any statements about their performance due to some problems that were encountered.

The identification of outliers was based on the B-method of testing, while the outlier detection could not be based on the B-method of testing: a link between the criteria used within the outlier detection and the criteria used within the outlier identification was missing, such that it was unknown which outlier detection criteria correspond to the criteria used to identify an outlier within the outlier identification. This link is well known for observations with Gaussian distributions; unfortunately the interferometric phase PDF is not Gaussian. I recommend studying this problem further, and expect that a solution can be found if either the error propagation under the assumption of Gaussian phase PDFs is well understood, or a PDF is adopted similar to the χ2-distribution used for the outlier detection of the B-method of testing.

I also show that other methods of outlier detection and identification exist. Outlier detection and identification are also possible on the basis of the physical laws of conservation. I noticed that this method also has some drawbacks, the most important one being that the time series that has outliers amongst its phase observations needs to be identified, which is not easy either. Parameter estimation noise makes it difficult to detect biases, although I could apply the bias detection rather successfully.

I expect that the identification of the time series that has outliers amongst its phase observations can be improved. I used a network of triangles, that is, three resolution cells, to form loops in order to identify the corresponding time series that has outliers amongst its phase observations. However, using loops based on more than three resolution cells, data snooping of time series becomes possible, after which cross-validation needs to take place. Since outlier identification algorithms II to V do essentially the same for the identification of outliers, I expect that the acquired knowledge of those outlier identification algorithms can also be used for the identification of time series that have outliers amongst their phase observations. Therefore, I expect that sets of combinations of the corresponding time series can also be identified, and that the corresponding time series can be identified based on potential estimates.

Using a mix of both methods, that is, the B-method of testing for the outlier identification and the method based on the laws of conservation for the outlier detection, no link exists between the criteria for the outlier detection and the criteria for the outlier identification. I could however show that the parameter estimates were biased and that, overall, the parameter estimates have been improved after being subjected to all five outlier identification algorithms, resulting effectively in outlier identification. The developed outlier identification algorithms were therefore successful in their identification of outliers, although quite some differences exist between their performances.

I also came up with an idea for another method to detect and identify outliers, based on the overall behavior of likelihood functions after the omission of phase observations, which I recommend studying. The method is based on the tracking of peaks within likelihood functions that shift, appear or disappear after the omission of phase observations, although other characteristics of likelihood functions could be used as well. Not only will research into peak tracking give new insights into the behavior of likelihood functions after the omission of phase observations, but I also suspect that outlier detection and identification can be based on peak tracking.

The outlier identification algorithms, although shown to be successful, can be further improved.

I still do not understand well the wandering effect of intermediate parameter estimates caused by consecutive outlier identification in outlier identification algorithm I. Since outlier identification algorithm I could also be based on simultaneous outlier identification, understanding the wandering effect might result in a better performance of outlier identification algorithm I. I therefore recommend studying the wandering effect in more detail.

The mean parameter estimate improvement of outlier identification algorithms III to V was obtained using only combinations of two phase observations to identify outliers. I expect that combinations of larger numbers of phase observations would allow even more effective outlier identification, resulting in higher mean parameter estimate improvements for outlier identification algorithms III to V. Assessing the omission of large numbers of combinations of phase observations, however, takes a lot of processing time. I suggest using other programming languages, such as C++, to further reduce the computation time.
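To illustrate the growth in processing time: the number of likelihood functions to evaluate per iteration equals the number of combinations of kk observations out of N. A small sketch (the values of N and kk are only examples):

from math import comb

N = 30                          # example number of interferograms
for kk in range(1, 6):
    # number of likelihood functions to evaluate when kk observations are omitted at a time
    print(kk, comb(N, kk))      # 30, 435, 4060, 27405, 142506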

Furthermore, for the estimation of the potential of phase observations to be outliers, reliability estimation of the potential estimates is still missing. Other outlier potential estimators should also be possible. I recommend studying in more detail the estimation of the potential of phase observations to be outliers, as well as the reliability estimation of the potential estimates.

Further experience with real data showed that the computation of likelihood functions with a multi-dimensional domain can easily become very computationally expensive. For the further development of the application of MLE to time series of phase observations, I therefore recommend addressing this problem.

Although the use of a cluster of computers can solve part of the problem, I expect that the use of other programming languages, such as C++, can reduce the computation time. The potential of the GPUs in the video cards of personal computers should also be studied, as these may provide extra computation power.
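A minimal sketch of the kind of computation that would benefit: evaluating a likelihood function on a two-dimensional parameter grid (for example topographic height and deformation rate) is an independent sum over observations at every grid point, which maps well to vectorized CPU code and to GPUs. The PDF, the conversion factors and the example values below are placeholders of my own, not the interferometric phase PDF or data of this thesis.

import numpy as np

def log_likelihood_grid(phase_obs, h2phase, v2phase, heights, rates, sigma=0.8):
    """Evaluate a placeholder log-likelihood on a height x deformation-rate grid.

    phase_obs : (N,) wrapped phase observations
    h2phase   : (N,) height-to-phase conversion factors per interferogram
    v2phase   : (N,) deformation-rate-to-phase conversion factors per interferogram
    """
    H, V = np.meshgrid(heights, rates, indexing="ij")        # parameter grid
    logL = np.zeros_like(H)
    for phi, a, b in zip(phase_obs, h2phase, v2phase):       # independent per observation
        model = a * H + b * V                                 # predicted phase
        resid = np.angle(np.exp(1j * (phi - model)))          # wrapped residual
        logL += -0.5 * (resid / sigma) ** 2                   # placeholder, not the phase PDF
    return logL

# hypothetical example with 25 interferograms
rng = np.random.default_rng(0)
N = 25
h2p, v2p = rng.normal(0.0, 0.05, N), rng.normal(0.0, 0.3, N)
obs = np.angle(np.exp(1j * (h2p * 12.0 + v2p * (-8.0) + rng.normal(0.0, 0.3, N))))

heights = np.linspace(0.0, 30.0, 301)      # [m], example domain
rates = np.linspace(-20.0, 0.0, 201)       # [mm/year], example domain
grid = log_likelihood_grid(obs, h2p, v2p, heights, rates)
i, j = np.unravel_index(np.argmax(grid), grid.shape)
print(heights[i], rates[j])                # location of the grid maximum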

Furthermore, optimizing the search for the global maximum of a likelihood function can further reduce the computation time. I recommend, after the potential of GPUs and of other programming languages has been investigated, assessing the need for a further reduction in computation time, so that the optimization of the search for the global maximum can eventually be studied further.


Appendix A: Algorithms for Outlier Identification


Outlier identification algorithm I

Algorithm 1 Outlier identification algorithm I

-N = ... (N is the number of available interferograms)
-load Table with CDF values
-N phase observations are known
-the magnitude of coherence of the corresponding N pixels has been estimated
-the multi-looking factor is known
-the domain of the parameters is set
repeat
    -compute the likelihood function using N phase observations
    -compute the Maximum Likelihood Estimates (MLEs)
    -select the N phase estimates @ MLEs
    -estimate the N phase residuals (WRAP(phase observations − phase estimates))
    for ii = 1 to N do
        -compute the CDF-value for phase residual ii using the Table with CDF values
    end for
    if one or more CDF-values are outside of the confidence region (e.g. 95% confidence region) then
        -select and remove the single phase observation with the lowest confidence
        -N ← N − 1
    end if
until no outliers can be found or N equals 2
return the remaining phase observations (possibly with other data)
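As an illustration only, a minimal sketch of one possible reading of Algorithm 1 in Python. The helper callables mle_phase_estimates and residual_cdf stand in for the likelihood maximization and for the Table with CDF values; they are hypothetical placeholders, not the implementation used in this thesis.

import numpy as np

def wrap(phase):
    """Wrap phase values to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

def identify_outliers_alg1(phase_obs, mle_phase_estimates, residual_cdf, alpha=0.05):
    """One possible reading of Algorithm 1: iteratively remove the least likely observation.

    mle_phase_estimates(obs) : returns the phase estimates at the MLE (placeholder)
    residual_cdf(residuals)  : returns the CDF value per residual (placeholder for the table)
    """
    obs = np.asarray(phase_obs, dtype=float)
    removed = []
    while obs.size > 2:
        estimates = mle_phase_estimates(obs)
        residuals = wrap(obs - estimates)
        cdf = residual_cdf(residuals)
        confidence = 1.0 - 2.0 * np.abs(cdf - 0.5)    # two-sided confidence per residual
        if confidence.min() >= alpha:                 # every residual inside the 95% region
            break
        worst = int(np.argmin(confidence))            # lowest confidence -> remove it
        removed.append(obs[worst])
        obs = np.delete(obs, worst)
    return obs, removed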


Outlier identification algorithm II

Algorithm 2 Outlier identification algorithm II

-N = ... (N is the number of available interferograms)
-load Table with CDF values
-N phase observations are known
-the magnitude of coherence of the corresponding N pixels has been estimated
-the multi-looking factor is known
-the domain of the parameters is set
repeat
    for ii = 1 to N do
        -compute the likelihood function after the temporary removal of the iith phase observation, using N − 1 phase observations
        -compute the Maximum Likelihood Estimates (MLEs)
        -select the N − 1 phase estimates @ MLEs
        -estimate the N − 1 phase residuals (WRAP(phase observations − phase estimates))
        -wouldbe-outliers(ii) = 0
        for jj = 1 to N − 1 do
            -compute CDF-value(jj) for the remaining jjth phase residual using the Table with CDF values
            if CDF-value(jj) is outside of the confidence region (e.g. 95% confidence region) then
                -wouldbe-outliers(ii) ← wouldbe-outliers(ii) + 1
            end if
        end for
    end for
    if wouldbe-outliers has a single minimum then
        -select and remove the single phase observation that was omitted @ the minimum of wouldbe-outliers
        -N ← N − 1
    end if
until no outliers can be found or N equals 2
return the remaining phase observations (possibly with other data)


Outlier identification algorithm III

Algorithm 3 Outlier identification algorithm III

-N = ... (N is the number of available interferograms)
-load Table with CDF values
-N phase observations are known
-the magnitude of coherence of the corresponding N pixels has been estimated
-the multi-looking factor is known
-the domain of the parameters is set
repeat
    for ii = 1 to N do
        -compute the likelihood function after the temporary removal of the iith phase observation, using N − 1 phase observations
        -compute the Maximum Likelihood Estimates (MLEs)
        -select the N − 1 phase estimates @ MLEs
        -estimate the N − 1 phase residuals (WRAP(phase observations − phase estimates))
        -wouldbe-outliers(ii) = 0
        for jj = 1 to N − 1 do
            -compute CDF-value(jj) for the jjth phase residual using the Table with CDF values
            if CDF-value(jj) is outside of the confidence region (e.g. 95% confidence region) then
                -wouldbe-outliers(ii) ← wouldbe-outliers(ii) + 1
            end if
        end for
    end for
    if wouldbe-outliers has a single minimum then
        -select and remove the phase observation that was omitted @ the minimum
        -N ← N − 1
    else
        -use function “find-combination-of-outliers”
        if a combination of outliers could be found using “find-combination-of-outliers” then
            -select and remove the kk phase observations
            -N ← N − kk
        end if
    end if
until no outliers can be found or N equals 2
return the remaining phase observations (possibly with other data)


Algorithm 4 find-combination-of-outliers

-kk = 2 (kk is the number of phase observations to be temporarily removed)
repeat
    -compute all the possible combinations of kk phase observations out of N observations
    -I is the number of possible combinations
    for ii = 1 to I do
        -compute the likelihood function after the temporary removal of the iith combination, using N − kk phase observations
        -compute the Maximum Likelihood Estimates (MLEs)
        -select the N − kk phase estimates @ MLEs
        -estimate the N − kk phase residuals (WRAP(phase observations − phase estimates))
        -wouldbe-outliers(ii) = 0
        for jj = 1 to N − kk do
            -compute CDF-value(jj) for the remaining jjth phase residual using the Table with CDF values
            if CDF-value(jj) is outside of the confidence region (e.g. 95% confidence region) then
                -wouldbe-outliers(ii) ← wouldbe-outliers(ii) + 1
            end if
        end for
    end for
    if wouldbe-outliers has multiple minima then
        kk ← kk + 1
    end if
until kk equals N − 1 or a predefined number, or a combination of kk outliers is found
return the combination of outliers (if found)
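The enumeration of the candidate sets in the function above can be expressed compactly; a small sketch (the numbers of observations are only an example):

from itertools import combinations

N, kk = 8, 2
candidate_sets = list(combinations(range(N), kk))    # all sets of kk observations out of N
print(len(candidate_sets), candidate_sets[:3])       # 28 sets: (0, 1), (0, 2), (0, 3)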


Outlier identification algorithm IV


Algorithm 5 Outlier identification algorithm IV

-N = ... (N is the number of available interferograms)
-load Table with CDF values
-N phase observations are known
-the magnitude of coherence of the corresponding N pixels has been estimated
-the multi-looking factor is known
-the domain of the parameters is set
repeat
    for ii = 1 to N do
        -compute the likelihood function after the temporary removal of the iith phase observation, using N − 1 phase observations
        -compute the Maximum Likelihood Estimates (MLEs)
        -select the phase estimates @ MLEs
        -estimate the phase residuals (WRAP(phase observations − phase estimates))
        -wouldbe-outliers(ii) = 0
        for jj = 1 to N − 1 do
            -compute CDF-value(jj) for the remaining jjth phase residual using the Table with CDF values
            if CDF-value(jj) is outside of the confidence region (e.g. 95% confidence region) then
                -wouldbe-outliers(ii) ← wouldbe-outliers(ii) + 1
            end if
        end for
    end for
    if any wouldbe-outlier is found then
        if wouldbe-outliers has a single minimum then
            -select and remove the single phase observation that was omitted @ the minimum
            -N ← N − 1
        else if wouldbe-outliers has multiple minimum values (multiple candidates) then
            -select all the C candidates
            -use function “find-most-appropriate-outlier” to find the most appropriate candidate
            if an outlier could be found by the function “find-most-appropriate-outlier” then
                -select and remove the single phase observation
                -N ← N − 1
            end if
        end if
    end if
until no outliers can be found or N equals 2
return the remaining phase observations (possibly with other data)


Algorithm 6 find-most-appropriate-outlier

-kk = 2 (kk is the number of phase observations to be temporarily removed)
repeat
    -compute all the possible combinations of kk phase observations out of N observations
    -I is the number of possible combinations
    for ii = 1 to I do
        -compute the likelihood function after the temporary removal of the iith combination, using N − kk phase observations
        -compute the Maximum Likelihood Estimates (MLEs)
        -select the N − kk phase estimates @ MLEs
        -estimate the N − kk phase residuals (WRAP(phase observations − phase estimates))
        -wouldbe-outliers(ii) = 0
        for jj = 1 to N − kk do
            -compute CDF-value(jj) for the remaining jjth phase residual using the Table with CDF values
            if CDF-value(jj) is outside of the confidence region (e.g. 95% confidence region) then
                -wouldbe-outliers(ii) ← wouldbe-outliers(ii) + 1
            end if
        end for
    end for
    for ii = 1 to C do
        -potential-candidates(ii) equals all the minimum values within the wouldbe-outliers numbers in which the iith candidate occurs
    end for
    -most-appropriate-outliers equals the maximum of potential-candidates
    if most-appropriate-outliers has multiple remaining candidates then
        kk ← kk + 1
    end if
until a single most-appropriate-outlier is found or kk equals N − 1
return the single most-appropriate-outlier (if found)


Outlier identification algorithm V

(See Algorithms 7 and 8 below.)


Algorithm 7 Outlier identification algorithm V

-N = ... (N is the number of available interferograms)
-load Table with CDF values
-N phase observations are known
-the magnitude of coherence of the corresponding N pixels has been estimated
-the multi-looking factor is known
-the domain of the parameters is set
repeat
    for ii = 1 to N do
        -compute the likelihood function after the temporary removal of the iith phase observation, using N − 1 phase observations
        -compute the Maximum Likelihood Estimate (MLE)
        -select the phase estimates @ MLE
        -estimate the phase residuals (WRAP(phase observations − phase estimates))
        -wouldbe-outliers(ii) = 0
        for jj = 1 to N − 1 do
            -compute CDF-value(jj) for the remaining jjth phase residual using the Table with CDF values
            if CDF-value(jj) is outside of the confidence region (e.g. 95% confidence region) then
                -wouldbe-outliers(ii) ← wouldbe-outliers(ii) + 1
            end if
        end for
    end for
    if any wouldbe-outlier is found then
        if wouldbe-outliers has a single minimum then
            -select and remove the single phase observation that was omitted @ the minimum
            -N ← N − 1
        else if wouldbe-outliers has multiple minimum values (multiple candidates) then
            -select all the C candidates
            -use function “find-most-appropriate-outlier/combination” to find the most appropriate candidate or a combination of outliers
            if outlier(s) could be found by the function “find-most-appropriate-outlier/combination” then
                if a combination of kk outliers is found then
                    -select and remove the kk phase observations
                    -N ← N − kk
                else
                    -select and remove the phase observation
                    -N ← N − 1
                end if
            end if
        end if
    end if
until no outliers can be found or N equals 2
return the remaining phase observations (possibly with other data)


Algorithm 8 find-most-appropriate-outlier/combination

-kk = 2 (kk is the number of phase observations to be temporarily removed)
repeat
    -compute all the possible combinations of kk phase observations out of N observations
    -I is the number of possible combinations
    for ii = 1 to I do
        -compute the likelihood function after the temporary removal of the iith combination, using N − kk phase observations
        -compute the Maximum Likelihood Estimate (MLE)
        -select the N − kk phase estimates @ MLE
        -estimate the N − kk phase residuals (WRAP(phase observations − phase estimates))
        -wouldbe-outliers(ii) = 0
        for jj = 1 to N − kk do
            -compute CDF-value(jj) for the remaining jjth phase residual using the Table with CDF values
            if CDF-value(jj) is outside of the confidence region (e.g. 95% confidence region) then
                -wouldbe-outliers(ii) ← wouldbe-outliers(ii) + 1
            end if
        end for
    end for
    -find all the minimum values within the wouldbe-outliers numbers
    if wouldbe-outliers has a single minimum then
        -select the combination of outliers that has been found *
    else
        for ii = 1 to C do
            -potential-candidates(ii) equals all the minimum values within the wouldbe-outliers numbers in which the iith candidate occurs
        end for
        -most-appropriate-outliers equals the maximum of potential-candidates
        if most-appropriate-outliers has multiple remaining candidates then
            kk ← kk + 1
        else
            -select the single most-appropriate-outlier *
        end if
    end if
until a single most-appropriate-outlier is found, or a single combination of kk outliers is found, or kk equals N − 1 or a predefined number
return the single most-appropriate-outlier (if found) or the combination of kk outliers (if found)


Appendix B: Simulation Results


[Figure: six panels, horizontal axis “ML deformation rate estimates [mm/year]”: (a) situation before outlier identification and removal; (b) outlier identification algorithm I; (c) outlier identification algorithm II; (d) outlier identification algorithm III; (e) outlier identification algorithm IV; (f) outlier identification algorithm V.]

Figure 9.1: The true and estimated deformation rate after removal of any identified outliers.


[Figure: five panels, one per outlier identification algorithm (I–V); horizontal axis “number of real outliers”, vertical axis “number of identified outliers”.]

Figure 9.2: The number of identified outliers versus the real outliers.


[Figure: six panels: (a) real situation; (b)–(f) outlier identification algorithms I–V.]

Figure 9.3: Improvements and deteriorations after the identification and removal of identified outliers per simulation.


[Figure: five panels, one per outlier identification algorithm (I–V); horizontal axis “number of real outliers”, vertical axis “number of missed outliers”.]

Figure 9.4: The number of missed outliers versus the real outliers.


[Figure: five panels, one per outlier identification algorithm (I–V); horizontal axis “number of identified outliers”, vertical axis “number of incorrectly identified outliers”.]

Figure 9.5: The number of incorrectly identified outliers.

