Discrete Inverse Problems: Insight and Algorithms


This book is published by SIAM in the series Fundamentals of Algorithms. It has 8 chapters, 46 exercises, and 213 pages. It was on SIAM's bestseller list three years in a row in 2010, 2011 and 2012.


    Chapter 1

    Introduction and Motivation

If you have acquired this book, perhaps you do not need a motivation for studying the numerical treatment of inverse problems. Still, it is preferable to start with a few examples of the use of linear inverse problems. One example is that of computing the magnetization inside the volcano Mt. Vesuvius (near Naples in Italy) from measurements of the magnetic field above the volcano, a safe way to monitor the internal activities. Figure 1.1 below shows a computer simulation of this situation; the left figure shows the measured data on the surface of the volcano, and the right figure shows a reconstruction of the internal magnetization. Another example is the computation of a sharper image from a blurred one, using a mathematical model of the point spread function that describes the blurring process; see Figure 1.2 below.

Figure 1.1. Left: Simulated measurements of the magnetic field on the surface of Mt. Vesuvius. Right: Reconstruction of the magnetization inside the volcano.

    Figure 1.2. Reconstruction of a sharper image from a blurred one.



Figure 1.3. The forward problem is to compute the output, given a system and the input to this system. The inverse problem is to compute either the input or the system, given the other two quantities. Note that in most situations we have imprecise (noisy) measurements of the output.

Both are examples of a wide class of mathematical problems, referred to as inverse problems. These problems generally arise when we wish to compute information about internal or otherwise hidden data from outside (or otherwise accessible) measurements. See Figure 1.3 for a schematic illustration of an inverse problem.

Inverse problems, in turn, belong to the class of ill-posed problems. The term was coined in the early 20th century by Hadamard, who worked on problems in mathematical physics, and he believed that ill-posed problems do not model real-world problems (he was wrong). Hadamard's definition says that a linear problem is well-posed if it satisfies the following three requirements:

Existence: The problem must have a solution.

Uniqueness: There must be only one solution to the problem.

Stability: The solution must depend continuously on the data.

If the problem violates one or more of these requirements, it is said to be ill-posed. The existence condition seems to be trivial, and yet we shall demonstrate that we can easily formulate problems that do not have a solution. Consider, for example, the overdetermined system

\[
\begin{pmatrix} 1 \\ 2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 2.2 \end{pmatrix}.
\]

This problem does not have a solution; there is no x such that x = 1 and 2x = 2.2. Violations of the existence criterion can often be fixed by a slight reformulation of the problem; for example, instead of the above system we can consider the associated


least squares problem

\[
\min_x \left\| \begin{pmatrix} 1 \\ 2 \end{pmatrix} x - \begin{pmatrix} 1 \\ 2.2 \end{pmatrix} \right\|_2^2
= \min_x \left( (x - 1)^2 + (2x - 2.2)^2 \right),
\]

which has the unique solution x = 1.08.

The uniqueness condition can be more critical; but again it can often be fixed by a reformulation of the problem, typically by adding additional requirements to the solution. If the requirements are carefully chosen, the solution becomes unique. For example, the underdetermined problem

\[
x_1 + x_2 = 1 \qquad \text{(the world's simplest ill-posed problem)}
\]

has infinitely many solutions; if we also require that the 2-norm of x, given by \|x\|_2 = (x_1^2 + x_2^2)^{1/2}, is minimum, then there is a unique solution x_1 = x_2 = 1/2.

The stability condition is much harder to deal with because a violation implies that arbitrarily small perturbations of data can produce arbitrarily large perturbations of the solution. At least, this is true for infinite-dimensional problems; for finite-dimensional problems the perturbation is always finite, but this is quite irrelevant if the perturbation of the solution is, say, of the order 10^{12}.

Again the key is to reformulate the problem such that the solution to the new problem is less sensitive to the perturbations. We say that we stabilize or regularize the problem, such that the solution becomes more stable and regular. As an example, consider the least squares problem \min_x \|A\,x - b\|_2 with coefficient matrix and right-hand side given by

\[
A = \begin{pmatrix} 0.16 & 0.10 \\ 0.17 & 0.11 \\ 2.02 & 1.29 \end{pmatrix}, \qquad
b = A \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} 0.01 \\ -0.03 \\ 0.02 \end{pmatrix}
= \begin{pmatrix} 0.27 \\ 0.25 \\ 3.33 \end{pmatrix}.
\]

Here, we can consider the vector (0.01, -0.03, 0.02)^T a perturbation of the exact right-hand side (0.26, 0.28, 3.31)^T. There is no vector x such that A x = b, and the least squares solution is given by

\[
x_{\mathrm{LS}} = \begin{pmatrix} 7.01 \\ -8.40 \end{pmatrix}, \qquad
\|A\,x_{\mathrm{LS}} - b\|_2 = 0.022.
\]

Two other solutions with a small residual are

\[
x' = \begin{pmatrix} 1.65 \\ 0 \end{pmatrix}, \quad
x'' = \begin{pmatrix} 0 \\ 2.58 \end{pmatrix}, \qquad
\|A\,x' - b\|_2 = 0.031, \quad \|A\,x'' - b\|_2 = 0.036.
\]

All three solutions x_{\mathrm{LS}}, x', and x'' have small residuals, yet they are far from the exact solution (1, 1)^T!

The reason for this behavior is that the matrix A is ill conditioned. When this is the case, it is well known from matrix computations that small perturbations of the right-hand side b can lead to large perturbations of the solution. It is also well known


that a small residual does not imply that the perturbed solution is close to the exact solution.
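For readers who want to reproduce these numbers, here is a minimal numerical check of the example; it is a sketch assuming NumPy (the book's own software is MATLAB-based).

```python
# A minimal check of the numbers in this example (a sketch, assuming NumPy).
import numpy as np

A = np.array([[0.16, 0.10],
              [0.17, 0.11],
              [2.02, 1.29]])
b = A @ np.array([1.0, 1.0]) + np.array([0.01, -0.03, 0.02])

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_ls, np.linalg.norm(A @ x_ls - b))        # approx [7.01, -8.40], 0.022

for x in (np.array([1.65, 0.0]), np.array([0.0, 2.58])):
    print(np.linalg.norm(A @ x - b))             # approx 0.031 and 0.036
```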

Ill-conditioned problems are effectively underdetermined. For example, for the above problem we have

\[
A \begin{pmatrix} 1.00 \\ -1.57 \end{pmatrix}
= \begin{pmatrix} 0.0030 \\ -0.0027 \\ -0.0053 \end{pmatrix},
\]

showing that the vector (1.00, -1.57)^T is almost a null vector for A. Hence we can add a large amount of this vector to the solution vector without changing the residual very much; the system behaves almost like an underdetermined system.

It turns out that we can modify the above problem such that the new solution is more stable, i.e., less sensitive to perturbations. For example, we can enforce an upper bound on the norm of the solution; i.e., we solve the modified problem

\[
\min_x \|A\,x - b\|_2 \quad \text{subject to} \quad \|x\|_2 \le \delta.
\]

The solution x_\delta depends in a unique but nonlinear way on \delta; for example,

\[
x_{0.1} = \begin{pmatrix} 0.08 \\ 0.05 \end{pmatrix}, \quad
x_{1} = \begin{pmatrix} 0.84 \\ 0.54 \end{pmatrix}, \quad
x_{1.37} = \begin{pmatrix} 1.16 \\ 0.74 \end{pmatrix}, \quad
x_{10} = \begin{pmatrix} 6.51 \\ -7.60 \end{pmatrix}.
\]

The solution x_{1.37} (for \delta = 1.37) is quite close to the exact solution. By supplying the correct additional information we can compute a good approximate solution. The main difficulty is how to choose the parameter \delta when we have little knowledge about the exact solution.
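The book has not yet introduced algorithms for this constrained problem, but one standard reformulation replaces it with the Tikhonov problem \min_x \|A x - b\|_2^2 + \lambda \|x\|_2^2 and adjusts \lambda until \|x\|_2 = \delta. The following sketch assumes NumPy; the helper name x_delta is ours, not the book's.

```python
# A sketch of computing x_delta via the Tikhonov reformulation (assumes NumPy).
import numpy as np

def x_delta(A, b, delta, lam_lo=1e-14, lam_hi=1e6):
    n = A.shape[1]
    x = lambda lam: np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
    for _ in range(100):                  # bisection on log(lambda)
        lam = np.sqrt(lam_lo * lam_hi)
        if np.linalg.norm(x(lam)) > delta:
            lam_lo = lam                  # norm too large: regularize more
        else:
            lam_hi = lam
    return x(np.sqrt(lam_lo * lam_hi))

A = np.array([[0.16, 0.10], [0.17, 0.11], [2.02, 1.29]])
b = np.array([0.27, 0.25, 3.33])
for delta in (0.1, 1.0, 1.37, 10.0):
    print(delta, np.round(x_delta(A, b, delta), 2))   # should match the values above
```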

Whenever we solve an inverse problem on a computer, we always face difficulties similar to the above, because the associated computational problem is ill conditioned. The purpose of this book is:

1. To discuss the inherent instability of inverse problems, in the form of first-kind Fredholm integral equations.

2. To explain why ill-conditioned systems of equations always arise when we discretize and solve these inverse problems.

3. To explain the fundamental mechanisms of this ill conditioning and how they reflect properties of the underlying problem.

4. To explain how we can modify the computational problem in order to stabilize the solution and make it less sensitive to errors.

5. To show how this can be done efficiently on a computer, using state-of-the-art methods from numerical analysis.

Regularization methods are at the heart of all this, and in the rest of this book we will develop these methods with a keen eye on the fundamental interplay between insight and algorithms.


Chapter 2

Meet the Fredholm Integral Equation of the First Kind

This book deals with one important class of linear inverse problems, namely, those that take the form of Fredholm integral equations of the first kind. These problems arise in many applications in science and technology, where they are used to describe the relationship between the source (the hidden data) and the measured data. Some examples are

- medical imaging (CT scanning, electro-cardiography, etc.),
- geophysical prospecting (search for oil, land-mines, etc.),
- image deblurring (astronomy, crime scene investigations, etc.),
- deconvolution of a measurement instrument's response.

If you want to work with linear inverse problems arising from first-kind Fredholm integral equations, you must make this integral equation your friend. In particular, you must understand the psyche of this beast and how it can play tricks on you if you are not careful. This chapter thus sets the stage for the remainder of the book by briefly surveying some important theoretical aspects and tools associated with first-kind Fredholm integral equations.

Readers unfamiliar with inner products, norms, etc. in function spaces may ask: How do I avoid reading this chapter? The answer is: Do not avoid it completely; read the first two sections. Readers more familiar with this kind of material are encouraged to read the first four sections, which provide important background material for the rest of the book.

    2.1 A Model Problem from Geophysics

It is convenient to start with a simple model problem to illustrate our theory and algorithms. We will use a simplified problem from gravity surveying. An unknown mass distribution with density f(t) is located at depth d below the surface, from 0 to 1 on the t axis shown in Figure 2.1. We assume there is no mass outside this source,


Figure 2.1. The geometry of the gravity surveying model problem: f(t) is the mass density at t, and g(s) is the vertical component of the gravity field at s.

which produces a gravity field everywhere. At the surface, along the s axis (see the figure) from 0 to 1, we measure the vertical component of the gravity field, which we refer to as g(s).

The two functions f and g are (not surprisingly here) related via a Fredholm integral equation of the first kind. The gravity field from an infinitesimally small part of f(t), of length dt, on the t axis is identical to the field from a point mass at t of strength f(t) dt. Hence, the magnitude of the gravity field along s is f(t)\,dt / r^2, where r = \sqrt{d^2 + (s - t)^2} is the distance between the source point at t and the field point at s. The direction of the gravity field is from the field point to the source point, and therefore the measured value of g(s) is

\[
dg = \frac{\sin\theta}{r^2}\, f(t)\, dt,
\]

where \theta is the angle shown in Figure 2.1. Using that \sin\theta = d/r, we obtain

\[
\frac{\sin\theta}{r^2}\, f(t)\, dt = \frac{d}{(d^2 + (s - t)^2)^{3/2}}\, f(t)\, dt.
\]

The total value of g(s) for any 0 \le s \le 1 consists of contributions from all mass along the t axis (from 0 to 1), and it is therefore given by the integral

\[
g(s) = \int_0^1 \frac{d}{(d^2 + (s - t)^2)^{3/2}}\, f(t)\, dt.
\]

This is the forward problem, which describes how we can compute the measurable data g given the source f.

The associated inverse problem of gravity surveying is obtained by swapping the ingredients of the forward problem and writing it as

\[
\int_0^1 K(s, t)\, f(t)\, dt = g(s), \qquad 0 \le s \le 1,
\]

where the function K, which represents the model, is given by

\[
K(s, t) = \frac{d}{(d^2 + (s - t)^2)^{3/2}}, \tag{2.1}
\]


Figure 2.2. The left figure shows the function f (the mass density distribution), and the right figure shows the measured signal g (the gravity field) for three different values of the depth d.

and the right-hand side g is what we are able to measure. The function K is the vertical component of the gravity field, measured at s, from a unit point source located at t. From K and g we want to compute f, and this is the inverse problem.

Figure 2.2 shows an example of the computation of the measured signal g(s), given the mass distribution f and three different values of the depth d. Obviously, the deeper the source, the weaker the signal. Moreover, we see that the observed signal g (the data) is much smoother than the source f, and in fact the discontinuity in f is not visible in g.
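As an illustration of the forward computation, the following sketch evaluates the integral numerically (assuming SciPy; the piecewise constant f is a hypothetical stand-in for the source in the figure).

```python
# Forward computation g(s) for the gravity model, by numerical integration
# (a sketch assuming SciPy; f below is a hypothetical source, not the book's).
import numpy as np
from scipy.integrate import quad

f = lambda t: 2.0 if t < 0.5 else 1.0            # source with a discontinuity

def g(s, d):
    integrand = lambda t: d / (d**2 + (s - t)**2)**1.5 * f(t)
    return quad(integrand, 0.0, 1.0, points=[0.5])[0]

for d in (0.25, 0.5, 1.0):
    print(d, [round(g(s, d), 2) for s in np.linspace(0.0, 1.0, 5)])
# The deeper the source, the weaker and smoother the computed signal g.
```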

    2.2 Properties of the Integral Equation

The Fredholm integral equation of the first kind takes the generic form

\[
\int_0^1 K(s, t)\, f(t)\, dt = g(s), \qquad 0 \le s \le 1. \tag{2.2}
\]

Here, both the kernel K and the right-hand side g are known functions, while f is the unknown function. This equation establishes a linear relationship between the two functions f and g, and the kernel K describes the precise relationship between the two quantities. Thus, the function K describes the underlying model. In Chapter 7 we will encounter formulations of (2.2) in multiple dimensions, but our discussion until then will focus on the one-dimensional case.

If f and K are known, then we can compute g by evaluating the integral; this is called the forward computation. The inverse problem consists of computing f given the right-hand side and the kernel. Two examples of these ingredients are listed in Table 2.1.

An important special case of (2.2) is when the kernel is a function of the difference between s and t, i.e., K(s, t) = h(s - t), where h is some function. This version


Table 2.1. The ingredients and mechanisms of two inverse problems.

Problem           Image deblurring          Gravity surveying
Source f          Sharp image               Mass distribution
Data g            Blurred image             Gravity field component
Kernel K          Point spread function     Field from point mass
Forward problem   Compute blurred image     Compute gravity field
Inverse problem   Reconstruct sharp image   Reconstruct mass distrib.

of the integral equation is called a deconvolution problem, and it takes the form

\[
\int_0^1 h(s - t)\, f(t)\, dt = g(s), \qquad 0 \le s \le 1
\]

(and similarly in more dimensions). It follows from (2.1) that the gravity survey problem from the previous section is actually a convolution problem. Some numerical regularization methods for deconvolution problems are described in [35]; see also Sections 7.1 and 7.5 on barcode reading and image deblurring.

The smoothing that we observed in the above example, when going from the source f to the data g, is a universal phenomenon for integral equations. In the mapping from f to g, higher frequency components in f are damped compared to components with lower frequency. Thus, the integration with K in (2.2) has a smoothing effect on the function f, such that g will appear smoother than f.

The Riemann–Lebesgue lemma is a precise mathematical statement of the above. If we define the function f_p by

\[
f_p(t) = \sin(2\pi p\, t), \qquad p = 1, 2, \ldots,
\]

then for arbitrary kernels K we have

\[
g_p(s) = \int_0^1 K(s, t)\, f_p(t)\, dt \to 0 \quad \text{for} \quad p \to \infty.
\]

That is, as the frequency of f increases (as measured by the integer p), the amplitude of g_p decreases; Figure 2.3 illustrates this. See, e.g., Theorem 12.5C in [19] for a precise formulation of the Riemann–Lebesgue lemma.

In other words, higher frequencies are damped in the mapping of f to g, and therefore g will be smoother than f. The inverse problem, i.e., that of computing f from g, is therefore a process that amplifies high frequencies, and the higher the frequency, the more the amplification. Figure 2.4 illustrates this: the functions f_p and g_p are related via the same integral equation as before, but now all four g_p functions are scaled such that \|g_p\|_2 = 0.01. The amplitude of f_p increases as the frequency p increases.

Clearly, even a small random perturbation of g can lead to a very large perturbation of f if the perturbation has a high-frequency component. Think of the function g_p in Figure 2.4 as a perturbation of g and the function f_p as the corresponding perturbation of f. As a matter of fact, no matter how small the perturbation of g,


Figure 2.3. Illustration of the Riemann–Lebesgue lemma with the function f_p(t) = \sin(2\pi p t). Clearly, the amplitude of g_p decreases as the frequency p increases.

Figure 2.4. The consequence of the Riemann–Lebesgue lemma is that high frequencies are amplified in the inversion. Here the functions g_p are normalized such that \|g_p\|_2 = 0.01. Clearly, the amplitude of f_p increases as the frequency p increases.

the corresponding perturbation of f can be arbitrarily large (just set the frequency p of the perturbation high enough). This illustrates the fundamental problem of solving inverse problems, namely, that the solution may not depend continuously on the data.
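A quick numerical illustration of this damping, using the gravity kernel (2.1) and a crude midpoint approximation of the integral (a sketch assuming NumPy):

```python
# ||g_p|| decreases as the frequency p increases (Riemann-Lebesgue behavior);
# a sketch assuming NumPy, with the integral replaced by a midpoint sum.
import numpy as np

n, d = 400, 0.25
t = (np.arange(n) + 0.5) / n
K = d / (d**2 + (t[:, None] - t[None, :])**2)**1.5

for p in (1, 2, 4, 8, 16):
    fp = np.sin(2 * np.pi * p * t)
    gp = (K @ fp) / n                       # g_p(s_i), midpoint rule
    print(p, np.sqrt((gp**2).mean()))       # discrete estimate of ||g_p||_2
```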

Why bother about these issues associated with ill-posed problems? Fredholm integral equations of the first kind are used to model a variety of real applications. We can only hope to compute useful solutions to these problems if we fully understand their inherent difficulties. Moreover, we must understand how these difficulties carry over to the discretized problems involved in a computer solution, such that we can deal with them (mathematically and numerically) in a satisfactory way. This is precisely what the rest of this book is about.


2.3 The Singular Value Expansion and the Picard Condition

The singular value expansion (SVE) is a mathematical tool that gives us a handle on the discussion of the smoothing effect and the existence of solutions to first-kind Fredholm integral equations. First we need to introduce a bit of notation. Given two functions \phi and \psi defined on the interval from 0 to 1, their inner product is defined as

\[
\langle \phi, \psi \rangle \equiv \int_0^1 \phi(t)\, \psi(t)\, dt. \tag{2.3}
\]

Moreover, the 2-norm of the function \phi is defined by

\[
\|\phi\|_2 \equiv \langle \phi, \phi \rangle^{1/2} = \left( \int_0^1 \phi(t)^2\, dt \right)^{1/2}. \tag{2.4}
\]

The kernel K in our generic integral equation (2.2) is square integrable if the integral \int_0^1 \int_0^1 K(s, t)^2\, ds\, dt is finite. For any square integrable kernel K the singular value expansion (SVE) takes the form

\[
K(s, t) = \sum_{i=1}^{\infty} \sigma_i\, u_i(s)\, v_i(t). \tag{2.5}
\]

The functions u_i and v_i are called the left and right singular functions. All the u_i-functions are orthonormal with respect to the usual inner product (2.3), and the same is true for all the v_i-functions, i.e.,

\[
\langle u_i, u_j \rangle = \langle v_i, v_j \rangle = \delta_{ij}, \qquad i, j = 1, 2, \ldots.
\]

The quantities \sigma_i are called the singular values, and they form a nonincreasing sequence:

\[
\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \cdots \ge 0.
\]

If there is only a finite number of nonzero singular values, then the kernel is called degenerate. The singular values and functions satisfy a number of relations, and the most important is the fundamental relation

\[
\int_0^1 K(s, t)\, v_i(t)\, dt = \sigma_i\, u_i(s), \qquad i = 1, 2, \ldots. \tag{2.6}
\]

We note that it is rare that we can determine the SVE analytically; later we shall demonstrate how we can compute it numerically. For a more rigorous discussion of the SVE, see, e.g., Section 15.4 in [50].

2.3.1 The Role of the SVE

The singular values \sigma_i always decay to zero, and it can be shown that the smoother the kernel K, the faster the decay. Specifically, if the derivatives of K of order


Figure 2.5. The number of oscillations, or zero-crossings, in u_i and v_i increases with i (as the corresponding singular values decay). Here we used the gravity surveying model problem.

0, \ldots, q exist and are continuous, then the singular values \sigma_i decay approximately as O(i^{-q-1/2}). If K is infinitely many times differentiable, then the singular values decay even faster: the \sigma_i decay exponentially, i.e., as O(\varrho^i) for some positive \varrho strictly smaller than one.

The singular functions resemble spectral bases in the sense that the smaller the \sigma_i, the more oscillations (or zero-crossings) in the corresponding singular functions u_i and v_i. Figure 2.5 illustrates this.

The left and right singular functions form bases of the function space L^2([0, 1]) of square integrable functions on the interval [0, 1]. Hence, we can expand both f and g in terms of these functions:

\[
f(t) = \sum_{i=1}^{\infty} \langle v_i, f \rangle\, v_i(t)
\quad \text{and} \quad
g(s) = \sum_{i=1}^{\infty} \langle u_i, g \rangle\, u_i(s).
\]

If we insert the expansion for f into the integral equation and make use of the fundamental relation, then we see that g can also be written as

\[
g(s) = \sum_{i=1}^{\infty} \sigma_i\, \langle v_i, f \rangle\, u_i(s).
\]

Since v_i is mapped to \sigma_i u_i(s), this explains why the higher frequencies are damped more than the lower frequencies in the forward problem, i.e., why the integration with K has a smoothing effect.

When we insert the two expansions for f and g into the integral equation and make use of the fundamental relation again, then it is easy to obtain the relation

\[
\sum_{i=1}^{\infty} \sigma_i\, \langle v_i, f \rangle\, u_i(s)
= \sum_{i=1}^{\infty} \langle u_i, g \rangle\, u_i(s). \tag{2.7}
\]

The first observation to make from this relation is that if there are zero singular values \sigma_i, then we can only have a solution f(t) that satisfies (2.2) if the corresponding components \langle u_i, g \rangle\, u_i(s) are zero.

For convenience let us now assume that all \sigma_i \ne 0. It turns out that there is still a condition on the right-hand side g(s). By equating each of the terms in the expansion


Figure 2.6. Illustration of the Picard condition. The left figure shows the functions f and g used in the example. The middle figure shows the singular values \sigma_i and the SVE expansion coefficients \langle u_i, g \rangle and \langle u_i, g \rangle / \sigma_i (for g and f, respectively) for a noise-free problem. The right figure shows how these coefficients change when we add noise to the right-hand side.

(2.7), we see that the expansion coefficients for the solution are \langle v_i, f \rangle = \langle u_i, g \rangle / \sigma_i for i = 1, 2, \ldots, and this leads to the following expression for the solution:

\[
f(t) = \sum_{i=1}^{\infty} \frac{\langle u_i, g \rangle}{\sigma_i}\, v_i(t). \tag{2.8}
\]

While this relation is not suited for numerical computations, it sheds important light on the existence of a solution to the integral equation. More precisely, we see that for a square integrable solution f to exist, the 2-norm \|f\|_2 must be bounded, which leads to the following condition:

The Picard condition. The solution is square integrable if

\[
\|f\|_2^2 = \int_0^1 f(t)^2\, dt
= \sum_{i=1}^{\infty} \langle v_i, f \rangle^2
= \sum_{i=1}^{\infty} \left( \frac{\langle u_i, g \rangle}{\sigma_i} \right)^2 < \infty. \tag{2.9}
\]

In words, the Picard condition says that the right-hand side coefficients \langle u_i, g \rangle must decay to zero somewhat faster than the singular values \sigma_i. We recognize this as a condition on the smoothness of the right-hand side.

The trouble with first-kind Fredholm integral equations is that, even if the exact data satisfies the Picard condition, the measured and noisy data g usually violates the condition. Figure 2.6 illustrates this. The middle figure shows the singular values \sigma_i and the expansion coefficients \langle u_i, g \rangle and \langle u_i, g \rangle / \sigma_i for the ideal situation with no noise, and here the Picard condition is clearly satisfied. The right figure shows the behavior of these quantities when we add noise to the right-hand side; now the right-hand side coefficients \langle u_i, g \rangle level off at the noise level, and the Picard condition


Figure 2.7. Ursell's problem (2.10). Top left: The norm of the error \|g - g_k\|_2 in the approximation (2.11) for the right-hand side. Top right: The norm of the approximate solution f_k (2.12). Bottom: Approximate solutions f_k for k = 1, \ldots, 8.

is violated. As a consequence, the solution coefficients \langle u_i, g \rangle / \sigma_i increase, and the norm of the solution becomes unbounded.

The violation of the Picard condition is the simple explanation of the instability of linear inverse problems in the form of first-kind Fredholm integral equations. At the same time, the insight we gain from a study of the quantities associated with the SVE gives a hint on how to deal with noisy problems that violate the Picard condition: simply put, we want to filter the noisy SVE coefficients.
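The following sketch produces the quantities behind a plot like Figure 2.6 (assuming NumPy; the smooth source f and the noise level are our own hypothetical choices, and the midpoint discretization anticipates Chapter 3).

```python
# Discrete Picard plot: sigma_i, |u_i' b|, and their ratio, with noise added;
# a sketch assuming NumPy (f and the noise level are hypothetical choices).
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 0.25
t = (np.arange(n) + 0.5) / n
A = (1.0 / n) * d / (d**2 + (t[:, None] - t[None, :])**2)**1.5

f = np.sin(np.pi * t) + 0.5 * np.sin(2 * np.pi * t)   # smooth test source
b = A @ f + 1e-4 * rng.standard_normal(n)             # noisy right-hand side

U, sigma, VT = np.linalg.svd(A)
beta = np.abs(U.T @ b)                                # |<u_i, g>|
for i in range(0, n, 8):
    print(f"i={i+1:2d}  sigma={sigma[i]:9.2e}  "
          f"|u'b|={beta[i]:9.2e}  ratio={beta[i]/sigma[i]:9.2e}")
# The coefficients level off at the noise level while sigma_i keeps decaying,
# so the ratios eventually blow up: the Picard condition is violated.
```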

    2.3.2 Nonexistence of a Solution

Ursell [70] has provided an example of a first-kind Fredholm integral equation that does not have a square integrable solution; i.e., there is no solution whose 2-norm is finite. His example takes the form

\[
\int_0^1 \frac{1}{s + t + 1}\, f(t)\, dt = 1, \qquad 0 \le s \le 1. \tag{2.10}
\]

Hence, the kernel and the right-hand side are given by K(s, t) = (s + t + 1)^{-1} and g(s) = 1.


Let us expand the right-hand side in terms of the left singular functions. We consider the approximation

\[
g_k(s) = \sum_{i=1}^{k} \langle u_i, g \rangle\, u_i(s), \tag{2.11}
\]

and the top left plot in Figure 2.7 shows that the error in g_k decreases as k increases; we have

\[
\|g - g_k\|_2 \to 0 \quad \text{for} \quad k \to \infty.
\]

Next we consider the integral equation \int_0^1 (s + t + 1)^{-1} f_k(t)\, dt = g_k(s), whose solution f_k is given by the expansion

\[
f_k(t) = \sum_{i=1}^{k} \frac{\langle u_i, g \rangle}{\sigma_i}\, v_i(t). \tag{2.12}
\]

Clearly \|f_k\|_2 is finite for all k; but the top right plot in Figure 2.7 shows that the norm of f_k increases with k:

\[
\|f_k\|_2 \to \infty \quad \text{for} \quad k \to \infty.
\]

The bottom plots in Figure 2.7 corroborate this: clearly, the amplitude of the function f_k increases with k, and these functions do not seem to converge to a square integrable solution.
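A numerical illustration of this divergence (a sketch assuming NumPy; the SVE is approximated by the SVD of a midpoint discretization, a technique introduced in Chapter 3):

```python
# Ursell's example: the truncated expansions f_k do not settle down; their
# norms grow with k. A sketch assuming NumPy.
import numpy as np

n = 200
t = (np.arange(n) + 0.5) / n
A = (1.0 / n) / (t[:, None] + t[None, :] + 1.0)   # K(s,t) = 1/(s+t+1)
b = np.ones(n)                                    # g(s) = 1

U, sigma, VT = np.linalg.svd(A)
beta = U.T @ b
for k in (1, 2, 4, 8):
    fk = VT[:k].T @ (beta[:k] / sigma[:k])
    print(k, np.sqrt((fk**2).mean()))             # estimate of ||f_k||_2 grows
```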

    2.3.3 Nitty-Gritty Details of the SVE

We conclude this section with some nitty-gritty details that some readers may wish to skip. If the kernel K is continuous, then for all equations that involve infinite sums of K's singular functions, such as (2.5), (2.6), and (2.8), the equality sign means that the sum converges uniformly to the left-hand side. If K is discontinuous, then we have convergence in the mean square. For example, for (2.8), uniform convergence means that for every \varepsilon there exists an integer N such that for all k > N we have

\[
\left| f(t) - \sum_{i=1}^{k} \frac{\langle u_i, g \rangle}{\sigma_i}\, v_i(t) \right| < \varepsilon \qquad \forall\, t \in [0, 1].
\]

For convergence in the mean square, the inequality takes the form

\[
\left\| f - \sum_{i=1}^{k} \frac{\langle u_i, g \rangle}{\sigma_i}\, v_i \right\|_2 < \varepsilon.
\]

The expansion in terms of an orthonormal basis,

\[
f(t) = \sum_{i=1}^{\infty} \langle \phi_i, f \rangle\, \phi_i(t),
\]

i.e., with expansion coefficients \zeta_i = \langle \phi_i, f \rangle, is a standard result in functional analysis; see, e.g., [50]. The SVE is essentially unique, and by this we mean the following.


When \sigma_i is isolated (i.e., different from its neighbors), then we can always replace the functions u_i and v_i with -u_i and -v_i, because

\[
\int_0^1 K(s, t)\, (-v_i(t))\, dt = -\int_0^1 K(s, t)\, v_i(t)\, dt = -\sigma_i\, u_i(s) = \sigma_i\, (-u_i(s)).
\]

When \sigma_i has multiplicity larger than one, i.e., when

\[
\sigma_{i-1} > \sigma_i = \cdots = \sigma_{i+\nu} > \sigma_{i+\nu+1},
\]

then the corresponding singular functions are not unique, but the subspaces \mathrm{span}\{u_i, \ldots, u_{i+\nu}\} and \mathrm{span}\{v_i, \ldots, v_{i+\nu}\} are unique. For example, if \sigma_1 = \sigma_2, then

\[
\int_0^1 K(s, t)\, (\alpha\, v_1(t) + \beta\, v_2(t))\, dt
= \alpha\, \sigma_1\, u_1(s) + \beta\, \sigma_2\, u_2(s)
= \sigma_1\, (\alpha\, u_1(s) + \beta\, u_2(s)),
\]

showing that we can always replace u_1 and v_1 with \alpha\, u_1 + \beta\, u_2 and \alpha\, v_1 + \beta\, v_2 as long as \alpha^2 + \beta^2 = 1 (to ensure that the new singular functions have unit 2-norm).

    2.4 Ambiguity in Inverse Problems

As we mentioned in Chapter 1, inverse problems such as the first-kind Fredholm integral equation (2.2) are ill-posed. We have already seen that they may not have a solution, and that the solution is extremely sensitive to perturbations of the right-hand side g, thus illustrating the existence and stability issues of the Hadamard requirements for a well-posed problem.

The uniqueness issue of the Hadamard requirements is relevant because, as a matter of fact, not all first-kind Fredholm integral equations (2.2) have a unique solution. The nonuniqueness of a solution to an inverse problem is sometimes referred to as ambiguity, e.g., in the geophysical community, and we will adopt this terminology here. To clarify things, we need to distinguish between two kinds of ambiguity, as described below.

The first type of ambiguity can be referred to as null-space ambiguity or nonuniqueness of the solution. We encounter this kind of ambiguity when the integral operator has a null space, i.e., when there exist functions f_{\mathrm{null}} \ne 0 such that, in the generic case,

\[
\int_0^1 K(s, t)\, f_{\mathrm{null}}(t)\, dt = 0.
\]

Such functions are referred to as annihilators, and we note that if f_{\mathrm{null}} is an annihilator, so is any scalar multiple of this function. Annihilators are undesired, in the sense that if we are not careful, then we may have no control of their contribution to the computed solution.


The null space associated with the kernel K(s, t) is the space spanned by all the annihilators. In the ideal case, the null space consists of the null function only; when this is the case we need not pay attention to it (similar to the treatment of linear algebraic equations with a nonsingular coefficient matrix). Occasionally, however, we encounter problems with a nontrivial null space, and it follows from (2.6) that this null space is spanned by the right singular functions v_i(t) corresponding to the zero singular values \sigma_i = 0. The dimension of the null space is finite if only a finite number of singular values are zero; otherwise the dimension is infinite, which happens when there is a finite number of nonzero singular values. In the latter case we say that the kernel is degenerate.

It is easy to construct an example of an integral operator that has a null space; if K(s, t) = s + 2t, then simple evaluation shows that

\[
\int_{-1}^{1} (s + 2t)\, (3t^2 - 1)\, dt = 0,
\]

i.e., f_{\mathrm{null}}(t) = 3t^2 - 1 is an annihilator (a short verification follows these paragraphs). In fact, this kernel is degenerate, and the null space is spanned by the Legendre polynomials of degree \ge 2; see Exercise 2.2.

Annihilators are not just theoretical conceptions; they arise in many applications, such as potential theory in geophysics, acoustics, and electromagnetic theory. It is therefore good to know that (depending on the discretization) we are often able to detect and compute annihilators numerically, and in this way control (or remove) their contribution to the solution. Exercise 3.9 in the next chapter deals with the numerical computation of the one-dimensional null space associated with an integral equation that arises in potential theory.
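The evaluation behind the annihilator claim is elementary; splitting the integrand into its two terms gives

\[
\int_{-1}^{1} (s + 2t)(3t^2 - 1)\, dt
= s \int_{-1}^{1} (3t^2 - 1)\, dt + 2 \int_{-1}^{1} t\,(3t^2 - 1)\, dt
= s\, \bigl[ t^3 - t \bigr]_{-1}^{1} + 2\, \Bigl[ \tfrac{3}{4} t^4 - \tfrac{1}{2} t^2 \Bigr]_{-1}^{1}
= 0 + 0 = 0,
\]

where the first term vanishes because t^3 - t is zero at both endpoints, and the second because its antiderivative takes the same value at t = -1 and t = 1.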

Unfortunately, numerical computation of an annihilator may occasionally be a tricky business due to the discretization errors that arise when turning the integral equation into a matrix problem (Chapter 3) and due to rounding errors on the computer. Exercise 4.9 in Chapter 4 illustrates this aspect.

The second kind of ambiguity can be referred to as formulation ambiguity; it has another cause than the null-space ambiguity, but the two types are occasionally mixed up. The formulation ambiguity typically arises in applications where two different integral formulations, with two different kernels, lead to the same right-hand side function g associated with a physical quantity. A classical example comes from potential theory, where we consider the field g outside a 3D domain \Omega. Then Green's third identity states that any field g outside \Omega can be produced by both a source distribution inside \Omega and an infinitely thin layer of sources on the surface of \Omega. Which formulation to prefer depends on which model best describes the physical problem under consideration.

How to deal with a formulation ambiguity is thus, strictly speaking, purely a modeling issue and not a computational issue, as in the case of a null-space ambiguity. Unfortunately, as we shall see in Section 7.9, discretized problems and their regularized solutions may show a behavior that clearly resembles a formulation ambiguity. For this reason, it is important to be aware of both kinds of ambiguity.


    2.5 Spectral Properties of the Singular Functions

This is the only section of the book that requires knowledge about functional analysis. The section summarizes the analysis from [38], which provides a spectral characterization of the singular functions and the Fourier functions e^{\imath k s}/\sqrt{2\pi}, where \imath denotes the imaginary unit. If you decide to skip this section, you should still pay attention to the spectral characterization as stated in (2.14), because this characterization is important for the analysis and algorithms developed throughout the book.

In contrast to existing asymptotic results, the point of this analysis is to describe behavior that is essentially nonasymptotic, since this is what is always captured upon discretization of a problem. The material is not a rigorous analysis; rather, it uses some tools from functional analysis to gain insight into nonasymptotic relations between the Fourier and singular functions of the integral operator. We define the integral operator \mathcal{K} such that

\[
[\mathcal{K} f](s) = \int_{-\pi}^{\pi} K(s, t)\, f(t)\, dt,
\]

and we make the following assumptions.

1. We assume that the kernel K is real and C^1([-\pi, \pi] \times [-\pi, \pi]) (this can be weakened to piecewise C^1, carrying out the argument below on each piece), and that it is nondegenerate (i.e., it has infinitely many nonzero singular values).

2. For simplicity, we also assume that \|K(\pi, \cdot) - K(-\pi, \cdot)\|_2 = 0; this assumption can be weakened so that this quantity is of the order of the largest singular value of the integral operator.

Since the operator \mathcal{K} is square integrable, it has left and right singular functions u_j(s) and v_j(t) and singular values \sigma_j > 0, such that

\[
[\mathcal{K} v_j](s) = \sigma_j\, u_j(s), \qquad [\mathcal{K}^* u_j](t) = \sigma_j\, v_j(t), \qquad j = 1, 2, \ldots.
\]

Now, define the infinite matrix B with rows indexed from k = -\infty, \ldots, \infty and columns indexed by j = 1, \ldots, \infty, and with entries

\[
B_{k,j} = \left\langle u_j, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle. \tag{2.13}
\]

A similar analysis with v_j replacing u_j is omitted. We want to show that the largest entries in the matrix B have the generic shape <, i.e., that of a V lying on the side with the tip of the V located at indices k = 0, j = 1.

This means, for example, that the function u_1 is well represented by just the smallest values of k (i.e., the lowest frequencies). Moreover, for larger values of j the function u_j is well represented by just a small number of the Fourier functions e^{\imath k s}/\sqrt{2\pi} for some |k| in a band of contiguous integers depending on j. This leads to the following general characterization (see Figure 2.5 for an example):

Spectral Characterization of the Singular Functions. The singular functions are similar to the Fourier functions in the sense that large singular values (for small j) and their corresponding singular functions correspond to low frequencies, while small singular values (for larger j) correspond to high frequencies. (2.14)


In order to show the above result, we first derive an estimate for the elements in the matrix B. We have, for k \ne 0,

\[
\left\langle u_j, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle
= \frac{1}{\sigma_j} \left\langle \mathcal{K} v_j, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle
= \frac{1}{\sigma_j} \left\langle v_j, \mathcal{K}^* \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle
= \frac{1}{\sigma_j} \int_{-\pi}^{\pi} v_j(t) \int_{-\pi}^{\pi} K(s, t)\, \frac{e^{\imath k s}}{\sqrt{2\pi}}\, ds\, dt
\]
\[
= \frac{1}{\imath k\, \sigma_j} \int_{-\pi}^{\pi} v_j(t) \left( (-1)^{k+1}\, \frac{K(\pi, t) - K(-\pi, t)}{\sqrt{2\pi}}
+ \int_{-\pi}^{\pi} \frac{\partial K}{\partial s}\, \frac{e^{\imath k s}}{\sqrt{2\pi}}\, ds \right) dt,
\]

where the last line is obtained through integration by parts. Therefore, using the second assumption,

\[
\left| \left\langle u_j, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle \right|
\le \frac{1}{|k|\, \sigma_j} \left\| \left\langle \frac{\partial K}{\partial s}, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle \right\|_2. \tag{2.15}
\]

It is instructive to show that the term in the norm on the right-hand side is roughly bounded above by \sigma_1. By definition of induced operator norms, we know that

\[
\sigma_1 = \max_{\|f\|_2 = 1} \|\mathcal{K} f\|_2,
\]

and so in particular for k = \pm 1 we have \sigma_1 \ge \|\mathcal{K}\, e^{\pm\imath s}/\sqrt{2\pi}\|_2. Using integration by parts again, we find

\[
\sigma_1 \ge \left\| \left\langle \frac{\partial K}{\partial s}, \frac{e^{\imath s}}{\sqrt{2\pi}} \right\rangle \right\|_2.
\]

From the Riemann–Lebesgue lemma (see, e.g., Theorem 12.5C in [19]) it follows that

\[
\left\| \left\langle \frac{\partial K}{\partial s}, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle \right\|_2 \to 0
\quad \text{for} \quad k \to \pm\infty.
\]

Therefore, we also expect for |k| > 1 that

\[
\sigma_1 > \left\| \left\langle \frac{\partial K}{\partial s}, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle \right\|_2,
\]

so that (2.15) leads to the bound

\[
\left| \left\langle u_j, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle \right| \le \frac{\sigma_1}{|k|\, \sigma_j}, \tag{2.16}
\]

and the columns of B have unit norm,

\[
\sum_{k=-\infty}^{\infty} |B_{k,j}|^2
= \sum_{k=-\infty}^{\infty} \left| \left\langle u_j, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle \right|^2 = 1, \tag{2.17}
\]

Figure 2.8. Characterization of the singular functions. The three figures show sections of the infinite matrix B (2.13) for j = 1, \ldots, 100 and k = -49, \ldots, 50 for the three test problems deriv2, phillips, and gravity.

by Parseval's equality, since the functions e^{\imath k s}/\sqrt{2\pi} form a basis for L^2([-\pi, \pi]) and u_j \in L^2([-\pi, \pi]). Note that by orthonormality and Cauchy–Schwarz, |\langle u_j, e^{\imath k s}/\sqrt{2\pi} \rangle| \le 1. Also, since the u_j form an orthonormal set in L^2([-\pi, \pi]), we have Bessel's inequality

\[
\sum_{j=1}^{\infty} |B_{k,j}|^2
= \sum_{j=1}^{\infty} \left| \left\langle u_j, \frac{e^{\imath k s}}{\sqrt{2\pi}} \right\rangle \right|^2 \le 1, \tag{2.18}
\]

showing that for the elements in each row in B, the sum of squares is finite.

Consider first j = 1. Because of (2.15) and (2.16), the terms closest to 1 in magnitude in the first column of B occur only for k very close to k = 0. A little calculus on (2.17), using (2.16), shows that for any integer \ell > 0,

\[
1 - \sum_{k=-\ell}^{\ell} |B_{k,1}|^2 \le \frac{2}{\ell},
\]

so the significant entries in the first column occur only for small |k|. For j > 1, inequality (2.16) shows that if |k| is large compared with \sigma_1/\sigma_j, then the entries B_{k,j} will be small. But inequality (2.16) does not tell us about the behavior for k close to zero, since \sigma_1/\sigma_j \ge 1. At this point, we know that the largest entries occur for indices |k| in a band around the origin whose width is governed by \sigma_1/\sigma_j. However, large entries in the jth column cannot occur for rows where |k| < j because of (2.18); that is, entries large in magnitude have already occurred in those rows in previous columns, and too many large entries in a row would make the sum larger than one. Therefore, the band of indices for which large entries can occur moves out from the origin as j increases, and we observe the pattern of a tilted V shape, as desired.

An analogous argument shows that the matrix involving inner products with v_j and the Fourier functions has a similar pattern.

Figure 2.8 shows sections of the infinite matrix B (2.13) for j = 1, \ldots, 100 and k = -49, \ldots, 50 for three test problems (to be introduced later). The computations


were done using techniques from Chapter 3, and both discretization effects and finite-precision effects are visible. The left plot, with the perfect tilted V shape, confirms that the singular functions of the second derivative problem (see Exercise 2.3) are indeed trigonometric functions. The middle and right plots show the same overall behavior: each singular function u_j is dominated by very few Fourier functions with |k| \approx j/2. In the rightmost plot, the results for j > 60 are dominated by rounding errors, but note that the computed singular functions contain solely high-frequency Fourier modes.
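A rough numerical version of this experiment, for the gravity test problem on [0, 1] (a sketch assuming NumPy; the midpoint discretization anticipates Chapter 3):

```python
# Dominant Fourier frequency of each left singular vector u_j; the frequency
# index grows with j, tracing the tilted V of Figure 2.8. Assumes NumPy.
import numpy as np

n, d = 100, 0.25
t = (np.arange(n) + 0.5) / n
A = (1.0 / n) * d / (d**2 + (t[:, None] - t[None, :])**2)**1.5

U, sigma, VT = np.linalg.svd(A)
Bhat = np.abs(np.fft.rfft(U, axis=0))       # |<u_j, Fourier mode k>|, roughly

for j in (1, 5, 10, 20, 40):
    print(f"j={j:2d}  dominant k={np.argmax(Bhat[:, j-1])}")
```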

    2.6 The Story So Far

In this chapter we saw how the first-kind Fredholm integral equation (2.2) provides a linear model for the relationship between the source and the data, as motivated by a gravity surveying problem in geophysics. Through the Riemann–Lebesgue lemma we showed that for such models the solution can be arbitrarily sensitive to perturbations of the data, and we gave an example where a square integrable solution does not exist.

We then introduced the singular value expansion (SVE) as a powerful mathematical tool for analysis of first-kind Fredholm integral equations, and we used the SVE to discuss the existence and the instability of the solution. An important outcome of this analysis is the Picard condition (2.9) for existence of a square integrable solution. We also briefly discussed the ambiguity problem associated with zero singular values \sigma_i of first-kind Fredholm integral equations.

Finally, we gave a rigorous discussion of the spectral properties of the singular functions u_i and v_i and showed that these functions provide a spectral basis with an increasing number of oscillations as the index i increases. This result imbues the rest of the discussion and analysis in this book.

The main conclusion of our analysis in this chapter is that the right-hand side g(s) must be sufficiently smooth (as measured by its SVE coefficients) and that small perturbations in the right-hand side are likely to produce solutions that are highly contaminated by undesired high-frequency oscillations.

    Exercises

2.1. Derivation of SVE Solution Expression

Give the details of the derivation of the expression (2.8) for the solution f(t) to the integral equation (2.2).

2.2. The SVE of a Degenerate Kernel

The purpose of this exercise is to illustrate how the singular value expansion (SVE) can give valuable insight into the problem. We consider the simple integral equation

\[
\int_{-1}^{1} (s + 2t)\, f(t)\, dt = g(s), \qquad -1 \le s \le 1; \tag{2.19}
\]


i.e., the kernel is given by K(s, t) = s + 2t.

This kernel is chosen solely for its simplicity. Show (e.g., by insertion) that the SVE of the kernel K is given by

\[
\sigma_1 = 4/\sqrt{3}, \qquad \sigma_2 = 2/\sqrt{3}, \qquad \sigma_3 = \sigma_4 = \cdots = 0
\]

and

\[
u_1(s) = 1/\sqrt{2}, \quad u_2(s) = \sqrt{3/2}\, s, \quad
v_1(t) = \sqrt{3/2}\, t, \quad v_2(t) = 1/\sqrt{2}.
\]

Show that this integral equation has a solution only if the right-hand side is a linear function, i.e., it has the form g(s) = \alpha s + \beta, where \alpha and \beta are constants. Hint: evaluate \int_{-1}^{1} K(s, t)\, f(t)\, dt for an arbitrary function f(t).

2.3. The SVE of the Second Derivative Problem

We can occasionally calculate the SVE analytically. Consider the first-kind Fredholm integral equation \int_0^1 K(s, t)\, f(t)\, dt = g(s), 0 \le s \le 1, with the kernel

\[
K(s, t) = \begin{cases} s\,(t - 1), & s < t, \\ t\,(s - 1), & s \ge t. \end{cases} \tag{2.20}
\]

This equation is associated with the computation of the second derivative of a function: given a function g, the solution f is the second derivative of g, i.e., f(t) = g''(t) for 0 \le t \le 1. It is available in Regularization Tools as the test problem deriv2. An alternative expression for the kernel is given by

\[
K(s, t) = -2 \sum_{i=1}^{\infty} \frac{\sin(i \pi s)\, \sin(i \pi t)}{(i \pi)^2}.
\]

Use this relation to derive the expressions

\[
\sigma_i = \frac{1}{(i \pi)^2}, \qquad
u_i(s) = \sqrt{2}\, \sin(i \pi s), \qquad
v_i(t) = -\sqrt{2}\, \sin(i \pi t), \qquad i = 1, 2, \ldots,
\]

for the singular values and singular functions. Verify that the singular functions are orthonormal.

2.4. The Picard Condition

We continue the study of the second-derivative problem from the previous exercise, for which we know the SVE analytically. Consider the two right-hand sides and their Fourier series:

\[
g_{\mathrm{linear}}(s) = s
= \frac{2}{\pi} \sum_{i=1}^{\infty} (-1)^{i+1}\, \frac{\sin(i \pi s)}{i},
\]
\[
g_{\mathrm{cubic}}(s) = s\,(1 + s)\,(1 - s)
= \frac{12}{\pi^3} \sum_{i=1}^{\infty} (-1)^{i+1}\, \frac{\sin(i \pi s)}{i^3}.
\]

Which of these two functions satisfies the Picard condition?


Chapter 3

Getting to Business: Discretizations of Linear Inverse Problems

After our encounter with some of the fundamental theoretical aspects of the first-kind Fredholm integral equation, it is now time to study how we can turn this beast into something that can be represented and manipulated on a computer. We will discuss basic aspects of discretization methods, and we will introduce the matrix version of the SVE, namely, the singular value decomposition (SVD). We finish with a brief look at noisy discrete problems.

All the matrix problems that we will encounter in this book will involve an m \times n matrix A, a right-hand side b with m elements, and a solution vector x of length n. When m = n the matrix is square, and the problem takes the form of a system of linear equations,

\[
A\, x = b, \qquad A \in \mathbb{R}^{n \times n}, \quad x, b \in \mathbb{R}^{n}.
\]

When m > n the system is overdetermined, and our problem takes the form of a linear least squares problem,

\[
\min_x \|A\, x - b\|_2, \qquad A \in \mathbb{R}^{m \times n}, \quad x \in \mathbb{R}^{n}, \quad b \in \mathbb{R}^{m}.
\]

We will not cover the case of underdetermined systems (when m < n), mainly because of technicalities in the presentation and in the methods.

    3.1 Quadrature and Expansion Methods

In order to handle continuous problems, such as integral equations, on a computer (which is designed to work with numbers) we must replace the problem of computing the unknown function f in the integral equation (2.2) with a discrete and finite-dimensional problem that we can solve on the computer. Since the underlying integral equation is assumed to be linear, we want to arrive at a system of linear algebraic equations, which we shall refer to as the discrete inverse problem. There are two basically different approaches to do this, and we summarize both methods below; a rigorous discussion of numerical methods for integral equations can be found in [4].


    3.1.1 Quadrature Methods

These methods compute approximations \tilde{f}_j to the solution f solely at selected abscissas t_1, t_2, \ldots, t_n, i.e.,

\[
\tilde{f}_j = \tilde{f}(t_j), \qquad j = 1, 2, \ldots, n.
\]

Notice the tilde: we compute sampled values of some function \tilde{f} that approximates the exact solution f, but we do not obtain samples of f itself (unless it happens to be perfectly represented by the simple function underlying the quadrature rule).

Quadrature methods, also called Nyström methods, take their basis in the general quadrature rule of the form

\[
\int_0^1 \varphi(t)\, dt = \sum_{j=1}^{n} \omega_j\, \varphi(t_j) + E_n,
\]

where \varphi is the function whose integral we want to evaluate, E_n is the quadrature error, t_1, t_2, \ldots, t_n are the abscissas for the quadrature rule, and \omega_1, \omega_2, \ldots, \omega_n are the corresponding weights. For example, for the midpoint rule in the interval [0, 1] we have

\[
t_j = \frac{j - \frac{1}{2}}{n}, \qquad \omega_j = \frac{1}{n}, \qquad j = 1, 2, \ldots, n. \tag{3.1}
\]

The crucial step in the derivation of a quadrature method is to apply this rule formally to the integral, pretending that we know the values f(t_j). Notice that the result is still a function of s:

\[
\psi(s) = \int_0^1 K(s, t)\, f(t)\, dt = \sum_{j=1}^{n} \omega_j\, K(s, t_j)\, f(t_j) + E_n(s);
\]

here the error term E_n is a function of s.

We now enforce the collocation requirement that the function \psi must be equal to the right-hand side g at n selected points:

\[
\psi(s_i) = g(s_i), \qquad i = 1, \ldots, n,
\]

where g(s_i) are sampled or measured values of the function g. This leads to the following system of relations:

\[
\sum_{j=1}^{n} \omega_j\, K(s_i, t_j)\, f(t_j) = g(s_i) - E_n(s_i), \qquad i = 1, \ldots, n.
\]

Finally we must, of course, neglect the unknown error term E_n(s_i) in each relation, and in doing so we introduce an error in the computed solution. Therefore, we must replace the exact values f(t_j) with approximate values, denoted by \tilde{f}_j, and we arrive at the relations

\[
\sum_{j=1}^{n} \omega_j\, K(s_i, t_j)\, \tilde{f}_j = g(s_i), \qquad i = 1, \ldots, n. \tag{3.2}
\]

We note in passing that we could have used m > n collocation points, in which case we would obtain an overdetermined system. In this presentation, for simplicity we always use m = n.


The relations in (3.2) are simply a system of linear equations in the unknowns

\[
\tilde{f}_j = \tilde{f}(t_j), \qquad j = 1, 2, \ldots, n.
\]

To see this, we can write the n equations in (3.2) in the form

\[
\begin{pmatrix}
\omega_1 K(s_1, t_1) & \omega_2 K(s_1, t_2) & \cdots & \omega_n K(s_1, t_n) \\
\omega_1 K(s_2, t_1) & \omega_2 K(s_2, t_2) & \cdots & \omega_n K(s_2, t_n) \\
\vdots & \vdots & & \vdots \\
\omega_1 K(s_n, t_1) & \omega_2 K(s_n, t_2) & \cdots & \omega_n K(s_n, t_n)
\end{pmatrix}
\begin{pmatrix} \tilde{f}_1 \\ \tilde{f}_2 \\ \vdots \\ \tilde{f}_n \end{pmatrix}
=
\begin{pmatrix} g(s_1) \\ g(s_2) \\ \vdots \\ g(s_n) \end{pmatrix}
\]

or simply A\, x = b, where A is an n \times n matrix. The elements of the matrix A, the right-hand side b, and the solution vector x are given by

\[
a_{ij} = \omega_j\, K(s_i, t_j), \qquad
x_j = \tilde{f}(t_j), \qquad
b_i = g(s_i), \qquad i, j = 1, \ldots, n. \tag{3.3}
\]
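Translated into code, (3.1)–(3.3) for the gravity kernel (2.1) might look as follows (a sketch assuming NumPy):

```python
# Midpoint-rule discretization (3.1)-(3.3) of the gravity problem; a sketch
# assuming NumPy.
import numpy as np

def gravity_matrix(n, d=0.25):
    t = (np.arange(n) + 0.5) / n            # abscissas t_j, eq. (3.1)
    s = t                                   # collocation points s_i = t_i
    w = 1.0 / n                             # weights omega_j
    return w * d / (d**2 + (s[:, None] - t[None, :])**2)**1.5   # a_ij, (3.3)

n = 64
A = gravity_matrix(n)
x = 1.0 + (np.arange(n) + 0.5) / n          # some solution samples f(t_j)
b = A @ x                                   # right-hand side b_i = g(s_i)
```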

    3.1.2 Expansion Methods

These methods compute an approximation of the form

\[
f^{(n)}(t) = \sum_{j=1}^{n} \zeta_j\, \phi_j(t), \tag{3.4}
\]

where \phi_1(t), \ldots, \phi_n(t) are expansion functions or basis functions chosen by the user. They should be chosen in such a way that they provide a good description of the solution; but no matter how good they are, we still compute an approximation f^{(n)} living in the span of these expansion functions.

Expansion methods come in several flavors, and we will focus on the Galerkin (or Petrov–Galerkin) method, in which we must choose two sets of functions \phi_j and \psi_i for the solution and the right-hand side, respectively. Then we can write

\[
f(t) = f^{(n)}(t) + E_f(t), \qquad f^{(n)} \in \mathrm{span}\{\phi_1, \ldots, \phi_n\},
\]
\[
g(s) = g^{(n)}(s) + E_g(s), \qquad g^{(n)} \in \mathrm{span}\{\psi_1, \ldots, \psi_n\},
\]

where E_f and E_g are the expansion errors for f and g due to the use of a finite set of basis vectors. To be more specific, we write the approximate solution f^{(n)} in the form (3.4), and we want to determine the expansion coefficients \zeta_j such that f^{(n)} is an approximate solution to the integral equation. We therefore introduce the function

\[
\vartheta(s) = \int_0^1 K(s, t)\, f^{(n)}(t)\, dt
= \sum_{j=1}^{n} \zeta_j \int_0^1 K(s, t)\, \phi_j(t)\, dt
\]

and, similarly to f and g, we write this function in the form

\[
\vartheta(s) = \vartheta^{(n)}(s) + E_\vartheta(s), \qquad \vartheta^{(n)} \in \mathrm{span}\{\psi_1, \ldots, \psi_n\}.
\]


How do we choose the functions \phi_j and \psi_i? The basis functions \phi_j for the solution should preferably reflect the knowledge we may have about the solution f (such as smoothness, monotonicity, periodicity, etc.). Similarly, the basis functions \psi_i should capture the main information in the right-hand side. Sometimes, for convenience, the two sets of functions are chosen to be identical.

The key step in the Galerkin method is to recognize that, in general, the function \vartheta is not identical to g, nor does \vartheta lie in the subspace \mathrm{span}\{\psi_1, \ldots, \psi_n\} in which g^{(n)} lies (leading to the error E_\vartheta). Let \vartheta^{(n)} and g^{(n)} in the above equations be the orthogonal projections of \vartheta and g on the subspace \mathrm{span}\{\psi_1, \ldots, \psi_n\} (this ensures that they are unique). The Galerkin approach is then to determine the expansion coefficients \zeta_j such that the two functions \vartheta^{(n)} and g^{(n)} are identical:

\[
\vartheta^{(n)}(s) = g^{(n)}(s)
\;\Longleftrightarrow\;
\vartheta(s) - E_\vartheta(s) = g(s) - E_g(s)
\;\Longleftrightarrow\;
\vartheta(s) - g(s) = E_\vartheta(s) - E_g(s).
\]

Here, we recognize the function \vartheta(s) - g(s) as the residual. Since both functions E_\vartheta(s) and E_g(s) are orthogonal to the subspace \mathrm{span}\{\psi_1, \ldots, \psi_n\}, the Galerkin condition can be stated as the requirement that the residual \vartheta(s) - g(s) is orthogonal to each of the functions \psi_1, \ldots, \psi_n.

In order to arrive at a system of equations for computing the unknowns \zeta_1, \ldots, \zeta_n, we now enforce the orthogonality of the residual and the \psi_i-functions: That is, we require that

\[
\langle \psi_i, \vartheta - g \rangle = 0 \qquad \text{for} \quad i = 1, \ldots, n.
\]

This is equivalent to requiring that

\[
\langle \psi_i, g \rangle = \langle \psi_i, \vartheta \rangle
= \left\langle \psi_i, \int_0^1 K(s, t)\, f^{(n)}(t)\, dt \right\rangle, \qquad i = 1, \ldots, n.
\]

Inserting the expansion for f^{(n)}, we obtain the relation

\[
\langle \psi_i, g \rangle
= \sum_{j=1}^n \zeta_j \Bigl\langle \psi_i,\, \int_0^1 K(s,t)\, \phi_j(t)\, dt \Bigr\rangle,
\qquad i = 1, \ldots, n,
\]

which, once again, is just an n × n system of linear algebraic equations A x = b whose solution coefficients are the unknown expansion coefficients, i.e., x_i = ζ_i for i = 1, ..., n, and the elements of A and b are given by

\[
a_{ij} = \int_0^1 \!\! \int_0^1 \psi_i(s)\, K(s,t)\, \phi_j(t)\, ds\, dt, \tag{3.5}
\]
\[
b_i = \int_0^1 \psi_i(s)\, g(s)\, ds. \tag{3.6}
\]

If these integrals are too complicated to be expressed in closed form, they must be evaluated numerically (by means of a suitable quadrature formula). Notice that if K is symmetric and φ_i = ψ_i, then the matrix A is symmetric (and the approach is called the Rayleigh–Ritz method).

[Figure 3.1. Three different ways to plot the solution computed by means of a quadrature method (using the test problem shaw from Exercise 3.6). In principle, only the left and middle plots are correct.]

To illustrate the expansion method, let us consider a very simple choice of orthonormal basis functions, namely, the "top hat" functions (or scaled indicator functions) on an equidistant grid with interspacing h = 1/n:

\[
\chi_i(t) = \begin{cases} h^{-1/2}, & t \in [\,(i-1)h,\; ih\,], \\ 0, & \text{elsewhere}, \end{cases}
\qquad i = 1, 2, \ldots, n. \tag{3.7}
\]
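In MATLAB, (3.7) can be evaluated with a one-line anonymous function; this trivial sketch is included only for concreteness:

    % Top hat function chi_i on [0,1] with grid spacing h = 1/n, cf. (3.7);
    % t may be a vector of abscissas.
    chi = @(i,t,h) (t >= (i-1)*h & t <= i*h)/sqrt(h);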

These functions are clearly orthogonal (because their supports do not overlap), and they are scaled to have unit 2-norm. Hence, by choosing

\[
\phi_i(t) = \chi_i(t) \quad \text{and} \quad \psi_i(s) = \chi_i(s) \quad \text{for} \quad i = 1, 2, \ldots, n,
\]

we obtain two sets of orthonormal basis functions for the Galerkin method. The matrix and right-hand side elements are then given by

\[
a_{ij} = h^{-1} \int_{(i-1)h}^{ih} \int_{(j-1)h}^{jh} K(s,t)\, ds\, dt,
\qquad
b_i = h^{-1/2} \int_{(i-1)h}^{ih} g(s)\, ds.
\]

    Again, it may be necessary to use quadrature rules to evaluate these integrals.
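As a sketch of how these Galerkin integrals can be evaluated numerically for the top hat basis, the loop below uses MATLAB's integral2 and integral; the kernel and right-hand side are placeholders chosen only to make the code run.

    % Galerkin matrix and right-hand side for the top hat basis on [0,1].
    n = 32;  h = 1/n;
    K = @(s,t) exp(s.*t);            % assumed kernel (placeholder)
    g = @(s) exp(s);                 % assumed right-hand side (placeholder)
    A = zeros(n);  b = zeros(n,1);
    for i = 1:n
        for j = 1:n
            A(i,j) = integral2(K,(i-1)*h,i*h,(j-1)*h,j*h)/h;
        end
        b(i) = integral(g,(i-1)*h,i*h)/sqrt(h);
    end
    % One-point shortcut for smooth kernels: A(i,j) ~ h*K((i-.5)*h,(j-.5)*h).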

    3.1.3 Which Method to Choose?

This is a natural question for a given application. The answer depends on what is needed, what is given, and what can be computed.

Quadrature methods are simpler to use and implement than the expansion methods, because the matrix elements a_ij are merely scaled samples of the kernel K at distinct points. Also, the particular quadrature method can be chosen to reflect the properties of the functions K and f; for example, we can take into account whether K is singular. Similarly, if the integration interval is infinite, then we can choose a quadrature method that takes this into account.

One important issue to remember is that we compute approximate function values f̃_j only at the quadrature points; we have no information about the solution between these points. Hence, one can argue that the solutions computed by quadrature methods should be plotted as shown in the left or middle plots of Figure 3.1; but as n increases the difference is marginal.

[Figure 3.2. Three different ways to plot the solution computed by means of an expansion method using top hat basis functions (using the test problem baart). In principle, only the left plot is correct. The right plot is wrong for two reasons: the scaling by h^{-1/2} is missing, and the abscissa values are not located symmetrically in the t-interval [0, π].]

Expansion methods can be more difficult to derive and implement, because the matrix elements are double integrals, and sometimes numerical quadrature or other approximations must be used. The advantage, on the other hand, is that if the basis functions are orthonormal, then we have a well-understood relationship between the SVE and the SVD of the matrix. Another advantage is that the basis functions can be tailored to the application (e.g., we can use spline functions, thin plate smoothing splines, etc.).

Expansion methods produce an approximate solution that is known everywhere on the t-interval, because f^{(n)} is expressed in terms of the basis functions φ_j(t). Thus, one can argue that plots of the solution should always reflect this (see Figure 3.2 for an example using the top hat basis functions); again, as n increases the difference is marginal. What is important, however, is that any scaling in the basis functions should be incorporated when plotting the solution.

In applications, information about the right-hand side g is almost always available solely in the form of (noisy) samples b_i = g(s_i) of this function. This fits very well with the quadrature methods using collocation in the sampling points s_i. For Galerkin methods there are two natural choices of the basis functions ψ_i for the right-hand side. The choice of delta functions ψ_i(s) = δ(s − s_i) located at the sampling points represents a perfect sampling of g because, by definition,

\[
\int_0^1 \delta(s - s_i)\, g(s)\, ds = g(s_i).
\]

The choice of the top-hat functions ψ_i(s) = χ_i(s) in (3.7) corresponds better to how real data are collected, namely, by basically integrating a signal over a short interval.

    3.2 The Singular Value Decomposition

From the discussion in the previous chapter it should be clear that the SVE is a very powerful tool for analyzing first-kind Fredholm integral equations. In the discrete setting, with finite-dimensional matrices, there is a similarly powerful tool known as the singular value decomposition (SVD). See, e.g., [5] or [67] for introductions to the SVD.

While the matrices that we encountered in the previous section are square, we have already foreshadowed that we will treat the more general case of rectangular matrices. Hence, we assume that the matrix is either square or has more rows than columns. Then, for any matrix A ∈ R^{m×n} with m ≥ n, the SVD takes the form

\[
A = U\,\Sigma\,V^T = \sum_{i=1}^n u_i\, \sigma_i\, v_i^T. \tag{3.8}
\]

Here, Σ ∈ R^{n×n} is a diagonal matrix with the singular values, satisfying

\[
\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n), \qquad
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0.
\]

Two important matrix norms are easily expressed in terms of the singular values:

\[
\|A\|_F \equiv \Bigl( \sum_{i=1}^m \sum_{j=1}^n a_{ij}^2 \Bigr)^{1/2}
= \bigl( \mathrm{trace}(A^T A) \bigr)^{1/2}
= \bigl( \mathrm{trace}(V \Sigma^2 V^T) \bigr)^{1/2}
= \bigl( \mathrm{trace}(\Sigma^2) \bigr)^{1/2}
= \bigl( \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 \bigr)^{1/2},
\]
\[
\|A\|_2 \equiv \max_{\|x\|_2 = 1} \|A\,x\|_2 = \sigma_1.
\]

(The derivation of the latter expression is omitted here.) The matrices U ∈ R^{m×n} and V ∈ R^{n×n} consist of the left and right singular vectors

\[
U = (u_1, \ldots, u_n), \qquad V = (v_1, \ldots, v_n),
\]

and both matrices have orthonormal columns:

\[
u_i^T u_j = v_i^T v_j = \delta_{ij}, \qquad i, j = 1, \ldots, n,
\]

or simply U^T U = V^T V = I.

It is straightforward to show that the inverse of A (if it exists) is given by

\[
A^{-1} = V\, \Sigma^{-1} U^T.
\]

Thus we have ‖A^{-1}‖_2 = 1/σ_n, and it follows immediately that the 2-norm condition number of A is given by

\[
\mathrm{cond}(A) = \|A\|_2\, \|A^{-1}\|_2 = \sigma_1/\sigma_n.
\]

The same expression holds if A is rectangular and has full rank.
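These relations are easy to verify numerically; a minimal sketch using an arbitrary random matrix:

    % Check ||A||_F = (sigma_1^2+...+sigma_n^2)^(1/2), ||A||_2 = sigma_1,
    % and cond(A) = sigma_1/sigma_n on a random full-rank matrix.
    A  = randn(8,5);
    sv = svd(A);                        % singular values, descending
    disp(norm(A,'fro') - norm(sv))      % ~ 0 up to rounding
    disp(norm(A,2)     - sv(1))         % ~ 0 up to rounding
    disp(cond(A)       - sv(1)/sv(end)) % ~ 0 up to rounding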

Software for computing the SVD is included in all serious program libraries, and Table 3.1 summarizes some of the most important implementations. The complexity of all these SVD algorithms is O(m n²) flops, as long as m ≥ n. There is also software, based on Krylov subspace methods, for computing selected singular values of large sparse or structured matrices; for example, MATLAB's svds function belongs in this class.


Table 3.1. Some available software for computing the SVD.

    Software package     Subroutine
    -----------------------------------------
    ACM TOMS             HYBSVD
    EISPACK              SVD
    GNU Sci. Lib.        gsl_linalg_SV_decomp
    IMSL                 LSVRR
    LAPACK               _GESVD
    LINPACK              _SVDC
    NAG                  F02WEF
    Numerical Recipes    SVDCMP
    MATLAB               svd

    3.2.1 The Role of the SVD

The singular values and vectors obey a number of relations which are similar to the SVE. Perhaps the most important relations are

\[
A\,v_i = \sigma_i\, u_i, \qquad \|A\,v_i\|_2 = \sigma_i, \qquad i = 1, \ldots, n, \tag{3.9}
\]

and, if A is square and nonsingular,

\[
A^{-1} u_i = \sigma_i^{-1} v_i, \qquad \|A^{-1} u_i\|_2 = \sigma_i^{-1}, \qquad i = 1, \ldots, n. \tag{3.10}
\]

We can use some of these relations to derive an expression for the solution x = A^{-1} b. First note that since the matrix V is orthogonal, we can always write the vector x in the form

\[
x = V\,V^T x = V \begin{pmatrix} v_1^T x \\ \vdots \\ v_n^T x \end{pmatrix}
= \sum_{i=1}^n (v_i^T x)\, v_i,
\]

and similarly for b we have

\[
b = \sum_{i=1}^n (u_i^T b)\, u_i.
\]

When we use the expression for x, together with the SVD, we obtain

\[
A\,x = \sum_{i=1}^n \sigma_i\, (v_i^T x)\, u_i.
\]

By equating the expressions for A x and b, and comparing the coefficients in the expansions, we immediately see that the naive solution is given by

\[
x = A^{-1} b = \sum_{i=1}^n \frac{u_i^T b}{\sigma_i}\, v_i. \tag{3.11}
\]

This expression can also be derived via the SVD relation for A^{-1}, but the above derivation parallels that of the SVE. For the same reasons as in the previous chapter (small singular values σ_i and noisy coefficients u_i^T b), we do not want to compute this solution to discretizations of ill-posed problems.
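A small experiment makes the point; the sketch below assumes that the shaw test problem from Regularization Tools is on the path (any discretized ill-posed problem would do).

    % The naive SVD solution (3.11) for a discretized ill-posed problem.
    [A,b,x] = shaw(32);             % test problem (assumed available)
    [U,S,V] = svd(A);
    x_naive = V*((U'*b)./diag(S));  % x = sum_i (u_i'*b/sigma_i)*v_i
    disp(norm(x_naive - x))         % typically enormous: rounding errors
                                    % alone destroy the naive solution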

While the SVD is now a standard tool in signal processing and many other fields of applied mathematics, and one which many students will encounter during their studies, it is interesting to note that the SVD was developed much later than the SVE; cf. [63].

    3.2.2 Symmetries

The kernel in the integral equation (2.2) is symmetric if K(s,t) = K(t,s). Whether the corresponding matrix A inherits this symmetry, such that A^T = A, depends on the discretization method used to obtain A; but this is recommended because we want the linear system of equations to reflect the integral equation.

Symmetric matrices have real eigenvalues λ_i and orthonormal eigenvectors: A = U diag(λ_1, ..., λ_n) U^T with U orthogonal. It follows immediately that the SVD of A is related to the eigenvalue decomposition as follows: U is the matrix of eigenvectors, and

\[
\Sigma = \mathrm{diag}(|\lambda_1|, \ldots, |\lambda_n|), \qquad
V = \bigl( \mathrm{sign}(\lambda_1)\, u_1, \ldots, \mathrm{sign}(\lambda_n)\, u_n \bigr).
\]

In particular, v_i = sign(λ_i) u_i for i = 1, ..., n.

Another kind of symmetry sometimes arises: we say that a matrix is persymmetric if it is symmetric across the antidiagonal. In other words, if J is the exchange matrix, i.e., the matrix with ones on the antidiagonal and zeros elsewhere (the identity matrix with its columns in reverse order), then persymmetry is expressed as

\[
A\,J = (A\,J)^T = J\,A^T \quad \Longleftrightarrow \quad A = J\,A^T J.
\]

Persymmetry of A dictates symmetry relations between the singular vectors. To derive these relations, let the symmetric matrix A J have the eigenvalue decomposition A J = U Λ U^T, where U is orthogonal; then

\[
A = U\,\Lambda\,U^T J = U\,\Lambda\,(J\,U)^T
= \sum_{i=1}^n u_i\, |\lambda_i|\, \bigl(\mathrm{sign}(\lambda_i)\, J\,u_i\bigr)^T,
\]

and it follows that

\[
v_i = \mathrm{sign}(\lambda_i)\, J\,u_i, \qquad i = 1, \ldots, n.
\]

This means that, except perhaps for a sign change, the right singular vector v_i is identical to the left singular vector u_i with its elements in reverse order.

If A is both symmetric and persymmetric, then additional symmetries occur, because u_i = sign(λ_i) v_i and v_i = sign(λ'_i) J u_i together imply u_i = sign(λ_i) sign(λ'_i) J u_i for i = 1, ..., n, where λ_i and λ'_i denote the eigenvalues of A and A J, respectively. In words, the left and right singular vectors of a symmetric and persymmetric matrix are identical except perhaps for a sign change, and the sequence of elements in each vector is symmetric around the middle element(s) except perhaps for a sign change.
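A quick numerical check of these symmetries: any symmetric Toeplitz matrix is automatically persymmetric, so the sketch below (with an arbitrarily chosen first column) should show agreement at rounding level, provided the singular values are distinct.

    % Verify v_i = +/- J*u_i for a symmetric, persymmetric matrix.
    n = 8;
    A = toeplitz(1./(1 + (0:n-1).^2));      % arbitrary symmetric Toeplitz matrix
    [U,~,V] = svd(A);
    J = fliplr(eye(n));                     % exchange matrix
    disp(max(max(abs(abs(V) - abs(J*U)))))  % ~ 0 for distinct singular values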


[Figure 3.3. The right singular vectors v_1, ..., v_8 for the phillips test problem with n = 64. Note that v_i = J v_i for i = 1, 3, 5, 7 and v_i = −J v_i for i = 2, 4, 6, 8.]

We illustrate the above with the phillips problem from Regularization Tools, which is one of the classical test problems from the literature. It is stated as Example 1 in [58] with several misprints; the equations below are correct. Define the function

\[
\varphi(x) = \begin{cases} 1 + \cos(\pi x/3), & |x| < 3, \\ 0, & |x| \ge 3. \end{cases}
\]

Then the kernel K, the solution f, and the right-hand side g are given by

\[
K(s,t) = \varphi(s-t), \qquad f(t) = \varphi(t), \qquad
g(s) = (6 - |s|)\Bigl(1 + \tfrac{1}{2}\cos\bigl(\tfrac{\pi s}{3}\bigr)\Bigr)
+ \tfrac{9}{2\pi}\sin\Bigl(\tfrac{\pi |s|}{3}\Bigr),
\]

and both integration intervals are [−6, 6]. The kernel is symmetric and satisfies K(−s, −t) = K(s,t), so a suitable discretization yields a matrix A that is both symmetric and persymmetric; Figure 3.3 shows the resulting symmetries v_i = ± J v_i of the right singular vectors.

We conclude this discussion of the SVD with a remark about multiple singular values. When a multiple singular value occurs, say, σ_{i−1} > σ_i = ⋯ = σ_{i+ℓ} > σ_{i+ℓ+1}, the two subspaces span{u_i, ..., u_{i+ℓ}} and span{v_i, ..., v_{i+ℓ}} are uniquely determined, but the individual singular vectors are not (except that they must always satisfy A v_i = σ_i u_i).


For this reason, different computer programs for computing the SVD may give different results, and this is not a flaw in the software.

If the matrix A has more rows than columns, i.e., if m > n, then the SVD comes in two variants. The one used in (3.8) is often referred to as the thin SVD, since the matrix U ∈ R^{m×n} is rectangular. There is also a full SVD of the form

\[
A = \begin{pmatrix} U & U^\perp \end{pmatrix}
\begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T,
\]

where the left singular matrix is m × m and orthogonal, and the middle matrix is m × n. Both versions provide the same information for inverse problems.

If the matrix A has fewer rows than columns, i.e., if m < n, then the SVD takes the form

\[
A = U\,\Sigma\,V^T = \sum_{i=1}^m u_i\, \sigma_i\, v_i^T,
\]

where now U ∈ R^{m×m}, V ∈ R^{n×m}, and Σ = diag(σ_1, ..., σ_m) ∈ R^{m×m}, and cond(A) = σ_1/σ_m.
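In MATLAB the two variants are obtained as follows; for m ≥ n both calls return the same singular values.

    % Full versus thin ("economy size") SVD of a tall matrix (m > n).
    A = randn(10,4);
    [Uf,Sf,Vf] = svd(A);           % full: Uf is 10x10, Sf is 10x4
    [Ut,St,Vt] = svd(A,'econ');    % thin: Ut is 10x4,  St is 4x4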

    3.3 SVD Analysis and the Discrete Picard Condition

The linear systems obtained from discretization of first-kind Fredholm integral equations always exhibit certain characteristics in terms of the SVD, which we will discuss in this section. These characteristics are closely connected to the properties of the integral equation, as expressed through the SVE (see Section 2.3). If you did not read that section, please skip the first part of this section.

The close relationship between the SVD and the SVE allows us to use the SVD, together with the Galerkin discretization technique, to compute approximations to the SVE (see [25]). Specifically, if we compute the matrix A according to (3.5), then the singular values σ_i of A are approximations to the singular values μ_i of K. More precisely, if the Galerkin basis functions are orthonormal, if we define the positive quantity η_n such that

\[
\eta_n^2 = \|K\|_2^2 - \|A\|_F^2, \qquad
\|K\|_2^2 \equiv \int_0^1 \!\! \int_0^1 |K(s,t)|^2\, ds\, dt,
\]

and if we use σ_i^{(n)} to denote the singular values of the n × n matrix A, then it can be shown that

\[
0 \le \mu_i - \sigma_i^{(n)} \le \eta_n, \qquad i = 1, \ldots, n, \tag{3.13}
\]

and

\[
\sigma_i^{(n)} \le \sigma_i^{(n+1)} \le \mu_i, \qquad i = 1, \ldots, n. \tag{3.14}
\]

That is, the singular values of A are increasingly better approximations to those of K, and the error is bounded by the quantity η_n.

We can also compute approximations to the left and right singular functions via the SVD. Define the functions

\[
u_j^{(n)}(s) = \sum_{i=1}^n u_{ij}\, \psi_i(s), \qquad
v_j^{(n)}(t) = \sum_{i=1}^n v_{ij}\, \phi_i(t), \qquad j = 1, \ldots, n. \tag{3.15}
\]


Then these functions converge to the singular functions, i.e., u_j^{(n)}(s) → u_j(s) and v_j^{(n)}(t) → v_j(t), as n → ∞. For example, it can be shown that

\[
\max \bigl\{ \|u_1 - u_1^{(n)}\|_2,\; \|v_1 - v_1^{(n)}\|_2 \bigr\}
\le \Bigl( \frac{2\,\eta_n}{\mu_1 - \mu_2} \Bigr)^{1/2}.
\]

Note that this bound depends on the separation between the first and the second singular values. Similar, but more complicated, bounds exist for the other singular functions.
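The monotone convergence in (3.13)-(3.14) is easy to observe numerically. The sketch below uses a gravity-type kernel and the one-point approximation of the Galerkin integrals (both are assumptions made to keep the example self-contained); the printed singular values grow with n toward the singular values of the kernel.

    % Observe the convergence (3.14) of the Galerkin singular values.
    K = @(s,t) 0.25*(0.25^2 + (s-t).^2).^(-3/2);   % assumed kernel
    for n = [16 32 64]
        h = 1/n;  c = ((1:n)' - 0.5)*h;            % cell midpoints
        [T,S] = meshgrid(c,c);
        A = h*K(S,T);                              % one-point approx. of (3.5)
        sv = svd(A);
        fprintf('n = %3d: sigma_1 = %.6f, sigma_5 = %.6e\n', n, sv(1), sv(5));
    end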

A nice, and important, consequence of this property of the singular functions is that the SVD coefficients u_i^T b are approximations to the corresponding inner products ⟨u_i, g⟩ in the SVE basis. To see this, we first note that the inner products ⟨u_j^{(n)}, g^{(n)}⟩ become increasingly better approximations to ⟨u_j, g⟩. We now insert the expressions for u_j^{(n)} and g^{(n)} into the definition of their inner product:

\[
\bigl\langle u_j^{(n)}, g^{(n)} \bigr\rangle
= \int_0^1 \Bigl( \sum_{i=1}^n u_{ij}\, \psi_i(s) \Bigr)
\Bigl( \sum_{k=1}^n b_k\, \psi_k(s) \Bigr) ds
= \sum_{i=1}^n u_{ij} \sum_{k=1}^n b_k\, \langle \psi_i, \psi_k \rangle
= \sum_{i=1}^n u_{ij}\, b_i = u_j^T b.
\]

Hence we can study all the important SVE relations by computing the SVD of the matrix A, and in particular we can use the numerical quantities to investigate whether the Picard condition seems to be satisfied.

To illustrate the characteristics of discrete inverse problems, we consider a discretization of the gravity surveying model problem (from Section 2.1) by means of the midpoint quadrature rule; see Exercise 3.5 for details. Figure 3.4 shows the singular values for a discretization with n = 64. Clearly, the singular values decay to zero with no gap anywhere in the spectrum. For i ≳ 55 they tend to level off, which is due to the influence of the finite-precision arithmetic (these tiny singular values are smaller than the machine precision times the largest singular value σ_1, and they cannot be trusted). The condition number of A is of the order 10^18, which, for all practical purposes, is infinite.

Figure 3.4 also shows the singular values of the matrix obtained by discretization of the second derivative problem in (2.20) in Exercise 2.3. In accordance with the theory mentioned in the previous chapter, these singular values decay more slowly because the kernel K for this problem is less smooth than the kernel for the gravity problem. The kernel K in (2.20) is not differentiable at s = t.

[Figure 3.4. The singular values σ_i of two 64 × 64 matrices A obtained from discretization of the gravity surveying problem (circles) and the second derivative problem from Exercise 2.3 (diamonds).]

Figure 3.5 shows the first nine left singular vectors u_i for the gravity surveying problem; the right singular vectors are identical, because the matrix is symmetric. We see that the singular vectors have more oscillations as the index i increases, i.e., as the corresponding σ_i decrease (for this problem, the number of sign changes in the ith singular vector is i − 1). We can safely conclude that the singular functions behave in the same manner. We note, however, that while all 64 singular vectors are orthonormal, by construction, the last 10 vectors do not carry reliable information about the singular functions (because the corresponding singular values are so tiny, as seen in Figure 3.4).

[Figure 3.5. The first 9 left singular vectors u_i of the same matrix as in Figure 3.4.]

Finally, let us investigate the behavior of the SVD coefficients u_i^T b (of the right-hand side) and u_i^T b/σ_i (of the solution). A plot of these coefficients, together with the singular values, is often referred to as a Picard plot, and Figure 3.6 shows such a plot for the discretized gravity surveying problem for a noise-free right-hand side.

[Figure 3.6. The Picard plot for the discretized gravity surveying problem with no noise in the right-hand side.]

Clearly, the coefficients |u_i^T b| decay faster than the singular values, until they level off for i ≳ 35 at a plateau determined by the machine precision; these tiny coefficients are less than the machine precision times ‖b‖_2. The solution coefficients u_i^T b/σ_i also decay for i < 35, but for i ≥ 35 they start to increase again due to the inaccurate values of u_i^T b.

This illustrates that even without errors in the data, we cannot always expect to compute a meaningful solution to the discrete inverse problem. The influence of rounding errors in the computation of the SVD is enough to destroy the computed solution. We note that the same is true if the solution x = A^{-1} b is computed via Gaussian elimination; the influence of rounding errors merely leads to a different, but still wrong, solution.

Figure 3.7 shows two Picard plots for the gravity problem with two noisy right-hand sides, obtained by adding Gaussian white noise with zero mean to the vector A x. The standard deviation of the noise is larger in the right plot. Again, we see an initial decay of the SVD coefficients |u_i^T b|, until they level off at a plateau determined by the noise. The solution coefficients also decrease initially, but for larger values of the index i they start to increase again. The computed solutions are completely dominated by the SVD coefficients corresponding to the smallest singular values.

This kind of analysis of the SVD coefficients, together with an understanding of their relation to the SVE coefficients, leads to the introduction of a discrete version of the Picard condition. While the solution of a finite-dimensional problem can never become unbounded, its norm can become so large that it is useless in practice. Since the SVD coefficients are so closely related to their corresponding SVE coefficients, it is relevant and necessary to introduce the following condition for the discretized problem (originally proposed and studied in [27]).

[Figure 3.7. The Picard plots for the discretized gravity surveying problem with a noisy right-hand side. The two figures correspond to two different noise levels. The larger the noise level, the fewer SVD coefficients u_i^T b remain above the noise level.]

The Discrete Picard Condition. Let τ denote the level at which the computed singular values σ_i level off due to rounding errors. The discrete Picard condition is satisfied if, for all singular values larger than τ, the corresponding coefficients |u_i^T b|, on average, decay faster than the σ_i. (3.16)

We emphasize that it is the decay of the σ_i and |u_i^T b| that matters here, not the size of these quantities. For example, in Figure 3.7 we see that |u_i^T b| > σ_i for i = 1, 2, 3, 4, but the discrete Picard condition is still satisfied for i = 1, ..., 10.

A visual inspection of the Picard plot is often enough to verify whether the discrete Picard condition is satisfied. One should always ignore the part of the Picard plot that corresponds to tiny singular values at the machine precision level, and when noise is present in the right-hand side, then one should also ignore all those coefficients for which |u_i^T b| level off at some noise plateau.
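Producing a Picard plot requires only a few lines of MATLAB; in the sketch below, the kernel, the exact solution, and the noise level are assumptions made for the example. (Regularization Tools also provides a picard function for this purpose.)

    % Picard plot: sigma_i, |u_i'*b|, and |u_i'*b|/sigma_i on a log scale.
    n = 64;  h = 1/n;  t = ((1:n)' - 0.5)*h;
    K = @(s,t) 0.25*(0.25^2 + (s-t).^2).^(-3/2);   % assumed kernel
    [T,S] = meshgrid(t,t);
    A = h*K(S,T);
    x = sin(pi*t) + 0.5*sin(2*pi*t);               % assumed exact solution
    b = A*x + 1e-4*randn(n,1);                     % noisy right-hand side
    [U,Sig,~] = svd(A);  s = diag(Sig);
    beta = abs(U'*b);
    semilogy(1:n,s,'.-',1:n,beta,'x',1:n,beta./s,'o')
    legend('\sigma_i','|u_i^Tb|','|u_i^Tb| / \sigma_i')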

To further illustrate the power of the analysis that can be carried out by means of the Picard plot, we return to Ursell's test problem (2.10). Figure 3.8 shows the Picard plot for a discretization of this problem (computed with n = 16). Clearly, the discrete Picard condition is not satisfied; the coefficients |u_i^T b| decay more slowly than the singular values σ_i for all singular values larger than the machine precision level. This is a very strong indication that the underlying continuous problem does not satisfy the Picard condition. And this is, indeed, the case for this problem.

[Figure 3.8. The Picard plot for the discretized version of Ursell's model problem (2.10). Clearly, the Picard condition is not satisfied.]

3.4 Convergence and Nonconvergence of SVE Approximation

We mentioned in Section 2.3 that for the singular value expansion (SVE) to exist, the kernel K must be square integrable; that is, for our generic integral equation (2.2) we must have

\[
\int_0^1 \!\! \int_0^1 K(s,t)^2\, ds\, dt < \infty.
\]

In this section we illustrate the importance of this requirement with a numerical example, where we use the SVD to compute approximations to the SVE.

We consider the Laplace transform of a function f, which is defined as the function

\[
g(s) = \int_0^\infty e^{-st}\, f(t)\, dt.
\]

This transform plays an important role in the study of dynamical systems. The inverse Laplace transform is then the task of computing the function f from a given function g, i.e., of solving the first-kind Fredholm integral equation

\[
\int_0^\infty e^{-st}\, f(t)\, dt = g(s), \qquad 0 \le s \le \infty. \tag{3.17}
\]

This kernel is not square integrable, and hence it does not have an SVE. To see this, we note that

\[
\int_0^a (e^{-st})^2\, ds = \int_0^a e^{-2st}\, ds
= \frac{1 - e^{-2ta}}{2t} \to \frac{1}{2t} \quad \text{for} \quad a \to \infty.
\]

Moreover, it is known that the integral \(\int_0^\infty t^{-1}\, dt\) is infinite. Hence we conclude that \(\int_0^\infty \!\! \int_0^\infty (e^{-st})^2\, ds\, dt\) is also infinite.

However, if f(t) stays bounded for t → ∞, then the integral in (3.17) exists

for all s ≥ 0, and we introduce only a small error by truncating the integration at some (large) number a. Moreover, the function g(s) will decay for s → ∞, and again we introduce only a small error by limiting s to the interval [0, a]. The so-obtained integral equation

\[
\int_0^a e^{-st}\, f(t)\, dt = g(s), \qquad 0 \le s \le a,
\]


has a square integrable kernel (because K(s,t) = e^{-st} is bounded and both integration intervals are finite).

[Figure 3.9. Approximations u_j^{(n)} to the left singular functions for the inverse Laplace transform restricted to the domain [0, 5] × [0, 5], such that the kernel is square integrable and the SVE exists. We use n = 16 (dotted lines), n = 32 (solid lines), and n = 64 (dashed lines), and the approximations converge to the singular functions.]

For illustration, we discretize this integral equation with the top hat basis functions (3.7) with h = a/n, and for the integrals in (3.5) we use the approximations

\[
a_{ij} = h\, K\bigl( (i - \tfrac{1}{2})\,h,\; (j - \tfrac{1}{2})\,h \bigr).
\]
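The experiment behind Figure 3.9 takes only a few lines of MATLAB; the plotting choices below (which singular function to show, the stairstep style) are assumptions of this sketch, and the sign of each computed singular vector is arbitrary.

    % Approximate left singular functions of the truncated Laplace kernel
    % on [0,a] x [0,a], using top hats and the one-point rule above.
    a = 5;  j = 1;                       % show the j-th singular function
    for n = [16 32 64]
        h = a/n;  c = ((1:n)' - 0.5)*h;  % cell midpoints
        [T,S] = meshgrid(c,c);
        A = h*exp(-S.*T);                % a_ij = h*K((i-1/2)h,(j-1/2)h)
        [U,~,~] = svd(A);
        uj = U(:,j)/sqrt(h);             % u_j^(n) from (3.15), up to sign
        stairs((0:n)'*h, [uj; uj(end)]); hold on
    end
    hold off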

Figure 3.9 shows the approximate singular functions u_j^{(n)}, given by (3.15), for a = 5 and n = 16, 32, and 64. It is clear that when the inverse Laplace transform is restricted to the finite domain [0, 5] × [0, 5], such that the kernel is square integrable and the SVE exists, then the approximations u_j^{(n)} indeed converge to the singular functions for this kernel. The convergence is faster for small values of s.

We now consider the behavior of the approximations u_j^{(n)} when we fix n = 128 and increase the size a of the domain [0, a] × [0, a], still using the Galerkin approach with the top hat basis functions. The results are shown in Figure 3.10, and we see no sign of convergence as we increase a. Of course, we cannot expect any convergence, since the SVE fails to exist as a → ∞, and our experiment is in accordance with this fact.

We note in passing that the test problem i_laplace in Regularization Tools uses a different discretization of the inverse Laplace transform (3.17), as described in [72]. Due to the nonexistence of the SVE for this problem, the SVD of the matrix produced by i_laplace does not exhibit any convergence as the order n increases.

    3.5 A Closer Look at Data with White Noise

As we have illustrated above, when solving problems with real, measured data, the trouble-maker is the noise component in the right-hand side. The best way to deal with trouble-makers is to study them carefully in order to be able to limit their damage.
