Distributions - gatech.edu

Chapter 6

Distributions

6.1 INTRODUCTION

The notion of a weak derivative is an indispensable tool in dealing withpartial differential equations. Its advantage is that it allows one to dispensewith subtle questions about differentiation, such as the interchange of partialderivatives. Its main point is that every locally integrable function canbe weakly differentiated indefinitely many times, just as though it were aC∞-function. The weakening of the notion of a derivative makes it easierto find solutions to equations and, once found, these ‘weak’ solutions canthen be analyzed to find out if they are, in fact, truly differentiable inthe classical sense. An analogy in elementary algebra might be trying tosolve a polynomial equation by rational numbers. It is extremely important,at the beginning of the investigation, to know that solutions always existin the larger category of real numbers; many techniques are available forthis purpose, e.g. Rolle’s theorem, that are not available in the categoryof rationals. Later on one can try to prove that the solutions are, in fact,rational.

A theory developed around the notion that every L1loc-function is differ-

entiable is the theory of distributions (see [Schwartz], [Hormander], [Rudin,1991], [Reed–Simon, vol. 1]). Although we do not present some of the deeperaspects of this theory we shall state its basic techniques. In the following,for completeness, we define distributions for an arbitrary open set Ω in Rn

but, in fact, we shall mainly need the case Ω = Rn in the rest of the book.

127

128 Distributions

6.2 TEST FUNCTIONS (The space D(Ω))

Let Ω be an open, nonempty set in Rn; in particular Ω can be Rn itself. Recallfrom Sect. 1.1 that C∞c (Ω) denotes the space of all infinitely differentiable,complex-valued functions whose support is compact and in Ω. Recall alsothat the support of a continuous function is defined to be the closure of theset on which the function does not vanish, and compactness means that theclosed set is also contained in some ball of finite radius. Note that Ω is nevercompact.

The space of test functions, D(Ω), consists of all the functions inC∞c (Ω) supplemented by the following notion of convergence: A sequenceφm ∈ C∞c (Ω) converges in D(Ω) to the function φ ∈ C∞c (Ω) if and onlyif there is some fixed, compact set K ⊂ Ω such that the support of φm−φ isin K for all m and, for each choice of the nonnegative integers α1, . . . , αn,

(∂/∂x1)α1 · · · (∂/∂xn)αnφm −→ (∂/∂x1)α1 · · · (∂/∂xn)αnφ (1)

as m→∞, uniformly on K. To say that a sequence of continuous functionsψm converges to ψ uniformly on K means that

supx∈K|ψm(x)− ψ(x)| → 0 as m→∞.

D(Ω) is a linear space, i.e., functions can be added and multiplied by (com-plex) scalars.

6.3 DEFINITION OF DISTRIBUTIONS AND THEIRCONVERGENCE

A distribution T is a continuous linear functional onD(Ω), i.e., T : D(Ω)→C such that for φ, φ1, φ2 ∈ D(Ω) and λ ∈ C

T (φ1 + φ2) = T (φ1) + T (φ2) and T (λφ) = λT (φ), (1)

and continuity means that whenever φn ∈ D(Ω) and φn → φ in D(Ω)

T (φn)→ T (φ).

Distributions can be added and multiplied by complex scalars. Thislinear space is denoted by D′(Ω), the dual space of D(Ω).

Sections 6.2–6.4 129

There is the obvious notion of convergence of distributions: A sequenceof distributions T j ∈ D′(Ω) converges in D′(Ω) to T ∈ D′(Ω) if, for everyφ ∈ D(Ω), the numbers T j(φ) converge to T (φ).

One might suspect that this kind of convergence is rather weak. Indeed,it is! For example, we shall see in Sect. 6.6, where we develop a notion of thederivative of a distribution, that for any converging sequence of distributionstheir derivatives converge too, i.e., differentiation is a continuous operationin D′(Ω). This contrasts with ordinary pointwise convergence because thederivatives of a pointwise converging sequence of functions need not, ingeneral, converge anywhere.

Another instance, as we shall see in Sect. 6.13, is that any distributioncan be approximated in D′(Ω) by functions in C∞(Ω). To make sense ofthat statement, we first have to define what it means for a function to be adistribution. This is done in the next section.

6.4 LOCALLY SUMMABLE FUNCTIONS, Lploc(Ω)

The foremost example of distributions are functions themselves. We beginby defining the space of locally pth-power summable functions, Lploc(Ω),for 1 ≤ p ≤ ∞. Such functions are Borel measurable functions defined onall of Ω and with the property that

‖f‖Lp(K) <∞ (1)

for every compact set K ⊂ Ω. Equivalently, it suffices to require (1) to holdwhen K is any closed ball in Ω.

A sequence of functions f1, f2, . . . in Lploc(Ω) is said to converge (orconverge strongly) to f in Lploc(Ω) (denoted by f j → f) if f j → f inLp(K) in the usual sense (see Theorem 2.7) for every compact K ⊂ Ω.Likewise, f j converges weakly to f if f j f weakly in every Lp(K)(2.9(6)).

Note (for general p ≥ 1) that Lploc(Ω) is a vector space but it does nothave a simply defined norm. Furthermore, f ∈ Lploc(Ω) does not imply thatf ∈ Lp(Ω). Clearly, Lploc(Ω) ⊃ Lp(Ω) and, if r > p, we have the inclusion

Lploc(Ω) ⊃ Lrloc(Ω),

by Holder’s inequality (but it is false—unless Ω has finite measure—thatLp(Ω) ⊃ Lr(Ω)).

130 Distributions

As far as distributions are concerned, L1loc(Ω) is the most important

space. Let f be a function in L1loc(Ω). For any φ in D(Ω) it makes sense to

considerTf (φ) :=

∫Ωfφdx, (2)

which obviously defines a linear functional on D(Ω). Tf is also continuoussince

|Tf (φ)− Tf (φm)| =∣∣∣∣∫

Ω(φ(x)− φm(x))f(x) dx

∣∣∣∣≤ sup

x∈K|φ(x)− φm(x)|

∫K|f(x)| dx,

which tends to zero by the uniform convergence of the φm’s. Thus Tf is inD′(Ω). If a distribution T is given by (2) for some f ∈ L1

loc(Ω), we say thatthe distribution T is the function f . This terminology will be justifiedin the next section.

An important example of a distribution that is not of this form is theso-called Dirac ‘delta-function’, which is not a function at all:

δx(φ) = φ(x) (3)

with x ∈ Ω fixed. It is obvious that δx ∈ D′(Ω). Thus, the delta-measure ofSect. 1.2(6), like any Borel measure, can also be considered to be a distri-bution. In fact, one can say that it was partly the attempt to understandthe true mathematical meaning of the delta function, which had been usedso successfully by physicists and engineers, that led to the theory of distri-butions.

Although D(Ω), the space of test functions, is a very restricted class offunctions it is large enough to distinguish functions in D′(Ω), as we nowshow.

6.5 THEOREM (Functions are uniquely determined bydistributions)

Let Ω ⊂ Rn be open and let f and g be functions in L1loc(Ω). Suppose that

the distributions defined by f and g are equal, i.e.,∫Ωfφ =

∫Ωgφ (1)

for all φ ∈ D(Ω). Then f(x) = g(x) for almost every x in Ω.


PROOF. For m = 1, 2, . . . let Ωm be the set of points x ∈ Ω such thatx + y ∈ Ω whenever |y| ≤ 1

m . Ωm is open. Let j be in C∞c (Rn) withsupport in the unit ball and with

∫Rn j = 1. Define jm(x) = mnj(mx).

Fix M . If m ≥ M , then, by (1) with φ(y) = jm(x − y), we have that(jm ∗ f)(x) = (jm ∗ g)(x) for all x ∈ ΩM (see Sect. 2.15 for the definitionof the convolution ∗). By Theorem 2.16, jm ∗ f → f and jm ∗ g → g inL1

loc(ΩM ) as m → ∞. Thus f = g in L1loc(ΩM ) and therefore f(x) = g(x)

for almost every x ∈ ΩM . Finally let M tend to ∞.

6.6 DERIVATIVES OF DISTRIBUTIONS

We now define the notion of distributional or weak derivative. LetT be in D′(Ω) and let α1, . . . , αn be nonnegative integers. We define thedistribution (∂/∂x1)α1 · · · (∂/∂xn)αnT , denoted by DαT , by its action oneach φ ∈ D(Ω) as follows:

(DαT )(φ) = (−1)|α|T (Dαφ) (1)

with the notation

|α| =n∑i=1

αi. (2)

The symbol∂iT

denotes DαT in the special case αi = 1, αj = 0 for j 6= i.The symbol ∇T , called the distributional gradient of T , denotes the

n-tuple (∂1T, ∂2T, . . . , ∂nT ).

If f is a C |α|(Ω)-function (not necessarily of compact support), then

(DαTf )(φ) := (−1)|α|∫

Ω(Dαφ)f dx =

∫Ω

(Dαf)φ dx =: TDαf (φ),

where the middle equality holds by partial integration. Hence the notionof weak derivative extends the classical one and it agrees with the classicalone whenever the classical derivative exists and is continuous (see Theorem6.10 (equivalence of classical and distributional derivatives)). Obviously, inthis weak sense, every distribution is infinitely often differentiable and thisis one of the main virtues of the theory. Note however, that the distribu-tional derivative of a nondifferentiable function (in the classical sense) is notnecessarily a function.

132 Distributions

Let us show that DαT actually is a distribution. Obviously it is linear,so we only have to check its continuity on D(Ω). Let φm → φ in D(Ω). ThenDαφm → Dαφ in D(Ω) since

suppDαφm −Dαφ ⊂ suppφm − φ ⊂ Kand

Dβ(Dαφm −Dαφ) = Dβ+αφm −Dβ+αφ

converges to zero uniformly on compact sets. [Here β + α simply denotesthe multi-index given by (β1 + α1, β2 + α2, . . . , βn + αn)]. Thus, Dαφ andDαφm are themselves functions in D(Ω) with Dαφm → Dαφ as m → ∞.Hence, as m→∞,

(DαT )(φm) := (−1)|α|T (Dαφm) −→ (−1)|α|T (Dαφ) =: (DαT )(φ).

We end this section by showing that differentiation of distributions is acontinuous operation in D′(Ω). Indeed, if T j(φ) → T (φ) for all φ ∈ D(Ω),then, by the definition of the derivative of a distribution

(DαT j)(φ) = (−1)|α|T j(Dαφ) →j→∞

(−1)|α|T (Dαφ) = (DαT )(φ)

since Dαφ ∈ D(Ω).

6.7 DEFINITION OF W1,ploc (Ω) AND W 1,p(Ω)

L1loc(Ω)-functions are an important class of distributions, but we can usefully

refine that class by studying functions whose distributional derivatives arealso L1

loc(Ω)-functions. This class is denoted by W 1,1loc (Ω). Furthermore, just

as Lploc(Ω) is related to L1loc(Ω) we can also define the class of functions

W 1,ploc (Ω) for each 1 ≤ p ≤ ∞. Thus,

W 1,ploc (Ω) =

f : Ω→ C : f ∈ Lploc(Ω) and ∂if, as a distribution

in D′(Ω), is an Lploc(Ω)-function for i = 1, . . . , n.

We urge the reader not to use the symbol ∇f at first, since it is tempting toapply the rules of calculus which we have not established yet. One shouldjust think of ∇f as n functions g = (g1, . . . , gn), each of which is in Lploc(Ω),such that ∫

Ωf∇φ = −

∫Ω

gφ for all φ ∈ D(Ω).

This set of functions, W 1,ploc (Ω), forms a vector space but not a normed one.

We have the inclusion W 1,ploc (Ω) ⊃W 1,r

loc (Ω) if r > p.

We can also define W 1,p(Ω) ⊂W 1,ploc (Ω) analogously:


W 1,p(Ω) = f : Ω→ C : f and ∂if are in Lp(Ω) for i = 1, . . . , n.

We can make W 1,p(Ω) into a normed space, by defining

‖f‖W 1,p(Ω) =

‖f‖pLp(Ω) +n∑j=1

‖∂jf‖pLp(Ω)

1/p

(1)

and it is complete, i.e., every Cauchy sequence in this norm has a limitin W 1,p(Ω). This follows easily from the completeness of Lp(Ω) (Theorem2.7) together with the definition 6.6 of the distributional derivative, i.e., iff j → f and ∂if

j → gi, then it follows that gi = ∂if in D′(Ω). The proofis a simple adaptation of the one for W 1,2(Ω) = H1(Ω) in Theorem 7.3 (seeRemark 7.5). We leave the details to the reader.

The spaces W 1,p(Ω) are called Sobolev spaces. In this chapter onlyW 1,1

loc (Ω) will play a role.The superscript 1 in W 1,p(Ω) denotes the fact that the first derivatives

of f are pth-power summable functions.As with Lp(Ω) and Lploc(Ω), we can define the notions of strong and

weak convergence in the spaces W 1,ploc (Ω) or W 1,p(Ω) of a sequence of

functions f1, f2, . . . to a function f . Strong convergence simply meansthat the sequence converges strongly to f in Lp(Ω) and the n sequences∂1f

j, . . . , ∂nf j, formed from the derivatives of f j , converge in Lp(Ω) tothe n functions ∂1f, . . . , ∂nf in Lp(Ω). In the case of W 1,p

loc (Ω) we requirethis convergence only on every compact subset of Ω. Weak convergence inW 1,p(Ω) is defined similarly, i.e., for each Lp

′(Ω)-function g we require that∫

g(f j − f) → 0 and, for each i,∫g(∂if j − ∂if) → 0 as j → ∞. Here,

1/p + 1/p′ = 1. For W 1,ploc (Ω) we require this only for each g with compact

support.Similar definitions apply to Wm,p(Ω) and Wm,p

loc (Ω) with m > 1. Thefirst m derivatives of these functions are Lp(Ω)-functions and, similarly to(1),

‖f‖pWm,p(Ω) := ‖f‖pLp(Ω) +n∑j=1

‖∂jf‖pLp(Ω)

+ · · ·+n∑

j1=1

· · ·n∑

jm=1

‖∂j1 · · · ∂jmf‖pLp(Ω).

(2)

134 Distributions

• In the following it will be convenient to denote by φz the function φtranslated by z ∈ Rn, i.e.,

φz(x) := φ(x− z). (3)

6.8 LEMMA (Interchanging convolutions withdistributions)

Let Ω ⊂ Rn be open and let φ ∈ D(Ω). Let Oφ ⊂ Rn be the set

Oφ = y : suppφy ⊂ Ω.

It is elementary that Oφ is open and not empty. Let T ∈ D′(Ω). Then thefunction y 7→ T (φy) is in C∞(Oφ). In fact, with Dα

y denoting derivativeswith respect to y,

Dαy T (φy) = (−1)|α|T ((Dαφ)y) = (DαT )(φy). (1)

Now let ψ ∈ L1(Oφ) have compact support. Then∫Oφψ(y)T (φy) dy = T (ψ ∗ φ). (2)

PROOF. If y ∈ Oφ and if ε > 0 is chosen so that y+ z ∈ Oφ for all |z| < ε,we have that for all x ∈ Ω

|φy(x)− φy+z(x)| = |φ(x− y)− φ(x− y − z)| < Cε (3)

for some number C <∞. This is so because φ has continuous derivatives and(since it has compact support) these derivatives are uniformly continuous.For the same reason, (3) holds for all derivatives of φ (with C depending onthe order of the derivative). This means that φy+z converges to φy as z → 0in D(Ω) (see Sect. 6.2). Therefore, T (φy+z) → T (φy) as z → 0, and thusy 7→ T (φy) is continuous on Oφ.

Similarly, we have that

[φ(x+ δz)− φ(x)]/δ −∇φ(x) · z ≤ C ′δ|z|

and thus, by a similar argument, y 7→ T (φy) is differentiable. Continuing inthis manner we find that (1) holds.

To prove (2) it suffices to assume that ψ ∈ C∞c (Oφ). To verify this, we


use Theorem 2.16 to find, for each δ > 0, ψδ ∈ C∞c (Oφ) so that∫Oφ |ψ

δ−ψ| <δ. In fact, we can assume that suppψδ is contained in some fixed compactsubset, K, of Oφ, independent of δ. Then∣∣∣∣∫ ψ(y)− ψδ(y)

T (φy) dy

∣∣∣∣ ≤ δ sup|T (φy)| : y ∈ K

.

It is also easy to see that ψδ ∗ φ converges to ψ ∗ φ in D(Ω) and thereforeT (ψδ ∗ φ)→ T (ψ ∗ φ).

With ψ now in C∞c (Oφ) we note that the integrand in (2) is a productof two C∞c -functions. Hence the integral can be taken as a Riemann integraland thus can be approximated by finite sums of the form

∆m

m∑j=1

ψ(yj)T (φyj ) with ∆m → 0 as m→∞.

Likewise, for any multi-index α, (Dα(ψ ∗ φ))(x) is uniformly approximatedby ∆m

∑mj=1 ψ(yj)Dαφ(x− yj) as m→∞ (because φ ∈ C∞c (Ω)). Note that

for m sufficiently large every member of this sequence has support in a fixedcompact set K ⊂ Ω. Since Tm is continuous (by definition) and the functionηm(x) = ∆m

∑nj=1 ψ(yj)φ(x − yj) converges to (ψ ∗ φ)(x) as m → ∞, we

conclude that T (ηm) converges to T (ψ ∗ φ) as m→∞.

6.9 THEOREM (Fundamental theorem of calculus fordistributions)

Let Ω ⊂ Rn be open, let T ∈ D′(Ω) be a distribution and let φ ∈ D(Ω) be atest function. Suppose that for some y ∈ Rn the function φty is also in D(Ω)for all 0 ≤ t ≤ 1 (see 6.7(3)). Then

T (φy)− T (φ) =∫ 1

0

n∑j=1

yj(∂jT )(φty) dt. (1)

As a particular case of (1), suppose that f ∈ W 1,1loc (Rn). Then, for each

y in Rn and almost every x ∈ Rn,

f(x+ y)− f(x) =∫ 1

0y · ∇f(x+ ty) dt. (2)

136 Distributions

PROOF. Let Oφ = z ∈ Rn : φz ∈ D(Ω). It is clearly open andnonempty. Denote the right side of (1) by F (y). Observe that by Lemma 6.8,z 7→ (∂jT )(φz) is a C∞-function on Oφ and ∂(∂jT (φz))/∂zi = −∂jT (∂iφz).

With this infinite differentiability in mind we can now interchange deriva-tives and integrals, and compute

∂iF (y) = −n∑j=1

∫ 1

0t(∂jT )(∂iφty)yj dt+

∫ 1

0(∂iT )(φty) dt.

The first term is, by the definition of the derivative of a distribution,n∑j=1

∫ 1

0tT (∂j∂iφty)yj dt = −

∫ 1

0

n∑j=1

t(∂iT )(∂jφty)yj dt,

which can be rewritten (for the same reason as before) as∫ 1

0t

ddt

(∂iT )(φty) dt.

A simple integration by parts then yields ∂iF (y) = (∂iT )(φy). The functiony 7→ G(y) = T (φy) − T (φ) is also C∞ in y (by Lemma 6.8) and also has(∂iT )(φy) as its partial derivatives. Since F (0) = G(0) = 0, the two C∞-functions F and G must be the same. This proves (1).

To prove (2), note that since

(∂jf)(φty) =∫φ(x)(∂jf)(x+ ty) dx,

(1) implies that∫Rnφ(x)[f(x+ y)− f(x)] dx =

∫ 1

0

n∑j=1

yj

∫Rnφ(x)(∂jf)(x+ ty) dx

dt.

Since φ has compact support, the integrand is (t, x) integrable (even if ∂jf 6∈L1(Rn)), and hence Fubini’s theorem can be used to interchange the t andx integrations. Conclusion (2) then follows from Theorem 6.5.

6.10 THEOREM (Equivalence of classical anddistributional derivatives)

Sections 6.9–6.10 137

Let Ω ⊂ Rn be open, let T ∈ D′(Ω) and set Gi := ∂iT ∈ D′(Ω) for i =1, 2, . . . , n. The following are equivalent.

(i) T is a function f ∈ C1(Ω).

(ii) Gi is a function gi ∈ C0(Ω) for each i = 1, . . . , n.

In each case, gi is ∂f/∂xi, the classical derivative of f .

138 Distributions

REMARK. The assertion f ∈ C1(Ω) means, of course, that there is aC1(Ω)-function in the equivalence class of f . A similar remark applies togi ∈ C0(Ω).

PROOF. (i) ⇒ (ii). Gi(φ) = (∂iT )(φ) = −∫

Ω(∂iφ)f by the definitionof distributional derivative. On the other hand, the classical integration byparts formula yields ∫

Ω(∂iφ)f = −

∫Ωφ(∂f/∂xi)

since φ has compact support in Ω and f ∈ C1(Ω). Therefore, by the termi-nology of Sect. 6.4 and Theorem 6.5, Gi is the function ∂f/∂xi.

(ii) ⇒ (i). Fix R > 0 and let ω = x ∈ Ω : |x − z| > R for allz 6∈ Ω. Clearly ω is open and nonempty for R small enough, which wehenceforth assume. Take φ ∈ D(ω) ⊂ D(Ω) and |y| < R. Then φty ∈ D(Ω)for −1 ≤ t ≤ 1. By 6.9(1) and Fubini’s theorem

T (φy)− T (φ) =∫ 1

0

n∑j=1

yj

∫ωgj(x)φ(x− ty) dx dt

=∫ω

∫ 1

0

n∑j=1

gj(x+ ty)yj dt

φ(x) dx.

(1)

Pick ψ ∈ C∞c (Rn) nonnegative with suppψ ⊂ B := y : |y| < Rand

∫ψ = 1. The convolution

∫B ψ(y)φ(x − y) dy with φ ∈ D(ω) defines

a function in D(Ω). Integrating (1) against ψ we obtain, using Fubini’stheorem, ∫

Bψ(y)T (φy) dy − T (φ)

=∫ω

n∑j=1

∫Bψ(y)

∫ 1

0yjgj(x+ ty) dtdy

φ(x) dx.(2)

The first term on the left is∫ω φ(x)T (ψx) dx, which follows from Lemma 6.8

by noting that ψx for x ∈ ω is an element of D(Ω). Hence

T (φ) =∫ω

T (ψx)−n∑j=1

∫Bψ(y)

∫ 1

0yjgj(x+ ty) dtdy

φ(x) dx,

6.12 139

which displays T explicitly as a function, which we denote by f .Finally, by Theorem 6.9(2)

f(x+ y)− f(x) =∫ 1

0

n∑j=1

gj(x+ ty)yj dt (3)

for x ∈ ω and |y| < R. The right side is

n∑j=1

gj(x)yj + o(|y|)

and this proves that f ∈ C1(ω) with derivatives gi. This suffices, since xcan be arbitrarily chosen in Ω by choosing R to be small enough.

The following is a special case of Theorem 6.10, which we state separatelyfor emphasis.

6.11 THEOREM (Distributions with zero derivatives areconstants)

Let Ω ⊂ Rn be a connected, open set and let T ∈ D′(Ω). Suppose that∂iT = 0 for each i = 1, . . . , n. Then there is a constant C such that

T (φ) = C

∫Ωφ

for all φ ∈ D(Ω).

PROOF. By Theorem 6.10, T is a C1(Ω)-function, f , and ∂f/∂xi = 0.Application of 6.10(3) to f shows that f is constant.

6.12 MULTIPLICATION AND CONVOLUTION OFDISTRIBUTIONS BY C∞-FUNCTIONS

A useful fact is that distributions can be multiplied by C∞-functions. Con-sider T in D′(Ω) and ψ in C∞(Ω). Define the product ψT by its action onφ ∈ D(Ω) as

(ψT )(φ) := T (ψφ) (1)

140 Distributions

for all φ ∈ D(Ω). That ψT is a distribution follows from the fact that theproduct ψφ ∈ C∞c (Ω) if φ ∈ C∞c (Ω). Moreover, if φn → φ in D(Ω), thenψφn → ψφ in D(Ω). To differentiate ψT we simply apply the product rule,namely

∂i(ψT )(φ) = ψ(∂iT )(φ) + (∂iψ)T (φ), (2)

which is easily verified from the basic definition 6.6(1) and Leibniz’s differ-entiation formula ∂i(ψφ) = φ∂iψ + ψ∂iφ for C∞-functions.

Observe that when T = Tf for some f in L1loc(Ω), then ψT = Tψf . If,

moreover, f ∈W 1,ploc (Ω), then ψf ∈W 1,p

loc (Ω) and (2) reads

∂i(fψ)(x) = f(x)∂iψ(x) + ψ(x)(∂if)(x) (3)

for almost every x. The same holds for W 1,p(Ω) and it also clearly extendsto W k,p

loc (Ω) and W k,p(Ω).The convolution of a distribution T with a C∞c (Rn)-function j is

defined by

(j ∗ T )(φ) := T (jR ∗ φ) = T

(∫Rnj(y)φ−y dy

)(4)

for all φ ∈ D(Rn), where jR(x) := j(−x). Since jR ∗ φ ∈ C∞c (Rn), j ∗ Tmakes sense and is in D′(Rn). The reader can check that when T is afunction, i.e., T = Tf , then, with this definition, (j ∗Tf )(φ) = Tj∗f (φ) where(j ∗ f)(x) =

∫Rn j(x− y)f(y) dy is the usual convolution.

I Note the requirement that j must have compact support.

6.13 THEOREM (Approximation of distributions byC∞-functions)

Let T ∈ D′(Rn) and let j ∈ C∞c (Rn). Then there exists a function t ∈C∞(Rn) (depending only on T and j) such that

(j ∗ T )(φ) =∫

Rnt(y)φ(y) dy (1)

for every φ ∈ D(Rn). If we further assume that∫

Rn j = 1, and if we setjε(x) = ε−nj(x/ε) for ε > 0, then jε ∗T converges to T in D′(Rn) as ε→ 0.

Sections 6.13–6.15 141

PROOF. By definition we have that

(j ∗ T )(φ) := T (jR ∗ φ) = T

(∫Rnj(y − ·)φ(y) dy

)which, by Lemma 6.8, equals

∫Rn T (j(y−·))φ(y) dy. If we now define t(y) :=

T (j(y − ·)), then, by 6.8(1), t ∈ C∞(Rn). This proves (1). To verify theconvergence of jε ∗ T to T , simply observe that

(jε ∗ T )(φ) := T

(∫Rnjε(y)φ−y dy

)=∫

Rnjε(y)T (φ−y) dy =

∫Rnj(y)T (φ−εy) dy

(2)

by changing variables. It is clear that the last term in (2) tends to T (φ)since T (φ−y) is C∞ as a function of y, and j has compact support.

• The kernel or null-space of a distribution T ∈ D′(Ω) is defined byNT = φ ∈ D(Ω) : T (φ) = 0. It forms a closed linear subspace of D(Ω).The following theorem about the intersection of kernels is useful in connec-tion with Lagrange multipliers in the calculus of variations. (See Sect. 11.6.)

6.14 THEOREM (Linear dependence of distributions)

Let S1, . . . , SN ∈ D′(Ω) be distributions. Suppose that T ∈ D′(Ω) has theproperty that T (φ) = 0 for all φ ∈

⋂Ni=1NSi. Then there exist complex

numbers c1, . . . , cN such that

T =N∑i=1

ciSi. (1)

PROOF. Without loss of generality it can be assumed that the Si’s arelinearly independent. First, we show that there exist N fixed functionsu1, . . . , uN ∈ D(Ω) such that every φ ∈ D(Ω) can be written as

φ = v +N∑i=1

λi(φ)ui (2)

142 Distributions

for some λi(φ) ∈ C, i = 1, . . . , N , and v ∈⋂Ni=1NSi . To see this consider

the set of vectorsV = S(φ) : φ ∈ D(Ω), (3)

where S(φ) = (S1(φ), . . . , SN (φ)). It is obvious that V is a vector spaceof dimension N since the Si’s are linearly independent. Hence there existfunctions u1, . . . , uN ∈ D(Ω) such that S(u1), . . . , S(uN ) span V . Thus, theN ×N matrix given by Mij = Si(uj) is invertible. With

λi(φ) =N∑j=1

(M−1)ijSj(φ), (4)

it is easily seen that (2) holds.Applying T to formula (2) yields (using T (v) = 0)

T (φ) =N∑

i,j=1

(M−1)ijT (ui)Sj(φ),

which gives (1) with ci =∑N

j=1(M−1)jiT (uj).

6.15 THEOREM (C∞(Ω) is ‘dense’ in W1,ploc (Ω))

Let f be in W 1,ploc (Ω). For any open set O with the property that there exists

a compact set K ⊂ Ω such that O ⊂ K ⊂ Ω, we can find a sequencef1, f2, f3, . . . ∈ C∞(O) such that

‖f − fk‖Lp(O) +∑i

‖∂if − ∂ifk‖Lp(O) → 0 as k →∞. (1)

PROOF. For ε > 0 consider the function jε ∗ f , where jε(x) = ε−nj(x/ε)and j is a C∞-function with support in the unit ball centered at the originwith

∫Rn j(x) dx = 1. For any open set O with the properties stated above

we have that jε ∗ f ∈ C∞(O) if ε is sufficiently small since, on O,

Dα(jε ∗ f)(x) =∫

Rn(Dαjε)(x− y)f(y) dy

Sections 6.14–6.15 143

for derivatives of any order α. Further, since O ⊂ K ⊂ Ω with K compact,we can assume, by choosing ε small enough, that

O + suppjε := x+ z : x ∈ O, z ∈ suppjε ⊂ K.

Thus, since

∂i

∫Kjε(x− y)f(y) dy =

∫Kjε(x− y)(∂if)(y) dy,

and since f and ∂if are in Lp(K) for i = 1, . . . , n, (1) follows from Theorem2.16 by choosing ε = 1/k with k large enough.

• The reader is invited to jump ahead for the moment and compareTheorem 6.15 for p = 2 with the much deeper Meyers–Serrin Theorem 7.6(density of C∞(Ω) in H1(Ω)). The latter easily generalizes to p 6= 2, i.e.,to W 1,p(Ω) and, in each case, implies 6.15. The important point is thatif f ∈ H1(Ω), then ∇f(x) can go to infinity as x goes to the boundary ofΩ. Thus, convergence of the smooth functions fk to f in the H1(Ω)-norm,as in 7.6, is not easy to achieve. Theorem 6.15 only requires convergencearbitrarily close to, but not up to, the boundary of Ω. The sequence fk in6.15 is allowed to depend on the open subset O ⊂ Ω. In contrast, in 7.6 thefixed sequence fk must yield convergence in H1(Ω). On the other hand, afunction in W 1,2

loc (Ω) need not be in H1(Ω); it need not even be in L1(Ω).

6.16 THEOREM (Chain rule)

Let G : RN → C be a differentiable function with bounded and continuousderivatives. We denote it explicitly by G(s1, . . . , sN ). If

u(x) = (u1(x), . . . , uN (x))

denotes N functions in W 1,ploc (Ω), then the function K : Ω→ C given by

K(x) = (G u)(x) = G(u(x))

is in W 1,ploc (Ω) and

∂

∂xiK =

N∑k=1

∂G

∂sk(u) · ∂uk

∂xi(1)

144 Distributions

in D′(Ω).If u1, . . . , uN are in W 1,p(Ω), then K is also in W 1,p(Ω) and (1) holds—

provided we make the additional assumption, in case |Ω| = ∞, thatG(0) = 0.

PROOF. It suffices to prove that K ∈ W 1,p(O) and verify formula (1) forany open set O with the property that O ⊂ C ⊂ Ω, with C compact.

By Theorem 6.15 we can find a sequence of functions φm = (φm1 , . . . , φmN )

in (C∞(O))N such that (with an obvious abuse of notation)

‖φm − u‖W 1,p(O) → 0 (2)

as m → ∞. By passing to a subsequence we may assume that φm → upointwise a.e. and ∂

∂xiφm → ∂

∂xiu pointwise a.e. for all i = 1, . . . , n. Set

Km(x) = G(φm(x)). Since

maxi

∣∣∣∣∂G∂si∣∣∣∣ ≤M,

a simple application of the fundamental theorem of calculus and Holder’sinequality in RN shows that for s, t ∈ RN

|G(s)−G(t)| ≤MN1/p′

(N∑i=1

|si − ti|p)1/p

. (3)

Here 1/p+ 1/p′ = 1. Since O ⊂ C and G is bounded, K is in Lploc(O).Next, for ψ ∈ D(Ω)∫

Ω

∂ψ

∂xkKm dx = −

∫Ωψ

∂

∂xkKm dx

= −N∑l=1

∫Ωψ∂G

∂sl(φm)

∂

∂xkφml dx.

(4)

In (4) the ordinary chain rule for C1-functions has been used. Using (3)we find that

|K(x)−Km(x)| ≤MN1/p′

(N∑i=1

|ui(x)− φmi (x)|p)1/p

,

Section 6.16 145

which implies that Km → K in Lp(O), and therefore the left side of (4)tends to

∫Ω∂ψ(x)∂xk

K(x) dx. Each term on the right side can be written as

∫Oψ∂G

∂sl(φm)

∂

∂xkul dx+

∫Oψ∂G

∂sl(φm)

(∂

∂xkφml −

∂

∂xkul

)dx. (5)

The first term tends to

∫Oψ∂G

∂sl(u)

∂ul∂xk

dx

by dominated convergence and the second tends to zero since ∂G∂sl

is uniformly

bounded and ∂φml∂xk− ∂ul∂xk→ 0 in Lp(O). Clearly ∂G

∂sl(u) ∂ul∂xk

, which is a boundedfunction times an Lp(O)-function, is itself in Lp(O).

To verify the second statement about W 1,p(Ω), note that ∂G/∂sk isbounded for all k = 1, 2, . . . , N and, since ∇uk ∈ Lp(Ω), it follows from (1)that ∇K ∈ Lp(Ω) also. The only thing to check is that K itself is in Lp(Ω).It follows from (3) that

|K(x)|p ≤ A+B

N∑k=1

|uk(x)|p, (6)

where A and B are some constants. If |Ω| <∞, (6) implies that K ∈ Lp(Ω).If |Ω| =∞ we have to use the assumption G(0) = 0, which implies that wecan take A = 0 in (6). Again, K ∈ Lp(Ω).

146 Distributions

6.17 THEOREM (Derivative of the absolute value)

Let f be in W 1,p(Ω). Then the absolute value of f , denoted by |f | and definedby |f |(x) = |f(x)|, is in W 1,p(Ω) with ∇|f | being the function

(∇|f |)(x) =

1|f |(x)(R(x)∇R(x) + I(x)∇I(x)) if f(x) 6= 0,

0 if f(x) = 0;(1)

here R(x) and I(x) denote the real and imaginary parts of f . In particular,if f is real-valued,

(∇|f |)(x) =

∇f(x) if f(x) > 0,

−∇f(x) if f(x) < 0,

0 if f(x) = 0.(2)

Thus∣∣∇|f |∣∣ ≤ |∇f | a.e. if f is complex-valued and

∣∣∇|f |∣∣ = |∇f | a.e. if fis real-valued.

PROOF. We follow [Gilbarg–Trudinger]. That |f | is in Lp(Ω) follows fromthe definition of ‖f‖p. Further, since∣∣∣∣ 1

|f |(R∇R+ I∇I)

∣∣∣∣2 ≤ (∇R)2 + (∇I)2 (3)

pointwise, ∇|f | is also in Lp(Ω) once the claimed equality (1) is proved.Consider the function

Gε(s1, s2) =√ε2 + s2

1 + s22 − ε . (4)

Obviously Gε(0, 0) = 0 and∣∣∣∣∂Gε∂si

∣∣∣∣ =

∣∣∣∣∣ si√ε2 + s2

1 + s22

∣∣∣∣∣ ≤ 1 . (5)

Hence, by 6.16, the function Kε(x) = Gε(R(x), I(x)) is in W 1,p(Ω) and forall φ in D(Ω)∫

Ω∇φ(x)Kε(x) dx = −

∫Ωφ(x)∇Kε(x) dx

= −∫

Ωφ(x)

R(x)∇R(x) + I(x)∇I(x)√ε2 + |f(x)|2

dx.(6)

Sections 6.17–6.18 147

Since Kε(x) ≤ |f(x)| and∣∣∣∣∣R(x)∇R(x) + I(x)∇I(x)√ε2 + |f(x)|2

∣∣∣∣∣ ≤ |∇f(x)|2 ,

and since the two functions (4) and (5) converge pointwise to the claimedexpressions as ε→ 0, the result follows by dominated convergence.

6.18 COROLLARY (Min and Max of W 1,p-functions arein W 1,p)

Let f and g be two real-valued functions in W 1,p(Ω). Then the minimum of(f(x), g(x)) and the maximum of (f(x), g(x)) are functions in W 1,p(Ω) andthe gradients are given by

∇max(f(x), g(x)) =

∇f(x) when f(x) > g(x),

∇g(x) when f(x) < g(x),

∇f(x) = ∇g(x) when f(x) = g(x),(1)

∇min(f(x), g(x)) =

∇g(x) when f(x) > g(x),

∇f(x) when f(x) < g(x),

∇f(x) = ∇g(x) when f(x) = g(x).(2)

PROOF. That these two functions are in W 1,p(Ω) follows from the formulas

min(f(x), g(x)) =12[(f(x) + g(x))− |f(x)− g(x)|

]and

max(f(x), g(x)) =12[(f(x) + g(x)) + |f(x)− g(x)|

].

The formulas (1) and (2) follow immediately from Theorem 6.17 in thecases where f(x) > g(x) or f(x) < g(x). To understand the case f(x) = g(x)consider

h(x) = (f(x)− g(x))+ =12|f(x)− g(x)|+ (f(x)− g(x))

.

148 Distributions

Obviously |h|(x) = h(x), and hence by 6.17

∇h(x) = ∇|h|(x) = 0 when f(x) ≤ g(x).

But again by 6.17 ∇h(x) = 12(∇(f − g))(x), when f(x) = g(x) and hence

(∇f)(x) = (∇g)(x) when f(x) = g(x),

which yields (1) and (2) in the case f(x) = g(x).

It is an easy exercise to extend the above result to truncations ofW 1,p(Ω)-functions defined by

f<α(x) = min(f(x), α).

The gradient is then given by

(∇f<α)(x) = ∇f(x) if f(x) < α,

0 otherwise.Analogously, define

f>α(x) = max(f(x), α).

Then

(∇f>α)(x) = ∇f(x) if f(x) > α,

0 otherwise.

Note that when Ω is unbounded f<α ∈ W 1,p(Ω) only if α ≥ 0, and f>α ∈W 1,p(Ω) only if α ≤ 0.

The foregoing implies that if u ∈W 1,1loc (Ω), if α ∈ R and if u(x) = α on a

set of positive measure in Rn, then (∇u)(x) = 0 for almost every x in this set.This can be derived easily from 6.18. The following theorem, to be found in[Almgren–Lieb], generalizes this fact by replacing the single point α ∈ R bya Borel set A of zero measure. Such sets need not be ‘small’, e.g. α couldbe all the rational numbers, and hence α could be dense in R. Note thatif f is a Borel measurable function, then f−1(A) := x ∈ Rn : f(x) ∈ Ais a Borel set, and hence is measurable. This follows from the statement inSect. 1.5 and Exercise 1.3 that x 7→ χA(f(x)) is measurable.

6.19 THEOREM (Gradients vanish on the inverse of smallsets)

Let A ⊂ R be a Borel set with zero Lebesgue measure and let f : Ω → R bein W 1,1

loc (Ω). Let

B = f−1(A) := x ∈ Ω : f(x) ∈ A ⊂ Ω.

Sections 6.18–6.19 149

Then ∇f(x) = 0 for almost every x ∈ B.

PROOF. Our goal will be to establish the formula∫Ωφ(x)χO(f(x))∇f(x) dx = −

∫Ω∇φ(x)GO(f(x)) dx (1)

for each open set O ⊂ R. Here χO is the characteristic function of O andGO(t) =

∫ t−∞

χO(s) ds. Equation (1) is just like the chain rule except thatGO is not in C1(R). Assuming (1) for the moment, we can conclude theproof of our theorem as follows. By the outer regularity of Borel measurewe can find a decreasing sequence O1 ⊃ O2 ⊃ O3 ⊃ · · · of open sets suchthat A ⊂ Oj for each j and L1(Oj)→ 0 as j →∞. Thus A ⊂ C :=

⋂∞j=1Oj

(but it could happen that A is strictly smaller than C) and L1(C) = 0. Bydefinition, Gj(t) := GOj (t) satisfies Gj(t) ≤ L1(Oj), and thus Gj(t) goesuniformly to zero as j → ∞. The right side of (1) (with O replaced byOj) therefore tends to zero as j → ∞. On the other hand, χj := χOj isbounded by 1, and χj(f(x))→ χf−1(C)(x) for every x ∈ Rn. By dominatedconvergence, the left side of (1) converges to

∫Ω φχf−1(C)∇f , and this equals

zero for every φ ∈ D(Ω). By the uniqueness of distributions, the functionχf−1(C)(x)∇f(x) = 0 for almost every x, which is what we wished to prove.

It remains to prove (1). Observe that every open set O ⊂ R is the unionof countably many disjoint open intervals. (Why?) Thus O =

⋃∞j=1 Uj

with Uj = (aj , bj). Since f is a function, f−1(Uj) is disjoint from f−1(Uk)when j 6= k. By the countable additivity of measure, therefore, it suffices toprove (1) when O is just one interval (a, b). We can easily find a sequenceχ1, χ2, χ3, . . . of continuous functions such that χj(t) → χO(t) for everyt ∈ R and 0 ≤ χj(t) ≤ 1 for every t ∈ R. The everywhere (not just almosteverywhere) convergence is crucial and we leave the simple construction ofχj to the reader. Then, with Gj =

∫ t−∞ χ

j , equation (1) is obtained bytaking the limit j → ∞ on both sides and using dominated convergence.The easy verification is again left to the reader.

• An amusing—and useful—exercise in the computation of distributionalderivatives is the computation of Green’s functions. Let y ∈ Rn, n ≥ 1, andlet Gy : Rn → R be defined by

Gy(x) = −|S1|−1 ln(|x− y|), n = 2,

Gy(x) = [(n− 2)|Sn−1|]−1|x− y|2−n, n 6= 2,(2)

150 Distributions

where |Sn−1| is the area of the unit sphere Sn−1 ⊂ Rn.

|S0| = 2, |S1| = 2π, |S2| = 4π, |Sn−1| = 2πn/2/Γ(n/2).

These are the Green’s functions for Poisson’s equation in Rn. Recallthat the Laplacian, ∆, is defined by ∆ :=

∑ni=1∂

2/∂x2i . The notation

notwithstanding, Gy(x) is actually symmetric, i.e., Gy(x) = Gx(y).

6.20 THEOREM (Distributional Laplacian of Green’sfunctions)

In the sense of distributions,

−∆Gy = δy, (1)

where δy is Dirac’s delta measure at y (often written as δ(x− y)).

PROOF. To prove (1) we can take y = 0. We require

I :=∫

Rn(∆φ)G0 = −φ(0)

when φ ∈ C∞c (Rn). Since G0 ∈ L1loc(R

n), it suffices to show that

−φ(0) = limr→0

I(r),

whereI(r) :=

∫|x|>r

∆φ(x)G0(x) dx.

We can also restrict the integration to |x| < R for some R since φ hascompact support. However, when |x| > 0, G0 is infinitely differentiableand ∆G0 = 0. We can evaluate I(r) by partial integration, and note thatboundary integrals at |x| = R vanish. Thus, denoting the set x : r ≤ |x| ≤R by A,

I(r) =∫A

(∆φ)G0 = −∫A∇φ · ∇G0 +

∫|x|=r

G0∇φ · ν

= −∫|x|=r

φ∇G0 · ν +∫|x|=r

G0∇φ · ν,(2)

Sections 6.20–6.21 151

where ν is the unit outward normal to A. On the sphere |x| = r, we have∇G0 · ν = |Sn−1|−1r−n+1, and therefore the penultimate integral in (2) is

−∫|x|=r

φ∇G0 · ν = −|Sn−1|−1

∫Sn−1

φ(rω) dω,

which converges to −φ(0) as r → 0, since φ is continuous. The last integralin (2) converges to zero as r → 0 since ∇φ · ν is bounded by some constant,while

∣∣|x|n−1G0(x)∣∣ < |x|1/2 for small |x|. Thus, (1) has been verified.

6.21 THEOREM (Solution of Poisson’s equation)

Let f ∈ L1loc(R

n), n ≥ 1. Assume that for almost every x the functiony 7→ Gy(x)f(y) is summable (here, Gy is Green’s function given before 6.20)and define the function u : Rn → C by

u(x) =∫

RnGy(x)f(y) dy. (1)

Then u satisfies:u ∈ L1

loc(Rn), (2)

−∆u = f in D′(Rn). (3)

Moreover, the function u has a distributional derivative that is a func-tion; it is given, for almost every x, by

∂iu(x) =∫

Rn(∂Gy/∂xi)(x)f(y) dy. (4)

When n = 3, for example, the partial derivative is

(∂Gy/∂xi)(x) = − 14π|x− y|−3(xi − yi). (5)

REMARKS. (1) A trivial consequence of the theorem is that Rn can bereplaced by any open set Ω ⊂ Rn. Suppose f ∈ L1

loc(Ω) and y 7→ Gy(x)f(y)is summable over Ω for almost every x ∈ Ω. Then (see Exercises)

u(x) :=∫

ΩGy(x)f(y) dy (6)

152 Distributions

is in L1loc(Ω) and satisfies

−∆u = f in D′(Ω). (7)

(2) The summability condition in Theorem 6.21 is equivalent to thecondition that the function wn(y)f(y) is summable. Here

wn(y) =

(1 + |y|)2−n, n ≥ 3,

ln(1 + |y|), n = 2,

|y|, n = 1.(8)

The easy proof of this equivalence is left to the reader as an exercise. (Itproceeds by decomposing the integral in (1) into a ball containing x, andits complement in Rn. The contribution from the ball is easily shown to befinite for almost every x in the ball, by Fubini’s theorem.)

(3) It is also obvious that any solution to equation (7) has the form u+h,where u is defined by (6) and where ∆h = 0. Hence h is a harmonic functionon Ω (see Sect. 9.3). Since harmonic functions are infinitely differentiable(Theorem 9.4), it follows that every solution to (7) is in Ck(Ω) if and onlyif u ∈ Ck(Ω).

PROOF. To prove (2) it suffices to prove that IB :=∫B |u| < ∞ for each

ball B ⊂ Rn. Since |u(x)| ≤∫

Rn |Gy(x)f(y)| dy, we can use Fubini’s theoremto conclude that

I ≤∫

RnHB(y)|f(y)|dy with HB(y) =

∫B|Gy(x)| dx.

It is easy to verify (by using Newton’s Theorem 9.7, for example) that ifB has center x0 and radius R, then HB(y) = |B||Gy(x0)| for |y − x0| ≥ Rfor n 6= 2 and HB(y) = |B||Gy(x0)| when |y − x0| ≥ R + 1 when n = 2 (inorder to keep the logarithm positive). Moreover, HB(y) is bounded when|y − x0| < R. From this observation it follows easily that IB < ∞. (Note:Fubini’s theorem allows us to conclude both that u is a measurable functionand that this function is in L1

loc(Rn).)

To verify (3) we have to show that

−∫u∆φ =

∫fφ (9)

Sections 6.21–6.22 153

for each φ ∈ C∞c (Rn). We can insert (1) into the left side of (9) and useFubini’s theorem to evaluate the double integral. But Theorem 6.20 statesthat −

∫Rn ∆φ(x)Gy(x) dx = φ(y), and this proves (9).

To prove (4) we begin by verifying that the integral in (4) (call it Vi(x))is well defined for almost every x ∈ Rn. To see this note that |(∂Gy/∂xi)(x)|is bounded above by c|x−y|1−n, which is in L1

loc(Rn). The finiteness of Vi(x)

follows as in Remark (2) above. Next, we have to show that∫Rn∂iφ(x)u(x) dx = −

∫Rnφ(x)Vi(x) dx (10)

for all φ ∈ C∞c (Rn). Since the function (x, y)→ (∂iφ)(x)Gy(x)f(y) is Rn×Rn

summable, we can use Fubini’s theorem to equate the left side of (10) to∫Rn

∫Rn

(∂iφ)(x)Gy(x) dxf(y) dy. (11)

A limiting argument, combined with integration by parts, as in 6.20(2),shows that the inner integral in (11) is

−∫

Rnφ(x)∂Gy/∂xi)(x) dx

for every y ∈ Rn. Applying Fubini’s theorem again, we arrive at (4).

• The next theorem may seem rather specialized, but it is useful inconnection with the potential theory in Chapter 9. Its proof (which doesnot use Lebesgue measure) is an important exercise in measure theory. Weshall leave a few small holes in our proof that we ask the reader to fill in asfurther exercises. Among other things, this theorem yields a construction ofLebesgue measure (Exercise 5).

6.22 THEOREM (Positive distributions are measures)

Let Ω ⊂ Rn be open and let T ∈ D′(Ω) be a positive distribution (meaningthat T (φ) ≥ 0 for every φ ∈ D(Ω) such that φ(x) ≥ 0 for all x). We denotethis fact by T ≥ 0.

Our assertion is that there is then a unique, positive, regular Borel mea-sure µ on Ω such that µ(K) <∞ for all compact K ⊂ Ω and such that forall φ ∈ D(Ω)

T (φ) =∫

Ωφ(x)µ( dx). (1)

154 Distributions

Conversely, any positive Borel measure with µ(K) < ∞ for all compactK ⊂ Ω defines a positive distribution via (1).

REMARK. The representation (1) shows that a positive distribution canbe extended from C∞c (Ω)-functions to a much larger class, namely the Borelmeasurable functions with compact support in Ω. This class is even largerthan the continuous functions of compact support, Cc(Ω).

The theorem amounts to an extension, from Cc(Ω)-functions to C∞c (Ω)-functions, of what is known as the Riesz–Markov representation theo-rem. See [Rudin, 1987].

PROOF. In the following, all sets are understood to be subsets of Ω. Fora given open set O denote by C(O) the set of all functions φ ∈ C∞c (Ω) with0 ≤ φ(x) ≤ 1 and suppφ ⊂ O. Clearly, this set is not empty. (Why?) Nextwe define for any open set O

µ(O) = supT (φ) : φ ∈ C(O). (2)

For the empty set ∅ we set µ(∅) = 0. The nonnegative set function µ hasthe following properties:

(i) µ(O1) ≤ µ(O2) if O1 ⊂ O2,

(ii) µ(O1 ∪ O2) ≤ µ(O1) + µ(O2),

(iii) µ (⋃∞i=1Oi) ≤

∑∞i=1 µ(Oi) for every countable family of open sets Oi.

Property (i) is evident. The second property follows from the followingfact (F) whose proof we leave as an exercise for the reader:

(F) For any compact set K and open sets O1,O2 such that K ⊂ O1∪O2

there exist functions φ1 and φ2, both C∞ in a neighborhood O of K, suchthat φ1(x) + φ2(x) = 1 for x ∈ K and φ · φ1 ∈ C∞c (O1), φ · φ2 ∈ C∞c (O2) forany function φ ∈ C∞c (O).

Thus, any φ ∈ C(O1∪O2) can be written as φ1 +φ2 with φ1 ∈ C(O1) andφ2 ∈ C(O2). Hence T (φ) = T (φ1) + T (φ2) ≤ µ(O1) + µ(O2) and property(ii) follows. By induction we find that

µ

(m⋃i=1

Oi

)≤

m∑i=1

µ(Oi).

To see property (iii) pick φ ∈ C (⋃∞i=1Oi). Since φ has compact support, we

have that φ ∈ C(⋃

i∈I Oi)

where I is a finite subset of the natural numbers.

Section 6.22 155

Hence, by the above,

T (φ) ≤ µ

(⋃i∈IOi

)≤∑i∈I

µ(Oi) ≤∞∑i=1

µ(Oi),

which yields property (iii).For every set A define

µ(A) = infµ(O) : O open, A ⊂ O. (3)

The reader should not be confused by this definition. We have defineda set function, µ, that measures all subsets of Ω, but only for a specialsubcollection will this function be a measure, i.e., be countably additive.This set function µ will now be shown to have the properties of an outermeasure, as defined in Theorem 1.15 (constructing a measure from an outermeasure), i.e.,

(a) µ(∅) = 0 ,(b) µ(A) ≤ µ(B) if A ⊂ B,(c) µ (

⋃∞i=1Ai) ≤

∑∞i=1 µ(Ai) for every countable collection of sets A1,

A2, . . . .The first two properties are evident. To prove (c) pick open sets O1,

O2, . . . with Ai ⊂ Oi and µ(Oi) ≤ µ(Ai) + 2−iε for i = 1, 2, . . . . Now

µ

( ∞⋃i=1

Ai

)≤ µ

( ∞⋃i=1

Oi

)≤∞∑i=1

µ(Oi)

by (b) and (iii), and hence

µ

( ∞⋃i=1

Ai

)≤∞∑i=1

µ(Ai) + ε,

which yields (c) since ε is arbitrary. By Theorem 1.15 the sets A such thatµ(E) = µ(E ∩ A) + µ(E ∩ Ac) for every set E form a sigma-algebra, Σ, onwhich µ is countably additive.

Next we have to show that all open sets are measurable, i.e., we have toshow that for any set E and any open set O

µ(E) ≥ µ(E ∩ O) + µ(E ∩ Oc). (4)

156 Distributions

The reverse inequality is obvious. First we prove (4) in the case where E isitself open; call it V .

Pick any function φ ∈ C(V ∩O) such that T (φ) ≥ µ(V ∩O)− ε/2. SinceK := suppφ is compact, its complement, U , is open and contains Oc. Pickψ ∈ C(U ∩ V ) such that T (ψ) ≥ µ(U ∩ V )− ε/2. Certainly

µ(V ) ≥ T (φ) + T (ψ) ≥ µ(V ∩ O) + µ(V ∩ U)− ε≥ µ(V ∩ O) + µ(V ∩ Oc)− ε

and since ε is arbitrary this proves (4) in the case where E is an open set.If E is arbitrary we have for any open set V with E ⊂ V that E ∩ O ⊂V ∩ O, E ∩ Oc ⊂ V ∩ Oc, and hence µ(V ) ≥ µ(E ∩ O) + µ(E ∩ Oc). Thisproves (4). Thus we have shown that the sigma-algebra Σ contains all opensets and hence contains the Borel sigma-algebra. Hence the measure µ is aBorel measure.

By construction, this measure is outer regular (see (3) above). We shownext that it is inner regular, i.e., for any measurable set A

µ(A) = supµ(K) : K ⊂ A, K compact. (5)

First we have to establish that compact sets have finite measure. Weclaim that for K compact

µ(K) = infT (ψ) : ψ ∈ C∞c (Ω), ψ(x) = 1 for x ∈ K. (6)

The set on the right side is not empty. Indeed for K compact and K ⊂ Oopen there exists a C∞c -function ψ such that suppψ ⊂ O and ψ := 1 onK. (Such a ψ was constructed in Exercise 1.15 without the aid of Lebesguemeasure.)

Now (6) follows from the following fact which we ask the reader to proveas an exercise: µ(K) ≤ T (ψ) for any ψ ∈ C∞c (Ω) with ψ ≡ 1 on K. Giventhis fact, choose ε > 0 and choose O open such µ(K) ≥ µ(O) − ε. Alsopick ψ ∈ C∞c (Ω) with suppψ ⊂ O and ψ ≡ 1 on K. Then µ(K) ≤ T (ψ) ≤µ(O) ≤ µ(K) + ε. This proves (6).

It is easy to see that for ε > 0 and every measurable set A with µ(A) <∞there exists an open set O with A ⊂ O and µ(O ∼ A) < ε. Using thefact that Ω is a countable union of closed balls, the above holds for anymeasurable set, i.e., even if A does not have finite measure. We ask thereader to prove this.

Section 6.22 157

For ε > 0 and a measurable set A we can find O with Ac ⊂ O such thatµ(O ∼ (Ac)) < ε. But

O ∼ (Ac) = O ∩A = A ∼ (Oc)

and Oc is closed. Thus for any measurable set A and ε > 0 one can find aclosed set C such that C ⊂ A and µ(A ∼ C) < ε. Since any closed set in Rn

is a countable union of compact sets, the inner regularity is proven.Next we prove the representation theorem. The integral

∫Ω φ(x)µ( dx)

defines a distribution R on D(Ω). Our aim is to show that T (φ) = R(φ) forall φ ∈ C∞c (Ω). Because φ = φ1 − φ2 with φ1,2 ≥ 0 and φ1,2 ∈ C∞c (Ω) (asExercise 1.15 shows), it suffices to prove this with the additional restrictionthat φ ≥ 0. As usual, if φ ≥ 0,

R(φ) =∫ ∞

0m(a) da = lim

n→∞

1n

∑j≥1

m(j/n) (7)

where m(a) = µ(x : φ(x) > a). The integral in (7) is a Riemann integral;it always makes sense for nonnegative monotone functions (like m) and italways equals the rightmost expression in (7). For each n, the sum in (7)has only finitely many terms, since φ is bounded.

For n fixed we define compact sets Kj , j = 0, 1, 2, . . . , by setting K0 =suppφ and Kj = x : φ(x) ≥ j/n for j ≥ 1. Similarly, denote by Oj theopen sets x : φ(x) > j/n for j = 1, 2, . . . . Let χj and χj denote thecharacteristic functions of Kj and Oj . Then, as is easily seen,

1n

∑j≥1

χj < φ <1n

∑j≥0

χj .

Since φ has compact support, all the sets have finite measure by (6).For ε > 0 and j = 0, 1, . . . pick Uj open such that Kj ⊂ Uj and µ(Uj) ≤

µ(Kj)+ε. Next pick ψj ∈ C∞c (Rn) such that ψj ≡ 1 onKj and suppψj ⊂ Uj .We have shown above that such a function exists. Obviously φ ≤ 1

n

∑j≥0 ψj

and hence

T (φ) ≤ 1n

∑j≥0

T (ψj) ≤1n

∑j≥0

µ(Uj) ≤1n

∑j≥0

µ(Kj) + ε.

By the inner regularity we can find, for every open set Oj of finite measure,a compact set Cj ⊂ Oj such that µ(Cj) ≥ µ(Oj) − ε and, in the same

158 Distributions

fashion as above, conclude that T (φ) ≥ 1n

∑j≥1 µ(Oj) − ε. Since ε > 0 is

arbitrary,1n

∑j≥1

µ(Oj) ≤ T (φ) ≤ 1n

∑j≥0

µ(Kj).

By noting that Kj ⊂ Oj−1 for j ≥ 1, we have

1n

∑j≥1

m(j/n) ≤ T (φ) ≤ 1n

∑j≥1

m(j/n) +2nµ(K0),

which proves the representation theorem. The uniqueness part is left to thereader.

Exercises forChapter 6

1. Fill in the details in the last paragraph of the proof of Theorem 6.19, i.e.,(a) Construct the sequence χj that converges everywhere to χ(interval);(b) Complete the dominated convergence argument.

2. Verify the summability condition in Remark (2), equation (8) of Theorem6.21.

3. Prove fact (F) in Theorem 6.22.

4. Prove that for K compact, µ(K)(defined in 6.22(3)

)satisifies µ(K) ≤

T (ψ) for ψ ∈ C∞c (Ω) and ψ ≡ 1 on K.

5. Notice that the proof of Theorem 6.22 (and its antecedents) used only theRiemann integral and not the Lebesgue integral. Use the conclusion ofTheorem 6.22 to prove the existence of Lebesgue measure. See Sect. 1.2.

6. Prove that the distributional derivative of a monotone nondecreasingfunction on R is a Borel measure.

7. Let NT be the null-space of a distribution, T . Show that there is afunction φ0 ∈ D so that every element φ ∈ D can be written as φ =

Section 6.22–Exercises 159

λφ0 + ψ with ψ ∈ NT and λ ∈ C. One says that the null-space NT has‘codimension one’.

8. Show that a function f is in W 1,∞(Ω) if and only if f = g a.e. whereg is a function that is Lipschitz continuous on Ω, i.e., there exists aconstant C such that

|g(x)− g(y)| ≤ C|x− y| for all x, y ∈ Ω.

9. Verify Remark (1) in Theorem 6.21 that in this theorem Rn can be re-placed by any open subset of Rn.

Documents

Distributions - gatech.edu