Variational principle

László Erdős

Dec 21, 2010

1 Introduction
The variational principle is a fundamental method to find solutions
to partial differential equa- tions (PDE). It relies on the idea
that many PDE’s originate from a basic physical principle that the
state of the system is determined by minimizing its energy. Not
every PDE is suitable for variational solutions, but many PDE’s
with physical origin are.
The primary example is the solution to the eigenvalue equation

(−Δ + V)ψ = Eψ

of the Schrödinger operator H = −Δ + V, but many ideas presented here are applicable in a broader context.
The ground state energy of the system was defined as

E_0 = inf{E(ψ) : ‖ψ‖_2 = 1},   E(ψ) = ∫ |∇ψ|² + ∫ V|ψ|²

Modulo (nontrivial) technicalities about operator domains, we have E(ψ) = ⟨ψ, Hψ⟩, thus

E_0 = inf{⟨ψ, Hψ⟩ : ‖ψ‖ = 1} = inf_{ψ≠0} ⟨ψ, Hψ⟩/‖ψ‖²
This is exactly the same definition as one has in finite dimensions for the lowest eigenvalue of an N by N hermitian matrix H = H*. In that case, we have the spectral theorem (unitary diagonalization), i.e.

H = ∑_{j=1}^N λ_j v_j v_j*
with λ_1 ≤ λ_2 ≤ … ≤ λ_N being the eigenvalues (counted with multiplicity) and v_j ∈ C^N being the normalized eigenvectors. Then for any x ∈ C^N

⟨x, Hx⟩ = ∑_{j=1}^N λ_j |⟨v_j, x⟩|² ≥ λ_1 ∑_{j=1}^N |⟨v_j, x⟩|² = λ_1 ‖x‖²

(in the last step using that {v_j} is an orthonormal basis). Thus

inf_{x≠0} ⟨x, Hx⟩/‖x‖² ≥ λ_1

and by setting x = v_1 it is easy to see that this lower bound can be achieved, thus

inf_{x≠0} ⟨x, Hx⟩/‖x‖² = λ_1
This is the variational characterization of the lowest eigenvalue.
In finite dimensions the infimum is always achieved (big difference
between finite and infinite dimensions!) and it defines the
corresponding eigenvector. It may happen that the minimizer is not
unique (even after taking into account the trivial change v1 →
(const.)v1), since the eigenspace of λ1 could be degenerate.
However, as long as the minimum is achieved, we can use variational
principle to find eigenvectors as well.
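In finite dimensions this characterization is easy to test numerically. A minimal sketch (assuming NumPy; the random hermitian matrix, its size, and the seed are arbitrary illustrations): every vector's Rayleigh quotient lies above λ_1, and the infimum is attained at the eigenvector v_1.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / 2                      # hermitian matrix H = H*

def rayleigh(H, x):
    """Rayleigh quotient <x, Hx> / ||x||^2 (real for hermitian H)."""
    return (x.conj() @ H @ x).real / (x.conj() @ x).real

evals, evecs = np.linalg.eigh(H)              # eigenvalues in ascending order
lam1, v1 = evals[0], evecs[:, 0]

# every nonzero x gives a Rayleigh quotient >= lam1 ...
quotients = [rayleigh(H, rng.standard_normal(N) + 1j * rng.standard_normal(N))
             for _ in range(500)]
assert min(quotients) >= lam1 - 1e-10

# ... and the infimum is attained at the eigenvector v1
assert abs(rayleigh(H, v1) - lam1) < 1e-10
```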
The key idea is that instead of solving the eigenvalue equation

H v_1 = λ_1 v_1

directly, one solves the minimization problem

inf_{x≠0} ⟨x, Hx⟩/‖x‖²
The latter is mathematically and numerically easier to handle (many
numerical methods for eigenvalues rely on this). It also gives rise
to estimates on λ_1. Upper bounds on λ_1 are easy; one just needs a good guess for x, since

λ_1 ≤ ⟨x, Hx⟩
holds for any normalized x. Lower bounds are typically much harder,
since somehow one has to test all possible vectors x.
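As an illustration of such an upper bound, here is a sketch (assuming NumPy; the finite-difference discretization of −d²/dx² + x², the grid, and the Gaussian trial vector are all arbitrary choices): the trial vector's Rayleigh quotient lies above the lowest matrix eigenvalue, which in turn approximates the continuum ground state energy 1 of the harmonic oscillator.

```python
import numpy as np

# 1D Schrodinger operator -d^2/dx^2 + x^2 (harmonic oscillator),
# discretized by second-order finite differences on [-10, 10]
M, L = 1000, 10.0
x = np.linspace(-L, L, M)
dx = x[1] - x[0]
lap = (np.diag(np.full(M, -2.0)) + np.diag(np.ones(M - 1), 1)
       + np.diag(np.ones(M - 1), -1)) / dx**2
H = -lap + np.diag(x**2)

lam1 = np.linalg.eigvalsh(H)[0]               # continuum value is 1

# any normalized trial vector gives an upper bound on lam1;
# here a (deliberately slightly wrong) Gaussian guess exp(-0.4 x^2)
trial = np.exp(-0.4 * x**2)
trial /= np.sqrt(dx) * np.linalg.norm(trial)  # ||trial||_2 = 1 on the grid
upper = dx * trial @ H @ trial                # <trial, H trial>

assert upper >= lam1 - 1e-9                   # upper bound, as claimed
assert abs(lam1 - 1.0) < 1e-2                 # close to the continuum value
```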
What about higher eigenvalues? They are also given by a variational
principle. For example, for the second lowest eigenvalue
λ_2 = inf{⟨x, Hx⟩ : ‖x‖ = 1, ⟨x, v_1⟩ = 0}
i.e. it minimizes the quadratic form on the subspace orthogonal to
the first eigenvector. It may happen that the lowest eigenvalue is
degenerate, in this case λ2 = λ1 (we will use the convention that
we list eigenvalues with their multiplicities).
Exercise 1.1 Using the spectral decomposition of H, prove that the
k-th lowest eigenvalue is given by
λ_k = inf{⟨x, Hx⟩ : ‖x‖ = 1, ⟨x, v_j⟩ = 0 ∀ j ≤ k − 1}

and that the infimum is achieved for x = v_k.
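The constrained characterization can itself be checked numerically: restricting H to an orthonormal basis of the orthogonal complement of v_1 and minimizing there recovers λ_2 (a sketch assuming NumPy; matrix and seed arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 7
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / 2
evals, evecs = np.linalg.eigh(H)
v1 = evecs[:, 0]

# orthonormal basis Q of the subspace {x : <x, v1> = 0}:
# put v1 first, fill with random columns, orthogonalize by QR
M = np.column_stack([v1, rng.standard_normal((N, N - 1))
                         + 1j * rng.standard_normal((N, N - 1))])
Q = np.linalg.qr(M)[0][:, 1:]                 # columns orthogonal to v1

# lowest eigenvalue of H restricted to that subspace equals lambda_2
lam2 = np.linalg.eigvalsh(Q.conj().T @ H @ Q)[0]
assert abs(lam2 - evals[1]) < 1e-10
```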
Note that the variational principle can be used only for hermitian (or, in the real case, symmetric) matrices.
2 Domination of the kinetic energy
Let us summarize the consequences of the Sobolev inequality for the ground state energy problem of −Δ + V, i.e. for

E(ψ) = ∫ |∇ψ|² + ∫ V|ψ|²    (2.1)
Introduce the notation
L^p + L^q = {f : ∃ f_1 ∈ L^p, f_2 ∈ L^q, f = f_1 + f_2}
i.e. this is the set of functions that can be written as a sum of
an Lp and an Lq function. The decomposition is not unique.
Theorem 2.1 Let the negative part of the potential satisfy

V_− ∈ L^{d/2} + L^∞, if d ≥ 3
V_− ∈ L^{1+ε} + L^∞, if d = 2
V_− ∈ L^1 + L^∞, if d = 1

for some ε > 0. Then there are finite constants C and D, depending only on V, such that

E(ψ) ≥ −C‖ψ‖_2²

(i.e. E_0 > −∞) and

∫ |∇ψ|² ≤ 2E(ψ) + D‖ψ‖_2²    (2.2)
i.e. the kinetic energy is dominated by the total energy.
Proof. For simplicity we work in d ≥ 3 dimensions; it is an easy exercise to modify the proof for d = 1, 2. We have

E(ψ) ≥ ∫ |∇ψ|² − ∫ V_−|ψ|²    (2.3)
Write V_− = V_1 + V_2, with V_1 ∈ L^∞, V_2 ∈ L^{d/2}. Note that one can choose the decomposition such that ‖V_2‖_{d/2} is arbitrarily small. To see this, write
V_2 = V_2 · 1{x : |V_2(x)| ≤ M} + V_2 · 1{x : |V_2(x)| > M}

for any M > 0 (here 1{A} is the characteristic function of the set A). The first term is in L^∞, so one can add it to the V_1 part for any fixed M. The second term converges to zero in L^{d/2} by dominated convergence, therefore by choosing M sufficiently large, the L^{d/2}-norm of the second term can be made arbitrarily small.
Now by the Hölder and Sobolev inequalities

∫ V_−|ψ|² ≤ ‖V_1‖_∞‖ψ‖_2² + ∫ |V_2||ψ|²
         ≤ ‖V_1‖_∞‖ψ‖_2² + ‖V_2‖_{d/2} ‖ψ‖²_{2d/(d−2)}
         ≤ ‖V_1‖_∞‖ψ‖_2² + C‖V_2‖_{d/2} ∫ |∇ψ|²
         ≤ ‖V_1‖_∞‖ψ‖_2² + (1/2) ∫ |∇ψ|²    (2.4)

where C = C_d is the Sobolev constant, and where we used that ‖V_2‖_{d/2} is sufficiently small (smaller than 1/2C). Thus, from (2.3) we have

E(ψ) ≥ (1/2) ∫ |∇ψ|² − ‖V_1‖_∞‖ψ‖_2²

which proves both statements in the theorem.
3 Minimizing sequence
Now we would like to know if the infimum in (2.1) is achieved, i.e.
if there is a minimizer (in this case, this will be called ground
state). A more refined question: what are the properties of the
ground state? How regular is it? For example, the ground state of
the Hydrogen atom was e−Z|x|/2, i.e. it is Lipschitz continuous,
but it fails to be once continuously differentiable. This
singularity comes from the singularity of the Coulomb potential,
Z
|x| , but notice that the
singularity in the eigenfunction is much less severe that the
singularity in the potential (which is even not bounded, let alone
continuous). This “regularizing effect” is a general feature of
Schrödinger type equations (actually for a much bigger class of
so-called elliptic equations) and it is due to the presence of the
Laplacian (as the highest order derivative term in the equation).
The phenomenon goes under the name of elliptic regularity. We will
not have time to discuss it, but it is a very important feature of
PDE’s.
How to find a minimizer to (2.1)? Simply take a sequence ψ_n of normalized wavefunctions, ‖ψ_n‖ = 1, such that

E(ψ_n) → E_0;

since E_0 was an infimum, such a minimizing sequence always exists. The natural guess is then to look for the limit of ψ_n, i.e. to ask whether there is a ψ such that

ψ_n → ψ    (3.5)
and check that the map ψ ↦ E(ψ) is continuous, i.e. whether

E(ψ_n) → E(ψ)    (3.6)
Both of these steps are problematic in infinite dimensional spaces
(like H1). First, the limit may not exist. Second, the functional
E(ψ) (as a map from, say, H1 to R) may not be continuous.
Recall an analogous statement from real analysis, namely, that if f
: K → R is defined on a compact set K ⊂ Rd and it is continuous,
then
inf_K f

is attained and it can be found by the following procedure. Choose a minimizing sequence, x_n ∈ K, such that

f(x_n) → inf_K f,    (3.7)

since K is compact, by Bolzano–Weierstrass one can select a convergent subsequence, x_{n_k} → x, whose limit is also in K, and then by continuity of f we have f(x_{n_k}) → f(x), which, together with (3.7), means that x is a minimizer.
The key ingredient here was compactness. Our functional E(ψ) is
defined on the whole H1, which is not compact. So the first
question is, how to modify the argument above for functions defined
on the whole Rd. We certainly need some extra condition, since in
general the infimum of a continuous function on the whole R is not
attained (e.g. f(x) = (1+x2)−1). We need some weak condition on the
behaviour of f at infinity. It is sufficient to assume that
lim inf_{|x|→∞} f(x) > inf f

or, in other words, that there exists a constant M > inf f such that the level set

S_M := {x : f(x) ≤ M}
is a bounded subset of R^d. Now we can extend the argument above: we choose a minimizing sequence so that f(x_n) → inf f. Since M > inf f, we can assume that x_n ∈ S_M. Since f is continuous, the level set S_M is closed, and by assumption it is bounded, thus S_M is compact. Then we can use the previous argument with K = S_M.
This argument used the finite dimensionality in an essential way,
namely that a bounded and closed set is compact. In other words,
that from a bounded sequence one can always choose a convergent
subsequence. This does not hold in infinite dimensions. Here is an example. Let ψ_n(x) = e^{2πinx} in the Hilbert space L²[0, 1]. Clearly {ψ_n} is a bounded sequence (it is bounded in any L^p[0, 1]) with norm 1, but it has no convergent subsequence (HOMEWORK: check!)
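A numerical illustration of this homework (not a proof): on a uniform grid the pairwise L²[0,1]-distances of the ψ_n can be computed exactly for these trigonometric functions, and ‖ψ_n − ψ_m‖² = 2 for all n ≠ m, so no subsequence can be Cauchy.

```python
import numpy as np

M = 4096
x = np.arange(M) / M                          # uniform grid on [0, 1)

def psi(n):
    return np.exp(2j * np.pi * n * x)

def l2_dist_sq(f, g):
    # grid mean = exact integral over [0,1] for these trigonometric functions
    return float(np.mean(np.abs(f - g) ** 2))

# all pairwise squared distances equal 2, i.e. the points sit sqrt(2) apart
dists = [l2_dist_sq(psi(n), psi(m)) for n in range(1, 7) for m in range(n + 1, 8)]
assert all(abs(d - 2.0) < 1e-9 for d in dists)

# and each psi_n has norm 1, so the sequence is indeed bounded
assert abs(l2_dist_sq(psi(3), 0 * x) - 1.0) < 1e-9
```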
[Remark for mathematicians: whenever we say compactness, we really
mean the concept of sequential compactness. These two concepts
coincide in all spaces we discuss; there is a deviation for
topological spaces with uncountable basis of neighborhoods, i.e. in
spaces where the topology is not determined by convergence of
sequences.]
The situation will be remedied by changing the topology, i.e. by
redefining what we mean by convergence in (3.5). Since E(ψ) was
defined on H1, it was natural to work in the topology determined by
the H1-norm. Alternatively, one could even come to the idea to work
with the other natural norm available, i.e. the L2 norm. But none
of them is good, bounded sequences may have no convergent
subsequence.
We want to weaken the topology to give more chance to a subsequence
of the minimizing sequence to converge. A weaker topology means there are fewer open sets, so it is easier to converge. But one cannot overdo it; in a weaker topology, there are fewer functions that are continuous (recall, continuity of a function(al) from a
topological space X to R means that all level sets {x ∈ X : f(x)
< M} are open for any M ∈ R). So in a too strong topology (3.5)
may fail, in a too weak topology, (3.6) may fail. The goal is to
find an intermediate one, and this will be the weak
convergence.
Since the concept of weak convergence is discussed in sufficient detail in Lieb-Loss, Sections 2.9, 2.10, 2.11, 2.12, 2.13 and 2.18, I will not write it up – read these sections. Although it can be defined for general normed spaces, we will need it only for separable Hilbert spaces (like H¹) and for L^p spaces (1 ≤ p ≤ ∞). Recall that a sequence ψ_j in a normed space X converges weakly to ψ ∈ X, in notation ψ_j ⇀ ψ, if ℓ(ψ_j) → ℓ(ψ) for any bounded linear functional ℓ on X, i.e. for any ℓ ∈ X* (dual space).
In our main examples: the dual space of L^p is L^q for 1 ≤ p < ∞ (with 1/p + 1/q = 1). The dual space of L^∞ contains L¹ but it is considerably bigger. The dual space of a Hilbert space is itself (with the appropriate identification of vectors with linear functionals, following the Riesz representation theorem).
We will need the following facts:

• Weak convergence is indeed weaker than the usual (norm or strong) convergence.

• Weak convergence still separates points, i.e. the limit is unique (if it exists).

• The norm is lower semicontinuous, i.e. it may drop along the weak limit but never increase:

f_j ⇀ f  =⇒  ‖f‖ ≤ lim inf_j ‖f_j‖    (3.8)

• Weakly convergent sequences are bounded. Even more generally, if the sequence ψ_j satisfies that ℓ(ψ_j) is a bounded numerical sequence for any bounded linear functional ℓ, then sup_j ‖ψ_j‖ < ∞ [uniform boundedness principle].
• Mazur's theorem: If f_j ∈ L^p converges weakly to f ∈ L^p, then there is a convex combination of the f_j that converges strongly, i.e. there are coefficients c_{jk} ≥ 0, 1 ≤ k ≤ j, with ∑_{k=1}^j c_{jk} = 1, such that

F_j := ∑_{k=1}^j c_{jk} f_k → f

strongly.
• Banach-Alaoglu theorem: Let fj be a bounded sequence in Lp with 1
< p < ∞ or in a separable Hilbert space. Then fj has a weakly
convergent subsequence.
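For the sequence ψ_n(x) = e^{2πinx} from above one does not even need to pass to a subsequence: the whole sequence converges weakly to 0 in L²[0,1], since ⟨g, ψ_n⟩ → 0 for every g (Riemann–Lebesgue). A numerical sketch with the arbitrary choice g(x) = x, for which integration by parts gives |⟨g, ψ_n⟩| = 1/(2πn):

```python
import numpy as np

M = 200_000
x = (np.arange(M) + 0.5) / M                  # midpoint rule on [0, 1]

def inner(g, f):
    # approximates the L^2[0,1] inner product  ∫ conj(g) f
    return np.mean(np.conj(g) * f)

g = x                                         # a fixed L^2[0,1] function
for n in range(1, 9):
    val = inner(g, np.exp(2j * np.pi * n * x))
    # exact value by integration by parts: |<g, psi_n>| = 1/(2 pi n) -> 0
    assert abs(abs(val) - 1 / (2 * np.pi * n)) < 1e-6
```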
The only remark is that Theorem 2.18 is for Lp spaces, but it can
be immediately extended to separable Hilbert spaces, see the remark
at the end of Section 2.21.
4 Existence of the ground state
I will follow Theorem 11.5 of Lieb-Loss, but supplement with some
more details.
Theorem 4.1 Let the potential V : R^d → R satisfy the conditions

V ∈ L^{d/2} + L^∞, if d ≥ 3
V ∈ L^{1+ε} + L^∞, if d = 2
V ∈ L^1 + L^∞, if d = 1

for some ε > 0 [unlike in Theorem 2.1, we make these assumptions not only on the negative part of V]. Assume that V vanishes at infinity in the sense that for any a > 0 we have

|{x : |V(x)| > a}| < ∞

Let E(ψ) = ∫ |∇ψ|² + ∫ V|ψ|² for any ψ ∈ H¹ as before and let

E_0 = inf{E(ψ) : ψ ∈ H¹, ‖ψ‖ = 1}

be the ground state energy. By Theorem 2.1, E_0 > −∞, and it is easy to see [HOMEWORK!] that E_0 ≤ 0. Assume that

E_0 < 0    (4.9)

Then there exists ψ ∈ H¹, ‖ψ‖_2 = 1, such that E(ψ) = E_0, i.e. there is a ground state. Moreover, the ground state satisfies the Schrödinger equation

−Δψ + Vψ = E_0 ψ    (4.10)

in a weak sense, i.e. for any φ ∈ C_0^∞ we have

⟨ψ, −Δφ⟩ + ⟨ψ, Vφ⟩ = E_0 ⟨ψ, φ⟩
Proof. Choose a minimizing sequence ψ_n:

E(ψ_n) → E_0,  ‖ψ_n‖_2 = 1.

By Theorem 2.1, we have

∫ |∇ψ_n|² ≤ 2E(ψ_n) + D‖ψ_n‖_2²

thus

sup_n ‖ψ_n‖_{H¹} < ∞    (4.11)
Using that H¹ is a separable Hilbert space, we can apply the Banach–Alaoglu theorem to select a weakly convergent subsequence ψ_{n_j} ⇀ ψ. For notational simplicity, we reindex this subsequence: ψ_j := ψ_{n_j}. We make three claims that we will prove later:

∫ |∇ψ|² ≤ lim inf_{j→∞} ∫ |∇ψ_j|²    (4.12)

lim_{j→∞} ∫ V|ψ_j|² = ∫ V|ψ|²    (4.13)

‖ψ‖_2 = 1    (4.14)

These imply

E(ψ) ≤ lim inf_{j→∞} E(ψ_j) = E_0

but E_0 was the infimum of all E(ψ) with ‖ψ‖_2 = 1, thus there must be equality in the above inequality: E_0 = E(ψ). Thus we have found a ground state (note that the ground state may not be unique; the constructed ψ may depend on the chosen subsequence).
Before we prove the above claims, we show that the ground state satisfies (4.10). Pick any φ ∈ C_0^∞ and set ψ_ε = ψ + εφ. Define the Rayleigh quotient:

R(ε) := E(ψ_ε) / ‖ψ_ε‖_2²

Note that R(ε) is a rational function (ratio of two quadratic polynomials in ε) that is finite at ε = 0, thus it is differentiable in a small neighborhood of 0. Since R(0) = min R(ε), we have that

0 = dR(ε)/dε |_{ε=0} = 2 Re[ ⟨∇φ, ∇ψ⟩ + ⟨φ, Vψ⟩ − E_0 ⟨φ, ψ⟩ ]

using ‖ψ‖_2 = 1 and E(ψ) = E_0. By integration by parts,

⟨∇φ, ∇ψ⟩ = ⟨−Δφ, ψ⟩

which is allowed if ψ ∈ H¹ and ∇φ ∈ H¹, which holds since φ is a smooth, compactly supported function (see, e.g. Theorem 7.7 in Lieb-Loss). Replacing φ by iφ in the previous argument, we immediately obtain that

Im[ ⟨∇φ, ∇ψ⟩ + ⟨φ, Vψ⟩ − E_0 ⟨φ, ψ⟩ ] = 0

for any φ ∈ C_0^∞, thus ψ satisfies (4.10) in a weak sense.
Proof of (4.12). This proof is almost the same as proving that the norm can only drop along the weak limit. Recall the variational characterization of the L²-norm:

‖f‖_2 = sup{|⟨f, g⟩| : g ∈ L², ‖g‖_2 = 1} = sup{|⟨f, g⟩| : g ∈ C_0^∞, ‖g‖_2 = 1}

(the first identity holds in any Hilbert space, the second one uses the fact that C_0^∞ is dense in L², THINK IT OVER!) Thus, by integration by parts (with g running over vector valued test functions),

‖∇ψ‖_2 = sup{|⟨∇ψ, g⟩| : g ∈ C_0^∞, ‖g‖_2 = 1}
       = sup{|⟨ψ, ∇·g⟩| : g ∈ C_0^∞, ‖g‖_2 = 1}
       = sup{ lim_{j→∞} |⟨ψ_j, ∇·g⟩| : g ∈ C_0^∞, ‖g‖_2 = 1}
       = sup{ lim inf_{j→∞} |⟨∇ψ_j, g⟩| : g ∈ C_0^∞, ‖g‖_2 = 1}
       ≤ lim inf_{j→∞} ‖∇ψ_j‖_2    (4.15)

[THINK OVER the last step when we interchanged sup and lim inf!], where we used that ψ_j converges to ψ weakly in L² as well.
Remark: Note that in this proof we did not use that ψ_j ⇀ ψ in H¹, just that ψ_j ⇀ ψ weakly in L². The weak convergence in L² is weaker than the weak convergence in H¹, i.e. every H¹-weakly-convergent sequence converges weakly also in L². This is because for any g ∈ L² we have

⟨ψ_j, g⟩ = ⟨ψ̂_j, ĝ⟩ = ⟨ψ̂_j, (1 + 4π²k²) · ĝ(k)/(1 + 4π²k²)⟩ = ⟨ψ_j, h⟩_{H¹} → ⟨ψ, h⟩_{H¹} = ⟨ψ, g⟩

In the middle limit we used that ψ_j weakly converges in H¹ and that h, the function whose Fourier transform is

ĥ(k) = ĝ(k)/(1 + 4π²k²),

is an H¹-function, because its H¹-norm squared is given by

∫ |ĝ(k)|²/(1 + 4π²k²) dk ≤ ‖g‖_2² < ∞
This statement holds in full generality:

Lemma 4.2 Suppose that the vector space X is equipped with two norms, ‖·‖_1 and ‖·‖_2, such that the second norm is stronger, i.e. there is a constant C such that

‖x‖_1 ≤ C‖x‖_2,  ∀x ∈ X

and (X, ‖·‖_2) is dense in (X, ‖·‖_1). Suppose that a sequence x_n converges weakly in the norm ‖·‖_2; then it converges weakly in ‖·‖_1 as well.
Proof: Let X_1 and X_2 denote the normed spaces that are the vector space X equipped with ‖·‖_1 and ‖·‖_2, respectively. Since X_2 ⊂ X_1 is a dense subset, we easily see that X_1* ⊂ X_2* (CHECK!). Thus, if x_n converges to x weakly in X_2, then ℓ(x_n) → ℓ(x) for any ℓ ∈ X_2*, but then ℓ(x_n) → ℓ(x) for any ℓ ∈ X_1* as well, i.e. x_n converges to x weakly in X_1.
Proof of (4.13). Cut off the function V as follows:

V^δ(x) = V(x) · 1{x : |V(x)| ≤ 1/δ}

Then, by the Hölder and Sobolev inequalities,

| ∫ (V − V^δ)|ψ_j|² | ≤ ‖V − V^δ‖_{d/2} ‖ψ_j‖²_{2d/(d−2)} ≤ C‖V − V^δ‖_{d/2} ‖ψ_j‖²_{H¹} → 0

as δ → 0, uniformly in j (because of (4.11)). Therefore it is enough to show [THINK IT OVER!] that

lim_{j→∞} ∫ V^δ|ψ_j|² = ∫ V^δ|ψ|²
for each fixed δ. Fix now δ, choose ε > 0 and define

A_ε = {x : |V^δ(x)| > ε}

Since V vanishes at infinity, we have |A_ε| < ∞. Write

∫ V^δ|ψ_j|² = ∫_{A_ε} V^δ|ψ_j|² + ∫_{A_ε^c} V^δ|ψ_j|²,  with  | ∫_{A_ε^c} V^δ|ψ_j|² | ≤ ε‖ψ_j‖² = ε

and similarly

| ∫_{A_ε^c} V^δ|ψ|² | ≤ ε‖ψ‖² ≤ ε

using the lower semicontinuity of the norm, ‖ψ‖ ≤ lim inf ‖ψ_j‖ = 1 (see (3.8)). Thus it is sufficient to show that

lim_{j→∞} ∫_{A_ε} V^δ|ψ_j|² = ∫_{A_ε} V^δ|ψ|²
for every fixed ε > 0, δ > 0. [Make sure you really
understand this argument! Uniformity in j in the truncation bounds
guarantees that it is sufficient to show the j → ∞ limit for each
fixed value of the truncation!]
We estimate

| ∫_{A_ε} V^δ ( |ψ_j|² − |ψ|² ) | ≤ δ^{−1} ∫_{A_ε} | |ψ_j|² − |ψ|² | ≤ δ^{−1} ( ‖ψ_j‖_2 + ‖ψ‖_2 ) ‖(ψ_j − ψ)1_{A_ε}‖_2

so it remains to show that

lim_{j→∞} ‖(ψ_j − ψ)1_{A_ε}‖_2 = 0

i.e. that a sequence that is weakly convergent in H¹ converges strongly in L² on any set of finite measure. This is the content of the Rellich-Kondrashev theorem (see Theorem 5.1 later) and this proves (4.13).
Proof of (4.14). Using (4.12) and (4.13), we know that E(ψ) ≤ lim inf E(ψ_j). Thus

E_0 = lim_{j→∞} E(ψ_j) ≥ E(ψ) ≥ E_0 ‖ψ‖_2²

where, in the last step, we used that E_0 was the infimum of E on the unit sphere, ‖ψ‖ = 1, applied to ψ/‖ψ‖. Thus

E_0 ≥ E_0 ‖ψ‖_2²

and since E_0 < 0, we get ‖ψ‖_2² ≥ 1. But ψ was the weak limit of a normalized sequence in H¹ (thus also in L²), and since the norm can only drop along the weak limit, we have

‖ψ‖_2 ≤ lim inf_{j→∞} ‖ψ_j‖_2 = 1

thus ‖ψ‖_2 = 1, which proves (4.14).
5 Rellich-Kondrashev theorem
We have seen that strong (norm) convergence is indeed stronger than
weak convergence. So how can it happen that f_n ⇀ 0 weakly in L^p (p < ∞) but f_n does not converge strongly? There are essentially
three qualitative possibilities. First is that the function
oscillates to death, second is that it scales up to a spike or down
to a flat pancake, and third is that it wanders out to infinity.
The oscillation and the spike features can occur only if the
derivative blows up (or the function wanders out to infinity in
Fourier space!). The pancake and the wandering features can occur
only in infinite volumes. Therefore it is natural to guess that if
we keep the derivative under control and we restrict ourselves to a
finite domain, then weak convergence actually implies strong
convergence. This is the content of the following important
theorem.
Theorem 5.1 (Rellich-Kondrashev) Let B ⊂ R^d be a subset with finite Lebesgue measure, |B| < ∞. Let f_n converge weakly in H¹(R^d) to f. Then

lim_{n→∞} ∫_B |f_n − f|^q = 0

i.e. χ_B f_n converges strongly (in L^q) to χ_B f. Here the exponent q must satisfy the following:

1 ≤ q < 2d/(d−2), if d ≥ 3
1 ≤ q < ∞, if d = 2
1 ≤ q ≤ ∞, if d = 1
Corollary 5.2 Let f_n converge weakly in H¹(R^d) to f. Then there is a subsequence f_{n_j} that converges to f(x) pointwise almost everywhere.
Proof of the Corollary. Consider the sequence of balls B_k centered at the origin with radius k. Recall that by the Riesz-Fischer theorem, one can select a pointwise almost everywhere convergent subsequence from a strongly convergent sequence in L^p. Thus, by Rellich-Kondrashev, we can find a subsequence f_{n_1(j)} that converges pointwise almost everywhere on B_1. Repeating the argument for the ball B_2 and this subsequence instead of the original sequence, we can choose a sub-subsequence f_{n_2(j)} that converges almost everywhere on B_2. Etc. Finally, by Cantor diagonalization, we can choose the diagonal subsequence f^{(j)} := f_{n_j(j)}, which converges for almost every x ∈ R^d.
This theorem is sometimes formulated as a compact embedding. Notice that the allowed range for q is exactly the range of Sobolev exponents in finite volume, i.e. by the Hölder and Sobolev inequalities

‖f‖_{L^q(B)} ≤ C(B, q)‖f‖_{2d/(d−2)} ≤ C′(B, q)‖f‖_{H¹}    (5.16)
This means that the restriction operator

i_B : f ↦ f|_B

(defined as (i_B f)(x) = f(x)1_B(x)), as a map from H¹ to L^q(B),

i_B : H¹ → L^q(B)

is a compact linear map, i.e. it maps bounded sets (in the H¹-topology) into relatively compact ones (in the L^q(B)-topology) [recall: a set is relatively compact if its closure is compact]. In terms of sequences this means that any bounded sequence in H¹ has a convergent subsequence in L^q(B).
To see this, suppose we have a sequence f_n such that sup_n ‖f_n‖_{H¹} < ∞, i.e. it is uniformly bounded in H¹. In particular, from the Sobolev inequality (5.16) it follows that {f_n} is a bounded subset of L^q(B), since sup_n ‖f_n‖_{L^q(B)} < ∞. However, it is even relatively compact, i.e. there is a subsequence f_{n_j} that converges in L^q(B). To see this, just recall that by the Banach–Alaoglu theorem (applied to the Hilbert space H¹) there is a subsequence f_{n_j} that converges weakly in H¹ to some function f. But then by Rellich-Kondrashev the same sequence, restricted to B, converges strongly in L^q(B).
The theorem also holds if we assume the weak convergence of f_n in H¹(B), if B is a bounded, sufficiently nice domain (e.g. with piecewise C¹ boundary without cusps) so that the Sobolev inequality holds:

‖f‖_{L^q(B)} ≤ C_q ‖f‖_{H¹(B)}
in other words,

H¹(B) ⊂ L^q(B)

The Rellich-Kondrashev theorem for this case says that this containment is actually compact, i.e. a bounded set in H¹ is a relatively compact set in L^q. In other words, the identity operator

id : H¹(B) → L^q(B)

is a compact linear map. There are similar compact embeddings for general Sobolev spaces; see Theorem 8.9 of Lieb-Loss for a more general statement. The most general formulation is given in R. Adams, J. Fournier: Sobolev Spaces, Theorem 6.3.
Recall the Arzelà–Ascoli theorem, which is of similar spirit. It characterizes compact subsets of C(K), where K ⊂ R^d is compact. For compactness it is not sufficient for a sequence {f_n} ⊂ C(K) to be simply bounded, because of the possible oscillation. But if, additionally, the sequence is equicontinuous (which, on a compact set, is equivalent to being uniformly equicontinuous), then it is relatively compact.
Proof of Theorem 5.1. We will give the proof in the d ≥ 3 case; the other two cases are analogous (HOMEWORK!).

The proof starts with a standard smoothing idea: define a function φ ∈ C_0^∞ with ∫ φ = 1 and set φ_m(y) := m^d φ(my), so that ∫ φ_m = 1 for every m; note also that ∫ |φ(y)||y| dy < ∞.

First we present the idea; for simplicity assume that q = 2, and we write

‖f_n − f‖_{L²(B)} ≤ ‖f_n − f_n∗φ_m‖_{L²(B)} + ‖f_n∗φ_m − f∗φ_m‖_{L²(B)} + ‖f∗φ_m − f‖_{L²(B)}    (5.17)
We have seen in the proof of the density of C_0^∞ in L^p that the last term (even after extending the norm from L²(B) to L² = L²(R^d)) converges to zero:

lim_{m→∞} ‖f∗φ_m − f‖_2 = 0

Similarly, the first term converges to zero as m → ∞, but this convergence is in general not uniform in n. Finally, the function in the middle term in (5.17) converges at least pointwise to zero for any fixed m as n → ∞:

(f_n∗φ_m − f∗φ_m)(x) = ∫ (f_n − f)(y) φ_m(x−y) dy → 0    (5.18)

by the weak convergence f_n ⇀ f in L², with the Cauchy–Schwarz bound |(f_n∗φ_m − f∗φ_m)(x)| ≤ ‖f_n − f‖_2 ‖φ_m‖_2. Note that the L² norm of φ_m is not uniformly bounded in m, since the scaling sets the normalization of φ_m in L¹.
Thus we have to circumvent the lack of uniformity in the double limit. Without further conditions (like boundedness in H¹) it will not work; think over what happens if f_n = e^{2πinx} in L²[0, 1]!

Since f_n weakly converges in H¹, it is in particular bounded:

sup_n ‖f_n‖_{H¹} < ∞
by the uniform boundedness principle. The boundedness in H¹ will guarantee that the convergence in the first term in (5.17), lim_{m→∞} ‖f_n − f_n∗φ_m‖_2 = 0, is uniform in n. To see this, let h ∈ R^d and f ∈ H¹, and compute (with f̂ the Fourier transform of f)

∫ |f(x+h) − f(x)|² dx = ∫ |e^{2πik·h} − 1|² |f̂(k)|² dk ≤ 4π²|h|² ∫ |k|² |f̂(k)|² dk = |h|² ‖∇f‖_2²    (5.19)

Thus

‖f − f∗φ_m‖_2 ≤ ∫ |φ_m(y)| ‖f − f(·−y)‖_2 dy ≤ ‖∇f‖_2 ∫ |φ_m(y)||y| dy    (5.20)

In the first step we used ∫ φ_m = 1 to “smuggle in” the additional f(x) inside the x-integration, together with the Minkowski (triangle) inequality ‖∑ a_n f_n‖ ≤ ∑ |a_n| ‖f_n‖, where the summation is replaced with an integral and the coefficients a_n with the weight function φ_m(y). The last step is (5.19).
Applying this inequality to f_n and φ_m, and using that sup_n ‖∇f_n‖_2 < ∞, we have

‖f_n∗φ_m − f_n‖_2 ≤ ‖∇f_n‖_2 ∫ |φ_m(y)||y| dy ≤ C/m

with some constant C, uniformly in n (note that ∫ |φ_m(y)||y| dy = m^{−1} ∫ |φ(y)||y| dy). Thus, given any ε > 0, we can find an m such that

‖f_n∗φ_m − f_n‖_2 ≤ ε/3    (5.21)

‖f∗φ_m − f‖_2 ≤ ε/3    (5.22)
i.e. the first and the third term in (5.17) are under control. For the middle term, we note that the weak convergence of f_n in H¹ also implies weak convergence in L², thus by (5.18) we have that

f_n∗φ_m(x) = ∫ f_n(y) φ_m(x−y) dy → ∫ f(y) φ_m(x−y) dy = f∗φ_m(x)

holds for all x and all m (test the weak convergence f_n ⇀ f against the L²-function y ↦ φ_m(x−y) for any fixed x). Moreover,

|f_n∗φ_m(x)| = | ∫ f_n(y) φ_m(x−y) dy | ≤ ‖f_n‖_2 ‖φ_m‖_2 ≤ C‖φ_m‖_2

with some constant C, uniformly in n. Thus, by dominated convergence and the finiteness of |B|, we have

lim_{n→∞} ‖f_n∗φ_m − f∗φ_m‖_{L²(B)} = 0    (5.23)
(this limit is not uniform in m). Combining this bound with (5.17),
(5.21), (5.22), we obtain the statement of the theorem
for q = 2. (THINK it over! For a given ε first choose a
sufficiently big m, dictated by (5.21), (5.22), then fix this big
m, and choose a sufficiently large n so that the norm in (5.23) is
smaller than ε/3).
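The key smoothing estimate (5.20), ‖f − f∗φ_m‖_2 ≤ (C/m)‖f′‖_2 with C = ∫ |φ(y)||y| dy, can be tested numerically in d = 1. A sketch (assuming NumPy; the Gaussian f, the particular bump φ, and the grid are arbitrary choices, and the discrete convolution only approximates the continuum one):

```python
import numpy as np

dx = 0.005
x = np.arange(-10, 10, dx)
f = np.exp(-x**2)                             # a concrete H^1 function
fprime = -2 * x * np.exp(-x**2)

def l2(u):
    return float(np.sqrt(np.sum(np.abs(u) ** 2) * dx))

def bump(m):
    """Grid samples of phi_m(y) = m * phi(m y), phi a bump on [-1, 1]."""
    k = int(round(1.0 / (m * dx)))
    y = np.arange(-k, k + 1) * dx             # odd length, centered at 0
    u = m * y
    val = np.where(np.abs(u) < 1,
                   np.exp(-1.0 / np.maximum(1 - u**2, 1e-12)), 0.0)
    return y, val / (np.sum(val) * dx)        # normalized: ∫ phi_m = 1

y1, phi1 = bump(1)
C = float(np.sum(np.abs(phi1) * np.abs(y1)) * dx)   # C = ∫ |phi(y)| |y| dy

errs = []
for m in [2, 4, 8]:
    ym, phim = bump(m)
    conv = np.convolve(f, phim, mode='same') * dx   # f * phi_m on the grid
    err = l2(f - conv)
    # bound (5.20): ||f - f*phi_m||_2 <= (C/m) ||f'||_2  (small grid slack)
    assert err <= (C / m) * l2(fprime) + 1e-6
    errs.append(err)

assert errs[0] > errs[1] > errs[2]            # error shrinks as m grows
```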
We have thus proved the Rellich-Kondrashev theorem for q = 2. For 1 ≤ q < 2 we simply use Hölder’s inequality and the fact that B has finite measure:

‖f_n − f‖_{L^q(B)} ≤ C(B, q)‖f_n − f‖_{L²(B)} → 0

Finally, for 2 < q < 2d/(d−2) we use the Hölder and Sobolev inequalities:

‖f_n − f‖_q ≤ ‖f_n − f‖_2^θ ‖f_n − f‖_{2d/(d−2)}^{1−θ} ≤ C‖f_n − f‖_2^θ ( ‖∇f_n‖_2 + ‖∇f‖_2 )^{1−θ} ≤ C‖f_n − f‖_2^θ → 0

by using the uniform boundedness of ‖∇f_n‖_2. Here the exponent θ is dictated by the Hölder inequality (check that θ = d(1/q − (d−2)/(2d)) > 0 will do the job).
6 Distributions in a nutshell
The concept of distributions is not really necessary for the main
arguments in this course, so we will not introduce them in fully
rigorous details, however, they will be (mildly) needed in the
following proof. So we summarize the basic idea following Chapter 6
of Lieb-Loss (for more details, look up the book).
We have already defined the concept of weak solutions of a PDE.
Look at the following very simple differential equation:
f ′(x) = 2H(x) − 1 (6.24)
where f is the unknown function and H(x) = 1{x : x > 0} is the
Heaviside function. What could f be?
If we insist on the usual definition of the derivative, then there is no solution, because of the Darboux theorem from real analysis [Recall: if f is differentiable on a closed interval [a, b], with a right derivative at a and a left derivative at b, then f′ takes on any value between f′_+(a) and f′_−(b).] On the other hand, the function f(x) = |x| + c (with some
constant c) “should be” a
solution in any reasonable sense, since f′(x) = 2H(x) − 1 for all x ≠ 0. Since we are anyway prepared to forget about a zero measure
set of points, this “hole” should not really prevent us from
declaring f(x) = |x| + c to be a solution.
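The claim that f(x) = |x| + c solves (6.24) weakly amounts to the identity ∫ |x| φ′(x) dx = −∫ (2H(x) − 1) φ(x) dx for every test function φ, which can be checked numerically (a sketch; the shifted bump φ is an arbitrary choice of test function):

```python
import numpy as np

# test function: smooth bump centered at 0.3, supported in (-0.7, 1.3),
# so that the kink of |x| at 0 lies inside its support
c = 0.3
u = np.linspace(-1 + 1e-6, 1 - 1e-6, 400_001)   # bump variable, avoids +-1
x = u + c
phi = np.exp(-1.0 / (1 - u**2))
dphi = phi * (-2 * u / (1 - u**2) ** 2)          # exact derivative of phi

w = x[1] - x[0]                                  # uniform spacing
lhs = np.sum(np.abs(x) * dphi) * w               # ∫ |x| phi'(x) dx
rhs = -np.sum(np.sign(x) * phi) * w              # -∫ (2H(x) - 1) phi(x) dx
assert abs(lhs - rhs) < 1e-4                     # integration by parts holds
```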
Now we can be even bolder, and differentiate (6.24):
f ′′(x) = 2H ′(x)
Of course H′(x) = 0 for any x ≠ 0 and we agreed not to care about one single point, so it looks like
f ′′(x) = 0
But we can solve this differential equation, the solution is f(x) =
bx + c for some constants b, c. However, now we have a problem,
since the solution f(x) = |x| + c is not recovered. So maybe
something still happened at x = 0 which we missed? What is allowed
and what is not?
The right concept is the weak derivative, as we already mentioned earlier. Recall that f ∈ L¹_loc(R^d) (locally integrable function) if its restriction to every compact set K is in L¹. We say that a function f ∈ L¹_loc(R^d) has a weak derivative, Df(x), if there is a function g ∈ L¹_loc(R^d) such that

∫_{R^d} Dφ(x) f(x) dx = − ∫_{R^d} φ(x) g(x) dx

for all test functions φ ∈ C_0^∞. (D denotes the derivative in general, so in the case of functions on R^d it is just the gradient, but we sometimes use the more general notation as we will differentiate other objects as well.)
It is easy to see that g(x) is unique (as an element of L¹_loc(R^d)) and we denote Df(x) = g(x). It is also easy to see that the weak derivative coincides with the strong one if f is differentiable. In other words, the weak derivative, if it exists, is defined by the requirement that the formal integration by parts holds when tested against a nice (smooth, compactly supported) test function. It is easy to check [HOMEWORK!] that in this sense

(|x| + c)′ = 2H(x) − 1
but H′(x) ≠ 0. So what is H′(x)? We need a g(x) such that

∫ Dφ(x) H(x) dx = − ∫ φ(x) g(x) dx

(here, in one dimension, Dφ = φ′). But it is easy to see that

∫ Dφ(x) H(x) dx = ∫_0^∞ φ′(x) dx = −φ(0)

so we need g(x) such that

∫ φ(x) g(x) dx = φ(0)    (6.25)
for any φ ∈ C_0^∞. Obviously, there is no such function g(x) ∈ L¹_loc; in particular the above argument, saying that H′ = 0, was wrong. [THINK it over: g = 0 does not do the job, but otherwise there is a set of positive Lebesgue measure A, separated away from zero, dist(A, 0) > 0, such that, say, ∫_A g > 0. We can assume that A is bounded. By the regularity of the measure with density g, there is a sequence of open sets O_n such that A ⊂ O_n, ∫_{O_n} g → ∫_A g > 0. In particular, there is an open interval I, separated away from zero, such that ∫_I g > 0. For any ε > 0 choose a smooth function φ_ε such that 0 ≤ φ_ε ≤ 1_I, and such that φ_ε → 1_I pointwise as ε → 0. Conclude that

0 < ∫_I g = ∫ g φ_ε + ∫ g(1_I − φ_ε)

The first term is φ_ε(0) = 0, the second goes to zero as ε → 0 by dominated convergence. Contradiction.]
So it seems that H ′(x) cannot be defined with the above definition
of the weak derivative. However, we can just define H ′(x) to be
the “object” that assigns to any function φ(x) its value at zero.
In other words, it will not be a function, but a linear functional
on the space of test functions. The key idea is thus to depart from
the concept of functions, and replace them by linear functionals on
C_0^∞.
Definition 6.1 For any non-empty open set Ω ⊂ R^d we define the space of test functions, D(Ω), to be the space C_0^∞(Ω) equipped with the following notion of convergence. A sequence φ_n ∈ D(Ω) converges to φ ∈ D(Ω) if there is a (common!) compact set K ⊂ Ω containing the supports of all φ_n − φ,

supp(φ_n − φ) ⊂ K

and

D^α φ_n → D^α φ

uniformly on K for any multiindex α = (α_1, α_2, . . . , α_d). Obviously, D(Ω) is a linear space.

For mathematicians: The space D(Ω) is not a normed space; there is not a single norm that generates this topology, but it can be generated by a family of seminorms. It is not metrizable, but complete. In this presentation we avoided defining the topology precisely (via bases of neighborhoods etc.), we only defined convergence. In general, convergence does not determine the topology, but in “not too big” spaces it does. This is the case for D(Ω), so actually the convergence given above defines the topology.
Definition 6.2 A distribution T is a continuous linear functional on D(Ω). The space of distributions is a linear space, denoted by D′(Ω), and it is equipped with the natural convergence:

T_n → T in D′(Ω) iff T_n(φ) → T(φ) for any φ ∈ D(Ω)

For any f ∈ L¹_loc(R^d) we can define a distribution

T_f(φ) = ∫_{R^d} f φ
It is easy to see that this is indeed a distribution [CHECK!]. It can be easily shown that if f, g ∈ L¹_loc and T_f = T_g as distributions (i.e. ∫ fφ = ∫ gφ for any φ ∈ C_0^∞), then f = g almost everywhere, i.e. f = g as elements of L¹_loc. We can therefore identify f with the distribution T_f, and in the case of such distributions we will not distinguish between the function f and the distribution T_f.
The Dirac delta distribution at the point x ∈ R^d is defined as

δ_x(φ) = φ(x)

We have shown above that this distribution is not given by a function (the proof above was given only in d = 1; CHECK that it works in any dimension). Despite this fact, it is often called (especially in physics) the Dirac delta function.
Distributions can be differentiated:
Definition 6.3 For any distribution T ∈ D′(Ω) and any multiindex α we define its α-order partial derivative Dα = (∂/∂x1)α1 (∂/∂x2)α2 · · · (∂/∂xd)αd by
(DαT)(φ) = (−1)|α| T(Dαφ)
with |α| = ∑dj=1 αj. This concept is called the weak or distributional derivative. [HOMEWORK: check that DαT is indeed a distribution]
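To see the definition in action, one can check numerically that the distributional derivative of the Heaviside step function H = 1[0,∞) is δ0: by definition, H′(φ) = −TH(φ′) = −∫ H φ′, which should return φ(0) for every test function φ. A minimal sketch (the particular bump function and grid are my own choices, not from the notes):

```python
import numpy as np

def bump(x):
    # smooth compactly supported test function, supp = [-1, 1]
    out = np.zeros_like(x)
    m = np.abs(x) < 1
    out[m] = np.exp(-1.0 / (1.0 - x[m] ** 2))
    return out

def bump_prime(x):
    # analytic derivative of the bump (chain rule), also supported in [-1, 1]
    out = np.zeros_like(x)
    m = np.abs(x) < 1
    xm = x[m]
    out[m] = np.exp(-1.0 / (1.0 - xm ** 2)) * (-2.0 * xm / (1.0 - xm ** 2) ** 2)
    return out

x = np.linspace(-1.5, 1.5, 60001)
dx = x[1] - x[0]
H = (x >= 0).astype(float)               # Heaviside step function
integrand = -H * bump_prime(x)           # H'(phi) := -T_H(phi') = -int H phi'
val = np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dx   # trapezoid rule
print(val, bump(np.array([0.0]))[0])     # both ≈ e^{-1} ≈ 0.36788
```

The pairing reproduces the value φ(0) = e^{−1} without H ever being differentiated pointwise.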
It is easy to check that for smooth functions f, the old concept of derivative and the new one coincide:
DαTf = TDαf
Actually, a stronger statement is true [see Theorem 6.10 of Lieb-Loss]: if T ∈ D′(Ω) and ∇T is a continuous function G : Ω → Rd, then T is a C1 function F, i.e. T = TF and of course G = ∇F (in the classical sense).
The main advantage of the new definition is that every distribution is infinitely often differentiable! Moreover, taking limits is compatible with taking derivatives:
Tn → T (in D′(Ω)) =⇒ DαTn → DαT (in D′(Ω))
for any multiindex [WHY??]. So any convergent sequence of distributions can be differentiated term by term. This is a big plus; remember how messy it is to interchange limits with derivatives for functions: e.g. if fn → f pointwise, then f′n → f′ does not hold at all; first, f′n may not exist, and second, even if f′n and f′ exist, the convergence does not follow.
With this new concept, we are able to differentiate non-differentiable (or even discontinuous) functions. One example is
H′ = δ0
as elements of D′(R), where H is the Heaviside step function. We are also able to make sense of (partial) differential equations in a much bigger class of solutions (i.e. we can talk about solutions in the distributional sense).
There are some facts about functions that easily extend to distributions (for the proofs, see, e.g., Lieb-Loss, Section 6):
i) There is a natural formulation of the Fundamental Theorem of Calculus for distributions (see Thm 6.9 of Lieb-Loss); here I just write up one special case: for any y ∈ Rd and a.e. x ∈ Rd, we have
f(x + y) − f(x) = ∫01 y · (∇f)(x + ty) dt
for any f ∈ W1,1loc = {f : Rd → C : f ∈ L1loc, ∇f ∈ L1loc}, where ∇f is understood in the distributional sense.
ii) If the derivative of a distribution is zero on a connected open set, then the distribution is constant there (meaning that it is identical to a constant function).
iii) Distributions can be naturally multiplied by C∞ functions ψ:
(ψT)(φ) := T(ψφ)
and the usual Leibniz (product) rule holds. This definition extends the usual pointwise multiplication of functions.
iv) Distributions can be smoothed out by taking convolutions with a rescaled compactly supported function j ∈ C∞c with ∫ j = 1: if we define
(j ∗ T)(φ) := T(∫ j(y)φ(· + y) dy)
then this distribution is given by a function, i.e. there exists a smooth function t ∈ C∞ (depending on T and j) such that
(j ∗ T)(φ) = ∫ tφ
This definition extends the usual convolution of functions. If we rescale jε(x) = ε−d j(x/ε), then
jε ∗ T → T in D′(Ω)
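Item iv) can be illustrated numerically: convolving even a discontinuous function with the rescaled bump jε smooths it, and the result converges back to the function at points of continuity as ε → 0. A 1D sketch (the function, the bump, and the grid are my own choices):

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 4001)
dx = x[1] - x[0]
f = (x >= 0).astype(float)               # discontinuous Heaviside step

def mollify(f, eps):
    # j_eps(x) = eps^{-1} j(x/eps), j a C_c^infty bump normalized so int j_eps = 1
    y = x / eps
    j = np.zeros_like(y)
    m = np.abs(y) < 1
    j[m] = np.exp(-1.0 / (1.0 - y[m] ** 2))
    j /= j.sum() * dx                    # discrete normalization
    return np.convolve(f, j, mode="same") * dx

g = mollify(f, 0.2)
i_m1, i_0, i_p1 = (np.argmin(np.abs(x - c)) for c in (-1.0, 0.0, 1.0))
print(g[i_m1], g[i_0], g[i_p1])          # ≈ 0, ≈ 0.5, ≈ 1: a smooth ramp through the jump
i_01 = np.argmin(np.abs(x - 0.1))
print(mollify(f, 0.2)[i_01], mollify(f, 0.05)[i_01])  # → f(0.1) = 1 as eps shrinks
```

Away from the jump the mollified function agrees with f as soon as ε is smaller than the distance to the discontinuity.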
v) Let ψ1, ψ2, . . . , ψn be a family of L1loc functions, and assume that the distribution T satisfies the property that T(φ) = 0 for all φ ∈ D(Ω) such that ∫ φψj = 0, j = 1, 2, . . . , n. Then there exist constants cj such that
T = c1ψ1 + . . . + cnψn
[Let me just sketch the proof for n = 1. Fix a function u1 ∈ D such that ∫ u1ψ1 = 1 (such a function exists, why??). Write any φ ∈ D as
φ = v + (∫ φψ1) u1
Then
∫ vψ1 = ∫ [φ − (∫ φψ1) u1] ψ1 = 0
so T(v) = 0 by assumption, and therefore
T(φ) = (∫ φψ1) T(u1)
i.e. T = c1ψ1 with c1 = T(u1). The generalization to n > 1 is easy linear algebra.]
vi) Suppose that T(φ) = 0 for any φ ∈ D(Ω) such that supp φ ⊂ Ω \ {0}. Then there is a finite number K and constants c0, c1, . . . , cK such that
T = c0δ0 + c1δ0′ + . . . + cKδ0(K)
i.e. distributions supported at the origin (in the above sense) can only be linear combinations of derivatives of the delta distribution.
vii) [Integration by parts] Similarly to the H1-spaces, in some cases integration by parts is allowed. We will need the following statement, which can be proved by a standard approximation argument.
Suppose that u, v ∈ H1(Rd) and suppose Δv ∈ L1loc. Moreover, suppose that v is real. Then
− ∫ uΔv = ∫ ∇u · ∇v (6.26)
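The identity (6.26) is easy to test numerically in one dimension, where it reads −∫ u v″ = ∫ u′v′. A sketch with Gaussians (my choice; they are not compactly supported, but they decay fast enough to be in H1, and all derivatives are computed analytically):

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 160001)
dx = x[1] - x[0]

u = np.exp(-x**2)
up = -2.0 * x * u                          # u' computed analytically
v = np.exp(-(x - 0.5) ** 2)
vp = -2.0 * (x - 0.5) * v                  # v'
vpp = (4.0 * (x - 0.5) ** 2 - 2.0) * v     # v''

def integrate(y):
    return np.sum(0.5 * (y[1:] + y[:-1])) * dx   # trapezoid rule

lhs = -integrate(u * vpp)                  # -int u (Delta v)
rhs = integrate(up * vp)                   # int grad u . grad v
print(lhs, rhs)                            # the two values agree
```

The boundary terms that integration by parts would normally produce vanish here because both functions are negligible at the ends of the interval.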
viii) A distribution T is called positive if T(φ) ≥ 0 for any nonnegative test function φ ≥ 0. Positive distributions "behave" much better than general ones; in fact, positive distributions are regular Borel measures.
So far it seems that distributions are in every respect superior to functions: we can seemingly do everything with them (limits, differentiation, convolution) and everything seems much easier. Life is not quite as nice, however: there are natural operations that cannot be performed with distributions. The first is that although distributions form a linear space, so they can be added and subtracted, in general they cannot be multiplied or divided. E.g. it does not make sense to talk about the square of the Dirac delta distribution. This is the main reason why distributions are very useful in linear PDE's, but one has to be very careful with their usage in nonlinear PDE's.
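The obstruction can be made concrete: if jε → δ0 in D′ (the approximation from item iv) above), then ∫ jε stays 1 while ∫ jε² blows up like 1/ε, so the squares jε² have no distributional limit and "δ0²" is meaningless. A numerical sketch (bump and grid are my own choices):

```python
import numpy as np

def delta_approx_norms(eps, n=400001):
    # j_eps(x) = eps^{-1} j(x/eps), j a C_c^infty bump normalized to int j = 1
    x = np.linspace(-1.0, 1.0, n)
    dx = x[1] - x[0]
    y = x / eps
    j = np.zeros_like(y)
    m = np.abs(y) < 1
    j[m] = np.exp(-1.0 / (1.0 - y[m] ** 2))
    j /= j.sum() * dx                       # enforce int j_eps = 1
    return j.sum() * dx, (j ** 2).sum() * dx

for eps in (0.1, 0.01, 0.001):
    mass, square = delta_approx_norms(eps)
    print(eps, mass, square)                # mass stays 1, square grows like 1/eps
```

Any candidate definition of δ0² would have to be the limit of these squares, and no such limit exists.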
The second operation that is in general meaningless is composition: one cannot make sense of T(S(·)) for two distributions T, S ∈ D′. However, for certain functions in Sobolev spaces it does make sense. An important special case is that of H1 functions (for a more general statement, see Lieb-Loss Theorem 6.16): if G = G(s1, s2, . . . , sd) is a differentiable function with bounded and continuous derivatives and u = (u1, u2, . . . , ud) is a vector-valued function with components uj ∈ H1, then the function
K(x) = (G ◦ u)(x)
has distributional derivatives given by the chain rule,
∂K/∂xj = (∂1G)(u) ∂u1/∂xj + . . . + (∂dG)(u) ∂ud/∂xj
Remark on tempered distributions: In the case Ω = Rd one can develop the theory of distributions via the Schwartz functions S(Rd) instead of D(Rd). The definition is the same, but we get a smaller set S′(Rd) ⊂ D′(Rd), called the space of tempered distributions. Note that S′ takes into account the behavior at infinity, while D′(Rd) is a local concept. E.g. any f ∈ L1loc(Rd) defines an element of D′(Rd), but if the function f grows too fast at infinity (faster than any polynomial), then it is not in S′(Rd). One advantage of working with S′(Rd) is that the Fourier transform can be extended from S(Rd) to S′(Rd), and it is a bijection from S′(Rd) to itself. The Fourier transforms of elements of D′ are much harder to characterize. Some of these issues may be discussed in the Tutorium; they are not necessary for this course.
7 Excited states
[We follow Lieb-Loss, Theorem 11.6] Consider −Δ + V where V satisfies the conditions of Theorem 4.1. Let E0 be the ground state energy and let ψ0 be (one of) the ground state eigenfunctions.
We successively define energies E1 ≤ E2 ≤ . . . ≤ Ek ≤ . . . and eigenfunctions ψ1, ψ2, . . . by the following variational principle. Suppose that E0, . . . , Ek−1 and ψ0, . . . , ψk−1 have been defined; then let
Ek := inf { E(ψ) : ψ ∈ H1(Rd), ‖ψ‖2 = 1, 〈ψ, ψj〉 = 0, j = 0, 1, . . . , k − 1 } (7.27)
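The recursion (7.27) is the infinite-dimensional analogue of the variational characterization of the eigenvalues of a Hermitian matrix, where each Ek is the minimum of the Rayleigh quotient over the orthogonal complement of the previously found eigenvectors. A finite-dimensional sketch (the random symmetric matrix is my own construction) comparing the recursion against a direct diagonalization:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
H = (A + A.T) / 2.0                       # real symmetric "Hamiltonian"

def complement_basis(vecs):
    # Gram-Schmidt: orthonormal basis of the orthogonal complement of vecs
    basis = []
    for e in np.eye(n):
        w = e.copy()
        for v in list(vecs) + basis:
            w = w - np.dot(v, w) * v
        if np.linalg.norm(w) > 1e-10:
            basis.append(w / np.linalg.norm(w))
    return np.array(basis).T              # n x (n - len(vecs))

found_vals, found_vecs = [], []
for k in range(n):
    B = complement_basis(found_vecs)
    # minimize <psi, H psi> over unit psi orthogonal to the previous
    # eigenvectors: smallest eigenvalue of the compressed matrix B^T H B
    w, U = np.linalg.eigh(B.T @ H @ B)
    found_vals.append(w[0])
    found_vecs.append(B @ U[:, 0])

print(found_vals)
print(np.linalg.eigvalsh(H))              # same numbers, in the same order
```

The constrained minimization reproduces the full spectrum in increasing order, exactly as the recursion (7.27) is meant to do for −Δ + V.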
Theorem 7.1 Let V satisfy the conditions of Theorem 4.1. Suppose that for some k ≥ 0 the first k eigenstates exist, and assume that Ek defined above is negative, Ek < 0. Then the (k + 1)-th eigenstate ψk, defined as the minimizer in (7.27), also exists and satisfies the Schrödinger equation (−Δ + V)ψk = Ekψk in a weak sense. Thus the recursion above stops only when zero energy is reached. Moreover, if Ek < 0, then its multiplicity is finite, i.e. it can be listed at most finitely many times; in other words, there can be at most finitely many eigenstates corresponding to Ek. Even more, the increasing sequence E1 ≤ E2 ≤ . . . cannot accumulate at any negative number. The choice of ψk may not be unique, but the finite dimensional space spanned by the eigenfunctions belonging to the same eigenvalue is unique. In particular, the sequence of eigenvalues does not depend on the choice. Moreover, the eigenfunctions can be chosen real.
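The picture of Theorem 7.1 (finitely many negative eigenvalues, none accumulating below zero) can be seen in a discretized model: replace −Δ + V on R by a finite-difference matrix on a grid, with V a finite potential well. This is only a sketch; the grid, well depth, and width are my own choices:

```python
import numpy as np

L, n = 24.0, 1201
x = np.linspace(-L / 2, L / 2, n)
h = x[1] - x[0]

V = np.where(np.abs(x) < 1.0, -5.0, 0.0)          # finite square well

# second-order finite differences: (-psi'')_i ≈ (2 psi_i - psi_{i-1} - psi_{i+1})/h^2
main = 2.0 / h**2 + V
off = -np.ones(n - 1) / h**2
Hmat = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

evals = np.linalg.eigvalsh(Hmat)                  # ascending
bound = evals[evals < 0.0]
print(len(bound), bound)     # a few discrete negative energies E_0 < E_1 < ... < 0
```

In the continuum limit the negative grid eigenvalues approximate the bound state energies of the well, while the nonnegative ones sample the continuous spectrum [0, ∞), where the recursion of (7.27) stops producing eigenstates.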
Sketch of the proof. The proof of the existence of ψk is exactly the same as the proof of Theorem 4.1: we choose a minimizing sequence, we pass to a subsequence that converges weakly in H1, and the weak limit will be ψk. The constraints 〈ψ, ψj〉 = 0, j ≤ k − 1, survive the weak limit.
To prove that ψk satisfies the Schrödinger equation, we follow the same proof as for the ground state, but the test function φ ∈ C∞0 must also satisfy the constraints 〈φ, ψj〉 = 0, j ≤ k − 1 [WHY???].
Thus we conclude that
T = (−Δ + V − Ek)ψk
is a distribution such that T(φ) = 0 for any φ ∈ C∞0 with 〈φ, ψj〉 = 0, j ≤ k − 1. It follows from this (see property v) of the distributions) that T is a linear combination of the ψj:
T = c0ψ0 + c1ψ1 + . . . + ck−1ψk−1 (7.28)
To prove that all cj = 0, we apply T to some ψi, i ≤ k − 1. This step is only formal, since ψi is not a C∞0 function, so T(ψi) strictly speaking does not make sense; for the rigorous proof, one can use an approximation argument, i.e. approximate ψi by C∞0 functions. Accepting this formality, and integrating by parts, we have
ci = T(ψi) = ∫ ∇ψi · ∇ψk + ∫ V ψiψk
(we used that ψi and ψk are orthogonal, so the term −Ek ∫ ψiψk vanishes). On the other hand, ψi satisfies (−Δ + V − Ei)ψi = 0 in the distributional sense, and pairing its complex conjugate with ψk we have
∫ ∇ψi · ∇ψk + ∫ V ψiψk = Ei ∫ ψiψk = 0
so we have proved that ci = 0 for all i ≤ k − 1.
Finally, assume that Ek < 0 has infinite multiplicity, i.e. Ek = Ek+1 = Ek+2 = . . .. Following the proof of the existence of the successive eigenstates, we find an orthonormal sequence ψk, ψk+1, . . . that all satisfy (−Δ + V)ψk = Ekψk. By (2.2), we see that ‖ψj‖H1 is bounded, so there is a weakly convergent subsequence, ψnj ⇀ ψ in H1. But then ψnj ⇀ ψ weakly in L2 as well (see Lemma 4.2). However, an orthonormal sequence in L2 converges weakly to 0 [WHY? – THINK IT OVER], thus ψ = 0. But then
lim j→∞ ∫ V |ψnj|2 = 0
while
Ek = ∫ |∇ψnj|2 + ∫ V |ψnj|2
Now taking the liminf as j → ∞, the first term is nonnegative and the second goes to zero, so we get Ek ≥ 0, which contradicts Ek < 0.
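The fact used above, that an orthonormal sequence in L2 converges weakly to zero, can be watched happening with the Fourier sine basis on (0, π): the coefficients 〈en, g〉 of any fixed g decay to zero (this is Bessel's inequality, or the Riemann-Lebesgue lemma). A numerical sketch, with g my own choice:

```python
import numpy as np

x = np.linspace(0.0, np.pi, 20001)
dx = x[1] - x[0]
g = np.exp(-x) * (np.pi - x)                  # a fixed L^2(0, pi) function

def coeff(n):
    # e_n(x) = sqrt(2/pi) sin(n x) is an orthonormal sequence in L^2(0, pi)
    e = np.sqrt(2.0 / np.pi) * np.sin(n * x)
    y = e * g
    return np.sum(0.5 * (y[1:] + y[:-1])) * dx   # trapezoid rule

for n in (1, 10, 100, 400):
    print(n, coeff(n))                        # the inner products tend to 0
```

Note that the en themselves do not converge to zero in norm (each has ‖en‖ = 1); only the pairings against any fixed g do, which is exactly weak convergence to 0.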
It is easy to see that all we used in this proof is that we have an infinite sequence of eigenvalues lying below a strictly negative number; i.e. E1 ≤ E2 ≤ . . . ≤ E with some E < 0 is also excluded. Thus the eigenvalues cannot have an accumulation point on the negative axis.
The uniqueness of the eigenspace is an easy exercise in linear algebra (as far as the eigenspaces are concerned, we can now work in finite dimensions). The fact that the eigenfunctions can be chosen real follows from the fact that if ψ is an eigenfunction, then so is its complex conjugate; so if ψ is not real and is not a constant multiple of a real function, then instead of ψ and its conjugate one can take their real and imaginary parts, which are also eigenfunctions and span the same space.
Remark. If you are unhappy that the proof was not rigorously formalized in every detail and you do not want to spend a little time going through the approximation argument with general distributions, here is an alternative argument. I will just show it for the first excited state ψ1. The existence of ψ1 was rigorous, and so was the derivation of the relation
∫ ∇ψ1 · ∇φ + ∫ V ψ1φ = E1 ∫ ψ1φ (7.29)
for any φ ∈ C∞0 such that 〈ψ0, φ〉 = 0. Since ψ1 ∈ H1, by a standard density argument we see that (7.29) holds for all φ ∈ H1 with 〈ψ0, φ〉 = 0 [CHECK! – you will need the fact that ∫ |V| |f|2 ≤ C(V) ‖f‖2H1, which has been used several times].
Define the linear functional L on H1 as follows:
L(φ) := ∫ ∇ψ1 · ∇φ + ∫ V ψ1φ − E1 ∫ ψ1φ
It is easy to see that this is a continuous linear functional. We know that L ≡ 0 on the orthogonal complement of the one dimensional space spanned by ψ0. Therefore there exists a number µ (called a Lagrange multiplier) such that
L(φ) = µ〈ψ0, φ〉 for all φ ∈ H1
In particular,
µ = L(ψ0) = ∫ ∇ψ1 · ∇ψ0 + ∫ V ψ1ψ0 (7.30)
where we used 〈ψ0, ψ1〉 = 0. But ψ0 was the ground state, i.e.
∫ ∇ψ0 · ∇φ + ∫ V ψ0φ = E0 ∫ ψ0φ
for any φ ∈ C∞0, which, as above, can be extended to any φ ∈ H1. Plugging in φ = ψ1, we get
∫ ∇ψ0 · ∇ψ1 + ∫ V ψ0ψ1 = E0 ∫ ψ0ψ1 = 0
by orthogonality. Comparing this equation with (7.30), we see that µ = 0, so L ≡ 0 and thus
∫ ∇ψ1 · ∇φ + ∫ V ψ1φ = E1 ∫ ψ1φ
holds for any φ ∈ H1 (without the restriction to the orthogonal complement). This means that ψ1 is a weak solution to the Schrödinger equation.
8 Properties of eigenfunctions
We do not have time to discuss all important properties in detail,
but I would like to list a few key facts.
Theorem 8.1 Assume the conditions of Theorem 4.1 and suppose that E0 < 0. Then the following hold:
i) The ground state is unique (modulo a trivial constant multiple).
ii) The ground state can be chosen strictly positive.
iii) Positivity of an eigenfunction characterizes the ground state, i.e. if (−Δ + V)ψ = Eψ for some ψ ∈ H1 and E ∈ R, and ψ ≥ 0, then E = E0 and ψ is the ground state.
iv) If V is spherically symmetric, i.e. V(x) = V(|x|), then so is the ground state.
v) If (−Δ + V)ψ = 0 holds in the distributional sense in some open ball B, and V ∈ Ck(B), then ψ ∈ Ck+2(B), i.e. the regularity of the solution is two orders better than the regularity of the potential.
We will need the following lemma:
Lemma 8.2 If f ∈ H1 then |f| ∈ H1 and
∫ |∇|f||2 ≤ ∫ |∇f|2 (8.31)
Actually it even holds pointwise that |∇|f|| ≤ |∇f|. Moreover, if |f(x)| > 0, then equality in (8.31) can hold only if there is a complex number λ of unit length, |λ| = 1, such that f(x) = λ|f(x)|.
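Lemma 8.2 can be sanity-checked on a grid. For a complex-valued f, the discrete analogue of |∇|f|| ≤ |∇f| is ||f(xi+1)| − |f(xi)|| ≤ |f(xi+1) − f(xi)| (the reverse triangle inequality), and summing the squares gives the discrete version of (8.31). A sketch with an oscillating-phase f of my own choosing, for which the inequality is strict:

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 60001)
dx = x[1] - x[0]
f = np.exp(-x**2) * np.exp(3j * x)     # |f| = e^{-x^2}, phase theta(x) = 3x

d_abs = np.abs(np.diff(np.abs(f)))     # finite differences of |f|
d_f = np.abs(np.diff(f))               # finite differences of f

# pointwise inequality: reverse triangle inequality at each grid step
print(np.all(d_abs <= d_f + 1e-12))    # True

lhs = np.sum(d_abs**2) / dx            # ≈ int |grad |f||^2
rhs = np.sum(d_f**2) / dx              # ≈ int |grad f|^2
print(lhs, rhs)                        # strict inequality
```

The gap rhs − lhs approximates ∫ |∇θ|²|f|² = 9 ∫ e^{−2x²}, the "phase energy" from the modulus-phase identity in the proof below.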
Sketch of the proof. The short (and almost correct) proof of this fact is the following. Write f(x) = |f(x)|eiθ(x), i.e. decompose f into modulus and phase. Then by ∇f = eiθ∇|f| + i(∇θ)|f|eiθ we have
|∇f|2 = |∇|f||2 + |∇θ|2|f|2
so the inequality is obvious. Equality holds only if |∇θ|2|f|2 = 0, but if |f| > 0, then clearly ∇θ = 0, so θ = const.
The little problem with this argument is that the function θ in the decomposition f(x) = |f(x)|eiθ(x) is not unique if f vanishes on some set. In other words, ∇|f| has to be defined also on the set where f(x) = 0. The absolutely honest argument goes through by defining ∇|f| to be 0 where f(x) = 0 and otherwise
(∇|f|)(x) = [R(x)∇R(x) + I(x)∇I(x)] / |f(x)|
where f(x) = R(x) + iI(x) is the decomposition into real and imaginary parts. The above formula coincides with the chain rule computation
∇√(R2 + I2) = (R∇R + I∇I) / √(R2 + I2)
However, the chain rule literally cannot be applied to the absolute value function, since it is not C1. So one first has to regularize the absolute value function, apply the chain rule, and check that in the claimed inequality the regularization can be removed (see Theorem 6.17 of Lieb-Loss for more details). In principle, the same care is needed when establishing the cases of equality; see Theorem 7.8 of Lieb-Loss.
Sketch of the proof of Theorem 8.1. We start by proving i) and ii). Armed with the lemma, we see that
E(|ψ|) ≤ E(ψ)
for any ψ ∈ H1, and if |ψ| > 0, then equality holds if and only if ψ = λ|ψ|.
Suppose we had two ground states, ψ and φ, i.e. E0 was listed twice among the eigenvalues. We can assume that ψ and φ are orthogonal. [Exercise: the space of all ground states is a linear space, i.e. if ψ and φ are ground states, so is aφ + bψ for any a, b ∈ C.] By the remark above, we can also assume that both are non-negative: since E(|ψ|) ≤ E(ψ), we can always replace ψ with |ψ|, possibly lowering the energy, but the ground state energy cannot be lowered further.
Now it is a deeper fact of elliptic PDE theory (the Harnack inequality) that if f is a non-negative function such that for some potential W that is bounded from above and for which Wf ∈ L1loc we have
−Δf + Wf = 0
in the distributional sense, and f is not identically zero, then f is strictly positive (see Lieb-Loss Theorem 9.10). Using this fact, we obtain the condition |ψ| > 0 needed for the equality case, i.e. we know that ψ = λ|ψ| and φ = µ|φ|. But then these two functions cannot be orthogonal:
∫ ψφ̄ = λµ̄ ∫ |ψ||φ| ≠ 0
This proves parts i) and ii) of Theorem 8.1.
Remark: For those who have had some PDE: the Harnack inequality is related to the mean value property of harmonic functions: if −Δf = 0, then
f(x) = (1/|B|) ∫B f(y) dy
for any ball B with center x. If W is present, the mean value property no longer holds, but it remains correct as an inequality: there is a constant C, depending on B and W, such that
f(x) ≥ C ∫B f(y) dy [Harnack inequality]
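The mean value property is easy to verify numerically for a concrete harmonic function. A sketch with f(x, y) = x² − y² (so Δf = 2 − 2 = 0), averaging over a circle; the center and radius are arbitrary choices of mine:

```python
import numpy as np

def f(x, y):
    return x**2 - y**2                      # harmonic: Laplacian = 2 - 2 = 0

a, b, r = 0.7, -0.3, 1.5
theta = np.linspace(0.0, 2.0 * np.pi, 100000, endpoint=False)
circle_avg = np.mean(f(a + r * np.cos(theta), b + r * np.sin(theta)))
print(circle_avg, f(a, b))                  # both 0.4: circle average = value at center
```

Averaging over the solid disc gives the same value; once the potential W is switched on, only the one-sided Harnack-type inequality survives.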
For the proof of iii), we only need to show that E = E0; then the claim follows from i) and ii). We claim that if E ≠ E0, then ψ must be orthogonal to the ground state ψ0. From the eigenvalue equation (−Δ + V)ψ = Eψ we have
∫ ∇ψ · ∇φ + ∫ V ψφ = E ∫ ψφ
for any φ ∈ C∞0, and by a standard approximation argument we get it also for φ ∈ H1, in particular for φ = ψ0:
∫ ∇ψ · ∇ψ0 + ∫ V ψψ0 = E ∫ ψψ0
Similarly, from the equation for the ground state,
∫ ∇ψ0 · ∇ψ + ∫ V ψ0ψ = E0 ∫ ψ0ψ
Comparing these two equations and using that V is real and that E ≠ E0, we immediately get that ∫ ψ0ψ = 0. But this is impossible, since ψ0 > 0 and ψ ≥ 0 (and ψ is not identically zero).
The proof of iv) follows a different path, namely rearrangement inequalities. We will show the proof only for V ≤ 0, although the statement is true in general.
For any nonnegative function f : Rd → R+ one can define its symmetric decreasing rearrangement f*(x) as follows. The function f*(x) will be a spherically symmetric function, i.e. f*(x) = f*(|x|) (this is a bit sloppy notation; strictly speaking, one should say that f*(x) is given by a function of one variable composed with the length: F(|x|), where F : R+ → R+). Furthermore, we require that each level set of f* have the same measure as that of f:
|{x : f(x) > t}| = |{x : f*(x) > t}|
for all t > 0. It can be shown that there is a unique function with these properties (see Section 3.3 of Lieb-Loss). For the symmetric rearrangement it can easily be shown that
‖f‖p = ‖f*‖p for any p
using the formula
‖f‖pp = p ∫0∞ tp−1 |{x : f(x) > t}| dt
valid for any non-negative function. It is a bit more involved to prove that
∫ V f ≤ ∫ V*f*
for any two nonnegative functions V, f. Moreover, if V = V*, then equality can occur only if f = f* [Lieb-Loss: Thm 3.4]. Even a bit more work is required to show that if f ∈ H1, then f* ∈ H1 and
∫ |∇f*|2 ≤ ∫ |∇f|2
(Lemma 7.17 of Lieb-Loss). Putting this information together, we see that if ψ is the positive ground state, then
E(ψ*) ≤ E(ψ)
thus ψ* is also a ground state. But the ground state is unique (modulo a constant multiple), so ψ = (const)ψ*, i.e. it is a spherically symmetric function.
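The rearrangement facts used above can be tried out in a 1D discretization: on a grid, f* is obtained by sorting the sampled values in decreasing order and laying them out symmetrically around the center. The Lp norms are then equal exactly (the values are merely permuted), and for a double-bump example the gradient energy visibly drops. A sketch (the example function is my own choice, and the discrete rearrangement is only an approximation of the continuum f*):

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 2001)
dx = x[1] - x[0]
f = np.exp(-(x - 1.0) ** 2) + 0.5 * np.exp(-(x + 1.0) ** 2)  # asymmetric double bump

# discrete symmetric decreasing rearrangement: largest value in the middle,
# then alternate right / left with the remaining values in decreasing order
vals = np.sort(f)[::-1]
fstar = np.empty_like(f)
mid = len(f) // 2
fstar[mid] = vals[0]
for k in range(1, mid + 1):
    fstar[mid + k] = vals[2 * k - 1]
    fstar[mid - k] = vals[2 * k]

# equimeasurable => all L^p norms agree exactly (here p = 2)
print(np.sum(f**2) * dx, np.sum(fstar**2) * dx)

# Polya-Szego-type inequality: rearranging does not increase the gradient energy
grad = lambda g: np.sum(np.diff(g) ** 2) / dx
print(grad(fstar), grad(f))        # grad(fstar) <= grad(f)
```

Merging the two bumps into one symmetric decreasing profile keeps every level-set measure while shortening the total variation, which is exactly the mechanism behind E(ψ*) ≤ E(ψ).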
There is another proof of the symmetry of the ground state eigenfunction: just show that each rotation of ψ0, i.e. x ↦ ψ0(Rx) for any rotation R, is also a ground state, and use i).
Finally, statement v) follows from standard elliptic regularity theory. It is related to the fact that the solution of the Poisson equation −Δu = f has, roughly speaking, two more derivatives than f. We do not present the proof here [see Lieb-Loss: Theorem 11.7].