
NOTES ON QUANTUM MECHANICS

A SET OF LECTURES

V.P. NAIR

Contents

1 Introduction to the physics
1.1 Difficulties with classical physics
1.2 Matter waves and key concepts

2 Introduction to the mathematical framework
2.1 Linear vector space
2.2 Fourier series
2.3 Cauchy-Schwarz inequality
2.4 L2-functions as a Hilbert Space
2.5 Diagonalization of a hermitian matrix
2.6 Diagonalization of a hermitian operator on a Hilbert space
2.7 Hermitian operators with a lower bound and completeness
2.8 Commuting operators/matrices
2.9 Unitary operators
2.10 The Dirac δ-function

3 Merging the physics and mathematics
3.1 Postulates and interpretation
3.2 The role of unitary transformations and the Schrödinger equation

4 Particle in a box: One-dimensional case

5 Linear harmonic oscillator
5.1 The operator method
5.2 The method of using the differential equation

6 More about particles in one dimension
6.1 Free particle
6.2 Piecewise constant potentials in one dimension

7 The uncertainty principle, classical physics, probability
7.1 Uncertainty principle
7.2 Recovering classical physics
7.3 Conservation of probability

8 Angular momentum
8.1 Spherical coordinates and angular momentum
8.2 General theory of angular momentum
8.3 Addition of angular momenta

9 Three dimensions and central potentials
9.1 Schrödinger equation in spherical coordinates
9.2 Central potentials and separation of variables
9.3 Legendre polynomials, spherical harmonics: Some observations

10 Hydrogen atom and other bound states in central potentials
10.1 Solving the idealized Hydrogen atom
10.2 Building up atoms and the periodic table
10.3 The deuteron

11 Spin of the electron
11.1 Spin and matrix representation of spin
11.2 Magnetic moment of the electron
11.3 The Pauli equation

12 Many body quantum mechanics
12.1 Many-body wave functions, spin-statistics theorem
12.2 Two-electron wave functions

13 Rayleigh-Schrödinger perturbation theory
13.1 Perturbation theory for nondegenerate states
13.2 Helium atom: Corrections to ground state energy
13.3 The anharmonic oscillator
13.4 The exchange integral and spin-spin interaction
13.5 Spin-orbit interaction
13.6 Zeeman effect
13.7 The atom in an electric field
13.8 Degenerate state perturbation theory
13.9 Linear Stark effect

14 The variational method
14.1 Formalism of variational approach
14.2 Ground state of the Helium atom
14.3 Another example: |x| potential

15 Scattering
15.1 Basic framework and the Born approximation
15.2 Scattering by Yukawa and Coulomb potentials
15.3 Another short range potential
15.4 The method of partial waves
15.5 Validity of approximations
15.6 The spherically symmetric hill
15.7 Scattering by a hard sphere

16 Time-dependent perturbation theory
16.1 Formulation and general features
16.2 Absorption and emission of radiation
16.2.1 Electromagnetic waves
16.2.2 The interaction Hamiltonian
16.2.3 Absorption of radiation
16.2.4 Emission of radiation
16.2.5 The matrix element and selection rules
16.3 Photoelectric effect/Photoionization

17 Transformations, pictures, etc.
17.1 Transformations and generators
17.2 Schrödinger, Heisenberg and Dirac pictures
17.3 Symmetries and conservation laws
17.4 Discrete symmetries
17.4.1 Parity
17.4.2 Time-reversal


1 Introduction to the physics

1.1 Difficulties with classical physics

Classical physics had achieved an incredible level of success by the end of the nine-

teenth century. Successful explanations and precise calculations were obtained for a

wide variety of mechanical, optical, electrical and magnetic phenomena. But there

were indications by the late nineteenth and early twentieth centuries that the concepts

of classical physics are inadequate. We will talk about a few of these.

One of the best known problems had to do with blackbody radiation. A perfect
emitter (i.e., a blackbody) heated up to a temperature T will emit radiation. A key
quantity of interest is the spectrum of this radiation, namely, what is the energy density
in a range of wave vectors \vec{k} to \vec{k} + d\vec{k}? This is a rather difficult quantity to calculate
using the Maxwell equations (which were well established by then); nevertheless, Wien
was able to use a clever combination of thermodynamic arguments along with ideas
of statistical mechanics to obtain the formula

du_k = \frac{2\, d^3k}{(2\pi)^3}\, I\omega_k\, \exp\left( -\frac{I\omega_k}{k_B T} \right)    (1.1)

Here du_k is the energy density of the radiation contained in a small range d^3k of values
around \vec{k}, \omega_k is the circular frequency of the radiation, k_B is Boltzmann's constant,
and I is a constant not fixed by Wien's arguments. This formula fits rather well with
observations (for a suitable choice of I) at high frequencies, but deviates significantly
at low frequencies. With more data available by the mid-1890s, Planck was able to find
a formula which fit the data very accurately. This formula is

du_k = \frac{2\, d^3k}{(2\pi)^3}\, \frac{\hbar\omega_k}{e^{\hbar\omega_k / k_B T} - 1}    (1.2)

where \hbar is a new universal constant introduced by Planck, now known as Planck's

constant. What was remarkable is that Planck showed that to obtain this formula by

any kind of statistical reasoning one had to assume that the absorption or emission

of radiation of a particular frequency \omega_k happened only when the energy was an
integer multiple of \hbar\omega_k. This is a clear departure from what is expected in classical

electromagnetic theory.
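As a numerical aside (a minimal Python sketch added here, not part of the original notes), one can compare the dimensionless spectral factors of (1.1) and (1.2), with I set equal to \hbar:

    # Wien vs Planck spectral factors in t = hbar*omega_k/(k_B T);
    # common prefactors dropped. Wien: t*exp(-t), Planck: t/(exp(t)-1).
    import numpy as np

    t = np.array([0.01, 0.1, 1.0, 5.0, 10.0])
    wien = t * np.exp(-t)
    planck = t / np.expm1(t)
    for ti, w, p in zip(t, wien, planck):
        print(f"t = {ti:5.2f}   Wien/Planck = {w/p:.4f}")

The ratio tends to 1 at high frequencies (large t) but goes to zero as t goes to 0, which is precisely the low-frequency discrepancy described above.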

Another difficulty which is well known is the issue of the stability of the Rutherford
model of the atom. By 1911, experiments done in Rutherford's laboratory on the scat-

tering of α-particles by atoms in thin gold foils, supported by theoretical calculations

by Rutherford himself, had confirmed that the atom consists of a positively charged

heavy nucleus with electrons bound to it, but not attached to it. The scattering was

able to detect the unscreened Coulomb field of the nucleus. Electrons had to be in

bound orbits around the nucleus. But according to classical electrodynamics, the


accelerating electron should radiate and hence spiral down to the nucleus, so the

atom would not be stable. Thus while the Rutherford model was compelling from the

experimental point of view, it was incompatible with classical physics.

Yet another problem was with specific heats. The classical equipartition theorem,
which was proved by Maxwell, assigns \tfrac{1}{2} k_B units of specific heat to every degree of
freedom, except for the three coordinates of position. Thus, for a molecule made of n
atoms, we should expect

C_v = \tfrac{1}{2}\, (6n - 3)\, k_B    (1.3)

This is independent of temperature. Maxwell himself had realized, fairly early on,

that there is a problem with this, as complicated molecules, with several internal

oscillatory degrees of freedom, should have a high specific heat according to (1.3),

but experimentally they do not. For example, at normal temperatures, some sample

values of specific heat per atom (in units of kB) are:

Material          C_v/atom (observed)   C_v/atom (equipartition)
Aluminum                 2.91                    ≈ 3
Ammonia (NH3)            3.21                    2.63
Argon                    1.50                    1.50
Methane (CH4)            0.85                    2.7

The experimental values are not what we expect from classical statistical physics.

Further, it was experimentally clear that the specific heats of materials change with

temperature, decreasing as we lower T . In fact, this was formalized as the Nernst heat

theorem or the third law of thermodynamics by 1907 or so. This law would require

the specific heats to go to zero as T → 0. Einstein showed how one could explain

this by using the Planck distribution for the vibrational modes of atoms in a solid. In

1913, Debye applied the same idea, taking account of all modes of lattice vibrations,

including the so-called acoustic phonons, and obtained the famous T 3-law for the low

temperature behavior of specific heats.

The photoelectric effect refers to the phenomenon where a beam of light shining

on a material leads to the emission of electrons by the material. Since the energy of a

beam of light is proportional to its intensity, if the process is classical, we would expect

that the energy of the emitted electrons, in particular the maximum kinetic energy

they can have, would increase with the intensity. But this is not what is observed.

Einstein argued that if light consists of particles, which we now call photons, each

such particle could be attributed an energy \hbar\omega following from the exponential factor in

the Planck distribution. In this case, if the electron in the material absorbs the photon

and is ejected, then conservation of energy would give

\hbar\omega = \tfrac{1}{2} m v^2 + W    (1.4)

This conservation argument would apply to the case of no loss of energy for the
electron due to other effects in the material, so the formula (1.4) should really apply to
the maximum energy the ejected electron should have. Thus K.E._{max} = \hbar\omega - W. Here
W is the binding energy of the electron in the material, usually referred to as the work
function. This relation shows that the intensity is not the crucial factor; the maximum
energy of the electron increases with the frequency. This is what is observed. In fact,
one of the early determinations of the value of \hbar (by Millikan) was by measuring the

electron energies and using this relation.
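A quick worked number (an added sketch; the work function used is an assumed, roughly representative value for sodium, not taken from the notes):

    # K.E._max = hbar*omega - W = h*c/lambda - W, eq. (1.4).
    h = 6.626e-34       # Planck's constant h = 2*pi*hbar, J s
    c = 3.0e8           # speed of light, m/s
    eV = 1.602e-19      # J per eV

    W = 2.3             # assumed work function of sodium, eV (illustrative)
    lam = 400e-9        # wavelength of the light, m (violet)
    E_photon = h * c / lam / eV          # photon energy in eV, about 3.1
    print(f"K.E._max = {E_photon - W:.2f} eV")   # about 0.8 eV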

Magnetism is another area where the inadequacy of classical physics is revealed
in a simple way (the Bohr-van Leeuwen theorem). If atoms consist of charged particles, we can try to determine the
behavior of a large number of atoms via statistical arguments. The partition function
of a number of charged particles (of charges q_\alpha), based on classical physics, is¹

Q_N = \frac{1}{N!} \int \prod_\alpha \frac{d^3x_\alpha\, d^3p_\alpha}{(2\pi\hbar)^3}\; e^{-\beta H}    (1.5)

where the Hamiltonian H is given by

H = \sum_{\alpha=1}^{N} \frac{\left( p_{\alpha i} - q_\alpha A_i(x_{\alpha i}) \right)^2}{2 m_\alpha} + V(x)    (1.6)

Here α refers to the particle, i = 1, 2, 3, as usual, and V is the potential energy. The
latter could include the electrostatic potential energy for the particles as well as the
contribution from any other source. A_i(x_{\alpha i}) is the vector potential, evaluated
at the position of the α-th particle. The integration is over the volume of the phase
space. One can now change the variables of integration to \Pi_{\alpha i} = p_{\alpha i} - q_\alpha A_i(x_{\alpha i}), so
that the Hamiltonian becomes

H = \sum_{\alpha=1}^{N} \frac{\Pi_{\alpha i}\, \Pi_{\alpha i}}{2 m_\alpha} + V(x)    (1.7)

Although this eliminates the external potential A_i from the Hamiltonian, we have to
be careful about the Jacobian of the transformation. But in this particular case, the
Jacobian is 1. For the phase space variables of one particle, we find

\begin{pmatrix} d\Pi_i \\ dx_i \end{pmatrix} = \begin{bmatrix} \delta_{ij} & -q\, \partial A_i/\partial x_j \\ 0 & \delta_{ij} \end{bmatrix} \begin{pmatrix} dp_j \\ dx_j \end{pmatrix}    (1.8)

¹ We have slightly upgraded the classical partition function by including the \hbar-dependent denominator. This will not affect the argument presented here.


The determinant of the matrix in this equation is easily verified to be 1, and
the argument generalizes to N particles. Hence

Q_N = \frac{1}{N!} \int \prod_\alpha \frac{d^3x_\alpha\, d^3\Pi_\alpha}{(2\pi\hbar)^3}\; e^{-\beta H(\Pi, x)}    (1.9)

We see that A_i has disappeared from the integral. This shows that the statistical dy-

namics of a collection of charged particles does not depend on the magnetic field.

Thus magnetic phenomena such as diamagnetism, paramagnetism and ferromag-

netism cannot be explained by classical physics. (The argument extends to the grand

canonical partition since it is∑

N zNQN , z being the fugacity.) This observation about

the problem of magnetic phenomena in classical physics is originally due to Niels

Bohr, later independently discovered by Hendrika van Leeuwen.
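The unit Jacobian in (1.8) can also be checked symbolically; here is a minimal added sketch using sympy:

    # Check that the block matrix in eq. (1.8) has determinant 1,
    # so d^3p d^3x = d^3Pi d^3x for one particle in three dimensions.
    import sympy as sp

    q = sp.Symbol('q')
    dA = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f'dA{i}{j}'))  # dA_i/dx_j
    J = sp.BlockMatrix([[sp.eye(3), -q * dA],
                        [sp.zeros(3, 3), sp.eye(3)]]).as_explicit()
    print(sp.simplify(J.det()))    # -> 1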

There are a few more examples one could cite, such as the Compton effect, but

what we have said so far suffices to show the need to revise classical physics.

1.2 Matter waves and key concepts

A key concept which is useful is the idea of matter waves. The need for a dual de-

scription of light, as particles and as waves, was clear from various phenomena, for

example, Young’s double-slit experiment and the photoelectric effect. De Broglie

suggested that matter, such as electrons, could be described as waves. A wave should

have an amplitude, which, for a plane wave, we can write down as

\psi(x) \sim A\, \exp\left( -i\omega_k t + i\, \vec{k}\cdot\vec{x} \right)    (1.10)

For a photon, the wave number and momentum should be related by \vec{p} = \hbar\vec{k}; de

Broglie suggested the same should hold for matter waves. Some general principles we

can extract from this idea are:

1. Waves can be linearly superposed, so the linear superposition principle should

be included in any quantum theory.

2. The wave is characterized by the wave vector \vec{k} which is related to the momen-
tum; thus it is labeled by a set of values for observables.

3. We may regard momentum as given by the action of the differential operator
-i\hbar\nabla on \psi, as

-i\hbar\nabla\, \psi(x) = \hbar\vec{k}\, \psi = \vec{p}\, \psi    (1.11)

4. A plane wave does not have a well-defined position, being spread out over all

space. One needs a wave packet or a suitable superposition of plane waves to

give meaning to the position. Therefore, while the coordinate variable occurs

1.2 Matter waves and key concepts 9

in the plane wave in (1.11), we cannot use it as the “position" of the associated

particle. Thus the state of the particle in (1.11) is just labeled by the momentum.

The idea that particles like the electron can behave as waves seemed radical when

it was suggested, but could be checked experimentally. Recall that the key experi-

ment which led to the wave theory of light was Young’s double-slit experiment. The

interference which arises from the linear superposition of the amplitudes leads to a

pattern of bright and dark fringes. Diffraction of light is another phenomenon which is

a necessary consequence of its wave nature. If the electron is a wave, it should display

interference and diffraction. However, for the effect to be noticeable, we need slits

whose dimensions are close to the wave length. For electrons of wave lengths easily

available in the 1920s, this was best achieved by using a crystal as a diffraction grating.

The diffraction of electrons was experimentally verified by Davisson and Germer and

by G.P. Thomson.
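For orientation, the de Broglie wave length at the relevant energies is easy to compute; a minimal added sketch (54 eV is the electron energy used by Davisson and Germer):

    # de Broglie wavelength: lambda = h/p = h/sqrt(2 m E)
    import math

    h = 6.626e-34        # J s
    m = 9.109e-31        # electron mass, kg
    E = 54 * 1.602e-19   # kinetic energy, J
    lam = h / math.sqrt(2 * m * E)
    print(f"lambda = {lam * 1e10:.2f} Angstrom")   # about 1.67 Angstrom

This is comparable to interatomic spacings in a crystal, which is why a crystal works as a diffraction grating for electrons.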

Another question we might think about at this stage is: If particles behave as waves,

then how do we obtain the good results of classical theory, such as the motion of

planets, everyday mechanical phenomena, etc.? (Even at atomic scales, the particle

nature was very much evident, as in cloud chamber (and, much later, in the bubble

chamber) photographs of electrons and other particles.) For this question, an answer

exists already within classical wave theory. In the short wave length limit of classical

waves, we know that they propagate in straight lines. Ray optics is almost a mechanical

description, and it is a good simplification of the full wave theory for short wave lengths.

(In fact, historically, it was by borrowing ideas from the wave theory of light, simplified

to the ray optics limit, that Hamilton was able to formulate his version of mechanics.)

The key concepts from the physics discussed so far are the items enumerated above.

Based on these, we will now proceed to the mathematical framework for quantum

mechanics.

Historically, Heisenberg arrived at quantum mechanics by reasoning which did
not include any concept of matter waves. He emphasized the need to stay with

observables. Thus, in the Rutherford-Bohr model of the atom, it is meaningless to talk

of the position of the electron; we never directly observe it. Absorption and emission of

radiation is classically due to the induced dipole moment via the interaction V = e\,\vec{E}\cdot\vec{x}.
We can expect that the strength of transitions between different energy levels, which
lead to radiation according to Bohr, should thus go like \vec{x}, but in the Rutherford-Bohr

model, a transition connects two states. The only things we can observe are the

intensity and frequency of the radiation. Thus the only meaningful thing about the

position we can say is that there is a variable X_{nn'} which is related to the intensity
of the radiation for transitions between levels n and n'. (The frequency is accounted
for by Bohr's formula \hbar\omega = E_n - E_{n'}.) Heisenberg argued, based on this kind of

reasoning, that observables should be viewed as matrices. He was then able to use the

Thomas-Reiche-Kuhn sum rule (which was known from spectroscopic analysis and

Bohr’s correspondence principle and which we will derive later) to arrive at conditions

obeyed by the matrices for various observables. This was the beginning of quantum

mechanics. A little later, Schrödinger followed up on the de Broglie idea and came up

with his wave equation. The equivalence of the two approaches and the full generality

of description emerged from the transformation theory of Dirac and Jordan. We will

describe the general structure, no longer following the historical route.


2 Introduction to the mathematical framework

2.1 Linear vector space

We start with the familiar notion of a vector as a quantity with a magnitude and

direction. This is emphasized by the notation \vec{A} for a vector. But more relevant for us
is that we can think of a vector in terms of its components, so that

\vec{A} = A_1\, e_1 + A_2\, e_2 + A_3\, e_3    (2.1)

where (e_1, e_2, e_3) are unit vectors along the three orthogonal axes, the familiar x, y, z
axes, and A_i are real numbers. We can generalize this to higher dimensions, say N
dimensions, by writing a general vector as

\vec{A} = \sum_{i=1}^{N} A_i\, e_i    (2.2)

where e_i are a set of unit vectors, one for each Cartesian axis. For different choices
of the coefficients (or components) A_i, we get different vectors. It is then possible

to think of a space of all the vectors, which means we include the possibility of all

possible values for Ai, within some well-defined set. This leads us to the notion of a

vector space, the space of all such vectors. One can also add vectors together, and one

can multiply a vector by a real number. In other words, we have

\vec{A} + \vec{B} = \sum_i (A_i + B_i)\, e_i, \qquad \alpha\vec{A} = \sum_i \alpha A_i\, e_i    (2.3)

The multiplication \alpha A_i is the usual multiplication of two real numbers. It will be

useful to define a vector space via these properties and use the idea of a vector space

as a starting point for many things we do in this course. But before we do that, we
introduce a new notation for vectors: the Dirac notation, where a vector \vec{A}
is written as |A〉. (This is often referred to as the "ket A"; later we will introduce a

dual vector which we will refer to as a bra-vector (i.e., “bra A" denoted by 〈A|), so

that putting two such things together will yield a “braket" (or bracket) 〈A|A〉.) The

second thing we want to do is to consider complex vectors, not just real vectors, and

the possibility of multiplication by complex numbers. We can now collect these ideas

together to give a formal definition of a vector space as follows.

Definition 2.1 — Vector Space. A vector space over the field of complex numbers

is a set V containing elements, denoted by |x〉, |y〉, etc. (called vectors) with two

algebraic operations, addition of vectors and multiplication by complex numbers,

which obey the following rules.

|x〉+ |y〉 ∈ V


(|x〉+ |y〉) + |z〉 = |x〉+ (|y〉+ |z〉) = |x〉+ |y〉+ |z〉

α |x〉 ∈ V, α ∈ C

α(|x〉+ |y〉) = α |x〉+ α |y〉 (2.4)

|x〉+ 0 = |x〉

(−1) |x〉 = − |x〉 , such that |x〉+ (− |x〉) = 0

Here 0 denotes a zero vector, which you may think of as having all components A_i = 0.

All the operations listed here are familiar from ordinary vector analysis. All operations

here are linear; sometimes this is emphasized by referring to V as a linear vector space.

On a vector space V, we can introduce a basis of vectors in terms of which we can

expand any vector which belongs to V. This is the generalization of the notion of the

unit vectors e_i. In keeping with our new notation, we use |e_i〉. Later, we will simply
use |i〉 for these when there is no cause for confusion. So, generally, we can write

|x〉 = \sum_i x_i\, |e_i〉    (2.5)

We will be interested in vector spaces with an inner product. This is the generaliza-
tion of the dot product or scalar product familiar from elementary vector analysis. We

can define a dual vector space to V by the set of linear functionals on V, namely, all

F's which map vectors in V to a complex number, i.e., F : V \to C, in a way which is

linear in the vector. Thus F acts on a vector and yields a complex number,

F (|x〉) = λ ∈ C, F (α |x〉+ β |y〉) = αF (|x〉) + β F (|y〉) (2.6)

The standard way to proceed from here is to use the Riesz representation theorem, but

for us, the simplest is to say that there is a conjugation operation, which gives for every

vector a conjugate vector. Let us denote the set of conjugate vectors by V∗. (Essentially,

we are identifying the dual vector space (which is the space of linear functionals on V)

with the space of conjugate vectors.) If |x〉 denotes a vector in V, then the conjugate

vector which is an element of V∗ will be denoted by the “bra"-vector 〈x|. We may

consider its expansion in a basis as

〈x| = \sum_i 〈e_i|\, x_i^*    (2.7)

This also tells us that the conjugate of \alpha|x〉 + \beta|y〉 is \alpha^*〈x| + \beta^*〈y|.

We can now define a scalar product or an inner product which gives a complex

number, denoted by (x, y) for two vectors |x〉 and |y〉 in V; i.e., it is a map V ×V→ C

with the linearity properties

(αx+ β y, γ z + δ w) = α∗γ(x, z) + α∗δ(x,w) + β∗γ(y, z) + β∗δ(y, w) (2.8)


Because of this, we can think of this as a pairing between vectors and conjugate vectors,

writing from now on

(x, y) = 〈x|y〉 (2.9)

What this means is that if we are given two vectors |x〉 and |y〉, which are elements
of V, we take the conjugate of the first one to obtain 〈x| and then obtain the inner
product as 〈x|y〉.

When we choose a basis of vectors, we can choose them to be orthonormal under
this inner product. Thus we can choose a basis e_i such that

〈e_i|e_j〉 = \delta_{ij}    (2.10)

In terms of such a basis,

〈x|y〉 = \sum_{i,j} 〈e_i|e_j〉\, x_i^* y_j = \sum_i x_i^* y_i, \qquad 〈y|x〉 = 〈x|y〉^*    (2.11)

If we think of the components of a vector |x〉 as a column vector, then 〈x| can

be viewed as the complex conjugate row vector, i.e., the adjoint vector (transpose

conjugate),

|x〉 \sim \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix} \;\Longrightarrow\; 〈x| \sim (x_1^*, \ldots, x_N^*) = \begin{pmatrix} x_1^* \\ \vdots \\ x_N^* \end{pmatrix}^{T}    (2.12)

The inner product is then naturally the matrix product

〈x|y〉 = (x_1^*, \ldots, x_N^*) \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} = \sum_i x_i^* y_i    (2.13)

The square root of the inner product of a vector with itself is called the norm
(which is the generalized name for the length of a vector); it is denoted by ‖x‖ for a
vector |x〉, i.e.,

‖x‖^2 = 〈x|x〉    (2.14)


When we talk of a basis of vectors e_i, there must be sufficiently many of them to
cover all the directions, so that any vector can be expanded in terms of them. With the
help of the norm, this property of "sufficiently many" can be expressed as follows. We
expand |x〉 = \sum_j x_j |e_j〉 as in (2.5). Then, taking the inner product with |e_i〉, we get

〈e_i|x〉 = \sum_j x_j 〈e_i|e_j〉 = x_i    (2.15)

Using this back in the expansion for |x〉, we find

|x〉 = \sum_i x_i |e_i〉 = \sum_i |e_i〉 〈e_i|x〉    (2.16)

This shows that the combination \sum_i |e_i〉〈e_i| acts as the identity on the vector space,

\sum_i |e_i〉〈e_i| = 1    (2.17)

This is known as the completeness relation. Any set of vectors obeying this relation

can be used as a basis.
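As a concrete numerical aside (an added sketch, not from the notes), one can verify the completeness relation (2.17) for a randomly generated orthonormal basis:

    # Verify sum_i |e_i><e_i| = 1 for an orthonormal basis of C^N.
    import numpy as np

    N = 4
    A = np.random.randn(N, N) + 1j * np.random.randn(N, N)
    Q, _ = np.linalg.qr(A)          # columns of Q form an orthonormal basis
    P = sum(np.outer(Q[:, i], Q[:, i].conj()) for i in range(N))
    print(np.allclose(P, np.eye(N)))    # True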

We now turn to operations on a vector which give another vector, rather than a scalar.

One way to get a vector from another vector is multiplication by a scalar, generally

a complex number. This is a linear transformation on vectors in the sense that if we

denote α |x〉 = |x′〉 and α |y〉 = |y′〉, then

α(|x〉+ |y〉) = |x′〉+ |y′〉 (2.18)

But there are other transformations which map a vector to a vector and are linear.

For example, consider vectors |x〉 transformed to |x′〉 and |y〉 transformed to |y′〉.
Expanding them in a basis,

|x〉 = \sum_i x_i |e_i〉, \quad |x′〉 = \sum_i x_i' |e_i〉, \qquad |y〉 = \sum_i y_i |e_i〉, \quad |y′〉 = \sum_i y_i' |e_i〉    (2.19)

Thus x_i \to x_i', y_i \to y_i'. Linearity requires that the transform of x_i + y_i should be x_i' + y_i'.
For arbitrary x_i, y_i, this is possible only if

x_i' = \sum_j M_{ij}\, x_j, \qquad y_i' = \sum_j M_{ij}\, y_j    (2.20)

It is easy to see that this is the most general linear transformation V → V. We can

think of the set of numbers (x_1, \cdots, x_N) as a column vector x and M_{ij} as the (i, j)-th
element of a matrix M, so that

x′ = Mx, y′ = My (2.21)


In other words, a matrix can be defined as a linear transformation on a vector space.
Another way to say this is that a matrix is a linear operator on a vector space, M : V \to V;
the result of its "operation" on a vector in V is another vector in V. It is therefore
useful to think of properties of matrices in conjunction with their action on a vector
space. When a vector |x〉 is acted upon by an operator M, the resulting vector (which
is |x′〉 in (2.19, 2.20)) will be written as |Mx〉. The relation between the operator M
and the matrix elements M_{ij} can be better expressed now using the idea of the inner
product. Taking the transformed vector |x′〉 = |Mx〉, we write the components x_i'
using the relation (2.15) as

x_i' = 〈e_i|x′〉 = 〈e_i|M x〉 = \sum_j 〈e_i|M e_j〉\, x_j = \sum_j M_{ij}\, x_j    (2.22)

This shows that we may identify the matrix elements of M as

M_{ij} = 〈e_i|M e_j〉    (2.23)

Consider now the conjugate vector corresponding to the transformed vector
|Mx〉 = \sum_{i,j} M_{ij} x_j |e_i〉. The conjugate is obtained as

〈Mx| = \sum_{i,j} M_{ij}^*\, x_j^*\, 〈e_i|    (2.24)

From this, upon taking the inner product with another vector |y〉, we have the relation

〈Mx|y〉 = \sum_{i,j} M_{ij}^* x_j^* y_i = \sum_{i,j} x_i^* M_{ji}^* y_j = \sum_{i,j} x_i^* (M^\dagger)_{ij}\, y_j = 〈x|M^\dagger y〉    (2.25)

where we have used the fact that the adjoint of a matrix M, denoted by M^\dagger, is the com-
plex conjugate transpose, so that (M^\dagger)_{ij} = M_{ji}^*. We may take (2.25) as the definition of
the adjoint of the operator M.
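A quick numerical check of (2.25) (an added sketch):

    # Verify <Mx|y> = <x|M^dagger y> for random complex vectors.
    import numpy as np

    N = 5
    M = np.random.randn(N, N) + 1j * np.random.randn(N, N)
    x = np.random.randn(N) + 1j * np.random.randn(N)
    y = np.random.randn(N) + 1j * np.random.randn(N)

    lhs = np.vdot(M @ x, y)            # <Mx|y>; vdot conjugates its first argument
    rhs = np.vdot(x, M.conj().T @ y)   # <x|M^dagger y>
    print(np.allclose(lhs, rhs))       # True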

We can extend the notion of a matrix element of an operator by using arbitrary

states rather than a chosen basis. In other words, we will define the x, y matrix element
of the operator M as 〈x|My〉.

Notice that, under this definition, "operators" and matrices are somewhat syn-
onymous; for finite-dimensional vector spaces, they are essentially the same. For
infinite-dimensional vector spaces, which we will discuss shortly, an operator with
matrix elements defined as above can be taken as the definition of the (infinite-
dimensional) matrix. We will therefore use the words "operator" and "matrix" somewhat

interchangeably, specifying particular matrix elements of the operator when needed.


So far we have considered the action of a single operator on the vector space; this
can be extended to multiple actions in a straightforward way. This would correspond to
the product of operators. Thus, if |Nx〉 = \sum_{i,j} N_{ij} x_j |e_i〉, then the action of the product
MN is given by the action of N followed by the action of M; i.e., M, N act on V in
sequence and we have

MN\,|x〉 = M\,|Nx〉 = \sum M_{ij} \left( N_{jk}\, x_k\, |e_i〉 \right) = \sum (MN)_{ik}\, x_k\, |e_i〉, \qquad (MN)_{ik} = \sum_j M_{ij} N_{jk}    (2.26)

The second equation is how we define matrix products. Thus, we see that the operator

product gets represented as the matrix product. The product of operators will be

associative, i.e., (LM)N = L(MN), but not necessarily commutative, i.e., MN ≠ NM
in general, similar to the case of matrix products.

We will also need to consider functions of operators. These can be defined by
extension of the product of two operators. Thus, if M is an operator, M^2, M^3, \ldots, M^n
are naturally defined as multiple products. More complicated functions are defined
by a suitable power series expansion. Thus, for example, e^M is defined by its standard
series expansion as

e^M = 1 + M + \frac{M^2}{2!} + \cdots + \frac{M^n}{n!} + \cdots    (2.27)
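A short numerical aside (an added sketch) comparing a partial sum of this series with a library matrix exponential:

    # e^M via the power series (2.27) versus scipy's expm.
    import numpy as np
    from scipy.linalg import expm

    M = np.random.randn(4, 4)
    series = np.eye(4)
    term = np.eye(4)
    for n in range(1, 30):          # accumulate M^n / n!
        term = term @ M / n
        series = series + term
    print(np.allclose(series, expm(M)))    # True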

We now consider the generalization to the case of an infinite number of dimensions.
Most of what we have said will go through as N \to \infty, but there can be issues of

completeness and convergence. It is best to start with an example.

2.2 Fourier series

Consider an interval of the real line [0, L] and functions which are at least piecewise

continuous on this interval, so that we can carry out integrations. We know that, under
certain conditions, such a function, say f(x), can be expanded in a Fourier series as

f(x) = a_0 + \sum_{n=1}^{\infty} a_n u_n(x) + \sum_{n=1}^{\infty} b_n v_n(x),
\qquad u_n(x) = \sqrt{\frac{2}{L}} \cos\left( \frac{n\pi x}{L} \right), \quad v_n(x) = \sqrt{\frac{2}{L}} \sin\left( \frac{n\pi x}{L} \right)    (2.28)

where a_0, a_n, b_n are constants, which may be calculated in terms of the integrals of
f(x) with 1, u_n, v_n. For simplicity of discussion, we will also restrict ourselves to the
subset of functions which vanish at x = 0 and at x = L. This means that we can set
a_0 = a_n = 0. The expansion (2.28) then becomes

f(x) = \sum_{n=1}^{\infty} b_n v_n(x)    (2.29)


Further, let us consider only those functions for which

\int dx\, f^* f < \infty    (2.30)

These are square-integrable functions; they are often referred to as L^2-functions.² The
mode functions v_n obey the condition

\int dx\, v_n^*\, v_m = \delta_{nm}    (2.31)
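As a numerical aside (an added sketch), the coefficients b_n = 〈n|f〉 and the expansion (2.29) can be checked directly for a simple function vanishing at both end points:

    # Fourier sine series on [0, L]: b_n = integral of v_n(x) f(x) dx.
    import numpy as np
    from scipy.integrate import quad

    L = 1.0
    f = lambda x: x * (L - x)                  # vanishes at x = 0 and x = L
    v = lambda n, x: np.sqrt(2 / L) * np.sin(n * np.pi * x / L)

    b = [quad(lambda x: v(n, x) * f(x), 0, L)[0] for n in range(1, 20)]
    x0 = 0.3
    partial = sum(bn * v(n, x0) for n, bn in enumerate(b, start=1))
    print(partial, f(x0))     # the partial sum reproduces f(x0) closely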

Notice the similarity between (2.29) and the expansion of a vector in (2.5), and also

between (2.31) and the orthonormality condition (2.10). We may thus consider v_n(x)
as a particular realization of a vector |n〉 in an infinite-dimensional space, with the
inner product

〈n|m〉 = \int dx\, v_n^*(x)\, v_m(x) = \delta_{nm}    (2.32)

f(x) may then be considered as a realization of a vector |f〉. The fact that this belongs
to a vector space with a well-defined inner product means that we need

〈f|f〉 = \int dx\, f^* f < \infty    (2.33)

We see that we have many of the ingredients of linear vector spaces with an inner

product, even in the case of the dimension (i.e., the number of mode functions v_n)
being infinite. However, when we go to an infinite number of dimensions, there can be
additional issues of convergence and completeness. For example, the condition of

finite norm in (2.33), via the use of the mode expansion, becomes

〈f|f〉 = \sum_n b_n^* b_n < \infty    (2.34)

Thus the infinite series defined by the |b_n|^2 must be convergent. We can also have a
sequence of functions f_n(x), all of which have a finite norm, and which tend to a
limiting function F(x),

\lim_{n \to \infty} f_n(x) = F(x)    (2.35)

It is a nontrivial question whether the limit function is itself square-integrable. This is

a question of completeness: Is the vector space complete in the sense that the limit

of every convergent sequence of square-integrable functions is in the vector space,

i.e. does the limit have a finite norm? If it does, we have what is called a Hilbert space.

So we will first give the definition of a Hilbert space before proving that the space of

square-integrable functions do form a Hilbert space.2The name is derived from the fact that one can define a more general norm for functions, of the

form ‖f‖ =(∫

|f |p)1/p

which are known in the mathematics literature as Lp-norms. For us p = 2 is the

relevant one; that case is equivalent to using an inner product. The name has nothing to do with the

length of the interval L which we use in this example.


Definition 2.2 — Hilbert Space. A linear vector space with a positive definite inner

product (i.e., the norm of every vector is positive) and which is metrically complete

(in the sense that the limit of every convergent sequence of vectors is in the vector

space) is a Hilbert space.

It is useful to summarize the properties of the inner product on a Hilbert space:

〈x + y|z〉 = 〈x|z〉 + 〈y|z〉
〈x|\alpha y〉 = \alpha\, 〈x|y〉, \quad \alpha ∈ C
〈\alpha x|y〉 = \alpha^*\, 〈x|y〉
〈x|y〉 = 〈y|x〉^*    (2.36)
〈x|x〉 ≥ 0
〈x|x〉 = 0 \implies |x〉 = 0
〈x|x〉 = ‖x‖^2

A Hilbert space is thus a linear vector space endowed with an inner product which

satisfies these properties, and, in addition, it is complete in the sense that the limit

of every Cauchy sequence is an element of the space. (The Cauchy sequences are

defined using the inner product as the metric.)

There are other ways to define a Hilbert space, e.g., in terms of a Banach space, but

we will stay with this definition for now. Under our definition, the finite-dimensional

vector spaces which we discussed earlier are Hilbert spaces. But the definition is suitable

even for the case of infinite-dimensional spaces.

2.3 Cauchy-Schwarz inequality

An extremely useful and important inequality on a Hilbert space is the Cauchy-

Schwarz inequality. This can be derived as follows. We choose a vector |f〉+ α |g〉 in

the Hilbert space, where |f〉 and |g〉 are nonzero vectors, and since the norm is positive,

we can write

〈f + αg|f + αg〉 ≥ 0 (2.37)

From the properties of the conjugate vector, this becomes

〈f |f〉+ α∗〈g|f〉+ α〈f |g〉+ α∗α〈g|g〉 ≥ 0 (2.38)

We choose α = −〈g|f〉/〈g|g〉 to get

〈f |g〉 〈g|f〉 ≤ 〈f |f〉 〈g|g〉 , (2.39)


or upon taking the square root,

|〈f |g〉| ≤ ‖f‖ ‖g‖ (2.40)

This is the Cauchy-Schwarz inequality.
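A quick numerical illustration of (2.40) (an added sketch):

    # |<f|g>| <= ||f|| ||g|| for random complex vectors.
    import numpy as np

    for _ in range(5):
        f = np.random.randn(8) + 1j * np.random.randn(8)
        g = np.random.randn(8) + 1j * np.random.randn(8)
        lhs = abs(np.vdot(f, g))
        rhs = np.linalg.norm(f) * np.linalg.norm(g)
        print(lhs <= rhs)    # always True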

2.4 L2-functions as a Hilbert Space

We can now show that square-integrable functions form a Hilbert space. We can

do the proof in general, not necessarily restricted to the segment of the real line we

considered earlier. So we consider L2-functions on some manifold or some region

within a manifold. For these, we have the inner product

〈f|g〉 = \int d\mu\, f^*\, g    (2.41)

Here dµ is the volume measure for integration over the manifold or the region within

the manifold where the functions are defined. Most of the required properties to make

this into a Hilbert space can be seen, using this formula for the inner product, in a way

similar to what we did for the Fourier series. For example, the inner product (2.41) is

positive. It can be zero only if f is identically zero on the manifold. Secondly, we can
write, as in the derivation of the Cauchy-Schwarz inequality,

\int d\mu\, (\beta^* f^* + \alpha^* g^*)(\beta f + \alpha g) \geq 0    (2.42)

Expanding out and using \beta = 1, \alpha = -\left[ \int d\mu\, g^* f \Big/ \int d\mu\, g^* g \right], we get the inequality

\int d\mu\, f^* f \int d\mu\, g^* g \geq \int d\mu\, f^* g \int d\mu\, g^* f    (2.43)

We could also take \alpha = 1, \beta = -\left[ \int d\mu\, f^* g \Big/ \int d\mu\, f^* f \right] to obtain the same result. Notice

that we have not assumed anything to obtain this except for the positivity of (2.41)

and the fact that at least one of the functions does not identically vanish, so that at

least one of the two choices of α, β given above can be used.

For showing completeness of the square-integrable functions, we need to show

that the limit of every Cauchy sequence is an element of the space. So consider a

Cauchy sequence f_n. Each f_n is an element of the vector space and hence it has a
finite norm, 〈f_n|f_n〉 < \infty. Let F denote the limit of this sequence. We want to show
that 〈F|F〉 < \infty, so that F is in the space of L^2-functions. For this, write

\int f_n^* f_n = \int (f_n - F + F)^*(f_n - F + F) = \|f_n - F\|^2 + \|F\|^2 + \int (f_n^* - F^*)\, F + \int F^*\, (f_n - F)    (2.44)


The last two terms give twice the real part of 〈f_n - F|F〉. Since the magnitude of this is
bounded by the Cauchy-Schwarz inequality (2.43),

〈f_n - F|F〉 + 〈F|f_n - F〉 ≥ -2\, \|f_n - F\|\, \|F\|    (2.45)

Thus

〈f_n|f_n〉 ≥ \|f_n - F\|^2 + \|F\|^2 - 2\, \|f_n - F\|\, \|F\| ≥ \left( \|f_n - F\| - \|F\| \right)^2    (2.46)

As n \to \infty, \|f_n - F\| \to 0 by the convergence of the Cauchy sequence. Thus (2.46) gives
\|F\|^2 ≤ 〈f_n|f_n〉. Since \|f_n\| is finite, this shows that F is in the space of L^2-functions.

Therefore we can conclude that the space of L2-functions is complete and hence that

it is a Hilbert space. We can state this as a theorem.

Theorem 2.1 The space of L2-functions on a manifold is a Hilbert space.

We now define some special kind of operators or matrices.

Definition 2.3 — Hermitian and unitary operators/matrices. An operator (or matrix)

M is hermitian if for all |x〉 , |y〉 ∈ V, 〈x|My〉 = 〈Mx|y〉 = 〈x|M †y〉; i.e., if M = M †.

An operator U is unitary if it obeys U †U = UU † = 1.

These are important in quantum mechanics because observables will be represented
as hermitian operators, while unitary transformations are the allowed transformations
or changes of basis on the Hilbert space. The latter property follows from the fact that

if vectors |x〉, |y〉 are transformed by a unitary operator, the inner product does not

change, i.e.,

〈x′|y′〉 = 〈Ux|Uy〉 = 〈x|U †Uy〉 = 〈x|y〉 (2.47)

When we consider products of operators, the adjoint transformation reverses the

order of the operators in the product. Thus

(MN)† = N †M † (2.48)

This follows from the equalities,

〈MNx|y〉 = 〈Nx|M †y〉 = 〈x|N †M †y〉 (2.49)

Among other things, this implies that if M and N are hermitian operators, then MN is

not necessarily hermitian; the product is hermitian only if the operators commute.

An important result we will need is that hermitian operators and unitary operators

can be diagonalized by a unitary transformation.


2.5 Diagonalization of a hermitian matrix

Let M be an N \times N hermitian matrix, so that M^\dagger = M. A priori, we have no guarantee
that M has N independent eigenstates. But we can see that there will be at least one
eigenstate (and eigenvalue) as follows. Form the quantity

R[x] = \frac{(x, Mx)}{(x, x)} = \frac{x_i^* M_{ij} x_j}{x_k^* x_k} = n_i^* M_{ij}\, n_j    (2.50)

where n_i = x_i/\sqrt{x^* \cdot x}. R[x] is real since M is hermitian. This can be verified by directly
taking the complex conjugate of R[x]. The complex vector (n_1, n_2, \cdots, n_N) defines a
point on the (2N - 1)-dimensional sphere S^{2N-1}. Since none of the matrix elements
M_{ij} is infinite, R[x] is a bounded function on the compact space S^{2N-1}. Thus, by
the extreme value theorem (of Bolzano and Weierstrass), it attains its maximum and
minimum values on the sphere. This tells us that there is a vector n_1 for which R[x] is a
minimum. We denote this minimum value as \lambda_1 \equiv (x_1, M x_1)/(x_1, x_1). We conclude

that for small ξ,

R[x1 + ξ] ≥ R[x1] (2.51)

Working this out to first order in \xi, we get

M x_1 = \left[ \frac{(x_1, M x_1)}{(x_1, x_1)} \right] x_1 = \lambda_1\, x_1    (2.52)

Thus x1 is an eigenstate of M with eigenvalue λ1. We have shown that at least one

eigenvector exists. We now define an orthonormal basis of vectors |i〉 by

|1〉 = (1, 0, 0, · · · , 0)

|2〉 = (0, 1, 0, · · · , 0)

· · ·

|N〉 = (0, 0, · · · , 0, 1) (2.53)

Also, let |β_i〉, i = 2, 3, \cdots, N, form an orthonormal basis for vectors orthogonal to |x_1〉.
We now define the matrices

S = |x_1〉〈1| + \sum_{i=2}^{N} |β_i〉〈i|, \qquad S^\dagger = |1〉〈x_1| + \sum_{i=2}^{N} |i〉〈β_i|    (2.54)

It is easy to check that S is unitary,

S^\dagger S = \left( |1〉〈x_1| + \sum_{i=2}^{N} |i〉〈β_i| \right) \left( |x_1〉〈1| + \sum_{i=2}^{N} |β_i〉〈i| \right) = |1〉〈1| + \sum_i |i〉〈i| = 1    (2.55)

A similarity transformation of M by S gives

S^\dagger M S = \left( |1〉〈x_1| + \sum_{i=2}^{N} |i〉〈β_i| \right) M \left( |x_1〉〈1| + \sum_{i=2}^{N} |β_i〉〈i| \right)
= \lambda_1 |1〉〈1| + \sum_i |i〉〈β_i|M|x_1〉〈1| + \sum_i |1〉〈x_1|M|β_i〉〈i| + \sum_{i,k} |i〉〈β_i|M|β_k〉〈k|    (2.56)

where we used 〈x_1|M|x_1〉 = \lambda_1. We also have

〈β_i|M|x_1〉 = \lambda_1 〈β_i|x_1〉 = 0    (2.57)

Further, by hermiticity of M (and hence of S^\dagger M S), this implies 〈x_1|M|β_i〉 = 0. Thus

S^\dagger M S = \lambda_1 |1〉〈1| + M', \qquad M' = \sum_{i,k} |i〉〈β_i|M|β_k〉〈k|    (2.58)

M ′ has no entries for the first row or first column, having possible nonzero matrix

elements only for the states |i〉, i = 2, 3, · · · , N . We can now repeat the argument given

above, starting with one eigenstate of M ′ viewed as an (N − 1)× (N − 1) matrix and

constructing a corresponding unitary matrix. Continuing in this way, we see that the

whole matrix M can be diagonalized by a unitary transformation. We can state this as

a theorem.

Theorem 2.2 A hermitian matrix M can be diagonalized by a similarity transforma-
tion S^{-1} M S = M_{diag}, where S is a unitary matrix. The diagonal matrix M_{diag} has
nonzero entries only along the main diagonal; these are the eigenvalues of M defined by
M u_n = \lambda_n u_n.

There are some other results regarding hermitian matrices which are useful and

which extend to hermitian operators as well. The first result is that we can choose dif-

ferent eigenvectors of a hermitian matrix to be orthogonal. Because of the hermiticity

property,

0 = 〈u_i|M u_j〉 - 〈M^\dagger u_i|u_j〉 = 〈u_i|M u_j〉 - 〈M u_i|u_j〉    (2.59)

With the eigenvalue equations M u_i = \lambda_i u_i and M u_j = \lambda_j u_j, this gives

(\lambda_i - \lambda_j)\, 〈u_i|u_j〉 = 0    (2.60)

First consider the case when \lambda_i ≠ \lambda_j. In this case, this equation tells us that 〈u_i|u_j〉 = 0.
The eigenvectors corresponding to different eigenvalues are necessarily orthogonal.


Now consider the case where we may have \lambda_i = \lambda_j, i.e., there is a degeneracy. The
eigenvalue equations read

M |u_i〉 = \lambda_i |u_i〉, \qquad M |u_j〉 = \lambda_i |u_j〉    (2.61)

We can now define the linear combinations

|\tilde{u}_i〉 = c_1 |u_i〉 + c_2 |u_j〉, \qquad |\tilde{u}_j〉 = c_3 |u_i〉 + c_4 |u_j〉    (2.62)

Clearly, these are equally good eigenstates, with

M |\tilde{u}_i〉 = \lambda_i |\tilde{u}_i〉, \qquad M |\tilde{u}_j〉 = \lambda_i |\tilde{u}_j〉    (2.63)

We can now choose the coefficients c_\alpha in the transformation (2.62) so that

〈\tilde{u}_i|\tilde{u}_i〉 = 〈\tilde{u}_j|\tilde{u}_j〉 = 1, \qquad 〈\tilde{u}_i|\tilde{u}_j〉 = 0    (2.64)

This proves the result.

It is also straightforward to see that the eigenvectors of a hermitian matrix form a

complete orthonormal basis. We can see this as follows by a reductio ad absurdum

argument. Let |f〉 be a vector in the Hilbert space which cannot be expanded in terms

of the eigenstates u_i. So we can write

|f〉 = \sum_i c_i |u_i〉 + |v〉    (2.65)

Here |v〉 is the part which cannot be expanded in terms of the eigenstates and, without

loss of generality, we can take it to be orthogonal to all the eigenstates, i.e., 〈ui|v〉 = 0.

We can now form 〈v|M v〉/〈v|v〉 and argue, as before, that there must be an eigenvector

among all possible v’s which minimizes this quantity. This contradicts the assumption

that ui are all the eigenvectors. The only possibility is to have |v〉 = 0, and hence,

any vector |f〉 can be expanded in terms of the eigenvectors of M . In other words, the

set of eigenvectors of a hermitian matrix form a complete basis.

We can summarize these results as another theorem.

Theorem 2.3 The eigenvectors of a hermitian matrix form a complete orthonormal

set.
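These statements are easy to check numerically; a minimal added sketch:

    # Diagonalize a random hermitian matrix: real eigenvalues, and the
    # eigenvectors assemble into a unitary S with S^dagger M S diagonal.
    import numpy as np

    N = 4
    A = np.random.randn(N, N) + 1j * np.random.randn(N, N)
    M = (A + A.conj().T) / 2                    # hermitian by construction
    lam, S = np.linalg.eigh(M)                  # columns of S: eigenvectors
    print(np.allclose(S.conj().T @ S, np.eye(N)))           # orthonormal, complete
    print(np.allclose(S.conj().T @ M @ S, np.diag(lam)))    # diagonalized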

2.6 Diagonalization of a hermitian operator on a Hilbert space

Consider a hermitian operator M on a Hilbert space. We form the quantity

R[u] = \frac{\int u^* M u}{\int u^* u} = \frac{〈u|M u〉}{〈u|u〉}    (2.66)


This is often referred to as the Rayleigh quotient. First of all, we notice that R[u] is real

if M is hermitian, since

(〈u|M u〉)∗ = 〈M u|u〉 = 〈u|M † u〉 = 〈u|M u〉 (2.67)

Further, if M is a bounded operator, we have -\infty < R[u] < \infty for all u which are
elements of the Hilbert space. Let \lambda_1 be the minimum value of R[u]. This occurs for
some function u_1. We then consider R[u] for u = u_1 + ε g, for some g ∈ V, with ε taken
to be infinitesimal. Since u_1 gives the minimum value for R[u], we have R[u_1 + ε g] ≥ \lambda_1.
Written out, this gives

〈u_1 + ε g|M (u_1 + ε g)〉 ≥ \lambda_1 〈u_1 + ε g|(u_1 + ε g)〉
ε \left[ 〈g|M u_1〉 + 〈u_1|M g〉 + ε〈g|M g〉 \right] ≥ \lambda_1\, ε \left[ 〈g|u_1〉 + 〈u_1|g〉 + ε〈g|g〉 \right]    (2.68)

Using the hermiticity of M, i.e., 〈u_1|M g〉 = 〈M u_1|g〉, we can write this equation as

ε \left[ 〈g|(M - \lambda_1)u_1〉 + 〈(M - \lambda_1)u_1|g〉 + ε〈g|(M - \lambda_1) g〉 \right] ≥ 0    (2.69)

For ε > 0, the quantity in square brackets must be ≥ 0; for ε < 0, it must be ≤ 0.
Thus this inequality gives

-|ε|\, 〈g|(M - \lambda_1)g〉 ≤ 〈g|(M - \lambda_1)u_1〉 + 〈(M - \lambda_1)u_1|g〉 ≤ |ε|\, 〈g|(M - \lambda_1)g〉    (2.70)

Taking the limit ε→ 0, this gives

Real part of 〈g|(M − λ1)u1〉 = 0 (2.71)

Further, since g is arbitrary, we can choose it to be (M − λ1)u1 to get

‖(M − λ1)u1‖2 = 0 (2.72)

This implies (M − λ1)u1 = 0. Thus we have at least one eigenvalue and one eigen-

function for M . We can now consider the set of all functions orthogonal to u1 and

repeat the argument to show that there is an eigenfunction for the next lowest value of

R[u]. Continuing in this way, we get a complete set of eigenfunctions which obey the

eigenvalue equation

M u_n = \lambda_n u_n    (2.73)

If there is degeneracy, we get eigenspaces of dimension > 1 for that eigenvalue, and
any orthonormal basis in this subspace can be chosen as the eigenfunctions for that
eigenvalue. Diagonalization of the operator is then achieved by the unitary operator

U = |u_1〉〈1| + |u_2〉〈2| + \cdots    (2.74)

Thus we have obtained the result:

Theorem 2.4 A bounded hermitian operator on a Hilbert space can be diagonalized
by a unitary operator.

For unbounded operators, we use a trick. Suppose M is not necessarily bounded.

Then consider

K = (M - i)^{-1}, \qquad K^\dagger = (M + i)^{-1}    (2.75)

These operators exist because M ± i cannot have a zero eigenvalue. This can be proved
by contradiction. Suppose u obeys (M - i)u = 0. Then

〈u|M u〉 - i〈u|u〉 = 0    (2.76)

Since 〈u|M u〉 is real, taking the complex conjugate gives 〈u|M u〉 + i〈u|u〉 = 0. Subtracting
the two equations, we get 〈u|u〉 = 0, i.e., u = 0.

This shows that (M − i) has no zero mode and hence K is well defined. Further, it is

easy to see that

K^\dagger K = (M^2 + 1)^{-1} = K K^\dagger    (2.77)

From K, K^\dagger, we can form the bounded hermitian operators

K + K^\dagger = 2M\, (M^2 + 1)^{-1}, \qquad -i(K - K^\dagger) = 2\, (M^2 + 1)^{-1}    (2.78)

These can be diagonalized by the previous argument, and then we can reconstruct M
by solving for it as

M = \tfrac{1}{2}\, (K + K^\dagger)\, (K^\dagger K)^{-1}    (2.79)

2.7 Hermitian operators with a lower bound and completeness

Consider a hermitian operator H which has a lower bound, which we may take to be
zero. The eigenvalues can be ordered so that 0 ≤ \lambda_1 < \lambda_2 < \cdots < \lambda_n < \cdots. Consider the
Rayleigh quotient for functions which are orthogonal to the first n eigenfunctions, i.e.,

R[u] = \frac{〈u|H u〉}{〈u|u〉}, \qquad 〈u|u_i〉 = 0, \quad i = 1, 2, \cdots, n    (2.80)

Since we have ordered the eigenvalues and removed the subspace of eigenfunctions
u_1, u_2, \cdots, u_n, the lowest possible value for R[u] should be the next possible eigenvalue
\lambda_{n+1}. Thus, we have the inequality

\frac{〈u|H u〉}{〈u|u〉} \geq \lambda_{n+1}    (2.81)

for the subspace of functions which are orthogonal to ui, i = 1, 2, · · · , n. The result

(2.81) is known as the Rayleigh-Ritz inequality. (Beyond the mathematical use, this will

be very practical when we do the variational estimates for eigenvalues.) For operators

like H with λn →∞ as n→∞, we can prove the completeness of eigenfunctions as

follows. Consider a function |f〉 and let

|w〉 = |f〉 - \sum_{i=1}^{n} c_i |u_i〉, \qquad c_i = 〈u_i|f〉    (2.82)

Since |w〉 is orthogonal to the first n eigenstates, we have

〈w|Hw〉 ≥ λn+1 〈w|w〉 (2.83)

Using (2.82), we can expand the left hand side of (2.83) as

〈w|H w〉 = 〈f|H f〉 - \sum_{i=1}^{n} \lambda_i\, c_i c_i^*    (2.84)

Since 〈w|H w〉 ≥ 0, we know that 〈f|H f〉 ≥ \sum_{i=1}^{n} \lambda_i c_i c_i^*. Going back to (2.83), we can
write it as

〈w|w〉 ≤ \frac{1}{\lambda_{n+1}} \left( 〈f|H f〉 - \sum_{i=1}^{n} \lambda_i\, c_i c_i^* \right) ≤ \frac{〈f|H f〉}{\lambda_{n+1}}    (2.85)

As n \to \infty, \lambda_{n+1} \to \infty. Further, 〈f|H f〉 has a fixed value, taken to be finite, for a given
|f〉. Thus as n \to \infty, 〈w|w〉 \to 0, or

\left\| f - \sum_{i=1}^{n} c_i u_i \right\| \to 0    (2.86)

This means that, in terms of convergence in the mean,

|f〉 = \sum_{i=1}^{\infty} c_i |u_i〉, \qquad c_i = 〈u_i|f〉    (2.87)

This demonstrates the completeness of the eigenfunctions.

As an example, we can use this result for an alternate proof of Fourier's theo-

rem. Consider square-integrable functions f(x), g(x) on the interval [0, L] vanishing

at both end-points, and the operator

H = -\frac{d^2}{dx^2}    (2.88)

We take the functions to be differentiable at least up to the second order, so that the

action of H on such functions is well-defined. For simplicity, we will consider real


functions, though the generalization to complex functions is straightforward. The

inner product is as given in (2.33), i.e.,

〈f|g〉 = \int_0^L dx\, f\, g    (2.89)

Further, by a couple of partial integrations,

〈f|H g〉 = 〈H f|g〉 + \left[ g' f - f' g \right]_0^L = 〈H f|g〉    (2.90)

since f , g vanish at the end-points and f ′, g′ are finite. This result (2.90) shows that

H is a hermitian operator. It is also straightforward to see that the
eigenstates are given by

v_n = \sqrt{\frac{2}{L}}\, \sin\left( \frac{n\pi x}{L} \right)    (2.91)

with

H v_n = (n\pi/L)^2\, v_n = \lambda_n v_n    (2.92)

We have a set of eigenvalues which start at (\pi/L)^2 and tend to infinity as n \to \infty. All

conditions required for our proof of completeness are satisfied and so we conclude

that vn form a complete set. This is essentially Fourier’s theorem, for real functions

on [0, L] which vanish at the end-points. Arguments along similar lines can be used to

prove the theorem in its full generality, including the case of functions which do not

necessarily vanish at the end points of the chosen interval.
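A numerical aside (an added sketch): a finite-difference discretization of H = -d^2/dx^2 with vanishing end points reproduces the spectrum \lambda_n = (n\pi/L)^2 for the low-lying modes:

    # Low-lying eigenvalues of -d^2/dx^2 on [0, L], Dirichlet conditions.
    import numpy as np

    L, N = 1.0, 400
    dx = L / (N + 1)
    H = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / dx**2
    print(np.linalg.eigvalsh(H)[:3])                # numerical eigenvalues
    print([(n * np.pi / L)**2 for n in (1, 2, 3)])  # (n pi/L)^2: 9.87, 39.5, 88.8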

2.8 Commuting operators/matrices

We now show the following result:

Theorem 2.5 If A and B are two hermitian operators/matrices, they can be diago-

nalized in the same basis if and only if [A,B] ≡ AB −BA = 0.

The proof is as follows. If A and B are diagonal in some basis,

〈n| [A, B] |k〉 = \sum_m \left( 〈n|A|m〉〈m|B|k〉 - 〈n|B|m〉〈m|A|k〉 \right) = \sum_m \left( a_n \delta_{nm}\, b_m \delta_{mk} - b_n \delta_{nm}\, a_m \delta_{mk} \right) = (a_n b_n - b_n a_n)\, \delta_{nk} = 0    (2.93)

Conversely, let [A,B] = 0. Choose a basis to make A diagonal. Then the condition of

the vanishing commutator becomes

0 = 〈n| [A, B] |k〉 = (a_n - a_k)\, 〈n|B|k〉    (2.94)

Thus the off-diagonal elements 〈n|B|k〉 of B vanish if a_n - a_k ≠ 0; the diagonal elements
〈n|B|n〉 need not be zero. If a_n = a_k with |n〉 ≠ |k〉, A has degenerate eigenvalues. In this
case, B need not be diagonal in the subspace of degenerate eigenvectors of A. However,
we can now use a unitary matrix which is the identity on all states except on the
degenerate subspace, where we take it to be U. As an example, if the first two
eigenvalues of A are equal, then we choose the unitary matrix to be of the form

U = \begin{pmatrix} U_{11} & U_{12} & 0 & 0 & \cdots \\ U_{21} & U_{22} & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}    (2.95)

On the subspace of degenerate eigenvalues, A is proportional to the identity, and the

action of U does not change its diagonal nature. We can thus use U to diagonalize B

on the subspace as well, by a suitable choice of U .
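A numerical aside (an added sketch; here B is built as a polynomial in A, one simple way of manufacturing a commuting pair):

    # Two commuting hermitian matrices are diagonal in the same basis.
    import numpy as np

    N = 4
    X = np.random.randn(N, N) + 1j * np.random.randn(N, N)
    A = (X + X.conj().T) / 2
    B = A @ A + 3 * A                       # [A, B] = 0 by construction
    print(np.allclose(A @ B - B @ A, 0))    # True
    _, S = np.linalg.eigh(A)                # eigenbasis of A (generically nondegenerate)
    Bd = S.conj().T @ B @ S
    print(np.allclose(Bd, np.diag(np.diag(Bd))))   # B is diagonal there too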

2.9 Unitary operators

A unitary operator can be diagonalized by a unitary transformation. This result is easily

seen from what we have done so far. Let U be a unitary operator. Then A = U + U^\dagger
and B = i(U - U^\dagger) are hermitian operators. Since U^\dagger U = U U^\dagger = 1, they commute
with each other,

AB = i(U^2 - U^{\dagger 2}), \qquad BA = i(U^2 - U^{\dagger 2})    (2.96)

We can therefore diagonalize them in the same basis, writing A = S^\dagger A_{diag} S, B =
S^\dagger B_{diag} S, so that U = \tfrac{1}{2} S^\dagger (A - iB)_{diag} S = S^\dagger U_{diag} S. Further, from unitarity, the
diagonal elements must be of the form e^{i\lambda_n} for real \lambda_n. Thus we may write

U_{diag} = e^{i\Lambda}    (2.97)

where \Lambda is diagonal and has real eigenvalues. Now consider writing U by an expansion
of this exponential,

U = S^\dagger \left[ 1 + i\Lambda + \frac{i^2}{2!} \Lambda^2 + \cdots \right] S = 1 + i S^\dagger \Lambda S + \frac{i^2}{2!}\, S^\dagger \Lambda^2 S + \cdots
= 1 + i S^\dagger \Lambda S + \frac{i^2}{2!}\, (S^\dagger \Lambda S)(S^\dagger \Lambda S) + \cdots = \exp\left( i\, S^\dagger \Lambda S \right) = \exp(iM)    (2.98)


In the last but one line, we inserted a factor 1 = SS† between the two Λs and regrouped

the terms; a similar rearrangement is done for higher powers as well. We see that the

unitary operator can be thought of as the exponential of i times a hermitian operator

M = S†ΛS. We can collect these results as a theorem.

Theorem 2.6 A unitary operator can be diagonalized by a unitary transformation.
The diagonal elements of the unitary operator, i.e., its eigenvalues, are of the form
e^{i\lambda_n}, \lambda_n ∈ R. Further, any unitary operator can be written as e^{iM} where M is hermi-
tian.

It is also worth pointing out that by virtue of these relations, or directly from the
Cauchy-Schwarz inequality (2.39), every matrix element of a unitary operator obeys

|U_{ij}| ≤ 1    (2.99)

Notice that we may also make the identification A = 2\cos M, B = -2\sin M.
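A numerical aside (an added sketch) illustrating Theorem 2.6:

    # U = exp(iM) with M hermitian is unitary; its eigenvalues lie on
    # the unit circle.
    import numpy as np
    from scipy.linalg import expm

    N = 4
    X = np.random.randn(N, N) + 1j * np.random.randn(N, N)
    M = (X + X.conj().T) / 2                          # hermitian
    U = expm(1j * M)
    print(np.allclose(U.conj().T @ U, np.eye(N)))     # U^dagger U = 1
    print(np.abs(np.linalg.eigvals(U)))               # all 1, i.e. e^{i lambda_n}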

2.10 The Dirac δ-function

For the harmonic oscillator, the eigenfunctions are

u_n(\xi) = \frac{1}{\sqrt{2^n\, n!\, \sqrt{\pi}}}\; e^{-\frac{1}{2}\xi^2}\, H_n(\xi)    (2.100)

where H_n are the Hermite polynomials. The completeness relation becomes

f(x) = \int dy\, \sum_n u_n(x)\, u_n^*(y)\, f(y)    (2.101)

These formulae will be obtained later in class. We define the one-dimensional Dirac
δ-function by

\delta(x - y) = \sum_n u_n(x)\, u_n^*(y) = \sum_n \frac{1}{2^n\, n!\, \sqrt{\pi}}\; e^{-(x^2 + y^2)/2}\, H_n(x) H_n(y)    (2.102)

The most important property of the δ-function is its sifting property, already evident in (2.101):

\int dy\, \delta(x - y)\, f(y) = f(x)    (2.103)

From this we also have

\int dy\, \delta(x - y) = 1    (2.104)

Notice that for (2.103) to hold for all nonsingular f(y), we need

\delta(x - y) = 0 \quad \text{for all } x ≠ y    (2.105)


These two conditions, namely (2.104) and (2.105), may be taken as an alternative

definition of the δ-function.

There are many ways of representing the δ-function, other than the one in (2.102).

For example, another one is obtained using the Gaussian function,

ρ(x, y, σ) =1√2π σ

exp

[−(x− y)2

2σ2

](2.106)

This Gaussian function is peaked around x = y, with a width given by σ. Consider small values of σ and let f(y) be a function which can be expanded around x. Then we can write

∫ dy ρ(x, y, σ) f(y) = ∫ dy ρ(x, y, σ) [f(x) + (y − x) f′(x) + ½(y − x)² f″(x) + ···]
                    = f(x) + ½σ² f″(x) + O(σ⁴)   (2.107)

where we have used

∫ dy ρ(x, y, σ) = 1,   ∫ dy (x − y)² ρ(x, y, σ) = σ²   (2.108)

Thus, as σ → 0, we get f(x) on the right hand side of (2.107). We may thus write a representation of the δ-function as

δ(x − y) = lim_{σ→0} (1/(√(2π) σ)) exp[−(x − y)²/(2σ²)]   (2.109)

From this definition, we also see that

δ(x − y) = 0 for x ≠ y,   δ(x − y) = ∞ for x = y,   ∫ dy δ(x − y) = 1   (2.110)

Another, although more formal, definition is obtained from the Fourier transform of ρ(x, y, σ),

ρ(x, y, σ) = ∫ (dp/2π) e^{ip(x−y)} exp(−½ p²σ²)   (2.111)

From this, as σ → 0, we can write

δ(x − y) = ∫ (dp/2π) e^{ip(x−y)}   (2.112)

Yet another representation is obtained by evaluating the integral in (2.112) using convergence factors:

∫ (dp/2π) e^{ip(x−y)} = [ ∫₀^∞ (dp/2π) e^{ip[(x−y)+iε]} + ∫_{−∞}^0 (dp/2π) e^{ip[(x−y)−iε]} ]_{ε→0}
                     = (1/2πi) [ 1/((x−y) − iε) − 1/((x−y) + iε) ]_{ε→0}   (2.113)

This leads to

δ(x − y) = lim_{ε→0} (1/π) ε/((x−y)² + ε²)   (2.114)

From these representations we can easily verify that δ(x) = δ(−x). Another useful property is

δ(ax) = (1/|a|) δ(x)   (2.115)

This can be checked as follows. For a > 0,

∫ dx δ(ax) = ∫ (d(ax)/a) δ(ax) = 1/a = (1/a) ∫ dx δ(x)   (2.116)

For a < 0, we use δ(ax) = δ(−|a|x) = δ(|a|x) and then the same argument as above. Finally, if g(x) is a function with simple zeros at x_i, then δ[g(x)] = 0 except at the x_i. We can write g(x) ≈ g′(x_i)(x − x_i) around each zero of g(x). Then, using (2.115), we get

δ[g(x)] = Σ_i δ(x − x_i)/|g′(x_i)|   (2.117)
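Both the sifting property (2.103) and the composition rule (2.117) are easy to test numerically by using the Gaussian representation (2.109) at a small but finite width σ. A minimal Python sketch (the grid, the width, and the choice f = cos are purely illustrative):

```python
# Check of (2.103) and (2.117) using the Gaussian representation (2.109)
# at small but finite sigma.
import numpy as np

def delta_gauss(z, sigma=1e-3):
    """Gaussian approximation to delta(z), eq. (2.109)."""
    return np.exp(-z**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-5, 5, 200001)
f = np.cos

# Sifting property: integral of delta(x - 1) f(x) should give f(1).
print(np.trapz(delta_gauss(x - 1.0) * f(x), x), "vs", f(1.0))

# Composition rule: g(x) = x^2 - 1 has simple zeros at +/-1 with |g'| = 2,
# so the integral of delta(g(x)) f(x) should give (f(1) + f(-1)) / 2.
g = x**2 - 1
print(np.trapz(delta_gauss(g) * f(x), x), "vs", (f(1.0) + f(-1.0)) / 2)
```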

Although we use the common terminology of the Dirac “δ-function”, it is not a function in the strict mathematical sense of the term. It is defined as the limit of a function, like the Gaussian ρ(x, y, σ) in (2.106), with the limit understood as being taken inside integrals with ordinary functions. It is what is called a distribution in the mathematical literature.


3 Merging the physics and mathematics

3.1 Postulates and interpretation

We are now in a position to put the physics and mathematics together to obtain the

basic formulation of quantum mechanics. It is simplest to summarize this in terms of

three postulates:

1. The states of a physical system are in one-to-one correspondence with rays in a Hilbert space. In other words, there is a Hilbert space V which carries all the physical information about any given physical system.

2. Physical observables such as position, momentum, angular momentum, and energy correspond to linear hermitian operators on this vector space. In any measurement, the observed value of an observable will be one of the eigenvalues of the corresponding operator.

3. For a single particle, the key observables are the position and momentum, denoted by x̂_i and p̂_i, i = 1, 2, 3. They obey the commutation rules (the Heisenberg algebra)

   x̂_i x̂_j − x̂_j x̂_i = 0
   p̂_i p̂_j − p̂_j p̂_i = 0
   x̂_i p̂_j − p̂_j x̂_i = iℏ δ_ij   (3.1)

For a many-particle system, say with N particles, there is an obvious generalization given by

   x̂^{(α)}_i x̂^{(β)}_j − x̂^{(β)}_j x̂^{(α)}_i = 0
   p̂^{(α)}_i p̂^{(β)}_j − p̂^{(β)}_j p̂^{(α)}_i = 0
   x̂^{(α)}_i p̂^{(β)}_j − p̂^{(β)}_j x̂^{(α)}_i = iℏ δ_ij δ_αβ   (3.2)

where α, β = 1, 2, ···, N label the particles.

There is some explanation needed for these postulates. First of all, in postulate 1, we could say that the states correspond to vectors in V, but there is an overall phase which is not observable. Since an overall phase can be removed, we say rays rather than vectors. The very idea of this postulate, namely that we have a linear vector space, lies in the linear superposition which is a hallmark of the wave nature of particles.

In postulate 3, we use the hatted notation to emphasize the operator nature of the quantity. Notice that the commutation rules say that the position operators for different directions commute, as do the momentum operators. However, the position and momentum operators for the same particle do not commute; the discrepancy is proportional to ℏ. This is the essential ingredient of quantum mechanics. These commutation rules are known as the Heisenberg algebra. They are written using Cartesian components; the version in other coordinates has to be obtained from

these via a suitable change of variables. There is also an intrinsic way to write the commutation rules in any coordinate system, but that involves more formalism; we do not take it up here. Also, we are considering nonrelativistic particles; there are generalizations to relativistic cases, which we will take up later. As for the raison d'être of this postulate, we have already seen that p can be represented as −iℏ ∂/∂x acting on the amplitude of the matter wave.

If the states are identified as vectors in a Hilbert space, and observables are operators, that does not yet give us something we can measure. We need real numbers to characterize experimental results; no one can observe an operator. So we need an interpretation of the mathematical quantities to relate the theory to experiment. For this, let us first look at the Heisenberg algebra for single particles. Since the different components of x̂_i commute, we can consider simultaneous eigenstates of these operators,

x̂_i |x〉 = x_i |x〉   (3.3)

Now consider a state of the system given by |α〉. The inner product 〈x|α〉 is a complex number which depends on the state |α〉 and on x_i. This is defined as the wave function of the state |α〉,

〈x|α〉 ≡ wave function of the state |α〉 as a function of x   (3.4)

Since the different components of momentum commute among themselves, we can also consider momentum eigenstates via

p̂_i |p〉 = p_i |p〉   (3.5)

For the same state |α〉, we can write a wave function which is a function of the momentum variables, rather than the position variables, as

〈p|α〉 ≡ wave function of the state |α〉 as a function of p   (3.6)

Notice that the state |α〉 is the fixed quantity here; there are many possible wave functions which represent the same state.

We now introduce the interpretation of the wave function: the quantity d³x |〈x|α〉|² is the probability to find the particle in the volume d³x around the point x⃗ if it is in the state |α〉. This means that if the particle is in a state |α〉, it has no definite position, unless |α〉 is itself a position eigenstate. If a position measurement is carried out, the probability of obtaining x⃗ ± dx⃗ as the coordinates of its position is d³x |〈x|α〉|². This is the fundamental probabilistic nature of quantum mechanics. The wave function, whose square gives the probability, is often referred to as the probability amplitude.

Consider now an operator A corresponding to some observable. Let |a〉 denote an eigenstate of this operator, A|a〉 = a|a〉. Then, by the previous statement, |〈a|α〉|² is the probability to find the value a in a measurement of A if the particle is in the state |α〉. Since this is probabilistic, the mean value of A from a set of measurements is

〈A〉 = Σ_a a |〈a|α〉|² = Σ_a 〈α|a〉 a 〈a|α〉 = 〈α| (Σ_a A|a〉〈a|) |α〉 = 〈α|A|α〉   (3.7)

We used the completeness relation for the states for all possible values of a, i.e.,

Σ_a |a〉〈a| = 1   (3.8)

Equation (3.7) shows that the diagonal matrix element of A in the state |α〉 gives the expected value of the observable. For this reason, we refer to a diagonal matrix element 〈α|Aα〉 = 〈α|A|α〉 as the expectation value of A in the state |α〉. (We used a summation over a, taking the values of a to be discrete; if they are continuous, a similar argument goes through with integration replacing the summation in (3.7).)

What about the off-diagonal matrix elements like 〈α|Aβ〉, with |α〉 ≠ |β〉? These do not contribute to an expectation value for the measurement of A. Generally speaking, such matrix elements contribute to transitions among different states. In a similar way, if we consider the expectation value of a product of two observables, say AB, we find

〈α|AB|α〉 = Σ_{a,b} 〈α|a〉 a 〈a|b〉 b 〈b|α〉 = Σ_b 〈α|Ab〉 b 〈b|α〉   (3.9)

where |b〉 denote the eigenstates of B. We see that off-diagonal matrix elements are important in measuring such products.
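For a finite-dimensional toy system, the relations (3.7) and (3.8) can be checked directly. A short numerical sketch (the matrix and the state are random; nothing here is specific to any physical system):

```python
# Toy check of eq. (3.7): for a random hermitian A and a normalized state
# |alpha>, the probability-weighted sum over eigenvalues equals <alpha|A|alpha>.
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (M + M.conj().T) / 2                      # hermitian observable

alpha = rng.normal(size=n) + 1j * rng.normal(size=n)
alpha /= np.linalg.norm(alpha)                # normalized state

evals, evecs = np.linalg.eigh(A)              # A|a> = a|a>
probs = np.abs(evecs.conj().T @ alpha)**2     # |<a|alpha>|^2
print(np.sum(probs))                          # ~1.0: completeness, eq. (3.8)
print(np.sum(evals * probs))                  # sum_a a |<a|alpha>|^2
print((alpha.conj() @ A @ alpha).real)        # <alpha|A|alpha>: same number
```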

Now we ask the question: Suppose we have just one electron, we prepare it in a state |α〉 (which is not an eigenstate of A), and we measure the observable A. What do we get? We do one measurement, so we should get one answer. This answer will correspond to one of the eigenvalues of A, namely, one particular a. But to get a well-defined value a as the answer, we should have an eigenstate |a〉. So what happened? We say that the act of measuring A knocks the system into an eigenstate of A. This is often referred to as the collapse of the wave function. If we have a large number of independent copies of the same system, each one electron prepared in the state |α〉, then for the measurement on each electron we cannot say what the result will be; it will be one of the allowed values of a. Each electron will collapse into an eigenstate of A in an essentially random fashion, but, over a large number of measurements, the relative frequency with which a particular value a occurs will be given by |〈a|α〉|².

The collapse of the wave function is rather peculiar when we think of measurements in classical physics. Consider a ball of matter which we set spinning around the z-axis. If a measurement of angular momentum is carried out, we would find that it is zero along the x and y directions, but nonzero along the z direction. In quantum mechanics, assume we have prepared an electron in a state with spin equal to ½ℏ along the z-axis. The operators for spin along the z-axis (S_z) and the x-axis (S_x) do not commute, so this cannot be an eigenstate of the x-component of spin. (We will see all this in more detail later.) If a measurement of S_x is carried out, we will find either S_x = ½ℏ or S_x = −½ℏ. The probabilities of these two possibilities are given by |〈S_x = ℏ/2|S_z = ℏ/2〉|² and |〈S_x = −ℏ/2|S_z = ℏ/2〉|².

This property of measurements in quantum mechanics, despite its “weirdness” compared to familiar classical concepts, has been tested experimentally and does seem to hold. We will analyze this question in a little more detail later, but for now, we will take it as given and proceed.

3.2 The role of unitary transformations and the Schrödinger equation

Unitary transformations are very important in quantum mechanics because, as we have argued, observables are matrix elements like 〈α|β〉. Even matrix elements of operators are of this form, 〈α|A|β〉 = 〈α|γ〉, with |γ〉 = A|β〉. Unitary transformations preserve the inner product. Thus if we consider 〈α|β〉 and carry out a unitary transformation of each state, defining |γ〉 = U|α〉, |δ〉 = U|β〉, we find

〈γ|δ〉 = 〈Uα|Uβ〉 = 〈α|U†Uβ〉 = 〈α|β〉   (3.10)

since U†U = 1. Because of this property, all transformations of interest in physics (apart from something called time-reversal, which we will consider later) are realized as unitary transformations on the Hilbert space of states. Thus translations in space and time, rotations, Lorentz or Galilean transformations, various symmetry transformations, etc. are all obtained as unitary transformations. For matrix elements of operators, we have the rule

〈γ|Aδ〉 = 〈α|U†AUβ〉   (3.11)

so that we may equivalently think of the unitary transformation as being implemented on the operators via A → A′ = U†AU, rather than on the states.

We now find a way to represent the operators on the wave functions, taking the single-particle case as an example. Taking the inner product of (3.3) with 〈α|, we find 〈α|x̂_i x〉 = x_i 〈α|x〉. Taking the conjugate of this relation, we get

〈x̂_i x|α〉 = x_i 〈x|α〉   (3.12)

By the hermiticity of x̂_i,

〈x̂_i x|α〉 = 〈x|x̂†_i α〉 = 〈x|x̂_i α〉   (3.13)

We may thus write

〈x| x̂_i |α〉 = x_i 〈x|α〉   (3.14)

This shows that the action of x̂_i on an x-dependent (or x-space) wave function 〈x|α〉 may be regarded as simple multiplication by x_i. The third of the commutation rules in (3.1) then tells us that we can take

〈x| p̂_i |α〉 = −iℏ (∂/∂x_i) 〈x|α〉   (3.15)

This is also consistent with the second of the commutation rules (3.1). Thus we have a representation of the operators x̂_i and p̂_i on wave functions, given by (3.14) and (3.15).

Notice that the commutation rules are unchanged under unitary transformations, in the sense that

ÂB̂ − B̂Â = iĈ  =⇒  Â′B̂′ − B̂′Â′ = iĈ′   (3.16)

where Â′ = U†ÂU, B̂′ = U†B̂U, etc. Thus if we have the representation given by (3.14) and (3.15), their unitary transforms also obey the same commutation rules. However, this is not a really new representation, because such transformations leave the inner products unchanged, so we do not get different physical results. One may ask the question: Are there truly different ways to represent the operators x̂_i and p̂_i which obey the same commutation rules and which are not just unitary transforms of (3.14) and (3.15)? If so, we could get different physical results depending on which version we use in calculations. For a finite number of copies of the algebra (3.1) (equivalently, a finite number of particles) on a space of trivial topology such as R³ for the x_i, the answer is no. This is a theorem due to Stone and von Neumann. The representation (3.14), (3.15) is known as the Schrödinger representation.
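One can see the Schrödinger representation at work numerically: on a grid, x̂ acts by multiplication and p̂ by a finite-difference derivative, and [x̂, p̂]ψ approaches iℏψ away from the grid boundaries. A rough sketch with ℏ set to 1 (the grid is, of course, only an approximation to the continuum setting of the Stone-von Neumann theorem):

```python
# Sketch: discretized Schrodinger representation (3.14)-(3.15) on a grid.
# With x as multiplication and p = -i*hbar*d/dx via central differences,
# ([x,p] psi) / psi approaches i*hbar in the grid interior.
import numpy as np

hbar = 1.0
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
psi = np.exp(-x**2 / 2)                  # a smooth test function

def p_op(f):
    """Momentum operator -i*hbar d/dx by central differences."""
    return -1j * hbar * np.gradient(f, dx)

comm = x * p_op(psi) - p_op(x * psi)     # [x, p] acting on psi
mask = np.abs(x) < 3.0                   # stay away from the boundaries
print((comm[mask] / psi[mask])[:3])      # each entry ~ 0 + 1j (i.e. i*hbar)
```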

We must now ask the question: What characterizes a physical system involving one particle, beyond the obvious observables of position and momentum? For example, a free particle and a particle bound in a potential (such as the electron in the Hydrogen atom) both have a position operator and a momentum operator. How do we mathematically characterize the fact that these are different systems? This is done by specifying the Hamiltonian. Classically, the Hamiltonian is the generator of translations in time, or time-evolution, as made clear by the canonical equations of motion,

dx_i/dt = ∂H/∂p_i,   dp_i/dt = −∂H/∂x_i   (3.17)

In the quantum theory as well, the Hamiltonian generates time-evolution. On the Hilbert space of states, we know that all physical transformations must be unitary transformations. Further, we know that a unitary transformation can be written as e^{iM}, where M is hermitian; see the theorem in subsection 2.9. So consider the situation where we start with a state |α, t〉 at time t. At time t + ε, where ε is infinitesimal, |α〉 evolves to a new state, which we may write as

|α, t + ε〉 = U(ε) |α, t〉 = e^{iM} |α, t〉   (3.18)

U should be the identity for ε = 0, so we can take M to be linear in ε. Since M is hermitian, we write M = −Hε/ℏ, defining a hermitian operator H by this relation. H, so defined, is the Hamiltonian. Thus we may write the time-evolution of a state more generally as

|α, t〉 = U(t − t₀) |α, t₀〉 = e^{−iH(t−t₀)/ℏ} |α, t₀〉   (3.19)

Another way to write this equation is to consider the expansion of both sides of (3.18) in ε. This gives a differential version of time-evolution,

iℏ (∂/∂t) |α〉 = H |α〉   (3.20)

This is one version of the Schrödinger equation.

For any physical system, the Hamiltonian is a function of the basic observables such as position and momentum. The formula for the Hamiltonian then specifies the physical system. This formula is not difficult to write down if we recall that the Hamiltonian is also the energy function of the system. For a free particle, we only have the kinetic energy ½mv² = p²/2m. Quantum mechanically, a free particle is thus described by the Hamiltonian

H = p²/2m   (3.21)

If we consider a particle in the presence of a potential, we use the Hamiltonian

H = p²/2m + V(x̂)   (3.22)

where V(x) is the potential energy. (This is obtained by taking the classical potential energy V(x) and replacing x by its operator version x̂.) Thus, for the electron in the Hydrogen atom, we use the Hamiltonian

H = p²/2m − e²/r   (3.23)

Here e is the electronic charge and r is the radial component of the position operator.

The simplest strategy for using these expressions in calculations is to convert the Schrödinger equation (3.20) into a differential equation for the wave functions, which we can then solve by the standard techniques for differential equations. Thus, for the general Hamiltonian in (3.22), taking the inner product of (3.20) with |x〉, we get

iℏ ∂ψ/∂t = 〈x|Hα〉 = 〈x| (p²/2m + V(x̂)) |α〉   (3.24)

where ψ(x) = 〈x|α〉 is the wave function. Now, using (3.15), we find

〈x| p² |α〉 = −iℏ (∂/∂x_i) 〈x| p_i |α〉 = (−iℏ ∂/∂x_i)(−iℏ ∂/∂x_i) 〈x|α〉 = −ℏ²∇²ψ   (3.25)

Also, since |x〉 is an eigenstate of x̂, if the potential only involves the coordinates, we can write

〈x| V(x̂) |α〉 = V(x) 〈x|α〉 = V(x) ψ   (3.26)

Combining these two equations, we can write the Schrödinger equation (3.24) as

iℏ ∂ψ/∂t = [−(ℏ²/2m)∇² + V(x)] ψ   (3.27)

This is now a partial differential equation which can be solved to analyze the dynamics of any one-particle physical system.
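As an illustration of how (3.27) is used in practice, here is a minimal sketch of one implicit (Crank-Nicolson) time step for the equation on a one-dimensional grid. This is only one of several standard discretizations; the units ℏ = m = 1 and the harmonic potential are our choices for the example. Crank-Nicolson is a natural choice here because each step is exactly unitary in the discretized norm.

```python
# One Crank-Nicolson step for eq. (3.27) in 1D, assuming hbar = m = 1.
import numpy as np

hbar = m = 1.0
x = np.linspace(-20, 20, 800)
dx, dt = x[1] - x[0], 0.01
V = 0.5 * x**2                                 # an example potential

# Hamiltonian matrix: -hbar^2/2m d^2/dx^2 (three-point stencil) + V
H = (np.diag(np.full(len(x), 2.0)) - np.diag(np.ones(len(x)-1), 1)
     - np.diag(np.ones(len(x)-1), -1)) * hbar**2 / (2*m*dx**2) + np.diag(V)

# (1 + i dt H / 2 hbar) psi_new = (1 - i dt H / 2 hbar) psi_old
A = np.eye(len(x)) + 0.5j * dt * H / hbar
B = np.eye(len(x)) - 0.5j * dt * H / hbar

psi = np.exp(-(x - 2.0)**2)                    # a displaced Gaussian
psi = psi / np.sqrt(np.trapz(np.abs(psi)**2, x))
psi = np.linalg.solve(A, B @ psi)              # one time step
print(np.trapz(np.abs(psi)**2, x))             # total probability stays ~1
```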


4 Particle in a box: One-dimensional case

The Hamiltonian operator is given by

H = −(ℏ²/2m) ∂²/∂x²   (4.1)

The Schrödinger equation for the wave function becomes

iℏ ∂ψ/∂t = Hψ = −(ℏ²/2m) ∂²ψ/∂x²   (4.2)

The coefficients of the various terms in this equation are independent of time, so we can carry out a separation of variables. We choose ψ = f(x)h(t). This leads to

iℏ f(x) h′(t) = −(ℏ²/2m) f″(x) h(t)   (4.3)

Dividing by fh, we get

iℏ h′(t)/h = −(ℏ²/2m) f″(x)/f   (4.4)

The left side is a function of t only, while the right side is a function of x only. So, to obtain equality for all x and t, each side must be a constant. Denoting this constant by ℏω, we get h(t) = e^{−iωt}. The Schrödinger equation then reduces to

∂²f/∂x² + (2mω/ℏ) f = 0   (4.5)

What we have done is equivalent to looking for eigenstates of the Hamiltonian by taking ψ = f(x) e^{−iωt}. The energy eigenvalues are then ℏω. The general solution of equation (4.5) is evidently

f(x) = A cos kx + B sin kx,   k = √(2mω/ℏ)   (4.6)

As explained before, the requirements which lead to the boundary conditions are:

1. The particle is physically confined to a box of length L, so the flux of probability across the boundary should be zero. This means that we should have

   J = −(iℏ/2m) [ψ* ∂ψ/∂x − (∂ψ*/∂x) ψ] = 0 at x = 0, L   (4.7)

2. We need hermiticity for the momentum operator,

   ∫₀^L dx ψ*₁ (−iℏ ∂/∂x) ψ₂ = ∫₀^L dx [(−iℏ ∂/∂x) ψ₁]* ψ₂   (4.8)

   for arbitrary wave functions ψ₁, ψ₂. This leads to

   ψ*₁ψ₂]₀^L = ψ*₁(L)ψ₂(L) − ψ*₁(0)ψ₂(0) = 0   (4.9)

The boundary condition compatible with these two requirements is

ψ(0) = ψ(L) = 0   (4.10)

We then see that consistency with the boundary condition requires A = 0 in (4.6). Further, the condition ψ(L) = 0 leads to the allowed values of k,

k_n = nπ/L,   n = 1, 2, ···   (4.11)

Negative values of n do not lead to independent solutions, and n = 0 gives a vanishing wave function; it is not allowed, since we need ∫dx ψ*ψ = 1. The energy eigenvalues are thus

E_n = ℏω_n = ℏ²k_n²/2m = (ℏ²/2m)(n²π²/L²)   (4.12)

We see that the energy eigenvalues are quantized, labeled by an integer n. With the normalization condition ∫dx ψ*ψ = 1, the eigenfunctions are

ψ_n = √(2/L) sin(nπx/L) e^{−iω_n t} = e^{−iω_n t} v_n(x),   v_n = √(2/L) sin(nπx/L)   (4.13)

The v_n are eigenstates of the Hamiltonian,

H v_n = E_n v_n   (4.14)

These states are also easily verified to be orthonormal, as follows. Using sin θ sin θ′ = ½[cos(θ − θ′) − cos(θ + θ′)], we find

∫ dx sin(nπx/L) sin(n′πx/L) = ½ [ −(L/((n+n′)π)) sin((n+n′)πx/L) ]₀^L + ½ [ (L/((n−n′)π)) sin((n−n′)πx/L) ]₀^L = 0,   n ≠ n′   (4.15)

∫ dx sin(nπx/L) sin(nπx/L) = ½ ∫ dx (1 − cos(2πnx/L)) = L/2   (4.16)

These two results together lead to

∫ dx v*_n v_{n′} = δ_{nn′}   (4.17)
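The orthonormality relation (4.17) is easily confirmed numerically, for example as follows (the choice L = 1 is arbitrary):

```python
# Numerical check of the orthonormality (4.17) for the box eigenfunctions.
import numpy as np

L = 1.0
x = np.linspace(0, L, 20001)

def v(n):
    return np.sqrt(2/L) * np.sin(n * np.pi * x / L)

for n, m in [(1, 1), (1, 2), (2, 2), (3, 5)]:
    print(n, m, np.trapz(v(n) * v(m), x))   # ~1 if n == m, else ~0
```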

Since the states are labeled by the integer n, the abstract eigenstates may be taken as the kets |n〉; the wave function is the x-representation of such a ket, so we can write

v_n(x) = 〈x|n〉   (4.18)

The normalization condition may be written as

∫ dx v*_n(x) v_{n′}(x) = ∫ 〈n|x〉 dx 〈x|n′〉 = 〈n|n′〉 = δ_{nn′}   (4.19)

where we notice that we can remove the x-dependence, interpreting the integral as the completeness condition for the vectors labeled by x, i.e.,

∫ |x〉 dx 〈x| = 1   (4.20)

We can calculate matrix elements of any operator using its realization as a differential operator,

〈x|x̂ n〉 = x 〈x|n〉 = x v_n(x)
〈x|p̂ n〉 = −iℏ (∂/∂x) 〈x|n〉 = −iℏ ∂v_n/∂x
〈x|H n〉 = −(ℏ²/2m)(∂²/∂x²) 〈x|n〉 = −(ℏ²/2m) ∂²v_n/∂x²   (4.21)

We calculate the matrix elements at t = 0 for simplicity. Using the completeness relation (4.20), we then find

X_nr = 〈n|x̂ r〉 = ∫ 〈n|x〉 dx 〈x|x̂ r〉 = ∫ dx x 〈n|x〉〈x|r〉 = ∫ dx x v*_n(x) v_r(x)   (4.22)

Thus X can be viewed as an infinite-dimensional matrix. Carrying out the integration, we find that the matrix elements of X are of the form

X_{nn′} = (L/π²) [ (1 − (−1)^{n+n′})/(n+n′)² − (1 − (−1)^{n+n′})/(n−n′)² ],   n ≠ n′
X_nn = L/2   (4.23)

Displayed as a square array, this looks like

X = L ( 1/2        −16/9π²    0          ···
        −16/9π²    1/2        −48/25π²   ···
        0          −48/25π²   1/2        ···
        ···        ···        ···        ···  )   (4.24)
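As a consistency check, the closed form (4.23) can be compared against direct numerical integration of (4.22); a short sketch with L = 1:

```python
# Compare the closed form (4.23) with direct integration of (4.22).
import numpy as np

L = 1.0
x = np.linspace(0, L, 20001)
v = lambda n: np.sqrt(2/L) * np.sin(n * np.pi * x / L)

def X_formula(n, m):
    if n == m:
        return L / 2
    s = 1 - (-1)**(n + m)
    return (L / np.pi**2) * (s/(n + m)**2 - s/(n - m)**2)

for n, m in [(1, 1), (1, 2), (2, 3), (1, 3)]:
    print(n, m, np.trapz(x * v(n) * v(m), x), X_formula(n, m))
```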

We can work out the matrix representations of other operators in a similar way. For example, from (4.14), we find that the Hamiltonian can be represented as the matrix

H_{nn′} = ℏω_n δ_{nn′}   (4.25)

The matrix elements of the momentum operator are given by

P_{nn′} = 〈n|p̂ n′〉 = ∫ dx 〈n|x〉 (−iℏ ∂/∂x) 〈x|n′〉 = ∫ dx v_n(x)* (−iℏ ∂/∂x) v_{n′}(x)   (4.26)

We now ask: What happens to states under time-evolution? Each eigenstate evolves with a phase factor given by its own energy eigenvalue, e^{−iω_n t}. If we have a linear combination of states, then the evolution is no longer a simple overall phase factor. Thus, for example, consider a state Ψ(x, 0) at time t = 0 given by

Ψ(x, 0) = Σ_n c_n v_n(x)   (4.27)

Since this must obey the normalization condition ∫dx Ψ*Ψ = 1, we must have

Σ_n c*_n c_n = 1   (4.28)

Each v_n will pick up its own phase factor as it evolves in time. Thus, at time t, the wave function will be given by

Ψ(x, t) = Σ_n c_n e^{−iω_n t} v_n(x)   (4.29)

This means that the probability density will have interference terms. Take, for example, c₁ = c₂ = 1/√2 and c_n = 0 for all n ≥ 3. Then

Ψ(x, t) = (1/√2)[e^{−iω₁t} v₁(x) + e^{−iω₂t} v₂(x)]
|Ψ(x, t)|² = Ψ*(x, t)Ψ(x, t) = ½[v*₁v₁ + v*₂v₂ + e^{iω₁t}e^{−iω₂t}v*₁v₂ + e^{−iω₁t}e^{iω₂t}v₁v*₂]
           = ½[v₁² + v₂² + 2 v₁v₂ cos((ω₂ − ω₁)t)]   (4.30)

The last line follows from the fact that, for this problem, the v_n are real, as seen from (4.13).
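The interference in (4.30) can be made concrete numerically: the total probability stays equal to 1, while the density at any fixed point oscillates at the frequency ω₂ − ω₁. A small sketch in units where ℏ = m = L = 1:

```python
# The two-state superposition of (4.30): norm is conserved while the
# density at a fixed point oscillates at omega_2 - omega_1.
import numpy as np

L, hbar, m = 1.0, 1.0, 1.0
x = np.linspace(0, L, 2001)
v = lambda n: np.sqrt(2/L) * np.sin(n * np.pi * x / L)
w = lambda n: hbar * (n * np.pi / L)**2 / (2*m)   # omega_n, from eq. (4.12)

def density(t):
    psi = (np.exp(-1j*w(1)*t) * v(1) + np.exp(-1j*w(2)*t) * v(2)) / np.sqrt(2)
    return np.abs(psi)**2

for t in [0.0, 0.1, 0.2]:
    rho = density(t)
    print(t, np.trapz(rho, x), rho[len(x)//4])   # norm ~1; density oscillates
```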


5 Linear harmonic oscillator

Any mechanical system will exhibit harmonic oscillations around stable equilibrium points. The quantum version of such oscillations is what we call the linear harmonic oscillator. This is an extremely important problem in physics, since small oscillations are ubiquitous; they occur in molecules, in solids, in nuclei, and even for field components in a field theory. In the simplest scenario, in one spatial dimension, we can consider the expansion of the potential energy V(x) around a stable equilibrium point, say x₀, as

V(x) = V(x₀) + (x − x₀) V′(x₀) + ½(x − x₀)² V″(x₀) + ···   (5.1)

Because x₀ is an equilibrium point, V′(x₀) = 0, and since it is a stable equilibrium point, V″(x₀) > 0. So we introduce a frequency ω by V″(x₀) = mω². Further, we choose the coordinate system such that the origin is at the equilibrium point, i.e., we set x₀ = 0. The potential energy of the system can then be approximated, up to an additive constant V(x₀) which is irrelevant for the analysis to follow, as

V(x) ≈ ½mω²x² + ···   (5.2)

The ellipsis denotes anharmonic terms, which are at least of cubic order in x. The kinetic energy is given as usual by p²/2m. The Hamiltonian operator for the quantum mechanical analysis is thus

H = p²/2m + ½mω²x²   (5.3)

We are interested in finding the eigenstates of the Hamiltonian, defined by

H |α〉 = E_α |α〉   (5.4)

There are two ways to proceed from here. One way is the operator method, which means that we try to solve this equation subject to the fact that the operators x, p obey the commutation rule

xp − px = iℏ   (5.5)

The second method is to convert equation (5.4) into a differential equation for the wave function, defined by ψ_α(x) = 〈x|α〉, using

〈x|p²α〉 = (−iℏ ∂/∂x) 〈x|pα〉 = (−iℏ ∂/∂x)(−iℏ ∂/∂x) 〈x|α〉 = −ℏ² (d²/dx²) ψ_α(x)   (5.6)

Writing (5.4) as 〈x|Hα〉 = E_α 〈x|α〉 and using (5.6), we get the time-independent Schrödinger equation

−(ℏ²/2m) d²ψ/dx² + ½mω²x² ψ = E ψ   (5.7)


We can solve this as a differential equation to identify the energy eigenvalues Eα and

the corresponding wave functions. We will use both techniques, starting with the

operator method.

5.1 The operator method

This method is guided by the action-angle method in classical mechanics. We start by defining operators a and a† by

a = √(mω/2ℏ) x + (i/√(2mℏω)) p
a† = √(mω/2ℏ) x − (i/√(2mℏω)) p   (5.8)

These operators are not hermitian but, since x and p are hermitian, a† is the hermitian conjugate of a. It is straightforward to rewrite the commutation rule (5.5) in terms of these operators. We find

[a, a†] = [ √(mω/2ℏ) x + (i/√(2mℏω)) p , √(mω/2ℏ) x − (i/√(2mℏω)) p ]
        = (mω/2ℏ)[x, x] − (i/2ℏ)[x, p] + (i/2ℏ)[p, x] + (1/2mℏω)[p, p]
        = 1   (5.9)

To simplify the Hamiltonian, we first write x, p in terms of a, a†. Taking the sum and difference of the two expressions in (5.8), we find

x = √(ℏ/2mω) (a + a†),   p = −i √(mℏω/2) (a − a†)   (5.10)

We can now use these expressions in (5.3) to obtain

H = p²/2m + ½mω²x²
  = ¼ℏω [−(a − a†)² + (a + a†)²]
  = (ℏω/2)(aa† + a†a)
  = ℏω (a†a + ½)   (5.11)

The a² and (a†)² terms cancel out; in the last line we also use aa† = 1 + a†a, which follows from the commutation rule (5.9).

The mathematical problem is now to find the eigenstates of H as given in (5.11), where the operators a and a† are subject to (5.9). For this, we first note that the operator a†a (and hence H) is positive. By this we mean that, for any state |α〉, 〈α|a†a|α〉 is positive. Let |n〉 denote a complete set of states, so that we have

Σ_n |n〉〈n| = 1   (5.12)

We can then write

〈α|a†a|α〉 = Σ_n 〈α|a†|n〉〈n|a|α〉 = Σ_n |〈n|a|α〉|² ≥ 0   (5.13)

Here we use (〈n|a|α〉)* = 〈aα|n〉 = 〈α|a†|n〉. This result shows the positivity of a†a, and hence of H. For the ground state of the theory, we must find the state with the minimum eigenvalue of the Hamiltonian. This is evidently given by a state |0〉 for which all terms in the sum are zero; i.e., the ground state obeys the condition 〈n|a|0〉 = 0 for all 〈n|, or, equivalently,

a |0〉 = 0   (5.14)

This gives us one state, the ground state |0〉. We construct other states by the application of various operators on this state. The only operators we have are a and a†; other operators are functions of these. Now, the application of a on |0〉 gives zero, so the only new state we can get is a†|0〉. We designate this state as |1〉. Again, from here we can form a|1〉 or a†|1〉. The first choice takes us back to |0〉. This is seen from

a|1〉 = aa†|0〉 = (aa† − a†a)|0〉 = |0〉   (5.15)

Here, in the second step, we added a term a†a|0〉 = 0, which does not affect the equality but lets us use the commutation rule (5.9) to obtain the third equality. Since |0〉 is already in our list of states, the only new one at this stage is a†|1〉 = (a†)²|0〉. We will denote this state as |2〉. Next we consider applying a and a† to this state. The application of a does not generate a new state, since

a(a†)²|0〉 = ([a, a†]a† + a†[a, a†] + (a†)²a)|0〉 = 2a†|0〉 = 2|1〉   (5.16)

So at this stage the new state is a†|2〉 = (a†)³|0〉. Proceeding in this way, the states we can generate are of the form

|n〉 = C_n (a†)ⁿ |0〉   (5.17)

where C_n is a normalization factor which we will calculate shortly. But first we will show that these states are eigenstates of a†a. For this, we start by simplifying a(a†)ⁿ:

a(a†)ⁿ = aa†(a†)^{n−1} = (1 + a†a)(a†)^{n−1} = (a†)^{n−1} + a†a(a†)^{n−1}
       = (a†)^{n−1} + a†aa†(a†)^{n−2}
       = (a†)^{n−1} + a†(1 + a†a)(a†)^{n−2}
       = 2(a†)^{n−1} + (a†)²a(a†)^{n−2}   (5.18)

At each stage we take one a† from the expression and simplify aa† as 1 + a†a using the commutation rule (5.9). Continuing, we get

a(a†)ⁿ = n(a†)^{n−1} + (a†)ⁿ a   (5.19)

Multiplying this equation by a† from the left and applying it to |0〉, we get

a†a(a†)ⁿ|0〉 = n(a†)ⁿ|0〉   (5.20)

where we again used a|0〉 = 0. This equation shows that (a†)ⁿ|0〉 is an eigenstate of the operator a†a with eigenvalue n. The latter is obviously an integer, since we are taking integer powers of a†. And, since C_n in (5.17) is a constant which does not affect the eigenvalue condition, we get a†a|n〉 = n|n〉 for the properly normalized states |n〉 as well. For the Hamiltonian we thus find

H|n〉 = ℏω(n + ½)|n〉   (5.21)

We have obtained the eigenstates of the Hamiltonian. The energy eigenvalues are quantized and given by E_n = (n + ½)ℏω, where n is a non-negative integer.

We can now calculate the normalization factor C_n. First we note that H is hermitian, so its eigenstates for different eigenvalues are orthogonal; this is the general theorem we proved earlier. So we have immediately

〈n|m〉 = 0 for n ≠ m   (5.22)

For evaluating 〈n|n〉, we give a name to the inner product of (a†)ⁿ|0〉 with itself, say f(n); i.e.,

f(n) ≡ 〈(a†)ⁿ0|(a†)ⁿ0〉 = 〈0|aⁿ(a†)ⁿ|0〉   (5.23)

Now we move one a to the right end using (5.19) to obtain

f(n) = 〈0|a^{n−1}a(a†)ⁿ|0〉 = n 〈0|a^{n−1}(a†)^{n−1}|0〉 + 〈0|a^{n−1}(a†)ⁿ a|0〉 = n f(n − 1)   (5.24)

This is the defining recursion for the factorial, so f(n) = n!. Since we need |C_n|² 〈0|aⁿ(a†)ⁿ|0〉 = 1, we find C_n = 1/√(n!). The correctly normalized states are thus

|n〉 = (1/√(n!)) (a†)ⁿ |0〉   (5.25)

Taking account of (5.22), these states obey

〈n|m〉 = δ_nm   (5.26)

We can also easily write down the action of a and a† on the states |n〉:

a|n〉 = a (1/√(n!)) (a†)ⁿ|0〉 = (n/√(n!)) (a†)^{n−1}|0〉 = √n |n − 1〉
a†|n〉 = a† (1/√(n!)) (a†)ⁿ|0〉 = √(n+1) (1/√((n+1)!)) (a†)^{n+1}|0〉 = √(n+1) |n + 1〉   (5.27)

Notice that a takes us one step down in n and a† takes us one step up; for this reason, we sometimes refer to a and a† as step-down and step-up operators, respectively.

Among other things, we can use (5.27) to write down the matrix versions of the operators x and p. For example, from (5.10),

X_nm ≡ 〈n|x̂|m〉 = √(ℏ/2mω) 〈n|(a + a†)|m〉
     = √(ℏ/2mω) [√m 〈n|m − 1〉 + √(m+1) 〈n|m + 1〉]
     = √(ℏ/2mω) [√(n+1) δ_{n+1,m} + √(m+1) δ_{n,m+1}]   (5.28)

where we used (5.27) and (5.26). This is an infinite-dimensional matrix but, written out in the more familiar array form, the first few entries look like

X = √(ℏ/2mω) ( 0    √1   0    0    ···
               √1   0    √2   0    ···
               0    √2   0    √3   ···
               0    0    √3   0    ···
               ···  ···  ···  ···  ···  )   (5.29)
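These matrices are convenient to construct numerically in a truncated Fock space. The sketch below (units ℏ = m = ω = 1, and the truncation size N is an arbitrary choice) reproduces the first entries of (5.29) and the spectrum (5.21); the commutator [a, a†] equals the identity except near the truncation cutoff, which is an artifact of working in a finite space:

```python
# Truncated Fock-space matrices for a and a-dagger, hbar = m = omega = 1.
import numpy as np

N = 8                                       # truncation size
a = np.diag(np.sqrt(np.arange(1, N)), 1)    # <n|a|m> = sqrt(m) delta_{n,m-1}
ad = a.T                                    # step-up operator a-dagger

print((a @ ad - ad @ a)[:4, :4])            # ~identity away from the cutoff
X = (a + ad) / np.sqrt(2)                   # sqrt(hbar/2 m omega) (a + a+)
print(X[:4, :4])                            # matches the entries of (5.29)
H = ad @ a + 0.5 * np.eye(N)                # H / (hbar omega), eq. (5.11)
print(np.diag(H))                           # n + 1/2, as in (5.21)
```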

There is one issue which still needs some clarification. We argued that the ground state is given by (5.14). However, how do we know that a state obeying this condition exists? We can see this by using the coordinate representation, writing (5.14) as

〈x| [√(mω/2ℏ) x̂ + (i/√(2mℏω)) p̂] |0〉 = [√(mω/2ℏ) x + (ℏ/√(2mℏω)) ∂/∂x] 〈x|0〉 = 0   (5.30)

This equation has the solution

〈x|0〉 = C exp(−mωx²/2ℏ)   (5.31)

where C is a constant. This is clearly a normalizable function, showing that a solution to (5.14) exists.


5.2 The method of using the differential equation

We now turn to the second method, where we solve the differential equation (5.7). We simplify the coefficients of the various terms in this equation by introducing a new variable

ξ = √(mω/ℏ) x   (5.32)

This gives

mω²x² = ℏω ξ²,   d²/dx² = (mω/ℏ) d²/dξ²   (5.33)

The Schrödinger equation now becomes

d²ψ/dξ² − ξ² ψ + K ψ = 0,   K = 2E/ℏω   (5.34)

The wave functions must be square-integrable; we need ∫dx ψ*ψ < ∞. This means that ψ should vanish sufficiently fast as |x| → ∞. So we will try to simplify the equation by first studying the asymptotic behavior of ψ. For large values of |x| or |ξ|, the equation can be approximated by

d²ψ/dξ² − ξ² ψ ≈ 0   (5.35)

The solution to this is of the form

ψ ∼ exp(−½ξ²)   (5.36)

This suggests that the wave function can generally be taken of the form

ψ = exp(−½ξ²) f(ξ)   (5.37)

where we look for solutions f(ξ) which do not grow exponentially so as to cancel the damping from the Gaussian factor, so that the finiteness of ∫dx ψ*ψ is retained.

We can now convert equation (5.34) into one for f(ξ). The relevant derivatives are

dψ/dξ = [−ξf + f′] e^{−ξ²/2}
d²ψ/dξ² = [−f − ξf′ + f″] e^{−ξ²/2} − ξ[−ξf + f′] e^{−ξ²/2} = [−f + ξ²f − 2ξf′ + f″] e^{−ξ²/2}   (5.38)

The differential equation (5.34) now becomes

f″ − 2ξf′ + 2αf = 0,   2α = K − 1   (5.39)


We look for a power series solution for f of the form

f(ξ) = Σ_{m=0}^∞ a_m ξ^m   (5.40)

Using this in (5.39), we find

Σ_m m(m−1) a_m ξ^{m−2} − 2 Σ_m m a_m ξ^m + 2α Σ_m a_m ξ^m = 0   (5.41)

The first term starts with m = 2, since m = 0 and m = 1 give zero. So we can bring all terms to the same power of ξ by writing m = k + 2 in the first term, with the summation starting at k = 0, and m = k in the other two terms, again with the summation starting at k = 0. We then get

Σ_k [(k+1)(k+2) a_{k+2} + 2(α − k) a_k] ξ^k = 0   (5.42)

This equation can be satisfied if the coefficients obey the recursion rule

a_{k+2} = −[2(α − k)/((k+1)(k+2))] a_k   (5.43)

Thus, starting from a_0, we get all a_k for even k from this equation, and starting from a_1, we get all a_k for odd k. a_0 and a_1 are not determined by the equation, so there are two independent solutions with two undetermined constants. This is consistent with the fact that a second order differential equation has two independent solutions and two undetermined free parameters; the latter may be viewed as constants of integration.
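The recursion (5.43) is simple enough to iterate directly. The following sketch (the seed values are chosen for illustration) shows the even series terminating for α = 2 but continuing indefinitely for a non-integer α, anticipating the discussion below:

```python
# Iterate the recursion (5.43) and watch for termination of the series.
def series_coeffs(alpha, kmax=12):
    """Coefficients a_k of (5.40); even seed a0=1 or odd seed a1=1."""
    a = [1.0, 0.0] if int(alpha) % 2 == 0 else [0.0, 1.0]
    for k in range(kmax - 2):
        a.append(-2 * (alpha - k) / ((k + 1) * (k + 2)) * a[k])
    return a

print(series_coeffs(2))     # terminates: all coefficients beyond xi^2 vanish
print(series_coeffs(2.5))   # does not terminate
```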

First consider the solutions with a_0 ≠ 0, a_1 = 0. In this case, we get from (5.43)

a_2 = −(2α/2) a_0,   a_4 = −[2(α−2)/(4·3)] a_2 = (−2)² [α(α−2)/4!] a_0
a_6 = −[2(α−4)/(6·5)] a_4 = (−2)³ [α(α−2)(α−4)/6!] a_0
a_{2p} = (−2)^p [α(α−2)···(α−2p+2)/(2p)!] a_0   (5.44)

If α is an even non-negative integer, say 2r, the series terminates: the coefficients vanish once p reaches the value for which α − 2p + 2 = 0. Otherwise the series does not terminate. If α is not an even integer,

a_{2p+2} ξ^{2p+2} / (a_{2p} ξ^{2p}) = −[2(α − 2p)/((2p+2)(2p+1))] ξ² ≈ ξ²/p, for large p   (5.45)

This means that the series behaves like Σ ξ^{2p}/p! ≈ e^{ξ²}. This cannot give normalizable solutions, because e^{ξ²} e^{−ξ²/2} ∼ e^{ξ²/2} is not square-integrable. Normalizable solutions of this type exist only when α is an even integer, so that the series terminates.


Consider now the other case, a_1 ≠ 0, a_0 = 0. In this case

a_{2p+1} = (−2)^p [(α−1)(α−3)···(α−2p+1)/(2p+1)!] a_1   (5.46)

This series terminates if α is an odd positive integer; otherwise we have an infinite series which again behaves as e^{ξ²}. So normalizable solutions of this type are possible only for odd positive values of α.

Combining the two cases, we see that we have normalizable solutions if α is a non-negative integer. Denoting this integer by n, we see from the definition of α in (5.39) that the energy eigenvalues are quantized,

E = ½ℏω(2α + 1) = (n + ½)ℏω   (5.47)

We have recovered the eigenvalues obtained by the operator method. As for the eigenfunctions, we must work out the series for each choice of n or α. Thus, for n = 0, we find from (5.43) that a_2 = 0. The series terminates with a_0, giving

ψ_0 = a_0 e^{−ξ²/2}   (5.48)

The constant a_0 is determined by the normalization condition ∫dx ψ*_0 ψ_0 = 1. Since dx = √(ℏ/mω) dξ and

∫_{−∞}^∞ dξ e^{−ξ²} = √π   (5.49)

we find a_0 = (mω/πℏ)^{1/4}, or

〈x|0〉 = ψ_0(x) = (mω/πℏ)^{1/4} e^{−ξ²/2}   (5.50)

Notice that this agrees with what we found for the ground state in (5.31). For α = 1, we must choose the odd series; we find a_3 = 0, a_1 ≠ 0. This gives

ψ_1 = a_1 ξ e^{−ξ²/2}   (5.51)

Although we could find the higher states the same way, at this stage it is easier to obtain them from the operator solution. For this, notice that

〈x|a†|β〉 = √(mω/2ℏ) 〈x|x̂|β〉 − (i/√(2mℏω)) 〈x|p̂|β〉
         = [√(mω/2ℏ) x − (ℏ/√(2mℏω)) ∂/∂x] 〈x|β〉
         = (1/√2) (ξ − ∂/∂ξ) 〈x|β〉   (5.52)


Successive applications of this lead to

〈x|n〉 = (1/√(n!)) 〈x|(a†)ⁿ|0〉 = (1/√(2ⁿ n!)) (ξ − ∂/∂ξ)ⁿ 〈x|0〉
      = (mω/πℏ)^{1/4} (1/√(2ⁿ n!)) (ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}   (5.53)

where we also used (5.50). We define the Hermite polynomials H_n(ξ) by the equation

(ξ − ∂/∂ξ)ⁿ e^{−ξ²/2} = H_n(ξ) e^{−ξ²/2}   (5.54)

The eigenfunctions can then be written as

〈x|n〉 = (mω/πℏ)^{1/4} (1/√(2ⁿ n!)) H_n(ξ) e^{−ξ²/2}   (5.55)

Since they came from correctly orthonormalized states, we also have

(mω/πℏ)^{1/2} (1/√(2ⁿ n!)) (1/√(2^m m!)) ∫_{−∞}^∞ dx H_n(ξ) H_m(ξ) e^{−ξ²} = δ_nm   (5.56)

It is useful to evaluate a few of the Hermite polynomials. From (5.54), we get

H_0(ξ) = 1
H_1(ξ) = 2ξ
H_2(ξ) = 4ξ² − 2
H_3(ξ) = 8ξ³ − 12ξ   (5.57)

Notice that H_n(ξ) is an odd/even function of ξ for odd/even values of n.
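Acting with (ξ − ∂/∂ξ) once more on H_n(ξ)e^{−ξ²/2} in (5.54) gives the recurrence H_{n+1} = 2ξH_n − H_n′, so the list (5.57) can be generated symbolically. A quick check with sympy (an optional tool, used here only to verify the list):

```python
# Generate the Hermite polynomials from H_{n+1} = 2*xi*H_n - H_n',
# which follows from the operator definition (5.54).
import sympy as sp

xi = sp.symbols('xi')
H = sp.Integer(1)
for n in range(4):
    print(n, sp.expand(H))
    H = sp.expand(2*xi*H - sp.diff(H, xi))
# Output: 1, 2*xi, 4*xi**2 - 2, 8*xi**3 - 12*xi, matching (5.57).
```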

It is now possible to write down the probability to find the particle at some location x. As an example, assume the system has been prepared in such a way that the particle is in the second excited state, i.e., n = 2. Then the probability to find the particle in the interval (x, x + dx) is given by |〈x|2〉|² dx or, more explicitly,

dx |〈x|2〉|² = dx (mω/πℏ)^{1/2} (1/(2² × 2!)) (H_2(ξ))² e^{−ξ²} = dx (mω/πℏ)^{1/2} (1/8)(4ξ² − 2)² e^{−ξ²}   (5.58)

We should keep in mind that ξ = √(mω/ℏ) x.

It is also useful to see how the time-evolution of the states and of the probability works out. Each eigenfunction picks up a phase factor exp(−iE_n t/ℏ) = exp(−i(n + ½)ωt). Consider a system prepared at time t = 0 in the state

ψ(x, 0) = Σ_n c_n 〈x|n〉   (5.59)

with 〈x|n〉 being the energy eigenfunctions as in (5.55) and Σ_n c*_n c_n = 1. The wave function at time t > 0 is given by

ψ(x, t) = Σ_n c_n 〈x|n〉 exp[−iω(n + ½)t]   (5.60)

To illustrate this better, take as an example c_1 = c_2 = 1/√2 and all other c_n = 0. In this case

ψ(x, t) = (1/√2)[〈x|1〉 e^{−i(3/2)ωt} + 〈x|2〉 e^{−i(5/2)ωt}]
        = (1/√2) e^{−i(3/2)ωt} (mω/πℏ)^{1/4} [(1/√2) H_1(ξ) + (1/√8) H_2(ξ) e^{−iωt}] e^{−ξ²/2}   (5.61)

Upon taking the absolute square, we find for the probability

dx |ψ(x, t)|² = dx (1/2) (mω/πℏ)^{1/2} [H_1(ξ)²/2 + H_2(ξ)²/8 + (H_1(ξ)H_2(ξ)/2) cos(ωt)] e^{−ξ²}   (5.62)

Notice that the probability density to find the particle in the interval (x, x + dx) has an oscillatory component, because of the difference of the energies of the two eigenstates in ψ.


6 More about particles in one dimension

6.1 Free particle

The Hamiltonian for a free particle is given by

H = p²/2m   (6.1)

The Schrödinger equation for this case is

iℏ ∂ψ/∂t = −(ℏ²/2m) ∂²ψ/∂x²   (6.2)

We look for eigenstates of the Hamiltonian with ψ(x, t) = exp(−iEt/ℏ) φ(x). The equation for φ is then

d²φ/dx² + k²φ = 0,   k² = 2mE/ℏ²   (6.3)

The solutions are obviously of the form

φ = A e^{ikx} + B e^{−ikx}   (6.4)

where k denotes the positive square root k = √(2mE/ℏ²) and A, B are arbitrary constants. (A second order differential equation must have two constants of integration in the general solution; these are A and B.) The full solution to the Schrödinger equation is

ψ(x, t) = A e^{−iωt+ikx} + B e^{−iωt−ikx}   (6.5)

where ω = E/ℏ. Since A and B can be freely chosen at this point, consider first the case A ≠ 0, B = 0. The momentum operator acting on A e^{−iωt+ikx} gives

p̂ [A e^{−iωt+ikx}] = −iℏ (∂/∂x)[A e^{−iωt+ikx}] = ℏk [A e^{−iωt+ikx}]   (6.6)

This shows that a wave function of the form A e^{−iωt+ikx} is an eigenstate of momentum with eigenvalue ℏk. Thus A e^{−iωt+ikx} describes a particle of energy E and momentum ℏk. This is a free particle moving to the right, i.e., in the positive x-direction. Similarly, if we consider B e^{−iωt−ikx}, we find

p̂ [B e^{−iωt−ikx}] = −ℏk [B e^{−iωt−ikx}]   (6.7)

showing that this type of wave function describes a particle of momentum −ℏk (i.e., moving to the left, in the negative x-direction) with energy E.

The wave functions are of the familiar form of waves in classical wave theory. This agrees with the notion of particles as waves. In fact, recall that, for classical waves, the wave vector is k = 2π/λ, where λ is the wavelength. The statement that the momentum is ℏk becomes

p = ℏk = 2πℏ/λ   (6.8)

which is the de Broglie matter wave relation. But it should be kept in mind that, in quantum mechanics, ψ(x, t) is the probability amplitude. In other words, dx |ψ|² gives the probability to observe the particle in the interval (x, x + dx). Thus, in the quantum theory, despite the similarity to waves, the interpretation is very different from classical wave theory.

6.2 Piecewise constant potentials in one dimension

When we have piecewise constant potentials, i.e., potentials which are constant over some intervals in R, we can use the free particle solutions with suitable matching conditions to obtain the full solution. Among the problems which can be solved in this way, scattering by potential barriers constitutes a physically important set. As the prototypical case, we consider scattering by a potential of the form

V(x) = 0 for −∞ < x < 0,   V(x) = V₀ for 0 < x < a,   V(x) = 0 for x > a   (6.9)

The profile of this potential is as shown in the figure. Clearly there are three regions, I, II and III. The solution to the Schrödinger equation in each region is quite simple. We then have to match the solutions at the interfaces between the regions, i.e., at x = 0 and at x = a, to obtain the solution valid over the full real line.

[Figure: the rectangular barrier V(x) of height V₀ between x = 0 and x = a, with regions I (x < 0), II (0 < x < a) and III (x > a).]

We will first obtain the matching conditions. These follow from hermiticity requirements. Recall that observables in quantum mechanics must be hermitian operators. Thus we need hermiticity for the momentum operator and the Hamiltonian. (We

have the same requirement for other observables as well. But for this problem, the observables of interest are the position, momentum and Hamiltonian; others can be obtained in terms of these. Since we are using wave functions which are functions of x, position is automatically hermitian, so we will require hermiticity for the other two.) The wave function for our problem can be written as

ψ = ψ_I in I,   ψ = ψ_II in II,   ψ = ψ_III in III   (6.10)

The hermiticity condition for the momentum operator is

∫_{I+II+III} dx ψ^{(1)*} (−iℏ ∂ψ^{(2)}/∂x) = ∫_{I+II+III} dx [(−iℏ ∂/∂x) ψ^{(1)}]* ψ^{(2)} = ∫_{I+II+III} dx iℏ (∂ψ^{(1)*}/∂x) ψ^{(2)}   (6.11)

Here ψ^{(1)} and ψ^{(2)} are two wave functions, each of which has the form in (6.10), and the integration is over the three regions, with the appropriate functions used in each region. Thus, removing the overall iℏ factor, the condition (6.11) becomes

−∫_I dx ψ^{(1)*}_I ∂ψ^{(2)}_I/∂x − ∫_II dx ψ^{(1)*}_II ∂ψ^{(2)}_II/∂x − ∫_III dx ψ^{(1)*}_III ∂ψ^{(2)}_III/∂x
= ∫_I dx (∂ψ^{(1)*}_I/∂x) ψ^{(2)}_I + ∫_II dx (∂ψ^{(1)*}_II/∂x) ψ^{(2)}_II + ∫_III dx (∂ψ^{(1)*}_III/∂x) ψ^{(2)}_III   (6.12)

We can convert the expression on the left hand side into the one on the right hand side by an integration by parts. For example,

−∫_I dx ψ^{(1)*}_I ∂ψ^{(2)}_I/∂x = ∫_I dx (∂ψ^{(1)*}_I/∂x) ψ^{(2)}_I − [ψ^{(1)*}_I ψ^{(2)}_I]^0_{−∞}   (6.13)

Doing this for the other two regions, the requirement (6.12) can be written as

[ψ^{(1)*}_I ψ^{(2)}_I]^0_{−∞} + [ψ^{(1)*}_II ψ^{(2)}_II]^a_0 + [ψ^{(1)*}_III ψ^{(2)}_III]^∞_a = 0   (6.14)

Taking the wave functions to vanish at ±∞ (or imposing some similar condition), we get the matching conditions

ψ^{(1)*}_I ψ^{(2)}_I = ψ^{(1)*}_II ψ^{(2)}_II at x = 0
ψ^{(1)*}_II ψ^{(2)}_II = ψ^{(1)*}_III ψ^{(2)}_III at x = a   (6.15)

For arbitrary ψ^{(1)}, ψ^{(2)}, these can be satisfied if we have continuity of the wave functions at each interface, i.e.,

ψ_I(0) = ψ_II(0),   ψ_II(a) = ψ_III(a)   (6.16)

For the Hamiltonian, after removing irrelevant constants, hermiticity is equivalent to the condition

∫ dx ψ^{(1)*} d²ψ^{(2)}/dx² = ∫ dx (d²ψ^{(1)*}/dx²) ψ^{(2)}   (6.17)

In this case, the boundary term left over from the integration by parts has a derivative on one of the wave functions, and so the condition, with the help of the already-obtained relations (6.16), reduces to

ψ′_I(0) = ψ′_II(0),   ψ′_II(a) = ψ′_III(a)   (6.18)

where ψ′ = dψ/dx. Combining (6.16) and (6.18), we can express the matching conditions as follows:

Proposition 1 Across any interface, the wave function and the normal component of its first derivative must be continuous.

We have worked in one dimension, so the qualification about the normal component is not important; but the result applies in higher dimensions, where it is the normal component which emerges from the integration by parts.

Returning to the problem of the potential in (6.9), we consider particles of energy E. In region I there is no potential, so the Schrödinger equation becomes

−(ℏ²/2m) d²ψ_I/dx² = E ψ_I   (6.19)

The solution is given by

ψ_I = A e^{ikx} + B e^{−ikx},   k = √(2mE/ℏ²)   (6.20)

In region II, the cases E > V₀, E = V₀ and E < V₀ must be treated separately, since the solutions have different behavior in these cases. We will consider particles of energy E < V₀, as this highlights certain aspects of quantum mechanics which are very different from classical physics. In this case, classically, there is no solution in region II: a particle approaching the point x = 0 from the left with E < V₀ will be reflected. However, the Schrödinger equation in this region is

d²ψ_II/dx² = (2m(V₀ − E)/ℏ²) ψ_II   (6.21)

and this does have the solutions

ψ_II = C e^{qx} + D e^{−qx},   q = √(2m(V₀ − E)/ℏ²)   (6.22)

(Notice that the exponents are real.) In region III we have a situation similar to region I, and the solution is

ψ_III = G e^{ik(x−a)} + H e^{−ik(x−a)},   k = √(2mE/ℏ²)   (6.23)

We have written the coefficients as G e^{−ika} and H e^{ika}; since G and H are not yet determined, this can be done, and it simplifies some equations later.

The matching conditions at x = 0 are

C + D = A + B,   q(C − D) = ik(A − B)   (6.24)

We can solve for C, D and write this as

( C )  =  (1/2q) ( q + ik   q − ik ) ( A )  ≡  M ( A )
( D )            ( q − ik   q + ik ) ( B )        ( B )   (6.25)

The matching conditions at x = a become

C e^{qa} + D e^{−qa} = G + H,   q(C e^{qa} − D e^{−qa}) = ik(G − H)   (6.26)

Again, we can solve this for C, D as

( C )  =  (1/2q) ( (q+ik)e^{−qa}   (q−ik)e^{−qa} ) ( G )  ≡  N ( G )
( D )            ( (q−ik)e^{qa}    (q+ik)e^{qa}  ) ( H )        ( H )   (6.27)

From (6.25) and (6.27), we can solve for A, B in terms of G, H as

( A )  =  M⁻¹N ( G )  ≡  𝓜 ( G )
( B )          ( H )         ( H )   (6.28)

The matrix elements of 𝓜 can be obtained by multiplying out M⁻¹ and N. The matrix M⁻¹ is given by

M⁻¹ = (1/2ik) (  q + ik     −(q − ik)
                −(q − ik)     q + ik  )   (6.29)

Along with (6.27), this gives

𝓜 = (1/4ikq) (  q + ik     −(q − ik) ) ( (q+ik)e^{−qa}   (q−ik)e^{−qa} )
             ( −(q − ik)     q + ik  ) ( (q−ik)e^{qa}    (q+ik)e^{qa}  )   (6.30)

We can now specialize to a scattering process of interest. We consider an incident stream of particles coming in from the left, i.e., moving in the positive x-direction. This is described by the A e^{ikx} part of the wave function. The flux of probability for a wave function ψ is given by

J = −(iℏ/2m) (ψ* ∂ψ/∂x − (∂ψ*/∂x) ψ)   (6.31)

(This formula will be derived later.) Using ψ = A e^{ikx}, we get the incident flux as

J_inc = (ℏk/m) |A|²   (6.32)

When this flux of particles gets to the potential barrier at x = 0, some part of it is reflected back into region I and some gets through to region III, even though the latter process is classically forbidden. The reflected part is given by B e^{−ikx}, since it represents particles moving to the left. The reflected flux is thus

J_refl = −(iℏ/2m) (B* e^{ikx} ∂(B e^{−ikx})/∂x − (∂(B* e^{ikx})/∂x) B e^{−ikx}) = −(ℏk/m) |B|²   (6.33)

The part which gets through the barrier describes particles moving to the right in region III. This is given by G e^{ik(x−a)}. The term H e^{−ik(x−a)} represents particles moving to the left; since there are no such particles to begin with, we can set it to zero as an initial choice. In other words, we only consider particles incident from the left, which can only lead to reflected particles in region I and transmitted particles in region III which continue to move to the right. The transmitted flux is given by (6.31) with ψ = G e^{ik(x−a)}. Thus

J_trans = (ℏk/m) |G|²   (6.34)

The transmission and reflection coefficients are defined by

T = J_trans/J_inc = |G|²/|A|²,   R = −J_refl/J_inc = |B|²/|A|²   (6.35)

In our case, with H = 0, we find from (6.28) that A = 𝓜₁₁G and B = 𝓜₂₁G, so that

T = 1/|𝓜₁₁|²,   R = |𝓜₂₁|²/|𝓜₁₁|²   (6.36)

From (6.30), we then obtain

𝓜₁₁ = (1/4ikq) [(q + ik)² e^{−qa} − (q − ik)² e^{qa}] = cosh(qa) + i ((q² − k²)/2kq) sinh(qa)

|𝓜₁₁|² = cosh²(qa) + ((q² − k²)²/4k²q²) sinh²(qa) = 1 + (V₀²/4(V₀ − E)E) sinh²(qa)   (6.37)

In this simplification, we have used e^{qa} = cosh(qa) + sinh(qa) and cosh²(qa) = 1 + sinh²(qa), and also substituted for q² and k² using their definitions in (6.20) and (6.22). We also find, from (6.30),

𝓜₂₁ = (1/4ikq)(q² + k²)(e^{qa} − e^{−qa}) = −i ((q² + k²)/2kq) sinh(qa)   (6.38)

The transmission and reflection coefficients for the rectangular potential barrier are thus

T = 1 / [1 + (V₀²/4(V₀ − E)E) sinh²(qa)]

R = (V₀²/4(V₀ − E)E) sinh²(qa) / [1 + (V₀²/4(V₀ − E)E) sinh²(qa)]   (6.39)

Notice that T + R = 1. This is as it should be, since what is not transmitted must be reflected.
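The formulas (6.39) are straightforward to evaluate numerically. The following sketch uses units with ℏ = 2m = 1, so that k = √E and q = √(V₀ − E); the parameter values are arbitrary choices, and the identity T + R = 1 holds exactly:

```python
# Evaluate (6.39) for a rectangular barrier and check T + R = 1,
# in units with hbar = 2m = 1 (so q = sqrt(V0 - E), k = sqrt(E)).
import numpy as np

def T_R(E, V0, a):
    q = np.sqrt(V0 - E)                  # valid for E < V0
    s = V0**2 * np.sinh(q * a)**2 / (4 * (V0 - E) * E)
    return 1 / (1 + s), s / (1 + s)

for E in [0.2, 0.5, 0.8]:
    T, R = T_R(E, V0=1.0, a=3.0)
    print(E, T, R, T + R)                # last column is exactly 1
```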

We can now discuss some of the physics related to this. We are considering states which are eigenstates of energy, so they are not eigenstates of position. As a result, the wave functions are spread out in position space, and we can only assign a probability distribution for the position. In particular, in the classically forbidden region, the wave function is not zero. It dies off exponentially in this region, but still has a nonzero value at the end of region II. This can be detected as the transmitted component. If the barrier is very high, the exponential fall-off in region II is very rapid, and we get essentially no transmission. This is consistent with what we find: if V₀ → ∞, then T → 0 and R → 1, since sinh²(qa) → ∞. Thus an infinitely high barrier will completely reflect anything incident on it. Secondly, the classical limit of quantum mechanics emerges when ℏ is negligible compared to parameters such as the action of the system. In the formulas for T and R, ℏ occurs only in qa = a√(2m(V₀ − E))/ℏ. Thus if ℏ ≪ a√(2m(V₀ − E)), qa becomes very large and T → 0, so the classical result is obtained. We see that the transmission is significant when a√(2m(V₀ − E)) is comparable to ℏ; it is really a quantum phenomenon. The quantum transmission through a barrier where the classical motion is forbidden is known as quantum tunneling.

If qa is large but not infinite, we can approximate the transmission coefficient by

T ≈ (4(V₀ − E)E/V₀²) exp(−2qa) = (4(V₀ − E)E/V₀²) exp(−(2a/ℏ)√(2m(V₀ − E)))
  = (4(V₀ − E)E/V₀²) exp(−(2/ℏ) ∫₀^a dx √(2m(V₀ − E)))   (6.40)

The integral is trivial here, since the integrand is a constant; we write it this way because, for more general potentials, the generalization is the tunneling formula

T ≈ Γ exp(−(2/ℏ) ∫_a^b dx √(2m(V(x) − E)))   (6.41)

where Γ is a prefactor which can often be approximated in terms of derivatives of the classical action, and the integration is between the classical turning points, defined by V(a) = E and V(b) = E, on the two sides of the barrier.


Finally, tunneling plays an important role in many physical situations. The first example to be studied was the emission of α-particles in radioactivity: the α-particles can be viewed as tunneling out of the nucleus. This leads to a specific relation between the emission rate and the velocity of the emitted particle, which can be checked experimentally. There are many other examples: first order phase transitions, the propagation of electrons in photosynthesis, and the existence of the so-called θ-vacua in the standard model of particle physics are some important cases. An even more exotic example is the Hawking radiation from black holes. While nothing can escape from a black hole classically, quantum mechanically black holes radiate and evaporate away. This can be understood as a quantum tunneling process.


7 The uncertainty principle, classical physics, probability

7.1 Uncertainty principle

Consider a physical system in a state |α〉 with the wave function ψ(x) = 〈x|α〉. The expectation value of an operator A is then given by

〈A〉 = ∫ dx ψ* A ψ   (7.1)

The action of the operator A on ψ may be as a differential operator, or it could involve simple multiplication by an expression involving x, or some combination of the two. As an example, consider 〈x〉. This is given by

〈x〉 = ∫ dx ψ* x ψ = ∫ dx ρ(x) x   (7.2)

where ρ(x) = ψ*ψ. This is a positive quantity, and we know that dx ρ gives the probability to find the particle in the interval (x, x + dx). So we may think of 〈x〉 as the mean or average of x, in the sense of probability theory. The probability distribution may have a central value with a certain width around it. An estimate of the width of the distribution is given by the mean square deviation of x. This is the average of the square of x − 〈x〉, the deviation of x from its average value 〈x〉. We designate this as Δx²; thus

Δx² = ∫ dx ρ [x − 〈x〉]²   (7.3)

We can define the expectation value and the mean square deviation of the momentum operator in a similar way, as

〈p〉 = ∫ dx ψ* p̂ ψ,   Δp² = ∫ dx ψ* [p̂ − 〈p〉]² ψ   (7.4)

There are constraints on the values of these mean square deviations in quantum mechanics because of the noncommuting nature of the operators. To see how this works, consider two hermitian operators Â and B̂ with the commutation rule

ÂB̂ − B̂Â = iĈ   (7.5)

where Ĉ is some other hermitian operator. Now define

|a〉 = Â|α〉,   |b〉 = B̂|α〉   (7.6)

Earlier we proved the Cauchy-Schwarz inequality, which gives

〈a|a〉〈b|b〉 ≥ |〈a|b〉|²   (7.7)

Using (7.6), this simplifies to

〈α|Â²|α〉 〈α|B̂²|α〉 ≥ |〈α|ÂB̂|α〉|²   (7.8)

Now we write

ÂB̂ = ½(ÂB̂ + B̂Â) + ½(ÂB̂ − B̂Â) = ½(ÂB̂ + B̂Â) + (i/2)Ĉ   (7.9)

Even though ÂB̂ is not hermitian, the symmetrized form (ÂB̂ + B̂Â) is hermitian, so its expectation value is a real number. Likewise, Ĉ is hermitian, so its expectation value is real. Thus

|〈α|ÂB̂|α〉|² = | ½〈α|(ÂB̂ + B̂Â)|α〉 + (i/2)〈α|Ĉ|α〉 |²
            = ¼(〈α|(ÂB̂ + B̂Â)|α〉)² + ¼(〈α|Ĉ|α〉)²
            ≥ ¼(〈α|Ĉ|α〉)²   (7.10)

Going back to (7.8), we can then write the general uncertainty relation

〈α|Â²|α〉 〈α|B̂²|α〉 ≥ ¼(〈α|Ĉ|α〉)²   (7.11)

We now take

Â = x̂ − 〈x〉,   B̂ = p̂ − 〈p〉   (7.12)

In this case,

〈α|Â²|α〉 = Δx²,   〈α|B̂²|α〉 = Δp²   (7.13)

Since 〈x〉 and 〈p〉 are ordinary numbers, they commute with x̂, p̂ and with each other; i.e.,

(x̂ − 〈x〉)(p̂ − 〈p〉) − (p̂ − 〈p〉)(x̂ − 〈x〉) = x̂p̂ − p̂x̂ = iℏ   (7.14)

Thus, in this case, Ĉ = ℏ, and (7.11) becomes

Δx² Δp² ≥ ℏ²/4   (7.15)

By taking positive square roots, this becomes the uncertainty principle for position and momentum,

Δx Δp ≥ ℏ/2   (7.16)

where Δx = √(Δx²), Δp = √(Δp²). The inequality (7.16) is the basic uncertainty principle for single-particle quantum mechanics. The averages refer to a number of measurements on identically prepared systems. Thus Δx is the root mean square deviation, or uncertainty, in the average of position measurements; this is due to the intrinsic probabilistic nature of the theory. It is not referring to imperfections of instrumentation. The inequality (7.16) shows that if we try to put the system in a state minimizing the uncertainty in position, then the uncertainty in momentum becomes large, because the product of the two is bounded below by ℏ/2. Likewise, minimizing the momentum uncertainty leads to large uncertainties in position. This correlation of uncertainties is a hallmark of quantum mechanics.

7.2 Recovering classical physics

In systems familiar from classical mechanics involving large masses, large values of

momentum, etc., the instrumental uncertainties far exceed the intrinsic limitations

given by (7.16) and the quantum effect is negligible. In other words, the instrumen-

tal uncertainties δx and δp are such that their product is much larger than ~/2. So

the quantum limitation is masked by the instrumental errors. In this way, one can

recover classical results. To see how this works out in detail, we can consider the

time-evolution of averages. Consider an operatorO with the expectation value 〈O〉 in

some state |α〉. For the time-evolution, we use the definition of the Hamiltonian,

i~∂ |α〉∂t

= H |α〉 , −i~∂ 〈α|∂t

= 〈α| H (7.17)

The second equation is the hermitian conjugate of the first. We can thus write

i~∂ 〈α| O |α〉

∂t= i~

∂ 〈α|∂tO |α〉+ i~ 〈α| O∂ |α〉

∂t= 〈α| (−HO +OH) |α〉

= 〈α| [O, H] |α〉 (7.18)

Re-expressing this, Time evolutionof observables

i~∂〈O〉∂t

= 〈[O, H]〉 (7.19)

This is the basic equation of motion for observables in quantum mechanics. Consider

now a Hamiltonian of the form

H =p2

2m+ V (x) (7.20)

Since x commutes with any function of x (such as V (x)), the commutator of x with H

is

[x, H] =1

2m[x, p2] =

1

2m

[x, p] p+ p[x, p]

= i~

p

m(7.21)

7.2 Recovering classical physics 64

Further, we can use position-dependent wave functions for the state |α〉 to write

〈x|(p V (x)− V (x) p

)|α〉 = −i~ ∂

∂x〈x|V (x |α〉 − V (x) 〈x| p |α〉

= −i~ ∂∂x

(V (x)ψα

)+ i~V (x)

∂ψα∂x

= −i~∂V∂x

ψα

= −i~ 〈x| ∂V∂x|α〉 (7.22)

Since the states used were arbitrary, this is equivalent to

[p, V (x)] = −i~ ∂V∂x

(7.23)

We now use (7.19) withO = x andO = p. With the help of (7.21) and (7.22), we get

∂〈x〉∂t

=〈p〉m,

∂〈p〉∂t

= −〈 ∂V∂x〉 (7.24)

If we ignore mean square deviations, we can write 〈x2〉 = 〈x〉2, etc., so that

〈 ∂V∂x〉 =

∂V (〈x〉)∂〈x〉

(7.25)

The classical equations of motion for this system would be Classicalequations of

motion∂x

∂t=

p

m,

∂p

∂t= −∂V

∂x(7.26)

The first of these is the definition of momentum and the second is Newton’s law. From

(7.24), (7.25), we see that, if we neglect the mean square deviations, then the averages

of x and p obey the classical equations of motion. More specifically, Classical

equations fromQM∂〈x〉

∂t=〈p〉m,

∂〈p〉∂t

= −∂V (〈x〉)∂〈x〉

+ Terms of order ~ (7.27)

Thus classical mechanics is recovered for the average values of these variables. And

mean square deviations can be neglected if the quantum uncertainties which are

constrained by the uncertainty principle (7.16) are negligible. This is obtained if the

values of observables are such that ~ is small compared to observables such as action

(which has the dimension of ~). Mathematically, we may think of ~ as a parameter

and state the case of negligible uncertainties as taking the limit ~→ 0. We can then

summarize our result as the following statement.

Proposition 2 For a physical system, if the mean square deviations of observables are Ehrenfest’s

theoremnegligible, or as ~→ 0, the expectation values of observables will obey the classical

equations of motion.

This result is known as Ehrenfest’s theorem.

7.3 Conservation of probability 65

7.3 Conservation of probability

We start with the Schrödinger equation

i~∂ψ

∂t= − ~2

2m∇2ψ + V ψ (7.28)

The complex conjugate of this equation is

−i~∂ψ∗

∂t= − ~2

2m∇2ψ∗ + V ψ∗ (7.29)

Multiply (7.28) by ψ∗ and (7.29) by ψ and subtract the second from the first. This leads

to

i~ψ∗∂ψ

∂t− (−i~)

∂ψ∗

∂tψ = − ~2

2m

(ψ∗∇2ψ − (∇2ψ∗)ψ

)+ V ψ∗ψ − V ψ∗ψ (7.30)

The term involving the potential energy cancels out. The left hand side is given by

i~[ψ∗∂ψ

∂t+∂ψ∗

∂tψ

]= i~

∂ρ

∂t(7.31)

where ρ = ψ∗ψ. Further,

ψ∗∇2ψ − (∇2ψ∗)ψ =∑i

[ψ∗(

∂2ψ

∂xi∂xi

)−(∂2ψ∗

∂xi∂xi

]=

∑i

∂xi

[ψ∗(∂ψ

∂xi

)−(∂ψ∗

∂xi

]= ∇ · [ψ∗∇ψ − (∇ψ∗)ψ] (7.32)

The summation is over i = 1, 2, 3. If we use the Leibniz rule to simplify the terms in the

second line, we see that the terms with ∂iψ∗∂iψ cancel out showing that the second

line is equivalent to the first. Using (7.31) and (7.32), equation (7.30) becomes Conservation ofprobability,

probability

current∂ρ

∂t= −∇ · ~J

~J = − i~2m

[ψ∗∇ψ − (∇ψ∗)ψ] (7.33)

This is in the form of a conservation law.

There are two important conclusions we can draw from this. Consider integrating

(7.33) over a volume V . Using the divergence theorem, we find

∂t

∫Vd3x ρ = −

∮∂V

~J · d~S (7.34)

Since ρ is the probability density, the integral∫V ρ is the probability to find the particle

anywhere in the given volume V . The right hand side gives the integral of the vector~J over the surface ∂V which is the boundary of V . This equation thus shows that the

7.3 Conservation of probability 66

probability to find the particle in the volume V can decrease but it does so in a way

that the rate of decrease is given by∮~J · d~S. We can therefore interpret the latter

as the outflow rate for probability from the region V . Because of this interpretation,

we will refer to ~J as given in (7.33) as the probability current. Equation (7.34) shows

that we have conservation of probability. Whatever probability is lost from V can be

understood as having flowed out of the volume due to the current ~J .

The total probability to find the particle anywhere in all of space should be 1. Thus

we normalize ψ such that∫d3x ρ = 1 (7.35)

For this integral to be finite, ψ∗ψ should vanish at spatial infinity. Thus we may take

ψ(x)→ 0 as |~x| → ∞. This implies also that ~J vanishes at infinity. Equation (7.34) can

be written for the case of V being all of space as

∂t

∫d3x ρ = −

∮|~x|→∞

~J · d~S = 0 (7.36)

So∫d3x ρ is conserved. Thus the normalization condition (7.35), if it is imposed at a

given time, will be obtained for all time.

Now going back to the finite region V , the fact that the total probability over all of

space is conserved tells us that if the probability within V decreases due to outflow,

then it must increase in the complementary region R3 − V at the same rate. Since

these regions and subregions can be chosen arbitrarily, this means that the probability

current must be the same on both sides of the interface. This is ensured by the

continuity of the wave functions and their first derivatives, which we discussed earlier.

67

8 Angular momentum

8.1 Spherical coordinates and angular momentum

We will now start the process of discussing quantum mechanics in three dimensions.

Most of the discussion will be done using the Schrödinger equation, which is given in

three dimensions by

i~∂Ψ

∂t= − ~2

2m∇2Ψ + V (x) Ψ (8.1)

For most of the applications, we will focus on eigenstates of the Hamiltonian given by

− ~2

2m∇2ψ + V (x)ψ = E ψ (8.2)

which can be obtained from (8.1) by the substitution Ψ = e−iEt/~ ψ(x).

Many of the problems of immediate physical interest, such as the Hydrogen atom,

or more generally bound states in atomic physics, or scattering by various atomic

nuclei, will involve central potentials. This means that we can take V (x) to be a

function of the distance of the particle from some given origin, which may be the

position of the nucleus (treated approximately as fixed), or the scattering center. So

V (x) = V (r), and in these cases, it is useful to discuss the problem in spherical polar

coordinates. So we will start by writing the Laplace operator∇2 in these coordinates.

The spherical polar coordinates (r, θ, ϕ) are related to the standard Cartesian coor-

dinates (x, y, z) by

x = r sin θ cosϕ, y = r sin θ sinϕ, z = r cos θ (8.3)

Here r is the radial coordinate and if we think of the surface at fixed r as a two-

dimensional sphere (like the surface of Earth), then θ denotes the latitude (or polar

angle) and ϕ denotes the longitude (or azimuthal angle).

The square of the distance between two infinitesimally separated points, say,

labeled by (x, y, z) and (x+ dx, y + dy, z + dz), is given in Cartesian coordinates by the

Pythagorean theorem as

ds2 = dx2 + dy2 + dz2 (8.4)

Using (8.3) this can be written out as

ds2 = dr2 + r2dθ2 + r2 sin2 θ dϕ2 (8.5)

The distance function is a basic characteristic of any space in any coordinate system.

It is referred to as the metric of the space. In general, we can write the metric in some

arbitrary coordinate system as General form of

metric

ds2 =∑i,j

gijdξidξj (8.6)

8.1 Spherical coordinates and angular momentum 68

where gij can be functions of the coordinates. It may be viewed as a symmetric matrix.

The indices i, j take values 1, 2, 3 corresponding to three independent directions in

space. For the Cartesian coordinates, (ξ1, ξ2, ξ3) = (x, y, z) and g11 = g22 = g33 = 1 and

all other elements of gij are zero. For the spherical coordinates, (ξ1, ξ2, ξ3) = (r, θ, ϕ)

and g11 = 1, g22 = r2 and g33 = r2 sin2 θ, all others being zero. Written as matrix,

gij =

1 0 0

0 1 0

0 0 1

(Cartesian)

=

1 0 0

0 r2 0

0 0 r2 sin2 θ

(Spherical) (8.7)

Using these in (8.6), it is easy to check that formulae (8.4) and (8.5) are reproduced.

In terms of gij , the general formula for the volume element is given by Volumeelement

dv =√

det g dξ1dξ2dξ3 (8.8)

Notice that for the Cartesian system, det g = 1 and we get dxdydz for the volume ele-

ment. For the spherical polar coordinates, det g = r4 sin2 θ, so that dv = r2 sin θ drdθdϕ.

The general definition of the Laplace operator (acting on a scalar function) is General

definition ofLaplacian

∇2 =1√

det g

∑i,j

∂ξi

[gij√

det g∂

∂ξj

](8.9)

where gij (with upper indices) is the inverse of the matrix gij . Thus, for spherical

coordinates,

gij =

1 0 0

0 1r2

0

0 0 1r2 sin2 θ

(8.10)

Since this is diagonal, for this system of coordinates, only terms with i = j contribute

in (8.9) and so we have Laplacian inspherical

coordinates∇2 =

1

r2 sin θ

[∂

∂r

(r2 sin θ

∂r

)+

∂θ

(1

r2r2 sin θ

∂θ

)+

∂ϕ

(1

r2 sin2 θr2 sin θ

∂ϕ

)]=

1

r2

∂r

(r2 ∂

∂r

)+

1

r2 sin θ

∂θ

(sin θ

∂θ

)+

1

r2 sin2 θ

∂2

∂ϕ2

=1

r2

∂r

(r2 ∂

∂r

)+

1

r2

[1

sin θ

∂θ

(sin θ

∂θ

)+

1

sin2 θ

∂2

∂ϕ2

](8.11)

In Cartesian coordinates, it is trivial to check that we get the usual expression

∇2 =∂2

∂x2+

∂2

∂y2+

∂2

∂z2(8.12)

8.1 Spherical coordinates and angular momentum 69

In spherical coordinates, the terms in∇2 involving derivatives with respect to θ

and ϕ are related to the angular momentum. We will first show this and develop the

theory of angular momentum so that the physics of further simplifications becomes

clear.

Classically the orbital angular momentum is a vector given by

~L = ~x× ~p

L1 = x2p3 − x3p2, L2 = x3p1 − x1p3, L3 = x1p2 − x2p1 (8.13)

In the second line we write out the components of the cross product. In quantum

mechanics, we define the angular momentum operator by a similar expression with

operators substituted for the classical quantities. Thus Orbital angularmomentum

L1 = x2p3 − x3p2, L2 = x3p1 − x1p3, L3 = x1p2 − x2p1 (8.14)

The first thing we want to do is to work out the commutators of these operators among

themselves. For the first two, we get

[L1, L2] = [x2p3 − x3p2, x3p1 − x1p3]

= [x2p3, x3p1]− [x2p3, x1p3]− [x3p2, x3p1] + [x3p2, x1p3] (8.15)

Since dissimilar x and p commute, many terms are zero. For example

[x2p3, x3p1] = x2[p3, x3p1] + [x2, x3p1]p3

= x2 [p3, x3]p1 + x3[p3, p1]+ [x2, x3]p1 + x3[x2, p1] p3

= −i~ x2p1 (8.16)

We have expanded out using [AB,C] = A[B,C] + [A,C]B and [A,BC] = [A,B]C +

B[A,C]. Only terms involving similar x, p can give a nonzero term. The only other

nonzero contribution is from the last of the commutators in (8.15),

[x3p2, x1p3] = x1[x3, p3]p2 = i~ x1p2 (8.17)

Using (8.16) and (8.17), we find

[L1, L2] = i~(x1p2 − x2p1) = i~L3 (8.18)

We can work out the other commutators in a similar way. The full set of commutators

for the angular momentum becomes Angular

momentum

algebra[L1, L2] = i~L3, [L2, L3] = i~L1, [L3, L1] = i~L2 (8.19)

A good mnemonic is to notice that the last two are obtained by cyclic permutations of

the first.

8.2 General theory of angular momentum 70

8.2 General theory of angular momentum

We derived the commutation rules (8.19) for the orbital angular momentum from

the fundamental commutation rules for xi and pj . But they hold for spin angular

momentum as well. Usually the letterLi is reserved for the orbital angular momentum,

with Si used for the spin. A general angular momentum is usually denoted by Ji. So

for the first part of developing the general theory, we will use the letter Ji in place of

Li, so that our analysis applies to spin as well. Thus the general commutation rules we

will use are Angular

momentumalgebra

[J1, J2] = i~J3, [J2, J3] = i~J1, [J3, J1] = i~J2 (8.20)

(We also drop the hat-notation; unless we state otherwise all quantities involving J ’s

will be operators.)

We will now consider eigenstates of angular momentum. Recall that we can have

simultaneous eigenstates only for operators which are mutually commuting. Among

the Ji there are no such, so we can only diagonalize one of the components; conven-

tionally, we take it as J3. To include J1, J2 in some manner, we ask if there is some

combination of the J ’s which commutes with J3. The answer is yes. W will now show

that J2 = J21 + J2

2 + J23 commutes with any component Ji. We can see this explicitly

from the commutation rules.

[J2, J1] = [J21 + J2

2 + J23 , J1] = [J2

2 + J23 , J1]

= J2[J2, J1] + [J2, J1]J2 + J3[J3, J1] + [J3, J1]J3

= J2(−i~J3) + (−i~J3)J2 + J3(i~J2) + (i~J2)J3

= 0 (8.21)

The commutation rules (8.20) have symmetry under cyclic permutations of the labels

1, 2, 3, and J2 is invariant under cyclic permutations, and hence this relation also tells

us that J2 commutes with any Ji,

[J2, Ji] = 0 (8.22)

Therefore if we choose to diagonalize J3, in the same basis we can diagonalize J2.

Thus we will look for eigenstates of these operators defined by

J3 |λ,m〉 = m~ |λ,m〉 , J2 |λ,m〉 = λ |λ,m〉 (8.23)

Here m~ is the eigenvalue of J3 and λ that of J2.

We now consider combinations of J1 and J2 which can shift the eigenvalue of J3.

We define

J+ = J1 + iJ2, J− = J1 − iJ2 (8.24)

8.2 General theory of angular momentum 71

These are not hermitian by themselves because of the explicit factor of i. Notice that,

since J1, J2 must be hermitian, (J+)† = J−. We now have

[J3, J+] = [J3, J1 + iJ2] = i~J2 + i(−i~J1) = ~ (J1 + iJ2) = ~ J+

[J3, J−] = [J3, J1 − iJ2] = i~J2 − i(−i~J1) = −~ (J1 − iJ2) = −~ J− (8.25)

Applying these rules onto the state |λ,m〉, we find

J3(J+ |λ,m〉) = J+J3 |λ,m〉+ ~J+ |λ,m〉 = (m+ 1)~ (J+ |λ,m〉)

J3(J− |λ,m〉) = J−J3 |λ,m〉 − ~J− |λ,m〉 = (m− 1)~ (J− |λ,m〉) (8.26)

This shows that the state J± |λ,m〉 is an eigenstate of J3 with eigenvalue (m ± 1)~.

Thus J± are step-up and step-down operators for the J3 eigenvalue (by one unit in m),

similar to what we had for the harmonic oscillator. Since J2 commutes with all Ji and

hence with J±, we find also

J2 J± |λ,m〉 = J±J2 |λ,m〉 = λJ± |λ,m〉 (8.27)

Thus while the action of J± increases/decreases the value of m, it does not change λ.

Now let us look at J2 in terms of J±. We have

J+J− = (J1 + iJ2)(J1 − iJ2) = J21 + J2

2 + i(J2J1 − J1J2)

= J21 + J2

2 + ~J3

J−J+ = (J1 − iJ2)(J1 + iJ2) = J21 + J2

2 − i(J2J1 − J1J2)

= J21 + J2

2 − ~J3 (8.28)

Using these, we can write

J2 = J+J− + J23 − ~J3 = J−J+ + J2

3 + ~J3 (8.29)

Since J−J+ and J+J− are positive operators (because they are of the form A†A), by

applying this onto |λ,m〉 , we get

λ |λ,m〉 = J2 |λ,m〉 = (J−J+ + J23 + ~J3) |λ,m〉 ≥ m(m+ 1)~2 |λ,m〉

= J2 |λ,m〉 = (J+J− + J23 − ~J3) |λ,m〉 ≥ m(m− 1)~2 |λ,m〉 (8.30)

We can rewrite these as

m(m+ 1)~2 ≤ λ, m(m− 1)~2 ≤ λ (8.31)

For a given value of λ consider applying J+ several times, we increase the value of m

and hence m(m + 1). This can give trouble with the inequality (8.31). So clearly we

should not be able to increase m forever, for a given λ. Likewise, if we apply J− several

8.2 General theory of angular momentum 72

times decreasing the value of m to large negative values, we can run into trouble

with the inequality. (Notice that if m is negative we can write the second inequality as

λ ≥ |m|(|m|+1), hence for large values of |m|we can have a violation of this inequality.)

For a given λ, there must therefore exist some value of m, call this j, beyond which we

cannot increase m by the application of J+. In other words, there is a state |λ, j〉 such

that

J+ |λ, j〉 = 0 (8.32)

Using J2 = J−J+ + J23 + ~J3 from (8.29) and applying this to |λ, j〉, since the action of

J+ gives zero, we find

J2 |λ, j〉 = j(j + 1)~2 |λ, j〉 (8.33)

This identifies λ = j(j + 1)~2. The inequalities (8.31) take the form

m(m+ 1) ≤ j(j + 1), m(m− 1) ≤ j(j + 1) (8.34)

Now consider starting with the state |λ, j〉 and applying J− several times. Each appli-

cation of J− lowers J3-eigenvalue by one unit. Thus we have

(J−)n |λ, j〉 = Cn |λ, j − n〉 (8.35)

where Cn is some proportionality constant. We know this must terminate, otherwise

we will have a violation of the inequality m(m− 1) ≤ j(j + 1) as m becomes large and

negative. Thus there must be a state for which the further application of J− gives zero.

Let us call this lowest possible J3-eigenvalue as j′. Thus

J− |λ, j′〉 = 0 (8.36)

Using the form of J2 = J+J− + J23 − ~J3 and applying on to the state |λ, j′〉, we find

j′(j′ − 1) = j(j + 1) (8.37)

This shows that j′ = j or j′ = −j. The first gives our starting state, so we take the

solution to be j′ = −j. Since |λ, j′〉 is obtained from |λ, j〉 by several applications of

J−, say, N of them, we must have j′ = j −N for some integer N . We thus have

−j = j −N =⇒ 2j = N (8.38)

Since N is an integer, we find that j must be quantized. The allowed values of j are

j = 0,1

2, 1,

3

2, · · · (8.39)

For a given j, i.e., for a given λ, the allowed values of m are thus j, j − 1, · · · ,−j. The

number of such states is thus 2j + 1.

Let us summarize the results so far. Since λ = j(j + 1)~2, we can use j,m to label

the states, instead of λ,m. With this in mind, we have the following statement.

8.2 General theory of angular momentum 73

Proposition 3 For angular momentum, we can find simultaneous eigenstates of J2

and J3 with Eigenstates ofangular

momentumJ2 |j,m〉 = j(j + 1)~2 |j,m〉

J3 |j,m〉 = m~ |j,m〉 (8.40)

The allowed values of j are a positive integer or half-an-odd positive integer. For a

given choice of j, the allowed values of m are j, j − 1, · · · ,−j. The number of values

for m is thus 2j + 1.

We can also write down formulae for the action of J± on the states |j,m〉. We know

that J+ is a step-up operator which acts on |j,m〉 and gives a state proportional to

|j,m− 1〉, while J− is a step-down operator giving a state proportional to |j,m− 1〉.So we write

J+ |j,m〉 = Cjm |j,m+ 1〉 (8.41)

From equations (8.29) we find

|Cjm|2 〈j,m+ 1|j,m+ 1〉 = 〈j,m| J−J+ |j,m〉 = 〈j,m| (J2 − J23 − ~J3) |j,m〉

= ~2 [j(j + 1)−m(m+ 1)] (8.42)

With |j,m+ 1〉 being normalized, this identifies Cjm = ~√j(j + 1)−m(m+ 1). A simi-

lar calculation shows that Cjm defined by J− |j,m〉 = Cjm |j,m− 1〉 can be identified as

~√j(j + 1)−m(m− 1). Thus

J+ |j,m〉 = ~√j(j + 1)−m(m+ 1) |j,m+ 1〉

J− |j,m〉 = ~√j(j + 1)−m(m− 1) |j,m− 1〉 (8.43)

We will now consider a couple of examples before specializing back to the orbital

angular momentum.

j = 0

For j = 0, we have only one allowed value form, namely,m = 0. Thus there is one state

|0, 0〉which has zero value for J3 and J2. This is the state of zero angular momentum.

j = 12

For j = 12 , we have the allowed values of m as 1

2 and −12 . Thus there are two states

possible and we can write these states as |12 ,12〉 and |12 ,−

12〉, with

J2 |12 ,12〉 = 3

4~2 |12 ,

12〉

J3 |12 ,12〉 = 1

2~ |12 ,

12〉 (8.44)

J2 |12 ,−12〉 = 3

4~2 |12 ,−

12〉

8.3 Addition of angular momenta 74

J3 |12 ,−12〉 = −1

2~ |12 ,−

12〉 (8.45)

A value of j = 12 will not be allowed for orbital angular momentum, we will see this

shortly. However, these states can be used as the spin states of an electron (or other

spin-12 particles).

j = 1

For j = 1, we will have 2j + 1 = 3 states, with m values 1, 0, −1. The states may be

designated as |1, 1〉, |1, 0〉, |1,−1〉. For the eigenvalues we have

J2 |1,m〉 = 2~2 |1,m〉 , m = 1, 0,−1

J3 |1, 1〉 = ~ |1, 1〉

J3 |1, 0〉 = 0

J3 |1,−1〉 = −~ |1,−1〉 (8.46)

8.3 Addition of angular momenta

There are many situations where we need to consider the addition of angular mo-

menta. For example, an electron bound to the nucleus of an atom has orbital angular

momentum, generally denoted by Li. But the electron also has an intrinsic or spin an-

gular momentum, usually denoted by Si. Often, depending on the kind of interactions

the electron is subject to, the total angular momentum Ji = Li + Si is the quantity of

interest. By this we mean that states which are the eigenstates of J2 and J3 are the

relevant ones, rather than states specified by the Li or Si separately. Other similar

situations occur when we have to consider the total angular momentum of a number

of particles, where the contributing ones may be of the orbital type or spin-type or

mixtures of these. The key point is that the commutation rules impose restrictions on

how the states are to be combined. Thus if we have two individual angular momenta,

say, ~J (1) and ~J (2), and the total angular momentum ~J = ~J (1) + ~J (2), they all obey the

same type of commutation rules,

[J(1)i , J

(1)j ] = i~εijkJ

(1)k , [J

(2)i , J

(2)j ] = i~εijkJ

(2)k , [Ji, Jj ] = i~εijkJk, (8.47)

The rules of combining angular momenta must be compatible with these commuta-

tion rules.

The angular momentum eigenstates, for the individual angular momenta can be

represented as |j1,m1〉 and |j2,m2〉 corresponding to ~J (1) and ~J (2), respectively. These

are of the form we have discussed in the previous section. Thus for the composite

system, a basis is provided by the products |j1,m1〉 |j2,m2〉 for all values of m1, m2,

giving (2j1 + 1) (2j2 + 1) states. For the combined angular momentum, because it

obeys the same kind of commutation rules, the eigenstates are again of the same form

8.3 Addition of angular momenta 75

as in the last section, namely, |j,m〉, in terms of the eigenvalues for J2 and J3. Thus

our task is to understand what the possible values of j are in terms of j1 and j2 and

then to construct |j,m〉 as linear combinations of |j1,m1〉 |j2,m2〉. In other words, we

expect

|j,m〉 =∑m1,m2

Cj,mj1,m1;j2,m2|j1,m1〉 |j2,m2〉 (8.48)

The coefficients Cj,mj1,m1;j2,m2which appear here are known as the Clebsch-Gordon (CG) Clebsch-

Gordoncoefficientscoefficients. Obviously, the allowed values of j,m have to related to j1,m1, j2,m2. This

will mean that the CG coefficients will be zero except for certain specific values of the

quantum numbers.

The CG coefficients can be calculated systematically, although the process can

get a bit tedious. We will now show how this can be done. The constraints on j,m

will emerge from this process as well. The strategy is to start from the state with the

highest possible values for m1, m2 and then work down from there. The state with the

highest values for m1, m2 is obviously |j1, j1〉 |j2, j2〉. On this state, we have

J+ |j1, j1〉 |j2, j2〉 =(J

(1)+ + J

(2)+

)|j1, j1〉 |j2, j2〉 = 0

J3 |j1, j1〉 |j2, j2〉 =(J

(1)3 + J

(2)3

)|j1, j1〉 |j2, j2〉

= ~ (j1 + j2) |j1, j1〉 |j2, j2〉 (8.49)

Because J+ gives zero on |j1, j1〉 |j2, j2〉, we see that it must correspond to the highest

m-value for some choice of j. Since the corresponding m-value is j1 + j2 from the

second of these equations, we conclude that j = j1 + j2 must be one of the allowed

j-values when we combine j1 and j2. Notice that the state |j1, j1〉 |j2, j2〉 is normalized,

so we can take it to be the state |j, j〉 for j = j1 + j2. Thus we write

|j1 + j2, j1 + j2〉 = |j1, j1〉 |j2, j2〉 (8.50)

identifying Cj1+j2,j1+j2j1,j1;j2,j2

= 1. Now that we have the state |j1 + j2, j1 + j2〉 we can con-

struct the state with m-value j1 + j2 − 1 (with the same j-value) by using the lowering

operator J−. From (8.43), we write

|j,m− 1〉 =1

~√j(j + 1)−m(m− 1)

J− |j,m〉 (8.51)

Applying this to the state (8.50) we find

|j1 + j2, j1 + j2 − 1〉 =

√1

j1 + j2

(√j1 |j1, j1 − 1〉 |j2, j2〉+

√j2 |j1, j1〉 |j2, j2 − 1〉

)(8.52)

8.3 Addition of angular momenta 76

Here we used J− = J(1)− + J

(2)− for working out the right hand side. This equation

identifies the CG coefficients

Cj1+j2,j1+j2−1j1,j1−1;j2,j2

=

√j1

j1 + j2, Cj1+j2,j1+j2−1

j1,j1;j2,j2−1 =

√j2

j1 + j2(8.53)

Clearly we can go to lower values ofm (for the same j = j1 + j2) by further applications

of J−.

Notice that there are two ways we can getm = j1 +j2−1, fromm1 = j1−1,m2 = j2

and from m1 = j1, m2 = j2 − 1. One linear combination of the corresponding states is

obtained in (8.52). There is an orthogonal combination given by

|ψ〉 = eiϕ√

1

j1 + j2

(√j2 |j1, j1 − 1〉 |j2, j2〉 −

√j1 |j1, j1〉 |j2, j2 − 1〉

)(8.54)

Here ϕ gives an arbitrary phase, not determined by the requirement of orthonormality.

Hereafter, we choose it to be zero. On this state, we also find

J+ |ψ〉 =

√1

j1 + j2

(√2j1j2 |j1, j1〉 |j2, j2〉 −

√2j1j2 |j1, j1〉 |j2, j2〉

)= 0

J3 |ψ〉 = ~(j1 + j2 − 1) |ψ〉 (8.55)

So |ψ〉 is the highest state in terms of values for m, since J+ annihilates it. We must

therefore conclude that this is the beginning of another series of states with j =

j1 + j2 − 1. We may also write the corresponding CG coefficient as

Cj1+j2−1,j1+j2−1j1,j1−1;j2,j2

=

√j2

j1 + j2|j1, j1 − 1〉 |j2, j2〉

Cj1+j2−1,j1+j2−1j1,j1;j2,j2−1 =

√j1

j1 + j2|j1, j1〉 |j2, j2 − 1〉 (8.56)

(As mentioned before, there is some freedom of a phase choice in how |ψ〉 is defined.

We have made a particular choice; this choice also propagates to the CG coefficients.)

Evidently, the type of reasoning outlined above can be continued. At the next

stage, we have 3 ways to get m = j1 + j2 − 2 corresponding to (m1,m2) = (j1 −2, j2), (j1 − 1, j2 − 1), (j1, j2 − 2). Two linear combinations will be part of the series for

j = j1 + j2 and j = j1 + j2 − 1. These can be identified by the application of J− on

the corresponding higher states with m = j1 + j2 − 1. The third linear combination,

which is orthogonal to these two, will start a new series of states with j = j1 + j2 − 2.

Thus we should find a new possible j-value of j1 + j2 − 2. The argument can then be

continued to the next set of states, yielding a new series of states with j = j1 + j2 − 3

and so on. Thus the choices for j are of the form j1 + j2 − k, k = 0, 1, 2, etc. Recall

that k arises from lowering the m1 or m2 values. This process will terminate when we

8.3 Addition of angular momenta 77

get to the lowest possible value for either m1 or m2. If j1 > j2, we get to the end once

we have k = j2. So no new values of j can be generated after this. If j2 > j1, this will

happen when k = j1. So we conclude that the allowed values for j should be j1 + j2,

j1 + j2 − 1, · · · , |j1 − j2|.We can see that the process accounts for all the states. We started with a basis of

(2j1 + 1)(2j2 + 1) states. For each j-value, we get (2j + 1) states. Thus the number of

independent states once we have made the combinations is

j1+j2∑j=j1−j2

(2j + 1) =

2j2∑k=0

[2(j1 − j2 + k) + 1] = (2j1 + 1)(2j2 + 1) (8.57)

(We took the case j1 ≥ j2 here, but the result is easily checked to be true for j2 ≥ j1 as

well.) We see that we have covered all the possible states after reorganizing them in

terms of the eigenstates for the total angular momentum.

The result we have obtained can be summarized as a theorem, sometimes known

as the Clebsch-Gordon theorem.

Theorem 8.1 One can take suitable linear combinations of the product of the eigen-

states (of the form |j1,m1〉 and |j2,m2〉) for two sets of angular momentum operators~J (1) and ~J (2) to obtain the angular momentum eigenstates |j,m〉 of the total angular

momentum ~J = ~J (1) + ~J (2). These will be of the form

|j,m〉 =∑m1,m2

Cj,mj1,m1;j2,m2|j1,m1〉 |j2,m2〉

for suitable coefficients Cj,mj1,m1;j2,m2. The possible j-values are given by (j1 + j2),

(j1 + j2 − 1), · · · , |j1 − j2|. For each j-value, there are (2j + 1) states corresponding

to different values of m = m1 +m2.

It is useful to consider examples of the explicit realization of this for some simple

cases.

Combining j = 12 and j = 1

2

The states are of the form |12 ,±12〉 |

12 ,±

12〉. We have j1 = j2 = 1

2 . Thus the allowed

values for j are j = 1, 0. The state with the highest value for J3 is |1, 1〉 = |12 ,12〉 |

12 ,

12〉.

By the application of J−, we find

|1, 0〉 =1√2

(|12 ,−

12〉 |

12 ,

12〉+ |12 ,

12〉 |

12 ,−

12〉)

|1,−1〉 = |12 ,−12〉 |

12 ,−

12〉 (8.58)

8.3 Addition of angular momenta 78

The state orthogonal to 1√2

(|12 ,−

12〉 |

12 ,

12〉+ |12 ,

12〉 |

12 ,−

12〉)

is obviously

|0, 0〉 =1√2

(|12 ,−

12〉 |

12 ,

12〉 − |

12 ,

12〉 |

12 ,−

12〉)

(8.59)

It is easily checked that this has j = 0. Evidently, the CG coefficients are

C1,112 ,

12 ; 1

2 ,12

= C1,−112 ,−

12 ; 1

2 ,−12

= 1

C1,012 ,−

12 ; 1

2 ,12

= C1,112 ,

12 ; 1

2 ,−12

=1√2

C0,012 ,−

12 ; 1

2 ,12

= −C0,012 ,

12 ; 1

2 ,−12

=1√2

(8.60)

A physical situation to which this result can be applied is in combining the spins

of two spin- 12 particles. The resulting composite system will have a set of spin-1 states

(3 of them, with m values 0,±1) and a state with spin equal to zero. Notice that the

spin-1 states are symmetric under exchange of the spins, while the spin-zero state is

antisymmetric.

Combining j = l and j = 12

Another common physical situation is when we combine orbital angular momen-

tum (j1 = l) and spin (j2 = 12 ) for a single spin-1

2 particle. An example would be the

electron in an atomic system. The possible j-values for this case are j = l + 12 and

j = l − 12 . The states of the highest m-value for these cases are given by

|l + 12 , l + 1

2〉 = |l, l〉 |12 ,12〉

|l − 12 , l −

12〉 =

1

2l + 1

(|l, l − 1〉 |12 ,

12〉 −

√2l |l, l〉 |12 ,−

12〉)

(8.61)

The remaining states can be obtained by the successive application of the J− operator.

79

9 Three dimensions and central potentials

9.1 Schrödinger equation in spherical coordinates

In order to use the theory of angular momentum to simplify particle dynamics in three

dimensions, we must relate the Laplace operator to the square of the orbital angular

momentum. This can be done by using the change of variables from Cartesian to

spherical polar coordinates. Once again, this is given by

x = r sin θ cosϕ, y = r sin θ sinϕ, z = r cos θ (9.1)

There are two ways to proceed. The first method is to write the operators Li in

spherical coordinates and then take the square. The second is to use the expression

for the orbital angular momentum and calculate its square directly in the Cartesian

basis and then make the transformation. We will do both.

In the first method, we start from (9.1) and write expressions for the spherical

coordinates as

r =√x2 + y2 + z2, cos θ =

z

r, tanϕ =

y

x(9.2)

The first of these gives immediately

∂r

∂x=x

r,

∂r

∂y=y

r,

∂r

∂z=z

r(9.3)

which we may write as

∂r

∂xk=xkr

(9.4)

Taking the differential of the relation for cos θ, we find

− sin θ dθ =dz

r− z

r2dr (9.5)

which, upon using (9.4), gives

∂θ

∂x=

zx

r3 sin θ=

cos θ cosϕ

r∂θ

∂y=

zy

r3 sin θ=

cos θ sinϕ

r

∂θ

∂z= − 1

sin θ

[1

r− z2

r3

]= −sin θ

r(9.6)

Notice that∑k

xk∂θ

∂xk= 0 (9.7)

9.1 Schrödinger equation in spherical coordinates 80

In a similar way, we find

sec2 ϕdϕ =dy

x− y

x2dx (9.8)

which leads to

∂ϕ

∂x= −1

r

sinϕ

sin θ∂ϕ

∂y=

1

r

cosϕ

sin θ

∂ϕ

∂z= 0 (9.9)

For this variable also, we verify trivially that∑k

xk∂ϕ

∂xk= 0 (9.10)

We can now work out the components of the orbital angular momentum. For the Angularmomentum:

sphericalcoordinates

action of L1 on a wave function ψ, we find

L1 ψ = −i~(y∂ψ

∂z− z ∂ψ

∂y

)= −i~

[r sin θ sinϕ

(∂ψ

∂r

∂r

∂z+∂ψ

∂θ

∂θ

∂z+∂ψ

∂ϕ

∂ϕ

∂z

)−r cos θ

(∂ψ

∂r

∂r

∂y+∂ψ

∂θ

∂θ

∂y+∂ψ

∂ϕ

∂ϕ

∂y

)]= −i~

[yz

r

∂ψ

∂r− zy

r

∂ψ

∂r− (sin2 θ sinϕ+ cos2 θ sinϕ)

∂ψ

∂θ− cos θ cosϕ

sin θ

∂ψ

∂ϕ

]= i~

[sinϕ

∂θ+ cot θ cosϕ

∂ϕ

]ψ (9.11)

This identifies the operator L1 as

L1 = i~[sinϕ

∂θ+ cot θ cosϕ

∂ϕ

](9.12)

In an entirely analogous fashion

L2 = i~[− cosϕ

∂θ+ cot θ sinϕ

∂ϕ

]L3 = −i~ ∂

∂ϕ(9.13)

The action of L2 on a wave function is thus given by

L2ψ = −~2

[(sinϕ

∂θ+ cot θ cosϕ

∂ϕ

)(sinϕ

∂ψ

∂θ+ cot θ cosϕ

∂ψ

∂ϕ

)

9.1 Schrödinger equation in spherical coordinates 81

+

(− cosϕ

∂θ+ cot θ sinϕ

∂ϕ

)(− cosϕ

∂ψ

∂θ+ cot θ sinϕ

∂ψ

∂ϕ

)+∂2ψ

∂ϕ2

](9.14)

This can be expanded out as

L2ψ = −~2

[sin2 ϕ

∂2ψ

∂θ2+ cos2 ϕ

∂2ψ

∂θ2+ sinϕ cosϕ

∂θ

(cot θ

∂ψ

∂ϕ

)− cosϕ sinϕ

∂θ

(cot θ

∂ψ

∂ϕ

)+ cot θ cosϕ

∂ϕ

(sinϕ

∂ψ

∂θ

)− cot θ sinϕ

∂ϕ

(cosϕ

∂ψ

∂θ

)+ cot2 θ cosϕ

∂ϕ

(cosϕ

∂ψ

∂ϕ

)+ cot2 θ sinϕ

∂ϕ

(sinϕ

∂ψ

∂ϕ

)+∂2ψ

∂ϕ2

]= −~2

[∂2ψ

∂θ2+ cot θ

∂ψ

∂θ+ cot2 θ

∂2ψ

∂ϕ2+∂2ψ

∂ϕ2

]= −~2

[1

sin θ

∂θ

(sin θ

∂ψ

∂θ

)+

1

sin2 θ

∂2ψ

∂ϕ2

](9.15)

Comparing this with the expression for the Laplacian in (8.11), we find Laplacian and

angularmomentum

−~2∇2 = −~2

r2

∂r

(r2 ∂

∂r

)+

1

r2L2 (9.16)

This relates the Laplacian and the square of the angular momentum.

In the second approach, we can write

L2 =∑i

LiLi =∑ijkmn

εijkεimnxj pk xmpn

=∑jkmn

(δjmδkn − δjnδkm)xj pk xmpn

=∑jk

[xj pk xj pk − xj pk xkpj ] (9.17)

Our attempt will be to combine this into terms involving xkpk. For this we write

pkxj = −i~δjk + xj pk,∑k

pkxk = −i3~ +∑k

xkpk (9.18)

Using this in (9.17) we find

L2 = 2i~x · p+ x2 p2 −∑j

xj(x · p)pj (9.19)

For the last term, we can further use∑j

xj x · p pj =∑j

xj [x · p, pj ] + (x · p)(x · p)

9.2 Central potentials and separation of variables 82

= i~x · p+ (x · p)(x · p) (9.20)

The expression for L2 can now be simplified as

L2 = x2 p2 + i~x · p− (x · p)(x · p) (9.21)

In spherical coordinates, the wave functions will be functions of r, θ and ϕ. On

such functions we have

x · p ψ(r, θ, ϕ) = −i~∑k

xk

[∂r

∂xk

∂ψ

∂r+

∂θ

∂xk

∂ψ

∂θ+

∂ϕ

∂xk

∂ψ

∂ϕ

]= −i~ r∂ψ

∂r(9.22)

where we used (9.4), (9.7) and (9.9). The expression (9.21) can then be simplified as

L2 = −~2

[r2∇2 − r ∂

∂r− r ∂

∂r

(r∂

∂r

)]= r2

[−~2∇2 +

~2

r2

∂r

(r2 ∂

∂r

)](9.23)

which is equivalent to (9.16).

The Hamiltonian for a particle in a potential V (x) in spherical coordinates in three

dimensions is given by

H =p2

2M+ V (x)

= − ~2

2M∇2 + V (x)

= − ~2

2Mr2

∂r

(r2 ∂

∂r

)+

1

2Mr2L2 + V (x) (9.24)

We now use M for the mass of the particle, since m will show up as eigenvalues of

the L3-component of angular momentum shortly. The Schrödinger equation for

eigenstates of the Hamiltonian is thus given by[− ~2

2Mr2

∂r

(r2 ∂

∂r

)+

1

2Mr2L2 + V (x)

]ψ = Eψ (9.25)

9.2 Central potentials and separation of variables

For many examples of particle dynamics in three dimensions, we will be interested in

central potentials, so V (x) = V (r). In such cases, we can do a separation of variables Separation of

variableswith the wave function taking the form

ψ(x) = R(r)Y (θ, ϕ) (9.26)

where Y (θ, ϕ) is an eigenstate of L2. Some features of these functions will be clear

from the general theory of angular momentum for this; alternatively, we can develop

9.2 Central potentials and separation of variables 83

the properties of these functions in terms of differential equations. Since we have

already discussed the general angular momentum theory, here we will follow the

second method, so one can see how the abstract operator method and the method of

differential equations match.

Since L2 only involves derivatives with respect to the angles, it will not deriveR(r).

Thus substituting the ansatz (9.26) in the Schrödinger equation (9.25), we find

2Mr2

[− ~2

2Mr2

∂r

(r2∂R∂r

)+ (V (x)− E)R

]Y (θ, ϕ) + (L2Y (θ, ϕ))R = 0 (9.27)

We have also multiplied the whole equation by r2. Dividing this equation byR(r)Y (θ, ϕ)

we can write this as

2Mr2[− ~2

2Mr2∂∂r

(r2 ∂R

∂r

)+ (V (x)− E)R

]R

+(L2Y (θ, ϕ))

Y= 0 (9.28)

The first term on the left hand side is purely a function of r, while the second term

only depends on the angles θ, ϕ. The only way the sum of these two can be zero for all

r and θ, ϕ is if each term is separately a constant. Taking this to be λ, we can write

2Mr2[− ~2

2Mr2∂∂r

(r2 ∂R

∂r

)+ (V (x)− E)R

]R

= −λ

(L2Y (θ, ϕ))

Y= λ (9.29)

This is basically the idea of the separation of variables. We can rewrite these equations

as

− ~2

2Mr2

∂r

(r2∂R∂r

)+

λ

2Mr2R+ V (x)R = ER (9.30)

L2 Y (θ, ϕ) = λY (θ, ϕ) (9.31)

The second equation tells us that Y (θ, ϕ) is an eigenstate of L2 with eigenvalue λ.

From the general theory of angular momentum, we know that λ must be of the form

j(j+1)~2. However, there will be some restrictions on the choice of the values for j. To

see how this arises, we will solve the second equation (9.31) using another separation

of variables.

We can write out the second equation again, using (9.15), as

−~2

[1

sin θ

∂θ

(sin θ

∂Y (θ, ϕ)

∂θ

)+

1

sin2 θ

∂2Y (θ, ϕ)

∂ϕ2

]= λY (θ, ϕ) (9.32)

We can separate variables again by an ansatz of the form Y (θ, ϕ) = P(θ)F(ϕ). We

could go through the procedure of writing out the equation and then dividing by

Y (θ, ϕ), but it will amount to saying that this equation separates out as

∂2F∂ϕ2

= −cF

9.2 Central potentials and separation of variables 84

−~2

[1

sin θ

∂θ

(sin θ

∂P∂θ

)]+

~2

sin2 θcP = λP (9.33)

We can solve the first equation to get

F(ϕ) = Aei√c ϕ (9.34)

Now ϕ is an angular coordinate taking values from zero to 2π. Thus ϕ and ϕ + 2π

correspond to the same geometrical point,. We expect that the wave function (and

F which is part of the wave function) should be single-valued, so that we must have

F(ϕ+ 2π) = F(ϕ). This leads to the condition

ei√c 2π = 1 (9.35)

This implies that√c must be an integer. We will denote this as m; we show below that

this is related to the eigenvalues of L3. This argument also shows that√c must be real,

otherwise (9.35) cannot be satisfied.

Consider now the action of L3 on F . From (9.13), we see that F is an eigenstate of

L3 with eigenvalue m~,

L3F = −i~∂F∂ϕ

= m~F (9.36)

The analysis of the ϕ-dependent terms confirms that F (and hence Y (θ, ϕ)) is an

eigenstate of L3, with eigenvalues m~ where m is an integer. From the general theory

of angular momentum, we know that the states are of the form |j,m〉where m takes

values −j to j in integer steps. Thus, if j is half-an-odd integer, all values of m will

be half-an-odd integer; if j is an integer, m values will be integers. Since the above

argument shows that m should be an integer for L3, we conclude that, for orbital

angular momentum (for which we can use the argument that ϕ and ϕ+ 2π correspond

to the same geometrical point), the allowed j-values must also be integers. We will

denote the j value for orbital angular momentum as l. Thus, based on the general

theory of angular momentum, we can conclude λ = j(j + 1)~2 = l(l + 1)~2, where l is

zero or a positive integer.

We can now write out the second equation in (9.33) as

−[

1

sin θ

∂θ

(sin θ

∂P∂θ

)]+

m2

sin2 θP = l(l + 1)P (9.37)

This is an equation familiar from different branches of mathematical physics. The

simplest way to bring out this relationship to known equations is to define the variable

u ≡ cos θ. We may then think of P as a function of u. This gives immediately

∂P∂θ

=∂u

∂θ

∂P∂u

= (− sin θ)∂P∂u

(9.38)

9.2 Central potentials and separation of variables 85

Rewriting (9.37), we then find Associated

Legendreequation

d

du

[(1− u2)

dPdu

]+

[l(l + 1)− m2

1− u2

]P = 0 (9.39)

Being a second order differential equation, there will be two independent solutions

for each value of l,m. Since u = cos θ, the range of u of interest for us is 1 ≥ u ≥ −1,

corresponding to the range of θ from zero to π. The set of nonsingular solutions for u

in this range are called the associated Legendre polynomials, denoted by Pml (u). First

consider m = 0. In this case, the equation reduces to Legendre

equation

d

du

[(1− u2)

dPdu

]+ l(l + 1)P = 0 (9.40)

This is Legendre’s differential equations and the nonsingular solutions are the Legen-

dre polynomials given by Legendre

polynomials

Pl(u) =1

2ll!

(d

du

)l(u2 − 1)l (9.41)

The result of the differentiation is a polynomial of order l. The associated Legendre

polynomials which satisfy (9.39) with m 6= 0 are given by AssociatedLegendrepolynomials

Pml (u) = (1− u2)|m|/2(d

du

)|m|Pl(u) (9.42)

Combining these with F(ϕ), we can now write down the complete solution for the

angular part. The functions are given by Sphericalharmonics

Y ml (θ, ϕ) = η

√(2l + 1)

(l − |m|)!(l + |m|)!

Pml (cos θ) eimϕ (9.43)

where η = (−1)m for m ≤ 0 and η = 1 for m > 0. These functions Y ml (θ, ϕ) are known

as spherical harmonics. We have chose the normalization so that∫dΩY ∗ml (θ, ϕ)Y m′

l′ =

∫ π

0dθ sin θ

∫ 2π

0dϕY ∗ml (θ, ϕ)Y m′

l′ = δll′δmm′ (9.44)

Thus the set of functions Y ml for all l,m form an orthonormal set. Further, the

spherical harmonics form a complete set for functions on the two-sphere; i.e., any

function f(θ, ϕ) of θ, ϕ with the periodicity conditions appropriate to the sphere can

be expanded as

f(θ, ϕ) =

∞∑l=0

m=+l∑m=−l

Cml Y ml (θ, ϕ) (9.45)

9.3 Legendre polynomials, spherical harmonics: Some observations 86

where Cml are constants characterizing the function f . Because of the orthonormality

property of the spherical harmonics, we can write

Cml =

∫dΩY ∗ml (θ, ϕ) f(θ, ϕ) (9.46)

Thus, equations (9.45) and (9.46) constitute a generalization of Fourier’s theorem

to functions on the two-sphere. (Recall that Fourier’s theorem, because it involves

functions of a single angular variable θ, refers to functions on the one-sphere or circle.)

We now return to the radial equation (9.30). Using the fact that λ = l(l + 1)~2, we

can write it as Radial

Schrödingerequation in 3d

~2

[− 1

2Mr2

∂r

(r2∂R∂r

)+l(l + 1)

2Mr2

]R+ V (x)R = ER (9.47)

Further, notice that

1

r2

∂r

(r2∂R∂r

)=

1

r2

[2rR′ + r2R′′

]=

1

r

[2R′ + rR′′

]=

1

r

d2

dr2(rR) (9.48)

Using this in (9.47) and multiplying by 2Mr/~2, we get[− d2

dr2+l(l + 1)

r2

]G =

2M(E − V (r))

~2G (9.49)

where G = rR. This is the radial equation to be solved to complete the calculation

of the wave functions and the energy eigenvalues. The solution to this equation will

depend on the potential V (r) and has to be worked out on a case by case basis; there

is no general solution valid for all potentials. We will work out a few physically relevant

examples shortly.

9.3 Legendre polynomials, spherical harmonics: Some observations

We will now make some general observations on the Legendre polynomials and spher-

ical harmonics. The Legendre polynomials were defined by the formula (9.41), often

referred to as the Rodrigues formula. Another way to define these polynomials is via

the generating function Legendre

polynomials:

generatingfunction1√

1− 2uτ + τ2=∞∑l=0

Pl(u) τ l (9.50)

Here τ is an arbitrary dummy variable; if we expand the left hand side in powers of τ

and equate coefficients of like powers of τ on both sides, we can identify the Legendre

polynomials for any value of l.

We will not show the equivalence of this definition to the one defined by the

Rodrigues formula. This can be verified by working out the polynomials from (9.41)

9.3 Legendre polynomials, spherical harmonics: Some observations 87

and comparing them with (9.50). However, here we will show that the definition

(9.50) shows that the Legendre polynomials obey the differential equation (9.40), (with

m = 0), which is really what we need for our discussion. Towards this, we simply

differentiate the definition (9.50) twice with respect to u to get

τ

(1− 2uτ + τ2)32

=∑l

P ′l τ l

3τ2

(1− 2uτ + τ2)52

=∑l

P ′′l τ l (9.51)

From these two equations, we find∑l

[(1− u2)P ′′l − 2uP ′l

]τ l =

3τ2 − 2uτ3 + u2τ2 − 2uτ

(1− 2uτ + τ2)52

(9.52)

We now multiply (9.50) by τ and differentiate with respect to τ . This gives

∂τ

(1− 2uτ + τ2)12

]=

∂τ

∑l

Pl τ l+1

i.e.,1− uτ

(1− 2uτ + τ2)32

=∑l

(l + 1)Pl τ l (9.53)

Differentiating this again with respect to τ we get

−u(1− 2uτ + τ2)

32

+3(1− uτ)(u− τ)

(1− 2uτ + τ2)52

=∑l

l(l + 1)Plτ l−1

i.e.,2u− 3τ + 2uτ2 − u2τ

(1− 2uτ + τ2)52

=∑l

l(l + 1)Plτ l−1 (9.54)

Multiplying this equation by τ and comparing with (9.52), we find∑l

[(1− u2)P ′′l − 2uP ′l + l(l + 1)Pl

]τ l = 0 (9.55)

Since τ is arbitrary, this shows that the polynomials defined by (9.50) obey the Legendre

differential equation.

Since the differential equation is homogeneous in P , the overall normalization

of the Legendre polynomials is not fixed by it. If Pl is a solution, so is CPl for any

constant C. For us the convenient choice is to take the normalization as defined by

(9.50). We now show that this leads to a particular orthogonality and normalization

condition. For us u = cos θ, so the range of u is from−1 to +1. Our first result is∫duPl(u)Pl′(u) = 0, l 6= l′ (9.56)

The argument is essentially the same as the old result that the eigenfunctions of a

hermitian operator for different eigenvalues are orthogonal. We take the differential

9.3 Legendre polynomials, spherical harmonics: Some observations 88

equations for Pl and Pl′ , and multiply the first by Pl′ , the second by Pl. This gives the

pair of equations

Pl′[d

du

[(1− u2)

dPldu

]+ l(l + 1)Pl

]= 0

Pl[d

du

[(1− u2)

dPl′du

]+ l′(l′ + 1)Pl′

]= 0 (9.57)

We subtract the second from the first and integrate over u from−1 to +1 to get∫ 1

−1du

[Pl′

d

du

[(1− u2)

dPldu

]− Pl

d

du

[(1− u2)

dPl′du

]]=[l′(l′ + 1)− l(l + 1)

] ∫ 1

−1duPl′Pl (9.58)

An integration by parts shows that the left hand side is zero; notice that the factor

1− u2 vanishes at both limits, this is useful. Thus we find

[l′(l′ + 1)− l(l + 1)

] ∫ 1

−1duPl′Pl = 0 (9.59)

This shows the result (9.56) for l 6= l′. We can now write∫ 1

−1duPl Pl′ = Cl δll′ (9.60)

for some constant Cl. To determine this constant, we first notice that from the defini-

tion (9.50), we can write∑l

Plτ l∑l′

Pl′τ l′

=1

1− 2uτ + τ2(9.61)

Integrating both sides with respect to u, and using (9.60), we find∑l

Clτ2l =

∫ 1

−1

du

1− 2uτ + τ2= −1

τlog(1− 2uτ + τ2)

]1

−1= −1

τlog

(1− τ1 + τ

)=

∑l

2

2l + 1τ2l (9.62)

This identifies the constant Cl as 2/2l + 1. In arriving at this result, we have used the

expansions of the logarithms as

log(1− τ) = −[τ +

τ2

2+τ3

3+ · · ·+ τn

n+ · · ·

]log(1 + τ) =

[τ − τ2

2+τ3

3+ · · ·+ +(−1)n+1 τ

n

n+ · · ·

](9.63)

9.3 Legendre polynomials, spherical harmonics: Some observations 89

Using the value of Cl, we can write the orthogonality condition for the Legendre

polynomials as Legendrepolynomials:

orthogonality∫ 1

−1du Pl Pl′ =

2

2l + 1δll′ (9.64)

We will now look at the associated Legendre polynomials. For this, write the

differential equation (9.40) for the Legendre polynomials as

(1− u2)P ′′l − 2uP ′l + l(l + 1)Pl = 0 (9.65)

Differentiating this equation with respect to u we get

(1− u2)P ′′′l − 2uP ′′l + l(l + 1)P ′l − 2uP ′′l − 2P ′l = 0 (9.66)

Define the associated Legendre polynomial P 1l by

P1l =

√1− u2 P ′l (9.67)

This gives

P ′l =P1l√

1− u2

P ′′l =(P1

l )′

(1− u2)12

+uP1

l

(1− u2)32

P ′′′l =(P1

l )′′

(1− u2)12

+2u(P1

l )′

(1− u2)32

+P1l

(1− u2)32

+3u2 P1

l

(1− u2)52

(9.68)

Substituting these expressions in (9.66) and simplifying by combining terms we get

(1− u2)(P1l )′′ − 2u(P1

l )′ + l(l + 1)(P1l )− 1

1− u2(P1

l ) = 0 (9.69)

This is the case of m = 1 for the differential equation (9.39),

d

du

[(1− u2)

dPdu

]+

[l(l + 1)− m2

1− u2

]P = 0 (9.70)

Thus (9.69) shows that P1l is indeed a solution to (9.70) for m = 1. Continuing along

similar lines, differentiating m times and reassembling terms, one can show that the

associated Legendre polynomials defined by

Pml = (1− u2)m/2dm

dumPl (9.71)

are indeed a solution to (9.70).

9.3 Legendre polynomials, spherical harmonics: Some observations 90

We will now go over some aspects of spherical harmonics. They were defined in

(9.43) as

Y ml (θ, ϕ) = η

√(2l + 1)

(l − |m|)!(l + |m|)!

Pml (cos θ) eimϕ (9.72)

Instead of defining them in terms of the associated Legendre polynomials, we will con-

sider their construction along the lines of the general theory of angular momentum.

The key property for us is that the spherical harmonics form a complete set of func-

tions for the two-sphere S2, i.e., for a sphere embedded in three spatial dimensions,

described by angles θ, ϕ. Since a sphere is described by the equation

x2 + y2 + z2 = r2 (9.73)

we can define a unit vector with components

n1 =x

r= sin θ cosϕ, n2 =

y

r= sin θ sinϕ, n3 =

z

r= cos θ (9.74)

Each choice of (n1, n2, n3) defines a point on the sphere. Functions on the sphere

can thus be considered as functions of the unit vector ~n. Our strategy will thus be to

construct the spherical harmonics in terms of the components of ~n. In fact, we only Spherical

harmonics:alternatemethod

need to construct the analog of the states with the highest value of m for a given l; the

others can then be obtained by application of J− = L−. So we consider the operators

L± = L1 ± iL2 = i~[(

sinϕ∂

∂θ+ cot θ cosϕ

∂ϕ

)± i(− cosϕ

∂θ+ cot θ sinϕ

∂ϕ

)]= ~ e±iϕ

[i cot θ

∂ϕ± ∂

∂θ

](9.75)

Next we notice that the action of L+ on sin θ eiϕ gives zero,

L+ sin θ eiϕ = ~eiϕ[i cot θ × i sin θ eiϕ + cos θeiϕ

]= 0 (9.76)

Further,

L3 sin θ eiϕ = ~ sin θ eiϕ (9.77)

Notice that n1 + in2 = sin θ eiϕ, so this is truly made of the components of the unit

vector ~n. Now consider the function

ψ = C (sin θ)l eilϕ (9.78)

Evidently this satisfies

L3 ψ = l ~ψ, L+ ψ = 0 (9.79)

9.3 Legendre polynomials, spherical harmonics: Some observations 91

We can thus identify this as the wave function for the state |j,m〉 = |l, l〉, with j = l and

m = l. The normalization can be worked out as follows.∫dϕdθ sin θ ψ∗ψ = 2πC2

∫ π

0dθ sin θ(sin θ)2l

= C2 22l+2(l!)2 π

(2l + 1)!(9.80)

The simplest way to establish this result is the following. Define

I2l+1 =

∫dθ(sin θ)2l+1 (9.81)

We separate out one factor of sin θ and carry out an integration by parts to obtain

I2l+1 =[− cos θ(sin θ)2l

]π0

+

∫ π

0dθ cos θ [2l cos θ(sin θ)2l−1]

= 2 l

∫ π

0dθ (sin θ)2l−1(1− sin2 θ)

= 2 l I2l−1 − 2 l I2l+1 (9.82)

We can rewrite this equation as a recursion rule

I2l+1 =2l

2l + 1I2l−1 (9.83)

The iteration of this leads to the integral in (9.80). With the integral as in (9.80), the

normalized state is thus

〈θ, ϕ|l, l〉 =

√(2l + 1)!

22l+2(l!)2 π(sin θ)l eilϕ = Y l

l (θ, ϕ) (9.84)

We can now use the formula

L− |l,m〉 = ~√l(l + 1)−m(m− 1) |l,m− 1〉 (9.85)

Taking m = l, this leads to

〈θ, ϕ|l, l − 1〉 =1√2l〈θ, ϕ| (L−/~) |l, l〉

=

√(2l + 1)!

22l+2(l!)2 π

1√2l

(L−/~)(sin θ)l eilϕ

=

√(2l + 1)!

22l+2(l!)2 π

1√2l

(− ∂

∂θ+ i cot θ

∂ϕ

)(sin θ)l eilϕ

= −

√(2l + 1)!

22l+2(l!)2 π

√2l cos θ(sin θ)l−1 ei(l−1)ϕ (9.86)

This is proportional to Y l−1l . We can continue along these lines to get other spherical

harmonics.

92

10 Hydrogen atom and other bound states in central potentials

10.1 Solving the idealized Hydrogen atom

We will now consider the solution of the Schrödinger equation for the energy levels of

the Hydrogen atom. The physical system which constitutes the Hydrogen atom is a

proton-electron bound state. The physics of this involves the electrostatic interaction

between the two particles, relativistic effects, motion of the proton, spin-orbit effects,

etc. Since the proton has a mass of∼ 938 MeV, while the electron mass is∼ 0.51 MeV,

the motion of the proton can be neglected as a first approximation. The same holds for

the spin effects and other relativistic effects. Corrections due to these can be included

in perturbation theory later. Thus we consider an idealized problem of the Hydrogen

atom where the proton is at the origin of coordinates and the interaction potential is

V (r) = −Ze2

r(10.1)

We include a general atomic numberZ, although for the protonZ = 1. The Schrödinger

equation is thus[− ~2

2µ∇2 − Ze2

r

]ψ = E ψ (10.2)

Here µ is the mass of the electron. More generally, if we include the motion of the

proton, it is the reduced mass µ = mpme/(mp+me) ≈ me. This equation has solutions

for both E < 0 and E > 0. The solutions with E > 0 can describe the scattering of

electrons by the nucleus. We will be interested in the case of E < 0 for now.

Since we have a central potential, we can use the separation of variables we have

already discussed. Thus, the wave functions can be taken to be of the form

ψ = R(r)Y ml (θ, ϕ) (10.3)

The radial equation for the functionR then takes the form The radialequation

− ~2

2µr2

∂r

(r2∂R∂r

)+

[~2l(l + 1)

2µr2− Ze2

r

]R = ER (10.4)

As we did before, we introduce G = rR. This equation then becomes

− ~2

d2G

dr2+

[~2l(l + 1)

2µr2− Ze2

r

]G = EG (10.5)

The strategy in solving this equation will be to first identify the behavior of the wave

function for small and large values of r, then make an ansatz consistent with them.

One can then simplify the equation to a point where a power series solution is possible.

10.1 Solving the idealized Hydrogen atom 93

For large values of r, we can drop the 1/r2 and 1/r terms as being subdominant

compared to a constant among the coefficients, so the equation simplifies to

− ~2

d2G

dr2≈ EG (10.6)

Since we are considering E < 0,−2µE/~2 is positive, so we introduce the variable

ξ = 2

√−2µE

~2r (10.7)

The equation for u becomes

d2G

dξ2≈ 1

4G (10.8)

We seek a solution which falls off at ξ →∞ so as to be normalizable. This is evidently

given by e−ξ/2. For small values of r, the dominant coefficient in (10.5) is the centrifugal

term with 1/r2 behavior. In this case, the equation is approximated as

− ~2

d2G

dr2+

~2l(l + 1)

2µr2G ≈ 0 (10.9)

This equation has solutions of the form rl+1 and r−l. The second possibility is singular

at r = 0, so we reject that solution; thus the small r-behavior is to be identified as

rl+1 ∼ ξl+1. An ansatz consistent with these two behaviors is Ansatz forpower series

G = e−ξ/2 ξl+1 f(ξ) (10.10)

All proportionality factors can be absorbed into f . We will use this form and reduce

the full equation (10.5) to an equation for f(ξ) and solve it by a power series method.

First we rewrite (10.5) in terms of ξ. This gives the equation

d2G

dξ2− l(l + 1)

ξ2G− 1

4G+

λ

ξG = 0

λ = Zµe2

~

√1

−2µE(10.11)

We can now use the ansatz (10.10) and convert this into an equation for f(ξ). Denoting

the derivative with respect to ξ by a prime on u,

G′ = (l + 1)ξle−12 ξf − 1

2ξl+1e−

12 ξf + ξl+1e−

12 ξf ′

G′′ =

[l(l + 1)

ξ2+

1

4− (l + 1)

ξ

]ξl+1e−

12 ξf

+

[f ′′ +

(2l + 2− ξ)ξ

f ′]ξl+1e−

12 ξ (10.12)

10.1 Solving the idealized Hydrogen atom 94

Using these expressions, we can reduce (10.11) to

ξ f ′′ + (2l + 2− ξ) f ′ + (λ− l − 1) f = 0 (10.13)

(We have also multiplied throughout by a factor of ξ.) We now seek a power series

solution of the form

f(ξ) =

∞∑0

bk ξk (10.14)

where the coefficients bk will be chosen so that f satisfies (10.13). Substituting in

(10.13), we find∑k

[k(k − 1) + k(2l + 2)] bk ξk−1 +

∑k

(λ− l − 1− k) bk ξk = 0 (10.15)

In the first term, the k = 0 contribution is zero. Thus the series starts with k = 1. We

can therefore shift k → k + 1 for the first term and start with the new k = 0. The above

equation then becomes∑k=0

[(k + 1)(2l + 2 + k)bk+1 − (k + l + 1− λ)bk] ξk = 0 (10.16)

We can satisfy this equation for arbitrary values of ξ if the coefficient of ξk vanishes.

This yields the recursion rule for the coefficients

bk+1 =(k + l + 1− λ)

(k + 1)(2l + 2 + k)bk (10.17)

We now have a situation similar to what we found in the case of the harmonic oscillator.

The ratio of the coefficients, for large k is∣∣∣∣bk+1

bk

∣∣∣∣ ≈ 1

k(10.18)

This gives the behavior bk ∼ 1/k! for large k. The series for f will thus become

f ∼∑ξk/k! ∼ eξ. This will lead to a function G which goes as e−

12 ξeξ ∼ e

12 ξ at large ξ.

Such a solution is not normalizable and hence not acceptable. The only way out is if

the series terminates. This can only happen for certain values of the energy E. If the

series terminates, we must have some value kmax = K at which the next coefficient

bK+1 vanishes. This means that we must have

K + l + 1− λ = 0 (10.19)

Using the expression for λ from (10.11), we can solve for the allowed energy values as Energy

eigenvalues

En = −Z2e2

2a0

1

n2(10.20)

10.1 Solving the idealized Hydrogen atom 95

where

n = K + l + 1, \qquad a_0 = \frac{\hbar^2}{\mu e^2}    (10.21)

The energy eigenvalues are given by (10.20), labeled by an integer n, which is known as the principal quantum number. Since K ≥ 0, we must have n ≥ 1; put differently, l ≤ n − 1. a₀ is known as the Bohr radius. With the mass and charge of the electron substituted in, we find a₀ ≈ 0.529 × 10⁻⁸ cm. This sets the scale for atomic radii. Formula (10.20) is the same formula as in the Bohr model of the atom, but derived here within the full quantum theory.

We can now construct the solutions for f. From (10.19) and (10.21), λ = n. So once we choose a particular value for n, the recursion rule (10.17) becomes

b_k = -\frac{(n-k-l)}{k(2l+1+k)}\; b_{k-1}    (10.22)

Iterating this, we find

b_k = -\frac{(n-k-l)}{k(2l+1+k)}\; b_{k-1}
    = (-1)^2\, \frac{(n-k-l)}{k(2l+1+k)}\,\frac{(n-k-l+1)}{(k-1)(2l+1+k-1)}\; b_{k-2}
    = (-1)^m\, \frac{(n-k-l)(n-k-l+1)\cdots(n-k-l+m-1)}{k(k-1)\cdots(k-m+1)} \times \frac{1}{(2l+1+k)(2l+1+k-1)\cdots(2l+1+k-m+1)}\; b_{k-m}
    = (-1)^k\, \frac{(n-l-1)!\,(2l+1)!}{(n-l-1-k)!\;k!\;(2l+1+k)!}\; b_0    (10.23)

The solution for f(ξ) may thus be written as

f_n(\xi) = b_0\,(n-l-1)!\,(2l+1)! \sum_k (-1)^k \frac{1}{(n-l-1-k)!\;k!\;(2l+1+k)!}\;\xi^k    (10.24)

The prefactor b₀(n−l−1)!(2l+1)! depends on n, l and b₀. For each choice of n, l, this can be taken to be part of the normalization, which we have to fix anyway. So this term is not important for now. The associated Laguerre polynomials are defined by

L^{\alpha}_{K}(\xi) = (K+\alpha)! \sum_k (-1)^k \frac{1}{k!\;(K-k)!\;(\alpha+k)!}\;\xi^k    (10.25)

Thus, our solution for f is essentially L^{2l+1}_{K} = L^{2l+1}_{n-l-1}(ξ). Keeping in mind that R = G/r ∼ G/ξ, the solutions to the radial equation can be taken as

R(\xi) = C_{nl}\; \xi^{l}\, L^{2l+1}_{n-l-1}(\xi)\; e^{-\xi/2}    (10.26)


Combining this with the angular part, the solution for the wave function becomes

\psi_{nlm} = C_{nl}\; \xi^{l}\, L^{2l+1}_{n-l-1}(\xi)\; e^{-\xi/2}\; Y^m_l(\theta,\varphi)    (10.27)

To summarize, the (bound) eigenstates of the Hamiltonian in (10.2) are labeled by

three integers: n, the principal quantum number; l which denotes the total angular

momentum in the sense that the eigenvalue for L2 is l(l + 1)~2; the azimuthal (or

magnetic) quantum number m which is the eigenvalue of L3. We may thus denote

the states by |n, l,m〉. The wave functions ψnlm(r, θ, ϕ) = 〈r, θ, ϕ|n, l,m〉 in terms of the

coordinates are given by (10.27). The coefficient Cnl is to be fixed by normalization.

The energy eigenvalues are given by (10.20). Notice that while the wave functions

depend on n, l,m, the energy eigenvalues are independent of l,m. This is special to

the Coulomb potential; perturbations will change this result.
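As a numerical aside, the termination of the recursion (10.17) and the normalization integral (10.29) below are easy to check on a computer. The following Python sketch assumes that SciPy's genlaguerre uses the convention of (10.25); since b₀ is free, the comparison with the recursion is only up to an overall constant in any case.

```python
# A minimal numerical check of the radial solution of section 10.1.
import numpy as np
from scipy.integrate import quad
from scipy.special import factorial, genlaguerre

def coeffs_from_recursion(n, l):
    """Coefficients b_k from (10.17) with lambda = n; the series
    terminates at k = K = n - l - 1."""
    b = [1.0]                                  # b_0, fixed by normalization later
    for k in range(n - l - 1):
        b.append((k + l + 1 - n) / ((k + 1) * (2*l + 2 + k)) * b[-1])
    return np.array(b)

n, l = 3, 1
b = coeffs_from_recursion(n, l)
L = genlaguerre(n - l - 1, 2*l + 1)            # L^{2l+1}_{n-l-1}(xi)

# The recursion coefficients are proportional to the Laguerre ones:
ratio = b / L.coef[::-1]                       # poly1d stores highest power first
print(np.allclose(ratio, ratio[0]))            # True

# The normalization integral (10.29):
val, _ = quad(lambda x: np.exp(-x) * x**(2*l + 2) * L(x)**2, 0, np.inf)
print(val, 2*n * factorial(n + l) / factorial(n - l - 1))   # both 144.0
```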

It is useful to look at the low lying states in some detail. Since n = K + l + 1, the lowest possible value for n is 1, corresponding to K = l = 0. Since l = 0, the only choice for m is m = 0. Thus we have Y^0_0 as the angular part. This is the ground state with the wave function

\psi_{100} = C_{10}\; e^{-\xi/2}    (10.28)

This has energy E₁ = −Z²e²/2a₀. The next energy level has n = 2. This allows l = 0 with K = 1 and l = 1 with K = 0. Finally, to determine the normalization, we need the integral

\int_0^\infty e^{-\xi}\, \xi^{2l+2}\left(L^{2l+1}_{n-l-1}\right)^2 d\xi = 2n\, \frac{(n+l)!}{(n-l-1)!}    (10.29)

Notice that, with the identification of the energy eigenvalues as in (10.20),

\xi = \frac{2Z}{n a_0}\; r    (10.30)

Thus the integral for normalization of the wave functions is

1 = \left(\frac{n a_0}{2Z}\right)^3 |C_{nl}|^2 \int_0^\infty e^{-\xi}\, \xi^{2l+2}\left(L^{2l+1}_{n-l-1}\right)^2 d\xi = \left(\frac{n a_0}{2Z}\right)^3 |C_{nl}|^2\; 2n\, \frac{(n+l)!}{(n-l-1)!}    (10.31)

The normalized wave functions are thus given by

\psi_{nlm} = \left(\frac{2Z}{n a_0}\right)^{3/2} \sqrt{\frac{(n-l-1)!}{2n\,(n+l)!}}\;\; \xi^{l}\, L^{2l+1}_{n-l-1}(\xi)\; e^{-\xi/2}\; Y^m_l(\theta,\varphi)    (10.32)

The orthonormality of the spherical harmonics was used earlier. Further, for different values of n, the energy eigenvalues are different, so the corresponding wave functions must be orthogonal by the general theorem on eigenfunctions of hermitian operators. Thus, combining this result with the normalization, we have the orthonormality condition

\int \left[dr\, d\theta\, d\varphi\; r^2 \sin\theta\right] \psi^*_{nlm}\; \psi_{n'l'm'} = \delta_{nn'}\,\delta_{ll'}\,\delta_{mm'}    (10.33)

The ground state wave function (10.28), with the normalization included, is

\psi_{100} = \frac{1}{\sqrt{8\pi}} \left(\frac{2Z}{a_0}\right)^{3/2} \exp\left(-\frac{Zr}{a_0}\right) = \sqrt{\frac{Z^3}{\pi a_0^3}}\; \exp\left(-\frac{Zr}{a_0}\right)    (10.34)

Here we also use the fact that Y^0_0 = 1/\sqrt{4\pi}.

We will now count the degeneracy of states and analyze the nature of some low lying states. Notice that the energy eigenvalues only depend on the principal quantum number n. For a given n, the allowed values of l can be any integer from zero to n − 1; the requirement that K = n − l − 1 ≥ 0 leads to this condition. For each value of l, we have 2l + 1 states corresponding to m = −l, −l+1, ···, l−1, l. Thus the number of states with the same energy (i.e., for a given n) is given by adding 2l + 1 for all allowed values of l. This is obviously

d(n) = \sum_{l=0}^{n-1} (2l+1) = 2\,\frac{(n-1)n}{2} + n = n^2    (10.35)

The number of states with the same energy eigenvalue is known as the degeneracy of that eigenvalue. Thus, for our idealized Hydrogen atom problem, the degeneracy for E_n (or for n) is d(n) = n².

10.2 Building up atoms and the periodic table

We have already considered the ground state with the wave function (10.34). It is not

degenerate. For n = 2, we can have l = 0 with one state, and l = 1 with 3 states. A state

with l = 0 is often referred to as an S-state. Also, l = 1 is a P-state, l = 2 a D-state, l = 3

is an F-state. This is the nomenclature from spectroscopy, S, P, D, F referring to Sharp,

Principal, Diffuse and Fundamental. Often, we include the reference to the principal

quantum number as a prefix. Thus an alternate notation for the state n = 1, l = 0 is 1S.

Similarly, for n = 2, l = 0, we have 2S; while for n = 2, l = 1, we have 2P. This is useful

as a way to specify atomic configurations.

The electron is a particle of spin-1/2. This means that it has states of intrinsic angular momentum (not orbital angular momentum) corresponding to j = 1/2. This allows two spin states, 2j + 1 = 2, corresponding to J₃ taking the values ±ℏ/2. Thus a complete specification of the state of an electron in the Hydrogen atom will require one more quantum number which tells us whether we have J₃ equal to ℏ/2 or −ℏ/2. We will use the letter m_s to specify the eigenvalue of J₃ for the spin part. Thus the states are given by |n, l, m, m_s⟩.

In building up the electronic configurations of atoms, the key principle is the Pauli

exclusion principle. This states that we cannot have more than one electron occupying a given state. With this idea in mind, the ground states of atoms can be obtained as

follows. For Hydrogen with one electron, the ground state is 1S. This electron could have either m_s = 1/2 or m_s = −1/2. For Helium, we can put two electrons in the state 1S, corresponding to m_s = 1/2 and m_s = −1/2. This can be indicated by using the notation 1S². For Z = 3, i.e., for Lithium, we have 3 electrons. We put two in 1S as before, but the exclusion principle forbids a third electron in 1S. So it has to go into one of the n = 2 states, say, 2S. Thus the ground state of Lithium may be taken as 1S² 2S¹.

The last occupied state has one electron, similar to Hydrogen. In chemical reactions, these are the easiest electrons to be exchanged with other atoms or captured into joint molecular states. Thus we expect Lithium and Hydrogen to behave similarly from a chemical point of view. Also, for Helium, the n = 1 state is fully occupied, and any excitation will require moving an electron into the n = 2 or higher states. Since this will cost significantly more energy than is available in most chemical processes, Helium tends to be inert in chemical reactions. For Beryllium with Z = 4 we have 1S² 2S², for Boron we get 1S² 2S² 2P¹, etc. Boron has 3 electrons in the highest occupied states, so, chemically, we should expect that it will behave as an element of valency 3. The next atom, Carbon, will have 4 electrons in the n = 2 states, so, chemically, it should correspond to valency 4. All the n = 2 states are filled when we get to atomic number 10, with the configuration 1S² 2S² 2P⁶. (We may write this more explicitly as 1S² 2S² 2P_x² 2P_y² 2P_z².) This corresponds to the element Neon, and from what was said earlier, we expect Neon to be chemically inert for most reactions. It is easy to see that the pattern of the periodic table will emerge from these considerations.

When we get to higher levels and multi-electron atoms, the simple-minded treatment we have done is not adequate. Spin-orbit interactions and inter-electron interactions become important. The energy levels get corrections, and some of the higher angular momentum states and states of higher principal quantum number may shift relative to each other. This happens, for example, for the 3D states versus the 4S states, leading to similar chemical behavior for a number of consecutive elements. Building up atoms from low atomic number, when we get to 1S² 2S² 2P⁶ 3S² 3P⁶, we have the next inert gas, Argon. But going beyond this, some of the 4S states can get filled before the 3D states are filled. This is evident from the electronic configurations shown in Table 10.1. Notice the shift between 4S and 3D states in the order of filling them up with electrons. This accounts for the similar chemical properties of Iron, Cobalt, Nickel. Similar issues arise with the other series of similar elements, such as the triplets of transition metals like Ruthenium, Rhodium, Palladium, or Osmium, Iridium, Platinum, as well as the Lanthanide series (rare earth elements) and the Actinide series.

Element      Configuration
Chromium     1S² 2S² 2P⁶ 3S² 3P⁶ 3D⁵ 4S¹
Manganese    1S² 2S² 2P⁶ 3S² 3P⁶ 3D⁵ 4S²
Iron         1S² 2S² 2P⁶ 3S² 3P⁶ 3D⁶ 4S²
Cobalt       1S² 2S² 2P⁶ 3S² 3P⁶ 3D⁷ 4S²
Nickel       1S² 2S² 2P⁶ 3S² 3P⁶ 3D⁸ 4S²
Copper       1S² 2S² 2P⁶ 3S² 3P⁶ 3D¹⁰ 4S¹
Zinc         1S² 2S² 2P⁶ 3S² 3P⁶ 3D¹⁰ 4S²

Table 10.1: Electronic configurations for some transition elements of low atomic number

10.3 The deuteron

The name deuteron refers to the nucleus of heavy Hydrogen, corresponding to a

bound state of a proton and a neutron. The binding of the two nucleons is due to

nuclear forces. We will give an approximate treatment of how this bound state is

formed, based on a simple spherically symmetric potential. The fundamental theory

of nuclear forces is known as Quantum Chromodynamics or QCD for short. It is a

very involved quantum field theory and it is a long way from QCD to the potential

description we are using. A number of approximations and some assumptions, which

we believe are true but have not yet been theoretically proved, have to be made.

The description of interactions in terms of an instantaneous potential energy

function clearly has limited validity, since instantaneous energy transfer is inconsistent

with the theory of relativity. For the deuteron problem, since the rest mass energy of

each of the two particles is much higher than the binding energy, one could argue

that a potential is adequate for this purpose. Nuclear forces are known to have a short

range, which is what led to Yukawa’s theory of the meson in the first place. One can

view the proton-neutron interaction to arise from the exchange of a massive particle,

the π-meson. In this view, the p-n potential energy takes the form

V(r) = -g^2\, \frac{1}{r} \exp\left(-\frac{m_\pi c}{\hbar}\, r\right)    (10.36)


where g is a coupling constant and m_π is the mass of the π-meson. This is the Yukawa potential. It falls off rapidly with separation r, with an effective range of about R ≈ ℏ/(m_π c). The mass of the π-meson is approximately 140 MeV, so this works out to R ≈ 1.4 × 10⁻¹⁵ m. A potential model which is simpler than the Yukawa potential (10.36), but which is arguably adequate for our purposes, is given by

V(r) = \begin{cases} -V_0 & r < R \\ 0 & r > R \end{cases}    (10.37)

The Schrödinger equation takes the form

\left[-\frac{\hbar^2}{2\mu}\nabla^2 + V(r) - E\right]\psi = 0    (10.38)

where µ is the reduced mass, µ = m_p m_n/(m_p + m_n).

We are interested in bound states, so that E = −ε, ε > 0. Also, since the kinetic energy is positive, we have E > −V₀. Thus for the region r < R, we have 2µ(E − V) = 2µ(V₀ − ε) > 0. The Schrödinger equation then simplifies to

\frac{d^2}{dr^2}(r\psi) + \alpha^2\,(r\psi) = 0, \qquad \alpha^2 = \frac{2\mu(V_0-\varepsilon)}{\hbar^2}    (10.39)

We are only considering solutions with zero angular momentum, for simplicity. The solution to (10.39) in the region r < R is thus

\psi(r) = A\, \frac{\sin\alpha r}{r}, \qquad r < R    (10.40)

where A is a constant. There is also another solution to (10.39), of the form (cos αr)/r, but this is obviously singular at r = 0 and has to be rejected.

For the region r > R, V = 0, and the Schrödinger equation simplifies to

\frac{d^2}{dr^2}(r\psi) - \beta^2\,(r\psi) = 0, \qquad \beta^2 = \frac{2\mu\varepsilon}{\hbar^2}    (10.41)

Again, rejecting the solution which grows exponentially as r → ∞, we can take ψ as

\psi(r) = B\, \frac{e^{-\beta r}}{r}, \qquad r > R    (10.42)

We have to match the wave functions and their first derivatives at r = R. In terms of u = rψ, these matching conditions are

A \sin\alpha R = B\, e^{-\beta R}
A\alpha \cos\alpha R = -B\beta\, e^{-\beta R}    (10.43)

Dividing the first of these equations by the second, we find

\tan\alpha R = -\frac{\alpha}{\beta} = -\sqrt{\frac{1-(\varepsilon/V_0)}{(\varepsilon/V_0)}}    (10.44)


This is a transcendental equation for the bound state energy −ε as a function of V₀ and R. To get a negative value for tan αR, we need αR > π/2. We first look for a shallow bound state with ε ≪ V₀ (i.e., α ≫ β), with αR > π/2. The last inequality means that

V_0 R^2 \gtrsim \frac{\pi^2\hbar^2}{8\mu}    (10.45)

If we take the range R to be approximated by R ∼ ℏ/(m_π c), then this inequality becomes V₀ ≳ 51 MeV. From experimental observations, ε ≈ 2.226 MeV, so we see that it is consistent to consider the deuteron as a shallow bound state. For a second bound state to exist, we will need αR > 3π/2, or V₀ ≳ 113 MeV. More generally, αR must be in the second or fourth quadrant, i.e., between (2n − 1)π/2 and nπ, n ∈ ℤ, for tan αR to be negative and lead to a solution of (10.44).

[Figure 10.1: The graphs of f₁ and f₂ for √(2µV₀R²/ℏ²) = 1.8 and 10.]

It is sufficient for us to consider a single bound state for the case of the deuteron. The actual solution to (10.44) can be obtained graphically by plotting the two curves

f_1(x) = \tan\left(\sqrt{\frac{2\mu V_0 R^2}{\hbar^2}}\,\sqrt{1-x}\right), \qquad f_2(x) = -\sqrt{\frac{1-x}{x}}, \qquad x = \frac{\varepsilon}{V_0}    (10.46)

and looking for points of intersection. Some sample graphs are shown in Fig. 10.1. Notice that we have only one intersection, and hence one bound state, for √(2µV₀R²/ℏ²) = 1.8 < 3π/2, but 3 bound states for 5π/2 < √(2µV₀R²/ℏ²) < 7π/2, which is the case for the second graph. Generally, we get n bound states for √(2µV₀R²/ℏ²) taking values between (2n − 1)π/2 and (2n + 1)π/2.
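The counting of intersections can also be automated. The Python sketch below recasts the matching conditions (10.43) in the pole-free form √(c²−y²) sin y + y cos y = 0, with y = αR and c = √(2µV₀R²/ℏ²) (obtained by cross-multiplying the two conditions), and counts sign changes; the sample values c = 1.8 and 10 are those of Fig. 10.1.

```python
# Counting s-wave bound states of the square well numerically.
import numpy as np
from scipy.optimize import brentq

def count_bound_states(c):
    """Number of l = 0 bound states for c = sqrt(2*mu*V0*R^2/hbar^2).

    With y = alpha*R and beta*R = sqrt(c^2 - y^2), cross-multiplying
    the matching conditions (10.43) gives g(y) = 0 with no tan poles.
    """
    g = lambda y: np.sqrt(c**2 - y**2) * np.sin(y) + y * np.cos(y)
    ys = np.linspace(1e-9, c - 1e-9, 20000)
    vals = g(ys)
    roots = [brentq(g, ys[i], ys[i + 1])
             for i in range(len(ys) - 1) if vals[i] * vals[i + 1] < 0]
    return len(roots)

print(count_bound_states(1.8))    # 1, as in the first graph of Fig. 10.1
print(count_bound_states(10.0))   # 3, since 5*pi/2 < 10 < 7*pi/2
```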


11 Spin of the electron

11.1 Spin and matrix representation of spin

We have discussed the dynamics of a point-particle both in one and three dimensions.

From the point of view of quantum mechanics, a point-particle is not to be viewed as

some limit of a little rigid ball of matter as the radius goes to zero. This is because, due

to the uncertainty principle, specification of a radius is meaningless. Also, the picture

of a little ball introduces extraneous unobservable concepts, since any physical system

must be specified by the set of observables (i.e., by the corresponding operators) for it.

Thus, so far, when we discussed a point-particle, what we meant was a physical system

specified by the set of basic observables ~x, ~p. All other observables were functions of

these. Now we will augment our concept of a point-particle by adding another set of

observables associated to it. This will be an intrinsic angular momentum or spin for

the particle. The operator corresponding to this will be denoted by \vec{S}, a vector with three components. Being an angular momentum, it must obey the commutation rules

S_i S_j - S_j S_i = i\hbar \sum_k \epsilon_{ijk}\, S_k    (11.1)

There are a number of important remarks to make about spin.

1. It is important to contrast the spin with the orbital angular momentum. In

examples, such as the case of the electron in an atom, we have seen that the

orbital angular momentum L_i obeys the commutation rules

L_i L_j - L_j L_i = i\hbar \sum_k \epsilon_{ijk}\, L_k    (11.2)

These are similar to what we have in (11.1), but the difference is that L_i is made of the position and momentum operators,

L_i = \sum_{j,k} \epsilon_{ijk}\, x_j\, p_k    (11.3)

and the commutation rules (11.2) are a consequence of the fundamental commutation rules for x_i and p_j. However, the spin angular momentum S_i is an independent observable in its own right. It cannot be expressed in terms of x_i and p_j.

2. Because S_i is an independent observable, it commutes with x_i and p_j. Thus, we can define a point-particle as the quantum mechanics of the observables x_i, p_i, S_i with the commutation rules

x_i x_j - x_j x_i = 0
p_i p_j - p_j p_i = 0
x_i p_j - p_j x_i = i\hbar\, \delta_{ij}    (11.4)
S_i S_j - S_j S_i = i\hbar \sum_k \epsilon_{ijk}\, S_k

We also have the rules

S_i x_j - x_j S_i = 0
S_i p_j - p_j S_i = 0    (11.5)

3. In addition to the commutation rules, for a point-particle, the value of the spin angular momentum is restricted. What this means is that while the general angular momentum theory allows for the possibility of j = 0, 1/2, 1, 3/2, ··· (where j(j+1)ℏ² is the eigenvalue of J²), for the spin of a point-particle, j has a particular value. For the electron j = s = 1/2. To avoid confusion, we often use s in place of j when discussing spin. This is also the case, i.e., s = 1/2, for the proton, the neutron and for the quarks. For the photon, s = 1, although this statement requires further qualification because the photon is a massless particle. There are also particles, like the π-meson, which have no spin, i.e., j = s = 0. We will focus mostly on the electron, so s = 1/2.

4. Where does spin come from? A free point-particle classically is defined by the law of inertia, namely, that it propagates in a straight line with no acceleration. From a Lagrangian point of view, we have L = T = (1/2)mv², with no potential energy term, since there is no force. Further, T is defined by the metric via

T = \frac{m}{2}\left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2\right] = \frac{m}{2}\left(\frac{ds}{dt}\right)^2    (11.6)

where ds² = dx² + dy² + dz² is the metric. This metric has symmetry under translations (i.e., under \vec{x} → \vec{x} + \vec{a}, for constant \vec{a}) and under rotations. Thus a free particle is defined by operators which generate translations (which are the momenta \vec{p}) and those which define rotations (which are the angular momenta). Quantum mechanically, therefore, the operator versions of these quantities will define a point-particle. This allows for the freedom of spin. The problem becomes unavoidable in the relativistic theory: Lorentz transformations along different spatial directions do not commute but give a discrepancy which is a spatial rotation. Thus we must specify how wave functions or states respond to rotations, and in general, this does not have to be trivial even for a free particle.

We now turn to the specifics of a spin-1/2 particle. This means we take the states to be angular momentum states with j = s = 1/2. Since there are only two independent states |1/2, ±1/2⟩ possible, it is simpler to use a matrix version for the operators. We can label the states as

|1\rangle \equiv |\tfrac{1}{2}, \tfrac{1}{2}\rangle, \qquad |2\rangle \equiv |\tfrac{1}{2}, -\tfrac{1}{2}\rangle    (11.7)


The matrix versions of the angular momentum operators are then given by

S_i = \begin{pmatrix} \langle 1|J_i|1\rangle & \langle 1|J_i|2\rangle \\ \langle 2|J_i|1\rangle & \langle 2|J_i|2\rangle \end{pmatrix}    (11.8)

Using the formulae we have derived before, i.e., by using

J_3\,|s,m\rangle = m\hbar\,|s,m\rangle
J_\pm\,|s,m\rangle = \hbar\sqrt{s(s+1) - m(m\pm 1)}\;|s, m\pm 1\rangle    (11.9)

we can obtain the matrix elements in (11.8). This gives

S_i = \frac{\hbar}{2}\,\sigma_i    (11.10)

where the σ_i are three 2 × 2 matrices given explicitly by

\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}    (11.11)

These three matrices are known as the Pauli matrices. It is easy to check by direct matrix multiplication that

\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = 1, \qquad \sigma_1\sigma_2 = i\sigma_3, \quad \sigma_2\sigma_3 = i\sigma_1, \quad \sigma_3\sigma_1 = i\sigma_2    (11.12)

where 1 denotes the 2 × 2 identity matrix. These relations can be combined into the single equation

\sigma_i\,\sigma_j = \delta_{ij}\,1 + i \sum_k \epsilon_{ijk}\,\sigma_k    (11.13)

From this relation, we also get

\frac{\sigma_i}{2}\,\frac{\sigma_j}{2} - \frac{\sigma_j}{2}\,\frac{\sigma_i}{2} = i \sum_k \epsilon_{ijk}\,\frac{\sigma_k}{2}    (11.14)

This shows that S_i = (ℏ/2)σ_i obeys the angular momentum commutation rules (11.1). Further, using (11.13), we also get

\sum_i S_i S_i = \left(\frac{\hbar}{2}\right)^2 \sum_i \sigma_i\sigma_i = \hbar^2\,\frac{3}{4} = \hbar^2\,\frac{1}{2}\left(\frac{1}{2} + 1\right)    (11.15)


This shows that the matrix representation we use corresponds to j = s = 1/2. There is no surprise here, since we used the states with j = 1/2 to work out the angular momentum matrices, as in (11.8).
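These algebraic statements are easy to verify numerically; here is a short Python check of (11.13) and (11.15), with ℏ set to 1.

```python
# Direct check of the Pauli matrix algebra (11.13) and of S^2 in (11.15).
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[j, i, k] = 1.0, -1.0

# sigma_i sigma_j = delta_ij 1 + i sum_k eps_ijk sigma_k, eq. (11.13)
for i in range(3):
    for j in range(3):
        rhs = (i == j) * np.eye(2) + 1j * sum(eps[i, j, k] * sigma[k] for k in range(3))
        assert np.allclose(sigma[i] @ sigma[j], rhs)

# sum_i S_i S_i = 3/4 = (1/2)(1/2 + 1) in units hbar = 1, eq. (11.15)
S2 = sum((0.5 * s) @ (0.5 * s) for s in sigma)
print(np.allclose(S2, 0.75 * np.eye(2)))    # True
```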

It is useful to consider eigenstates or eigenvectors of the spin operators/matrices. For example, σ₃ is diagonal as written, so we find, trivially, that

\frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \frac{\hbar}{2}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = -\frac{\hbar}{2}\begin{pmatrix} 0 \\ 1 \end{pmatrix}    (11.16)

Thus we can make an identification of the column vectors and the states as follows:

|\tfrac{1}{2}, \tfrac{1}{2}\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |\tfrac{1}{2}, -\tfrac{1}{2}\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}    (11.17)

These states are normalized and orthogonal to each other. If a measurement of the spin along the z-axis is made, the system will collapse into one of these eigenstates.

It is interesting to consider eigenstates for the other components of spin as well. For example, for the S₁ operator, the eigenvalue equations are easily solved to obtain

\frac{\hbar}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \frac{\hbar}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = -\frac{\hbar}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}    (11.18)

Thus the correspondence is

|\tfrac{1}{2}, S_1 = \hbar/2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad |\tfrac{1}{2}, S_1 = -\hbar/2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}    (11.19)

We have normalized the eigenvectors. Notice that these are linear combinations of the eigenstates for S₃ (as they must be, because the eigenstates of S₃ must form a complete set):

\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \end{pmatrix} - \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \end{pmatrix}    (11.20)

More generally, any state for spin can be represented as

e^{i\varphi}\cos\theta\begin{pmatrix} 1 \\ 0 \end{pmatrix} + e^{i\chi}\sin\theta\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} e^{i\varphi}\cos\theta \\ e^{i\chi}\sin\theta \end{pmatrix}    (11.21)

The coefficients are so chosen that the state is normalized.

We will see shortly that we can use the magnetic moment of the electron to create states with spin along any chosen axis, using the Stern-Gerlach apparatus. This leads to interesting observations. For example, suppose we create a state with spin oriented along the positive x-axis. This means that we are starting with the state |1/2, S₁ = ℏ/2⟩. Suppose we now carry out a measurement of S₃. The system will return a value of either S₃ = ℏ/2 or S₃ = −ℏ/2. The probability of getting S₃ = ℏ/2 is given by the square of ⟨1/2, S₃ = ℏ/2|1/2, S₁ = ℏ/2⟩. This can be evaluated as

\langle\tfrac{1}{2}, S_3 = \hbar/2|\tfrac{1}{2}, S_1 = \hbar/2\rangle = (1,\, 0)\;\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}}    (11.22)

Squaring this, we find the probability of 1/2. Similarly, the probability of finding S₃ = −ℏ/2 is given by the square of

\langle\tfrac{1}{2}, S_3 = -\hbar/2|\tfrac{1}{2}, S_1 = \hbar/2\rangle = (0,\, 1)\;\frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}}    (11.23)

leading to 1/2 again. If we start with a general state as in (11.21), the probability of obtaining S₃ = ℏ/2 upon carrying out a measurement of S₃ is given by the absolute square of

(1,\, 0)\begin{pmatrix} e^{i\varphi}\cos\theta \\ e^{i\chi}\sin\theta \end{pmatrix} = e^{i\varphi}\cos\theta    (11.24)

giving the probability cos²θ. Various other cases can be worked out similarly.
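These probabilities are easily reproduced numerically. The following Python sketch diagonalizes S₁ and evaluates the overlaps (11.22)–(11.24), with ℏ = 1 and arbitrary sample values for the angles φ, χ, θ.

```python
# Spin-1/2 measurement probabilities, in units hbar = 1.
import numpy as np

S1 = np.array([[0, 1], [1, 0]], dtype=complex) / 2
vals, vecs = np.linalg.eigh(S1)
up_x = vecs[:, np.argmax(vals)]            # |1/2, S_1 = +1/2>, eq. (11.19)

up_z = np.array([1, 0], dtype=complex)     # |1/2, +1/2> in the S_3 basis
down_z = np.array([0, 1], dtype=complex)

# Probabilities of S_3 = +1/2 and -1/2 for spin prepared along +x:
print(abs(np.vdot(up_z, up_x))**2, abs(np.vdot(down_z, up_x))**2)   # 0.5 0.5

# For the general state (11.21), with sample angles:
phi, chi, theta = 0.3, 1.1, 0.7
psi = np.array([np.exp(1j*phi) * np.cos(theta), np.exp(1j*chi) * np.sin(theta)])
print(abs(np.vdot(up_z, psi))**2, np.cos(theta)**2)                 # equal
```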

11.2 Magnetic moment of the electron

We now turn to the question: How do we know the electron or any other particle has

spin? This is from a combination of magnetic moment properties and conservation

laws.

The electron has electrical charge and hence the existence of intrinsic angular

momentum leads to a magnetic moment. Thus the electron responds to magnetic

fields. There are two effects in spectroscopy which follow from this. One is that we can

put atoms in a magnetic field and the energy levels split, giving a shift of spectral lines

called the Zeeman effect. The contribution of electron spin gives what is called the

anomalous Zeeman effect and this was the first piece of evidence for spin. Secondly,

for the electron in an atom, we can think of using its rest frame for the physics. (It is

complicated to do this in practice, but just to clarify the spin-orbit effect, it is useful to

do so.) In this frame, the proton is moving, which creates a current since the proton is

charged and hence a magnetic field. The magnetic moment of the electron responds

to this field and this effect gives what is known as the spin-orbit interaction. This

exists even in the absence of any externally applied magnetic field and leads to a

certain splitting of the energy levels. For example, the states |2, 0, 0〉 and |2, 1, 0〉 for


the Hydrogen atom (in the |n, l,m〉 notation) are no longer degenerate and this can be

measured via spectroscopy.

We will also discuss the Stern-Gerlach experiment briefly in what follows. This will allow the separation of the different spin eigenstates of the electron and will also give independent confirmation of the electron being a spin-1/2 particle.

Finally, in various processes, only the total angular momentum (orbital + spin,

i.e., Ji = Li + Si) is conserved. This allows for the interconversion of spin and orbital

angular momentum. One can use this to identify the spins of particles which are not

electrically charged. For example, selection rules for spectroscopic transitions tell us

that the photon should have spin equal to 1 (i.e., s = 1).

The magnetic moment of the electron can be motivated by considering the motion of a charged particle in a magnetic field. The Lagrangian for this is given by

L = \frac{1}{2}\,m \sum_j \dot{x}_j \dot{x}_j + \frac{e}{c} \sum_j A_j\, \dot{x}_j - e\phi    (11.25)

where A_i is the vector potential and φ is the electrostatic potential. This form of the Lagrangian is justified by showing that it leads to the correct equations of motion. For this, we need

\frac{\partial L}{\partial \dot{x}_i} = m\dot{x}_i + \frac{e}{c}\,A_i, \qquad \frac{\partial L}{\partial x_i} = \frac{e}{c} \sum_j \frac{\partial A_j}{\partial x_i}\,\dot{x}_j - e\,\frac{\partial \phi}{\partial x_i}    (11.26)

Further, since A_j is evaluated at the position of the particle,

\frac{dA_i}{dt} = \frac{\partial A_i}{\partial t} + \sum_j \frac{\partial A_i}{\partial x_j}\,\dot{x}_j    (11.27)

The first term on the right hand side is due to any explicit dependence A_i may have on time; the second is due to the time-dependence via its dependence on the position of the particle. The equation of motion corresponding to the Lagrangian L, i.e., the Euler-Lagrange equation, is given by

\frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} = \frac{\partial L}{\partial x_i}    (11.28)

Using (11.26) and (11.27), this becomes

m\ddot{x}_i = e\left(-\frac{1}{c}\frac{\partial A_i}{\partial t} - \frac{\partial \phi}{\partial x_i}\right) + \frac{e}{c} \sum_j \left(\frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}\right)\dot{x}_j = e\,E_i + \frac{e}{c}\,(\vec{v}\times\vec{B})_i    (11.29)

where we use the usual identification

E_i = -\frac{1}{c}\frac{\partial A_i}{\partial t} - \frac{\partial \phi}{\partial x_i}, \qquad \sum_k \epsilon_{ijk}\,B_k = \frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}    (11.30)


Equation (11.29) is the correct equation of motion with the correct Lorentz force, so this justifies L in (11.25) as the correct Lagrangian. The Hamiltonian is obtained as

H = \sum_i p_i \dot{x}_i - L = \frac{(\vec{p} - e\vec{A}/c)^2}{2m} + e\phi    (11.31)

The momentum p_i is defined by ∂L/∂ẋ_i, which is given in (11.26). Notice that the effect of the magnetic field is to make the modification \vec{p} → \vec{p} − e\vec{A}/c in the free particle Hamiltonian H = p²/2m. For a uniform magnetic field, the vector potential can be taken as

A_i = -\frac{1}{2} \sum_{j,k} \epsilon_{ijk}\, x_j B_k    (11.32)

For this kind of field, we can write the Hamiltonian (11.31) as

H = \frac{1}{2m}\left[p^2 - \frac{e}{c}\sum_i (A_i p_i + p_i A_i) + \frac{e^2}{c^2}\sum_i A_i A_i\right] + e\phi
  = \frac{1}{2m}\left[p^2 + \frac{e}{2c}\sum_{i,j,k}\left(\epsilon_{ijk}\, x_j p_i B_k + \epsilon_{ijk}\, p_i x_j B_k\right) + \frac{e^2}{c^2}\sum_i A_i A_i\right] + e\phi
  = \frac{p^2}{2m} - \frac{e}{2mc}\sum_k L_k B_k + \frac{e^2}{2mc^2}\sum_i A_i A_i + e\phi    (11.33)

where we used the definition of orbital angular momentum

L_k = \sum_{i,j} \epsilon_{ijk}\, x_i p_j    (11.34)

The Hamiltonian (11.33) shows that there is a direct coupling of the particle to the magnetic field, given by

H_{int} = -\vec{\mu}\cdot\vec{B}, \qquad \vec{\mu} = \frac{e}{2mc}\,\vec{L}    (11.35)

\vec{\mu} is the magnetic moment, which is proportional to the charge and the angular momentum.

This result (11.35) is for the orbital angular momentum. We expect a similar result for spin angular momentum as well. In this case, the interaction is given by

H_{int} = -\vec{\mu}\cdot\vec{B}, \qquad \vec{\mu} = g\,\frac{e}{2mc}\,\vec{S}    (11.36)

The factor g is known as the gyromagnetic ratio. For the orbital case, we may use a formula of the same type, but g = 1 for that case, as is clear from (11.35). For the spin part, for the electron, g is very close to 2. In the relativistic Dirac theory of the electron, if certain corrections due to quantum field theoretic effects are neglected, g is exactly 2. Experimentally,

\frac{g-2}{2} = 1159652180.73\,(0.28) \times 10^{-12}    (11.37)

The correction or addition to the Dirac magnetic moment due to g − 2 is known as

the anomalous magnetic moment. Since this correction is small, of the order of three

parts in one thousand, for most purposes we can use g = 2. This is what we shall do

for the rest of this section. (The value of g − 2 can be calculated in quantum field

theory. The theoretical prediction agrees remarkably well with the experimental value.

Arguably, it is the most accurate prediction and verification in the history of physics.)

We can easily diagonalize the magnetic moment interaction H_int. For a uniform magnetic field \vec{B}, we can choose the orientation of the coordinates such that \vec{B} is along the 3rd axis. Then H_int = −(e/2mc) g S₃ B. The energy eigenvalues of H_int (which to a first approximation are corrections to the existing eigenvalues of the (p²/2m) + eφ part of the Hamiltonian) are thus

E_{m_s} = \left(\frac{e\hbar}{2mc}\right) g\, B\, m_s, \qquad m_s = -s, -s+1, \cdots, s    (11.38)

For the electron, which has s = 1/2, we have the two energy eigenvalues

E_{\pm\frac{1}{2}} = \pm\,\mu_B\, \frac{g}{2}\, B, \qquad \mu_B = \left(\frac{e\hbar}{2mc}\right)    (11.39)

µ_B, which is the basic unit of magnetic moment for atomic systems, is often called the Bohr magneton.

The magnetic moment interaction provides a way of separating different orientations of spin. This is because the formulae (11.32) and (11.36), although derived assuming a uniform magnetic field, are true for slowly varying fields as well. We may therefore think of H_int as a potential leading to the force

F_i = -\frac{\partial H_{int}}{\partial x_i} = \vec{\mu}\cdot\frac{\partial \vec{B}}{\partial x_i}    (11.40)

This force has opposite signs for m_s = ±1/2, so that electrons with m_s = ±1/2 will get deflected in opposite directions, for a given gradient of the magnetic field. Thus, by passing a beam of electrons through a region with an inhomogeneous magnetic field, we can separate the two spin orientations. This is the essence of the Stern-Gerlach experiment. Thus the Stern-Gerlach apparatus can be used to prepare electrons in a pure |1/2, 1/2⟩ or |1/2, −1/2⟩ state. The fact that this experiment gives a splitting of the initial electron beam into only two beams after passing through the inhomogeneous magnetic field shows that m_s can have only two values for the electron. This is another way of seeing that the electron is a spin-1/2 particle.


11.3 The Pauli equation

The Dirac equation, which provides a relativistic description of the electron, leads to a gyromagnetic ratio of 2. One of the questions we might ask is whether there is a modification of the Schrödinger equation which naturally leads to g = 2 for the electron. There is such an equation; it is called the Pauli equation. Not surprisingly, this equation is best obtained by taking a nonrelativistic approximation to the Dirac equation. Explicitly, the Pauli equation is given by the Hamiltonian

H = \frac{\left(\vec{\sigma}\cdot(\vec{p} - e\vec{A}/c)\right)^2}{2m} + V    (11.41)

Here the σ_i are the Pauli matrices introduced in (11.11). We can simplify this Hamiltonian

as follows, using the commutation rules for the Pauli matrices:

\left(\vec{\sigma}\cdot(\vec{p} - e\vec{A}/c)\right)^2 = \sum_{ij} \sigma_i \Pi_i \sigma_j \Pi_j = \sum_{ij}\left[\frac{1}{2}(\sigma_i\sigma_j + \sigma_j\sigma_i) + \frac{1}{2}(\sigma_i\sigma_j - \sigma_j\sigma_i)\right]\Pi_i\Pi_j
= \sum_{ij}\left[\delta_{ij} + i\sum_k \epsilon_{ijk}\sigma_k\right]\Pi_i\Pi_j
= \Pi^2 + \frac{i}{2}\sum_{ijk}\epsilon_{ijk}\,\sigma_k\left[\Pi_i\Pi_j - \Pi_j\Pi_i\right]
= \Pi^2 - \frac{e}{c}\,\hbar\sum_k \sigma_k B_k = \Pi^2 - \frac{2e}{c}\sum_k S_k B_k    (11.42)

Here we used the symbol Π_i for p_i − eA_i/c for brevity. Using (11.42) and the expression (11.33) for Π², we can write

H = \frac{p^2}{2m} - \frac{e}{2mc}\sum_k L_k B_k + \frac{e^2}{2mc^2}\sum_i A_i A_i - \frac{2e}{2mc}\sum_k S_k B_k + V
  = \frac{p^2}{2m} - \frac{e}{2mc}\sum_k (L_k + 2S_k)\, B_k + \frac{e^2}{2mc^2}\sum_i A_i A_i + V    (11.43)

We see that the Pauli equation does naturally lead to g = 2 for the spin part of the

magnetic moment coupling.
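The algebraic step in (11.42) rests on the identity (σ⃗·a⃗)(σ⃗·b⃗) = (a⃗·b⃗) 1 + iσ⃗·(a⃗×b⃗). For commuting (c-number) vectors the commutator term is absent, and the identity can be checked numerically as in the sketch below; for the operators Π_i it is precisely the extra commutator term that produces the −(2e/c) S⃗·B⃗ coupling.

```python
# Check of (sigma.a)(sigma.b) = (a.b) 1 + i sigma.(a x b) for c-number
# vectors; the operator case (11.42) adds the commutator term.
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def sdot(v):
    """sigma . v for a numeric 3-vector v."""
    return sum(v[i] * sigma[i] for i in range(3))

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
lhs = sdot(a) @ sdot(b)
rhs = np.dot(a, b) * np.eye(2) + 1j * sdot(np.cross(a, b))
print(np.allclose(lhs, rhs))    # True
```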


12 Many body quantum mechanics

12.1 Many-body wave functions, spin-statistics theorem

We have so far considered the quantum dynamics of a single particle. We will now

turn to some aspects of many body quantum mechanics, which is obviously needed

for almost all physical situations, since these would involve more than one particle in

general. In fact, situations where the dynamics can be approximated well by a single particle are rather rare; many-body physics is the more ubiquitous case.

The Heisenberg algebra for N particles was already given in (3.2) as

x^{(\alpha)}_i x^{(\beta)}_j - x^{(\beta)}_j x^{(\alpha)}_i = 0
p^{(\alpha)}_i p^{(\beta)}_j - p^{(\beta)}_j p^{(\alpha)}_i = 0
x^{(\alpha)}_i p^{(\beta)}_j - p^{(\beta)}_j x^{(\alpha)}_i = i\hbar\, \delta_{ij}\,\delta_{\alpha\beta}    (12.1)

where α, β = 1, 2, ···, N label the particles. We have N mutually commuting copies of the Heisenberg algebra. The wave function for a state |A⟩ of the N-body system will be of the form

\Psi_A(x^{(1)}, x^{(2)}, \cdots, x^{(N)}) = \langle x^{(1)}, x^{(2)}, \cdots, x^{(N)}|A\rangle    (12.2)

Consider writing the Schrödinger equation for such a wave function and solving it. If there is no interaction potential between the particles, we expect that we can do a separation of variables and write the wave function in the form

\Psi_A(x^{(1)}, x^{(2)}, \cdots, x^{(N)}) = \psi_{\alpha_1}(x^{(1)})\, \psi_{\alpha_2}(x^{(2)}) \cdots \psi_{\alpha_N}(x^{(N)})    (12.3)

Here we interpret the subscript A as a composite index α₁α₂···α_N, with α_i labeling the state of the i-th particle. Notice that this product is natural, since the wave functions are probability amplitudes, and probabilities of independent systems are the products of the individual probabilities. In this case, |Ψ|² becomes a product of the individual |ψ_{α_i}|². We can also think of the state |A⟩ as the product |α₁⟩|α₂⟩···|α_N⟩, which is an element of the product Hilbert space H^{(1)} ⊗ H^{(2)} ⊗ ··· ⊗ H^{(N)}. With interactions, such a simple product decomposition is not possible.

The discussion given above assumed that we can distinguish the individual par-

ticles. Thus, for example, we can consider a two-body system made of the proton

and the electron as in the Hydrogen atom. These particles have different charges and

masses and can be distinguished. So the wave function for a mutually noninteracting

electron-proton system would have a product form for the wave function. But when

we have indistinguishable particles, there are more possibilities. Permutations of particle quantum numbers are a symmetry; physics should be unchanged under them. That is the essence of the statement that we have indistinguishable particles. Thus


the two-electron wave functions Ψ_{α₁α₂}(x^{(1)}, x^{(2)}) and Ψ_{α₂α₁}(x^{(1)}, x^{(2)}) (where we permute the individual state labels) should describe the same physics, or the same state physically. This also means that we can consider the symmetric and antisymmetric combinations,

\Psi_\pm(x^{(1)}, x^{(2)}) = \frac{1}{\sqrt{2}}\left(\Psi_{\alpha_1\alpha_2}(x^{(1)}, x^{(2)}) \pm \Psi_{\alpha_2\alpha_1}(x^{(1)}, x^{(2)})\right)    (12.4)

(The factor 1/√

2 is for normalization purposes.) For a Hamiltonian with permutation

symmetry for the particle labels, both these will also be eigenstates if the individ-

ual terms Ψα1α2(x(1), x(2)), Ψα2α1(x(1), x(2)) are eigenstates. The question thus arises:

Which wave function should we use for a physical problem, either of the individual

ones, or one of Ψ±(x(1), x(2)), or all of them? When we consider more than two identi-

cal particles, there are again many combinations possible with different properties

under permutations. So the question is rather more acute. The answer to this is

provided by the spin-statistics theorem, which is one of the deep theorems in quantum

field theory. While the precise formulation of the theorem requires rather more careful

wording, the essence of the theorem is captured by the following statement.

Theorem 12.1 (Spin-statistics theorem) For a system of identical particles with each particle having integer spin (in units of ℏ), the many-body wave function must be totally symmetric under permutations of the individual state labels or individual coordinates. For a system of identical particles with each particle having half-an-odd integer spin (in units of ℏ), the many-body wave function must be totally antisymmetric under permutations of the individual state labels or individual coordinates.

The more precise statement of the theorem tells us what can go wrong if we use the “wrong” assignment of symmetric states for half-odd-integer spin (it leads to negative energies) or antisymmetric states for integer spin (it leads to negative inner products, hence no probability interpretation).

The spin-statistics theorem tells us that among the more familiar particles, many-

body wave functions of photons or phonons (quantized units of lattice vibrations in

solids) must be totally symmetric under permutations of individual particle labels as

these have integer spin. (Photons and phonons are spin-1 particles.) Each many-body

state of electrons or protons or neutrons must be totally antisymmetric since these are

spin-1/2 particles. Thus the wave function of a two-electron system has the property

\Psi_A(x^{(1)}, x^{(2)}) = -\Psi_A(x^{(2)}, x^{(1)})    (12.5)

for a two-body state |A⟩. If Ψ admits a decomposition in terms of one-particle wave functions, we can write

\Psi_{\alpha_1\alpha_2}(x^{(1)}, x^{(2)}) = \frac{1}{\sqrt{2}}\left(\psi_{\alpha_1}(x^{(1)})\,\psi_{\alpha_2}(x^{(2)}) - \psi_{\alpha_1}(x^{(2)})\,\psi_{\alpha_2}(x^{(1)})\right)
= \frac{1}{\sqrt{2}}\begin{vmatrix} \psi_{\alpha_1}(x^{(1)}) & \psi_{\alpha_1}(x^{(2)}) \\ \psi_{\alpha_2}(x^{(1)}) & \psi_{\alpha_2}(x^{(2)}) \end{vmatrix}    (12.6)

Notice that we can represent the wave function in the form of a determinant. It is now straightforward to see that a generalization of the wave function for N particles is given by

\Psi_{\alpha_1\alpha_2\cdots\alpha_N}(x^{(1)}, x^{(2)}, \cdots, x^{(N)}) = \frac{1}{\sqrt{N!}}
\begin{vmatrix}
\psi_{\alpha_1}(x^{(1)}) & \psi_{\alpha_1}(x^{(2)}) & \cdots & \psi_{\alpha_1}(x^{(N)}) \\
\psi_{\alpha_2}(x^{(1)}) & \psi_{\alpha_2}(x^{(2)}) & \cdots & \psi_{\alpha_2}(x^{(N)}) \\
\cdots & \cdots & \cdots & \cdots \\
\psi_{\alpha_N}(x^{(1)}) & \psi_{\alpha_N}(x^{(2)}) & \cdots & \psi_{\alpha_N}(x^{(N)})
\end{vmatrix}    (12.7)

Since the determinant is antisymmetric under permutation of any two rows or any

two columns, this wave function has the required antisymmetry property. This form

of the wave function is known as a Slater determinant; it is very useful in discussing

many-electron systems such as in atoms of higher atomic number and in solid state

contexts.
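The determinant structure of (12.7) is straightforward to implement. The sketch below uses hypothetical toy orbitals (any set of single-particle functions would do) to illustrate the antisymmetry and the vanishing for repeated labels.

```python
# A small sketch of the Slater determinant (12.7), with toy 1d orbitals
# chosen purely for illustration.
import numpy as np
from math import factorial

def orbital(alpha, x):
    """Hypothetical single-particle orbitals, labeled by alpha = 0, 1, 2, ..."""
    return x**alpha * np.exp(-x**2 / 2)

def slater(alphas, xs):
    """Psi_{alpha_1...alpha_N}(x_1,...,x_N) as in (12.7)."""
    N = len(alphas)
    M = np.array([[orbital(a, x) for x in xs] for a in alphas])
    return np.linalg.det(M) / np.sqrt(factorial(N))

xs = [0.3, 1.1, -0.4]
print(slater([0, 1, 2], xs))
# Exchanging two particle coordinates flips the sign (antisymmetry):
print(slater([0, 1, 2], [1.1, 0.3, -0.4]))
# A repeated label makes the wave function vanish (Pauli exclusion):
print(slater([0, 1, 1], xs))    # zero up to rounding
```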

There are many important consequences of the spin-statistics theorem. The

antisymmetry of the wave function (12.6) or (12.7) implies that the wave function will

vanish if two of the state labels are the same. Thus two electrons (or, more generally, two identical particles of half-an-odd integer spin) cannot occupy the same state; the probability for that is zero. This result is known as the Pauli exclusion principle.

(We have already used this principle in discussing the electronic states of atoms and

the periodic table.) When we consider the statistical mechanics of a number of such

particles, the fact that one cannot have double occupancy of any state means that

we have Fermi-Dirac statistics. Thus particles of half-an-odd integer spin will follow

the Fermi-Dirac distribution. These particles are also referred to as fermions for this

reason.

For particles of integer spin, the wave function is totally symmetric. Thus there is

no problem with double occupancy or higher occupancy for any state. In fact, if all the

labels are identical, we get an enhancement effect, which is seen, for example, in the

stimulated emission of photons. The corresponding statistics, allowing all occupancies

with identity of particles, is given by the Bose-Einstein distribution. Thus photons,

phonons and other particles of integer spin obey the Bose-Einstein distribution, and

are often referred to as bosons.

The reason for the theorem to be called the spin-statistics theorem should be clear

from these statements.


12.2 Two-electron wave functions

It is useful to consider the two-electron wave function in some more detail. Electrons

are spin- 12 particles and in the two-electron wave function we can add the spin angular

momenta and consider states of total spin, which can be zero or 1. From (8.58) and

(8.59), the two sets of spin wave functions are

|1, 1\rangle = |\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle
|1, 0\rangle = \frac{1}{\sqrt{2}}\left[|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle + |\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle\right]    (12.8)
|1, -1\rangle = |\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle

|0, 0\rangle = \frac{1}{\sqrt{2}}\left[|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle - |\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle\right]    (12.9)

The three states in (12.8) correspond to the total spin being 1, and (12.9) gives the state for total spin equal to zero. The spin-1 states are obviously symmetric under the exchange of particle labels, while the spin-zero state is antisymmetric. Since the wave function for electrons should be antisymmetric under exchange, this means that the orbital part should have the opposite symmetry property compared to these. Thus, if the orbital states are denoted by labels α₁, α₂, the set of wave functions for two electrons (considered to be mutually noninteracting at this stage) is given by

\Psi^{(1)}_{\alpha_1,\alpha_2}(x^{(1)}, x^{(2)}) = \frac{1}{\sqrt{2}}\left[\psi_{\alpha_1}(x^{(1)})\psi_{\alpha_2}(x^{(2)}) - \psi_{\alpha_1}(x^{(2)})\psi_{\alpha_2}(x^{(1)})\right]
\times \begin{cases}
|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle \\
\frac{1}{\sqrt{2}}\left[|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle + |\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle\right] \\
|\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle
\end{cases}

\Psi^{(0)}_{\alpha_1,\alpha_2}(x^{(1)}, x^{(2)}) = \frac{1}{\sqrt{2}}\left[\psi_{\alpha_1}(x^{(1)})\psi_{\alpha_2}(x^{(2)}) + \psi_{\alpha_1}(x^{(2)})\psi_{\alpha_2}(x^{(1)})\right]
\times \frac{1}{\sqrt{2}}\left[|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle - |\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle\right]    (12.10)

We have labeled the states by the superscript denoting the total spin. Notice that if

we consider two electrons in the same orbital state, i.e., α1 = α2, as in the case of the

two electrons in the ground state of the Helium atom, then only the wave function

of total spin equal to zero is nonvanishing. More generally, we see that the orbital

part has to be adjusted, according to what the spin part of the wave function is, to

maintain antisymmetry. This fact has important consequences. For example, even

purely orbital interactions such as the electrostatic repulsion between electrons with

no a priori spin-dependence can give energy corrections which depend on the total


spin state. An example of this is in ferromagnetism or antiferromagnetism, where the

alignment or anti-alignment of nearby spins on a lattice is driven by the electrostatic

Coulomb energy from the corresponding orbital part of the wave functions.


13 Rayleigh-Schrödinger perturbation theory

Among problems which are physically interesting, there are only a few which are

exactly solvable, in the sense that we are able to diagonalize the Hamiltonian exactly

and identify the eigenvalues and eigenstates. This means that we have to consider

most physical situations as a perturbation of suitable exactly solvable problems. There

are many cases where one cannot identify an exactly solvable problem which ap-

proximates the given physical problem, making it difficult to understand the latter

as a perturbation of a solvable problem. The understanding of the physics of such

situations from fundamental principles becomes quite challenging. But, fortunately,

there are many cases where perturbation theory is applicable. We will now develop

time-independent perturbation theory, where we consider how the energy eigenvalues

are corrected by small additional terms in the Hamiltonian and how the corresponding

eigenstates can be calculated.

13.1 Perturbation theory for nondegenerate states

We consider a Hamiltonian of the form

H = H_0 + V    (13.1)

where H₀ is the unperturbed Hamiltonian and V denotes the perturbation. We assume that H₀ has been diagonalized and that we know the eigenstates, so that we can write

H_0\, |a\rangle = E^{(0)}_a\, |a\rangle    (13.2)

Here a denotes a collective index for all the quantum numbers needed to specify the eigenstate |a⟩. (Thus a may stand for n, l, m for the ideal Hydrogen atom.) E^{(0)}_a are the unperturbed eigenvalues. The fundamental premise of perturbation theory is that the spectrum is not radically altered: for every state |a⟩ we expect a state |ψ_a⟩, which can be related to the unperturbed states and which has a calculable energy eigenvalue as a series in the strength of the perturbation V. Thus it is assumed that no new bound states are formed due to the addition of V, nor are states lost due to the effect of this term. For the states |ψ_a⟩, we can write

(H_0 + V)\,|\psi_a\rangle = E_a\, |\psi_a\rangle    (13.3)

Using the conjugate of (13.2) and taking the inner product of (13.3) with |b⟩, we find

\left(E_a - E^{(0)}_b\right)\langle b|\psi_a\rangle = \langle b|V|\psi_a\rangle    (13.4)

We will take the set |a⟩ to be an orthonormal set of states, i.e., ⟨b|a⟩ = δ_{ab}, but the normalization of |ψ_a⟩ has not yet been specified. So we will choose, for convenience, the condition

\langle a|\psi_a\rangle = 1    (13.5)

We will discuss a little later how to normalize the state |ψ_a⟩. Using (13.5), we see that |ψ_a⟩ has the form

|\psi_a\rangle = |a\rangle + P\,|\phi\rangle, \qquad P = \sum_{b\neq a} |b\rangle\langle b|    (13.6)

Notice that P is a projection operator to the part of the Hilbert space orthogonal to |a⟩. By setting b = a in (13.4) and using (13.5), we get

E_a = E^{(0)}_a + \langle a|V|\psi_a\rangle    (13.7)

This will give the perturbed energy eigenvalues once we know |ψ_a⟩. For b ≠ a, we can rewrite (13.4) as

\langle b|\psi_a\rangle = \frac{\langle b|V|\psi_a\rangle}{(E_a - E^{(0)}_b)}    (13.8)

We can now use this expression for ⟨b|ψ_a⟩ = ⟨b|φ⟩ in (13.6) to write |ψ_a⟩ as

|\psi_a\rangle = |a\rangle - P\,\frac{1}{H_0 - E_a}\, V\,|\psi_a\rangle    (13.9)

The two equations (13.7) and (13.9) can be iteratively solved; this will yield the perturbation expansion we are seeking. Taking |ψ_a⟩ = |a⟩ and E_a = E^{(0)}_a as the lowest approximation on the right hand side of (13.9), we find the first order correction to the states as

|\psi_a\rangle = |a\rangle - \sum_{b\neq a} |b\rangle\, \frac{\langle b|V|a\rangle}{(E^{(0)}_b - E^{(0)}_a)} + \cdots    (13.10)

We can now use this in (13.7) to get

E_a = E^{(0)}_a + \langle a|V|a\rangle - \sum_{b\neq a} \frac{\langle a|V|b\rangle\,\langle b|V|a\rangle}{(E^{(0)}_b - E^{(0)}_a)} + \cdots    (13.11)

We have obtained the correction to the eigenstates to first order, and to the energy levels to second order, in the perturbation V. These formulae show that there could be a problem. If we have degenerate unperturbed energies, we could have a situation where E^{(0)}_a − E^{(0)}_b = 0 even though |a⟩ ≠ |b⟩. The denominators in (13.10) and (13.11) can then vanish and our expansion does not make sense, unless ⟨b|V|a⟩ = 0. So, generally, since the matrix element does not necessarily vanish, this approach is valid only for cases of nondegenerate states. The perturbation theory for degenerate states is more involved and will be taken up later.

An interesting point about the expression for the correction to the energies is the following: the second order correction to the energy is always negative for the ground state, since E^{(0)}_b − E^{(0)}_a > 0 when |a⟩ is the ground state.

It is also interesting to write down the general expression, going beyond the second order, using the iterative solution to (13.9). We find

|\psi_a\rangle = |a\rangle - P G_a V\,|a\rangle + P G_a V\, P G_a V\,|a\rangle + \cdots = \frac{1}{1 + P G_a V}\;|a\rangle    (13.12)

where

G_a = \frac{1}{H_0 - E_a}    (13.13)

Notice that ⟨b|P G_a V|a⟩ = ⟨b|G_a P V|a⟩. We then use the identities

(H_0 - E_a + PV) = (H_0 - E_a)(1 + G_a PV)
(H_0 - E_a + PV)^{-1} = (1 + G_a PV)^{-1}\,(H_0 - E_a)^{-1}
(H_0 - E_a + PV)^{-1}(H_0 - E_a) = (1 + G_a PV)^{-1}    (13.14)

Thus we may write (13.12) as

|\psi_a\rangle = \frac{1}{H_0 - E_a + PV}\,(H_0 - E_a)\,|a\rangle = \left(1 - \frac{1}{H_0 - E_a + PV}\, PV\right)|a\rangle    (13.15)

For the last expression, we have used

(A+B)^{-1} = A^{-1} - (A+B)^{-1} B A^{-1}    (13.16)

which is an identity valid for any two operators A, B, easily verified by multiplication by (A+B) from the left. Writing the correction to the energy as ∆_a = E_a − E^{(0)}_a, we

get

|\psi_a\rangle = \left(1 - \frac{1}{H_0 - E^{(0)}_a - \Delta_a + PV}\, PV\right)|a\rangle    (13.17)

\Delta_a = \langle a|V|\psi_a\rangle = \langle a|V|a\rangle - \langle a|V\, \frac{1}{H_0 - E^{(0)}_a - \Delta_a + PV}\, PV\,|a\rangle    (13.18)

Equations (13.17) and (13.18) constitute a closed set of equations for the corrections

to the eigenstates and to the energy. The last equation is a closed form nonlinear

equation for the correction ∆a to the energy.

We now consider some applications of perturbation theory.
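Before turning to the applications, it may be useful to see the formulae (13.10) and (13.11) at work in a toy setting, where the exact eigenvalues are available for comparison. The sketch below uses an arbitrary 3 × 3 matrix Hamiltonian.

```python
# Toy check of the second-order formula (13.11) against exact
# diagonalization; H0 is diagonal and nondegenerate, V is small.
import numpy as np

E0 = np.array([0.0, 1.0, 2.5])                 # unperturbed eigenvalues
H0 = np.diag(E0)
rng = np.random.default_rng(1)
V = rng.normal(size=(3, 3)) * 0.01
V = (V + V.T) / 2                              # hermitian perturbation

a = 0                                          # the ground state
E_pert = E0[a] + V[a, a] - sum(V[a, b]**2 / (E0[b] - E0[a])
                               for b in range(3) if b != a)
E_exact = np.linalg.eigvalsh(H0 + V)[0]
print(E_pert, E_exact, abs(E_pert - E_exact))  # agreement to O(V^3)
```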


13.2 Helium atom: Corrections to ground state energy

The neutral Helium atom has a nucleus of charge 2 with two electrons bound to it.

The calculation of the energy of the ground state (or any state) for two noninteracting

electrons is straightforward. We basically add the individual single electron bound

state eigenvalues. However, there is the Coulomb repulsion between the electrons and

this can change the energy of the ground state. Here we can use perturbation theory

to calculate the correction to the energy eigenvalue.

The Hamiltonian for the two-electron system, in the large nuclear mass limit so that µ ≈ m, can be written as

H_0 = \frac{\vec{p}_1\cdot\vec{p}_1}{2m} - \frac{Ze^2}{r_1} + \frac{\vec{p}_2\cdot\vec{p}_2}{2m} - \frac{Ze^2}{r_2}, \qquad V = \frac{e^2}{|\vec{x}_1 - \vec{x}_2|}    (13.19)

(For us, Z = 2, but we will keep it general for now.) The ground state wave function for each electron is of the form

\psi_{100} = \sqrt{\frac{Z^3}{\pi a^3}}\; \exp(-Zr/a)    (13.20)

where a = ℏ²/me² and the energy eigenvalue is E = −Z²e²/2a. See equations (10.20), (10.21), (10.34) for these results. For the two-electron system in the ground state, the orbital part of the wave function must be symmetric, since the antisymmetric

combination is zero. Thus we can take

\Psi(\vec{x}_1, \vec{x}_2) = \psi_{100}(\vec{x}_1)\,\psi_{100}(\vec{x}_2) \times \frac{1}{\sqrt{2}}\left[|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle - |\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle\right]    (13.21)

The spin part gives identity when we take the expectation value, so for the calculation of the energy correction we can use just the orbital part, given by

\Psi(\vec{x}_1, \vec{x}_2) = \sqrt{\frac{Z^3}{\pi a^3}}\, \exp(-Zr_1/a)\; \sqrt{\frac{Z^3}{\pi a^3}}\, \exp(-Zr_2/a) = \frac{Z^3}{\pi a^3}\, \exp(-Z(r_1+r_2)/a)    (13.22)

The unperturbed energy eigenvalue for the ground state of the Helium atom with two electrons is thus

E^{(0)}_0 = -2\; \frac{Z^2 e^2}{2a} = -8\; \frac{e^2}{2a}    (13.23)

for Z = 2. The first order perturbative correction to the ground state energy is given by E^{(1)}_0 = ⟨0|V|0⟩. Explicitly,

E^{(1)}_0 = \left(\frac{Z^3}{\pi a^3}\right)^2 \int d^3x_1\, d^3x_2\; \frac{e^2}{|\vec{x}_1 - \vec{x}_2|}\, \exp(-2Z(r_1+r_2)/a)    (13.24)


Let λ = 2Z/a. Further, we can use the Fourier representation of the Coulomb potential,

\frac{1}{|\vec{x}_1 - \vec{x}_2|} = 4\pi \int \frac{d^3k}{(2\pi)^3}\; e^{i\vec{k}\cdot(\vec{x}_1 - \vec{x}_2)}\; \frac{1}{k^2}    (13.25)

Using this formula we can write

E^{(1)}_0 = e^2 \left(\frac{Z^3}{\pi a^3}\right)^2 4\pi \int \frac{d^3k}{(2\pi)^3}\; \frac{[I(k,\lambda)]^2}{k^2} = e^2 \left(\frac{Z^3}{\pi a^3}\right)^2 \frac{2}{\pi} \int_0^\infty dk\; [I(k,\lambda)]^2    (13.26)

where

I(k,\lambda) = \int d^3x\; e^{-\lambda r + i\vec{k}\cdot\vec{x}} = \int r^2 dr\, d\varphi\, \sin\theta\, d\theta\; e^{-\lambda r + ikr\cos\theta}
= 2\pi \int r^2 dr \int_{-1}^{1} dz\; e^{-\lambda r + ikrz} \qquad [\text{use } z = \cos\theta]
= \int_0^\infty r^2 dr\; \frac{2\pi}{ikr}\left[e^{-(\lambda - ik)r} - \text{complex conjugate}\right] = \frac{8\pi\lambda}{(k^2+\lambda^2)^2}    (13.27)

The energy correction now becomes

E^{(1)}_0 = e^2 \left(\frac{Z^3}{\pi a^3}\right)^2 \frac{2}{\pi}\,(8\pi\lambda)^2 \int_0^\infty \frac{dk}{(k^2+\lambda^2)^4} = \frac{5}{8}\; \frac{Z e^2}{a}    (13.28)

where we used the integral

\int_0^\infty \frac{dk}{(k^2+\lambda^2)^4} = \frac{5\pi}{32}\; \frac{1}{\lambda^7}    (13.29)

Taking Z = 2, the ground state energy including the first order correction from the repulsion of the electrons is

E_0 \approx \left(-8 + \frac{5}{2}\right)\frac{e^2}{2a} = -2.75\; \frac{e^2}{a}    (13.30)

Notice that the correction is significant; the value goes from −4 for two mutually noninteracting electrons to −2.75, in units of e²/a. The experimentally measured value of the ground state energy for Helium is

E_{0\,\text{exp}} \approx -2.904\; \frac{e^2}{a}    (13.31)

The first order perturbation theory brings the energy eigenvalue to within 5.3% of the

experimental value.
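The two integrals used in this calculation, (13.27) and (13.29), can be checked numerically; here is a short sketch with an arbitrary value of λ.

```python
# Numerical check of (13.27) and (13.29) for an arbitrary lambda.
import numpy as np
from scipy.integrate import quad

lam, k = 1.3, 0.7

# I(k, lambda): angular integral done analytically, radial one numerically;
# sin(kr)/(kr) is written via np.sinc, which is sin(pi x)/(pi x).
val, _ = quad(lambda r: 4*np.pi * r**2 * np.exp(-lam*r) * np.sinc(k*r/np.pi),
              0, np.inf)
print(val, 8*np.pi*lam / (k**2 + lam**2)**2)     # equal

# The k-integral (13.29):
val, _ = quad(lambda q: (q**2 + lam**2)**(-4), 0, np.inf)
print(val, 5*np.pi / (32 * lam**7))              # equal
```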


13.3 The anharmonic oscillator

The harmonic oscillator is obtained as the small oscillation approximation of any mechanical system around a stable equilibrium point. Recall that this is obtained from the expansion of the potential energy around the equilibrium point x₀ in a Taylor series as

V(x) = V(x_0 + x - x_0) = V(x_0) + (x - x_0)\,V'(x_0) + \frac{1}{2!}(x-x_0)^2\, V''(x_0) + \frac{1}{3!}(x-x_0)^3\, V'''(x_0) + \frac{1}{4!}(x-x_0)^4\, V''''(x_0) + \cdots    (13.32)

The first term is a constant, irrelevant for our analysis; the second vanishes (V'(x₀) = 0) since x₀ is an equilibrium point. The second derivative at x₀ must be positive for a stable equilibrium point. Writing x − x₀ → x (which is equivalent to choosing the origin of coordinates such that x₀ = 0), the potential energy, apart from the constant term, becomes

V(x) \approx \frac{1}{2}m\omega^2 x^2 + g\, x^3 + \lambda\, x^4 + \cdots, \qquad m\omega^2 = V''(x_0), \quad g = \frac{1}{3!}V'''(x_0), \quad \lambda = \frac{1}{4!}V''''(x_0)    (13.33)

The Hamiltonian for the system can be approximated, taking account of the two lowest anharmonic corrections, as

H = \frac{p^2}{2m} + \frac{1}{2}m\omega^2 x^2 + g\, x^3 + \lambda\, x^4    (13.34)

We will treat the anharmonic terms as a perturbation to calculate the corrected energy eigenvalues. For simplicity, we take g = 0, which would apply to a system which has symmetry under x → −x. The first two terms in the Hamiltonian (13.34) give the harmonic oscillator, for which we can find the exact eigenvalues as in section 5. Thus we have H = H₀ + V with

H_0 = \frac{p^2}{2m} + \frac{1}{2}m\omega^2 x^2, \qquad V = \lambda\, x^4    (13.35)

As before, it is convenient to define the operators a and a† by

a = \sqrt{\frac{m\omega}{2\hbar}}\; x + \frac{i}{\sqrt{2m\hbar\omega}}\; p, \qquad a^\dagger = \sqrt{\frac{m\omega}{2\hbar}}\; x - \frac{i}{\sqrt{2m\hbar\omega}}\; p    (13.36)

This is the same as equation (5.8). In terms of these, H₀ = ℏω(a†a + 1/2) and

x = \sqrt{\frac{\hbar}{2m\omega}}\;(a + a^\dagger)    (13.37)


We can use this to simplify the perturbation V. Expanding out x, we will find many terms, which can be grouped depending on the number of a's and the number of a†'s. Thus

V = \frac{\lambda\hbar^2}{4m^2\omega^2}\left(V_4 + V_4^\dagger + V_2 + V_2^\dagger + V_0\right)
V_4 = a^4
V_2 = a^3 a^\dagger + a^2 a^\dagger a + a\,a^\dagger a^2 + a^\dagger a^3    (13.38)
V_0 = a^2 a^{\dagger 2} + a\,a^\dagger a\,a^\dagger + a\,a^{\dagger 2} a + a^{\dagger 2} a^2 + a^\dagger a\,a^\dagger a + a^\dagger a^2 a^\dagger

Notice that for V₄ only the matrix elements ⟨n−4|V₄|n⟩ can be nonzero (and ⟨n+4|V₄†|n⟩ for the conjugate). Thus the subscripts in the terms given in (13.38) indicate the change in the value of n for which nonzero matrix elements are possible. Using the relations

a\,|n\rangle = \sqrt{n}\;|n-1\rangle, \qquad a^\dagger\,|n\rangle = \sqrt{n+1}\;|n+1\rangle    (13.39)

we can work out the nonzero matrix elements. These are given by

\langle n-4|V_4|n\rangle = \sqrt{n(n-1)(n-2)(n-3)}
\langle n+4|V_4^\dagger|n\rangle = \sqrt{(n+4)(n+3)(n+2)(n+1)}
\langle n-2|V_2|n\rangle = \sqrt{n(n-1)}\;(4n-2)    (13.40)
\langle n+2|V_2^\dagger|n\rangle = \sqrt{(n+2)(n+1)}\;(4n+6)
\langle n|V_0|n\rangle = 6n^2 + 6n + 3

For the first order correction, we need the diagonal matrix element ⟨n|V|n⟩, so only V₀ can contribute. The result is

\Delta^{(1)}_n = \frac{\lambda\hbar^2}{4m^2\omega^2}\,(6n^2 + 6n + 3) = \frac{3\lambda\hbar^2}{2m^2\omega^2}\left(n^2 + n + \tfrac{1}{2}\right)    (13.41)

For the second order correction we get

\Delta^{(2)}_n = -\sum_{a\neq n} \frac{\langle n|V|a\rangle\,\langle a|V|n\rangle}{E^{(0)}_a - E^{(0)}_n}
= -\left(\frac{\lambda\hbar^2}{4m^2\omega^2}\right)^2 \left[\frac{|\langle n-4|V_4|n\rangle|^2}{(-4\hbar\omega)} + \frac{|\langle n+4|V_4^\dagger|n\rangle|^2}{(4\hbar\omega)} + \frac{|\langle n-2|V_2|n\rangle|^2}{(-2\hbar\omega)} + \frac{|\langle n+2|V_2^\dagger|n\rangle|^2}{(2\hbar\omega)}\right]
= -\frac{\lambda^2\hbar^3}{m^4\omega^5}\left[\frac{17}{4}n^3 + \frac{51}{8}n^2 + \frac{59}{8}n + \frac{21}{8}\right]    (13.42)

Thus the energy levels of the anharmonic oscillator with the quartic potential, to second order in the perturbation, are given by

E_n \approx \left(n + \tfrac{1}{2}\right)\hbar\omega + \frac{3\lambda\hbar^2}{2m^2\omega^2}\left(n^2 + n + \tfrac{1}{2}\right) - \frac{\lambda^2\hbar^3}{m^4\omega^5}\left[\frac{17}{4}n^3 + \frac{51}{8}n^2 + \frac{59}{8}n + \frac{21}{8}\right] + \cdots    (13.43)


Notice that for the ground state, i.e., for n = 0, the second order correction is indeed

negative as expected on general grounds.
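The result (13.43) can be tested against an exact (numerical) diagonalization of H in a truncated oscillator basis. The sketch below works in units m = ω = ℏ = 1, with the basis size chosen large enough that the low eigenvalues are insensitive to the truncation.

```python
# Exact diagonalization of H0 + lam*x^4 in a truncated oscillator basis,
# in units m = omega = hbar = 1, compared with (13.43).
import numpy as np

def anharmonic_levels(lam, N=200):
    n = np.arange(N)
    a = np.diag(np.sqrt(n[1:]), 1)         # a|n> = sqrt(n)|n-1>, eq. (13.39)
    x = (a + a.T) / np.sqrt(2)             # eq. (13.37) in these units
    H = np.diag(n + 0.5) + lam * np.linalg.matrix_power(x, 4)
    return np.linalg.eigvalsh(H)

lam, n = 0.01, 0
E_pert = (n + 0.5) + 1.5*lam*(n**2 + n + 0.5) \
         - lam**2 * (17/4*n**3 + 51/8*n**2 + 59/8*n + 21/8)
print(anharmonic_levels(lam)[n], E_pert)   # agreement to O(lam^3)
```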

The classical limit of this expression is worthy of comment. This limit can be obtained by taking ℏ → 0, n → ∞, keeping nℏ = J fixed. Thus we expect that the classical perturbed energies are given by

E \approx \omega J + \frac{3}{2}\,\frac{\lambda}{m^2\omega^2}\, J^2 - \frac{17}{4}\,\frac{\lambda^2}{m^4\omega^5}\, J^3 + \cdots    (13.44)

This is exactly the result found in classical perturbation theory in the action-angle

formulation of the problem, with J as the action variable.

13.4 The exchange integral and spin-spin interaction

Here we will discuss one type of spin-spin interaction which arises from the permutation symmetry of identical particles and which is driven by the Coulomb electrostatic interaction. We start by considering a system of two mutually noninteracting electrons. The wave functions for this were written in (12.10) as

\Psi^{(1)}_{\alpha_1,\alpha_2}(x^{(1)}, x^{(2)}) = \frac{1}{\sqrt{2}}\left[\psi_{\alpha_1}(x^{(1)})\psi_{\alpha_2}(x^{(2)}) - \psi_{\alpha_1}(x^{(2)})\psi_{\alpha_2}(x^{(1)})\right]
\times \begin{cases}
|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle \\
\frac{1}{\sqrt{2}}\left[|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle + |\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle\right] \\
|\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle
\end{cases}

\Psi^{(0)}_{\alpha_1,\alpha_2}(x^{(1)}, x^{(2)}) = \frac{1}{\sqrt{2}}\left[\psi_{\alpha_1}(x^{(1)})\psi_{\alpha_2}(x^{(2)}) + \psi_{\alpha_1}(x^{(2)})\psi_{\alpha_2}(x^{(1)})\right]
\times \frac{1}{\sqrt{2}}\left[|\tfrac{1}{2}, \tfrac{1}{2}\rangle\, |\tfrac{1}{2}, -\tfrac{1}{2}\rangle - |\tfrac{1}{2}, -\tfrac{1}{2}\rangle\, |\tfrac{1}{2}, \tfrac{1}{2}\rangle\right]    (13.45)

We now use these wave functions to calculate the first order correction to the energy due to the Coulomb repulsion between the electrons. The electrostatic interaction is given by

H_{int} = \frac{e^2}{|\vec{x}_1 - \vec{x}_2|}    (13.46)

Since this does not have explicit spin dependence, the first order correction to the energy is given by

\Delta E = I \mp \mathcal{J}    (13.47)

where

I = e^2 \int d^3x^{(1)} d^3x^{(2)}\; |\psi_{\alpha_1}(x^{(1)})|^2\; \frac{1}{|\vec{x}_1 - \vec{x}_2|}\; |\psi_{\alpha_2}(x^{(2)})|^2

\mathcal{J} = e^2 \int d^3x^{(1)} d^3x^{(2)}\; \psi^*_{\alpha_1}(x^{(1)})\,\psi^*_{\alpha_2}(x^{(2)})\; \frac{1}{|\vec{x}_1 - \vec{x}_2|}\; \psi_{\alpha_1}(x^{(2)})\,\psi_{\alpha_2}(x^{(1)})    (13.48)

The integral \mathcal{J} is known as the exchange integral; clearly there is an exchange of the labels between the wave functions and their conjugates in this expression. The upper sign in (13.47) applies to the antisymmetric orbital wave function, and hence to the symmetric, total spin equal to 1, spin wave functions. The lower sign applies to the case of total spin being zero. Thus, although the Coulomb energy by itself does not have any spin dependence, the energy correction does depend on the spin states, or on the relative spin orientations, because the orbital part has to adjust according to the spin part to ensure overall antisymmetry for the wave function under exchange. We can bring this out more explicitly by noting that \vec{S}_1\cdot\vec{S}_2 has different values for these states. For the individual spins we have \vec{S}_1^2 = \vec{S}_2^2 = (3/4)\hbar^2, while (\vec{S}_1 + \vec{S}_2)^2 = 2\hbar^2 for total spin equal to 1 and (\vec{S}_1 + \vec{S}_2)^2 = 0 for total spin equal to zero.

Writing

~S1 · ~S2 =1

2

[(~S1 + ~S2)2 − ~S2

1 − ~S22

](13.49)

we find ~S1 · ~S2 = (1/4)~2,−(3/4)~2 for the two cases respectively. Thus we can write

1

2+ 2

~S1 · ~S2

~2=

1, s = 1

−1, s = 0(13.50)

The energy correction from (13.47) can now be written as

$$ \Delta E = \left(I - \tfrac12 J\right) - \frac{2J}{\hbar^2}\,\vec S_1\cdot\vec S_2 \qquad (13.51) $$

This equation clearly shows that if the exchange integral is positive, it is energetically favorable to have the spins oriented in the same direction, while for a negative value of $J$, the favored orientation is opposite. The sign of the exchange integral will, of course, depend on the states $|\alpha_1\rangle, |\alpha_2\rangle$.

The spin-spin interaction given by (13.51) is the essence of ferromagnetism and

antiferromagnetism. In a solid, with several atoms, taking the orbital wave functions to

correspond to single electron wave functions associated with different atomic nuclei,

such as neighboring ones or next-to-neighboring ones on a crystal lattice, the relevant

term of the Hamiltonian takes the form [Heisenberg Hamiltonian for ferromagnet]

$$ H = -\frac{1}{\hbar^2}\sum_{ij} J_{ij}\;\vec S_i\cdot\vec S_j \qquad (13.52) $$

where the subscripts $i, j$ refer to lattice sites and the summation is over all lattice points. The exchange integral $J_{ij}$ generally falls off rapidly with separation between the lattice points $i$ and $j$, so that often a nearest neighbor approximation is adequate and the Hamiltonian only involves the product of spin operators at neighboring lattice points. A positive $J$ leads to ferromagnetic ordering of spins (alignment of spins) while a negative $J$ gives antiferromagnetic ordering.

The identification of the exchange integral as the key physics behind ferromagnets was originally due to Heisenberg and hence the Hamiltonian (13.52) is often referred to as the Heisenberg Hamiltonian.
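The numbers entering (13.49)–(13.51) can be checked with a few lines of linear algebra: build $\vec S_1\cdot\vec S_2$ on the four-dimensional two-spin space and read off its eigenvalues. A small sketch (ours, not part of the original notes), assuming $\hbar = 1$:

```python
import numpy as np

# spin-1/2 operators, S = sigma/2 with hbar = 1
sx = 0.5*np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5*np.array([[0, -1j], [1j, 0]])
sz = 0.5*np.array([[1, 0], [0, -1]], dtype=complex)

# S1.S2 acting on the two-electron spin space
S1dotS2 = sum(np.kron(s, s) for s in (sx, sy, sz))
print(np.linalg.eigvalsh(S1dotS2).round(6))
# -> [-0.75, 0.25, 0.25, 0.25]: the singlet has S1.S2 = -(3/4) hbar^2 and the
#    triplet +(1/4) hbar^2, so (13.51) reproduces I + J and I - J respectively
```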

13.5 Spin-orbit interaction

We have seen the corrections to the energy level of the ground state of the Helium atom

due to the electrostatic repulsion between the two electrons. But even for a single

electron system, there are corrections and energy level splittings due to a number

of effects. These include the spin-orbit interaction, corrections due to the motion

of the nucleus, relativistic corrections for the motion of the electron, the hyperfine

interaction, etc. Here we will discuss the spin-orbit interaction.

It is easy to understand the origin of the spin-orbit interaction. Consider viewing

the dynamics of the nucleus-electron system in the rest frame of the electron. In this

frame the nucleus is moving, and since it carries a positive charge, it appears as a

current in the rest frame of the electron. This current generates a magnetic field, and since the electron has a spin magnetic moment, there is an additional energy $-\vec\mu\cdot\vec B$. This is the spin-orbit interaction. More quantitatively, we can proceed as

follows. In the comoving frame of the nucleus, there is only the Coulomb field of the

nucleus. The magnetic field experienced by the electron can be obtained by a Lorentz

transformation, the velocity being $-\vec v$ where $\vec v$ is the velocity of the electron. For small velocities, this is given by

$$ \vec B = \frac{\vec v\times\vec E}{c} \qquad (13.53) $$

The magnetic moment interaction is thus given by

$$ H_{\rm int} = -\vec\mu\cdot\vec B = -\frac{e}{mc}\,\vec S\cdot\left(\frac{\vec v\times\vec E}{c}\right) = -\frac{Ze^2}{mc^2}\,\vec S\cdot\left(\frac{\vec v\times\vec r}{r^3}\right) = \frac{Ze^2}{m^2c^2 r^3}\,\vec L\cdot\vec S \qquad (13.54) $$

This is not quite the right answer. There is an additional effect which arises from the fact that the composition of two Lorentz boosts in different directions is not just a Lorentz boost with a combined velocity, but also generates a rotation. This extra rotation, known as Thomas precession, gives a factor of $\frac12$ in the spin-orbit interaction, so that the correct expression for $H_{\rm int}$ is [Spin-orbit interaction Hamiltonian]

$$ H_{\rm int} = \frac{Ze^2}{2m^2c^2 r^3}\,\vec L\cdot\vec S \qquad (13.55) $$

As in the case of the gyromagnetic ratio for spin being 2, this interaction, with the

correct coefficient, emerges from the Dirac theory of the electron.

The important feature of this interaction is that it does not commute with $L_i$ or $S_i$ separately. Thus the Hamiltonian, with this term included, cannot be diagonalized in the same basis as $L^2, L_3$ or $S^2, S_3$ as we did before. However, the term $\vec L\cdot\vec S$ does commute with the total angular momentum $J_i = L_i + S_i$, as is clear from the following.

$$ [L_i+S_i,\ \vec L\cdot\vec S] = \sum_{j,k}\left[i\hbar\,\epsilon_{ijk}\, L_k S_j + i\hbar\,\epsilon_{ijk}\, L_j S_k\right] = i\hbar\sum_{j,k}\epsilon_{ijk}\left(L_k S_j + L_j S_k\right) = 0 \qquad (13.56) $$

since $L_k S_j + L_j S_k$ is symmetric in $j, k$, and $\epsilon_{ijk}$ is antisymmetric in the same indices. So we can label energy eigenstates using the total angular momentum, in addition to the principal quantum number. By expanding out $J^2 = (L+S)^2$, we see that

$$ \vec L\cdot\vec S = \frac12\left[J^2 - L^2 - S^2\right] \qquad (13.57) $$

The first order correction to the energy levels is given by

$$ \Delta E^{(1)}_{n,l,m_l,m_s} = \frac{Ze^2}{4m^2c^2}\left\langle n,l,m_l,m_s\left|\,\frac{J^2-L^2-S^2}{r^3}\,\right|n,l,m_l,m_s\right\rangle $$
$$ = \frac{Ze^2\hbar^2}{4m^2c^2}\left[j(j+1)-l(l+1)-s(s+1)\right]\left\langle\frac{1}{r^3}\right\rangle_{n,l} $$
$$ = \frac{Ze^2\hbar^2}{4m^2c^2 a^3 n^3}\;\frac{j(j+1)-l(l+1)-(3/4)}{l(l+1)(l+\frac12)} \qquad (13.58) $$

where we used the result

$$ \left\langle\frac{1}{r^3}\right\rangle_{n,l} = \left\langle n,l,m_l,m_s\left|\,\frac{1}{r^3}\,\right|n,l,m_l,m_s\right\rangle = \frac{1}{a^3 n^3}\;\frac{1}{l(l+1)(l+\frac12)} \qquad (13.59) $$

Since the corrected energy levels depend on j and l, we see that the degeneracy of the

states, namely, that their energies were only a function of n, is lifted by the spin-orbit

interaction.
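The commutator statement (13.56) can also be verified directly with explicit matrices, say for $l = 1$ and $s = 1/2$, by building $L_i\otimes 1$ and $1\otimes S_i$ on the six-dimensional product space. A sketch (ours, assuming $\hbar = 1$; the helper constructing standard angular momentum matrices is not from the text):

```python
import numpy as np

def ang_mom(j):
    """Standard matrices (Jx, Jy, Jz) for angular momentum j, basis m = j..-j."""
    mvals = np.arange(j, -j - 1, -1)
    d = len(mvals)
    jp = np.zeros((d, d))                  # raising operator J+
    for k in range(d - 1):
        mk = mvals[k + 1]
        jp[k, k + 1] = np.sqrt(j*(j + 1) - mk*(mk + 1))
    return 0.5*(jp + jp.T), -0.5j*(jp - jp.T), np.diag(mvals).astype(complex)

L = [np.kron(Li, np.eye(2)) for Li in ang_mom(1)]      # l = 1 block
S = [np.kron(np.eye(3), Si) for Si in ang_mom(0.5)]    # s = 1/2 block
LS = sum(Li @ Si for Li, Si in zip(L, S))

for Li, Si in zip(L, S):
    J = Li + Si
    print(np.abs(J @ LS - LS @ J).max(),      # ~ 1e-16: [J_i, L.S] = 0
          np.abs(Li @ LS - LS @ Li).max())    # nonzero: [L_i, L.S] != 0
```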

13.6 Zeeman effect

The Zeeman effect refers to the splitting of the energy levels of an atom in an external magnetic field. This is due to the interaction of the magnetic moment of the electron with the applied magnetic field. We have already obtained the form of this interaction term in (11.43) as

$$ H_{\rm int} = -\frac{e}{2mc}\,(\vec L + 2\vec S)\cdot\vec B \qquad (13.60) $$


(We have put in an extra 1/c to take account of the Gaussian units we will use for the

electromagnetic theory.) We can now calculate the correction to the energy levels

using first order perturbation theory. Given that we do have the spin-orbit interaction,

there are two regimes of interest. For weak fields, the splitting we find will be smaller

than the effect of spin-orbit interaction, so one can use the labeling of states using the

total angular momentum J . For high fields, the Zeeman splitting will be larger than

spin-orbit effects, so it should be the leading correction to the unperturbed energy

levels. In this case, we use the unperturbed states of the atom, ignoring spin-orbit

effects.

We will first consider the weak field case. The first order correction to the energy eigenvalue is given by

$$ \Delta E_{n,l,m_l,m_s} = -\frac{e}{2mc}\,\langle n,l,m_l,m_s|\, L_i + 2S_i\,|n,l,m_l,m_s\rangle\; B_i \qquad (13.61) $$

Notice that we do not have $J_i = L_i+S_i$ in the matrix element; rather we have $L_i+2S_i = J_i+S_i$. An important result following from the general theory of angular momentum is the Wigner-Eckart theorem, which tells us that the matrix element of a vector operator like $L_i + 2S_i$ is proportional to the corresponding matrix element of $J_i$. In other words, we can write

$$ \langle n,l,m_l,m_s|\, L_i + 2S_i\,|n,l,m_l,m_s\rangle = g_j\,\langle n,l,m_l,m_s|\, J_i\,|n,l,m_l,m_s\rangle \qquad (13.62) $$

We have not proved the theorem yet, but we will take this up later. For now, assuming the result of the theorem, we can calculate $g_j$ by considering the projection of $L_i + 2S_i$ onto $J_i$. Thus we write

$$ L_i + 2S_i = J_i + S_i = J_i + \frac{1}{J^2}\,(\vec S\cdot\vec J)\, J_i + S^{\perp}_i, \qquad S^{\perp}_i = S_i - \frac{1}{J^2}\,(\vec S\cdot\vec J)\, J_i \qquad (13.63) $$

Here $S^{\perp}_i$ is orthogonal to $J_i$, as can be seen from the fact that $\vec J\cdot\vec S^{\perp} = 0$. The theorem tells us that we can ignore the matrix element of $S^{\perp}_i$ for our purpose, so that

$$ \langle n,l,m_l,m_s|\, L_i + 2S_i\,|n,l,m_l,m_s\rangle = \langle n,l,m_l,m_s|\left(1 + \frac{\vec S\cdot\vec J}{J^2}\right) J_i\,|n,l,m_l,m_s\rangle \qquad (13.64) $$

Now we can write $J_i - S_i = L_i$, which upon squaring gives the result

$$ \frac{\vec S\cdot\vec J}{J^2} = \frac{J^2 + S^2 - L^2}{2J^2} = \frac{j(j+1) + (3/4) - l(l+1)}{2j(j+1)} \qquad (13.65) $$


Using this result, the shift in energy eigenvalues is given by [Weak field Zeeman effect and Landé g-factor]

$$ \Delta E_{j,n,l,m_l,m_s} = -\frac{e}{2mc}\, g_j\,\langle\vec J\cdot\vec B\rangle = -\frac{e\hbar}{2mc}\, g_j\, m_J\, B \qquad (13.66) $$
$$ g_j = 1 + \frac{j(j+1)+(3/4)-l(l+1)}{2j(j+1)} \qquad (13.67) $$

Here $m_J$ is the eigenvalue of $J_3$. We considered a magnetic field along the third axis in (13.66), for simplicity. To emphasize that the energy shift depends on the $j$-value resulting from combining $L_i$ and $S_i$, we have added a subscript $j$ to $\Delta E$. $g_j$ is known as the Landé g-factor. The possible values of $j$ are $j = l+\frac12$ and $j = l-\frac12$. For these cases, the Landé g-factor can be evaluated as

$$ g_j = \begin{cases} 1 + \dfrac{1}{2l+1}, & j = l+\frac12 \\[6pt] 1 - \dfrac{1}{2l+1}, & j = l-\frac12 \end{cases} \qquad (13.68) $$
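The two cases in (13.68) follow from (13.67) by simple algebra; a short check of this (ours, with $s = 1/2$):

```python
def g_lande(j, l, s=0.5):
    """Landé g-factor, eq. (13.67)."""
    return 1 + (j*(j + 1) + s*(s + 1) - l*(l + 1)) / (2*j*(j + 1))

for l in (1, 2, 3):
    print(l, g_lande(l + 0.5, l), 1 + 1/(2*l + 1),   # j = l + 1/2
             g_lande(l - 0.5, l), 1 - 1/(2*l + 1))   # j = l - 1/2
```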

Turning to the strong field case, the situation is much simpler. We can ignore

spin-orbit interaction as a first approximation, so L3 and S3 can be simultaneously

diagonalized with the Hamiltonian H0, and so we can use these to label the states. For

a magnetic field along the third axis, we then get [Strong field Zeeman effect]

$$ \Delta E_{n,l,m_l,m_s} = -\frac{e\hbar}{2mc}\,(m_l + 2m_s)\, B \qquad (13.69) $$

13.7 The atom in an electric field

There are some physically interesting results which can be obtained by considering

an atom in an electric field. We can consider fields which are slowly varying over

the extent of an atom, so that we can take the field to be uniform for the purpose

of applying perturbation theory to the atomic states. We will also consider time-

independent fields, so that the contribution of the electromagnetic vector potential to

the electric field is zero. The interaction energy of the electron is thus given by

$$ H_{\rm int} = -e\,\phi = e\,\vec E^{\rm app}\cdot\vec x \qquad (13.70) $$

where $\phi$ is the electrostatic potential and we have used the fact that $\phi = -\vec E^{\rm app}\cdot\vec x$ for a

constant (i.e., uniform in space) electric field. The superscript app is meant to specify

that this is the applied field, since we will be dealing with related electric fields shortly.

We can now use the general formulae and write down the correction to the energy eigenvalues. Up to the second order, we find

$$ \Delta E_a = e\, E^{\rm app}_i\,\langle a|x_i|a\rangle - e^2 E^{\rm app}_i E^{\rm app}_j \sum_{b\neq a}\frac{\langle a|x_i|b\rangle\,\langle b|x_j|a\rangle}{E^{(0)}_b - E^{(0)}_a} + \cdots \qquad (13.71) $$


In most of the situations of interest, the unperturbed states have a definite transformation property under $\vec x\to-\vec x$, i.e., under space reflection or parity. For example, for the unperturbed eigenstates of the Hydrogen atom, the eigenfunctions are of the form

$$ \psi = R(r)\, Y^m_l(\theta,\varphi) \qquad (13.72) $$

where the spherical harmonics $Y^m_l$ are polynomials of order $l$ in $\hat x = \vec x/r$. The simplest way to see this is to realize that the spherical harmonics form a basis for functions on the sphere of unit radius, which can be described by $\hat x = \vec x/r$. Thus, we can choose as a basis a constant, $\hat x_i$, $\hat x_i\hat x_j - \delta_{ij}/3$, etc. The correspondence is

$$ Y^0_0 \longleftrightarrow \text{constant}, \qquad Y^m_1 \longleftrightarrow \hat x_i, \qquad Y^m_2 \longleftrightarrow \hat x_i\hat x_j - \frac{\delta_{ij}}{3} \qquad (13.73) $$

(The extra $\delta_{ij}/3$ term in the last expression is to ensure that $Y^m_2$ is orthogonal to $Y^0_0$.) We see that $Y^m_l(-\hat x) = (-1)^l\, Y^m_l(\hat x)$. Thus under $\vec x\to-\vec x$, $\psi\to(-1)^l\psi$. States with odd values of $l$ are odd under parity, i.e., they get a minus sign, while states with even values of $l$ are even under parity.

As for the matrix element of $x_i$, we can write

$$ \langle a|x_i|a\rangle = \int d^3x\;\psi^*(\vec x)\, x_i\,\psi(\vec x) = \int d^3y\;\psi^*(\vec y)\, y_i\,\psi(\vec y) $$
$$ = \int d^3x\;\psi^*(-\vec x)\,(-x_i)\,\psi(-\vec x) = \int d^3x\;(-1)^l\psi^*(\vec x)\,[-x_i]\,(-1)^l\psi(\vec x) $$
$$ = -\int d^3x\;\psi^*(\vec x)\, x_i\,\psi(\vec x) = -\langle a|x_i|a\rangle \qquad (13.74) $$

The matrix element $\langle a|x_i|a\rangle$ is thus zero for states $|a\rangle$ of definite parity. We have used the change of variable of integration $\vec y = -\vec x$. Notice also that the change of sign for $d^3y$ is compensated by the change of limits of integration, as in

$$ \int_{-L}^{L} dx\, f(x) = \int_{-L}^{L} dy\, f(y) = \int_{L}^{-L} (-dx)\, f(-x) = \int_{-L}^{L} dx\, f(-x) \qquad (13.75) $$

We can conclude that there is no shift in the energy eigenvalue to linear order in $E$ for states of definite parity, if there is no degeneracy. The shift of energy levels in an electric field is known as the Stark effect. So we can equivalently state that there is no linear Stark effect in the absence of degeneracy. We have used perturbation theory for nondegenerate states; if we have degenerate states, the conclusion can be altered. We will consider this a little later, after discussing the second order correction.

The second order correction to the energy (which leads to the quadratic Stark effect) can be written as

$$ \Delta E^{(2)}_a = -\frac12\,\alpha_{ij}\, E^{\rm app}_i E^{\rm app}_j $$
$$ \alpha_{ij} = 2 e^2 \sum_{b\neq a}\frac{\langle a|x_i|b\rangle\,\langle b|x_j|a\rangle}{E^{(0)}_b - E^{(0)}_a} \qquad (13.76) $$

The quantity $\alpha_{ij}$ is known as the polarizability of the physical system for the state $|a\rangle$. [Polarizability] The calculation of $\Delta E^{(2)}_a$ is not very easy in most cases, because it involves a summation over all states $|b\rangle$. For the simple case of the Hydrogen atom, for the ground state, one can show that [Quadratic Stark effect]

$$ \Delta E^{(2)}_{100} = -\frac94\, a^3\,(E^{\rm app})^2 \qquad (13.77) $$

The correction has been evaluated for a number of other cases as well.

The polarizability is also an important quantity for the electrical properties of a

material system. If we have several atoms, say n per unit volume, the energy difference

due to this shift may be written as

$$ \Delta E = -\frac12\int d^3x\; n\,\alpha_{ij}\, E^{\rm app}_i E^{\rm app}_j \qquad (13.78) $$

To relate this to familiar properties of matter, consider the set-up shown in Figure 13.1. There are two capacitor plates with free charges on them as shown, which produce a certain electric field $E^{\rm app}$. The atom, shown as a distribution of charges, is electrically neutral, but undergoes a small deformation, with the positive charges trying to move towards the negative plate, and the negative charges moving towards the positive plate. This produces a small electric field $E'$ counter to the applied field. The net field in the vicinity of the atom is $E^{\rm app} - E'$. This is the field that a probe would measure near the atom, so we call this the electric field $\mathcal E$. $E'$ itself is the response of the atom to the electric field in its vicinity, so for small fields, i.e., for small perturbations, we can write $E'_i = n\,\alpha_{ij}\,\mathcal E_j$, where $\alpha_{ij}$ is some coefficient characteristic of the atom and $n$ is the density of atoms in the material. Using this, we find

$$ \mathcal E_i = E^{\rm app}_i - E'_i = E^{\rm app}_i - n\,\alpha_{ij}\,\mathcal E_j \qquad (13.79) $$

We may equivalently write this equation as

$$ E^{\rm app}_i = \left(\delta_{ij} + n\,\alpha_{ij}\right)\mathcal E_j \equiv \epsilon_{ij}\,\mathcal E_j \qquad (13.80) $$

Figure 13.1: Polarizing an atom in an applied external electric field


The field $E^{\rm app}$ is what is produced by the free charges, the ones on the capacitor plates, ignoring the bound charges in the atom. So it is what is called the displacement field $\vec D$ in electromagnetic theory. Thus the equation given above states that $D_i = \epsilon_{ij}\,\mathcal E_j$, identifying $\epsilon_{ij} = \delta_{ij} + n\,\alpha_{ij}$ as the dielectric constant. (It is generally a tensor.) [Polarizability and dielectric constant] The electrical energy is given by

$$ \frac12\int d^3x\; D_i\,(\epsilon^{-1})_{ij}\, D_j \approx \frac12\int d^3x\left[D^2 - n\, D_i\,\alpha_{ij}\, D_j + \cdots\right] \approx \frac12\int d^3x\,(E^{\rm app})^2 - \frac12\int d^3x\; n\,\alpha_{ij}\, E^{\rm app}_i E^{\rm app}_j + \cdots \qquad (13.81) $$

Comparing with (13.78), we see that $\alpha_{ij}$ defined by the condition $E'_i = n\,\alpha_{ij}\,\mathcal E_j$ is indeed the polarizability $\alpha_{ij}$ defined by (13.76). Thus (13.76) gives us a way to calculate the dielectric constant of a material in terms of the underlying atomic content. For many materials there is isotropy, which implies that $\alpha_{ij} = \alpha\,\delta_{ij}$, so that the dielectric constant may be taken as a scalar.

13.8 Degenerate state perturbation theory

We will now consider perturbation theory for the case when there is degeneracy for

the unperturbed states. What we did earlier does not quite work, since there is the

potential for the denominators $E^{(0)}_b - E^{(0)}_a$ to vanish. The key observation for formu-

lating perturbation theory for degenerate states is that, when we have degeneracy, a

linear combination of the degenerate states will also have the same energy. The per-

turbation could lead to such a mixing of the states. Let |a, i〉 denote a set of states, with

the unperturbed energy eigenvalue E(0)a , with i = 1, 2, · · · , N denoting the distinct

degenerate states. Then as the ansatz for the perturbed states, we write

|ψa,i〉 =∑j

Cji |a, j〉+ |φ(1)a,i 〉+ · · · (13.82)

Correspondingly, we write

Ea,i = E(0)a,i + E

(1)a,i + · · · (13.83)

The eigenvalue equation for $H = H_0 + V$ becomes

$$ (H_0+V)\left[\sum_j C_{ji}\,|a,j\rangle + |\phi^{(1)}_{a,i}\rangle + \cdots\right] = \left[E^{(0)}_{a,i} + E^{(1)}_{a,i} + \cdots\right]\left[\sum_j C_{ji}\,|a,j\rangle + |\phi^{(1)}_{a,i}\rangle + \cdots\right] \qquad (13.84) $$

Splitting this into different orders of perturbation, we can write this down as the separate set of equations,

$$ H_0 \sum_j C_{ji}\,|a,j\rangle = E^{(0)}_{a,i}\sum_j C_{ji}\,|a,j\rangle $$
$$ \left(H_0 - E^{(0)}_{a,i}\right)|\phi^{(1)}_{a,i}\rangle = -\left(V - E^{(1)}_{a,i}\right)\sum_j C_{ji}\,|a,j\rangle \qquad (13.85) $$

We have not indicated the second and higher order terms explicitly. Notice that the first of these equations is automatically satisfied since $H_0|a,j\rangle = E^{(0)}_{a,i}|a,j\rangle$ because of the degeneracy. In the earlier analysis we solved the second one by taking the inner product with the unperturbed states $\langle b|$ to get the energy difference $E^{(0)}_b - E^{(0)}_a$ on the left hand side and then dividing out by this factor to identify $|\phi^{(1)}_a\rangle$. In the present case, the left hand side would vanish if we take the inner product with $\langle a,k|$. Thus consistency of the second equation in (13.85) would require

$$ \sum_j \langle a,k|V|a,j\rangle\; C_{ji} - E^{(1)}_{a,i}\, C_{ki} = 0 \qquad (13.86) $$

This is equivalent to writing a series of eigenvalue equations,

$$ \sum_j V_{kj}\, C_{j1} = E^{(1)}_{a,1}\, C_{k1}, \qquad \sum_j V_{kj}\, C_{j2} = E^{(1)}_{a,2}\, C_{k2}, \qquad \sum_j V_{kj}\, C_{j3} = E^{(1)}_{a,3}\, C_{k3}, \quad \text{etc.} \qquad (13.87) $$

where we have written $\langle a,k|V|a,j\rangle = V_{kj}$ for brevity. This equation shows that the first order energy corrections are the eigenvalues of the matrix $V_{kj}$. The eigenvectors give the coefficients $C_{ji}$ to be used for the construction of the perturbed states. So the strategy for degenerate state perturbation theory is to use the unperturbed states to write the matrix $V_{kj}$ and then calculate its eigenvalues and eigenstates. Generally speaking, the degeneracies are lifted, i.e., the new energy eigenvalues $E^{(0)}_{a,i} + E^{(1)}_{a,i}$ are not all the same, so one can consistently calculate the second order perturbation. We will not need these for most of our calculations, so we will not pursue the higher order corrections for the degenerate case.

The simplest example of degenerate state perturbation theory is to consider two degenerate states, both with the unperturbed energy eigenvalue $E^{(0)}$. The matrix $V_{kj}$ is a $2\times 2$ matrix for this case, so the eigenvalue equations (13.87) are of the form

$$ \begin{bmatrix} V_{11}-E^{(1)} & V_{12} \\ V_{21} & V_{22}-E^{(1)} \end{bmatrix}\begin{pmatrix} C_1 \\ C_2 \end{pmatrix} = 0 \qquad (13.88) $$

One can easily see that the eigenvalues are given by

$$ E^{(1)}_\pm = \frac12\,\mathrm{Tr}(V) \pm \frac12\sqrt{(\mathrm{Tr}\,V)^2 - 4\det V} \qquad (13.89) $$


where $\mathrm{Tr}\,V = V_{11}+V_{22}$ and $\det V = V_{11}V_{22} - V_{12}V_{21}$. We see that the two degenerate states now have different energies, given by $E^{(0)}+E^{(1)}_+$ and $E^{(0)}+E^{(1)}_-$. Correspondingly, the coefficients $C_{ji}$ are given by

$$ \begin{pmatrix} C_{11}\\ C_{21}\end{pmatrix} = \frac{1}{\sqrt{(V_{22}-E^{(1)}_+)^2 + V_{12}V_{21}}}\begin{pmatrix} V_{22}-E^{(1)}_+ \\ -V_{21}\end{pmatrix}, \qquad \begin{pmatrix} C_{12}\\ C_{22}\end{pmatrix} = \frac{1}{\sqrt{(V_{22}-E^{(1)}_-)^2 + V_{12}V_{21}}}\begin{pmatrix} V_{22}-E^{(1)}_- \\ -V_{21}\end{pmatrix} \qquad (13.90) $$

There are many examples of such a system where the diagonal elements $V_{11}, V_{22}$ are zero. In this case, the energy eigenvalues become $E^{(1)}_\pm = \pm\sqrt{V_{12}V_{21}} = \pm|V_{12}|$ and the corrected states are of the form

$$ |\psi_{a,1}\rangle = \frac{1}{\sqrt2}\left(|a,1\rangle + e^{-i\alpha}\,|a,2\rangle\right), \qquad |\psi_{a,2}\rangle = \frac{1}{\sqrt2}\left(|a,1\rangle - e^{-i\alpha}\,|a,2\rangle\right), \qquad e^{i\alpha} = \frac{V_{12}}{|V_{12}|} \qquad (13.91) $$

The linear Stark effect, which we discuss next, is an example of this type.

To summarize: to find the corrections to degenerate states, we must first construct the matrix

$$ (H_{\rm int})_{ij} = \langle a,i|H_{\rm int}|a,j\rangle = \int d^3x\;\psi^*_{a,i}\, H_{\rm int}\,\psi_{a,j} \qquad (13.92) $$

This is a finite dimensional, say $N\times N$, matrix, where $N$ is the number of degenerate states, i.e., $i, j = 1, 2, \cdots, N$. We then find the eigenvectors and eigenvalues of this matrix. The eigenvalues will be the first order corrections to the energy levels; the eigenvectors will define the matrix of coefficients $C_{ji}$ needed to construct the perturbed eigenstates as in (13.82).
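In matrix form the whole procedure is a single diagonalization. A minimal sketch (ours): given the $N\times N$ matrix $V_{kj}$ of (13.86), the first order shifts are its eigenvalues and the columns of the eigenvector matrix are the coefficients $C_{ji}$:

```python
import numpy as np

def degenerate_pt(V):
    """First order corrections E^(1) and mixing matrix C for an N-fold
    degenerate level; V[k, j] = <a,k| V |a,j> as in (13.86)."""
    E1, C = np.linalg.eigh(V)   # columns of C are the eigenvectors C_ji
    return E1, C

# two degenerate states with vanishing diagonal elements, as in (13.91)
V = np.array([[0.0, 0.3], [0.3, 0.0]])
E1, C = degenerate_pt(V)
print(E1)   # -> [-0.3, 0.3], i.e. -|V12| and +|V12|
print(C)    # -> columns proportional to (1, -1)/sqrt(2) and (1, 1)/sqrt(2)
```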

13.9 Linear Stark effect

A classic example of perturbation theory for degenerate states is the linear Stark effect.

A simple case is that of the $n = 2$ states of the Hydrogen atom. As we have seen before, the states $|2,0,0\rangle$ and the three states $|2,1,0\rangle, |2,1,\pm1\rangle$ are all degenerate for the unperturbed Hamiltonian $H_0 = (p^2/2m) - (e^2/r)$. Consider again the perturbation $H_{\rm int} = e\vec E\cdot\vec x$. For simplicity, we will take the electric field to be along the z-axis, so that the interaction Hamiltonian is $H_{\rm int} = eE\, z$. The matrix for $H_{\rm int}$ for the states with $n = 2$ will be a $4\times 4$ matrix. But several matrix elements will turn out to be zero. First let us consider the diagonal matrix elements. Since the states have definite parity, by the argument given after (13.72), all diagonal elements are zero; i.e.,

$$ \langle n,l,m_l|\, z\,|n,l,m_l\rangle = 0 \qquad (13.93) $$

Next consider an element of the form $\langle n,l,m_l|\,z\,|n,l',m'_l\rangle$ where there is a mismatch between the $m_l$-values for the two wave functions. From the spherical harmonics, we see that the $\varphi$-dependence is of the form

$$ \psi_{n,l,m_l} \sim e^{i m_l\varphi} \qquad (13.94) $$

On the other hand, $z$ has no $\varphi$-dependence, $z = r\cos\theta$. Thus, using (13.94),

$$ \psi^*_{n,l,m_l}\, z\,\psi_{n,l',m'_l} \sim \int d\varphi\; e^{i(m'_l-m_l)\varphi} = 0, \qquad \text{if } m'_l\neq m_l \qquad (13.95) $$

The ϕ-integration is sufficient to ensure the vanishing of the matrix element, so we

have not indicated the integrations over θ, r and the corresponding terms of the wave

functions. The only potentially nonvanishing matrix element is thus $\langle 2,0,0|\,z\,|2,1,0\rangle$ and its conjugate. The relevant wave functions are

$$ \psi_{200} = \frac{1}{\sqrt{32\pi a^3}}\left(2-\frac{r}{a}\right) e^{-r/2a}, \qquad \psi_{210} = \frac{1}{\sqrt{32\pi a^3}}\left(\frac{r}{a}\right)\cos\theta\; e^{-r/2a}, \qquad a = \frac{\hbar^2}{me^2} \qquad (13.96) $$

After carrying out the trivial $\varphi$-integration, the relevant integral is

$$ \langle 2,0,0|\,z\,|2,1,0\rangle = \frac{1}{16 a^4}\int_0^\infty dr\; r^4\left(2-\frac{r}{a}\right) e^{-r/a}\int_0^\pi d\theta\;\sin\theta\cos^2\theta = \frac{1}{16a^4}\; a^5\left(2\times 4! - 5!\right)\times\frac23 = -3a \qquad (13.97) $$
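The value $-3a$ in (13.97) is straightforward to confirm by direct numerical quadrature of $\langle 2,0,0|z|2,1,0\rangle$ with the wave functions (13.96). A sketch (ours), in units where $a = 1$:

```python
import numpy as np
from scipy.integrate import quad

a = 1.0   # Bohr radius set to 1

# psi_200* z psi_210 times the volume element r^2 sin(theta); the phi
# integral simply gives 2*pi since nothing depends on phi
def integrand(r, th):
    pref = 1.0/(32*np.pi*a**3)
    return (pref*(2 - r/a)*(r/a)*np.cos(th)*np.exp(-r/a)
            * (r*np.cos(th)) * r**2*np.sin(th))

val = quad(lambda r: 2*np.pi*quad(lambda th: integrand(r, th), 0, np.pi)[0],
           0, 60)[0]
print(val)   # -> -3.0, i.e. -3a as in (13.97)
```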

The matrix $H_{\rm int}$ for the states $|2,l,m_l\rangle$ can now be written as

$$ H_{\rm int} = -3\, a\, e E \begin{pmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix} \qquad (13.98) $$

The correspondence of states and matrix labels is

$$ |2,0,0\rangle \longleftrightarrow |1\rangle, \qquad |2,1,0\rangle \longleftrightarrow |2\rangle, \qquad |2,1,1\rangle \longleftrightarrow |3\rangle, \qquad |2,1,-1\rangle \longleftrightarrow |4\rangle \qquad (13.99) $$

The eigenvalues of the matrix $H_{\rm int}$ are $\pm 3ae|E|, 0, 0$. The states $|2,1,\pm1\rangle$ are not corrected to this order in the perturbation, while the states $|2,0,0\rangle$ and $|2,1,0\rangle$ combine into states with eigenvalues $E^{(0)}_2 \pm 3ae|E|$. We see that one combination is shifted up in energy and one combination is shifted down. More specifically, the corrected states given by the eigenvectors of $H_{\rm int}$ are [Hydrogen atom: linear Stark effect for n = 2]

$$ |\alpha\rangle = \frac{1}{\sqrt2}\left(|2,0,0\rangle + |2,1,0\rangle\right), \qquad E^{(1)} = 3ae|E| $$
$$ |\beta\rangle = \frac{1}{\sqrt2}\left(|2,0,0\rangle - |2,1,0\rangle\right), \qquad E^{(1)} = -3ae|E| \qquad (13.100) $$


14 The variational method

14.1 Formalism of variational approach

We will now discuss another very useful approximation technique, the variational

method. The key idea here is the following. For any wave function ψ, the expectation

value of the Hamiltonian is

$$ \langle E\rangle = \frac{\int\psi^* H\,\psi}{I}, \qquad I = \int\psi^*\psi \qquad (14.1) $$

We have not assumed that $\psi$ is normalized; that is why we divide by $I = \int\psi^*\psi$. Now consider extremizing this expectation value by varying $\psi$ and $\psi^*$. Setting the variation of $\langle E\rangle$ to zero, we find

$$ \frac{1}{I}\left[\int\delta\psi^*\,(H\psi - \langle E\rangle\psi) + \int(H\psi^* - \langle E\rangle\psi^*)\,\delta\psi\right] = 0 \qquad (14.2) $$

Since $\psi$ and $\psi^*$ can be varied independently, we see that the extrema correspond to eigenfunctions of $H$, obeying

$$ H\psi = \langle E\rangle\,\psi \qquad (14.3) $$

The ground state will have the lowest eigenvalue, say $E_0$, and we can write

$$ \frac{\int\psi^* H\,\psi}{I} \geq E_0 \qquad (14.4) $$

To get the lowest eigenvalue, we are extremizing 〈E〉 over all square-integrable func-

tions (consistent with the required boundary conditions) and finding the absolute

minimum. This requires varying an infinite number of parameters since an arbitrary

function has an infinite number of parameters defining it. The simplest way to see

this is to consider the expansion of an arbitrary square-integrable function ψ in a basis

as $\psi = \sum_n b_n\, u_n(x)$. The parameters $b_n$ define the function and we can vary them to find the absolute minimum to get $E_0$.

The strategy for the variational method is to take an ansatz for ψ, chosen to be

consistent with general expected behavior, and depending on a finite number of

parameters, say ak. We then calculate 〈E〉 and vary these parameters to find the

minimum. Clearly this does not necessarily give E0 (unless we are so lucky that our

ansatz is the actual ground state eigenfunction) because we are only varying over a

finite set of parameters, not the full set with an infinite number of parameters. But we

will have 〈E〉 ≥ E0. The hope is that, with an intelligent guess, we can get fairly close

to the actual value of E0, even though we are only varying a few parameters.


14.2 Ground state of the Helium atom

A good example of the variational approach is the Helium atom where we take account

of the inter-electron repulsion. We have already treated this in perturbation theory in

section 13.2. The Hamiltonian for our problem is

$$ H_0 = \frac{\vec p_1\cdot\vec p_1}{2m} - \frac{2e^2}{r_1} + \frac{\vec p_2\cdot\vec p_2}{2m} - \frac{2e^2}{r_2}, \qquad V = \frac{e^2}{|\vec x_1-\vec x_2|} \qquad (14.5) $$

We have used the fact that Z = 2 for Helium. The unperturbed ground state wave

function for each electron is of the form

$$ \psi_{100} = \sqrt{\frac{Z^3}{\pi a^3}}\; e^{-Zr/a} \qquad (14.6) $$

where $a = \hbar^2/me^2$. Here too, we should use $Z = 2$, but for the variational method

we will keep it as a free parameter to be determined by extremization of 〈E〉. The

reasoning is as follows. Each electron moves in the field of the nucleus and the other

electron which can partially cancel out the nuclear charge. So it may be physically

reasonable to take each electron as moving independently of the other electron, but

in a field with a slightly diminished value of Z. With this idea, we take our ansatz for

the ground state wave function as

$$ \Psi(x_1,x_2) = \left(\frac{Z^3}{\pi a^3}\right) e^{-Z(r_1+r_2)/a} \qquad (14.7) $$

With this wave function,

$$ \left\langle\frac{\vec p_1\cdot\vec p_1}{2m}\right\rangle = \left\langle\frac{\vec p_2\cdot\vec p_2}{2m}\right\rangle = \frac{\hbar^2}{2ma^2}\,Z^2 = \frac{e^2}{2a}\,Z^2 $$
$$ \left\langle\frac{2e^2}{r_1}\right\rangle = \left\langle\frac{2e^2}{r_2}\right\rangle = 2\,\frac{e^2}{a}\,Z $$
$$ \left\langle\frac{e^2}{|\vec x_1-\vec x_2|}\right\rangle = \frac58\,\frac{e^2}{a}\,Z \qquad (14.8) $$

The last result was obtained in (13.27), (13.28). Notice that the Hamiltonian has $2e^2/r$ for the electrostatic potential energy. This is fixed by the physical system; the only parametric freedom is in the wave functions, so only $Z$ in the wave function can be treated as a free parameter. Combining the results (14.8), the expectation value for the energy is

$$ \langle E\rangle = \frac{e^2}{a}\left(Z^2 - Z\left(4-\tfrac58\right)\right) = \frac{e^2}{a}\left(Z^2 - \frac{27}{8}\,Z\right) \qquad (14.9) $$

The minimum value occurs for $Z = Z_* = 27/16$ and the corresponding value for the energy is [Helium atom: variational estimate of ground state energy]

$$ E_* = -\frac{e^2}{a}\left(\frac{27}{16}\right)^2 \approx -2.85\,\frac{e^2}{a} \qquad (14.10) $$

For comparison, we have the values

$$ E_{\rm exp} = -2.904\,\frac{e^2}{a}, \qquad E_{\rm pert} = -2.75\,\frac{e^2}{a} \qquad (14.11) $$

The result from perturbation theory is up to first order in the inter-electron interaction.

The variational estimate in (14.10) is closer to the experimental value. We also see that

the minimum is at Z∗ = (27/16) which is a little lower than 2, as expected from the

screening arguments presented earlier.
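Since $\langle E\rangle(Z)$ in (14.9) is just a quadratic in $Z$, the minimization is elementary, but it also makes a convenient one-parameter illustration of the variational recipe in code (ours), in units of $e^2/a$:

```python
from scipy.optimize import minimize_scalar

E = lambda Z: Z**2 - (27/8)*Z          # <E>(Z) in units of e^2/a, eq. (14.9)
res = minimize_scalar(E, bounds=(1.0, 2.0), method='bounded')
print(res.x, E(res.x))   # -> Z* = 27/16 = 1.6875,  E* = -(27/16)^2 ~ -2.85
```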

In many instances, the variational method, with a good ansatz based on physical

reasoning, can give a good estimate for the ground state energy. By considering more

general ansätze with more free parameters, one can improve the estimate as well. In

perturbation theory there are ways to estimate the error one incurs in stopping at a

certain order. The main problem with the variational method is that there is no way to

estimate the error for a chosen ansatz.

14.3 Another example: |x| potential

As another example of the variational method, we consider a one-dimensional prob-

lem with the potential energy V (x) = λ |x|. The potential is zero at the origin and rises

linearly in both directions. Thus we expect that the ground state wave function will

be peaked near the origin and fall off as x→ ±∞. Since the potential is symmetric, a

good ansatz for the ground state might be

$$ \psi = e^{-a x^2/2} \qquad (14.12) $$

where $a$ is a constant to be determined by minimizing the energy. We then find

$$ I = \int dx\;\psi^*\psi = \int dx\; e^{-ax^2} = \sqrt{\frac{\pi}{a}} $$
$$ \left\langle\frac{p^2}{2m}\right\rangle = \frac{1}{I}\,\frac{\hbar^2}{2m}\int dx\;\frac{d\psi^*}{dx}\frac{d\psi}{dx} = \frac{1}{I}\,\frac{\hbar^2}{2m}\int dx\; a^2x^2\, e^{-ax^2} = \frac{\hbar^2 a^2}{2mI}\left(-\frac{\partial}{\partial a}\right)\sqrt{\frac{\pi}{a}} = \frac{\hbar^2 a}{4m} $$
$$ \langle V\rangle = \frac{1}{I}\int dx\;\lambda|x|\, e^{-ax^2} = \frac{2\lambda}{I}\int_0^\infty dx\; x\, e^{-ax^2} = \frac{\lambda}{\sqrt{\pi a}} \qquad (14.13) $$

This gives the expectation value for the energy as

$$ \langle E\rangle = \frac{\hbar^2 a}{4m} + \frac{\lambda}{\sqrt{\pi a}} \qquad (14.14) $$

The minimum occurs at $a_*$ given by

$$ \frac{\partial\langle E\rangle}{\partial a} = \frac{\hbar^2}{4m} - \frac{\lambda}{2\sqrt{\pi}}\, a_*^{-3/2} = 0 \;\;\Longrightarrow\;\; a_* = \left(\frac{2m\lambda}{\hbar^2}\right)^{2/3}\pi^{-1/3} \qquad (14.15) $$

The variational estimate of the ground state energy is thus

$$ E_* = \langle E\rangle\Big|_{a=a_*} = \frac32\,\lambda^{2/3}\left(\frac{\hbar^2}{2\pi m}\right)^{1/3} \qquad (14.16) $$
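It is instructive to carry out the minimization of (14.14) numerically and compare with the exact answer. For $V = \lambda|x|$ the exact ground state is an Airy function and its energy is $|a'_1|\,(\hbar^2\lambda^2/2m)^{1/3}$, with $a'_1$ the first zero of $\mathrm{Ai}'$, a standard result quoted here only for comparison (it is not derived in these notes). A sketch (ours) with $\hbar = m = \lambda = 1$:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import ai_zeros

hbar = m = lam = 1.0

E = lambda a: hbar**2*a/(4*m) + lam/np.sqrt(np.pi*a)   # eq. (14.14)
res = minimize_scalar(E, bounds=(0.05, 10.0), method='bounded')
print(res.x, E(res.x))    # a* ~ 1.084, E* ~ 0.813, matching (14.15), (14.16)

ap1 = ai_zeros(1)[1][0]   # first zero of Ai', ~ -1.0188
print(-ap1*(hbar**2*lam**2/(2*m))**(1/3))   # exact E0 ~ 0.809
```

The variational value lies about half a percent above the exact one, a typical level of accuracy for a one-parameter Gaussian ansatz.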


15 Scattering

15.1 Basic framework and the Born approximation

Scattering is the technique used to obtain information about most of the interactions

between particles. The general set-up is as follows. We have an incoming stream of

particles which scatter from a target (which may be another particle, atom, nucleus,

etc.). Detectors placed at various angles relative to the incoming direction detect the

scattered particles. The key quantity of interest in this case is the scattering cross

section which we will now define. Assume that we designate the incoming direction

of the beam of particles by the unit vector $\hat n = (0,0,1)$; i.e., we take the incoming particles to be moving along the z-axis. Consider a detector placed at an angle $(\theta,\varphi)$ relative to the incoming direction. This direction corresponds to the unit vector $\hat r = (\sin\theta\cos\varphi,\ \sin\theta\sin\varphi,\ \cos\theta)$. Let the detector have an area $d\vec S = \hat r\, r^2\, d\Omega$ covering a small solid angle $d\Omega$. Now we have the equation for the conservation of probability

which was obtained in (7.33) as

$$ \frac{\partial\rho}{\partial t} = -\nabla\cdot\vec J, \qquad \vec J = -\frac{i\hbar}{2m}\left[\psi^*\nabla\psi - (\nabla\psi^*)\psi\right] \qquad (15.1) $$

where $\rho = \psi^*\psi$. If we integrate this over a volume $V$, we get

$$ \frac{\partial}{\partial t}\int_V d^3x\;\rho = -\oint_{\partial V}\vec J\cdot d\vec S \qquad (15.2) $$

(This was also obtained earlier as equation (7.34).) Equation (15.2) shows that the probability of scattering into the detector placed in the direction $\hat r$ per unit time (i.e., the probability rate for scattering) is given by $\vec J\cdot d\vec S$ where $\vec J$ corresponds to the scattered part of the wave function. The differential cross section for scattering is defined as [Differential scattering cross section]

$$ d\sigma = \frac{\text{Probability rate of scattering into solid angle } d\Omega \text{ in the direction }\hat r}{\text{Incident flux}} \qquad (15.3) $$

The incident flux is the probability which crosses unit area per unit time of the incident beam. This can be written as $\vec J_{\rm inc}\cdot\hat n$, while the one for the scattered wave is $r^2\,\vec J_{\rm scat}\cdot\hat r\; d\Omega$. We are interested in the detector placed far away from the scattering center, so we can take the large $r$ limit in calculating this quantity. The differential cross section is thus

$$ d\sigma = \left[\frac{r^2\,|\vec J_{\rm scat}\cdot\hat r|}{|\vec J_{\rm inc}\cdot\hat n|}\right]_{r\to\infty} d\Omega \qquad (15.4) $$

The cross section has the dimensions of area, as should be clear from this equation.

The cross section is the quantity which captures the essence of the interaction for the


particles independent of the set-up used for the scattering. The incident flux may vary

from set-up to set-up, and the amount of time for which the experiment is run also

varies. dσ eliminates such dependences by dividing out by the incident flux. This also

means that, theoretically, we can calculate dσ using any suitable framework, choosing

normalizations for the wave functions in a convenient way.

Notice that from the definition, the total number of events at the detector at $\hat r$ is obtained as [How to use the cross section]

Number of events in $d\Omega$ at $\hat r$ = $d\sigma$ × (number of particles per unit area per unit time in the incident beam for the relevant experimental facility) × (running time of the experiment)   (15.5)

So the number of events measured at any facility can be directly related to the theoret-

ical calculation of dσ and, from this, we can obtain information about the interaction.

To initiate the theoretical discussion, if the interaction between the incoming particles and the scattering center is denoted by a potential $V(x)$, the wave functions must satisfy the Schrödinger equation, which we can write as

$$ \left(\nabla^2 + \frac{2mE}{\hbar^2} - \frac{2mV(x)}{\hbar^2}\right)\psi = 0 \qquad (15.6) $$

We will consider the case where the interaction is of short range, so that as $|\vec x|\to\infty$, $V(x)\to 0$. Thus for large values of $|\vec x|$, the above equation reduces to

$$ \left(\nabla^2 + \frac{2mE}{\hbar^2}\right)\psi \approx 0 \qquad (15.7) $$

This equation certainly holds for the incident beam when it is far away from the scattering center. The solution is of the form

$$ \psi_0(x) = A\, e^{i\vec k\cdot\vec x} \qquad (15.8) $$

where $k^2 = 2mE/\hbar^2$. Since we have free particles far away from the scattering center, $E$ is necessarily positive, so $k$ is real.

We will first consider a perturbative calculation of the scattering amplitude. Thus we can ask for a solution for $\psi$ obeying (15.6) with $\psi_0$ as the unperturbed solution. The simplest way to formulate this is to use Green's functions for the differential operator $\nabla^2 + k^2$. (This operator is often called the Helmholtz operator.) Green's functions are generally inverses of differential operators. For the relevant operator for us, we define

$$ \left(\nabla^2_x + k^2\right) G(x,x') = \delta^{(3)}(x-x') \qquad (15.9) $$

In terms of this, we can convert the differential equation (15.6) to an integral equation,

$$ \psi(x) = \psi_0(x) + \int d^3x'\; G(x,x')\left[\frac{2mV(x')}{\hbar^2}\right]\psi(x') \qquad (15.10) $$

Applying $\nabla^2_x + k^2$ to this equation and using (15.9) and the fact that $(\nabla^2+k^2)\psi_0 = 0$, we see that we do indeed reproduce (15.6).

The simplest way to calculate the Green's function $G(x,x')$ is to use the Fourier representation

$$ \delta^{(3)}(x-x') = \int\frac{d^3p}{(2\pi)^3}\; e^{i\vec p\cdot(\vec x-\vec x')}, \qquad G(x,x') = \int\frac{d^3p}{(2\pi)^3}\; G(p)\; e^{i\vec p\cdot(\vec x-\vec x')} \qquad (15.11) $$

Using this representation in (15.9), we find

$$ G(x,x') = -\int\frac{d^3p}{(2\pi)^3}\;\frac{1}{p^2-k^2}\; e^{i\vec p\cdot(\vec x-\vec x')} \qquad (15.12) $$

Notice that there could be a problem for the integration since there is a singularity at $p^2 = k^2$. The simplest way to deal with this is to consider a slightly modified version where the singularity is moved off the real values of $p^2$ by a small amount $i\epsilon$, $\epsilon > 0$, and then set $\epsilon$ to zero at the end of the calculation. Thus we consider the modified version

$$ G(x,x') = -\int\frac{d^3p}{(2\pi)^3}\;\frac{1}{p^2-k^2-i\epsilon}\; e^{i\vec p\cdot(\vec x-\vec x')} \qquad (15.13) $$

There are other ways to shift the singularities. For example, we can consider $p^2-k^2+i\epsilon$, or $p^2-(k+i\epsilon)^2$, etc. Since (15.9) is a second order differential equation, there will be two independent solutions, corresponding to two ways of shifting the singularities. Other choices will end up as linear combinations of the basic two ways. For us, we can take these two independent ways as corresponding to using $p^2-k^2\mp i\epsilon$.

We now evaluate $G(x,x')$ in (15.13) as follows. Writing $R = |\vec x-\vec x'|$,

$$ G(x,x') = -\frac{1}{8\pi^3}\int_0^\infty p^2 dp\;\sin\theta\, d\theta\, d\varphi\;\frac{1}{p^2-k^2-i\epsilon}\; e^{ipR\cos\theta} $$
$$ = -\frac{1}{4\pi^2}\int_0^\infty p\, dp\;\frac{1}{p^2-k^2-i\epsilon}\left[\frac{e^{ipR}-e^{-ipR}}{iR}\right] $$
$$ = -\frac{1}{4\pi^2 iR}\left[\int_0^\infty dp\;\frac{p\, e^{ipR}}{p^2-k^2-i\epsilon} - \int_0^{-\infty} dp\;\frac{p\, e^{ipR}}{p^2-k^2-i\epsilon}\right] $$
$$ = \frac{i}{4\pi^2 R}\int_{-\infty}^\infty dp\;\frac{p\, e^{ipR}}{p^2-k^2-i\epsilon} = \frac{1}{4\pi^2 R}\,\frac{\partial}{\partial R}\int_{-\infty}^\infty dp\;\frac{e^{ipR}}{p^2-k^2-i\epsilon} \qquad (15.14) $$

In the third line of this equation, we made a change of variables p→ −p. The remaining

integration in (15.14) can be done by using complex integration. We take p to be a

complex variable. The contour of integration is chosen to be along the real axis and it

is completed into a closed contour by a large semicircle in the upper half-plane. Since


$e^{ipR}\to 0$ exponentially for positive imaginary part of $p$, completion of the contour in the upper half-plane guarantees that there is no contribution from the large semicircle and the result is just what we need, namely the integral along the real axis. The poles from the denominator are at $p = \pm\sqrt{k^2+i\epsilon} = \pm(k+i\eta)$, $\eta = \epsilon/2k > 0$. Thus for the contour under consideration, the only pole which contributes is $k+i\eta$, giving the result

$$ \int_{-\infty}^{\infty} dp\;\frac{e^{ipR}}{p^2-k^2-i\epsilon} = 2\pi i\;\frac{1}{2k}\; e^{ikR} \qquad (15.15) $$

The Green's function is thus [Green's function for Helmholtz operator]

$$ G(x,x') = -\frac{1}{4\pi|\vec x-\vec x'|}\; e^{ik|\vec x-\vec x'|} \qquad (15.16) $$

Notice that for large $|\vec x|$, this corresponds to outgoing spherical waves. The other choice of using $p^2-k^2+i\epsilon$ will yield incoming spherical waves. For the scattering problem, after scattering, we are interested in the outgoing scattered waves, so for us the contour chosen by using $p^2-k^2-i\epsilon$ is the right one.

The integral equation (15.10) now becomes [Integral equation for scattering]

$$ \psi(x) = \psi_0(x) - \frac{1}{4\pi}\int d^3x'\;\frac{e^{ik|\vec x-\vec x'|}}{|\vec x-\vec x'|}\left[\frac{2mV(x')}{\hbar^2}\right]\psi(x') \qquad (15.17) $$

We can solve this equation iteratively by writing $\psi = \psi_0 + \psi_1 + \psi_2 + \cdots$, where $\psi_n$ corresponds to the modification due to $n$ powers of the potential $V$. For the first correction we find

$$ \psi_1 = -\frac{1}{4\pi}\int d^3x'\;\frac{e^{ik|\vec x-\vec x'|}}{|\vec x-\vec x'|}\left[\frac{2mV(x')}{\hbar^2}\right]\psi_0(x') \qquad (15.18) $$

We are interested in the large $r$ behavior of this function. For this, we use

$$ |\vec x-\vec x'| = \sqrt{r^2 + r'^2 - 2\vec x\cdot\vec x'} = r\sqrt{1 + \frac{r'^2}{r^2} - \frac{2\,\hat r\cdot\vec x'}{r}} \approx r - \hat r\cdot\vec x' \qquad (15.19) $$

Using this result,

$$ \psi_1 \approx -\frac{1}{4\pi r}\, e^{ikr}\, A\int d^3x'\left[\frac{2mV(x')}{\hbar^2}\right] e^{i(\vec k-\vec k')\cdot\vec x'} \qquad (15.20) $$

In this equation, we used (15.8) for $\psi_0(x')$. Further, since we are observing the particles coming out in the direction $\hat r$, $k\hat r = \vec k'$ is the wave vector of the outgoing waves. We have used this result in writing out the exponential factor in (15.20). Notice however that the magnitude of $\vec k'$ is the same as that for $\vec k$: $\vec k'\cdot\vec k' = k^2$.

Equation (15.20) gives the first approximation to the scattered wave, [Born approximation]

$$ \psi_1 \approx A\;\frac{e^{ikr}}{r}\; f(\vec k,\vec k') $$
$$ f(\vec k,\vec k') = -\left[\frac{m}{2\pi\hbar^2}\right]\int d^3x'\; e^{i(\vec k-\vec k')\cdot\vec x'}\; V(x') \qquad (15.21) $$

This first approximation to the scattered wave is known as the Born approximation. Apart from constant factors, $f$ is the Fourier transform of the potential, $\vec k - \vec k'$ being the Fourier variable. The probability currents are given by

$$ \vec J_{\rm inc} = \frac{\hbar\vec k}{m}\,|A|^2 = \frac{\hbar k}{m}\,\hat n\,|A|^2, \qquad \vec J_{\rm scat} = \frac{1}{r^2}\,\frac{\hbar k\,\hat r}{m}\,|A|^2\; |f(\vec k,\vec k')|^2 \qquad (15.22) $$

The differential scattering cross section is thus given by

$$ d\sigma = |f(\vec k,\vec k')|^2\; d\Omega \qquad (15.23) $$

15.2 Scattering by Yukawa and Coulomb potentials

As the first example of scattering by a potential, we consider the Yukawa potential

$$ V(x) = g\,\frac{e^{-\lambda r}}{r} \qquad (15.24) $$

This is evidently a short range potential as it falls off exponentially with distance. A

measure of the effective range of the interaction is given by (1/λ). This potential was

originally suggested as the potential for internucleon interactions inside the nucleus,

which was expected to be of short range, of the order of the size of a nucleus. Yukawa

argued (around 1935) that such a potential could be generated by the exchange of

a particle of mass $m \approx \hbar\lambda/c$, which is about 140 MeV based on the known range of

nuclear forces. This was Yukawa’s prediction of the π-meson which was confirmed by

direct detection of the meson in 1948.

The Fourier transform of this potential is given by

$$ \int d^3x'\; e^{i\vec q\cdot\vec x'}\, V(x') = g\int r'\,dr'\, e^{-\lambda r'}\, e^{iqr'\cos\theta}\,\sin\theta\, d\theta\, d\varphi = 2\pi g\,\frac{1}{iq}\left[\int_0^\infty dr'\, e^{-(\lambda-iq)r'} - \text{complex conjugate}\right] = \frac{4\pi g}{q^2+\lambda^2} \qquad (15.25) $$

For the amplitude $f(\vec k,\vec k')$, $\vec q = \vec k - \vec k'$, so that

$$ q^2 = k^2 + k'^2 - 2\vec k\cdot\vec k' = 2k^2 - 2k^2\cos\theta = 4k^2\sin^2(\theta/2) \qquad (15.26) $$

Using (15.25) and (15.26), the amplitude from (15.21) is given by

$$ f(\vec k,\vec k') = -\frac{2gm}{\hbar^2}\;\frac{1}{4k^2\sin^2(\theta/2)+\lambda^2} \qquad (15.27) $$

This gives the scattering cross section [Yukawa potential: cross section for scattering]

$$ d\sigma = \left(\frac{2gm}{\hbar^2}\right)^2\left[\frac{1}{4k^2\sin^2(\theta/2)+\lambda^2}\right]^2 d\Omega \qquad (15.28) $$

Sometimes we are interested in the total number of events detected over all angles, i.e., by all detectors placed so as to cover the entire solid angle of $4\pi$. The relevant quantity for this is the total cross section, given by integrating over all angles. For the Yukawa potential it is given by

$$ \sigma = \int d\sigma = \left(\frac{2gm}{\hbar^2}\right)^2 \frac{4\pi}{\lambda^2(4k^2+\lambda^2)} \qquad (15.29) $$
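For any central potential, the angular integration in (15.21) can be done once and for all, leaving $f(q) = -(2m/\hbar^2 q)\int_0^\infty dr\, r\sin(qr)\, V(r)$, the same reduction used implicitly in (15.25). This makes a quick numerical cross-check of (15.27) possible; a sketch (ours) with $\hbar = m = 1$ and demo values for the parameters:

```python
import numpy as np
from scipy.integrate import quad

hbar = m = 1.0
g, lam = 1.0, 1.0   # Yukawa strength and screening parameter (demo values)

def f_born(q, V):
    """Born amplitude for a central potential, reduced to a radial integral."""
    I = quad(lambda r: r*np.sin(q*r)*V(r), 0, 100, limit=500)[0]
    return -(2*m/(hbar**2*q))*I

V_yukawa = lambda r: g*np.exp(-lam*r)/r
q = 0.7
print(f_born(q, V_yukawa))                 # numerical radial integral
print(-(2*g*m/hbar**2)/(q**2 + lam**2))    # closed form (15.27): same number
```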

The Coulomb potential is a special case because the potential is not of short range.

Therefore a proper treatment of scattering by a Coulomb potential needs a more

careful analysis. The Schrödinger equation for the Coulomb potential is essentially

the same as what we solved for the bound states of the Hydrogen atom, except that

we must now look for solutions with positive energy rather than negative energy.

The fact that the Coulomb potential does not fall off at a fast enough rate to be

considered a short range interaction means that there is nontrivial modification of

the phase of the wave functions even at spatial infinity. The scattering wave solutions

of the Schrödinger equation, taking account of such modifications at infinity, have

been worked out, originally by Mott, leading to a proper treatment of scattering by

a Coulomb potential. Here, instead of working through this approach, we will use a

trick to get the answer. We consider the Yukawa potential and take λ → 0. We take

g = q1q2 so that we can compare with the scattering of two particles of charges q1, q2

against each other. The differential scattering cross section is thus given by [Coulomb scattering: Rutherford cross section]

$$ d\sigma = \left(\frac{q_1q_2}{\hbar^2}\right)^2 m^2\;\frac{1}{4k^4\sin^4(\theta/2)}\; d\Omega = (q_1q_2)^2\;\frac{1}{16 E^2\sin^4(\theta/2)}\; d\Omega \qquad (15.30) $$

In the second line, we have expressed the result in terms of $E = \hbar^2k^2/(2m)$. This result

is known as the Rutherford cross section. It was derived within classical electrody-

namics by Rutherford in 1911 and formed the basis for interpreting the results on

scattering of α-particles by a gold foil, leading to the Rutherford-Bohr model of the

atom. Notice that all powers of ~ have disappeared in the result when expressed in

terms of the energy, consistent with a classical derivation.


While the differential cross section is correct and has been verified in many ex-

periments, we do have a problem if we consider the total cross section. The limit of

λ→ 0 does not exist for (15.29). This is again because of the long range nature of the

Coulomb potential. It is also related to the fact that photons have zero mass, which

means that it is possible to have photons with almost zero energy corresponding to

very long wavelengths. A charged particle which is scattered has acceleration and

therefore can radiate photons. It becomes difficult to separate scattering with no

radiation of photons from scattering with radiation of photons of almost zero energy,

if the energy resolution of the detector is not perfect. This is the key problem. As a

result, the finite resolution of the detector plays the role of a nonzero, albeit very small,

value for λ. In this way, for the total cross section we get a finite result, but which is

determined by the resolution of the detector.

15.3 Another short range potential

As another example of scattering by a short range potential, within the Born approximation, we consider [Scattering: Gaussian potential]

$$ V(r) = V_0\; e^{-ar^2} \qquad (15.31) $$

where $V_0$ and $a$ are constants. In this case, the scattering amplitude is given by

$$ f(\vec k,\vec k') = -V_0\left[\frac{m}{2\pi\hbar^2}\right]\int d^3x'\; e^{i(\vec k-\vec k')\cdot\vec x'}\; e^{-ar'^2} $$
$$ = -\left[\frac{mV_0}{2\pi\hbar^2}\right]\int d^3x'\; e^{-a\left(\vec x' - i\vec q/2a\right)^2}\; e^{-q^2/4a} = -\left[\frac{mV_0}{2\pi\hbar^2}\right]\left(\frac{\pi}{a}\right)^{3/2} e^{-q^2/4a} \qquad (15.32) $$

The differential scattering cross section is thus

$$ d\sigma = |f(\vec k,\vec k')|^2\, d\Omega = \frac{\pi m^2V_0^2}{4a^3\hbar^4}\,\exp\left(-\frac{q^2}{2a}\right) d\Omega = \frac{\pi m^2V_0^2}{4a^3\hbar^4}\,\exp\left(-\frac{2k^2}{a}\sin^2(\theta/2)\right) d\Omega \qquad (15.33) $$

where we have again simplified $q^2$ in terms of the scattering angle $\theta$ as $q^2 = 4k^2\sin^2(\theta/2)$.

15.4 The method of partial waves

We now work out another approach to scattering which involves explicit nonpertur-

bative solutions to the Schrödinger equation. We consider spherically symmetric

potentials $V(r)$. The Schrödinger equation takes the form given in (15.6),

$$ \left(\nabla^2 + \frac{2mE}{\hbar^2} - \frac{2mV(x)}{\hbar^2}\right)\psi = 0 \qquad (15.34) $$


We can do a separation of variables and write the wave function in the form $\psi = R(r)\,Y^m_l(\theta,\varphi)$, as we did for the solution of the Hydrogen atom. In the present case, we have an incoming beam of particles along one axis, say the z-axis. There is symmetry of rotations around the z-axis and this is preserved by the scattering since the potential has full symmetry under all rotations. Thus, we can simplify the form of the wave function as

$$ \psi = \sum_{l=0}^\infty C_l\, P_l(\cos\theta)\, R_l(r) \qquad (15.35) $$

where $P_l$ are the Legendre polynomials defined by

$$ P_l(w) = \frac{1}{2^l\, l!}\,\frac{d^l}{dw^l}\,(w^2-1)^l \qquad (15.36) $$

The radial equation for the functions $R_l$ is given by

$$ \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R_l}{\partial r}\right) + \left[k^2 - \frac{l(l+1)}{r^2} - \frac{2mV}{\hbar^2}\right] R_l = 0 \qquad (15.37) $$

As we did for the Hydrogen atom, we can write $G_l = rR_l$, which converts (15.37) to the form

$$ \frac{d^2}{dr^2}\,G_l + \left[k^2 - \frac{l(l+1)}{r^2} - \frac{2mV}{\hbar^2}\right] G_l = 0 \qquad (15.38) $$

For the purpose of scattering, the large $r$ behavior of the solution is what is important. For a short range potential, the large $r$ behavior is given by the equation

$$ \frac{d^2}{dr^2}\,G_l + k^2\, G_l \approx 0 \qquad (15.39) $$

The general solution of this equation is of the form

$$ G_l \approx A\,\sin\left(kr + \delta_l - \frac{l\pi}{2}\right) \qquad (15.40) $$

where $A$ and $\delta_l$ are constants; the $\delta_l$ will be referred to as the phase shifts. Since $R_l = G_l/r$, we can write the asymptotic behavior of $\psi$ as [Partial waves: asymptotic behavior of wave function]

$$ \psi \approx A\sum_l C_l\, P_l(\cos\theta)\,\frac{\sin\left(kr+\delta_l-\frac{l\pi}{2}\right)}{kr} \qquad (15.41) $$

The extra factor of $k$ in the denominator can be viewed as part of the as yet undetermined constants $C_l$. It is convenient to write it this way for later use. This is the asymptotic behavior of the full wave function $\psi$. It should thus have a part which corresponds to the incoming waves and a part which corresponds to the scattered

part. We have to separate these out to identify the scattering amplitude. The incoming

wave is a plane wave moving along the z-axis, so it has the form Aeikz. Therefore we

can write

$$ A\sum_l C_l\, P_l(\cos\theta)\,\frac{\sin\left(kr+\delta_l-\frac{l\pi}{2}\right)}{kr} \;\approx\; A\, e^{ikz} + A\, f(\theta)\,\frac{e^{ikr}}{r} \qquad (15.42) $$

To isolate $f(\theta)$, it is useful to write $e^{ikz}$ in spherical coordinates. Since $z = r\cos\theta$, this can be expanded in terms of $P_l(\cos\theta)$ as

$$ e^{ikz} = \sum_l F_l(r)\, P_l(\cos\theta) \qquad (15.43) $$

Using the integration formula

$$ \int_0^\pi d\theta\;\sin\theta\; P_l(\cos\theta)\, P_{l'}(\cos\theta) = \frac{2}{2l+1}\,\delta_{ll'} \qquad (15.44) $$

we can invert (15.43) to write

$$ F_l(r) = \frac{2l+1}{2}\int_0^\pi d\theta\;\sin\theta\; e^{iw\cos\theta}\, P_l(\cos\theta) \qquad (15.45) $$

where $w = kr$. Thus the $r$-dependence of $F_l$ is only through the combination $kr$. It is more conventional to write this as $F_l = i^l(2l+1)\, j_l$, where [Spherical Bessel functions]

$$ j_l(w) = \frac{(-i)^l}{2}\int_0^\pi d\theta\;\sin\theta\; e^{iw\cos\theta}\, P_l(\cos\theta) \qquad (15.46) $$

The functions $j_l$ are known as the spherical Bessel functions. They can be related to the usual Bessel functions, but this is not important for us. We can take (15.46) as the definition of the spherical Bessel functions, defined via this integral representation. Using this back in (15.43), we can write it as [Bauer's formula]

$$ e^{ikz} = \sum_l i^l(2l+1)\, P_l(\cos\theta)\; j_l(kr) \qquad (15.47) $$

This expansion is known as Bauer's formula. While the expansion (15.47) may be used for all values of $r$, what is relevant for us is the asymptotic behavior, for large $r$. We can obtain this from the integral representation by a series of integrations by parts. Thus we start by writing (15.46) as

$$ j_l(w) = \frac{(-i)^l}{2}\int_{-1}^1 du\; e^{iwu}\, P_l(u) \qquad (15.48) $$

where we have used $u = \cos\theta$. Carrying out the integration over $e^{iwu}$, we can write this as

$$ \int_{-1}^1 du\; e^{iwu}\, P_l(u) = \left[\frac{e^{iwu}}{iw}\,P_l(u)\right]_{-1}^{1} - \int du\left[\frac{e^{iwu}}{iw}\right]\frac{dP_l(u)}{du} $$
$$ = \left[\frac{e^{iwu}}{iw}\,P_l(u)\right]_{-1}^{1} - \left[\frac{e^{iwu}}{(iw)^2}\,\frac{dP_l(u)}{du}\right]_{-1}^{1} + \int du\left[\frac{e^{iwu}}{(iw)^2}\right]\frac{d^2P_l(u)}{du^2} \qquad (15.49) $$

The series can be continued to obtain the expansion in powers of $1/w$. The leading term for the large $r$ expansion is the first one, which gives [Spherical Bessel functions: asymptotic behavior]

$$ j_l(w) \approx \frac{(-i)^l}{2}\left[\frac{e^{iw}-(-1)^l\, e^{-iw}}{iw}\right] = \frac{1}{2iw}\left[e^{i(w-l\pi/2)} - e^{-i(w-l\pi/2)}\right] = \frac{\sin\left(w-\frac{l\pi}{2}\right)}{w} \qquad (15.50) $$

We have used $P_l(1) = 1$, $P_l(-1) = (-1)^l$ and $i^l = e^{il\pi/2}$. With the result (15.50), the asymptotic behavior for $e^{ikz}$ from Bauer's formula is

$$ e^{ikz} \approx \sum_l i^l(2l+1)\, P_l(\cos\theta)\,\frac{\sin(kr-l\pi/2)}{kr} \qquad (15.51) $$
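Both the integral representation (15.46)/(15.48) and the asymptotic form (15.50) can be checked against library spherical Bessel functions. A sketch (ours, not part of the original notes):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import spherical_jn, eval_legendre

def jl_from_integral(l, w):
    """Spherical Bessel function via the representation (15.48)."""
    re = quad(lambda u: np.cos(w*u)*eval_legendre(l, u), -1, 1)[0]
    im = quad(lambda u: np.sin(w*u)*eval_legendre(l, u), -1, 1)[0]
    return ((-1j)**l * (re + 1j*im)/2).real

w = 25.0
for l in (0, 1, 2):
    print(jl_from_integral(l, w),      # integral representation (15.48)
          spherical_jn(l, w),          # library value
          np.sin(w - l*np.pi/2)/w)     # asymptotic form (15.50)
```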

We use this in (15.42) to identify the scattering amplitude. Writing $\sin x = (e^{ix}-e^{-ix})/2i$, we get

$$ \sum_l C_l\, P_l(\cos\theta)\,\frac{1}{2ikr}\left[e^{i(kr+\delta_l-l\pi/2)} - e^{-i(kr+\delta_l-l\pi/2)}\right] \approx \sum_l i^l(2l+1)\, P_l(\cos\theta)\,\frac{1}{2ikr}\left[e^{i(kr-l\pi/2)} - e^{-i(kr-l\pi/2)}\right] + f(\theta)\,\frac{e^{ikr}}{r} \qquad (15.52) $$

We can now balance the coefficients of $e^{ikr}$ and $e^{-ikr}$ on both sides of this equation. Since the $f(\theta)$-term does not have a factor $e^{-ikr}$, we find

$$ C_l\, e^{-i\delta_l} = i^l(2l+1), \quad\text{or}\quad C_l = i^l(2l+1)\, e^{i\delta_l} \qquad (15.53) $$

The coefficients have to match for each $P_l(\cos\theta)$ separately, since the latter functions form an orthogonal complete set for functions of $\theta$. From the coefficient of $e^{ikr}$, we then get [Scattering amplitude in terms of phase shifts]

$$ f(\theta) = \sum_l\left[C_l\, P_l(\cos\theta)\,\frac{(-i)^l}{2ik}\, e^{i\delta_l} - (2l+1)\, P_l(\cos\theta)\,\frac{1}{2ik}\right] = \frac1k\sum_l(2l+1)\, P_l(\cos\theta)\left(\frac{e^{2i\delta_l}-1}{2i}\right) = \frac1k\sum_l(2l+1)\, e^{i\delta_l}\sin\delta_l\; P_l(\cos\theta) \qquad (15.54) $$

where we used the result (15.53) for $C_l$. Taking the square of this amplitude, the differential scattering cross section is [Differential cross section in terms of phase shifts]

$$ d\sigma = |f(\theta)|^2\, d\Omega = \frac{1}{k^2}\left|\sum_l(2l+1)\, e^{i\delta_l}\sin\delta_l\; P_l(\cos\theta)\right|^2 d\Omega \qquad (15.55) $$

Once we calculate the phase shifts δl, this formula gives the cross section. In practice,

this method is useful only if it is a good approximation to restrict ourselves to a few

low values of l. In such cases, we have only a few phase shifts to evaluate. A good

feature of the formula (15.55) is that the angular dependence of the differential

cross section is explicitly given in terms of the Legendre polynomials, so, even when

we do not know the phase shifts explicitly, we know the angular dependence (if it is a

good approximation to consider a few values of l).

Since the wave function is decomposed into modes corresponding to different

angular momenta, this method is known as the method of partial waves. The l = 0

component is the S-wave, l = 1 component is the P -wave, etc.

There are some general features of scattering we can obtain from (15.55) before we

deal with the actual calculation of the phase shifts. Since the angular dependence is

explicitly given by the Legendre polynomials, we can integrate over all angles to obtain

a formula for the total cross section. Using the integration formula (15.44), we find Total cross

section in termsof phase shifts

σ =1

k2

∑l,l′

(2l + 1)(2l′ + 1)eiδl−iδl′ sin δl sin δl′

∫dϕdθ sin θ Pl(cos θ)Pl′(cos θ)

=2π

k2

∑l,l′

(2l + 1)(2l′ + 1)eiδl−iδl′ sin δl sin δl′

(2

2l + 1δll′

)=

k2

∑l

(2l + 1) sin2 δl ≡∑l

σl (15.56)

($\delta_{ll'}$ is the Kronecker delta, equal to 1 if $l = l'$, zero otherwise.) If we consider the total cross section for each partial wave separately, namely, each $\sigma_l$, the fact that $\sin^2\delta_l \leq 1$ gives a bound, [Bounds on partial cross sections]

$$ \sigma_l \leq \frac{4\pi}{k^2}\,(2l+1) \qquad (15.57) $$

This is a very important bound; it tells us that the cross sections should decrease with

increasing energy of the incoming particles, since $(\hbar^2k^2/2m) = E$. This result is related

to the conservation of probability and is known as a unitarity bound.

Another important result is obtained from separating the scattering amplitude into real and imaginary parts, by writing $e^{i\delta_l} = \cos\delta_l + i\sin\delta_l$. The imaginary part of $f$ is thus given by

$$ \mathrm{Im}\, f(\theta) = \frac1k\sum_l(2l+1)\sin^2\delta_l\; P_l(\cos\theta) \qquad (15.58) $$

We also have $P_l(\cos\theta) = P_l(1) = 1$ at $\theta = 0$, so that the imaginary part of the forward (i.e., $\theta = 0$) scattering amplitude is

$$ \mathrm{Im}\, f(0) = \frac1k\sum_l(2l+1)\sin^2\delta_l \qquad (15.59) $$

Comparing this with (15.56), we see that [Optical theorem]

$$ \sigma = \frac{4\pi}{k}\,\mathrm{Im}\, f(0) \qquad (15.60) $$

This relates the total cross section to the imaginary part of the forward scattering am-

plitude. It is known as the optical theorem. It is also an expression of the conservation

of probability or unitarity of the scattering process.
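Given any finite set of phase shifts, (15.54)–(15.56) and the optical theorem can be evaluated and cross-checked directly. A sketch (ours), with made-up phase shifts for the first few partial waves:

```python
import numpy as np
from scipy.special import eval_legendre

k = 1.0
deltas = [0.8, 0.3, 0.05]   # illustrative phase shifts for l = 0, 1, 2

def f(theta):
    """Scattering amplitude (15.54)."""
    return sum((2*l + 1)*np.exp(1j*d)*np.sin(d)*eval_legendre(l, np.cos(theta))
               for l, d in enumerate(deltas))/k

sigma = 4*np.pi/k**2*sum((2*l + 1)*np.sin(d)**2 for l, d in enumerate(deltas))
print(sigma)                       # total cross section, eq. (15.56)
print(4*np.pi/k*f(0.0).imag)       # optical theorem (15.60): same number
```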

A general comment on unitarity:

In quantum mechanics, any transformation of a physical system realizable by the

action of operators must be represented as unitary transformations on the states.

This is necessary to preserve the inner product or probability amplitudes, the key

foundational principle of quantum mechanics. Applied to time-evolution, the uni-

tarity property is equivalent to the conservation of probability. But we know that

the elements of a unitary matrix must be bounded, i.e., $|U_{ji}|\leq 1$, since $(U^\dagger U)_{ii} = \sum_j U^\dagger_{ij}U_{ji} = \sum_j |U_{ji}|^2 = 1$. What this means is that, for many contexts in physics,

there are bounds on measurable quantities which follow from this fact. Scattering is a

process which is governed by the unitary time-evolution of the system, so the bounds

on cross sections are ultimately traced to this fact. Not surprisingly, such bounds are

generally known as unitarity bounds.

15.5 Validity of approximations

We have considered the Born approximation, obtained by the perturbative first order

change in the wave functions due to the scattering potential. We have also discussed

the method of partial waves. It is important to understand the range of parameters for

which these approximations are valid or useful.

First we consider the Born approximation. The first correction to the incoming wave $\psi_0$ was obtained in (15.18) as

$$ \psi_1 = -\frac{1}{4\pi}\int d^3x'\;\frac{e^{ik|\vec x-\vec x'|}}{|\vec x-\vec x'|}\left[\frac{2mV(x')}{\hbar^2}\right]\psi_0(x') \qquad (15.61) $$

A perturbative expansion like this would be reasonable if this correction is small

compared to the uncorrected initial wave. So an estimate of the validity of the Born

approximation is provided by the condition that the absolute value of ψ1 is much

smaller than the absolute value of ψ0. At large r = |~x|, which is what we considered to

extract the scattering amplitude $f(\vec k,\vec k')$, the value decreases due to the denominator $|\vec x-\vec x'|$. We expect the maximal modification of the incoming wave to be close to the

origin of the potential, so, for an estimate, we evaluate $\psi_1$ at $\vec x = 0$. This is given by

$$ \psi_1(0) = -\frac{m}{2\pi\hbar^2}\int d^3x'\;\frac{e^{ikr'}}{r'}\; A\, e^{i\vec k\cdot\vec x'}\, V(x') = -A\,\frac{2m}{\hbar^2 k}\int_0^\infty dr'\; e^{ikr'}\sin(kr')\, V(r') \qquad (15.62) $$

where, in the second line, we simplified the integral for a central potential. Since $|\psi_0| = |A|$, we see that this is small compared to the absolute value of $\psi_0$ if

$$ \frac{2}{\hbar v}\left|\int_0^\infty dr'\; e^{ikr'}\sin(kr')\, V(r')\right| \ll 1 \qquad (15.63) $$

where $v$ is the velocity of the incoming particles, $v = \hbar k/m$. Since $V(r')$ is multiplied by factors of magnitude less than one in the integral, we may think, somewhat crudely, of it as giving the strength of the potential times its range. Thus the validity of the Born approximation is given by [Validity: Born approximation]

$$ \frac{2}{\hbar v}\,\left(\text{Strength of } V \times \text{Range of } V\right) \ll 1 \qquad (15.64) $$

The factor of 1/v tells us that this can be realized for high energy incoming particles

impinging on a potential of short range.

Unlike the Born approximation, the method of partial waves works better for low

energy scattering. Consider incoming particles with an impact parameter ρ, which

is the shortest distance the particles come to the scattering center. The angular

momentum of the incoming particle at this impact parameter is ρ× p. Even though

this is a classical argument, we may take this as a rough estimate of $l\hbar$. If the range of the potential is $a$, we expect that the waves with impact parameters $\rho > a$ will not be affected very much by the scattering. Thus, we should have a bound [Validity: partial wave analysis]

$$ l \leq \frac{pa}{\hbar} \leq \left[\frac{2mEa^2}{\hbar^2}\right]^{1/2} \qquad (15.65) $$

This tells us that, for low energies and short range potentials (for which the right hand

side of the inequality (15.65) is small), we need to consider only a small number of

partial waves, making it a useful approach in practice.

15.6 The spherically symmetric hill

As an example of using partial waves for scattering, we consider the spherically sym-

metric piecewise constant potential

$$ V(r) = \begin{cases} V_0 & r < a \\ 0 & r > a \end{cases} \qquad (15.66) $$

A priori, we could consider $V_0$ to be positive or negative. If $V_0 < 0$, there can be bound states, so we will take a simple case where $V_0 > 0$ and $E > V_0$. For low energies, it is sufficient to consider just the $l = 0$ partial wave. The Schrödinger equation is given by

$$ G_0'' + k^2\, G_0 = 0, \quad r > a, \qquad k^2 = \frac{2mE}{\hbar^2} $$
$$ G_0'' + k'^2\, G_0 = 0, \quad r < a, \qquad k'^2 = \frac{2m(E-V_0)}{\hbar^2} \qquad (15.67) $$

The solution is obviously of the form

$$ G_0 = \begin{cases} A\sin(k'r), & r < a \\ B\sin(kr+\delta_0), & r > a \end{cases} \qquad (15.68) $$

Notice that the $\cos(k'r)$ solution would be singular at the origin for $R = G/r$, so it is not acceptable. We have to match the wave function and its first derivative at $r = a$. These conditions become

$$ A\sin(k'a) = B\sin(ka+\delta_0), \qquad A\, k'\cos(k'a) = B\, k\cos(ka+\delta_0) \qquad (15.69) $$

Dividing the first by the second, we identify

$$ \delta_0 = \arctan\left[\frac{k}{k'}\tan(k'a)\right] - ka \qquad (15.70) $$

The arctan function has many branches, the correct branch should be chosen so that

δ0 → 0 for V0 → 0 (or k′ = k). This is what is displayed in (15.70). The differential cross

section is then given by S-wavescattering from

spherical hilldσ =

1

k2sin2 δ0 (15.71)

with the phase shift δ0 given by (15.70).

It is also useful to consider the same potential in the Born approximation. In this

case

f = − m

2π~2

∫d3x′ V (x′) ei~q·~x

= −mV0

2π~2

∫ a

0r′2dr′

∫dϕ dθ sin θ eiqr

′ cos θ

= −mV0

q~2

1

i

[∫ a

0dr′r′eiqr

′ − complex conjugate

](15.72)

The remaining integral is easy to evaluate by integration by parts to obtain

f = −2mV0

~2

1

q3(sin qa− qa cos qa) (15.73)

15.7 Scattering by a hard sphere 154

Here q2 = (~k − ~k′)2 = 4k2 sin2(θ/2), so that q = 2k sin(θ/2). The cross section is given

by dσ = |f |2dΩ and reads Spherical hill:Born

approximation

dσ =

(2mV0

~2

)2[

sin(2ka sin(θ/2)

)− 2ka sin(θ/2) cos

(2ka sin(θ/2)

)(2k sin(θ/2))3

]2

dΩ (15.74)

15.7 Scattering by a hard sphere

Scattering by a hard sphere is an interesting case to which we can apply the method of

partial waves. The potential in this case is given by

V (r) =

∞ r < a

0 r > a(15.75)

This may be viewed as a special case of the scattering by a potential hill, with V0 →∞.

The wave function cannot penetrate into the region of the hard sphere, since the

potential energy is infinite. Thus we only have

G0 = B sin(kr + δ0), r > a (15.76)

with G0 = 0 for r < a. In this case, the matching condition is that the wave function

should vanish at r = a, which gives

δ0 = −ka (15.77)

The differential cross section is

dσ =

[sin2 ka

k2

]dΩ (15.78)

The total cross section, obtained by integrating over all angles, is Hard sphere:

Quantum crosssection

σ = 4π

[sin2 ka

k2

](15.79)

Notice that, for very low energies where k ≈ 0, this becomes 4πa2. This is a dramatic

difference compared to the classical scattering by a hard sphere, for which the cross

section is the geometrical area presented by the scattering center, namely, πa2. This

increase by a factor of 4 in the quantum case is due to diffractive effects, if we think of

it in terms of matter waves, or due to the position uncertainties if we think in more

abstract terms. For ease of this comparison, the classical calculation is as follows. The

geometry of the scattering is shown in Fig. 15.1. The impact parameter is ρ and the

scattering angle is θ. We have basically specular reflection of the incoming particles,

as the collision is elastic. Thus the incident angle and the reflected angle are the same

15.7 Scattering by a hard sphere 155

!

!!

"

#

a

Figure 15.1: Geometry of hard sphere scattering for classical analysis

at the point of contact. From the geometry, we see that ρ = a sinϕ and θ = π − 2ϕ.

Thus ρ = a cos(θ/2). The differential cross section, classically, is given by

dσ = ρdρ dϕ =a2

4dΩ (15.80)

This integrates to σ = πa2, which is the geometrical area presented to the incoming Hard sphere:Classical crosssectionbeam by the hard sphere.

156

16 Time-dependent perturbation theory

16.1 Formulation and general features

So far we have discussed basically time-independent aspects of physical systems, such

as the eigenvalues of the Hamiltonian, shifts or corrections to such eigenvalues due to

perturbations, and even scattering of particles was analyzed in a time-independent

framework. But there are many situations where a physical system is subject to time-

dependent perturbations. A classical example would be the perturbation to an atomic

system due to an incident beam of electromagnetic radiation. The atom can absorb

the radiation and go to a higher energy level. Recall that an electromagnetic plane

wave has a time-dependence due to the e−iωkt+i~k·~x factor. Another situation is where

we can stimulate the atoms to go to a higher level by external electric currents or by

heating, and the atoms can then radiate. These are time-dependent processes and so

we must consider how to treat such phenomena in a perturbative framework. This

is what we will do now. Notice that, in these situations, we start with the system in

some eigenstate of the unperturbed Hamiltonian. The perturbation then leads to a

transition of the system to another state of the unperturbed Hamiltonian. So the key

quantity of physical interest is the probability amplitude such a transition.

As always, we have to solve the Schrödinger equation to understand how this works

out. We can write it in the form

i~∂Ψ

∂t= (H0 +Hint(t)) Ψ (16.1)

where H0 is the unperturbed time-independent Hamiltonian and Hint is the pertur-

bation. To emphasize that the latter can depend on time, we have included t as

the argument of Hint. In the absence of the perturbation, the solution is clear, each

eigenstate, say Ψa, will pick up a factor e−iEt/~ of the form

φa(t) = e−iEt/~ φa(0) (16.2)

More generally, we may write the solution (16.2) for a general state, albeit rather

formally, as

φ(t) = e−iH0t/~ φ(0) (16.3)

For the perturbed situation in (16.1), we therefore seek a solution of the form

Ψ(t) = e−iH0t/~ S(t) Ψ(0) (16.4)

Substituting this into (16.1), we get[H0 e

−iH0t/~S + e−iH0t/~ i~∂S

∂t

]Ψ(0) =

[H0 e

−iH0t/~S +Hint(t) e−iH0t/~S

]Ψ(0) (16.5)

16.1 Formulation and general features 157

This equation must hold for any state Ψ(0). Removing this factor, and after cancella-

tions and multiplication by e+iH0t/~, we find Evolutionequation in

interactionpicturei~

∂S

∂t= HI(t)S

HI(t) = eiH0t/~Hint(t) e−iH0t/~ (16.6)

Before turning to the explicit solution of (16.6), we relate the operator S to what is

physically relevant or observable. The key question we want to ask in such contexts

is the following. Let us say, we start with the system in some state characterized by

the wave function φa(0); this is at time t = 0, before the perturbation is switched on or

is active, so φa(0) is an eigenstate of the unperturbed Hamiltonian. This evolves to a

state Ψ(t) = e−iH0t/~ S(t)φa(0) at time t, according to (16.4). Since we are interested in

a transition to a new state of the unperturbed system, we ask the question: Does this

state resemble some other unperturbed state φb at time t? The probability amplitude

for φb(t) to be obtained in the state Ψ(t) = e−iH0t/~ S(t)φa(0) is given by

A = 〈φb(t)| e−iH0t/~ S(t) |φa(0)〉

= 〈φb(0)| eiH0t/~ e−iH0t/~ S(t) |φa(0)〉

= 〈φb(0)| S(t) |φa(0)〉 (16.7)

Notice that the time-evolution of φb(t) is just what is given by the unperturbed Hamil-

tonian, φb(t) = e−iH0t/~ φb(0). The result (16.7) shows that we can remove the time-

dependent factors e−iH0t/~ from the unperturbed wave functions or states (i.e., we

can use the φ’s at time t = 0) and the transition amplitude is then given by the matrix

element of S for these states; i.e.,

A = 〈b|S(t) |a〉 (16.8)

Thus S is the key quantity of interest. Notice that the effect of time-evolution due to

H0 is not lost, it is included due to the factors eiH0t/~ and e−iH0t/~ in HI in (16.6).

There is another way to arrive at the equation for S, which shows a slightly different

facet of the logic involved. We can consider the set of eigenstates of the unperturbed

system as a complete set and expand the perturbed wave function as

Ψ(t) =∑a

Ca φa(t) (16.9)

Here φa(t) carries the time-dependence e−iEat/~, as is appropriate for the unperturbed

states. The strategy is to consider Ca as time-dependent coefficients. The Schrödinger

equation (16.1) then becomes

i~∑b

∂Cb∂t

φb(t) + i~∑b

Cb∂φb(t)

∂t=∑b

Cb (H0 +Hint(t))φb(t) (16.10)

16.1 Formulation and general features 158

Taking the inner product o this equation withφa(t), and using i~∂φa(t)/∂t = H0φa(t) =

Eaφa(t), we get Evolutionequation:

variation ofconstantsversion

i~∂Ca∂t

=∑b

〈φa(t)|Hint |φb(t)〉 Cb =∑b

〈a| eiH0t/~Hint(t)e−iH0t/~ |b〉 Cb (16.11)

Writing Ca(t) =∑

k SakCk(0), we find∑k

i~∂Sak∂t

Ck(0) =∑b,k

〈a| eiH0t/~Hint(t)e−iH0t/~ |b〉Sb,kCk(0) (16.12)

Since this equation must hold for any initial values Ck(0), we get the matrix equation

i~∂Sak∂t

=∑b

〈a| eiH0t/~Hint(t)e−iH0t/~ |b〉Sb,k (16.13)

This is just the matrix version of the operator equation (16.6).

Turning to the solution of the equation (16.6), the idea is that we are starting with

the system in some eigenstate of the unperturbed Hamiltonian, so that, at time t = 0,

we should have S(0) = 1. We can then convert (16.6) to an integral equation as

S(t) = 1− i

~

∫ t

0dt′HI(t

′)S(t′) (16.14)

By differentiating both sides with respect to t, we see that this does indeed lead to

(16.6). This is still an implicit equation for S since it occurs in the integral on the right

hand side as well. But we can find a solution to this equation as a power series in HI

by postulating the expansion S = 1 + S(1) + S(2) + · · · where S(n) has n powers of HI .

Substituting this into (16.14), we find

1+S(1)(t)+S(2)(t)+ · · · = 1− i

~

∫ t

0dt′HI(t

′)[1 + S(1)(t′) + S(2)(t′) + · · ·

](16.15)

Equating similar powers of HI on both sides, we find

S(1)(t) = − i~

∫ t

0dt′HI(t

′)

S(2)(t) = − i~

∫ t

0dt′HI(t

′)S(1)(t′)

=

(− i~

)2 ∫ t

0dt′∫ t′

0dt′′HI(t

′)HI(t′′), etc.

S(n)(t) =

(− i~

)n ∫ t

0dt1

∫ t1

0dt2 · · ·

∫ tn−1

0dtn HI(t1) · · ·HI(tn) (16.16)

This perturbative solution for S can be applied to many physical situations, once

we know the the perturbationHint(t). For a large class of systems, the time-dependence

16.1 Formulation and general features 159

can be Fourier decomposed into modes with time-dependence of the form e±iωt, so

we will now consider some general features of such perturbations. We write Hint in

the form

Hint(t) = e−iωt V (16.17)

where V denotes the part of the perturbation which does not have explicit time-

dependence, but is a function of the various operators relevant to the problem. The

matrix element of the first order term in S is then given by

Sba = − i~

∫ t

0dt′ 〈b| eiH0t′/~e−iωt

′V e−iH0t′/~ |a〉

= − i~

∫ t

0dt′ ei(Eb−Ea−~ω)t′/~ 〈b|V |a〉

= −〈b|V |a〉 ei∆t/~ − 1

∆= −〈b|V |a〉 2i

sin(∆t/2~)

∆ei∆t/2~ (16.18)

where ∆ = Eb − Ea − ~ω. Taking the absolute square of this, we find the probability

for the system to make the transition |a〉 → |b〉 as

|Sba|2 = | 〈b|V |a〉 |2[

2 sin(∆t/2~)

]2

(16.19)

To understand the behavior of the system in time, we can examine the properties of

the function

f(∆, t) =

[2 sin(∆t/2~)

]2

(16.20)

Considered as a function of ∆, f(0, t) = t2/~2. The sine function tells that f has zeros

at ∆t = 2nπ~, or at ∆ = 2nπ~/t, and peaks at ∆ = (2n+1)π~/t, the height of the peaks

-30 -20 -10 10 20 30

0.02

0.04

0.06

0.08

∆→-30 -20 -10 10 20 30

0.05

0.10

0.15

0.20

0.25

0.30

0.35

Figure 16.1: Behavior of f(∆, t) as a function of ∆, for two values of t. The graph on

the right side is for a higher value of t.

16.1 Formulation and general features 160

decreasing rapidly due to the 1/∆2 factor. As t becomes large, the height of the primary

peak at ∆ = 0 increases quadratically with t, it also becomes narrower as the first zero

at 2π~/t moves closer to the origin with large t, while the secondary and higher peaks

become smaller. Most of contribution to f comes from the neighborhood of ∆ = 0.

This is illustrated in Fig. 16.1. The behavior is similar to the Dirac δ-function.

For small value of t, i.e., for t 2~/∆, f(∆, t), and hence the probability, will

increase as t2. But as t becomes large, the behavior changes. The probability is almost

zero unless ∆ = 0. This can be interpreted as an energy conservation equation, with

the system making a transition |a〉 → |b〉, absorbing energy ~ω from the perturbation.

To make this clearer, and connect f(∆, t) to the Dirac δ-function (for large t), we

consider the integral of f .∫ ∞−∞

d∆f(∆, t) =t

2~

∫ ∞−∞

dy4 sin2 y

y2, y =

∆t

2~

=2π

~t (16.21)

Even though we integrate from−∞ to∞, the contribution to the integral is from an

infinitesimal range around ∆ = 0 as t becomes large. From the general behavior of f

and the integral (16.21), we can thus conclude that

f(∆, t) ≈ 2π

~t δ(∆) (16.22)

The transition probability for |a〉 → |b〉 is thus

|Sba|2 = t2π

~| 〈b|V |a〉 |2δ(Eb − Ea − ~ω), t ~

∆(16.23)

The probability increases linearly with t, corresponding to a constant rate. The transi-

tion rate or the probability of the transition per unit time is thus

(Rate |a〉 → |b〉) =d|Sba|2

dt=

~| 〈b|V |a〉 |2δ(Eb − Ea − ~ω) (16.24)

In most cases, we do not have a single well-defined value of the final energy; instead

we have transitions to a small range of final states. If ρ(E)δE denotes the number

of states in the energy range δE, the transition rate to this small range of states is

obtained as Fermi’s GoldenRule∑

small range of Eb

(Rate |a〉 → |b〉) =

∫ Eb+δEb/2

Eb−δEb/2

~| 〈b|V |a〉 |2δ(Eb − Ea − ~ω)ρ(Eb)dEb

=2π

~| 〈b|V |a〉 |2ρ(Eb) (16.25)

where, in the last line, Eb is to be understood as Ea + ~ω. This result (16.25) is often

referred to as Fermi’s Golden Rule.

16.1 Formulation and general features 161

We obtained this result for time-dependence of the form e−iωt. If we consider

perturbations of the formHint = eiωtV , similar calculations will go through, except for

the final δ-function being δ(Eb −Ea + ~ω). Thus the process will have zero probability

unlessEb+~ω = Ea, indicating that this corresponds to an emission process where the

system transitions to lower energy state giving up energy ~ω to the perturbing agent.

While the details of the matrix element 〈b|V |a〉 are important for specific processes,

we can make a general conclusion from the analysis so far:

Hint = e−iωt V ⇐⇒ Absorption of energy ~ω from

perturbing agent

Hint = eiωt V ⇐⇒ Emission of energy ~ω to the

perturbing agent

We will now discuss the absorption and emission of electromagnetic radiation

or photons. But a couple of remarks are in order before we turn to that problem.

Going back to (16.7), we see that for the purpose of calculating transition probabilities,

we can consider the states φa to be evolving just by the action of S. The operator S

itself changes in time due to Hint. In other words, S = 1 if there is no perturbation.

This helps to isolate the effect of the perturbation or interaction. The unperturbed

Hamiltonian enters the analysis through HI = eiH0t/~Hinte−iH0t/~. This formulation Interaction

pictureof how states evolve is known as the interaction picture or the Dirac picture. We will be

discussing this in more detail later.

The interaction picture is related to a similar strategy used in celestial mechanics

for analyzing planetary perturbations. Thus, for example, if we consider the Earth-

Moon system, in the absence of other gravitating bodies, we have a Keplerian elliptical

orbit. The orbital parameters such as the eccentricity of the orbit, the angle (or

argument) of perigee, the inclination of the orbit, etc. are constants. These will change

once we include other gravitating bodies, for example, the effect of the Sun on the orbit

of the Moon. Lagrange realized that by using a suitable time-dependent coordinate

system, the effect of the perturbation can be expressed as time-evolution equation

for the orbital parameters, their evolution being solely due to the perturbation. Since

the orbital parameters are constants in the absence of perturbations, this approach

became known as the variation of constants method. The quantum analogy is via Variation of

constantsmethod(16.9), where Ca are constants in the absence of the perturbation and evolve with time,

as in (16.11), solely due to the perturbation, giving a variation of constants.

16.2 Absorption and emission of radiation 162

16.2 Absorption and emission of radiation

16.2.1 Electromagnetic waves

We begin with a short discussion of electromagnetic waves. The Maxwell equations,

which describe the dynamics of electric and magnetic fields, are given by

∇ · ~B = 0

∇× ~E +1

c

∂ ~B

∂t= 0 (16.26)

∇ · ~E = ρ

∇× ~B − 1

c

∂ ~E

∂t= ~J (16.27)

where ρ, ~J are the charge and current densities. For discussing free electromagnetic

waves, it is sufficient to consider the case ρ = 0 = ~J . The first two equations are

usually solved in terms of the potentials as

~E = −∇φ− 1

c

∂ ~A

∂t, ~B = ∇× ~A (16.28)

where φ is the electrostatic potential and ~A is the magnetic vector potential. For

electromagnetic waves, φ = 0 and the second set of equations, with vanishing ρ, ~J ,

become

∂t(∇ · ~A) = 0 (16.29)

∇(∇ · ~A)−∇2 ~A+1

c2

∂2 ~A

∂t2= 0 (16.30)

These equations have the wavelike solutions of the form ~A = ~A(0)e−iωt+i~k·~x. Equation

(16.29) is satisfied if ~k · ~A(0) = 0. Equation (16.30) then tells us that ω2 = c2~k · ~k = c2k2

for a nontrivial solution. Further, these are linear in ~A, so we can superpose different

solutions to get a general solutions.

Before writing out the general solution, there is a matter of normalizations we

need to discuss. At various stages we will need to do integrations over all space and

over all values of the wave vector ~k. For planes waves, this can lead to δ-functions and

some awakward expressions at intermediate steps. The simplest way to avoid this is to

consider plane waves in a cubical box, each side being of length L, so that the volume

V = L3. We can impose a periodic boundary condition, which restricts ~k as

~k = (k1, k2, k3) =2π

L(n1, n2, n3) (16.31)

where n1, n2, n3 are integers. Other boundary conditions are also possible. In the end,

when we take L → ∞, the precise choice of boundary conditions will be irrelevant.

16.2 Absorption and emission of radiation 163

Since ~k are now discrete, we will have summations over ni, rather than integrations

over all ~k. We also have the integration formula∫Vd3x ei(

~k−~k′)·~x = V δ~k,~k′ (16.32)

Further, when L becomes large, the difference between nearby values of ~k becomes

infinitesimal,∼ (2π/L), so that ~k is almost a continuous variable and the summation

can be replaced by integration. Thus∑ni

f(~k)→∫d3n f(~k) =

∫L3 d3k

(2π)3f(~k) = V

∫d3k

(2π)3f(~k) (16.33)

The general solutions to (16.30) can now be written as Planeelectromagnetic

wave

Ai(~x, t) =∑~k,λ

√~c2

2ωkV

[a

(λ)~k

ε(λ)i e−iωkt+i~k·~x + a

(λ)∗~k

ε(λ)i eiωkt−i~k·~x

](16.34)

Here ωk = c√~k · ~k, the two sign choices in solving ω2 = c2k2 are displayed separately.

Also, in (16.34), ε(λ)i are the polarization vectors. Since we need ~k · ~A(0) = 0, these

vectors should obey the condition

~k · ~ε = 0 (16.35)

Thus ~ε is perpendicular to the wave vector ~k. This allows for two independent orienta-

tions, in the two-dimensional plane orthogonal to ~k. We denote them by ε(1)i and ε(2)

i ,

corresponding to λ = 1, 2, each being chosen to be a unit vector. Therefore

~ε(λ) · ~ε(λ′) = δλλ′

(16.36)

Any choice of polarization is a linear combination of ~ε(1) and ~ε(2).

The fact that the waves can have an arbitrary amplitude is included via a(λ)~k

and

a(λ)∗~k

, which are the amplitudes for each choice of ~k and λ. Notice that these should be

complex conjugates of each other since ~Amust be real. It is convenient to separate out

a factor√~c2/2ωkV for later simplification of some expressions. (There is no reason to

include an ~-dependent factor from the classical point of view. This could have been

absorbed into the arbitrary amplitudes a(λ)~k

and a(λ)∗~k

. But it is convenient to display it

explicitly, so the interpretation in the quantum theory is simpler.)

The energy contained in the electromagnetic field is given by the Hamiltonian for

the Maxwell theory. The expression for this is Hamiltonian forelectromagneticfields

H =1

2

∫Vd3x (E2 +B2) (16.37)

16.2 Absorption and emission of radiation 164

By taking the derivative of this with respect to time and using the Maxwell equations,

we find

∂H

∂t=

∫Vd3x

[~E · (c∇× ~B) + ~B · (−c∇× ~E)

]= −c

∫Vd3x∇ · ( ~E × ~B) = −

∮∂Vc ( ~E × ~B) · d~S (16.38)

This shows that the rate at which energy decreases in a given volume is due to an

outward flow across the boundary with the flux given by c( ~E × ~B), which is known

as the Poynting vector. For electromagnetic waves, we can thus identify this vector

c ( ~E × ~B) as the intensity of the radiation.

We can now simplify the expression for the energy density and the intensity for the

case of electromagnetic waves using (16.34). For the electric field we can use Ei

Ei = −1

c

∂Ai∂t

= −1

c

∑~k,λ

√~c2

2ωkV(−iωk)

[a

(λ)~k

ε(λ)i e−iωkt+i~k·~x − a(λ)∗

~kε(λ)i eiωkt−i~k·~x

](16.39)

The term in the energy density involving the electric field can thus be simplified as

follows.

1

2

∫d3xE2 =

1

2

∑i

∫d3x

1

c2

∂Ai∂t

∂Ai∂t

= −1

2

∑i,~k,~k′λ,λ′

∫d3x

√~2

2ωk2ωk′V2(ωkωk′)

×[a

(λ)~k

ε(λ)i e−iωkt+i~k·~x − a(λ)∗

~kε(λ)i eiωkt−i~k·~x

]×[a

(λ′)~k′

ε(λ′)i e−iωk′ t+i

~k′·~x − a(λ′)∗~k′

ε(λ′)i eiωk′ t−i~k′·~x

]= −

∑~k,λ,λ′

1

2

~ωk2

[a

(λ)~ka

(λ′)

−~k~ε(λ) · ~ε(λ′)e−2iωkt + a

(λ)∗~k

a(λ′)∗−~k

~ε(λ) · ~ε(λ′)e2iωkt

− a(λ)~ka

(λ′)∗~k

~ε(λ) · ~ε(λ′) − a(λ)∗~k

a(λ′)~k

~ε(λ) · ~ε(λ′)]

(16.40)

In getting to the last line of this equation, we first integrate over all ~x using (16.32),

which gives V δ~k,−~k′ for two of the terms and V δ~k,~k′ for the other two. The summation

over ~k′ is then trivial, it ends up setting ~k′ to∓~k, appropriately. Since ωk = ω−k, there

is further simplification of some of the factors involving ω’s. In the last two terms in

the square brackets in (16.40), we can now use ~ε(λ) · ~ε(λ′) = δλλ′. We cannot do this

for the first two terms since ε(λ′)

i for those terms correspond to−~k. The polarization

vector has a dependence on ~k (not explicitly shown) since it has to be orthogonal to

16.2 Absorption and emission of radiation 165

this vector. Thus we can rewrite (16.40) as

1

2

∫d3xE2 = −

∑~k,λ,λ′

1

2

~ωk2

[a

(λ)~ka

(λ′)

−~k~ε(λ) · ~ε(λ′)e−2iωkt + a

(λ)∗~k

a(λ′)∗−~k

~ε(λ) · ~ε(λ′)e2iωkt]

+1

2

∑~k,λ

~ωk2

[a

(λ)∗~k

a(λ)~k

+ a(λ)~ka

(λ)∗~k

](16.41)

For the term involving the magnetic field we can carry out an integration by parts to

obtain

1

2

∫d3xB2 =

1

2

∫d3x ~A · (∇(∇ · ~A)−∇2 ~A) =

1

2

∫d3x ~A · (−∇2 ~A) (16.42)

where we used the fact that ∇ · ~A = 0. The operator −∇2 gives k2 for each Fourier

mode of ~A. We can now use (16.34) and simplify as we did for the electric field term to

obtain

1

2

∫d3xB2 =

∑~k,λ,λ′

1

2

~ωk2

[a

(λ)~ka

(λ′)

−~k~ε(λ) · ~ε(λ′)e−2iωkt + a

(λ)∗~k

a(λ′)∗−~k

~ε(λ) · ~ε(λ′)e2iωkt]

+1

2

∑~k,λ

~ωk2

[a

(λ)∗~k

a(λ)~k

+ a(λ)~ka

(λ)∗~k

](16.43)

where we also used c2k2 = ω2k. Combining (16.41) and (16.43), we finally get Hamiltonian for

plane

electromagneticwavesH =

1

2

∫d3x (E2 +B2) =

∑~k,λ

~ωk2

[a

(λ)∗~k

a(λ)~k

+ a(λ)~ka

(λ)∗~k

]=

∑~k,λ

~ωk a(λ)∗~k

a(λ)~k

(16.44)

This is a simple and nice expression which tells us that, if we think of the radiation

as made of photons, each carrying energy ~ωk, then we can interpret a(λ)∗~k

a(λ)~k

as the

number of photons of wave vector ~k and polarization given by ~ε(λ).

16.2.2 The interaction Hamiltonian

The interaction Hamiltonian can be obtained by the prescription ~p→ ~p− e ~A/c which

we have used before in chapter 11. Ignoring spin effects for now, the Hamiltonian is

thus

H =p2

2m+ V − e

2mc(~p · ~A+ ~A · ~p) +

e2

2mc2A2 (16.45)

This identifies the interaction part of the Hamiltonian as

Hint = − e

2mc(~p · ~A+ ~A · ~p) +

e2

2mc2A2 (16.46)

16.2 Absorption and emission of radiation 166

For the case of electromagnetic waves, we can use

~p · ~A =(~p · ~A− ~A · ~p

)+ ~A · ~p =

∑i

[pi, Ai] + ~A · ~p = −i~∇ · ~A+ ~A · ~p

= ~A · ~p (16.47)

since∇ · ~A = 0. Thus we may simplify the interaction Hamiltonian as

Hint = − e

mc~A · ~p+

e2

2mc2A2 (16.48)

16.2.3 Absorption of radiation

As we have discussed before, for absorption of radiation, we need the term in Hint

with the time-dependence of the form e−iωt. For a first order calculation, we just need

the first order term in (16.48). Using the e−iωt part of the expression (16.34) for Ai, for

a fixed choice of ~k and polarization λ, we find

Hint = e−iωkt V

V = − e

m

√~

2ωkVa

(λ)~kε(λ) · ~p ei~k·~x (16.49)

Using (16.24), we obtain the transition rate for |a〉 → |b〉 as

Rate (|a〉 → |b〉) =πe2

ωkV(a

(λ)∗~k

a(λ)~k

)∣∣∣ 〈b| ε(λ) · ~p

mei~k·~x |a〉

∣∣∣2 δ(Eb −Ea − ~ωk) (16.50)

This result is for incident electromagnetic waves of precise values for ~k. In reality,

every beam of waves has a at least a small dispersion in values of ~k, (and, perhaps,

in the choice of polarization as well), so we must consider the transition rate for a

small range of parameters of the incident beam. This rate, obtained by summing over

a small set of states around the values (~k, λ), is Absorption ratefor photons

Γba =∑~k,λ

πe2

ωkV(a

(λ)∗~k

a(λ)~k

)∣∣∣ 〈b| ε(λ) · ~p

mei~k·~x |a〉

∣∣∣2 δ(Eb − Ea − ~ωk) (16.51)

This rate is proportional to the number of photons in the incident beam, if we interpret

a(λ)∗~k

a(λ)~k

as the number of photons, as mentioned after (16.44).

16.2.4 Emission of radiation

We now turn to the discussion of the emission of radiation. When an atom is in an

excited state, it can make a transition to a state of lower energy by emitting a photon.

This can happen even in vacuum when there is no electromagnetic wave to begin with.

This process which can happen even in the absence of any ambient radiation is called

16.2 Absorption and emission of radiation 167

spontaneous emission. Notice that Hint, as given in (16.48) is inadequate to describe

this process since it is zero when there is no ambient field, i.e., when Ai = 0.

The excited atom can still decay to a lower state by emitting a photon, in the pres-

ence of an ambient field. This process can be described by the interaction Hamiltonian

(16.48). It is referred to as the stimulated emission of radiation. These terms were

introduced by Einstein in 1917 long before the emergence of the quantum mechanical

description. He identified the need for these two types of emission processes and

used the lettersA andB n for the rates for spontaneous and stimulated emission rates,

where n is the ambient number of photons. These define what are called Einstein’s A

and B coefficients.

To calculate the rate for stimulated emission, we can use the interaction Hamil-

tonian (16.48), with the eiωkt part of the vector potential contributing. From (16.34),

we see that the only difference compared to the absorption part is that we have the

complex conjugate term a(λ)∗~k

eiωkt−i~k·~x rather than a(λ)~ke−iωkt+i~k·~x. Since we take the

absolute square to get the rate, this is simply equivalent to |a〉 ↔ |b〉. So, for the

stimulated emission rate, we find immediately

Γab =∑~k,λ

πe2

ωkV(a

(λ)∗~k

a(λ)~k

)∣∣∣ 〈a| ε(λ) · ~p

me−i

~k·~x |b〉∣∣∣2 δ(Ea − Eb + ~ωk)

= Γba (16.52)

We can express this result in terms of the B-coefficient by writing Stimulatedemission rate

Γab =∑~k,λ

B~k,λ N~k,λ = Γba

B~k,λ =πe2

ωkV

∣∣∣ 〈a| ε(λ) · ~pme−i

~k·~x |b〉∣∣∣2 δ(Ea − Eb + ~ωk) (16.53)

where N~k,λ = a(λ)∗~k

a(λ)~k

.

The calculation of the spontaneous emission rate from the first principles of quan-

tum theory is a bit more involved. It requires quantizing the electromagnetic field

itself. Since any observable, as we have stated before, is an operator in the quantum

theory, the electric and magnetic fields themselves should be understood as operators

obeying certain commutation rules. Once this is done, we can use the interaction term

(16.48), but now Ai is an operator and not just the classical electromagnetic wave. So

it does not vanish even in the vacuum and can lead to the explanation of spontaneous

emission of photons. This was originally done by Dirac in 1927. We will not discuss

this for now.

Einstein, in his original work on theA andB coefficients, gave a clever argument to

obtain the spontaneous emission rate A, by relating these ideas to Planck’s radiation

formula. Consider a number of identical atoms, each of which can be in the ground

16.2 Absorption and emission of radiation 168

state |a〉 or the excited state |b〉. We consider these atoms ina box filed with radiation.

The atoms can absorb and emit photons, eventually coming to thermal equilibrium.

Let na, nb denote the number of atoms in the states |a〉, |b〉, respectively. Consider how

nb can change. This can decrease due to the spontaneous or stimulated emission of

photons, and increase by the absorption of photons by the atoms in the state |a〉. Thus

we can write a rate equation for nb as

dnbdt

= −BN~k,λ nb −Anb +BN~k,λ na (16.54)

where N~k,λ, as mentioned before, denotes the number of photons. Similarly, na can

decrease as the atoms transition to the higher state but can increase due to the excited

atoms decaying. Thus

dnadt

= −BN~k,λ na + (BN~k,λ +A)nb (16.55)

As the system comes to equilibrium, stabilizing the values of na, nb, these rates must

be zero. We can then solve for N~k,λ as

N~k,λ =

(A

B

)(nb/na)

1− (nb/na)(16.56)

The equilibrium values of nb and na are determined by the Boltzmann factor, giving

nbna

= e−(Eb−Ea)/kBT = e−~ωk/kBT (16.57)

Thus, we find for the equilibrium distribution of photons,

N~k,λ =

(A

B

)1

e~ωk/kBT − 1(16.58)

This should agree with Planck’s formula for blackbody radiation, for fixed ~k, λ. Com-

paring with Planck’s formula, we conclude that A = B. The rate for spontaneous

emission of photons should thus be Spontaneousemission rate

A~k,λ = B~k,λ =πe2

ωkV

∣∣∣ 〈a| ε(λ) · ~pme−i

~k·~x |b〉∣∣∣2 δ(Ea − Eb + ~ωk) (16.59)

This is a very good argument, remarkable for its time, namely, for 1917, years

before quantum mechanics. But it is not quite satisfactory. Starting from the basics

of quantum theory, the progression of logic should be that we derive the sponta-

neous emission rate, as well as the stimulated emission rate, from first principles and

then derive Planck’s formula in turn. In the quantum theory of the electromagnetic

field, where Ai is treated as an operator, this has been carried out. The result, not

surprisingly, is in agreement with (16.59).

16.2 Absorption and emission of radiation 169

There is another facet of the result for emission which is worth emphasizing. Since

A = B, the emission rate is given by

Γab =∑~k,λ

B~k,λ (N~k,λ + 1) (16.60)

The emission rate is thus enhanced by the presence of photons in the ambient medium

of the same~k and polarizationλ. Thus if we have a large number of atoms in the excited Comment on

lasersstate, and we also have a number of photons of the same wave length and polarization,

the atoms tend to decay rapidly emitting photons of the same characteristics, thus

building up a large coherent superposition of photons. This enhancement, and the

corresponding build-up of the coherent state of large numbers of photons, which is

due to the stimulated emission part of the process, is the basis for lasers and masers.

16.2.5 The matrix element and selection rules

We have not yet discussed the evaluation of the matrix element in Γab or Γba. This

involves the momentum operator ~p and the factor e−i~k·~x, which involves the position

operator. For most atomic transitions, the matrix element can be evaluated in an

approximation by expanding e−i~k·~x as

e−i~k·~x = 1− i~k · ~x+

(−i)2

2(~k · ~x)2 + · · · (16.61)

The wave vector has a magnitude given by 2π/λ where λ is the wave length. For

most atomic transitions of interest, this is of the order a 100 nanometers or so. The

magnitude of ~x as calculated in various atonic matrix elements will be of the order of

the size of an atom, which is usually less than a nanometer. Thus the average value

of ~k · ~x is ≤ 1% or so. As a first approximation, it is therefore reasonable to replace

e−i~k·~x by 1, just the first term in the expansion (16.61). This is called the electric dipole

approximation. (There is still the operator ~p, so the contribution is related to a dipole-

type moment, hence the name.) The matrix element in the dipole approximation is Electric dipoleapproximation

then

〈a| ε(λ) · ~pme−i

~k·~x |b〉 ≈ 〈a| ε(λ) · ~pm|b〉

=∑i

ε(λ)i 〈a| pi

m|b〉 (16.62)

For the momentum operator, we can use

xiH0 −H0xi = i~pim

(16.63)

since H0 = (p2/2m)+ function of ~x. Taking the matrix element of (16.63), we find

〈a| pim|b〉 =

1

i~〈a|xiH0 −H0xi |b〉 =

1

i~(Eb − Ea) 〈a|xi |b〉

16.2 Absorption and emission of radiation 170

= −iωk 〈a|xi |b〉 (16.64)

where we have used the fact that Eb − Ea will be ~ωk because of the δ-functions in the

formulae for A~k,λ, B~k,λ. Using this result,

A~k,λ = B~k,λ ≈πe2ωkV

∣∣∣ 〈a| ε(λ) · x |b〉∣∣∣2 δ(Ea − Eb + ~ωk) (16.65)

The intensity of the emitted radiation is determined by the transition rate, so we

see that the matrix element of xi is directly related to the intensity of the spectral

line. There are certain conditions which must be satisfied for given choices |a〉 and

|b〉 for this matrix element to be nonzero. Such conditions are known as selection

rules. To obtain these for the dipole approximation, we note that we can write

xi = r(sin θ cosϕ, sin θ sinϕ, cos θ), which can be expressed in terms of the spherical

harmonics Y ml (θ, ϕ) as

x1 = r

√2π

3(Y 1

1 +Y −11 ), x2 = r

√2π

3(−i)(Y 1

1 −Y −11 ), x3 = r

√4π

3Y 0

1 (16.66)

For an atom like the Hydrogen atom, the wave functions are of the form

ψnlm = Rnl(r)Y ml (θ, ϕ) (16.67)

First of all, we notice that the matrix element with the same l-value for |a〉 and |b〉 is

zero. This is because ψnlm(−x) = (−1)lψnlm(x) as we have seen before. Hence∫ψ∗n′lm′(x)xi ψnlm(x) =

∫ψ∗n′lm′(−x) (−xi)ψnlm(−x)

= (−1)2l(−1)

∫ψ∗n′lm′(x)xiψnlm(x)

= −∫ψ∗n′lm′(x)xi ψnlm(x) (16.68)

In the first line we use a change of variables of integration to ~y = −~x, and then use

the transformation properties of the wave functions. Equation (16.68) shows that the

matrix element (being its own negative) should vanish and thus l′ = l transitions are

forbidden, i.e, have zero probability. More generally, notice that the matrix element

〈a|xi |b〉will have a factor involving integration over r and a factor involving integration

over all angles. The latter is of the form

M sab =

∫dΩ Y m′

l′ Y s1 Y

ml (16.69)

where s = 0 corresponds to x3, and linear combinations of s = ±1 will correspond

to x1, x2 according to (16.66). We know from the general theory of adding angular

momenta, the product Y s1 Y

ml will correspond to states obtained from |1, s〉 |l,m〉. This

16.3 Photoelectric effect/Photoionization 171

gives angular momentum eigenstates of form with j-values of l + 1, l and l − 1. Thus,

the orthogonality of the spherical harmonics will imply that the integral in (16.69) will

vanish unless l′ = l ± 1 or l. We have already seen that l→ l transitions are forbidden.

Thus we conclude that allowed transitions must have l′ − l = ∆l = ±1. As for the

m-values, since Y ml ∼ eimϕ, we see that the integration over ϕ will lead to vanishing

matrix element unless m′ = m+ s. Thus for nonzero transition probability, we need Selection rules

∆l = ±1, ∆m = ±1, 0 (16.70)

These are the selection rules for the electric dipole transition.

If we consider the next term in the expansion (16.61), it will generate two types

of contributions, identified as electric quadrupole and magnetic dipole transitions.

The selection rules for these will be different, some of the transitions forbidden at the

electric dipole level can have nonzero probability. However, since there is an additional

power of x, the intensities for these will be suppressed by a factor, which is the square

of the atomic size to the wavelength of the radiation. Higher multipoles result further

even higher terms in the expansion (16.61), but are even more suppressed.

16.3 Photoelectric effect/Photoionization

Another interesting effect arising from electron-photon interactions is photoelectric

effect, which refers to the ejection of an electron from a bound state by an incident

photon, or equivalently, we may think of this as ionization of an atom by a photon. It

is straightforward to calculate the rate and cross section for this process.

The initial state of the electron may be taken, for the purpose of an illustrative

calculation, as the ground state of a Hydrogen-like atom, so that the wave function is

ψ100(x) =

√Z3

πa3e−Zr/a (16.71)

In the final state, the electron is free, so it should be described as a plane wave. As

for the case of the electromagnetic waves, it is easier to consider these to be in a box

of volume V , so that normalizations are simpler. The free particle wave function, for

particle of momentum ~p, may be taken as

〈x|~p〉 = ψ~p(x) =1√Vei~p·~x/~ (16.72)

Again, as we did for the electromagnetic wave, we will impose periodic boundary con-

ditions, so that ~p = (2π/L)(m1,m2,m3) where mi are integers. These wave functions

obey the orthonormality condition∫d3x ψ∗~p′ ψ~p = δ~p,~p′ (16.73)

16.3 Photoelectric effect/Photoionization 172

The energy of this state is given by E~p = (~p2/2m). The relevant term of the interaction

Hamiltonian is again Hint = e−iωtV , with

V = − e

m

√~

2ωkVa

(λ)~kε(λ) · ~p ei~k·~x (16.74)

(~p here is still an operator, to emphasize this we have added a hat.) The transition rate

for |100〉 → |~p〉 is thus

Rate (|100〉 → |~p〉) =2π

~| 〈~p|V |100〉 |2 δ(E~p − E100 − ~ω) (16.75)

The relevant matrix element can be calculated as follows.

〈~p|V |100〉 = − e

m

√~

2ωkVa

(λ)~k

1√V

∫d3x e−i~p·~x/~ε(λ) · ~p

(ei~k·~xψ100(x)

)= − e

m

√~

2ωk

√Z3

πa3a

(λ)~k

1

Vε(λ) · ~p

∫d3x ei(

~k−~p/~)·~xe−Zr/a

= − e

m

√~

2ωk

√Z3

πa3a

(λ)~k

1

Vε(λ) · ~p 8πZ

a[q2 + (Z2/a2)]2(16.76)

where ~q = ~k − ~p/~. Squaring this and substituting in (16.75), we get

Rate =64π2e2

m2ωk(ε(λ) · ~p)2Z

5

a5

1

[q2 + (Z2/a2)]4

a(λ)∗~k

a(λ)~k

V2

δ(E~p−E100−~ωk) (16.77)

We can sum this over a small range of states for the electron with final values of

momentum ~p+ δ~p, essentially using a formula like (16.33) for large L,∑~p

f(~p)→ V∫

d3p

(2π)3~3f(~p) (16.78)

Further, we can do the integration over the magnitude of p because we have a δ-

function,

V∫

d3p

(2π)3~3f(~p)δ(E~p − E100 − ~ωk) = V dΩ

8π3~3

∫dp p2f(~p)δ(E~p − E100 − ~ωk)

= V dΩ

8π3~3mpf(~p) (16.79)

where we used the fact that mdE~p = pdp. The magnitude of p in (16.79) is to be

understood as√

2m(E100 + ~ωk), as dictated by the δ-function. Combining this result

with (16.77), we get

Rate =8e2

πmωkc

p (ε(λ) · ~p)2

~3

Z5

a5

1

[q2 + (Z2/a2)]4

ca(λ)∗~k

a(λ)~k

V

dΩ (16.80)

16.3 Photoelectric effect/Photoionization 173

Since the number of photons (of wave vector ~k and polarization λ) is a(λ)∗~k

a(λ)~k

, the

density of photons is this quantity divided by the volume V . Multiplying the density

a(λ)∗~k

a(λ)~k/V by c, we get the flux of photons. As we did for scattering, we can define a

cross section which is obtained by dividing the rate by the incoming flux. This gives Photoionization

cross section

dσ =8e2

πmωk c

p (ε(λ) · ~p)2

~3

Z5

a5

1

[q2 + (Z2/a2)]4dΩ (16.81)

Upon multiplication by the photon flux for any experimental set-up, this formula

gives the number of electrons which emerge in a given direction (θ, ϕ) relative to the

incoming direction of the photon. Needless to say, the result is in agreement with

experiments.

174

17 Transformations, pictures, etc.

17.1 Transformations and generators

Measurements we can carry out on a physical system always involve transformations

of externally controlled physical parameters. The mathematical representation of

these transformations is done via their action on states. We now turn to a more detailed

consideration of such transformations. Except for the case of time-reversal, which

we take up later, these transformations are represented as unitary operators of the

Hilbert space of states. We will start by considering some transformations which are

characterized by continuous parameters. The simplest one to start with is translations

in space. These are given by

~x→ ~x′ = ~x+ ~a (17.1)

Here we have three parameters a1, a2, a3, the three components of the vector ~a, corre-

sponding to changes in ~x along the three Cartesian directions. These are continuous

parameters, since we can change each xi by any amount. We can now ask how such

a transformation can be implemented on thee states or the wave function. The

good thing about continuous transformations is that we can consider infinitesimal

transformations, which will make the analysis simpler. Consider the wave function

ψα(~x) = 〈~x|α〉 of a state |α〉. If we shift ~x by ~a, the wave function is ψα(~x+ ~a). Taking aito be infinitesimal, we can expand this using Taylor’s theorem to first order and write

ψα(~x+ ~a) = ψα(~x) +∑i

ai∂ψα∂xi

+ · · ·

= ψα(~x)−∑i

aipii~ψα(~x)

=

(1 + i

~a · ~p~

)ψα(~x) (17.2)

In the second line of this equation, we have used the fact that a derivative of the

wave function can be represented as the action of the momentum operator ~p via Momentum asgenerator of

translations~pψ = −i~∇ψ. This shows that translations can be expressed via the action of the

momentum operator. We say that momentum is the generator of translations in space.

Denoting the change of the wave function as δψα, we see that we can also write

−i~∫ψ∗α δψα =

∑i

∫ψ∗α aipiψα (17.3)

This shows that the measurement of the expectation value of momentum is achieved

by an infinitesimal translation and comparing the change of the wave function to the

wave function. In other words, momentum is measured by the response of the system to

an infinitesimal translation in space.

17.1 Transformations and generators 175

We can obtain the result for a finite transformation, by considering it as built up of

a sequence of infinitesimal transformations. Considering the one-dimensional case

for simplicity, consider translations by an amount 2ε in x, i.e., x→ x+ 2ε. We the have

ψ(x+ 2ε) = ψ(x+ ε) + ε∂ψ

∂x

∣∣∣x+ε

=(

1 + iεp

~

)ψ(x+ ε)

=(

1 + iεp

~

)(1 + i

εp

~

)ψ(x)

=(

1 + iεp

~

)2ψ =

(1 + i

2εp

2~

)2

ψ (17.4)

Here we first consider shift by ε and then another shift by ε. The result is equivalent to

applying the combination(1 + i εp~

)twice to the wave function at x. The last expression

gives the result in terms of the total shift 2ε. If we do this several times, say N times,

with a total shift a = Nε, we get

ψ(x+ a) =(

1 + iap

N~

)Nψ (17.5)

Any error in using the infinitesimal formula (17.2) goes to zero if we take ε→ 0,N →∞keeping a = Nε fixed. Thus we obtain the formula for a finite translation as

ψ(x+ a) = limN→∞

(1 + i

ap

N~

)Nψ = exp

(i

~ap

)ψ (17.6)

We have used the standard result

limN→∞

(1 +

θ

N

)N= eθ (17.7)

Equation (17.6) shows that translations in space (by an amount a) can be implemented

on the wave functions by the action of the operator Unitaryoperator for

translationsU(a) = exp (iap/~) (17.8)

Since p is a hermitian operator, U is unitary. More generally, in three dimensions, we

get the result

ψα(~x+ ~a) = U(~a)ψα(~x), U(~a) = exp

(i

~~a · ~p

)(17.9)

Since−i~∇ψα = 〈~x| ~p |α〉, we can also write the action of translation on the abstract

state |α〉 as |α〉′ = U(~a) |α〉.Another interesting transformation is translation in time. How does a wave func-

tion respond under t→ t+ ε? As in the case of translations in space coordinates, we

can write

ψα(t+ ε) = ψα(t) + ε∂ψα∂t

+ · · · ≈(

1− i

~εH

)ψα (17.10)

17.1 Transformations and generators 176

where we have used the Schrödinger equation to write the time-derivative of ψα in

terms of the action of the Hamiltonian operator. Generalizing to a finite time-interval

as we did for spatial translations, Unitaryoperator for

time-translationψα(t+ τ) = U(τ)ψα(t), U(τ) = exp

(− i~τ H

)(17.11)

Thus translations in time are generated by the Hamiltonian with the corresponding

unitary operator U(τ) for finite time-intervals.

Rotations of the spatial coordinates form another set of transformations of interest.

In this case, we can construct the infinitesimal transformations as follows. To start

with an example, consider a point with coordinates (x1, x2, x3). In terms of the polar

angle ϕ on the (x1, x2)-plane, we can write x1 = r cosϕ, x2 = r sinϕ. Now consider a

rotation around the x3-axis by an angle θ3. (The subscript 3 is to specify that this is for a

rotation around the x3-axis.) The angle between the radius vector and the new x1-axis

is now (ϕ− θ3). Thus the new coordinates of the same point will be x′1 = r cos(ϕ− θ),

x′2 = r sin(ϕ− θ), while x3 remains unchanged. For small θ3, we can expand these as

x′1 = r cos(ϕ− θ3) = r cosϕ cos θ3 + r sinϕ sin θ3 ≈ r cosϕ+ r sinϕθ3

= x1 + θ3 x2

x′2 = r sin(ϕ− θ3) = r sinϕ cos θ3 − r cosϕ sin θ3 ≈ r sinϕ− r cosϕθ3

= x2 − θ3 x1

x′3 = x3 (17.12)

The change in ~x can thus be written for this case as

δxi =∑j

εij3 xj θ3 (17.13)

The generalization of this, including rotations around the other two axes, is

δxi =∑j,k

εijk xj θk (17.14)

It is now straightforward to work out how the wave functions change. Using Taylor

expansion, we get

ψ(x+ δx) = ψ(x) +∑i

δxi∂ψ

∂xi+ · · ·

≈ ψ(x) +∑i,j,k

εijkθk xj∂ψ

∂xi= ψ(x)−

∑i,j,k

εkjiθk xjpi

(−i~)ψ

= ψ(x)− i

~∑k

θk Lkψ =

(1− i

~∑k

θk Lk

)ψ (17.15)

17.1 Transformations and generators 177

where Lk is the angular momentum operator

Lk =∑i,j

εkijxipj (17.16)

We can also write the transformation for finite rotations as Angular

momentum asgenerators ofrotationsψ(x′) = U(θ)ψ(x), U(θ) = exp

(− i~∑k

θkLk

)(17.17)

These results show that the angular momentum is the generator of rotations, withU(θ)

in (17.17) as the unitary operator implementing this on the states or wave functions.

What we have found is that for every continuous transformation, we can define

a generator, which is an operator whose action on the states will implement the

transformation. For the examples we have discussed, the results may be summarized

as given in Table 17.1.

The case of spatial rotations needs some elaboration. So far we have only discussed

the rotations of coordinates, hence what we have done only captures the orbital

angular momentum. In the case of spin, we set the transformation of the wave

functions as in (17.17) but including spin as well. This can be viewed as the definition

of spin. The unitary operator U is thus more generally given as Unitaryoperator forrotations

U(θ) = exp

(− i~∑k

θk(Lk + Sk)

)= exp

(− i~∑k

θkJk

)(17.18)

Applied to states of spinning particles, this implies that there will also be mixing of the

different spin components of the wave functions. Thus in the case of spin-12 particles,

Table 17.1: Transformations and generators

Transformation Generator Unitary operator U

Space translation pi exp(i~aipi

)xi → xi + ai

Time translation H exp(− i

~τH)

t→ t+ τ

Spatial rotations Li exp(− i

~θiLi)

xi → xi +∑

j,k εijkxjθk

17.1 Transformations and generators 178

Si = ~σi/2 and the transformation rule is(ψ1(x′)

ψ2(x′)

)′=

[U11 U12

U21 U22

] (e−iθkLk/~ ψ1(x)

e−iθkLk/~ ψ2(x)

)(17.19)

where[U11 U12

U21 U22

]= exp

(− i~∑k

θkSk

)= exp

(− i

2

∑k

θkσk

)(17.20)

It is useful to consider the transformation properties of various quantities under

rotations in some more detail. Going back to (17.12), we can write the rotation for a

finite transformation asx′1

x′2x′3

=

cos θ3 sin θ3 0

− sin θ3 cos θ3 0

0 0 1

x1

x2

x3

(17.21)

We may write this in component notation as x′i =∑

j Rij xj , whereRij are the elements

of the 3×3 matrix in (17.21). More generally, we can write similar matrices for rotations

around the x1 and x2 axes as well. Thus generally we can say that

x′i =∑j

Rij xj (17.22)

For infinitesimal rotations, we can write

Rij ≈ δij +∑k

θkεijk (17.23)

in agreement with (17.14).

A set of three numbers (x1, x2, x3) which transform as given in (17.22) under rota-

tions is called a vector. This applies not just to the coordinates, every vector must have Vector isdefined byrotation

property

a similar transformation property. Thus the momentum of a particle ~p is a vector; this

means that under rotations of coordinates,

p′i =∑j

Rij pj (17.24)

with the same rotation matrixRij as for the transformation of coordinates. The angular

momentum itself is a vector, so we must also have

L′i =∑j

Rij Lj (17.25)

17.1 Transformations and generators 179

As is clear from (17.19), the wave functions of a spin- 12 particle transform differently,

given by Rotation of aspinor(

ψ1

ψ2

)′=

[U11 U12

U21 U22

] (ψ1

ψ2

)(17.26)

where we are only showing how the components get mixed under rotations. A set

of two complex numbers ψ1, ψ2 which transform in this way is called a spinor. In

other words. the two-component wave function of a spin-12 particle is a spinor. An

interesting property of spinors is is the double-valuedness under rotations. Consider a

rotation around the x3 axis again, by an angle θ3 = 2π. This is equivalent to a complete

rotation. From (17.21), we see that ~x′ = ~x; the vector returns to its original value. For a

spinor, the transformation matrix for a rotation around the third axis by angle θ3 is,

from (17.20),

U(θ3) = e−iθ3σ3/2 =

[e−iθ3/2 0

0 eiθ3/2

]= −1, for θ3 = 2π (17.27)

Thus a spinor will change sign under a full 2π rotation. This can be verified for Double-

valuedness of aspinorrotations around the other axes as well. We need to carry out a 4π rotation, a double

full rotation, to get back to the original value. Physical quantities involve matrix

elements of operators, which in turn involve two factors of such wave functions, more

precisely the wave function and its conjugate, so this double-valuedness does not

affect measurements.

It is also useful to express some of these transformation properties directly in terms

of operators as well. We have argued that the states transform as |α〉 → U |α〉. Consider

evaluating the matrix element of xi between transformed states U |α〉 and U |β〉. We

can write this as

〈β|U †xiU |α〉 =

∫d3x 〈β|x〉 〈x|

(U †xiU

)|α〉 =

∫d3x 〈β|x〉 Xi 〈x|α〉

=

∫d3xψ∗β Xi ψα (17.28)

In the first line, we inserted the identity∫d3x |x〉 〈x| = 1 (17.29)

between 〈β| and U †xiU , and Xi denotes the operator U †xiU in terms of its action on

the coordinates. Another way to evaluate the same expression is to insert the identity

(17.29) between U † and xi and use

〈β|U † |x〉 = ψ∗β(Rx), 〈x| xiU |α〉 = xi 〈x|U |α〉 = xi ψα(Rx) (17.30)

17.1 Transformations and generators 180

where Rx denotes the transformed x as in (17.22). This leads to

〈β|U †xiU |α〉 =

∫d3xψ∗β(Rx)xi ψα(Rx)

=

∫d3y ψ∗β(y) (R−1y)i ψα(y), using y = Rx

=

∫d3xψ∗β(x) (R−1x)i ψα(x) (17.31)

In the last line we renamed y as x, which can be done since this is just a variable of

integration. Comparing (17.28) and (17.31), we can conclude that

(U †xiU)i ψα(x) = (R−1x)i ψa(x) =∑j

R−1ij xj ψα(x) (17.32)

More generally, we can write this as an operator equation U †xiU =∑

j R−1ij xj . Since U

and U † are inverses of each other, we can also write this as

U(θ) xi U†(θ) =

∑j

Rij xj (17.33)

This may be taken as the definition of an operator which transforms as a vector. Thus, Vectoroperators

we also expect

U(θ) pi U†(θ) =

∑j

Rij pj , U(θ) Li U†(θ) =

∑j

Rij Lj (17.34)

With the definition of U in (17.18), this can be verified by direct computation using

the standard x, p commutators. Focusing just of the spin part, since S1 = ~σi/2, we

also expect

U(θ)σi U†(θ) =

∑j

Rij σj (17.35)

The matrix commutation rules

σi σj − σj σi = 2i∑k

εijk σk, (17.36)

which was given in equation (11.14), can be used to verify this directly. Notice that,

from (17.35), both U = 1 and U = −1 give the same R, namely, R = 1, in agreement

with the double-valuedness at the spinor level, but not for the vectors.

In writing down (17.33, 17.34), we have taken the transformation to correspond

to a rotation. But this can be done more generally. Thus if U(ξ) denotes the unitary

operator for a transformation characterized by parameters ξ (there could be many

such parameters), then we can write the transformation of an operator A as Generaltransformation

of an operatorU(ξ) A U †(ξ) = A′(ξ) (17.37)

17.2 Schrödinger, Heisenberg and Dirac pictures 181

where A′ denotes the transformed version of the operator. This will depend on ξ as

well, as indicated. (For xi and rotations, x′i =∑

j Rij xj as in (17.33).)

While we have focused on translations (in space and time) and on rotations, there

are many other continuous transformations of interest. For example, in classical

physics, we can make Galilean transformations which connect physics in two frames

of reference moving at constant velocity relative to each other. (In a relativistic theory

these would be Lorentz transformations.) Since velocities can be continuously varied,

these constitute another set of continuous transformations of interest. How do we

represent these in the quantum theory? Again, there is a unitary operator we can

construct to implement these as transformations on the states or wave functions.

17.2 Schrödinger, Heisenberg and Dirac pictures

We have seen in equation (17.11) that the time-evolution of a physical system can be

represented as

ψα(t+ τ) = U(τ)ψα(t), U(τ) = exp

(− i~τ H

)(17.38)

By taking t = 0 and then replacing τ by t, we can write this equation as

ψα(t) = exp

(− i~tH

)ψα(0) (17.39)

In terms of abstract states, this can be written as

|α, t〉 = U(t) |α, 0〉 , U(t) = exp

(− i~tH

)(17.40)

If we take the inner product of |α, t〉with 〈x| and use the fact that 〈x| pi |α〉 = −i~∇〈x|α〉 =

−i~∇ψa and 〈x|xi |α〉 = xiψα, we find

ψα(x, t) = exp

(− i~tH

)ψα(x, 0) (17.41)

where H is now a differential operator (with p→ −i~∇, x→ x in the abstract Hamilto-

nian) acting on the wave function.

This is the description we have used so far, the states change with time according

to (17.40) or, equivalently, according to the Schrödinger equation. Operators are Schrödinger

pictureindependent of time. This way of formulating the dynamics of a physical system is

known as the Schrödinger picture.

Our discussion of how a change due to some transformation can be viewed in

terms of operators directly, as in the case of rotations in (17.33, 17.34), shows that we

can describe time-evolution directly in terms of operators as well. The key point is

that observations pertain to matrix elements of operators. Thus consider the matrix

17.2 Schrödinger, Heisenberg and Dirac pictures 182

element between states |α〉 and |β〉 of a generic operator A. At time t = 0, this is

evidently

Aαβ = 〈α, 0| A |β, 0〉 (17.42)

At time t, this matrix element becomes

Aαβ(t) = 〈α, t| A |β, t〉 = 〈α, 0| U †(t)A U(t) |β, 0〉

= 〈α, 0| A(t) |β, 0〉 (17.43)

A(t) = U †(t)A U(t) = eitH/~Ae−itH/~ (17.44)

In the first line of this equation, we used the time-evolution of the states, but the

second line shows that we can get the same matrix element, and hence the same

physics, by taking states to be fixed, i.e., not varying with time, and taking operators to

vary with time according to (17.44). This way of describing the dynamics is known as Heisenberg

picturethe Heisenberg picture. The analogue of the Schrödinger equation in this case is the

evolution equation for operators. By direct differentiation we get

i~∂A

∂t= i~

∂U †

∂tA U + U †A i~

∂U

∂t

= U †(AH −HA

)U

= AH −HA = [A,H] (17.45)

In obtaining this result, we used i~∂U/∂t = HU and the fact that HU = UH . The Heisenberg

equation ofmotiontime-evolution equation for an operator as in (17.45) is known as the Heisenberg

equation of motion.

Both pictures are completely equivalent and either can be used to analyze any

physical problem. Which picture is convenient may depend on the physical context.

For a number of nonrelativistic systems, the Schrödinger picture may seem easier.

In this picture, there is a difference in how spatial coordinates and time are treated.

In the relativistic case, where we have transformations which can connect spatial

coordinates and the time coordinate, it is often easier to use the Heisenberg picture.

But again, both pictures are completely equivalent and one can either one, albeit at

the cost of extra work for some cases.

There is yet another way of describing dynamics for the situation where the Hamil-

tonian is of the formH = H0+Hint. In this case, we can have an intermediate situation

where operators evolve according to H0 while the evolution of states is done via the

interaction part of the Hamiltonian. Going back to the matrix element in (17.44), we

write it as follows.

\[
\begin{aligned}
A_{\alpha\beta}(t) &= \langle\alpha, t|\,A\,|\beta, t\rangle = \langle\alpha, 0|\,U^\dagger(t)\,A\,U(t)\,|\beta, 0\rangle \\
&= \langle\alpha, 0|\,U^\dagger(t)U_0(t)\left[U_0^\dagger(t)\,A\,U_0(t)\right]U_0^\dagger(t)U(t)\,|\beta, 0\rangle \\
&= {}_I\langle\alpha, t|\,A_I(t)\,|\beta, t\rangle_I
\end{aligned} \tag{17.46}
\]
\[
|\beta, t\rangle_I = U_0^\dagger(t)U(t)\,|\beta, 0\rangle = S(t)\,|\beta, 0\rangle, \qquad
A_I(t) = U_0^\dagger(t)\,A\,U_0(t) \tag{17.47}
\]

In the second line of this equation, we inserted $U_0(t)U_0^\dagger(t) = 1$ twice, where $U_0(t) = e^{-itH_0/\hbar}$ is the time-evolution operator with the free, unperturbed Hamiltonian. We then rearranged terms to define $A_I$ and $S(t)$. Notice that

\[
\begin{aligned}
i\hbar\frac{\partial S}{\partial t} &= i\hbar\left[\frac{\partial U_0^\dagger}{\partial t}\,U(t) + U_0^\dagger\,\frac{\partial U}{\partial t}\right] = U_0^\dagger\left(-H_0 + H\right)U = U_0^\dagger H_{\rm int}U_0\,S \\
&= H_I(t)\,S
\end{aligned} \tag{17.48}
\]
\[
H_I(t) = e^{itH_0/\hbar}\,H_{\rm int}\,e^{-itH_0/\hbar}
\]

These equations coincide with (16.6), showing that S is the same operator as the one we have used in time-dependent perturbation theory. The evolution of operators in this picture is given by (17.47). Differentiating it with respect to time, we find
\[
i\hbar\frac{\partial A_I}{\partial t} = A_I(t)\,H_0 - H_0\,A_I(t) = [A_I(t), H_0] \tag{17.49}
\]

Notice that $H_I(t)$ in (17.48) may be thought of as the time-evolved version of $H_{\rm int}$. This picture is clearly convenient for separating out the effect of interactions and hence, not surprisingly, is known as the interaction picture. Since its formulation goes back to Dirac's work, it is also often referred to as the Dirac picture.

The properties of the three pictures can be summarized as shown in Table 17.2.

Table 17.2: The three pictures

  Picture                Evolution of states                  Evolution of operators
  Schrödinger            iħ ∂/∂t |α⟩ = H |α⟩                  ∂A/∂t = 0
  Heisenberg             ∂/∂t |α⟩ = 0                         iħ ∂A/∂t = [A, H]
  Dirac (Interaction)    iħ ∂/∂t |α⟩_I = H_I(t) |α⟩_I         iħ ∂A_I/∂t = [A_I, H₀]
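As with the other pictures, one can check (17.48) numerically. The following is a small sketch (the matrices are arbitrary stand-ins, with ħ set to 1) verifying that $S(t) = U_0^\dagger(t)U(t)$ obeys $i\hbar\,\partial S/\partial t = H_I(t)\,S(t)$:

```python
import numpy as np
from scipy.linalg import expm

hbar = 1.0
rng = np.random.default_rng(1)

def rand_herm(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

H0, Hint = rand_herm(4), rand_herm(4)      # stand-in free and interaction parts
H = H0 + Hint

def S(t):
    # S(t) = U0†(t) U(t) = e^{itH0/hbar} e^{-itH/hbar}
    return expm(1j * t * H0 / hbar) @ expm(-1j * t * H / hbar)

t, dt = 0.3, 1e-6
lhs = 1j * hbar * (S(t + dt) - S(t - dt)) / (2 * dt)   # i hbar dS/dt, central difference

U0 = expm(-1j * t * H0 / hbar)
HI = U0.conj().T @ Hint @ U0               # H_I(t) = e^{itH0/hbar} Hint e^{-itH0/hbar}
print(np.allclose(lhs, HI @ S(t), atol=1e-4))          # True
```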


17.3 Symmetries and conservation laws

In (17.37), we have given the formula for how an operator transforms under a generic

transformation. For any physical system, we have a set of dynamical variables such

as the positions and momenta, maybe spin as well, but it is the Hamiltonian which

specifies the system completely. For example, the free electron and the electron

bound to the nucleus in an atom both have $(\vec{x}, \vec{p}, \vec{S})$ as the dynamical variables, but the distinction arises from the fact that $H = p^2/2m$ for the free electron, while $H = (p^2/2m) - Ze^2/r$ for the electron in the atom. An important question is: How does

the Hamiltonian change under a physically relevant transformation? Since H is an

operator, this is also specified by (17.37), so we can write

\[
U(\xi)\,H\,U^\dagger(\xi) = H' \tag{17.50}
\]

Definition 17.1 A transformation is said to be a symmetry if it leaves the Hamiltonian unchanged. Thus $U(\xi)$ represents a symmetry transformation if
\[
U(\xi)\,H\,U^\dagger(\xi) = H \tag{17.51}
\]

This definition is actually less general than is needed, but we will work with this for

now. Later we will modify this to get a more general statement of symmetry.

Since U † = U−1, we can rewrite the condition (17.51) of symmetry as

\[
U(\xi)\,H - H\,U(\xi) = 0 \tag{17.52}
\]

Thus a symmetry transformation commutes with the Hamiltonian. Further, if we consider the time-evolution of an operator in the Heisenberg picture, we see that (17.52) is equivalent to
\[
i\hbar\frac{\partial U(\xi)}{\partial t} = U(\xi)\,H - H\,U(\xi) = 0 \tag{17.53}
\]

Thus U is a conserved operator if it is a symmetry; i.e., its matrix elements do not change with time. In the case of a symmetry with continuous parameters, like translations and rotations, we can write U in the form $U = \exp(-i\xi G/\hbar)$ for some hermitian operator G. In this case, we can consider the infinitesimal version of U as $1 - i\xi G/\hbar + \cdots$. Using this in (17.52), we get
\[
GH - HG = 0 \tag{17.54}
\]

In conjunction with (17.53), this immediately gives the following result.


Theorem 17.1 (Noether's theorem) For every continuous symmetry transformation $U = e^{-i\xi G/\hbar}$, the corresponding generator G is a conserved operator; its matrix elements are independent of time.

Noether’s theorem was first proved in classical mechanics; this is the corresponding

statement in the quantum theory.

It is useful to see some examples of how this works out. Consider the free particle Hamiltonian, $H = p^2/2m$. The Hamiltonian does not involve the position operators, so it is clearly invariant under spatial translations $x_i \to x_i + a_i$. Thus $U = \exp(i\vec{a}\cdot\vec{p}/\hbar)$ is a symmetry transformation, with
\[
\exp(i\vec{a}\cdot\vec{p}/\hbar)\,H\,\exp(-i\vec{a}\cdot\vec{p}/\hbar) = H \tag{17.55}
\]

Considering infinitesimal values of $a_i$, we find $p_iH - Hp_i = 0$. Therefore, by the Heisenberg equation of motion,
\[
i\hbar\frac{\partial\vec{p}}{\partial t} = \vec{p}\,H - H\,\vec{p} = 0 \tag{17.56}
\]
Thus momentum is a conserved quantity for the free particle.

Further, the Hamiltonian $p^2/2m$ involves only the magnitude of the momentum vector, so it is invariant under rotations. This means that
\[
e^{-i\theta_k L_k/\hbar}\,H\,e^{i\theta_k L_k/\hbar} = H \tag{17.57}
\]
This leads to
\[
i\hbar\frac{\partial L_k}{\partial t} = L_k\,H - H\,L_k = 0 \tag{17.58}
\]
Thus angular momentum is also conserved for the free particle.
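The pattern "generator commutes with H, hence its matrix elements are constant in time" is easy to check in a finite-dimensional example. The sketch below is not from the text; the two-spin exchange Hamiltonian is just a convenient stand-in, using the rotationally invariant coupling $\vec{\sigma}_1\cdot\vec{\sigma}_2$ and the total-spin generator $S_x$:

```python
import numpy as np
from scipy.linalg import expm

hbar = 1.0
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# Rotationally invariant exchange coupling for two spin-1/2 particles
H = sum(np.kron(s, s) for s in (sx, sy, sz))       # H proportional to sigma1 . sigma2

# Generator of rotations about x: total spin S_x = S_x(1) + S_x(2)
Sx = (hbar / 2) * (np.kron(sx, I2) + np.kron(I2, sx))

print(np.allclose(Sx @ H - H @ Sx, 0))             # [S_x, H] = 0: a symmetry

# Matrix elements of the generator are independent of time, as in Theorem 17.1
rng = np.random.default_rng(2)
a = rng.normal(size=4) + 1j * rng.normal(size=4)
b = rng.normal(size=4) + 1j * rng.normal(size=4)
for t in (0.0, 0.5, 2.0):
    U = expm(-1j * t * H / hbar)
    print(np.round((U @ a).conj() @ Sx @ (U @ b), 10))   # same value for every t
```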

As another example, consider the Hamiltonian for a Hydrogen-like atom,
\[
H = \frac{p^2}{2m} - \frac{Ze^2}{r} \tag{17.59}
\]

Clearly, the potential energy depends on the position via $r = \sqrt{x_1^2 + x_2^2 + x_3^2}$. Thus spatial translations are not a symmetry and hence, for this system, momentum is not conserved. However, only the magnitudes of the vectors $\vec{p}$ and $\vec{x}$ enter the Hamiltonian, so it is invariant under rotations. Thus we can conclude that angular momentum is conserved.

There is another important facet of symmetries and conservation laws, related

to the degeneracy of states. First of all, a conserved quantity commutes with the

Hamiltonian. It may happen that there are several conserved quantities which do not


commute among themselves. For example, we know that $[L_i, L_j] \neq 0$ for $i \neq j$. By

considering all conserved quantities, we can arrive at a set of mutually commuting

operators (which also commute with the Hamiltonian). We can simultaneously diago-

nalize this set. Thus eigenvalues of conserved quantities can be used to label energy

eigenstates.

Now consider an eigenstate of the Hamiltonian labeled by a set of numbers a,
\[
H\,|a\rangle = E_a\,|a\rangle \tag{17.60}
\]
If U is a symmetry transformation, then we have $HU = UH$ according to the definition (17.51). Applying this to the state $|a\rangle$, we get
\[
H\,U|a\rangle = UH\,|a\rangle = U\,E_a\,|a\rangle = E_a\,U|a\rangle \tag{17.61}
\]

This shows that U |a〉 is an eigenstate of H with the same eigenvalue Ea. One can

repeat with different choices of U (or different choices of the parameters ξ in U(ξ)) to

get a number of states with the same eigenvalue. In fact, this tells us that all states

which can be obtained from each other by the action of U ’s will have the same energy

eigenvalue. We will refer to such a set of states which can be obtained from each other

by the application of the symmetry transformations U(ξ) as a multiplet. So the basic

result can be rephrased as follows.

Theorem 17.2 If the Hamiltonian of a physical system has a symmetry, then the en-

ergy eigenstates fall into degenerate multiplets, with the states within each multiplet

connected by the symmetry transformation.

It may happen that, for some U ’s, U |a〉 is not linearly independent of |a〉, so even

though we may have an infinity of choices for the parameters ξ and hence for U(ξ),

we may only get a finite number of distinct states within each degenerate multiplet.

Again, we will consider the Hydrogen-like atom (17.59) as an example. We have seen that the energy eigenstates can be labeled by $n, l, m_l$, ignoring spin for the moment, with the corresponding eigenvalues
\[
E_{nlm_l} = -\frac{Z^2e^2}{2a}\,\frac{1}{n^2} \tag{17.62}
\]

For n = 1, we have l = 0, giving just one state. For n = 2, we can have l = 0 and l = 1. All three states for l = 1, i.e., for $m_l = 0, \pm 1$, are degenerate. They form a multiplet of three states with the same energy. They are all related to each other by rotations, consistent with the theorem. Notice that we can go from one value of $m_l$ to another by the application of $L_+$ or $L_-$ (which are generators of rotations) as many times as needed. Thus all states with different $m_l$ but the same l (and of course the same


n) are connected by rotations. For the Hamiltonian (17.59), it so happens that the state $|200\rangle$, which is not connected to any $|21m_l\rangle$ by rotations, also has the same energy. This is because the Hamiltonian (17.59) has a larger symmetry, which is not as evident as the rotational symmetry. There is another conserved operator, the Runge-Lenz vector, which is given by

\[
R_i = \frac{1}{2m}\sum_{j,k}\epsilon_{ijk}\left(p_jL_k + L_kp_j\right) - \frac{Ze^2x_i}{r} \tag{17.63}
\]

One can verify by direct computation that $[R_i, H] = 0$. Further, the state $|200\rangle$ is connected to the set of states $|21m_l\rangle$ by the action of $R_i$. More generally, (17.62) shows that we have the same energy for all values of $l, m_l$ for the same n. The degeneracy of all $|nlm_l\rangle$ with the same n and l can be understood as being due to the rotational symmetry. The degeneracy of different l values for the same n is due to the Runge-Lenz vector.
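This "accidental" degeneracy across l can also be seen numerically by solving the radial problem for different l. The following is a rough finite-difference sketch in atomic units with Z = 1 (the grid size and cutoff are arbitrary choices, so the eigenvalues are only approximate):

```python
import numpy as np

# Radial Coulomb problem in atomic units (hbar = m = e = 1, Z = 1):
# -(1/2) u'' + [l(l+1)/(2 r^2) - 1/r] u = E u, with u(0) = u(R) = 0
R, N = 60.0, 2000
r = np.linspace(0, R, N + 2)[1:-1]        # interior grid points
dr = r[1] - r[0]

def radial_levels(l, k=3):
    D2 = (np.diag(np.ones(N - 1), -1) - 2 * np.eye(N)
          + np.diag(np.ones(N - 1), 1)) / dr**2
    H = -0.5 * D2 + np.diag(l * (l + 1) / (2 * r**2) - 1 / r)
    return np.linalg.eigvalsh(H)[:k]

E_l0 = radial_levels(0)    # levels n = 1, 2, 3 for l = 0
E_l1 = radial_levels(1)    # levels n = 2, 3, 4 for l = 1
print(E_l0[1], E_l1[0])    # both approximately -1/8: the n = 2 degeneracy across l
```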

This extra symmetry or conservation law is special to the Hamiltonian (17.59). If

the Hamiltonian is modified by adding perturbations such as the spin-orbit coupling

or relativistic corrections, etc., the Runge-Lenz vector is no longer conserved and we

lose the symmetry. Correspondingly, the degeneracy of the states |nlml〉 for the same

n but different values of l is lifted. For example, the spin-orbit coupling was given in

(13.55) as

\[
H_{\rm int} = \frac{Ze^2}{2m^2c^2r^3}\,\vec{L}\cdot\vec{S} \tag{17.64}
\]

This is still rotationally symmetric, since $\vec{J} = \vec{L} + \vec{S}$ commutes with this. Thus we

expect the degeneracy of the states for different $m_l$ with the same l to be retained. But

since the Runge-Lenz vector is no longer conserved, the degeneracy of different l’s

for the same n will be lifted. The first order correction to the energy was obtained in

(13.58) as

\[
\Delta E^{(1)}_{n,l,m_l,m_s} = \frac{Ze^2\hbar^2}{4m^2c^2a^3n^3}\left[\frac{j(j+1) - l(l+1) - \frac{3}{4}}{l(l+1)(l+\frac{1}{2})}\right] \tag{17.65}
\]

As expected, this shows that the energies for different values of l are different.

17.4 Discrete symmetries

We will now discuss two discrete symmetries which are important for physical systems.

17.4.1 Parity

The first is parity, which corresponds to a spatial reflection of all coordinates and may be written in Cartesian coordinates as
\[
P : \vec{x} \to -\vec{x} \tag{17.66}
\]


The letter P is usually used to denote the parity transformation. Since momentum is classically of the form $m\,d\vec{x}/dt$, the parity transformation of $\vec{p}$ should also give a sign change,
\[
P : \vec{p} \to -\vec{p} \tag{17.67}
\]

The parity properties of various other quantities of interest may be obtained from

these. For example, the orbital angular momentum operator is $L_k = \epsilon_{ijk}\,x_ip_j$, so it does not change sign under parity.

In the quantum theory, corresponding to P, we can define a unitary operator $U_P$ such that
\[
U_P\,\vec{x}\,U_P^\dagger = -\vec{x}, \qquad U_P\,\vec{p}\,U_P^\dagger = -\vec{p}, \qquad U_P\,\psi(\vec{x}) = \psi(-\vec{x}) \tag{17.68}
\]

If we act with $U_P$ twice, we should get the identity, since this corresponds to no action at all. Thus we expect
\[
U_P^2\,\psi(\vec{x}) = \psi(\vec{x}) \;\Longrightarrow\; U_P^2 = 1 \tag{17.69}
\]

The eigenvalues of the parity operator $U_P$ should thus be $\pm 1$. States with $U_P\psi = -\psi$ are said to be odd under parity; those with $U_P\psi = \psi$ are said to be even. If a system has a Hamiltonian which is parity even, then we can classify the eigenstates of the Hamiltonian according to their parity, in addition to other quantum numbers.

For example, for the one-dimensional harmonic oscillator, the Hamiltonian
\[
H = \frac{p^2}{2m} + \frac{m\omega^2x^2}{2} \tag{17.70}
\]

is evidently unchanged under parity. The eigenstates have definite parity transformation properties: for odd/even values of n, the wave functions $\langle x|n\rangle$ are correspondingly odd/even. Likewise, the Hamiltonian for the Hydrogen-like atom
\[
H = \frac{p^2}{2m} - \frac{Ze^2}{r} \tag{17.71}
\]

is even under parity. The eigenstates of H are also eigenstates of $U_P$. We have already seen that
\[
U_P\,\psi_{nlm_l}(\vec{x}) = \psi_{nlm_l}(-\vec{x}) = (-1)^l\,\psi_{nlm_l}(\vec{x}) \tag{17.72}
\]
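The parity pattern for the oscillator is easy to verify numerically. The following is a minimal finite-difference sketch (with ħ = m = ω = 1; the grid parameters are arbitrary) checking that the n-th eigenfunction satisfies $\psi_n(-x) = (-1)^n\psi_n(x)$:

```python
import numpy as np

# Discretize H = p^2/2m + m w^2 x^2/2 on a grid symmetric about x = 0
N, L = 1001, 10.0
x = np.linspace(-L, L, N)
dx = x[1] - x[0]

D2 = (np.diag(np.ones(N - 1), -1) - 2 * np.eye(N)
      + np.diag(np.ones(N - 1), 1)) / dx**2
H = -0.5 * D2 + np.diag(0.5 * x**2)

E, psi = np.linalg.eigh(H)
for n in range(4):
    # psi[::-1, n] is psi_n(-x); the parity eigenvalue is (-1)^n
    print(n, np.allclose(psi[::-1, n], (-1)**n * psi[:, n], atol=1e-8))
```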

Classical electromagnetic theory has parity symmetry if we take the magnetic

vector potential to be odd under parity. This is clear from the Maxwell equations. The


current is of the form $e\,d\vec{x}/dt$ for a charged particle, so it changes sign under parity. The Maxwell equation
\[
\nabla\times\vec{B} - \frac{1}{c}\frac{\partial\vec{E}}{\partial t} = \vec{J} \tag{17.73}
\]

tells us that $\vec{E} \to -\vec{E}$ under parity and $\nabla\times\vec{B} \to -\nabla\times\vec{B}$. Since $\vec{E} \sim -\partial\vec{A}/\partial t$, we expect $\vec{A} \to -\vec{A}$. This also gives $\vec{B} \to \vec{B}$, since $\vec{B} = \nabla\times\vec{A}$. Thus $\vec{B}$ does not behave like a normal vector; it is an axial vector, transforming as a vector under rotations but not changing sign under parity. This is also consistent with $\nabla\times\vec{B} \to -\nabla\times\vec{B}$, as required by (17.73). Notice that for a uniform magnetic field $\vec{A} = \frac{1}{2}\vec{B}\times\vec{x}$, consistent with these transformations.

The Hamiltonian for the electron in an atom with a uniform magnetic field is given by
\[
H = \frac{1}{2m}\left[\vec{\sigma}\cdot\left(\vec{p} - \frac{e}{c}\vec{A}\right)\right]^2 - \frac{Ze^2}{r} \tag{17.74}
\]

The Pauli matrices are part of the spin vector $\vec{S}$, which is part of the angular momentum, so it is even under parity. It is then easy to see that H is invariant under parity.

This means that states of different parity will not be mixed under perturbations due

to the magnetic field. This can be verified from the Zeeman effect results we have

obtained earlier.

Parity properties of states play a role in determining whether a matrix element vanishes, i.e., they have implications for selection rules. We have already used this in the context of absorption and emission of radiation to note that
\[
\langle n'l'm_l'|\,\vec{x}\,|nlm_l\rangle = (-1)^{l+l'+1}\,\langle n'l'm_l'|\,\vec{x}\,|nlm_l\rangle \tag{17.75}
\]
so that we need $l + l' + 1$ to be an even integer for a nonvanishing transition. (Angular momentum theory then refined this to $\Delta l = \pm 1$.)
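The same parity logic can be seen in a simpler setting. In the oscillator number basis (a small sketch using the standard ladder-operator matrix for x, in units with ħ = m = ω = 1), the matrix element $\langle n'|x|n\rangle$ vanishes whenever $n + n'$ is even, since x is parity odd and $|n\rangle$ has parity $(-1)^n$:

```python
import numpy as np

# x = (a + a†)/sqrt(2) in the harmonic oscillator number basis
N = 8
a = np.diag(np.sqrt(np.arange(1, N)), 1)   # lowering operator: a|n> = sqrt(n)|n-1>
x = (a + a.T) / np.sqrt(2)

# <n'|x|n> is nonzero only when n + n' is odd (opposite parities)
for nprime in range(N):
    for n in range(N):
        if abs(x[nprime, n]) > 1e-12:
            assert (nprime + n) % 2 == 1
print("x connects only states of opposite parity")
```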

17.4.2 Time-reversal

In classical mechanics the equations of motion are of the second order in time-

derivatives. As a result, for many systems, time-reversal t → −t is a symmetry in

the sense that given a trajectory for a particle, the time-reversed trajectory (of the

particle going back along the trajectory) is also a solution. An exception would be if we

have an external magnetic field, since the Lorentz force does not have this symmetry.

This is clear from the equations of motion

\[
m\frac{d^2\vec{x}}{dt^2} = e\vec{E} + \frac{e}{c}\frac{d\vec{x}}{dt}\times\vec{B} \tag{17.76}
\]


Quantum mechanically, time-reversal is a somewhat peculiar transformation. This

is because the Schrödinger equation tells us that

\[
i\hbar\frac{\partial\psi}{\partial t} = H\,\psi \tag{17.77}
\]

We cannot expect this to be unchanged with H → H , since the left hand side changes

under t → −t. We can only hope to map this equation to its complex conjugate

equation. Time-reversal therefore cannot be a linear operation. Mapping a wave

function to its conjugate is still acceptable, since only the absolute square of ψ or

absolute squares of matrix elements are observable as probabilities.

Towards defining the time-reversal operation in the quantum theory, we first note

the expected properties of the basic operators. Obviously we expect $\vec{x} \to \vec{x}$. Since the momentum involves the time-derivative of $\vec{x}$, we define
\[
T : \vec{p} \to -\vec{p} \tag{17.78}
\]
(We use the symbol T for time-reversal.) In turn, (17.78) implies that $T : \vec{L} \to -\vec{L}$. Since spin is part of angular momentum, we should also have $\vec{S} \to -\vec{S}$ under T.

Now we consider the Schrödinger equation for a particle in a potential $V(\vec{x})$,
\[
i\hbar\frac{\partial\psi}{\partial t} = \left[-\frac{\hbar^2}{2m}\nabla^2 + V\right]\psi \tag{17.79}
\]

The time-reversed equation reads
\[
-i\hbar\frac{\partial\psi^T}{\partial t} = \left[-\frac{\hbar^2}{2m}\nabla^2 + V\right]\psi^T \tag{17.80}
\]

Since the Hamiltonian is not only self-adjoint but also real as a differential operator, i.e., $H^* = H$, taking the complex conjugate of (17.79) gives
\[
-i\hbar\frac{\partial\psi^*}{\partial t} = \left[-\frac{\hbar^2}{2m}\nabla^2 + V\right]\psi^* \tag{17.81}
\]

We thus see that we can get the same dynamics if we identify
\[
\psi(-t) \equiv \psi^T = \psi^*(t) \tag{17.82}
\]

Notice that we can have a linear combination of two wave functions, say $\psi = c_1\psi_1 + c_2\psi_2$, each of which obeys the Schrödinger equation. The property (17.82) requires that
\[
\psi^T = \psi^* = c_1^*\psi_1^* + c_2^*\psi_2^* = c_1^*\,\psi_1^T + c_2^*\,\psi_2^T \tag{17.83}
\]
showing that the time-reversal transformation is not linear, but anti-linear. (For a linear transformation, the right hand side should have been $c_1\psi_1^T + c_2\psi_2^T$.)


Now consider the case of a charged particle with an additional background magnetic field. The Schrödinger equation is now
\[
i\hbar\frac{\partial\psi}{\partial t} = \left[-\frac{\hbar^2}{2m}\left(\nabla - \frac{ie}{c}\vec{A}\right)^2 + V\right]\psi \tag{17.84}
\]

The time-reversed equation is
\[
-i\hbar\frac{\partial\psi^T}{\partial t} = \left[-\frac{\hbar^2}{2m}\left(\nabla - \frac{ie}{c}\vec{A}\right)^2 + V\right]\psi^T \tag{17.85}
\]

while the complex conjugate of (17.84) is
\[
-i\hbar\frac{\partial\psi^*}{\partial t} = \left[-\frac{\hbar^2}{2m}\left(\nabla + \frac{ie}{c}\vec{A}\right)^2 + V\right]\psi^* \tag{17.86}
\]

The time-reversal operation defined as $\psi^T = \psi^*$ is no longer a symmetry, since there is a change of sign for the $\vec{A}$-dependent term in (17.86). Thus a background magnetic field breaks time-reversal symmetry, consistent with the expectation from the classical theory.

In the case of a particle with spin, we must ensure that the time-reversal operation also gives the required property $\vec{S} \to -\vec{S}$. For a spin-$\frac{1}{2}$ particle, the wave functions are two-component spinors $\Psi$, with the action of spin as
\[
S_i\,\Psi = S_i\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} = \hbar\,\frac{\sigma_i}{2}\begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix} \tag{17.87}
\]

The Pauli matrices are not real, so we have
\[
(\sigma_i\Psi)^* = \begin{cases}\sigma_1\Psi^* & i = 1\\ -\sigma_2\Psi^* & i = 2\\ \sigma_3\Psi^* & i = 3\end{cases} \tag{17.88}
\]

Since $\sigma_2\sigma_1 = -\sigma_1\sigma_2$ and $\sigma_2\sigma_3 = -\sigma_3\sigma_2$, we see that
\[
\sigma_2(\sigma_i\Psi)^* = -\sigma_i(\sigma_2\Psi^*) \tag{17.89}
\]

This shows that we can define the time-reversal transformation for a spin-$\frac{1}{2}$ wave function as
\[
\Psi^T = \sigma_2\,\Psi^* \tag{17.90}
\]
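Both (17.89) and the required flip $\langle\vec{S}\rangle \to -\langle\vec{S}\rangle$ can be checked directly with the Pauli matrices; here is a small numerical sketch (the spinor is an arbitrary stand-in):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

def expval(v, M):
    return (v.conj() @ M @ v) / (v.conj() @ v)

psi = np.array([1.0 + 2.0j, -0.5 + 0.3j])   # arbitrary two-component spinor
psiT = s2 @ psi.conj()                       # Psi^T = sigma_2 Psi^*

for si in (s1, s2, s3):
    # (17.89): sigma_2 (sigma_i Psi)^* = -sigma_i (sigma_2 Psi^*)
    print(np.allclose(s2 @ (si @ psi).conj(), -si @ psiT))
    # Spin expectation values flip sign under T
    print(np.allclose(expval(psiT, si), -expval(psi, si)))
```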

It is useful to consider this transformation for the Schrödinger equation for a charged particle with spin, for which the Hamiltonian was obtained in (11.43). The Schrödinger equation reads

\[
\begin{aligned}
i\hbar\frac{\partial\Psi}{\partial t} &= \left[\frac{p^2}{2m} - \frac{e}{2mc}\sum_k(L_k + 2S_k)B_k + \frac{e^2}{2mc^2}\sum_i A_iA_i + V\right]\Psi \\
&= \left[-\frac{\hbar^2}{2m}\nabla^2 + i\frac{e\hbar}{2mc}\,\vec{B}\cdot(\vec{x}\times\nabla) - \frac{e\hbar}{2mc}\,\vec{B}\cdot\vec{\sigma} + \frac{e^2}{2mc^2}\sum_i A_iA_i + V\right]\Psi
\end{aligned} \tag{17.91}
\]

The time-reversed equation is
\[
-i\hbar\frac{\partial\Psi^T}{\partial t} = \left[-\frac{\hbar^2}{2m}\nabla^2 + i\frac{e\hbar}{2mc}\,\vec{B}\cdot(\vec{x}\times\nabla) - \frac{e\hbar}{2mc}\,\vec{B}\cdot\vec{\sigma} + \frac{e^2}{2mc^2}\sum_i A_iA_i + V\right]\Psi^T \tag{17.92}
\]

Taking the complex conjugate of (17.91) and multiplying by $\sigma_2$, we get
\[
\begin{aligned}
-i\hbar\frac{\partial(\sigma_2\Psi^*)}{\partial t} &= \sigma_2\left[-\frac{\hbar^2}{2m}\nabla^2 - i\frac{e\hbar}{2mc}\,\vec{B}\cdot(\vec{x}\times\nabla) - \frac{e\hbar}{2mc}\,\vec{B}\cdot\vec{\sigma}^* + \frac{e^2}{2mc^2}\sum_i A_iA_i + V\right]\Psi^* \\
&= \left[-\frac{\hbar^2}{2m}\nabla^2 - i\frac{e\hbar}{2mc}\,\vec{B}\cdot(\vec{x}\times\nabla) + \frac{e\hbar}{2mc}\,\vec{B}\cdot\vec{\sigma} + \frac{e^2}{2mc^2}\sum_i A_iA_i + V\right](\sigma_2\Psi^*)
\end{aligned} \tag{17.93}
\]

As expected, both terms coupling angular momentum to the magnetic field change sign, consistent with $\vec{L}, \vec{S} \to -\vec{L}, -\vec{S}$. We do not have time-reversal symmetry, since $\Psi^T$ and $\sigma_2\Psi^*$ do not obey the same equation.

The anti-linear property of time-reversal can be expressed in terms of matrix elements as follows. Consider the inner product of $|a\rangle$ and $|b\rangle$. For the time-reversed states, we can write
\[
\begin{aligned}
\langle a^T|b^T\rangle &= \sum_{i=1,2}\int(\Psi_a^T)_i^*\,(\Psi_b^T)_i = \sum_i\int(\sigma_2\Psi_a^*)_i^*\,(\sigma_2\Psi_b^*)_i \\
&= \sum_{ijk}\int(\Psi_a)_j\,(\sigma_2)_{ij}^*\,(\sigma_2)_{ik}\,(\Psi_b^*)_k = \sum_i\int(\Psi_b^*)_i\,(\Psi_a)_i = \langle b|a\rangle
\end{aligned} \tag{17.94}
\]

Notice the exchange of the states $|a\rangle$ and $|b\rangle$. Since $|\langle a|b\rangle|^2 = \langle a|b\rangle\langle b|a\rangle$, we see that $|\langle a^T|b^T\rangle|^2 = |\langle a|b\rangle|^2$, so that probabilities are the same for the time-reversed states and the original ones.
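This exchange of states is easy to confirm numerically for two-component spinors (a minimal sketch; the spinors are arbitrary):

```python
import numpy as np

s2 = np.array([[0, -1j], [1j, 0]])
rng = np.random.default_rng(4)
a = rng.normal(size=2) + 1j * rng.normal(size=2)
b = rng.normal(size=2) + 1j * rng.normal(size=2)

aT, bT = s2 @ a.conj(), s2 @ b.conj()                 # time-reversed spinors
print(np.allclose(aT.conj() @ bT, b.conj() @ a))      # <a^T|b^T> = <b|a>
```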

Another important property of time-reversal for spin-$\frac{1}{2}$ particles is that $T^2 = -1$. This is evident from the following:
\[
(\Psi^T)^T = (\sigma_2\Psi^*)^T = \sigma_2(\sigma_2\Psi^*)^* = \sigma_2(-\sigma_2)\,\Psi = -\Psi \tag{17.95}
\]


Consider now the eigenvalue equation for any Hamiltonian which gives time-reversal symmetry for the Schrödinger equation, as discussed above,
\[
H\,\Psi_a = E_a\,\Psi_a \tag{17.96}
\]
Taking the complex conjugate and identifying $\Psi^T = \sigma_2\Psi^*$, we find
\[
H\,\Psi_a^T = E_a\,\Psi_a^T \tag{17.97}
\]

Thus $\Psi^T$ is also an eigenstate with the same eigenvalue. We cannot immediately conclude that we have a degenerate state, because it may happen that $\Psi^T$ is proportional to $\Psi$ and hence is not a different state. But for the spin-$\frac{1}{2}$ case, this is not possible, by the following argument. Assume $\Psi^T = c\,\Psi$ for some constant c. Then
\[
-\Psi = (\Psi^T)^T = (c\Psi)^T = \sigma_2(c\Psi)^* = c^*\sigma_2\Psi^* = c^*\Psi^T = c^*c\,\Psi \tag{17.98}
\]

Thus we need $c^*c = -1$, which is not possible. Thus, for the spin-$\frac{1}{2}$ case, $\Psi^T$ provides a new state degenerate with the original one; we have at least a two-fold degeneracy. When we have many spin-$\frac{1}{2}$ particles, we get a factor of $(-1)$ for each, so that $T^2 = (-1)^N$ for N particles. Thus for an even number of spin-$\frac{1}{2}$ particles this argument does not work, but it does hold for an odd number of spin-$\frac{1}{2}$ particles. We may restate this as follows.

Theorem 17.3 (Kramers' theorem) The energy eigenstates of a time-reversal invariant physical system with an odd number of spin-$\frac{1}{2}$ particles must have at least a two-fold degeneracy.

This result is known as Kramers' theorem. For the case of a single spin-$\frac{1}{2}$ particle, this degeneracy is the degeneracy of the two spin states. A magnetic field will not preserve time-reversal symmetry and will lift this degeneracy, as we already know from the spin-dependent part of the Zeeman effect.
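Kramers' theorem can also be illustrated numerically. The sketch below (my construction, not from the text) builds a random Hamiltonian for one spin-$\frac{1}{2}$ degree of freedom that is invariant under $\Theta = (1\otimes\sigma_2)K$, with K complex conjugation, checks that its spectrum comes in degenerate pairs, and then shows that a Zeeman-like term generically lifts the degeneracy:

```python
import numpy as np

rng = np.random.default_rng(3)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.diag([1.0, -1.0]).astype(complex)

n = 5                                    # orbital states; one spin-1/2, dimension 2n
Sig = np.kron(np.eye(n), s2)             # unitary part of T = Sig * (complex conjugation)

M = rng.normal(size=(2*n, 2*n)) + 1j * rng.normal(size=(2*n, 2*n))
H0 = (M + M.conj().T) / 2                # generic hermitian matrix

# Symmetrize so that Sig H* Sig† = H, i.e. H is time-reversal invariant
H = (H0 + Sig @ H0.conj() @ Sig.conj().T) / 2

E = np.linalg.eigvalsh(H)
print(np.allclose(E[0::2], E[1::2]))     # True: the spectrum comes in Kramers doublets

# A Zeeman-like term is odd under T; it generically lifts the degeneracy
Ez = np.linalg.eigvalsh(H + 0.1 * np.kron(np.eye(n), s3))
print(np.allclose(Ez[0::2], Ez[1::2]))   # False
```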