
Physics 7C Spring 2015 Discussion Section Notes

Kevin T. Grosvenor^{a,b}

^a Berkeley Center for Theoretical Physics and Department of Physics, University of California, Berkeley, CA 94720-7300, USA

^b Theoretical Physics Group, Lawrence Berkeley National Laboratory, Berkeley, CA 94720-8162, USA

Abstract: Some discussion section notes for Physics 7C.

Contents

1. Vectors

2. The Wave Equation

3. Solving the Wave Equation

3.1. Electromagnetic Plane Waves

4. Poynting Vector and Flux

4.1. Red Laser Pointer

5. Ray Tracing Diagrams for Mirrors

6. Ray Tracing Diagrams for Lenses

7. Compound Optical Systems

7.1. Two-Lens Problem

7.2. Two-Lens Demonstration

8. Midterm 1 Quiz

9. Interference

9.1. Laser Wavelength Measurement via Metal Ruler

10. Thin-Film Interference

11. Relativity

11.1. How to Measure the Length of a Moving Object

11.2. Relativistic Train

11.3. Passing Trains

12. Midterm 2 Quiz

13. Energy and Momentum

13.1. 4-Vectors

13.2. Colliding Photons

14. Quantum Mechanics

14.1. The Wacky World of the Double Slit

14.2. Blackbody Radiation and the Ultraviolet Catastrophe

14.3. Stefan-Boltzmann Law

14.4. Bohr Model

14.5. Time-Evolution in 1D Infinite Square Well

15. Final Review

15.1. Human Eye Optics

15.2. Optical Fiber

15.3. Modified Michelson Interferometer

15.4. Diffraction Grating

15.5. Optical Spectroscopy

15.6. Relativity and Current-Carrying Wires

15.7. Pi Decay

15.8. Relativistic Doppler Effect

15.9. Quantum Tunneling and Frustrated Total Internal Reflection

15.10. Wavefunction Shapes

16. Final Exam Solutions

16.1. The Pole Vaulter Paradox

16.2. Pion Decay

1. Vectors

For our purposes, a vector will be something that has several components (usually three, or four in relativity). We must be able to add two vectors (component by component) and we must be able to multiply a vector by a real number. There are a slew of other requirements, but they are usually trivially satisfied, at least for the main vector space we will care about: $\mathbb{R}^3$, or $\mathbb{R}^n$ for general dimension. The symbol $\mathbb{R}^n$ means the set of all $n$-component expressions, $\mathbf{A} = (A_1, A_2, \ldots, A_n)$, such that each component is a real number (i.e. $A_i \in \mathbb{R}$ for $i = 1, \ldots, n$).

In three-dimensional space, let us replace $x, y, z$ with $x_1, x_2, x_3$ in order to make it easier to generalize to any dimension. We often denote the unit vector in the direction of $x_i$ by $\hat{\mathbf{x}}_i$, whose components are all zero except for the $i$th one, which is a one. Then, $\mathbf{A}$ may be written, in a general dimension, $n$,

$$\mathbf{A} = \sum_{i=1}^{n} A_i \hat{\mathbf{x}}_i. \tag{1.1}$$

So that we don’t have to keep writing summation signs everywhere, we will usually follow Einstein’s convention that repeated indices are summed over, unless stated otherwise. Then, (1.1) becomes neater:

$$\mathbf{A} = A_i \hat{\mathbf{x}}_i. \tag{1.2}$$

The dimensionality is nowhere to be found now, so you must make sure you know what it is from context.


Next, we introduce the Kronecker delta symbol:

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \tag{1.3}$$

$\mathbb{R}^n$ has what’s called an inner product structure. This is a map $(\cdot,\cdot) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. That is, you take two vectors, put one in the first slot and the other in the second slot of $(\cdot,\cdot)$, and you will get a real number, called their inner product. It is also often called their dot product, especially in three dimensions, and we will denote it by $\mathbf{A}\cdot\mathbf{B}$. It is defined by

$$\mathbf{A}\cdot\mathbf{B} = \delta_{ij}A_iB_j = A_iB_i, \tag{1.4}$$

where you need to keep in mind that repeated indices are summed over.

In addition, $\mathbb{R}^3$ has a special structure called a cross product. This is given by a map $(\cdot\times\cdot) : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}^3$, so it takes two vectors and spits out another vector.[1] In order to talk about cross products, we must introduce the Levi-Civita symbol, and to do that, we must understand cyclic indices. There is a mnemonic for this. Think of a clock that goes from 1 through 3 instead of 1 through 12. Starting at any of the numbers, if you traverse the clock in a clockwise fashion, then the order is declared to be cyclic. If you traverse the clock in the counter-clockwise direction, then the order is anti-cyclic. So, (123), (231) and (312) are cyclic, whereas (132), (213) and (321) are anti-cyclic. By the way, and for example, (231) is the permutation that sends 1 (the first slot) to 2 (the first number appearing), 2 to 3 and 3 to 1.

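The Levi-Civita symbol is defined to be

$$\epsilon_{ijk} = \begin{cases} 1 & \text{if } (ijk) \text{ is cyclic}, \\ -1 & \text{if } (ijk) \text{ is anti-cyclic}, \\ 0 & \text{if any index is repeated}. \end{cases} \tag{1.5}$$

Roughly speaking, the Levi-Civita symbol is to the cross product what the Kronecker delta is to the dot product:

$$\mathbf{A}\times\mathbf{B} = \epsilon_{ijk}\,\hat{\mathbf{x}}_i A_j B_k. \tag{1.6}$$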

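Let us take a moment to ensure that this definition of the cross product agrees with the definition of the cross product that we are likely to have learned before, namely

$$\mathbf{A}\times\mathbf{B} = (A_yB_z - A_zB_y)\,\hat{\mathbf{x}} + (A_zB_x - A_xB_z)\,\hat{\mathbf{y}} + (A_xB_y - A_yB_x)\,\hat{\mathbf{z}}. \tag{1.7}$$

To aid in comparison, let us first rewrite (1.7) in terms of our new notation where $\hat{\mathbf{x}}$ is replaced with $\hat{\mathbf{x}}_1$, and $\hat{\mathbf{y}}$ with $\hat{\mathbf{x}}_2$ and so on. Also, the $x$-component is the 1-component, the $y$-component is the 2-component and so on. Then,

$$\mathbf{A}\times\mathbf{B} = (A_2B_3 - A_3B_2)\,\hat{\mathbf{x}}_1 + (A_3B_1 - A_1B_3)\,\hat{\mathbf{x}}_2 + (A_1B_2 - A_2B_1)\,\hat{\mathbf{x}}_3. \tag{1.8}$$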

[1] Actually, the result of a cross product is what’s called a pseudovector, but no matter.


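Now, let us expand out the right hand side of (1.6) to see that it is in fact the same as the right hand side of (1.8):

$$\begin{aligned} \epsilon_{ijk}\,\hat{\mathbf{x}}_i A_j B_k &= \epsilon_{123}\hat{\mathbf{x}}_1A_2B_3 + \epsilon_{132}\hat{\mathbf{x}}_1A_3B_2 + \epsilon_{231}\hat{\mathbf{x}}_2A_3B_1 + \epsilon_{213}\hat{\mathbf{x}}_2A_1B_3 \\ &\quad + \epsilon_{312}\hat{\mathbf{x}}_3A_1B_2 + \epsilon_{321}\hat{\mathbf{x}}_3A_2B_1 \\ &= \hat{\mathbf{x}}_1A_2B_3 - \hat{\mathbf{x}}_1A_3B_2 + \hat{\mathbf{x}}_2A_3B_1 - \hat{\mathbf{x}}_2A_1B_3 + \hat{\mathbf{x}}_3A_1B_2 - \hat{\mathbf{x}}_3A_2B_1. \end{aligned} \tag{1.9}$$

Here, we used $\epsilon_{123} = \epsilon_{231} = \epsilon_{312} = 1$ and $\epsilon_{132} = \epsilon_{213} = \epsilon_{321} = -1$. Now, it is quite easy to see that (1.8) is the same as (1.9), just organized slightly more neatly.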
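The comparison between (1.6) and (1.7) can also be checked numerically. The following is a minimal sketch in plain Python; the function names and the two test vectors are illustrative choices, not part of the notes.

```python
def levi_civita(i, j, k):
    """Levi-Civita symbol (1.5) for indices taking values 1, 2, 3."""
    if (i, j, k) in {(1, 2, 3), (2, 3, 1), (3, 1, 2)}:
        return 1   # cyclic order
    if (i, j, k) in {(1, 3, 2), (2, 1, 3), (3, 2, 1)}:
        return -1  # anti-cyclic order
    return 0       # some index is repeated


def cross_levi_civita(A, B):
    """Cross product via (1.6): the i-th component is eps_ijk A_j B_k."""
    idx = (1, 2, 3)
    return [
        sum(levi_civita(i, j, k) * A[j - 1] * B[k - 1] for j in idx for k in idx)
        for i in idx
    ]


def cross_componentwise(A, B):
    """Cross product via the familiar component formula (1.7)."""
    Ax, Ay, Az = A
    Bx, By, Bz = B
    return [Ay * Bz - Az * By, Az * Bx - Ax * Bz, Ax * By - Ay * Bx]


# Arbitrary integer test vectors (assumed values, chosen for exact arithmetic).
A = [1, 2, 3]
B = [-4, 5, 6]
print(cross_levi_civita(A, B))    # [-3, -18, 13]
print(cross_componentwise(A, B))  # [-3, -18, 13]
```

Both functions return the same list, as the index bookkeeping in (1.9) says they should.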

Okay, so far all we have done is introduce a bunch of notation in order to express the dot product and the cross product more compactly. However, this notation is actually useful once we have to deal with multiple products (like multiple cross products). For such purposes, the following identity is very useful:

$$\epsilon_{ijk}\epsilon_{i\ell m} = \delta_{j\ell}\delta_{km} - \delta_{jm}\delta_{k\ell}. \tag{1.10}$$

This identity is actually reasonably easy to understand. Remember that $i$, $j$ and $k$ have to take on different values or else the Levi-Civita symbol would be zero anyway (also $i$, $\ell$ and $m$ have to take on different values). Well, since they can only take on three different values, namely 1, 2 or 3, either $j$ is equal to $\ell$ and $k$ is equal to $m$, or $j$ is equal to $m$ and $k$ is equal to $\ell$. There just aren’t any other possibilities! That’s what the right hand side of the equation says, except for the signs, which you can figure out by plugging in some specific set of values for the indices, say $i = 1$, $j = \ell = 2$ and $k = m = 3$, and then try $i = 1$, $j = m = 2$ and $k = \ell = 3$.
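Since the free indices in (1.10) only take the values 1, 2, 3, the identity can be verified by brute force over all 81 combinations. This is a small sketch under that assumption; the helper names are made up for illustration.

```python
from itertools import product


def delta(i, j):
    """Kronecker delta (1.3)."""
    return 1 if i == j else 0


def levi_civita(i, j, k):
    """Levi-Civita symbol (1.5) for indices taking values 1, 2, 3."""
    if (i, j, k) in {(1, 2, 3), (2, 3, 1), (3, 1, 2)}:
        return 1
    if (i, j, k) in {(1, 3, 2), (2, 1, 3), (3, 2, 1)}:
        return -1
    return 0


idx = (1, 2, 3)

# Collect every (j, k, l, m) for which the two sides of (1.10) disagree.
mismatches = []
for j, k, l, m in product(idx, repeat=4):
    lhs = sum(levi_civita(i, j, k) * levi_civita(i, l, m) for i in idx)  # sum over i
    rhs = delta(j, l) * delta(k, m) - delta(j, m) * delta(k, l)
    if lhs != rhs:
        mismatches.append((j, k, l, m))

print(len(mismatches))  # 0: the identity holds for every index combination
```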

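In fact, this can be extended to any dimension. In $n+1$ dimensions, we write

$$\epsilon_{i i_1 \cdots i_n}\epsilon_{i j_1 \cdots j_n} = \begin{vmatrix} \delta_{i_1 j_1} & \delta_{i_1 j_2} & \cdots & \delta_{i_1 j_n} \\ \delta_{i_2 j_1} & \delta_{i_2 j_2} & \cdots & \delta_{i_2 j_n} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{i_n j_1} & \delta_{i_n j_2} & \cdots & \delta_{i_n j_n} \end{vmatrix}, \tag{1.11}$$

where the vertical lines surrounding the matrix mean “take the determinant”.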

The identity (1.10) will allow you to deal with situations involving multiple cross products. A number of very important vector identities can be proven using these formulas. For example, let us prove the “BAC-CAB” rule:

$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B}). \tag{1.12}$$

Of course, you could prove this identity component by component by literally expanding both sides out completely. It will take some time and is tedious, but should be pretty straightforward. On the other hand, it is quite easy to prove in our new notation. Define

$$\mathbf{D} = \mathbf{B}\times\mathbf{C} = \epsilon_{ijk}\,\hat{\mathbf{x}}_i B_j C_k. \tag{1.13}$$

Of course, as in (1.2), we can write $\mathbf{D}$ in terms of its components:

$$\mathbf{D} = D_i\hat{\mathbf{x}}_i. \tag{1.14}$$


Comparing (1.13) with (1.14) gives us the components of $\mathbf{D}$ in terms of the components of $\mathbf{B}$ and $\mathbf{C}$:

$$D_i = \epsilon_{ijk}B_jC_k. \tag{1.15}$$

Getting back to the left hand side of (1.12), we have

$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{A}\times\mathbf{D} = \epsilon_{\ell m i}\,\hat{\mathbf{x}}_\ell A_m D_i. \tag{1.16}$$

Notice that I have used $\ell, m, i$ instead of the customary $i, j, k$. These are just dummy indices, so it does not matter what you call them. The reason why I have written $i$ last is because that index is the same as the index on $\mathbf{D}$ and I have already written $D_i$ in (1.15), so might as well keep that index as $i$. The reason why I have used $\ell$ and $m$ instead of $j$ and $k$ is because $j$ and $k$ already appear in (1.15), so I don’t want to use them again or else I will get confused as to which pair of $i$’s are supposed to be summed over and, similarly, which pairs of $j$’s are supposed to be summed over.

Plugging (1.15) into (1.16) gives

$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \epsilon_{\ell m i}\,\hat{\mathbf{x}}_\ell A_m\,\epsilon_{ijk}B_jC_k = \epsilon_{ijk}\epsilon_{\ell m i}\,\hat{\mathbf{x}}_\ell A_m B_j C_k. \tag{1.17}$$

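Now, we see why (1.10) might be useful since we have a product of two Levi-Civita symbols here with a pair of indices being summed over, namely the index $i$. However, we have a bit of a problem: in (1.10) the index being summed over, namely $i$, is the first index of both Levi-Civita symbols. In (1.17) it is the first index in one Levi-Civita symbol, but the last in the other. No matter: we can always cyclically permute the indices of the Levi-Civita symbol without changing it:

$$\epsilon_{\ell m i} = \epsilon_{i \ell m} = \epsilon_{m i \ell}. \tag{1.18}$$

Think about it: this is just like rotating our 3-hour clock in the clockwise direction. That doesn’t change anything. On the other hand, if you switch any two indices, you get a minus sign:

$$\epsilon_{\ell m i} = -\epsilon_{\ell i m} = -\epsilon_{i m \ell} = -\epsilon_{m \ell i}. \tag{1.19}$$

In any case, we can safely replace $\epsilon_{\ell m i}$ with $\epsilon_{i \ell m}$ in (1.17). The result is

$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \epsilon_{ijk}\epsilon_{i\ell m}\,\hat{\mathbf{x}}_\ell A_m B_j C_k. \tag{1.20}$$

Now, we can use (1.10) directly:

$$\begin{aligned} \mathbf{A}\times(\mathbf{B}\times\mathbf{C}) &= (\delta_{j\ell}\delta_{km} - \delta_{jm}\delta_{k\ell})\,\hat{\mathbf{x}}_\ell A_m B_j C_k \\ &= \hat{\mathbf{x}}_j A_k B_j C_k - \hat{\mathbf{x}}_k A_j B_j C_k \\ &= (B_j\hat{\mathbf{x}}_j)(A_kC_k) - (C_k\hat{\mathbf{x}}_k)(A_jB_j) \\ &= \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B}). \end{aligned} \tag{1.21}$$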

That’s the “BAC-CAB” rule proven!
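As a sanity check on the index manipulation, the two sides of (1.12) can be compared for concrete vectors. The sketch below uses plain Python lists; the helper names and the sample vectors are assumptions made for illustration.

```python
def dot(A, B):
    """Dot product (1.4): A_i B_i with the repeated index summed."""
    return sum(a * b for a, b in zip(A, B))


def cross(A, B):
    """Cross product written out in components, as in (1.8)."""
    return [
        A[1] * B[2] - A[2] * B[1],
        A[2] * B[0] - A[0] * B[2],
        A[0] * B[1] - A[1] * B[0],
    ]


def bac_cab_sides(A, B, C):
    """Return (left side, right side) of the BAC-CAB rule (1.12)."""
    lhs = cross(A, cross(B, C))
    rhs = [b * dot(A, C) - c * dot(A, B) for b, c in zip(B, C)]
    return lhs, rhs


# Arbitrary integer vectors so that the comparison is exact.
A = [1, 2, 3]
B = [-4, 5, 6]
C = [7, -8, 9]
lhs, rhs = bac_cab_sides(A, B, C)
print(lhs)  # [-240, 282, -108]
print(rhs)  # [-240, 282, -108]
```

A numerical check on a few vectors is not a proof, of course; it is only a quick way to catch a sign or index slip in the derivation.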

There are some vector product identities inside the front cover of Griffiths’ E&M textbook. It would be great if you can try to prove some of those using these same techniques. Also, to apply the above result, try to prove the following derivative identity:

$$\nabla\times(\nabla\times\mathbf{A}) = \nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A}, \tag{1.22}$$


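where $\nabla$ is the gradient operator

$$\nabla \equiv \hat{\mathbf{x}}_i\partial_i \equiv \hat{\mathbf{x}}_i\frac{\partial}{\partial x_i}, \tag{1.23}$$

and $\nabla^2$ is the Laplacian,

$$\nabla^2 = \nabla\cdot\nabla = \partial_i\partial_i = \sum_{i=1}^{3}\frac{\partial^2}{\partial x_i^2}. \tag{1.24}$$

Remember the names of these derivatives: $\nabla f$ is the gradient of a function $f$, and $\nabla\cdot\mathbf{A}$ is the divergence of the vector function $\mathbf{A}$, and $\nabla\times\mathbf{A}$ is the curl. The identity (1.22) will be useful in deriving the wave equation that electromagnetic waves satisfy from Maxwell’s equations.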

2. The Wave Equation

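We will now derive the wave equation satisfied by electromagnetic waves traveling through vacuum. Maxwell’s equations in vacuum read

$$\begin{aligned} \nabla\cdot\mathbf{E} &= 0, & \nabla\times\mathbf{E} &= -\dot{\mathbf{B}}, \\ \nabla\cdot\mathbf{B} &= 0, & \nabla\times\mathbf{B} &= \frac{1}{c^2}\dot{\mathbf{E}}, \end{aligned} \tag{2.1}$$

where an overdot denotes a partial derivative with respect to time. To compare these to equations involving $\mu_0$ and $\epsilon_0$, remember that $\mu_0\epsilon_0 = \frac{1}{c^2}$.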

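Using the two equations above involving the curl, we find

$$\nabla\times(\nabla\times\mathbf{E}) = \nabla\times(-\dot{\mathbf{B}}) = -\partial_t(\nabla\times\mathbf{B}) = -\partial_t\left(\frac{1}{c^2}\dot{\mathbf{E}}\right) = -\frac{1}{c^2}\ddot{\mathbf{E}}. \tag{2.2}$$

Note that in the second step, I swapped the order of the curl and the time derivative. That is perfectly fine: you are free to take derivatives in whichever order you please.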

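On the other hand, we can also use the vector derivative identity (1.22) applied to $\mathbf{E}$ along with Maxwell’s equation $\nabla\cdot\mathbf{E} = 0$:

$$\nabla\times(\nabla\times\mathbf{E}) = \nabla(\nabla\cdot\mathbf{E}) - \nabla^2\mathbf{E} = -\nabla^2\mathbf{E}. \tag{2.3}$$

Setting (2.2) and (2.3) equal to each other derives the wave equation:

$$-\frac{1}{c^2}\ddot{\mathbf{E}} = -\nabla^2\mathbf{E} \implies \Box\mathbf{E} = 0. \tag{2.4}$$

The differential operator, $\Box$, often called the d’Alembertian (or just “box”), is defined as

$$\Box \equiv -\frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \nabla^2. \tag{2.5}$$

The same derivation shows that $\mathbf{B}$ satisfies the exact same wave equation,

$$\Box\mathbf{B} = 0. \tag{2.6}$$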

In fact, you see this type of wave equation all over the place where you expect to see waves

of some sort - the wave equation is not special to electromagnetic waves. The things that


change from situation to situation are the quantity that satisfies the wave equation and

propagation speed. For electromagnetic waves in vacuum, the electric and magnetic fields

satisfy the wave equation and the speed c is the speed of light in vacuum. For sound in

air, c would be the speed of sound in air and the quantity satisfying the wave equation

would be the displacement of air molecules along the direction of propagation of the sound

wave relative to their equilibrium position. For transverse waves on a string, c would be

the speed of those particular waves, and the quantity satisfying the wave equation is the

displacement of the string up or down transverse to the direction in which it is stretched.

The propagation speed is related to properties of the material through which the wave is propagating. For electromagnetic waves in vacuum, the speed is related to the magnetic permeability, $\mu_0$, and electric permittivity, $\epsilon_0$, of vacuum via $c = 1/\sqrt{\mu_0\epsilon_0}$. In fact, the discovery of this relationship between the speed of light and the electromagnetic properties of vacuum led Maxwell to the discovery that light is an electromagnetic wave.

Let us assume for simplicity that air is a diatomic ideal gas. Let $\Psi(t, x)$ be the displacement relative to equilibrium of air molecules as a function of time along the direction $x$, which is the direction of propagation of the sound wave. Then, one can show that $\Psi$ satisfies the wave equation

$$\Box\Psi = \left(-\frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \frac{\partial^2}{\partial x^2}\right)\Psi(t, x) = 0, \tag{2.7}$$

where the speed of sound in air, $c$, is related to the temperature, $T$, of the air and the mass, $m$, of the air molecule via

$$c = \sqrt{\frac{7kT}{5m}}. \tag{2.8}$$

Here $k$ is Boltzmann’s constant.
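To get a feel for (2.8), here is a rough numerical evaluation. The temperature (about 20 °C) and the average molecular mass of air (about 28.97 atomic mass units) are assumed inputs, not values taken from these notes.

```python
import math

K_BOLTZMANN = 1.380649e-23            # J/K
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def sound_speed_diatomic(temperature_kelvin, molecule_mass_kg):
    """Speed of sound in a diatomic ideal gas, following (2.8)."""
    return math.sqrt(7 * K_BOLTZMANN * temperature_kelvin / (5 * molecule_mass_kg))


# Assumed inputs: room temperature and the average mass of an "air molecule".
temperature = 293.15                       # K
air_molecule_mass = 28.97 * ATOMIC_MASS_UNIT  # kg

c_sound = sound_speed_diatomic(temperature, air_molecule_mass)
print(f"speed of sound ~ {c_sound:.0f} m/s")
```

With these inputs the result comes out close to the familiar figure of roughly 343 m/s, which suggests the diatomic ideal gas assumption is a reasonable one for air.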

For transverse waves on a string, let $\Psi(t, x)$ be the up and down (transverse) displacement of a point at position $x$ along the string as a function of time. One can show that $\Psi(t, x)$ satisfies the exact same wave equation as (2.7), but where the propagation speed is related to the tension (a force), $T$, in the string and the mass per unit length, $\mu$, of the string via

$$c = \sqrt{\frac{T}{\mu}}. \tag{2.9}$$

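Aside: You may have learned Maxwell’s equations in integral form and with charges and currents:

$$\oint\mathbf{E}\cdot d\mathbf{a} = \frac{Q_{\text{enc}}}{\epsilon_0}, \tag{2.10a}$$

$$\oint\mathbf{B}\cdot d\mathbf{a} = 0, \tag{2.10b}$$

$$\oint\mathbf{E}\cdot d\boldsymbol{\ell} = -\dot{\Phi}_B, \tag{2.10c}$$

$$\oint\mathbf{B}\cdot d\boldsymbol{\ell} = \mu_0 I_{\text{enc}} + \frac{1}{c^2}\dot{\Phi}_E. \tag{2.10d}$$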


The first two integrals are done over a closed surface, which is the boundary enclosing

some volume of space. Then, Qenc is the charge inside that volume of space. The last two

integrals are done over a closed loop, which is the boundary of some open surface. Then,

ΦE and ΦB are the electric and magnetic fluxes through that open surface (i.e., the surface

integral of the electric and magnetic fields over that open surface), and Ienc is the current

piercing that open surface. Let ρ and J be the volume charge and current densities. The

volume integral of ρ gives the total charge inside that volume, and the surface integral of

J gives the total current piercing that surface. Denote a volume of space by V and the

closed surface (or collection of closed surfaces) which is its boundary by ∂V . Denote an

open surface by S and the closed loop (or collection of closed loops) which is its boundary

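by $\partial S$. Then, we can write Maxwell’s equations as

$$\oint_{\partial V}\mathbf{E}\cdot d\mathbf{a} = \int_V\frac{\rho}{\epsilon_0}\,d^3x, \tag{2.11a}$$

$$\oint_{\partial V}\mathbf{B}\cdot d\mathbf{a} = 0, \tag{2.11b}$$

$$\oint_{\partial S}\mathbf{E}\cdot d\boldsymbol{\ell} = -\int_S\dot{\mathbf{B}}\cdot d\mathbf{a}, \tag{2.11c}$$

$$\oint_{\partial S}\mathbf{B}\cdot d\boldsymbol{\ell} = \int_S\left(\mu_0\mathbf{J} + \frac{1}{c^2}\dot{\mathbf{E}}\right)\cdot d\mathbf{a}. \tag{2.11d}$$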

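Now, we can make use of the divergence theorem and Stokes’ theorem,

$$\oint_{\partial V}\mathbf{E}\cdot d\mathbf{a} = \int_V(\nabla\cdot\mathbf{E})\,d^3x, \qquad \oint_{\partial S}\mathbf{E}\cdot d\boldsymbol{\ell} = \int_S(\nabla\times\mathbf{E})\cdot d\mathbf{a}, \tag{2.12}$$

to write Maxwell’s equations as

$$\int_V(\nabla\cdot\mathbf{E})\,d^3x = \int_V\frac{\rho}{\epsilon_0}\,d^3x, \tag{2.13a}$$

$$\int_V(\nabla\cdot\mathbf{B})\,d^3x = 0, \tag{2.13b}$$

$$\int_S(\nabla\times\mathbf{E})\cdot d\mathbf{a} = -\int_S\dot{\mathbf{B}}\cdot d\mathbf{a}, \tag{2.13c}$$

$$\int_S(\nabla\times\mathbf{B})\cdot d\mathbf{a} = \int_S\left(\mu_0\mathbf{J} + \frac{1}{c^2}\dot{\mathbf{E}}\right)\cdot d\mathbf{a}. \tag{2.13d}$$

Since these equations hold for arbitrary $V$ and $S$, we must have

$$\begin{aligned} \nabla\cdot\mathbf{E} &= \frac{\rho}{\epsilon_0}, & \nabla\times\mathbf{E} &= -\dot{\mathbf{B}}, \\ \nabla\cdot\mathbf{B} &= 0, & \nabla\times\mathbf{B} &= \mu_0\mathbf{J} + \frac{1}{c^2}\dot{\mathbf{E}}. \end{aligned} \tag{2.14}$$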

These are Maxwell’s equations in differential form. Then, Maxwell’s equations in vacuum are simply these without any charges or currents: $\rho = 0$ and $\mathbf{J} = 0$. By the way, you should now be able to derive

$$\Box\mathbf{E} = \mu_0\dot{\mathbf{J}} + \frac{1}{\epsilon_0}\nabla\rho, \qquad \Box\mathbf{B} = -\mu_0\nabla\times\mathbf{J}. \tag{2.15}$$

Indeed, these reduce to the wave equation in vacuum when we set $\rho = 0$ and $\mathbf{J} = 0$.


3. Solving the Wave Equation

For simplicity, let us try to solve the one-dimensional wave equation first. Then we can

generalize our solution to three dimensions. One-dimensional wave equations look like

(2.7), where there is only one spatial dimension, which we have called x. This equation

is often described as a “linear differential equation”. That may strike you as peculiar; if

anything, the equation looks quadratic. People often get around this confusion by saying

that the more “advanced” meaning of the word “linear” differential equation is that the

sum of two arbitrary solutions to the equation is itself a solution. Even though there is

nothing wrong with that statement, there is no real need for such a redefinition. If a

differential equation is truly linear then there must exist a change of variables that really

truly makes the equation look linear. For the one-dimensional wave equation, the change

of variables is

$$\tau \equiv \tfrac{1}{2}(x + ct), \qquad \sigma \equiv \tfrac{1}{2}(x - ct). \tag{3.1}$$

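One can write the derivatives with respect to the old variables in terms of the new ones. Define $\partial_t = \frac{\partial}{\partial t}$ and $\partial_x = \frac{\partial}{\partial x}$ as well as $\partial_\tau = \frac{\partial}{\partial\tau}$ and $\partial_\sigma = \frac{\partial}{\partial\sigma}$. Then,

$$\frac{1}{c}\partial_t = \frac{1}{c}\frac{\partial\tau}{\partial t}\partial_\tau + \frac{1}{c}\frac{\partial\sigma}{\partial t}\partial_\sigma = \frac{\partial_\tau - \partial_\sigma}{2}, \tag{3.2a}$$

$$\partial_x = \frac{\partial\tau}{\partial x}\partial_\tau + \frac{\partial\sigma}{\partial x}\partial_\sigma = \frac{\partial_\tau + \partial_\sigma}{2}. \tag{3.2b}$$

We could also invert these relations:

$$\partial_\tau = \partial_x + \frac{1}{c}\partial_t, \qquad \partial_\sigma = \partial_x - \frac{1}{c}\partial_t. \tag{3.3}$$

Therefore, the one-dimensional d’Alembertian can be written as

$$-\frac{1}{c^2}\partial_t^2 + \partial_x^2 = \left(\partial_x + \frac{1}{c}\partial_t\right)\left(\partial_x - \frac{1}{c}\partial_t\right) = \partial_\tau\partial_\sigma. \tag{3.4}$$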

So, we see that the one-dimensional wave equation (2.7), can be written as

∂τ∂σΨ = 0. (3.5)

Now it is clear that this is a linear differential equation - it is linear in τ and σ separately!

It is also very easy to solve now - either ∂τΨ = 0 or ∂σΨ = 0. Of course, you could have

both, but that just means Ψ is a constant, which is uninteresting, and does not actually

describe a traveling wave. In other words, Ψ can be an arbitrary function of τ , as long as

it does not depend at all on σ, or Ψ can be an arbitrary function of σ, as long as it does

not depend at all on τ . In general, Ψ can be a sum of these two things. Thus, the most

general solution can be written as

Ψ(t, x) = ΨL(τ) + ΨR(σ). (3.6)

The part of Ψ which just depends on τ is called “left-moving” (hence the subscript L) and

the part which just depends on σ is called “right-moving” (hence the subscript R).
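The claim that any sum of a function of $x + ct$ and a function of $x - ct$ solves (2.7) can be tested numerically with finite differences. The sketch below picks one arbitrary left-moving profile and one arbitrary right-moving profile (both are assumptions for illustration) and evaluates the d’Alembertian approximately at a few sample points.

```python
import math

C = 2.0  # wave speed; an arbitrary value chosen for this check


def psi(t, x):
    """A left-moving Gaussian plus a right-moving sine, as in (3.6)."""
    left_moving = math.exp(-(x + C * t) ** 2)    # depends only on x + ct
    right_moving = math.sin(1.5 * (x - C * t))   # depends only on x - ct
    return left_moving + right_moving


def box_psi(t, x, h=1e-3):
    """Central-difference estimate of (-1/c^2) d^2/dt^2 + d^2/dx^2 acting on psi."""
    d2_dt2 = (psi(t + h, x) - 2 * psi(t, x) + psi(t - h, x)) / h**2
    d2_dx2 = (psi(t, x + h) - 2 * psi(t, x) + psi(t, x - h)) / h**2
    return -d2_dt2 / C**2 + d2_dx2


sample_points = [(t, x) for t in (0.0, 0.3, 1.1) for x in (-1.0, 0.2, 0.7)]
residuals = [abs(box_psi(t, x)) for t, x in sample_points]
print(max(residuals))  # small; limited only by the finite-difference error
```

The residual is not exactly zero because the derivatives are approximated, but it is tiny compared with the individual second derivatives, which are of order one.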


An interesting set of solutions are called plane wave solutions:

ΨL(τ) = A sin(2kτ + ϕ), ΨR(σ) = A sin(2kσ + ϕ). (3.7)

Here, |A| is the constant amplitude, k the constant wave number, and ϕ the constant phase

shift. I do not mean here that the left-moving part and the right-moving part of the general

solution have to have the same amplitude, wave number and phase shift - that certainly

need not be the case! If you want, you can put a subscript L or R on A, k and ϕ. I haven’t

done so because it will clutter these expressions unnecessarily. In general, one can treat

the left- and right-moving parts completely independently of each other.

As far as solving the wave equation is concerned, there is absolutely nothing special

about the sine function. The utility of these “plane wave” solutions is that any arbitrary

solution to the wave equation can be written as some sort of superposition of plane wave

solutions. Therefore, we don’t actually lose any generality by focusing only on these special

solutions.

In terms of the old time and position variables, (3.7) reads

ΨL(t, x) = A sin(kx+ ωt+ ϕ), ΨR(t, x) = A sin(kx− ωt+ ϕ), (3.8)

where

ω = kc. (3.9)

The wave number, $k$, is related to the wavelength via $k = \frac{2\pi}{\lambda}$. The angular frequency is $\omega$, which is related to the frequency, $\nu$, via $\omega = 2\pi\nu$. The relation (3.9), which relates $\omega$ and $k$, is in general called a dispersion relation.

We could also have used cosines instead, but cosines and sines are related by a π/2

phase shift. So, with an arbitrary phase shift, ϕ, including cosines would be redundant.

Actually, when we increase the number of space dimensions from one, it becomes more

convenient to fix the sign of the ωt term to be negative and allow k to be positive or

negative:

Ψ(t, x) = A sin(kx− ωt+ ϕ). (3.10)

With this convention, we should write $|k| = \frac{2\pi}{\lambda}$ and $\omega = |k|c$ because it is possible for $k$ to be negative. If $k$ is positive, then this wave is propagating in the $+x$ direction (right-moving) and if $k$ is negative, then it is moving in the $-x$ direction (left-moving). For example, below, I have graphed the Gaussian pulse function $e^{-k^2(x+ct)^2}$ over $x$ at the times $t = 0, 4, 8, 12$. I have set $k = 1$ and $c = 1$ just for convenience. You can see that the Gaussian pulse moves to the left over time. Indeed, $e^{-k^2(x+ct)^2}$ is a function of $x + ct$, or of $\tau$, which is the so-called left-moving coordinate, but not of $x - ct$, or of $\sigma$, which is the so-called right-moving coordinate.

[Figure: four panels showing the Gaussian pulse $e^{-k^2(x+ct)^2}$ versus $x$ at $t = 0, 4, 8, 12$, with $k = 1$ and $c = 1$. Horizontal axis ticks: −15, −10, −5, 0; vertical axis ticks: 0.2 to 1.0.]

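The generalization of (3.10) to higher dimensions is

$$\Psi(t, \mathbf{x}) = A\sin(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi), \tag{3.11}$$

where the wavenumber, $k$, becomes a wavevector, $\mathbf{k}$, and $\mathbf{x} = (x, y, z) = x\hat{\mathbf{x}} + y\hat{\mathbf{y}} + z\hat{\mathbf{z}}$. The direction of $\mathbf{k}$, which is $\hat{\mathbf{k}} \equiv \frac{\mathbf{k}}{|\mathbf{k}|}$, is the direction of propagation of the wave.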

The cosine and sine functions may be written in terms of complex exponentials:

$$\cos x = \frac{e^{ix} + e^{-ix}}{2}, \qquad \sin x = \frac{e^{ix} - e^{-ix}}{2i}. \tag{3.12}$$

The inverse relationships are

$$e^{\pm ix} = \cos x \pm i\sin x. \tag{3.13}$$

Therefore, instead of considering plane waves of the form (3.11), it is often computationally simpler to consider the form

$$\Psi(t, \mathbf{x}) = A e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi)}. \tag{3.14}$$

In this form, derivatives of plane waves simply turn into multiplication:

$$\dot{\Psi} = -i\omega\Psi, \qquad \nabla\Psi = i\mathbf{k}\Psi. \tag{3.15}$$

3.1. Electromagnetic Plane Waves

Let us consider plane wave solutions to the wave equation satisfied by E and B:

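$$\mathbf{E} = \mathbf{E}_0\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi)}, \qquad \mathbf{B} = \mathbf{B}_0\,e^{i(\mathbf{k}'\cdot\mathbf{x} - \omega' t + \phi')}. \tag{3.16}$$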

Here, |E0| and |B0| are the amplitudes of the electric and magnetic fields, respectively.

Since the fields separately satisfy the wave equation, at this point, there is no need for

their wavevectors, frequencies and phase shifts to be the same. That is why we have put


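primes on those quantities in the magnetic field. However, Maxwell’s equations imply that they are actually the same. Maxwell’s equations read

$$\mathbf{k}\cdot\mathbf{E}_0\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi)} = 0, \tag{3.17a}$$

$$\mathbf{k}'\cdot\mathbf{B}_0\,e^{i(\mathbf{k}'\cdot\mathbf{x} - \omega' t + \phi')} = 0, \tag{3.17b}$$

$$\mathbf{k}\times\mathbf{E}_0\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi)} = \omega'\mathbf{B}_0\,e^{i(\mathbf{k}'\cdot\mathbf{x} - \omega' t + \phi')}, \tag{3.17c}$$

$$\mathbf{k}'\times\mathbf{B}_0\,e^{i(\mathbf{k}'\cdot\mathbf{x} - \omega' t + \phi')} = -\frac{\omega}{c^2}\mathbf{E}_0\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \phi)}. \tag{3.17d}$$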

These equations must hold for all t and x. Either one of the last two equations immediately

implies that

k′ = k, ω′ = ω, ϕ′ = ϕ. (3.18)

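Now, we can write Maxwell’s equations more simply as

$$\begin{aligned} \mathbf{k}\cdot\mathbf{E} &= 0, & \mathbf{k}\times\mathbf{E} &= \omega\mathbf{B}, \\ \mathbf{k}\cdot\mathbf{B} &= 0, & \mathbf{k}\times\mathbf{B} &= -\frac{\omega}{c^2}\mathbf{E}. \end{aligned} \tag{3.19}$$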

We have already noted that the wavevector, k, points in the direction of propagation of

the wave. Maxwell’s equations in the above form say that E and B are perpendicular to k.

Thus, electromagnetic waves are transverse waves. In addition, E and B are perpendicular

to each other with directions related via

E×B ∝ k, (3.20)

and magnitudes related via

|k||E| = ω|B| =⇒ |E| = c|B|. (3.21)

4. Poynting Vector and Flux

Suppose you had a pipe with cross-sectional area, A, and through which water of mass

density, ρ, is flowing at a speed, v. A reasonable question to ask would be “How much

water (i.e. mass) is passing through the pipe in a given amount of time?” If we divide

this quantity by the cross-sectional area through which the water flows, then we get the

flux (of water mass) with units of mass/(area·time). Well, in a given time, ∆t, the water travels a

distance ∆x = v∆t. This means that all the water a distance less than or equal to ∆x to

the left of a particular point along the pipe will pass through that point in the given time

∆t. The volume of this region is A∆x = Av∆t and so the mass of water contained in this

region is ρAv∆t. This is the total mass that passes through the area A in a time interval

∆t. Therefore, the flux is just this divided by the area, A, and the time interval, ∆t:

Φ = ρv. (4.1)

Let us make a formal analogy in the case of electromagnetic waves. In this case, we would

like to measure the energy flux (how much energy flows per unit area per unit time). For


the water example, when we wanted mass flux, we multiplied mass density by speed, as in

Eqn. (4.1). Therefore, if we want the energy flux, we need to multiply the energy density

by the speed. The speed of the electromagnetic wave is just c. The electric and magnetic

fields carry energy density given by

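$$u_E = \frac{1}{2}\epsilon_0|\mathbf{E}|^2, \qquad u_B = \frac{1}{2\mu_0}|\mathbf{B}|^2. \tag{4.2}$$

The total energy density is just the sum of these two. Using Eqn. (3.21), we may write

$$u \equiv u_E + u_B = \frac{1}{\mu_0 c}|\mathbf{E}||\mathbf{B}|. \tag{4.3}$$

Therefore, the energy flux is

$$\Phi = \frac{1}{\mu_0 c}|\mathbf{E}||\mathbf{B}|\,c = \frac{1}{\mu_0}|\mathbf{E}||\mathbf{B}|. \tag{4.4}$$

Since $\mathbf{E}$ and $\mathbf{B}$ are perpendicular to each other, we could write $\Phi = \frac{1}{\mu_0}|\mathbf{E}\times\mathbf{B}|$. Since we also know that $\mathbf{E}\times\mathbf{B} \propto \mathbf{k}$, which is the direction of propagation of the wave, we define the Poynting vector,

$$\mathbf{S} \equiv \frac{1}{\mu_0}\mathbf{E}\times\mathbf{B}, \tag{4.5}$$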

whose magnitude is simply the energy flux and whose direction is the propagation direction.

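Since the momentum and energy of light are related via $p = E/c$ (here $E$ is the energy, not the electric field), the momentum flux is

$$\boldsymbol{\mathcal{P}} \equiv \mathbf{S}/c. \tag{4.6}$$

The magnitude, $\mathcal{P}$, of this is the rate at which momentum is passing through some area,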

per unit area. If it is perfectly absorbed by a surface, then it is equal to the radiation

pressure exerted on that surface. If it is perfectly reflected back by a surface, then the

radiation pressure exerted on the surface is twice as big.

4.1. Red Laser Pointer

The output of a red laser pointer (λ = 635 nm) has a beam power of 10.0 mW and a beam

diameter of 1.00 mm. It is propagating in vacuum in the +x direction and is polarized

in the y direction. Write down an expression for the electric and magnetic fields in the

beam and the Poynting vector as a function of time and position. If this beam illuminates

a surface that absorbs 40% and reflects 60%, find the net force on the surface due to radiation pressure. [Assume uniform irradiance across the beam’s cross-section. Note: the polarization is the direction of the electric field with $+y$ and $-y$ directions being counted as the same.]

SOLUTION:

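The wavevector is $\mathbf{k} = \frac{2\pi}{\lambda}\hat{\mathbf{x}} \approx (10^7\ \text{rad/m})\,\hat{\mathbf{x}}$. It is in the $x$ direction because that is the direction of propagation. Therefore, $\mathbf{k}\cdot\mathbf{x} = (10^7\ \text{rad/m})\,x$. The angular frequency is related to the wavevector via $\omega = |\mathbf{k}|c \approx 3\times10^{15}$ rad/s. Since the polarization is in the $y$ direction, the amplitude of the electric field is $\mathbf{E}_0 = E_0\hat{\mathbf{y}}$. Now, $\mathbf{E}\times\mathbf{B} \propto \hat{\mathbf{k}} = \hat{\mathbf{x}}$, which implies that $\mathbf{B} \propto \hat{\mathbf{z}}$, or $\mathbf{B}_0 = B_0\hat{\mathbf{z}}$. Since $|\mathbf{B}| = |\mathbf{E}|/c$, we can also write $\mathbf{B}_0 = \frac{1}{c}E_0\hat{\mathbf{z}}$. The amplitude of the Poynting vector is $\mathbf{S}_0 = \frac{1}{\mu_0}\mathbf{E}_0\times\mathbf{B}_0 = \frac{1}{\mu_0 c}E_0^2\,\hat{\mathbf{x}} = \epsilon_0 cE_0^2\,\hat{\mathbf{x}}$. As usual, since $\mathbf{E}$ and $\mathbf{B}$ oscillate identically in time, the average Poynting vector is half of its amplitude: $\langle\mathbf{S}\rangle = \frac{1}{2}\epsilon_0 cE_0^2\,\hat{\mathbf{x}}$. The irradiance is simply the magnitude of the average Poynting vector: $I = |\langle\mathbf{S}\rangle| = \frac{1}{2}\epsilon_0 cE_0^2$. This is equal to the power per unit area:

$$\frac{1}{2}\epsilon_0 cE_0^2 = I = \frac{P}{A} = \frac{P}{\pi r^2},$$

where $r$ is the radius of the beam.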

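Solving for $E_0$ and using $\epsilon_0 = 8.9\times10^{-12}\ \text{J}\cdot\text{V}^{-2}\cdot\text{m}^{-1}$ gives

$$E_0 = \sqrt{\frac{2P}{\pi r^2\epsilon_0 c}} = 3.10\times10^3\ \text{V/m}.$$

Note that $r = 5\times10^{-4}$ m (half the diameter). We also therefore have

$$B_0 = \frac{E_0}{c} = 1.03\times10^{-5}\ \text{T}.$$

Here, T stands for tesla, the SI unit of magnetic field.

Therefore, the electric and magnetic fields are

$$\mathbf{E} = (3.10\times10^3\ \text{V/m})\,\hat{\mathbf{y}}\cos\left[(10^7\ \text{rad/m})\,x - (3\times10^{15}\ \text{rad/s})\,t + \phi\right],$$

$$\mathbf{B} = (1.03\times10^{-5}\ \text{T})\,\hat{\mathbf{z}}\cos\left[(10^7\ \text{rad/m})\,x - (3\times10^{15}\ \text{rad/s})\,t + \phi\right].$$

There is an arbitrary phase, $\phi$, which we can dial to whatever we want depending on when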

we choose t = 0 to be. Since φ is arbitrary, you could have used sines instead of cosines.

You could also have used the complex exponential form, if you wish, as long as you keep

in the back of your mind that the actual fields are the real parts or the imaginary parts

since the fields can’t be complex.

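Recall that half of the amplitude of the Poynting vector is equal to the irradiance, $P/\pi r^2 = 1.27\times10^4\ \text{W}\cdot\text{m}^{-2}$. Thus,

$$\mathbf{S} = (2.55\times10^4\ \text{W/m}^2)\,\hat{\mathbf{x}}\cos^2\left[(10^7\ \text{rad/m})\,x - (3\times10^{15}\ \text{rad/s})\,t + \phi\right].$$

The average of the $\cos^2$ term is just 1/2. The average momentum flux is

$$\langle\boldsymbol{\mathcal{P}}\rangle = \langle\mathbf{S}\rangle/c = \mathbf{S}_0/2c = (4.24\times10^{-5}\ \text{N/m}^2)\,\hat{\mathbf{x}}.$$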

By momentum conservation, if this is absorbed by a surface, then this momentum is transferred to the surface. If it is perfectly reflected, then the momentum flux of the beam after the reflection is $-\boldsymbol{\mathcal{P}}$. Conservation of momentum implies that $2\boldsymbol{\mathcal{P}}$ must be transferred to the surface so that the total is still $\boldsymbol{\mathcal{P}}$. That is, the reflected case gives twice as much pressure as the absorbed case. Therefore, the radiation pressure on the surface is

$$0.4|\boldsymbol{\mathcal{P}}| + 0.6|2\boldsymbol{\mathcal{P}}| = 1.6|\boldsymbol{\mathcal{P}}| = 6.79\times10^{-5}\ \text{N/m}^2.$$

The force would be this pressure multiplied by the beam area:

$$F = (6.79\times10^{-5}\ \text{N/m}^2)\left[\pi\times(5\times10^{-4}\ \text{m})^2\right] = 5.33\times10^{-11}\ \text{N}.$$
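The numbers in this solution can be reproduced with a few lines of Python. This is a sketch that simply re-runs the arithmetic above; the variable names are illustrative, and the constants are standard SI values rather than the rounded ones quoted in the text, so the last digit may differ slightly.

```python
import math

EPSILON_0 = 8.8541878128e-12  # F/m (equivalently J V^-2 m^-1)
C_LIGHT = 2.99792458e8        # m/s

# Data from the problem statement.
wavelength = 635e-9   # m
beam_power = 10.0e-3  # W
beam_radius = 0.5e-3  # m (half of the 1.00 mm diameter)
absorbed, reflected = 0.4, 0.6

k = 2 * math.pi / wavelength          # wave number, rad/m
omega = k * C_LIGHT                   # angular frequency, rad/s
area = math.pi * beam_radius**2       # beam cross-section, m^2
irradiance = beam_power / area        # average Poynting magnitude, W/m^2
E0 = math.sqrt(2 * irradiance / (EPSILON_0 * C_LIGHT))  # V/m
B0 = E0 / C_LIGHT                     # T
momentum_flux = irradiance / C_LIGHT  # N/m^2
pressure = (absorbed + 2 * reflected) * momentum_flux   # reflection counts twice
force = pressure * area               # N

print(f"k = {k:.3e} rad/m, omega = {omega:.3e} rad/s")
print(f"E0 = {E0:.3e} V/m, B0 = {B0:.3e} T")
print(f"pressure = {pressure:.3e} N/m^2, force = {force:.3e} N")
```

Notice that the beam area cancels in the final force, which equals $1.6\,P/c$, so the result does not actually depend on the beam diameter.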


5. Ray Tracing Diagrams for Mirrors

Consider a concave mirror whose radius of curvature is 10.0 cm. Draw ray-tracing diagrams

when the object is

(a) real and sits 20.0 cm in front of the mirror;

(b) real and sits 7.0 cm in front of the mirror;

(c) real and sits 2.0 cm in front of the mirror;

(d) virtual and sits 10.0 cm behind the mirror.

I’ll leave it to you to check that these diagrams agree numerically with the results of the formulae $\frac{1}{d_i} + \frac{1}{d_o} = \frac{2}{R}$ and $m = -\frac{d_i}{d_o}$.
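For the numerical check, a short script can evaluate the mirror formulae for the four cases. The sign convention assumed here (and labeled as an assumption) is that real objects have $d_o > 0$, virtual objects have $d_o < 0$, and a positive $d_i$ means a real image in front of the mirror; the function name is illustrative.

```python
def mirror_image(d_o, R):
    """Image distance and magnification from 1/d_i + 1/d_o = 2/R and m = -d_i/d_o.

    Assumed sign convention: d_o > 0 for a real object, d_o < 0 for a virtual one;
    d_i > 0 for a real image (in front of the mirror), d_i < 0 for a virtual image.
    """
    inverse_d_i = 2.0 / R - 1.0 / d_o
    if inverse_d_i == 0.0:
        return float("inf"), float("inf")  # object at the focal point
    d_i = 1.0 / inverse_d_i
    return d_i, -d_i / d_o


R = 10.0  # cm, concave mirror
cases = {"a": 20.0, "b": 7.0, "c": 2.0, "d": -10.0}  # object distances in cm
mirror_results = {label: mirror_image(d_o, R) for label, d_o in cases.items()}

for label, (d_i, m) in mirror_results.items():
    print(f"({label}) d_i = {d_i:6.2f} cm, m = {m:5.2f}")
```

Under this convention, cases (a), (b) and (d) give real images while case (c), with the object inside the focal length, gives a virtual, upright, enlarged image.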


6. Ray Tracing Diagrams for Lenses

Consider a lens whose focal length has magnitude 10.0 cm. Draw ray-tracing diagrams for

the following scenarios:

(a) The object is real and sits 20.0 cm in front of the converging lens;

(b) The object is real and sits 20.0 cm in front of the diverging lens;

(c) The object is virtual and sits 5.0 cm behind the converging lens;

(d) The object is virtual and sits 5.0 cm behind the diverging lens.

Again, I leave it to you to check that these diagrams agree with the results of the formulae $\frac{1}{d_i} + \frac{1}{d_o} = \frac{1}{f}$ and $m = -\frac{d_i}{d_o}$.
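The same kind of check works for the four lens scenarios. The sketch assumes the usual thin-lens sign convention: $f > 0$ for the converging lens, $f < 0$ for the diverging lens, $d_o < 0$ for a virtual object, and $d_i > 0$ for a real image on the far side of the lens.

```python
def thin_lens_image(d_o, f):
    """Image distance and magnification from 1/d_i + 1/d_o = 1/f and m = -d_i/d_o."""
    d_i = 1.0 / (1.0 / f - 1.0 / d_o)  # assumes the object is not at the focal point
    return d_i, -d_i / d_o


# (object distance, focal length) in cm for scenarios (a) through (d).
lens_cases = {
    "a": (20.0, 10.0),   # real object, converging lens
    "b": (20.0, -10.0),  # real object, diverging lens
    "c": (-5.0, 10.0),   # virtual object, converging lens
    "d": (-5.0, -10.0),  # virtual object, diverging lens
}
lens_results = {label: thin_lens_image(d_o, f) for label, (d_o, f) in lens_cases.items()}

for label, (d_i, m) in lens_results.items():
    print(f"({label}) d_i = {d_i:6.2f} cm, m = {m:5.2f}")
```

Scenario (d) is the interesting one: a diverging lens acting on a virtual object produces a real image with magnification 2.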


7. Compound Optical Systems

Compound optical systems just have more than one lens and/or mirror, called optical

elements. Light goes from one optical element to the next. The image of optical element

1 becomes the object for optical element 2; the image of optical element 2 becomes the

object for optical element 3; and so on. It is only in this case that a virtual object can arise

because the image of the previous optical element may very well lie behind the following

optical element. Keep in mind that if the system contains mirrors, it is possible for one

physical lens or mirror to play the role of multiple optical elements. For example, if you

have a lens in front of and positioned parallel to a mirror, then light can come from an

object on one side of the lens, go through the lens, hit the mirror, bounce back, and then

go through the lens again! In this case, the lens plays the role of optical elements 1 and

3, while the mirror plays the role of optical element 2. In principle, you can imagine with

multiple mirrors, you can even have one physical mirror playing the role of infinitely many

optical elements. If you have ever been in a house of mirrors, you know well how this can

come about and what it’s like.

7.1. Two-Lens Problem

You have two lenses whose focal lengths have magnitude 10.0 cm, one converging and one

diverging. You want to place an object 20.0 cm in front of the first lens in such a way as

to produce a final image which is real, upright and twice as large as the object. Where can

you place the lenses in order to do this, and must you place the original object right side up or upside down? Where is the final image?

SOLUTION:

Part (d) of the previous section produces a linear magnification of 2. Part (a) of the

previous section produces a linear magnification of −1. Combined appropriately, these

could produce a linear magnification of −2. We want the image to be upright. With a negative linear magnification, the object would have to be upside down if we are going to combine parts (a) and (d) to achieve the objective.

So, the first lens will be the converging one, and the object sits 20.0 cm to the left of this lens and upside down. The image of this lens sits 20.0 cm to the right of this lens, is right side up and equal in size with the object.

This image becomes the object for the second lens. According to part (d), we want

this object to be virtual and 5.0 cm to the right of the diverging lens. Therefore, place the

diverging lens 15.0 cm to the right of the converging lens. The final image will be 10.0 cm

to the right of the diverging lens, which is 25.0 cm to the right of the converging lens, or

45.0 cm to the right of the original object.
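As a quick numerical check of the numbers above, here is a minimal Python sketch that applies the thin-lens equation to each lens in turn. The helper name `thin_lens` and the variable names are my own choices for illustration; the sign convention (negative object distance for a virtual object) is the one used in these notes.

```python
def thin_lens(d_o, f):
    """Return (d_i, m) from the thin-lens equation 1/d_o + 1/d_i = 1/f."""
    d_i = 1.0 / (1.0 / f - 1.0 / d_o)
    return d_i, -d_i / d_o

f_conv, f_div = 10.0, -10.0   # cm, converging and diverging focal lengths
separation = 15.0             # cm, distance between the two lenses

# Lens 1: real object 20.0 cm in front of the converging lens.
d_i1, m1 = thin_lens(20.0, f_conv)

# The image of lens 1 is the object of lens 2.
# A negative object distance means a virtual object.
d_o2 = separation - d_i1
d_i2, m2 = thin_lens(d_o2, f_div)

m_total = m1 * m2
image_from_object = 20.0 + separation + d_i2   # cm, measured from the original object

print(d_i1, m1, d_o2, d_i2, m2, m_total, image_from_object)
```

The total magnification comes out negative, which is why the original object has to be placed up side down to get an upright final image.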


The black ray goes through the vertex of the first lens. It heads towards the would-be

image of the first lens, but hits the second lens first and gets bent upwards towards the

final image. It is not one of the rays with simple rules for the second lens, but nevertheless

we know that it must end up at the final image.

The blue ray goes parallel to the axis first, hits the first lens, gets bent towards the

focal point of the first lens. It heads towards the would-be image of the first lens, but hits

the second lens first and gets bent upwards towards the final image. Again, this is not one

of the rays with simple rules for the second lens, but nevertheless we know that it must

end up at the final image.

The red ray goes through the secondary focal point of the first lens, hits the first lens

and comes out parallel to the axis. It heads towards the would-be image of the first lens,

but hits the second lens first and gets bent upwards towards the final image. This is one of

the rays with simple rules for the second lens: the red ray on the right of the second lens

looks like it went through the focal point (the square dot) of the second lens.

The cyan and green rays are the other two with simple rules for the second lens. We

have just continued them backwards since we know they must originate from the object.

The green ray corresponds to the black ray in part (d) of the previous section and the cyan

ray corresponds to the red ray in part (d) of the previous section.

7.2. Two-Lens Demonstration

The previous problem is a warm-up to begin understanding the “demonstration” that I

brought in to section involving one converging and one diverging lens. The converging lens

will be “Lens 1” and the diverging lens will be “Lens 2”. The focal length of the converging

lens is f1 = +15.5 cm and the focal length of the diverging lens is f2 = −15.0 cm. I asked

you to hold the converging lens at arm’s length away from you and look at some object

somewhat far away across the room. What you found was that the image that you see

is smaller than and up-side-down relative to the original object. You can easily see this

with a ray-tracing diagram. Consider what happens to the diagram in part (a) of Section

6 when you move the object further and further to the left of the lens. The path of the

blue ray does not change! However, for example, the black ray gets closer and closer to the

horizontal axis. Therefore, where the black and blue rays intersect approaches the primary

focal point from the right side. The image remains up-side-down relative to the object and


it gets smaller and smaller. I had you hold the converging lens at arm’s length because the

image of that lens becomes the object for the lens of your eye. Therefore, it is as if your eye

is looking at a small up-side-down object whose distance in front of you is roughly equal

to your arm length minus a bit more than the focal length of the converging lens. If you

were to hold the converging lens too close to your eye, the image it produces would be very

close to your eye and your eye would have to strain as much as it does when you try to

look at any object very close to your eye.

The other reason why I had you hold the converging lens at arm’s length is that I then

wanted you to place the diverging lens very close to the converging lens but closer to you.

The main difference between this set-up and the one in the previous two-lens problem is

that the image of the first lens is a bit further away from the second lens than the secondary

focal point of the second lens, whereas in the previous problem, the image of the first lens is

within the secondary focal point of the second lens. At this point, you now see an upright

image that is maybe a little larger than the original object and certainly much larger than

the image you saw with just the converging lens.

Then, I asked you to keep the converging lens fixed, but very slowly move the diverging

lens closer and closer towards your eye. You described the image you saw as getting larger

and larger. Then, at some point the image gets smaller and smaller and is up-side-down.

How can we make sense of this phenomenon?

Let’s do it mathematically first. Let the original object be “Object 1” and let it have

distance do1 relative to the converging lens, which is “Lens 1”. I asked you to look at an

object “far away”. But, what does that mean? Far away compared to what? The object

distance is a DISTANCE; it has units, namely meters. It can’t just be “big”, it has to be

big compared to something. In this case, we want it to be big relative to the focal length

of Lens 1. That is, we want do1 ≫ f1. Therefore, it makes sense to define the ratio

$$a_{o1} \equiv \frac{d_{o1}}{f_1}. \tag{7.1}$$

This number is positive because the original object is real (and so do1 > 0) and the first

lens is converging (and so f1 > 0).

A far-away object means large ao1, or ao1 ≫ 1. I can now literally say “large ao1” with

impunity because ao1 has no units; it’s just a number. Similarly, define

$$a_{i1} \equiv \frac{d_{i1}}{f_1}. \tag{7.2}$$

Then, the lens equation for the first lens reads

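$$\frac{1}{d_{o1}} + \frac{1}{d_{i1}} = \frac{1}{f_1} \implies \frac{1}{a_{o1} f_1} + \frac{1}{a_{i1} f_1} = \frac{1}{f_1} \implies \frac{1}{a_{o1}} + \frac{1}{a_{i1}} = 1. \tag{7.3}$$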

Solving for ai1 gives

$$a_{i1} = \frac{1}{1 - \frac{1}{a_{o1}}}. \tag{7.4}$$

Now, 1/ao1 ≪ 1 since ao1 ≫ 1. Therefore, we can Taylor expand the above result:

$$a_{i1} = 1 + \frac{1}{a_{o1}} + \left(\frac{1}{a_{o1}}\right)^2 + \cdots. \tag{7.5}$$


Since di1 = ai1f1, we see that the image produced by the converging lens is real (that is,

di1 > 0) and the image is located just a bit further away from the lens than one focal

length. The further away the original object is, the bigger ao1 is, the closer ai1 gets to 1

(from above), the closer the image of the first lens gets to its focal point.

To see that the image is up-side-down and small, calculate the transverse magnification:

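$$m_1 = -\frac{d_{i1}}{d_{o1}} = -\frac{a_{i1} f_1}{a_{o1} f_1} = -\frac{a_{i1}}{a_{o1}} = -\frac{\frac{1}{a_{o1}}}{1 - \frac{1}{a_{o1}}} = -\frac{1}{a_{o1}} - \left(\frac{1}{a_{o1}}\right)^2 - \cdots. \tag{7.6}$$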

Indeed, m1 is negative and small if ao1 is large.
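To get a feel for how good the expansions (7.5) and (7.6) are, here is a small Python sketch comparing the exact expressions with the truncated series. The function names are hypothetical and only serve this illustration.

```python
def a_i1_exact(a_o1):
    """Image distance of Lens 1 in units of f1, equation (7.4)."""
    return 1.0 / (1.0 - 1.0 / a_o1)

def a_i1_series(a_o1, terms=3):
    """Truncated geometric series 1 + x + x^2 + ... with x = 1/a_o1, equation (7.5)."""
    x = 1.0 / a_o1
    return sum(x**k for k in range(terms))

def m1_exact(a_o1):
    """Transverse magnification of Lens 1, equation (7.6)."""
    return -a_i1_exact(a_o1) / a_o1

for a_o1 in (5.0, 50.0, 500.0):
    print(a_o1, a_i1_exact(a_o1), a_i1_series(a_o1), m1_exact(a_o1))
```

As the object moves further away, the exact and truncated values of ai1 agree better and better and both approach 1 from above, while m1 stays negative and shrinks towards zero.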

Now, we place the diverging Lens 2, with focal length f2 < 0, after Lens 1. We will write

f2 as −|f2| instead so that we never forget that it is actually negative! Note that the image

of Lens 1 may be behind Lens 2 (i.e., on the opposite side of Lens 2 as the side from which

the light is coming towards Lens 2). Define

$$\delta \equiv \frac{\text{distance between Lens 1 and Lens 2}}{f_1}. \tag{7.7}$$

At the start of the demo, δ is small. Then,

do2 = δf1 − di1 = (δ − ai1)f1. (7.8)

Note that if δ < ai1, then do2 < 0, but if δ > ai1, then do2 > 0. That is, if the distance

between the two lenses is less than the distance of the image of Lens 1 from Lens 1, then

image 1 is a virtual object 2 for the second lens. However, if the distance between the two

lenses is greater than the distance of the image of Lens 1 from Lens 1, then image 1 is a

real object 2 for the second lens.

Again, define

$$a_{o2} \equiv \frac{d_{o2}}{|f_2|} = (\delta - a_{i1})\,\frac{f_1}{|f_2|}, \qquad a_{i2} \equiv \frac{d_{i2}}{|f_2|}. \tag{7.9}$$

A priori, we do not yet know whether the final image of Lens 2 is real or virtual. Therefore,

ai2 may be positive, in which case the image is real, or negative, in which case the image

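is virtual. Then, the lens equation for Lens 2 reads

$$\frac{1}{d_{o2}} + \frac{1}{d_{i2}} = \frac{1}{f_2} \implies \frac{1}{a_{o2}|f_2|} + \frac{1}{a_{i2}|f_2|} = \frac{1}{-|f_2|} \implies \frac{1}{a_{o2}} + \frac{1}{a_{i2}} = -1. \tag{7.10}$$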

Solving for ai2 gives

$$a_{i2} = -\frac{1}{1 + \frac{1}{a_{o2}}} = -\frac{1}{1 + \dfrac{1}{\left(\delta - \frac{1}{1 - \frac{1}{a_{o1}}}\right)\frac{f_1}{|f_2|}}}. \tag{7.11}$$

At this point, let us plug in some appropriate numbers,

f1 = 15.5cm, |f2| = 15.0cm, ao1 ≈ 50. (7.12)


The value ao1 = 50 corresponds to an object 50× 15.5 cm = 7.75 m away from Lens 1.

Then, (7.11) becomes

$$a_{i2} = \frac{1.02 - \delta}{\delta - 0.053} = \frac{15.8\ \text{cm} - \delta f_1}{\delta f_1 - 0.82\ \text{cm}}, \tag{7.13}$$

where we multiplied numerator and denominator by f1 = 15.5 cm to get the final expression.

When δf1 < 0.82 cm, meaning that the two lenses are less than 0.82 cm apart, we have

ai2 < 0, which means that the final image of the two-lens system is virtual. Also, ai2 has a

fairly large magnitude, starting out as about −19.4 at δ = 0 (when the lenses are right on

top of each other) and going to −∞ as we increase the separation of the two lenses towards

0.82 cm. If we separate the lenses a bit further, then ai2 becomes huge and positive;

this is a real image now very far away from the lenses. As we increase the separation, ai2

decreases until we reach a separation of 15.8 cm, at which point the image is technically

exactly at the location of the second lens. If we increase the separation even further, the

image becomes virtual again since ai2 becomes negative again.
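The sign changes described above can be checked numerically. The following Python sketch evaluates ai2 as a function of the lens separation using the numbers in (7.12). The function and constant names are assumptions made for this illustration, and (7.11) is used in the algebraically equivalent form −ao2/(1 + ao2) so that nothing divides by zero when ao2 = 0.

```python
F1 = 15.5        # cm, focal length of the converging Lens 1
F2_ABS = 15.0    # cm, magnitude of the focal length of the diverging Lens 2
A_O1 = 50.0      # original object distance in units of F1

def a_i1(a_o1=A_O1):
    """Image distance of Lens 1 in units of F1, equation (7.4)."""
    return 1.0 / (1.0 - 1.0 / a_o1)

def a_o2(sep_cm, a_o1=A_O1):
    """Object distance for Lens 2 in units of |f2|, equation (7.9)."""
    delta = sep_cm / F1
    return (delta - a_i1(a_o1)) * F1 / F2_ABS

def a_i2(sep_cm, a_o1=A_O1):
    """Image distance of Lens 2 in units of |f2|, equation (7.11)."""
    x = a_o2(sep_cm, a_o1)
    return -x / (1.0 + x)   # same as -1 / (1 + 1/x), but finite at x = 0

pole_cm = F1 * a_i1() - F2_ABS   # separation at which a_i2 blows up
zero_cm = F1 * a_i1()            # separation at which a_i2 vanishes

for sep in (0.0, 0.5, 1.0, 10.0, 20.0):
    kind = "real" if a_i2(sep) > 0 else "virtual"
    print(f"separation {sep:5.1f} cm: a_i2 = {a_i2(sep):8.2f} ({kind})")
```

The two special separations `pole_cm` and `zero_cm` are the 0.82 cm and 15.8 cm quoted in the text, up to rounding.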

The transverse magnification of Lens 2 is

$$m_2 = -\frac{d_{i2}}{d_{o2}} = -\frac{a_{i2}}{a_{o2}} = \frac{1}{1 + \left(\delta - \frac{1}{1 - \frac{1}{a_{o1}}}\right)\frac{f_1}{|f_2|}}. \tag{7.14}$$

The total transverse magnification is

$$m = m_1 m_2 = -\frac{\frac{1}{a_{o1}}}{1 - \frac{1}{a_{o1}} + \left[\left(1 - \frac{1}{a_{o1}}\right)\delta - 1\right]\frac{f_1}{|f_2|}}. \tag{7.15}$$

Again, plugging in the numbers gives

$$m = \frac{0.020}{0.053 - \delta} = \frac{0.31\ \text{cm}}{0.82\ \text{cm} - \delta f_1}. \tag{7.16}$$

Indeed, this agrees with our previous description of our observations. When the lenses are

very close together (δ ≈ 0), the image is upright (m > 0). As we increase the separation

towards 0.82 cm, the image gets bigger and bigger (m increases). Just beyond 0.82 cm

separation, m becomes huge and negative, and then m remains negative and approaches

zero as the separation is increased further.
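The same behavior can be seen by evaluating (7.15) directly. This is a minimal Python sketch with a hypothetical function name; the default arguments are the numbers from (7.12).

```python
def magnification(sep_cm, a_o1=50.0, f1=15.5, f2_abs=15.0):
    """Total transverse magnification m = m1 * m2 from equation (7.15)."""
    x = 1.0 / a_o1
    delta = sep_cm / f1
    return -x / ((1.0 - x) + ((1.0 - x) * delta - 1.0) * f1 / f2_abs)

def magnification_approx(sep_cm):
    """Rounded form (7.16), with the separation in cm."""
    return 0.31 / (0.82 - sep_cm)

for sep in (0.0, 0.5, 0.8, 1.0, 10.0):
    print(f"separation {sep:5.1f} cm: m = {magnification(sep):8.3f}")
```

Below about 0.82 cm the magnification is positive and growing; just past it the sign flips and the magnitude then decreases as the separation increases.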

To summarize, when the lenses are very close together, the image is virtual and pretty

far away in front of you. As you increase the separation of the lenses by moving Lens 2

towards you, the image grows and appears to move further away from you. As you cross

0.82 cm of separation, the image goes from infinitely large and upright infinitely far in

front of you to infinitely large and up-side-down infinitely far behind you. As you increase

the separation, the image remains up-side-down but gets smaller and smaller in size. At

a separation of 15.8 cm, the image goes from being real to virtual, but it still keeps on

getting smaller and remains up-side-down.

As a side note, an image in front of you is a real object for the lens of your eye. An

image behind you is a virtual object for the lens of your eye. While you cannot see a real

object behind you, it may be possible to see a virtual object behind you. A virtual object

behind you is nothing more than the image of optical elements in front of you. Please make

sure you understand this; ask us about it if this is unclear.


Now, let us try to understand this phenomenon using ray tracing diagrams. We have

already discussed what the diagram looks like for the converging lens, Lens 1. An up-side-

down and small image is formed a bit further away from Lens 1 than one focal length.

We draw this image as dashed because it is a virtual object for the diverging lens, Lens 2.

Initially, this virtual object is a bit further away from Lens 2 than one focal length. The

diagram might look like

As the lens is moved further to the right, the diagram might look like

Indeed, the image is upright further to the left and bigger. The image keeps moving further

to the left and increases in size until the virtual object sits right at the secondary focal

point of the lens. At this point, the outgoing light rays are exactly parallel and therefore

look like they are coming from a very large image infinitely far to the left:

Notice that the red light ray doesn’t really change as the lens approaches the virtual object.

The black ray just gets steeper and steeper. This continues as we move the lens even further

to the right. Clearly, this means that the black and red rays will actually

converge to the right of the lens, at a real and up-side-down image. This image is at first

very large and very far to the right, but gets smaller and smaller and closer and closer to

the lens as we move the lens to the right. The second “funky” point that we discovered

earlier, where the image again changes from being real to virtual is the point when Lens 2

passes the image formed by Lens 1 and the second object goes from being virtual to real.


8. Midterm 1 Quiz

(1) An electromagnetic wave is propagating in the +z direction. At some time and point in

space, the electric field points in the y direction. In which direction does the magnetic

field point?

Answer: B ∝ −x at this point in space and at this time.

(2) Why are electromagnetic plane waves called “plane waves”? Explain with a drawing.

Answer: The electric field is identical (as is the magnetic field) at all points on the

same plane perpendicular to the direction of motion of the plane wave.

(3) A cube of index of refraction n sits in air. A light ray inside the cube hits one face

and gets totally internally reflected. It then hits an adjacent face and also gets totally

internally reflected. Calculate the minimum possible value of n.

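Answer: If $\theta$ is the angle of incidence on the first face, then $\theta' = \frac{\pi}{2} - \theta$ is the

angle of incidence on the second face. We need both $\theta$ and $\theta'$ to be greater than or

equal to the critical angle, $\sin^{-1}(1/n)$. Thus, $\sin\theta \ge \frac{1}{n}$ and $\sin\theta' \ge \frac{1}{n}$. However,

$\sin\theta' = \cos\theta = \sqrt{1 - \sin^2\theta} \le \sqrt{1 - \left(\frac{1}{n}\right)^2}$. Therefore, $\frac{1}{n} \le \sqrt{1 - \left(\frac{1}{n}\right)^2}$. We can square

both sides of the inequality without changing the direction of the inequality since both

sides are manifestly positive: $\left(\frac{1}{n}\right)^2 \le 1 - \left(\frac{1}{n}\right)^2$, or $\frac{1}{n} \le \frac{1}{\sqrt{2}}$. Inverting gives $n \ge \sqrt{2} \approx 1.4$.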

(4) If a lens is cut in half through a plane perpendicular to its surface, does it show only

half an image?

Answer: It still shows a full image, just a dimmer one.

(5) If your near-point distance is N , how close can you stand to a mirror and still be able

to focus on your image?

Answer: The image is virtual and is the same distance behind the mirror as you are

in front of the mirror. Therefore, you should stand no closer than N/2 from the mirror.

(6) When you open your eyes underwater, everything looks blurry. Explain.

Answer: Your eyes have an index of refraction roughly equal to that of water. Therefore,

if submerged in water, they cannot refract light and will not be able to focus light

rays to form real images on the retina.

(7) Would you benefit more from a magnifying glass if your near-point distance is 25 cm

or if it is 15 cm? Explain.

Answer: The angular magnification of a magnifying glass is M = N/f , where N is

the near point and f is the focal length of the lens. Therefore, the larger your near

point is, the more you can benefit from the magnifying glass.

(8) When you use a simple magnifying glass, does it matter whether you hold the object

to be examined closer to the lens than its focal length or farther away? Explain.


Answer: Yes it matters crucially. You must keep the object just within one focal

length of the lens in order to produce a large virtual image very far in front of your

eyes. If the object is beyond one focal length from the lens then a real image is produced

on your side of the lens likely behind you and therefore you will not be able to see it

clearly.

(9) Is the final image produced by a telescope real or virtual? Explain.

Answer: Virtual and far away in front of you.

(10) Two people are stranded on a deserted island. Both people wear glasses, though one

is nearsighted and the other is farsighted. Which person’s glasses should be used to

focus the rays of the Sun and start a fire? Explain.

Answer: Whoever has the converging lenses. A far-sighted person is able to converge

parallel light rays (coming from faraway objects) just fine, but is unable to focus

diverging light rays (from nearby objects) strongly enough to form a clear image at

the retina. Therefore, the far-sighted person needs converging lenses to help “beef up”

their eyes’ converging power. A near-sighted person has strongly converging eyes able

to converge the diverging light rays from nearby objects, but too strongly converges

parallel light rays from faraway objects. Therefore, the near-sighted person needs

diverging lenses to “handicap” their eyes’ converging power.

(11) You have two lenses: lens 1 with a focal length of 0.45 cm and lens 2 with a focal

length of 1.9 cm. If you construct a microscope with these lenses, which one should

you use as the objective? Explain.

Answer: You want the one with a shorter focal length to act as the eyepiece because

that acts as a magnifying glass on the image of the objective and because the angular

magnification of a magnifying glass is inversely proportional to its focal length. Therefore,

you want the 1.9 cm focal length lens to act as the objective lens and the 0.45 cm

lens to act as the eyepiece.

(12) Why is it restful to your eyes to gaze off into the distance?

Answer: I don’t know! But here’s some information on the matter. Most of the

refraction in your eye is actually performed by the cornea, which is a pocket of fluid at

the front of the eye which covers the lens and iris (Wikipedia says that 2/3 of the eye’s

refractive power comes from the cornea). As far as I know, nothing happens to the

cornea as our eyes adjust between looking at nearby and faraway things. They can be

reshaped temporarily or permanently, but by external methods. On the other hand,

the adjustments we make to clearly image objects at various distances are made to the

lens via ciliary muscles which are connected to the edge of the lens by tendons called

zonules. Muscles can only pull (i.e., contract). Pulling on the lens flattens it out and

reduces its converging power. This is what you want to do when looking at faraway

things since the light rays are reaching your eye basically parallel. Relaxing the ciliary

muscles a bit allows the lens to bulge a bit more at the center, which increases its


converging power. This is what you want to do when looking at nearby objects since

the light rays are diverging when they reach your eye. In fact, it happens that the

ciliary muscles are most contracted when looking at distant objects. So, why does

your eye feel relaxed when gazing off into the distance, which is precisely when your

ciliary muscles are most contracted? I don’t know! The best I can guess is that that’s

just the state that our eyes are used to. Also, a nonzero lever arm between the ciliary

muscles and the lens might account for this as well, but I don’t think there is one.

However, there is one thing that this helps us understand: the fact that our eyes

can only properly focus diverging light rays, not converging ones. To properly focus

already-converging light rays, we must decrease the converging power of our eye’s lens

even further compared to when we are looking at faraway things. Well that would

require the ciliary muscles to pull even harder on the lens. But, they can’t because

for some reason the eye is “designed” so that the ciliary muscles are most contracted

when looking at faraway things. In other words, we were not “designed” to see images

produced behind our eyes. I suppose that might make sense evolutionarily. I can’t

imagine an environmental stressor that would select that ability since lenses and such

are recent human inventions.


9. Interference

9.1. Laser Wavelength Measurement via Metal Ruler

Devise and explain a method for measuring the wavelength of a laser pointer chiefly using

a finely graded metal ruler (e.g. with 1/32 inch markings or smaller).

SOLUTION:

Consider reflecting the laser off of the ruler at a shallow angle. If there were no notches

on the surface of the ruler, then each point on the ruler where the light hits becomes a

source for outwardly spreading spherical waves. The superposition of these waves produces

wavefronts that travel in the direction of specular reflection. That is, on a far-away screen,

we get constructive interference only around the point of specular reflection, as expected.

Imagine we make wide notches with narrow reflective bands in between. Then, consider

the following diagram showing two adjacent light beams headed towards a far-away screen

(e.g. a wall) having reflected off of two adjacent reflective bands.

The optical path length difference is d(cosα − cosβ), which must be set equal to mλ for

some integer m for constructive interference. For a fixed α (angle at which we shine the

laser on the ruler), this gives discrete values for β where bright spots occur (i.e. we get a

diffraction pattern).

The claim is that we see the same thing if instead we have narrow non-reflective notches

with wider reflective bands. Can you think of why? Hint: superposition. This goes under

the name of Babinet’s principle, by the way.

Consider the following setup

This gives us an expression for the wavelength

$$\lambda = \frac{d}{m}\left[\frac{1}{\sqrt{1 + (s_0/L)^2}} - \frac{1}{\sqrt{1 + (s_m/L)^2}}\right].$$


If you make α very small and use only low orders, then we can assume that sm/L ≪ 1:

$$\lambda \approx \frac{d\,(s_m^2 - s_0^2)}{2 m L^2}.$$

As an example, when I did this experiment at home, I used the marks on the ruler that

were d = 0.5 mm apart and the distance to the wall was L = 105 cm. I found s0 = 10.5

cm and s1 = 11.9 cm. Assuming small angles, this gives

$$\lambda = \frac{(5\times 10^{-4}\ \text{m})\left[(11.9\ \text{cm})^2 - (10.5\ \text{cm})^2\right]}{2\times 1\times (105\ \text{cm})^2} \approx 711\ \text{nm}.$$

That’s not bad! The wavelength should be around 635 nm. Before we rejoice, however,

we should note that a millimeter difference in any sm makes a huge difference in the final

answer. For example, if I change s1 to 11.8 cm, I get λ = 657 nm! So, unless I can

measure sm and L with very high precision, the uncertainties are likely to swamp the final

measurement of λ anyway.
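To see how sensitive the result is to the measured spot positions, here is a minimal Python sketch of both the exact expression and its small-angle approximation. The function names are hypothetical; all lengths are converted to meters, and the input values are the ones quoted above.

```python
import math

def wavelength_exact(d, L, s0, sm, m=1):
    """lambda = (d/m) * (cos(alpha) - cos(beta)), with the cosines written via s/L."""
    cos_alpha = 1.0 / math.sqrt(1.0 + (s0 / L) ** 2)
    cos_beta = 1.0 / math.sqrt(1.0 + (sm / L) ** 2)
    return d / m * (cos_alpha - cos_beta)

def wavelength_small_angle(d, L, s0, sm, m=1):
    """Small-angle approximation: lambda ~ d * (sm^2 - s0^2) / (2 m L^2)."""
    return d * (sm**2 - s0**2) / (2.0 * m * L**2)

d, L, s0 = 5e-4, 1.05, 0.105     # m: ruler spacing, wall distance, zeroth-order spot
for s1 in (0.119, 0.118):        # m: first-order spot, shifted by 1 mm
    exact_nm = wavelength_exact(d, L, s0, s1) * 1e9
    approx_nm = wavelength_small_angle(d, L, s0, s1) * 1e9
    print(f"s1 = {s1 * 100:.1f} cm: exact {exact_nm:.0f} nm, small-angle {approx_nm:.0f} nm")
```

Comparing the two columns shows that the small-angle approximation itself shifts the answer by a noticeable amount for these values of s/L, in addition to the millimeter-level sensitivity to s1.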

Note: In section, I claimed that if the laser light reflects off of a smooth metallic surface,

then we only get specular reflection. This is true only if the region on the surface that

is illuminated is much wider than the wavelength of the light. Well, our laser beam has

a width of a few millimeters, which will obviously do since its wavelength is on the order

of $10^{-4}$ mm! As calculated above, the extra optical path length travelled by one beam

relative to another that hits the surface a distance x to the left of it is x(cos α − cos β). So,

the phase shift is $\phi = 2\pi \frac{x}{\lambda}(\cos\alpha - \cos\beta)$. Let a be the width of the illuminated region and

let x run from −a/2 to a/2 with the “zero phase” corresponding to x = 0. The intensity

is proportional to

$$I \propto \left|\int_{-a/2}^{a/2} e^{2\pi i \frac{x}{\lambda}(\cos\alpha - \cos\beta)}\,dx\right|^2 \propto \frac{\sin^2 A}{A^2},$$

where $A = \pi \frac{a}{\lambda}(\cos\alpha - \cos\beta)$.

In the limit a/λ ≫ 1, the intensity becomes a delta function:

$$I \;\xrightarrow{\;a/\lambda \to \infty\;}\; \text{const} \times \delta(\cos\alpha - \cos\beta).$$

Thus, the intensity vanishes everywhere except when A = 0, or when cosα = cosβ, or

α = β, which is the condition for specular reflection!
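A short numerical illustration of this narrowing: evaluating the integral over the illuminated strip gives, up to normalization, sin²A/A² with A = π(a/λ)(cos α − cos β), and this peak becomes sharper as a/λ grows. The sketch below is only an illustration; the function name, the angles, and the sample ratios are my own assumptions (a ratio of about 5000 corresponds roughly to a beam a few millimeters wide at red wavelengths).

```python
import math

def relative_intensity(alpha, beta, a_over_lambda):
    """sin^2(A)/A^2 with A = pi * (a/lambda) * (cos(alpha) - cos(beta))."""
    A = math.pi * a_over_lambda * (math.cos(alpha) - math.cos(beta))
    if A == 0.0:
        return 1.0   # limit of (sin A / A)^2 as A -> 0
    return (math.sin(A) / A) ** 2

alpha = 0.10               # rad, hypothetical shallow angle of incidence
beta_off = alpha + 0.05    # rad, a direction away from specular reflection

for ratio in (1.0, 100.0, 5000.0):
    print(ratio, relative_intensity(alpha, beta_off, ratio))
```

For a strip comparable to the wavelength the off-specular direction is almost as bright as the specular one, while for a/λ in the thousands it is strongly suppressed.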


10. Thin-Film Interference

A piece of paper is wedged between the ends of two sheets of glass. The setup is illuminated

at normal incidence by cyan laser light (λ = 500 nm). Excluding the point where the two

glass sheets meet, you count about 400 dark interference fringes. Calculate the thickness

of the sheet of paper.

SOLUTION:

Let t be the thickness of the paper. Let ℓ be the length of the glass sheets. Let x be

the horizontal coordinate starting at 0 at the point where the two glass sheets meet and

increasing to the right up to ℓ, the length of the glass sheets. Let t(x) be the thickness

of the air gap between the two glass sheets as a function of the coordinate x. By similar

triangles,

$$\frac{t(x)}{x} = \frac{t}{\ell} \implies t(x) = \frac{t}{\ell}\,x.$$

The two beams whose interference we care about are the ones shown below.

Of course, these rays are actually right on top of each other since the incidence is normal and

since the paper is presumably very very thin, any refraction at the interfaces is negligible.

Since ray 1 reflects off of a glass-air interface (high to low index of refraction), it does

NOT receive a π reflection phase shift. On the other hand, ray 2 does because it reflects

off of an air-glass interface (low to high index of refraction). Thus,

$$\varphi_{\text{ref},1} = 0 \quad\text{and}\quad \varphi_{\text{ref},2} = \pi \implies \Delta\varphi_{\text{ref}} = \pi.$$

We will set ϕpath,1 = 0 since the only difference in path between 1 and 2 is that 2 goes

through a thickness, t(x), twice whereas 1 does not. Thus,

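$$\varphi_{\text{path},1} = 0 \quad\text{and}\quad \varphi_{\text{path},2} = \frac{2\pi}{\lambda/n_{\text{air}}}\,2t(x) = \frac{4\pi\,t(x)}{\lambda} \implies \Delta\varphi_{\text{path}} = \frac{4\pi\,t(x)}{\lambda}.$$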

Therefore,

$$\Delta\varphi_{\text{tot}} = \Delta\varphi_{\text{path}} + \Delta\varphi_{\text{ref}} = \left(\frac{4t(x)}{\lambda} + 1\right)\pi.$$

For the interference to be destructive (dark fringe), we must have

$$\Delta\varphi_{\text{tot}} = \left(\frac{4t(x)}{\lambda} + 1\right)\pi = (2m+1)\pi,$$


where m is an integer. Remember: odd numbers of π are destructive while even numbers

of π are constructive. Thus,

$$t(x) = \frac{t}{\ell}\,x = \frac{m\lambda}{2}.$$

The maximum value of t(x) is t, which occurs when x = ℓ. The maximum value of m is

400 according to the problem statement. Therefore,

$$t = \frac{m_{\max}\lambda}{2} = \frac{400 \times 5\times 10^{-7}\ \text{m}}{2} = 10^{-4}\ \text{m} = 0.1\ \text{mm}.$$
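The dark-fringe condition t(x) = mλ/2 is easy to turn into a few lines of Python. This is only a sketch: the function names are hypothetical, the gap is assumed to be air with index 1, and the 10 cm sheet length used for the fringe positions is a made-up value, since the problem does not specify it.

```python
def wedge_thickness(num_dark_fringes, wavelength, n_gap=1.0):
    """Thickness at the wide end of the wedge: t = m_max * lambda / (2 n)."""
    return num_dark_fringes * wavelength / (2.0 * n_gap)

def dark_fringe_position(m, wavelength, thickness, sheet_length, n_gap=1.0):
    """Position x of the m-th dark fringe, from t(x) = (t / l) x = m lambda / (2 n)."""
    return m * wavelength * sheet_length / (2.0 * n_gap * thickness)

wavelength = 500e-9                       # m, from the problem statement
t = wedge_thickness(400, wavelength)      # m
sheet_length = 0.10                       # m, hypothetical value for illustration
spacing = dark_fringe_position(1, wavelength, t, sheet_length)

print(f"paper thickness: {t * 1e3:.3f} mm")
print(f"fringe spacing for a {sheet_length * 100:.0f} cm sheet: {spacing * 1e3:.3f} mm")
```

The fringe spacing is just the sheet length divided by the number of fringes, which is a useful cross-check on the counting.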

11. Relativity

Newtonian mechanics is incorrect! Time is not just time is not just time! The passage of

time depends on your state of motion. How much time it takes to get from one point to the

next depends on the path you take. How big something is depends on its state of motion.

The order of events depends on the state of motion of the observer. All of these statements

might seem absurd, and they surely would do to most physicists before Einstein. However,

they are all direct consequences of two seemingly innocuous postulates often summarized

by the pithy statement that “the laws of physics are identical between all inertial reference

frames.” Actually, the statement that the speed of light is constant in all inertial reference

frames is already counter-intuitive because it implies that the speed of light is the same no

matter the state of motion of the source of that light. Certainly, the same cannot be said

of other projectiles like balls and bullets!

11.1. How to Measure the Length of a Moving Object

Before we explore some of the counter-intuitive consequences of the principle of special

relativity, we will need to know how to measure the length of a moving object. Suppose

an object (like a train) of unknown length is moving left-to-right relative to you and your

friends at some unknown speed. Can you devise a plan with your friends to measure the

length of the train?

Here is a method which some of you suggested in discussion section. You and one

of your friends synchronize clocks and decide to stand some fixed known distance apart

along the direction in which the train is moving such that the train reaches you before your

friend. You note the time when the front of the train passes you and when the back passes

you and your friend notes when the front of the train passes them. Then, you come back

together. You can determine the speed of the train via

$$\text{speed of train} = \frac{\text{distance between you and your friend}}{\text{time front passes friend} - \text{time front passes you}}. \tag{11.1}$$

Then, you can determine the length of the train via

$$\text{length of train} = \text{speed of train} \times (\text{time back passes you} - \text{time front passes you}). \tag{11.2}$$

This requires you and your friend to synchronize your clocks. This is trickier than it may

seem at first. Remember that the passage of time depends on your state of motion and


certainly you and/or your friend have to be moving at some point if you first synchronize

your clocks when you are together and only then move apart! Even if you didn’t believe

in this relativity business, you must agree that your experimental method had better not

depend on your own prejudices, whether they be ultimately correct or incorrect.
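The two-observer method of (11.1) and (11.2) amounts to the following small calculation, sketched here in Python. The function name and the sample readings are hypothetical, and the sketch simply takes for granted that the two clocks have been synchronized correctly, which is the point at issue.

```python
def train_speed_and_length(separation, t_front_you, t_back_you, t_front_friend):
    """Apply (11.1) and then (11.2) to the three recorded clock readings."""
    speed = separation / (t_front_friend - t_front_you)   # equation (11.1)
    length = speed * (t_back_you - t_front_you)           # equation (11.2)
    return speed, length

# Made-up readings: observers 100 m apart, times in seconds.
speed, length = train_speed_and_length(
    separation=100.0, t_front_you=0.0, t_back_you=4.0, t_front_friend=2.0
)
print(speed, length)
```

Note that the length comes out only through the speed, so any error in synchronizing the two clocks feeds directly into both results.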

Fortuitously, there is a simple way for you and your friend to synchronize clocks after

you are already apart and no longer moving relative to each other. You shine a light at

your friend at some time that you set to be “zero”. The light moves relative to either of

you at the same speed of about 3× 108 m/s. Since you know how far apart you are, when

your friend receives the light, he knows exactly how long ago your clock read “zero” and

he can set his clock appropriately.

Actually, we can line up you and millions of your friends (all with synchronized clocks)

along a line parallel and very close to the path of the train. Then, each one of you can

record the times when the front and back of the train pass you. If you then come together

at the end, you will find that the “front” times pair up between pairs of friends a distance

apart equal to the length of the train (as measured by you and your friends). The same

can be said of the “back times”. That is, you can pick one specific time and at that

time exactly one of you or your friends will have recorded the back passing them and one

will have recorded the front passing them. The distance between these two people is the

measured length of the train. This method allows us to measure the length of the train

without technically measuring the speed of the train first, even though the speed of the

train could easily be determined from this data as well. The following problem in the next

subsection will show why I prefer this method.
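Here is a minimal Python sketch of this many-observers method. Everything in it is a toy model of my own: the train's length and speed, the one-observer-per-meter spacing, and the function names are all assumptions, and all quantities are those measured in the observers' frame. The point is only that the length is read off from two records with the same time stamp, without ever computing the speed.

```python
def simulate_records(positions, length, speed):
    """Times at which the front and back of the train pass each observer.

    The front is at x = speed * t and the back at x = speed * t - length.
    """
    front = {x: x / speed for x in positions}
    back = {x: (x + length) / speed for x in positions}
    return front, back

def length_from_records(front, back, t, tol=1e-9):
    """Distance between the observers who saw the front and the back at time t."""
    x_front = [x for x, tf in front.items() if abs(tf - t) < tol]
    x_back = [x for x, tb in back.items() if abs(tb - t) < tol]
    if not x_front or not x_back:
        return None   # nobody recorded a passing at exactly this time
    return x_front[0] - x_back[0]

positions = [float(x) for x in range(0, 1001)]   # one observer per meter
front, back = simulate_records(positions, length=200.0, speed=50.0)
measured = length_from_records(front, back, t=10.0)
print(measured)
```

With finitely many observers, most times have no matching pair of records, which is why the idealized version uses infinitely many observers packed infinitely closely.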

This discussion of synchronization of clocks brings up an interesting subtlety in statements

about relativity. Often questions are asked like “what do you observe?” or “what

does someone in such and such reference frame observe?” The “what” might be a time or

position or whatever else. Such statements are lazy and possibly misleading. Usually, what

is meant is not what is observed by one person at one instant in time, but rather what this

person can infer once he gathers time and position records from very many observers scattered

everywhere (technically, the limit of infinitely many such observers packed infinitely

closely). That is, you imagine a lattice of synchronized clocks everywhere in space and you

are asked what you can infer if you could take the readings from all of those clocks after

the process of interest is over. This way, you do not have to take into consideration the

finite amount of time it might take light from some event of interest to reach you, which

would drastically complicate matters.

So, for example, when you are asked what is the length of a moving object that is

observed by the person standing still, the question is really what would be measured by

the army of friends as described previously. The question is not asking what do you (the one

person standing still) actually see. That would be very different and far more complicated

because light from different points along the object takes different amounts of time to reach

your eyes!


11.2. Relativistic Train

People at rest relative to a train measure the length of the train to be L0 (this is the

train’s so-called proper length). Alice stands at the back of the train, Bob at the front

and Charlie at the middle. They have synchronized their clocks relative to each other.

The train travels at a speed v relative to the platform where the Stationmaster stands.

These people all decide to set the origin of space and time to be when Charlie passes the

Stationmaster.

(a) At the moment C passes S, C turns on a lightbulb. What time will A and B read on

their clocks when they see the light?

(b) What is the length of the train as measured in the reference frame of S? (You are not

being asked to derive the result. Just take it as an assumption.)

(c) In the reference frame of S, at what time(s) does the light reach A and B? What does

this tell you about simultaneity?

(d) In the reference frame of S, what time(s) registers on the clocks of A and B when the

light reaches A and B, respectively? What does this tell you about the synchronization

of clocks?

(e) A and B hold up mirrors to reflect the light back to C. What time does C measure

when he sees the reflections? What time is measured in the S reference frame? What

does this tell you about the ticking rate of the clocks on the train as observed by the

S reference frame?

SOLUTION:

(a) Let S′ be the rest frame of the train, which is the same as the reference frames of A,

B and C. Time and space coordinates measured in this frame will likewise be primed.

S can also stand for the reference frame of the Stationmaster and coordinates in this

frame will be unprimed.

Event 0 is when C passes S and turns on the lightbulb. By agreement, the spacetime

coordinates of this event in either reference frame are identically zero:

$$(ct_0, x_0) = (ct'_0, x'_0) = (0, 0). \tag{11.3}$$

Event 1 is when the light reaches A. Relative to A, the light must travel a distance

of L0/2. Therefore,

$$(ct'_1, x'_1) = \left(\frac{L_0}{2},\, -\frac{L_0}{2}\right). \tag{11.4}$$

Event 2 is when the light reaches B. Similarly,

$$(ct'_2, x'_2) = \left(\frac{L_0}{2},\, \frac{L_0}{2}\right). \tag{11.5}$$

That is, the light reaches A and B at the same time, $L_0/(2c)$, as measured on the train.


(b) In reference frame S, the length of the train is contracted by a factor of γ:

$$L = \frac{L_0}{\gamma}, \quad \text{where } \gamma = \frac{1}{\sqrt{1-\beta^2}} \text{ and } \beta = \frac{v}{c}. \tag{11.6}$$

(c) In reference frame S, A actually moves towards the light that is headed towards her.

The speed of the light is still c (postulate 2). Therefore, the relative speed between A

and the light is c+ v = (1 + β)c. The distance that must be covered is not L0/2, but

L/2. Therefore, the time is

$$t_1 = \frac{L/2}{c+v} \implies ct_1 = \frac{L_0/2}{(1+\beta)\gamma} = \sqrt{\frac{1-\beta}{1+\beta}}\,\frac{L_0}{2}. \tag{11.7}$$

By the same argument, the relative speed between the light and Bob as measured in

S is c − v = (1 − β)c. The distance that must be covered is still L/2. Therefore, the

time is

$$t_2 = \frac{L/2}{c-v} \implies ct_2 = \frac{L_0/2}{(1-\beta)\gamma} = \sqrt{\frac{1+\beta}{1-\beta}}\,\frac{L_0}{2}. \tag{11.8}$$

Note that, even though ct′1 = ct′2 so that these two events (the light reaching Alice and

the light reaching Bob) occur simultaneously in the S′ reference frame, they do not

occur simultaneously in the S reference frame. In S, event 1 happens first, then event

2. The time difference is

$$c\Delta t \equiv ct_2 - ct_1 = \sqrt{\frac{1+\beta}{1-\beta}}\,\frac{L_0}{2} - \sqrt{\frac{1-\beta}{1+\beta}}\,\frac{L_0}{2} = \beta\gamma L_0. \tag{11.9}$$

Events that are simultaneous in one reference frame may not be simultaneous in another

reference frame. This is “loss of simultaneity”.

(d) Whether it is Alice looking at her watch or someone at rest on the platform standing right next to Alice when the light reaches her, they must agree on what Alice’s watch reads at this moment, which we have already determined is $ct'_1 = L_0/2$. The same can be said of Bob’s watch. Therefore, observers in reference frame S see Alice’s watch read $ct'_1 = L_0/2$ when their own watches read $ct_1 = \sqrt{\frac{1-\beta}{1+\beta}}\,\frac{L_0}{2}$, and they see Bob’s watch read $ct'_2 = L_0/2$ when their own watches read $ct_2 = \sqrt{\frac{1+\beta}{1-\beta}}\,\frac{L_0}{2}$. It follows that the clocks of A and B are not synchronized in the S reference frame! The clock at the back of the train (Alice’s) is systematically ahead of the clock at the front of the train (Bob’s): Alice’s clock shows any given reading a time $c\Delta t = \beta\gamma L_0$, as measured in S, before Bob’s clock shows that same reading. Since S sees the train clocks run slow by a factor of $\gamma$ (see part (e)), at any one instant in S Alice’s clock reads ahead of Bob’s by $\beta L_0$ (in units of $ct'$). This is “loss of synchronicity”.

(e) In S′, after the light has reflected off of A or B’s mirror, it has to travel a further L0/2

distance before returning to C. Therefore, the lights reach C at the same time. Let us

call this event 3. Then,

(ct′3, x′3) = (L0, 0). (11.10)


As measured in reference frame S, the return trip of the light from B to C takes as

much time as it took for the light to go from C to A. The return trip of the light from

A to C takes as much time as it took for the light to go from C to B. Therefore,

the light comes back to C also at the same time, namely

ct3 = ct1 + ct2 = γL0. (11.11)

That is, between event 0, when C turns on the light, and event 3, when the light returns to C, a time $L_0/c$ has elapsed in S′ while a time $\gamma L_0/c$ has elapsed in S. This is “time dilation”. S observes clocks in S′ tick more slowly than his own.
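
If you would like to cross-check parts (c) and (e) by a different route, here is a minimal Python sketch. It does not repeat the relative-speed argument used above; instead it assumes the standard Lorentz transformation (the same one written out later in (13.2)) and applies it to the train-frame coordinates of events 1, 2 and 3. The function name `train_to_platform` and the values `beta = 0.6` and `L0 = 1` are illustrative choices, not part of the problem.

```python
import math

def train_to_platform(ct_prime, x_prime, beta):
    """Map an event from the train frame S' to the platform frame S.

    S' moves at speed beta*c along +x relative to S and the origins coincide
    at event 0, so ct = gamma*(ct' + beta*x') and x = gamma*(x' + beta*ct').
    """
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct_prime + beta * x_prime), gamma * (x_prime + beta * ct_prime)

beta = 0.6   # illustrative train speed v/c (an assumption, not from the problem)
L0 = 1.0     # proper length of the train, arbitrary units
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Train-frame coordinates (ct', x') taken from (11.4), (11.5) and (11.10).
ct1, x1 = train_to_platform(L0 / 2, -L0 / 2, beta)  # light reaches Alice
ct2, x2 = train_to_platform(L0 / 2, +L0 / 2, beta)  # light reaches Bob
ct3, x3 = train_to_platform(L0, 0.0, beta)          # light returns to Charlie

print("ct1:", ct1, "vs (11.7):", math.sqrt((1 - beta) / (1 + beta)) * L0 / 2)
print("ct2:", ct2, "vs (11.8):", math.sqrt((1 + beta) / (1 - beta)) * L0 / 2)
print("ct2 - ct1:", ct2 - ct1, "vs (11.9):", beta * gamma * L0)
print("ct3:", ct3, "vs (11.11):", gamma * L0)
```

Each printed pair should agree up to floating-point rounding, whatever value of `beta` between 0 and 1 you choose.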

11.3. Passing Trains

This problem is a combination of several problems taken from “Introduction to Classical

Mechanics” by David Morin.

Charlie stands on a platform while Alice, in one train, and Bob, in another train, pass by

going in the same direction. Both trains have a proper length L. A’s speed is 4c/5, and

B’s speed is 3c/5. A starts out behind B.

(a) In C’s reference frame, how long does it take for A to overtake B (i.e., the time between

the front of A passing the back of B, and the back of A passing the front of B)?

(b) Same question, but in the reference frame of A.

(c) Same question, but in the reference frame of B.

(d) David moves from the back of B’s train to the front at a constant speed, such that he

coincides with both the event of the front of A passing the back of B and the back of

A passing the front of B. How long does the overtaking process take in D’s reference

frame?

(e) Verify that the interval between the two events E1 = front of A passes back of B, and

E2 = back of A passes front of B, is the same in all reference frames A, B, C and D.

SOLUTION:

(a) Let $x_{iS}$ and $ct_{iS}$ denote the position and time of event $E_i$ ($i = 1, 2$) in some reference frame S, which can be A, B, C or D. Set the origin of coordinates to be event $E_1$:

$$x_{1A} = x_{1B} = x_{1C} = x_{1D} = ct_{1A} = ct_{1B} = ct_{1C} = ct_{1D} = 0. \tag{11.12}$$

Let $\gamma_{SS'}$ be the gamma factor associated with the motion of reference frame S as viewed by reference frame S′. By definition, $\gamma_{SS} = 1$, of course, for any S. The gamma factors of A and B as viewed by C are

$$\gamma_{AC} = \frac{1}{\sqrt{1-(4/5)^2}} = \frac{5}{3}, \qquad \gamma_{BC} = \frac{1}{\sqrt{1-(3/5)^2}} = \frac{5}{4}. \tag{11.13}$$


Let $L_{AS}$ and $L_{BS}$ be the length of A’s and B’s train, respectively, in the reference frame S. By definition, $L_{AA} = L_{BB} = L$ are the proper lengths. $L_{AC}$ and $L_{BC}$ are length contracted by the appropriate gamma factor:

$$L_{AC} = \frac{L}{\gamma_{AC}} = \frac{3L}{5}, \qquad L_{BC} = \frac{L}{\gamma_{BC}} = \frac{4L}{5}. \tag{11.14}$$

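Let $x^{\text{front}}_{AS}(t_S)$ and $x^{\text{front}}_{BS}(t_S)$ be the position of the front of train A and B, respectively, in reference frame S as a function of the time in reference frame S, and similarly define $x^{\text{back}}_{AS}(t_S)$ and $x^{\text{back}}_{BS}(t_S)$. Then,

$$x^{\text{front}}_{AC}(t_C) = \frac{4ct_C}{5}, \qquad x^{\text{back}}_{BC}(t_C) = \frac{3ct_C}{5}, \tag{11.15a}$$

$$x^{\text{back}}_{AC}(t_C) = -\frac{3L}{5} + \frac{4ct_C}{5}, \qquad x^{\text{front}}_{BC}(t_C) = \frac{4L}{5} + \frac{3ct_C}{5}. \tag{11.15b}$$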

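Indeed, $t_{1C}$ is defined to be the time $t_C$ when $x^{\text{front}}_{AC} = x^{\text{back}}_{BC}$, which is indeed $t_{1C} = 0$ and happens when $x^{\text{front}}_{AC} = x^{\text{back}}_{BC} = 0$. The overtake happens when $x^{\text{back}}_{AC} = x^{\text{front}}_{BC}$:

$$-\frac{3L}{5} + \frac{4ct_{2C}}{5} = x^{\text{back}}_{AC}(t_{2C}) = x^{\text{front}}_{BC}(t_{2C}) = \frac{4L}{5} + \frac{3ct_{2C}}{5}. \tag{11.16}$$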

Solving for t2C gives

ct2C = 7L . (11.17)

Just plug this back into $x^{\text{back}}_{AC}$ or $x^{\text{front}}_{BC}$ to get $x_{2C}$, the position where the back of A passes the front of B as viewed by C’s reference frame. This is

x2C = 5L . (11.18)

Aside: We can also determine ct2C as follows. A must travel farther than B by an

excess distance equal to the sum of their lengths (as viewed by C’s reference frame),

which is 7L/5. The relative speed between A and B as viewed by C’s reference frame

is c/5. Therefore, the overtaking time is

$$t_{2C} = \frac{7L/5}{c/5} = \frac{7L}{c} \implies ct_{2C} = 7L. \tag{11.19}$$
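
The part (a) numbers can be reproduced with a few lines of Python. This is only a sketch of the bookkeeping in (11.14) to (11.18), in units where $c = 1$ and $L = 1$; the variable names are my own.

```python
import math

c = 1.0              # work in units where c = 1
L = 1.0              # proper length of each train
v_A, v_B = 0.8, 0.6  # speeds of A and B as seen by C (4c/5 and 3c/5)

def gamma(v):
    """Gamma factor for speed v (same units as c)."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)

L_AC = L / gamma(v_A)   # contracted length of A's train in C's frame
L_BC = L / gamma(v_B)   # contracted length of B's train in C's frame

# Back of A:  x = -L_AC + v_A * t.   Front of B:  x = L_BC + v_B * t.
# Setting them equal and solving for t reproduces the aside in (11.19).
t_2C = (L_AC + L_BC) / (v_A - v_B)
x_2C = L_BC + v_B * t_2C

print("ct_2C / L =", c * t_2C / L)   # approximately 7
print("x_2C / L  =", x_2C / L)       # approximately 5
```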

(b) We need to know the speed of B as viewed by A. From A’s perspective, C is moving with velocity $v_{CA} = -4c/5$. From C’s perspective, B is moving with velocity $v_{BC} = 3c/5$. Therefore, from A’s perspective, B is moving with velocity

$$v_{BA} = \frac{v_{BC} + v_{CA}}{1 + \frac{v_{BC} v_{CA}}{c^2}} = \frac{\frac{3c}{5} + \left(-\frac{4c}{5}\right)}{1 + \frac{3}{5}\left(-\frac{4}{5}\right)} = \frac{-\frac{c}{5}}{1 - \frac{12}{25}} = -\frac{5c}{13}. \tag{11.20}$$

The associated gamma factor is

$$\gamma_{BA} = \frac{1}{\sqrt{1 - \left(-\frac{5}{13}\right)^2}} = \frac{13}{12}. \tag{11.21}$$


Therefore, the length of train B as measured in A’s reference frame is

$$L_{BA} = \frac{L}{\gamma_{BA}} = \frac{12L}{13}. \tag{11.22}$$

Then,

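$$x^{\text{front}}_{AA}(t_A) = 0, \qquad x^{\text{back}}_{BA}(t_A) = -\frac{5ct_A}{13}, \tag{11.23a}$$

$$x^{\text{back}}_{AA}(t_A) = -L, \qquad x^{\text{front}}_{BA}(t_A) = \frac{12L}{13} - \frac{5ct_A}{13}. \tag{11.23b}$$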

Again, $t_{2A}$ is defined to be the time when $x^{\text{back}}_{AA} = x^{\text{front}}_{BA}$:

$$-L = x^{\text{back}}_{AA}(t_{2A}) = x^{\text{front}}_{BA}(t_{2A}) = \frac{12L}{13} - \frac{5ct_{2A}}{13}. \tag{11.24}$$

Solving for t2A gives

ct2A = 5L . (11.25)

Furthermore,

x2A = −L . (11.26)

(c) From B’s perspective, A is moving with velocity

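$$v_{AB} = \frac{v_{AC} + v_{CB}}{1 + \frac{v_{AC} v_{CB}}{c^2}} = \frac{\frac{4c}{5} + \left(-\frac{3c}{5}\right)}{1 + \frac{4}{5}\left(-\frac{3}{5}\right)} = \frac{\frac{c}{5}}{1 - \frac{12}{25}} = \frac{5c}{13}. \tag{11.27}$$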

It should not be a surprise that vAB = −vBA!

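Therefore, repeating the steps of part (b) with the roles of the two trains exchanged (B’s train now sits at rest between x = 0 and x = L), we find in this reference frame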

ct2B = 5L , and x2B = L . (11.28)
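
Here is a short Python sketch of the velocity addition used in parts (b) and (c), done with exact fractions so that the $5/13$ appears without rounding. Velocities are in units of $c$ and lengths in units of $L$; the helper name `add_velocities` is my own.

```python
from fractions import Fraction
import math

def add_velocities(u, v):
    """Relativistic sum of two collinear velocities, both in units of c."""
    return (u + v) / (1 + u * v)

v_BC = Fraction(3, 5)    # B as seen by C
v_CA = Fraction(-4, 5)   # C as seen by A
v_AC = Fraction(4, 5)    # A as seen by C
v_CB = Fraction(-3, 5)   # C as seen by B

v_BA = add_velocities(v_BC, v_CA)   # B as seen by A, eq. (11.20)
v_AB = add_velocities(v_AC, v_CB)   # A as seen by B, eq. (11.27)

# Length of B's train in A's frame, in units of L, eq. (11.22).
L_BA = math.sqrt(1 - float(v_BA) ** 2)

# In A's frame the back of A sits at x = -1 and the front of B follows
# x = L_BA + v_BA * t.  Setting them equal gives the overtaking time (11.25).
ct_2A = (-1 - L_BA) / float(v_BA)

print("v_BA =", v_BA, " v_AB =", v_AB)
print("L_BA =", L_BA, " ct_2A =", ct_2A)
```

The two relative velocities come out equal and opposite, and the overtaking time is 5 (in units of $L/c$) up to rounding.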

(d) In C’s reference frame, D must travel the distance x2C = 5L in the time t2C = 7L/c.

Therefore, the velocity of D with respect to C is

$$v_{DC} = \frac{5L}{7L/c} = \frac{5c}{7}. \tag{11.29}$$

The velocities of A and B as viewed by D are

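$$v_{AD} = \frac{v_{AC} + v_{CD}}{1 + \frac{v_{AC} v_{CD}}{c^2}} = \frac{\frac{4c}{5} + \left(-\frac{5c}{7}\right)}{1 + \frac{4}{5}\left(-\frac{5}{7}\right)} = \frac{c}{5}, \tag{11.30a}$$

$$v_{BD} = \frac{v_{BC} + v_{CD}}{1 + \frac{v_{BC} v_{CD}}{c^2}} = \frac{\frac{3c}{5} + \left(-\frac{5c}{7}\right)}{1 + \frac{3}{5}\left(-\frac{5}{7}\right)} = -\frac{c}{5}. \tag{11.30b}$$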


It should not be a surprise that vBD = −vAD (why not?) In fact, instead of determining

vDC as we did in (11.29), we could have determined vDC by insisting that vAD = −vBD.

The associated gamma factors are equal:

$$\gamma_{AD} = \gamma_{BD} = \frac{1}{\sqrt{1 - (1/5)^2}} = \frac{5}{2\sqrt{6}}. \tag{11.31}$$

The lengths of A and B as viewed by D are equal and given by

$$L_{AD} = L_{BD} = \frac{2\sqrt{6}\,L}{5}. \tag{11.32}$$

From D’s perspective, each train travels a distance equal to each one’s length during

the overtaking process. Thus,

$$t_{2D} = \frac{L_{AD}}{v_{AD}} = \frac{2\sqrt{6}\,L/5}{c/5} = \frac{2\sqrt{6}\,L}{c}. \tag{11.33}$$

Both events occur at the position of D, which is just the origin in D’s own reference

frame. Therefore,

$$ct_{2D} = 2\sqrt{6}\,L, \quad \text{and} \quad x_{2D} = 0. \tag{11.34}$$

(e) In A and B:

$$(ct_{2A})^2 - (x_{2A})^2 = (ct_{2B})^2 - (x_{2B})^2 = 25L^2 - L^2 = 24L^2. \tag{11.35}$$

In C:

$$(ct_{2C})^2 - (x_{2C})^2 = 49L^2 - 25L^2 = 24L^2. \tag{11.36}$$

In D:

$$(ct_{2D})^2 - (x_{2D})^2 = 24L^2 - 0 = 24L^2. \tag{11.37}$$
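
As an independent check of part (e), the sketch below takes the coordinates of $E_2$ in C’s frame and boosts them into the frames of A, B and D with the standard Lorentz transformation (assumed here; the solution above instead works frame by frame). Since $E_1$ is the common origin, the boost should reproduce the $(ct_2, x_2)$ found in each frame. Units are $L = c = 1$, and the `boost` helper is my own name.

```python
import math

def boost(ct, x, beta):
    """Coordinates of the event (ct, x) in a frame moving at +beta*c along x."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return g * (ct - beta * x), g * (x - beta * ct)

ct_2C, x_2C = 7.0, 5.0   # event E2 in C's frame, eqs. (11.17) and (11.18)

# Velocity of each frame relative to C, in units of c.
frame_speeds = {"A": 4 / 5, "B": 3 / 5, "C": 0.0, "D": 5 / 7}

results = {}
for name, beta in frame_speeds.items():
    ct, x = boost(ct_2C, x_2C, beta)
    results[name] = (ct, x, ct**2 - x**2)
    print(f"{name}: ct2 = {ct:.4f}, x2 = {x:.4f}, interval = {ct**2 - x**2:.4f}")
```

Every frame should report an interval of 24 (in units of $L^2$), with $(ct_2, x_2)$ close to $(5, -1)$, $(5, 1)$, $(7, 5)$ and $(2\sqrt{6}, 0)$ for A, B, C and D respectively.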


12. Midterm 2 Quiz

(1) The first missing order in the interference/diffraction pattern produced by a double-slit

setup is the fifth interference fringe. What is the ratio of the center-to-center distance

between the two slits and the slit width?

Answer: Center-to-center slit separation = d and slit width = a. We are asked for $d/a$. Angle relation for fifth interference maximum: $d \sin\theta = 5\lambda$. Angle relation for first diffraction minimum: $a \sin\theta = \lambda$. Take the ratio of the two equations: $d/a = 5$.

(2) A thin layer of oil sits on top of some water in a beaker. Looking above, where are you

most likely to see the highest density of interference rings and why?

Answer: The interference is between reflected light off of the air-oil interface and the

light reflected off of the oil-water interface. The latter travels twice the thickness of the

oil more than the former (and you also have to take into account the different indices

of refraction, and possible reflection phase shifts).

If the thickness of the oil film were absolutely constant, then there wouldn’t be

interference fringes or rings. Instead, the entire layer would appear to be some constant

brightness somewhere between complete constructive or destructive interference.

You only get fringes or rings if the thickness of the oil changes as a function of

position. In a very clean sample, the thickness of the oil is probably going to be

changing most rapidly near the edge of the beaker due to surface tension (this is the

so-called meniscus). Therefore, one expects to see the highest density of rings near

the edge.

(3) Fighter jets flying towards a Radar tower on a coast find that they can remain well-

hidden if they fly very low, close to the water. Why?

Answer: The wavelength of the radar signal is large compared to the characteristic

size of structure on the surface of the water, such as waves, etc. So, the water surface

pretty much looks flat from the radar signal’s perspective and acts like a flat mirror.

Direct radar signals can now interfere with reflected ones. The reflected ones look like

they are coming from a virtual radar tower at the same position of the original radar

tower, but just the same distance below the sea level as the original tower is above

sea level. However, the signal from this virtual tower appears to be phase shifted by

π relative to the original tower from the beginning because the actual radar signal is

phase shifted by π upon reflection on the air-water interface.

In summary, this is like a two-slit interference problem, where the slit separation is

about twice the height of the tower above sea level and where the light coming from

one of the slits is already phase shifted by π relative to the other slit right from the

start. In this case, the middle point of the interference pattern directly ahead of the

two slits would have destructive rather than constructive interference. The midway

point is the surface of the water. Thus, the radar signal is weak near the surface of the

water.


Of course, there are other angles at which the radar signal is weak, but if you are in

a fighter jet, you might not know the details of the radar towers set up on the enemy

shore (e.g., their locations, their heights, etc.). The radar signal will be low near the

surface of the water regardless of such details. Therefore, that’s the pilot’s safest bet.

(4) What effect does putting a quarter-wave plate in front of just one slit in a two-slit

setup have on the interference pattern?

Answer: I’m assuming here that the incident light is polarized along the direction

of the optical axis of the quarter-wave plate. This problem would be much harder

otherwise. In this case, there is an initial phase difference of $\pi/2$ (or a quarter wave)

between the two slits. The interference/diffraction pattern will look the same, just

shifted in the direction towards the slit that was covered with the quarter-wave plate.

The shift is such that the new central maximum lies halfway between the old central

maximum and the old first minimum.

(5) Four lightning beams strike a passing train, two at the front and two at the back. In the

frame where the train is passing by, all four lightning strikes happen simultaneously.

Order the events in the train’s reference frame.

Answer: In the train’s reference frame the two lightning strikes at the back hap-

pen simultaneously and the two lightning strikes at the front happen simultaneously.

However, the two at the front happen before the two at the back. Remember that synchronized clocks on the train are not synchronized from the perspective of the frame in which the train is moving; the clock at the back of the train is systematically ahead of the clock at the front of the train. In the frame where the train is passing, a clock at the back of the train reads a later time than does a clock at the front when the

lightning beams strike. Therefore, in the train reference frame, the lightning strikes

the front before the back.

(6) If I’m on a train traveling at 4c/5 relative to you and I shoot a rocket forwards at

speed 3c/5 relative to me, then at what speed is the rocket moving relative to you?

Answer: The train is reference frame S′ and you are in reference frame S. The speed of S′ relative to S as measured in S is $v = \frac{4c}{5}$. The speed of the rocket as measured in S′ is $u' = \frac{3c}{5}$. The speed of the rocket as measured in S is

$$u = \frac{u' + v}{1 + \frac{u'v}{c^2}} = \frac{\frac{3c}{5} + \frac{4c}{5}}{1 + \frac{3}{5}\cdot\frac{4}{5}} = \frac{\frac{7c}{5}}{1 + \frac{12}{25}} = \frac{35c}{37}.$$

(7) If I’m on a train traveling at 4c/5 relative to you and I shine light forward, then at

what speed is the light moving relative to you?

Answer: c. If you plug in u′ = c in the previous problem, you will get u = c.
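
Questions (6) and (7) use the same formula, so a tiny Python sketch with exact fractions covers both. Speeds are in units of $c$; the function name is my own choice.

```python
from fractions import Fraction

def add_velocities(u_prime, v):
    """Speed in S of something moving at u_prime in S', where S' moves at v.

    Both speeds are collinear and measured in units of c.
    """
    return (u_prime + v) / (1 + u_prime * v)

train = Fraction(4, 5)

rocket = add_velocities(Fraction(3, 5), train)   # question (6)
light = add_velocities(Fraction(1), train)       # question (7)

print("rocket speed / c =", rocket)   # 35/37
print("light speed / c  =", light)    # 1
```

Note that the rocket comes out below $c$ even though the naive sum $3/5 + 4/5$ exceeds 1, and that light stays at exactly $c$.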

(8) “Derive” time dilation using the relativistic clock example.

Answer: A train (reference frame S′) moves at speed v relative to reference frame

S. Transverse to the direction of motion of the train, light is sent from one side of the


train to the other and reflected back, for a total distance of, say, $2h$. The time it takes for this round trip in S′ is $t' = \frac{2h}{c}$.

In S, the total speed of the light beam is still $c$, but the component of this velocity along the direction in which the train is moving is now $v$. Therefore, the transverse component is $\sqrt{c^2 - v^2} = \frac{c}{\gamma}$. The transverse distance that must be covered is still $2h$. Therefore, the round trip time in S is $t = \frac{2h}{c/\gamma} = \gamma t'$.

(9) “Derive” length contraction using time dilation.

Answer: A light signal is sent from the back of the train to the front and back. Let $L_0$ be the proper length of the train (measured in its own rest frame). Then the round trip time in S′ is $t' = \frac{2L_0}{c}$.

Let $L$ be the length of the train in S. The relative speed between the light beam and the front of the train in S is $c - v$. The relative speed between the back of the train and the light beam after reflection is $c + v$. Therefore, the round trip time in S is $t = \frac{L}{c-v} + \frac{L}{c+v} = \frac{2Lc}{c^2 - v^2} = \frac{2\gamma^2 L}{c}$.

From time dilation, we have $\frac{2\gamma^2 L}{c} = t = \gamma t' = \frac{2\gamma L_0}{c}$, which gives $L = \frac{L_0}{\gamma}$.
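
The two “derivations” in (8) and (9) are easy to sanity-check with numbers. The sketch below picks an illustrative speed ($v = 0.6c$, my choice) and confirms that the transverse light clock gives $t = \gamma t'$ and that inverting the longitudinal round-trip time gives $L = L_0/\gamma$.

```python
import math

c = 1.0    # units where c = 1
v = 0.6    # illustrative train speed (an assumption)
gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)

# Question (8): transverse light clock of height h.
h = 1.0
t_prime_clock = 2 * h / c                     # round trip in the train frame S'
t_clock = 2 * h / math.sqrt(c**2 - v**2)      # round trip in S (transverse speed c/gamma)
dilation_ratio = t_clock / t_prime_clock

# Question (9): longitudinal round trip along a train of proper length L0.
L0 = 1.0
t_prime = 2 * L0 / c                          # round trip in S'
t = gamma * t_prime                           # time dilation from question (8)
L = t * (c**2 - v**2) / (2 * c)               # invert t = 2 L c / (c^2 - v^2)

print("t / t' =", dilation_ratio, " gamma =", gamma)
print("L =", L, " L0 / gamma =", L0 / gamma)
```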

(10) What are the two effects involved in the relativistic Doppler effect?

Answer: (1) The standard Doppler effect, whereby the wavelength of the signal is

shortened if the source is moving towards you and lengthened if moving away; and

(2) Time dilation, whereby the time it takes for a new wavefront to be created by the

moving source increases relative to when it is at rest.

(11) A train and a tunnel both have proper lengths L. The train moves toward the tunnel

at speed v. A bomb is located at the front of the train. The bomb is designed to

explode when the front of the train passes the far end of the tunnel. A deactivation

sensor is located at the back of the train. When the back of the train passes the near

end of the tunnel, the sensor tells the bomb to disarm itself. Does the bomb explode?

Answer: Yes, the bomb explodes. Let us first consider the train reference frame, in

which the answer is obvious. In this frame, the train has length L and the tunnel has

length L/γ < L and is heading towards the train at speed v. Therefore, it is clear that

the back end of the tunnel will pass the front of the train before the front end of the

tunnel reaches the back of the train.

In the tunnel frame, the tunnel has length L and the train has length L/γ < L.

Therefore, the back of the train reaches the near end of the tunnel before the front of

the train reaches the back of the tunnel. You might be tempted to say that the bomb

is then deactivated before it can explode. However, you have to keep in mind that the

deactivator at the back of the train needs to send a signal to the bomb at the front

of the train saying that it has reached the front end of the tunnel and that the bomb

should therefore disarm itself. At best, that signal can travel at the speed of light. It

will take time for that signal to reach the bomb at the front of the train. If that time is


longer than the time it takes for the front of the train to reach the back of the tunnel,

then it will be too late and the bomb will explode.

Let the front of the tunnel correspond to x = 0 and let t = 0 be when the back of

the train passes the front of the tunnel. Henceforth, the signal sent by the deactivator

travels forward at the speed of light, its worldline described by xs = ct (the s subscript

stands for “signal”). At t = 0, the front of the train is at x = L/γ, since that is the

length of the train in the tunnel reference frame. The trajectory of the front of the

train is $x_b = \frac{L}{\gamma} + vt$ (the b subscript stands for “bomb”). Which one reaches $x = L$ (the back of the tunnel) first? Well, the time it takes for the signal is $t_s = L/c$ whereas for the bomb it takes $t_b = \left(L - \frac{L}{\gamma}\right)/v = \frac{t_s}{\beta}\left(1 - \frac{1}{\gamma}\right)$, where $\beta \equiv v/c$. We claim that $t_b < t_s$ and so the bomb explodes.

To prove this, start with the inequality $\beta < 1$, which just says that the train must be moving at less than the speed of light. Multiply by $2\beta$ and add 1 to both sides to get $1 + 2\beta^2 < 1 + 2\beta$. Now, subtract $2\beta + \beta^2$ from both sides to get $1 - 2\beta + \beta^2 < 1 - \beta^2$. Rewrite the left hand side as $(1-\beta)^2$, then take the positive square root of both sides to get $1 - \beta < \sqrt{1-\beta^2} = \frac{1}{\gamma}$. This final inequality can be rearranged to read $\frac{1}{\beta}\left(1 - \frac{1}{\gamma}\right) < 1$. But, the left hand side is just $t_b/t_s$, and so $t_b < t_s$.
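
If the chain of inequalities feels slippery, a numerical scan is a quick reassurance. The following Python sketch evaluates $t_b$ and $t_s$ in the tunnel frame for a handful of speeds (the particular values of $\beta$ are arbitrary choices of mine) and checks that the bomb always wins the race.

```python
import math

def bomb_and_signal_times(beta, L=1.0, c=1.0):
    """Tunnel-frame times for the bomb and the disarm signal to reach x = L.

    Both clocks start when the back of the train passes the near end (x = 0).
    """
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    t_signal = L / c                           # signal travels at the speed of light
    t_bomb = (L - L / gamma) / (beta * c)      # bomb starts at x = L/gamma, moves at v
    return t_bomb, t_signal

betas = (0.1, 0.5, 0.8, 0.99, 0.999999)
for beta in betas:
    t_b, t_s = bomb_and_signal_times(beta)
    print(f"beta = {beta}: t_b / t_s = {t_b / t_s:.6f}")
```

The ratio $t_b/t_s$ creeps toward 1 as $\beta \to 1$ but never reaches it, in line with the proof above.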

Below are spacetime diagrams in both reference frames in the case β = 4/5. Note

that we have set t = t′ = 0 when the front of the train lines up with the front of the

tunnel. But, note that we have not set x = x′ = 0 to be the position of this event.

The spatial origins of the frames are different: x′ = 0 for the center of the train and

x = 0 for the center of the tunnel. In the train frame, the explosion happens before

the deactivation signal is sent. In the tunnel frame, those two events occur in the

opposite order. However, in both reference frames, the bomb explodes; it certainly

cannot be the case that the train explodes in one frame whereas it does not in the

other! Assuming that the explosion “signal” (i.e. the fires, etc.) travels at the speed of

light, the red shaded regions represent the region of the train that is engulfed in fire,

or at least the region that is aware of the fact that the explosion has occurred.


13. Energy and Momentum

Relativistic energy and momentum are given by

E = γmc2, p = γmv, (13.1)

where γ is the usual gamma factor associated with v.

You can argue these forms with the use of somewhat cryptic collision arguments and

energy and momentum conservation, as is done in your textbook. I would like to discuss

4-vectors instead.

13.1. 4-Vectors

We can combine the time and space coordinates of an event, as measured in some reference

frame S, into a column of four numbers:

$$\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}.$$

Then, we know how these coordinates transform when we change reference frames: they

change via a Lorentz transformation. For example, if the reference frame S′ is moving

with speed v in the +x direction relative to S, then the primed coordinates for the event,

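measured in S′, are related to the unprimed coordinates for the event, measured in S, via

$$\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix}. \tag{13.2}$$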

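Incidentally, if S′ is moving with speed v in the +y direction instead, then

$$\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \gamma & 0 & \beta\gamma & 0 \\ 0 & 1 & 0 & 0 \\ \beta\gamma & 0 & \gamma & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix},$$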

and similarly if S′ is moving in the +z direction.

Any collection of four numbers that can be combined into a column and transforms in

this way from one reference frame to the next is called a 4-vector.

If we consider two events, each with its own set of coordinates (both measured in

the same reference frame S), then we can also write down the difference between those

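coordinates. This is the spacetime displacement from event 1 to event 2:

$$\begin{pmatrix} c\Delta t \\ \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \begin{pmatrix} c(t_2 - t_1) \\ x_2 - x_1 \\ y_2 - y_1 \\ z_2 - z_1 \end{pmatrix}.$$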


The same can be done for any 4-vector (i.e., one can consider differences in a pair of 4-vectors measured in the same reference frame). The same argument that leads to the invariance of the interval $(c\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2$ implies that the analogous combination (the square of the first component minus the squares of the other three) is invariant for any 4-vector.
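
To see the invariance concretely, here is a minimal Python sketch that builds the boost matrix of (13.2), applies it to an arbitrary column of four numbers, and compares the interval before and after. The helper names (`boost_x`, `apply`, `interval`) and the sample numbers are illustrative assumptions of mine.

```python
import math

def boost_x(beta):
    """4x4 matrix of (13.2): maps S' components to S components."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [g,        beta * g, 0.0, 0.0],
        [beta * g, g,        0.0, 0.0],
        [0.0,      0.0,      1.0, 0.0],
        [0.0,      0.0,      0.0, 1.0],
    ]

def apply(matrix, vec):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def interval(vec):
    """Square of the first component minus the squares of the other three."""
    ct, x, y, z = vec
    return ct**2 - x**2 - y**2 - z**2

event_prime = [2.0, 1.0, 0.5, -3.0]   # arbitrary (ct', x', y', z')
event = apply(boost_x(0.6), event_prime)

print("S' components:", event_prime, "interval:", interval(event_prime))
print("S components: ", event, "interval:", interval(event))
```

The individual components change under the boost, but the two intervals agree up to rounding.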

Furthermore, if you multiply any 4-vector by a scalar, which is some number which

does not change from one reference frame to the next (e.g., mass), then the result is still

a 4-vector. For example, between any two events that are timelike separated (i.e., can be connected by something traveling at a speed strictly less than the speed of light), there

is one particular reference frame, S∗, in which the two events occur at the exact same

location in space. The time between the two events in that particular reference frame is

called the proper time, denoted ∆τ .

This proper time is invariant under Lorentz transformation. This statement is essen-

tially tautological: It is true by fiat, because the proper time is defined with respect to

the particular reference frame S∗. This is the same reason why mass is invariant: Mass is

defined as the energy (up to factors of c) in the rest frame of the object. If you were to ask

me what is the mass of an object that is moving, I would say it is the same mass that the

object would have if it were not moving. Note that we are talking about mass here, not

this bizarre thing called the relativistic mass, γm, which you should expeditiously excise

from your minds.

Therefore, the spacetime displacement between two events, measured in some reference

frame S, can be divided by the proper time between those two events and the result is still

a 4-vector, since the latter is a scalar:

$$\begin{pmatrix} c\Delta t/\Delta\tau \\ \Delta x/\Delta\tau \\ \Delta y/\Delta\tau \\ \Delta z/\Delta\tau \end{pmatrix}.$$

The time between those two events measured in any other reference frame, S, is equal to

∆t = γ∆τ , where γ is the gamma factor associated with the velocity at which S∗ is moving

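relative to S (this is time dilation). Thus,

$$\begin{pmatrix} c\Delta t/\Delta\tau \\ \Delta x/\Delta\tau \\ \Delta y/\Delta\tau \\ \Delta z/\Delta\tau \end{pmatrix} = \begin{pmatrix} \gamma c \\ \gamma\,\Delta x/\Delta t \\ \gamma\,\Delta y/\Delta t \\ \gamma\,\Delta z/\Delta t \end{pmatrix}.$$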

Taking the limit as all these deltas become really small turns the ratios into derivatives.

We recognize these derivatives to be the components of the velocity of S∗ relative to S.

This defines the 4-velocity:

$$\begin{pmatrix} \gamma c \\ \gamma v_x \\ \gamma v_y \\ \gamma v_z \end{pmatrix}. \tag{13.3}$$


Finally, we can multiply by the scalar mass of some hypothetical object which moves

between the two events. The result is also a 4-vector and it is called the energy-momentum

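4-vector:

$$\begin{pmatrix} E/c \\ p_x \\ p_y \\ p_z \end{pmatrix} = \begin{pmatrix} \gamma mc \\ \gamma m v_x \\ \gamma m v_y \\ \gamma m v_z \end{pmatrix}, \tag{13.4}$$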

which are precisely the definitions (13.1) given at the start.

For free, we have the invariance of the interval associated with this 4-vector, which is

$$\frac{E^2}{c^2} - |\mathbf{p}|^2. \tag{13.5}$$

Usually, this is actually multiplied by the constant $c^2$ to get $E^2 - |\mathbf{p}|^2c^2$.
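
It is worth seeing what this invariant actually equals. In the rest frame of the object, $E = mc^2$ and $\mathbf{p} = 0$, so $E^2 - |\mathbf{p}|^2c^2 = m^2c^4$, and by invariance this holds at every speed. The Python sketch below checks this using the definitions (13.1); the mass and the list of speeds are arbitrary illustrative values.

```python
import math

def energy_momentum(m, v, c=1.0):
    """Return (E, p) from (13.1) for a mass m moving at speed v along one axis."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * m * c**2, gamma * m * v

c = 1.0
m = 2.0   # arbitrary illustrative mass
invariants = []
for v in (0.0, 0.3, 0.9, 0.999):
    E, p = energy_momentum(m, v, c)
    invariants.append(E**2 - (p * c) ** 2)
    print(f"v = {v}: E = {E:.4f}, p = {p:.4f}, E^2 - p^2 c^2 = {invariants[-1]:.6f}")
```

Both $E$ and $p$ grow without bound as $v \to c$, yet the combination stays at $(mc^2)^2$ up to rounding.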

13.2. Colliding Photons

[Goldstein, Poole & Safko 7.22 ] A photon of energy E2 collides at angle θ with a photon

of energy E1. Determine the minimum value of E2 permitting the formation of a pair of

particles of mass m, as a function of E1, m and θ.

SOLUTION:

Expectations: We should expect that we would need to pump in more energy if we are to create heavier particles. Therefore, if $m$ gets larger, we expect that $E_2$ must get larger as well: $E_2 \sim m^{\#}$, where $\#$ is some positive exponent. If the first photon already has a lot of energy ($E_1$ is large), then the second photon shouldn’t have to have so much energy anymore, and vice versa. Therefore, if $E_1$ is big, then $E_2$ can be small, and if $E_1$ is small then $E_2$ should be big: $E_2 \sim 1/E_1^{\#}$. In fact, we can do a bit better than that. Since $E_2$ must have units of energy, and $mc^2$ and $E_1$ have units of energy, we ought to have $E_2 \sim mc^2 \left(\frac{mc^2}{E_1}\right)^{\#}$, where $\#$ is some positive exponent. Actually, to be most conservative, all we can really say is that $E_2 \sim mc^2 f\!\left(\frac{mc^2}{E_1}\right)$, where $f(x)$ is an increasing function of $x$ for $x > 0$. If $\theta \to 0$, the collision is very weak and the incoming energies must be huge in order to produce something. Thus, we expect that $E_2 \xrightarrow{\theta \to 0} \infty$. The opposite scenario is $\theta \to \pi$, which corresponds to a head-on collision. This is the “best-case scenario” since it is the strongest collision. Thus, $E_2$ should be minimal at this angle: $E_2 \xrightarrow{\theta \to \pi} \min E_2$. In summary,

$$E_2 = \frac{mc^2\, f\!\left(\frac{mc^2}{E_1}\right)}{g(\theta)}, \tag{13.6}$$


where f is an increasing function and g is a function which goes to zero as θ goes to zero

and attains a maximum as θ goes to π.

Center of Momentum Frame Picture: It is difficult to describe exactly what happens

in the lab frame, S, which is the frame in which the drawing above is drawn. The two

masses are in general moving in all sorts of possible directions with all sorts of possible

energies. However, the picture is very simple in the center of momentum frame, S′. This is

the frame where the total momentum of the system is always exactly 0. So, in this frame,

the two photons undergo a head-on collision with both photons coming in with the same

energy and equal and opposite momenta. If there is insufficient energy to produce the two

masses, m, then the photons could just pass each other, or they could turn into something

else. If there is more energy than is the minimum required, then the two masses, m, will

be produced and the remaining energy is distributed evenly between the two of them as

their kinetic energies. So, the two masses fly off in opposite directions with equal energy

and equal and opposite momenta. At the absolute critical case, the two photons collide

and all of their energy is used up to produce two masses, m, just sitting there in the center

of momentum frame... not moving.

Method 1 (Relativistic Invariant): The relativistic invariant in the COM frame, S′, is $E'^2 - p'^2c^2$, where $E'$ is the total energy and $p'$ is the magnitude of the total momentum vector in the COM frame. Well, by definition, $p' = 0$. Thus, the relativistic invariant in the COM frame is just $E'^2$.

The total energy in the lab frame, S, is E = E1 + E2. We have to break up the

momentum vectors of each photon into their components to calculate the magnitude of

the total momentum vector, p. The horizontal component of p is E1c + E2

c cos θ and the

vertical component is E2c sin θ. Therefore,

p ≡ |p| =√(

E1c + E2

c cos θ)2

+(E2c sin θ

)2= 1

c

√E2

1 + E22 + 2E1E2 cos θ.

I would like to rewrite this by adding and subtracting $2E_1E_2$ under the square root. Adding $2E_1E_2$ to $E_1^2 + E_2^2$ completes the square to give $(E_1 + E_2)^2$. Thus,

$$p = \frac{1}{c}\sqrt{(E_1 + E_2)^2 - 2E_1E_2(1 - \cos\theta)} = \frac{1}{c}\sqrt{(E_1 + E_2)^2 - 4E_1E_2\sin^2\frac{\theta}{2}},$$

where I used the trigonometric identity $\sin^2\frac{\theta}{2} = \frac{1 - \cos\theta}{2}$. This last step is certainly not

necessary; it is just my habit to do this whenever I see 1 − cos θ, even though it is not

always useful.

The relativistic invariant calculated in the lab frame, S, is thus

$$E^2 - p^2c^2 = (E_1 + E_2)^2 - (E_1 + E_2)^2 + 4E_1E_2\sin^2\frac{\theta}{2} = 4E_1E_2\sin^2\frac{\theta}{2}.$$

This is equal to the relativistic invariant in the COM frame, which we have already determined to be just $E'^2$ because $p' = 0$. Thus,

$$E' = 2\sqrt{E_1E_2}\,\sin\frac{\theta}{2}. \tag{13.7}$$


As we have already discussed above, in the critical case, all of the total energy in the COM

frame, E′, is used up to form two masses, m, at rest. Thus,

$$E' = 2\sqrt{E_1E_2}\,\sin\frac{\theta}{2} = 2mc^2 \implies E_2 = \frac{m^2c^4}{E_1\sin^2(\theta/2)}. \tag{13.8}$$

Notice that this does satisfy all of the expectations we stated in the beginning!
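
To put numbers to (13.8), the Python sketch below evaluates the threshold $E_2$ for a few angles and then verifies, directly from the lab-frame energy and momentum components, that the COM energy at threshold equals $2mc^2$. The choice of the electron rest energy (0.511 MeV) and of $E_1 = 1$ MeV is purely illustrative; the problem itself leaves $m$ and $E_1$ general. Here $\theta$ is the angle between the two photon momenta, so $\theta = \pi$ is head-on.

```python
import math

def pair_threshold_E2(E1, mc2, theta):
    """Minimum E2 from (13.8); theta is the angle between the photon momenta."""
    return mc2**2 / (E1 * math.sin(theta / 2) ** 2)

def com_energy(E1, E2, theta):
    """Total COM energy sqrt(E^2 - p^2 c^2), built from lab-frame components."""
    E = E1 + E2
    pc_x = E1 + E2 * math.cos(theta)   # horizontal component of p, times c
    pc_y = E2 * math.sin(theta)        # vertical component of p, times c
    return math.sqrt(E**2 - pc_x**2 - pc_y**2)

mc2 = 0.511   # MeV; electron rest energy, an illustrative choice
E1 = 1.0      # MeV; arbitrary illustrative value
angles = (math.pi, math.pi / 2, 1.0)
for theta in angles:
    E2 = pair_threshold_E2(E1, mc2, theta)
    print(f"theta = {theta:.3f} rad: E2_min = {E2:.4f} MeV, "
          f"COM energy = {com_energy(E1, E2, theta):.4f} MeV")
```

The threshold is lowest for the head-on collision and grows as the angle closes, while the COM energy at threshold stays at $2mc^2$.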

Method 2 (Transform to COM frame): The COM frame is moving relative to the lab

frame along the direction of the total momentum vector, p, in the lab frame. Therefore, we

will set that direction to be the +x-direction. Note that this is not the horizontal direction,

which is what you might have been tempted to call the +x-direction instead. With this

choice of coordinates, the total momentum vector, p, does not have any y or z components

and thus we can neglect y and z altogether. The x-component of the total momentum

vector is therefore just the magnitude of the total momentum vector.

The top two components of the momentum 4-vector in the lab frame are

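$$p^\mu = \begin{pmatrix} E/c \\ p \end{pmatrix} = \begin{pmatrix} \frac{E_1 + E_2}{c} \\ \frac{1}{c}\sqrt{(E_1 + E_2)^2 - 4E_1E_2\sin^2\frac{\theta}{2}} \end{pmatrix}.$$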

Notice the notation here. The upper Greek index on $p^\mu$ just indicates that this is a 4-momentum vector and the components are $p^0 = E/c$, $p^1 = p_x$, $p^2 = p_y$ and $p^3 = p_z$.

Technically, I should write down the y and z components, but they are both 0.

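Let us rewrite $p^\mu$ by factoring $(E_1 + E_2)^2$ out from under the square root in $p$:

$$p^\mu = \frac{E_1 + E_2}{c}\begin{pmatrix} 1 \\ \sqrt{1 - \frac{4E_1E_2\sin^2(\theta/2)}{(E_1 + E_2)^2}} \end{pmatrix} \equiv \frac{E_1 + E_2}{c}\begin{pmatrix} 1 \\ A \end{pmatrix}. \tag{13.9}$$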

Note that I just called the whole mess in the square root A, so that I don’t have to keep

writing it over and over again.

All we know is that the COM frame moves in the +x-direction relative to the lab

frame. But, we don’t know how fast it is moving. Let us set its speed to be βc, with

corresponding γ factor. We will have to determine what β has to be for the COM frame.

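We boost $p^\mu$ to get the 4-momentum vector in the COM frame:

$$\begin{pmatrix} E'/c \\ p' \end{pmatrix} = p'^\mu = \gamma\begin{pmatrix} 1 & -\beta \\ -\beta & 1 \end{pmatrix}\frac{E_1 + E_2}{c}\begin{pmatrix} 1 \\ A \end{pmatrix} = \gamma\,\frac{E_1 + E_2}{c}\begin{pmatrix} 1 - \beta A \\ A - \beta \end{pmatrix}.$$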

For the COM frame, we know that p′ = 0. But, we see above that p′ ∝ A− β. Therefore,

the β parameter that takes us from the lab frame to the COM frame must be β = A.

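Plugging that back in to the equation above gives

$$\begin{pmatrix} E'/c \\ p' \end{pmatrix} = \frac{1}{\sqrt{1 - A^2}}\,\frac{E_1 + E_2}{c}\begin{pmatrix} 1 - A^2 \\ 0 \end{pmatrix} = \frac{E_1 + E_2}{c}\sqrt{1 - A^2}\begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$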


Plugging in the definition of $A$ in Eqn. (13.9) gives

$$\begin{pmatrix} E'/c \\ p' \end{pmatrix} = \frac{2\sqrt{E_1E_2}\,\sin(\theta/2)}{c}\begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

which gives precisely the same E′ as we found in method 1 in Eqn. (13.7).
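
Method 2 can also be traced step by step with numbers. The sketch below picks arbitrary lab-frame values (my choice), takes $|\mathbf{p}|$ from the Method 1 expression, boosts with $\beta$ equal to the ratio $A = pc/(E_1 + E_2)$, and confirms that the momentum vanishes and the energy matches (13.7). Energies and $pc$ share the same units.

```python
import math

E1, E2, theta = 3.0, 2.0, 1.2   # arbitrary illustrative lab-frame values
E = E1 + E2
pc = math.sqrt(E**2 - 4 * E1 * E2 * math.sin(theta / 2) ** 2)  # |p| c from Method 1

beta = pc / E                        # this ratio is the quantity called A in the text
gamma = 1.0 / math.sqrt(1.0 - beta**2)

E_com = gamma * (E - beta * pc)      # top row of the 2x2 boost
pc_com = gamma * (pc - beta * E)     # bottom row of the 2x2 boost

print("p' c =", pc_com)
print("E'   =", E_com, " vs (13.7):", 2 * math.sqrt(E1 * E2) * math.sin(theta / 2))
```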

14. Quantum Mechanics

For me, the double slit experiment is the gateway to quantum mechanics. This is not

historically how the field developed. I would say that that is closer to the way your textbook

presents the material, with Planck’s discovery of the Planck distribution, derived from his

clever insight that light came in discrete units called photons, whose energy was directly

proportional to the frequency of the light, the proportionality being Planck’s constant.

Then, Einstein ran with this idea to explain the photoelectric effect, etc.

However, I think that the double slit experiment, more so than either the Planck distribution

or the photoelectric effect, really captures a broad scope of the weird and wonderful

phenomena that propelled quantum mechanics in the early days and which were the subject

of many a heated debate. I hope I’ll be able to convince you of this, but in the meantime,

please accept my apologies for presenting material now that is in a later chapter of your

textbook.

14.1. The Wacky World of the Double Slit

Imagine performing the double slit experiment with light that is weak enough so that

photons arrive at the screen at a low enough rate that you (or, more accurately,

the detectors on a screen) can actually distinguish the arrival event of each single photon.

Surely, we would have to conclude that light is made up of bona fide particles in this case

since you can see when each one arrives at a particular point on the screen.

If you were to cover one of the slits, then photons pass through the other slit, theo-

retically one at a time, and they just go straight through to the screen. You would expect

to see dots form on the screen (if you used photographic film or something like that) right

around the point on the screen directly in front of the slit. These dots would pile up over

time as you exposed the film longer and longer.

If you were to have both slits open, you might think that you would just get two regions

on the screen, one directly in front of each of the slits, where photons pile up over time.

After all, if one photon goes through the slits at a time, then it either goes straight in front

of one slit or the other, right? Surprisingly, that’s not what happens at all. Instead, you

will observe the same old interference pattern that you see when you shine a strong light

source through the slits; it just takes time for the pattern to build up as you expose the

film longer and longer!

In real life, these experiments were first done with electrons rather than photons. For the time being, let us postpone discussing why you can use electrons instead of photons in the double slit experiment. Below are pictures taken from the original papers of the first experiments to actually observe this effect. If you want to see a video of this done in 2012, see http://iopscience.iop.org/1367-2630/15/3/033018/media/njp458349movie2.mov.

(a) P. G. Merli, G. F. Missiroli and G. Pozzi. “On the statistical aspect of electron interference phenomena.” American Journal of Physics 44, 306 (1976).

(b) A. Tonomura, J. Endo, T. Matsuda, T. Kawasaki and H. Ezawa. “Demonstration of single-electron build-up of an interference pattern.” American Journal of Physics 57, 117 (1989).

Figure 1: Time lapse exposures in the double-slit experiment performed using electrons.

Please take a moment to contemplate how amazing this is. The electrons are passing

through the slits one at a time. What on earth are they interfering with? How do they

know to land with greater probability in some regions of the screen than in others?

To me, this experiment is the definitive demonstration of the wave-particle duality.

How can something be a wave and a particle at the same time? Well, here it is, in all its

glory. To understand this phenomenon, we will develop the rudiments of the wavefunction

picture of quantum mechanics and the so-called Copenhagen interpretation. But, let us

leave that for another day. For now, consider the following thought experiment.

Suppose you put a light source behind the double slit shooting light across each of the

slits. There is then a detector on each side that detects this light. When an electron passes

through, it may interact with the light and cause a decrease in the intensity of the light that

is measured at the detectors. Basically, the electron cuts off the light beam for an instant

as it passes by. The point of this whole setup is for us to experimentally verify which slit

each electron goes through. The question is: does this have any effect on the pattern that

you observe on the wall, and, if so, what is the effect?

If you think there might be an effect, you might wonder how great an effect this might


have. Could I not just make the observation light arbitrarily weak so as to perturb the

system minimally?

The answer turns out to be pretty catastrophic. If you can determine which slit each

electron passes through, then the interference pattern will be completely destroyed. You

will end up with a wash of electrons on the screen mostly concentrated at the two points

on the screen directly in front of the slits! You can imagine turning the observation light

on and off, effectively destroying and then reviving the interference pattern at will!

We will not resolve this seeming paradox at the moment. But, let me just tell you the

punchline, and you will see how it works later on. The point is that you cannot simply

make the observation light arbitrarily weak. If you do, you will not be able to determine

the position of the passing electrons with sufficient resolution to determine which slit each

passed through. Furthermore, you will find out that you cannot really use arbitrarily high

momentum electrons in this experiment. It turns out that the momentum of the electrons

and the momentum of the photons you would have to use to observe those electrons will be

comparable; they are both very small, but nevertheless comparable in magnitude with each

other. Therefore, when they interact (e.g., collide), the photon may have a large effect on

the final momentum of the electron and may deflect it significantly. This will completely

destroy the interference pattern. Therefore, there is no hidden mini-demon whose job it

is to confound your efforts to measure the electrons and observe interference at the same

time. You yourself are destroying the interference pattern by perturbing the system too

strongly.

14.2. Blackbody Radiation and the Ultraviolet Catastrophe

We learn from the photoelectric effect that light may be thought of as being built out of

particles called photons, even though it behaves like a wave in most familiar situations.

Somehow, very many photons conspire to produce wave-like behavior. This concept of

photons is what starts us down the road towards blackbody radiation, although historically

the ideas of Planck about blackbody radiation, which we are about to describe, preceded,

and in fact inspired, Einstein’s explanation of the photoelectric effect.

One thoroughly embarrassing problem that remained before Planck came on the scene

is called the ultraviolet catastrophe. Consider a thermally insulated cavity of volume V

containing radiation. The energy associated with an electric field is proportional to the

square of the electric field. The equipartition theorem states that, at thermal equilibrium

at temperature T , the average energy associated with a quadratic degree of freedom, such

as this, is ∼ kT (or kT/2; it really doesn’t matter for this discussion). However, there

are technically infinitely many possible modes of radiation inside a cavity, with arbitrarily

short wavelength. If each mode is to possess an average energy of kT , then the total energy

would be infinite! Schroeder describes just how embarrassing this conclusion is: if it were

correct, you would expect to be blasted with an infinite amount of radiation every time

you open the oven door to check the cookies!

The classical assumption is that each mode can have any non-negative energy, E.

From 7B, we know that the probability for a mode to have energy E is proportional to the

Boltzmann factor: $P(E) \propto e^{-E/kT}$. Calculating the average energy per mode as you did

for an ideal gas in 7B for such a continuous spectrum produces the equipartition theorem

and leads to the UV catastrophe as described above.

Planck’s neat idea was that electromagnetic energy is not continuously distributed, but

is quantized in integer units of hν, where ν is the frequency of radiation and h is Planck’s

constant. He proposed that light was absorbed and emitted by matter in quanta called

photons. So, a single mode with frequency ν can have an energy of 0, or hν, or 2hν, etc.

But it cannot have an energy between these values, like hν/2, since that would correspond

to half a photon!
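To see how quantization tames the high-frequency modes, one can sum the Boltzmann factor over the allowed energies 0, hν, 2hν, ... directly. The Python sketch below is only illustrative: the function name, the truncation `n_max`, the temperature, and the two sample frequencies are assumptions chosen for this note.

```python
import math

K_B = 8.617333e-5   # Boltzmann constant in eV/K
H = 4.135668e-15    # Planck constant in eV*s

def average_energy_quantized(nu, T, n_max=2000):
    """Boltzmann-weighted average over the energies n*h*nu, n = 0, 1, 2, ..."""
    x = H * nu / (K_B * T)
    weights = [math.exp(-n * x) for n in range(n_max)]
    total = sum(weights)
    return sum(n * H * nu * w for n, w in enumerate(weights)) / total

T = 300.0                                   # illustrative temperature in K
kT = K_B * T
low = average_energy_quantized(1e11, T)     # h*nu much smaller than kT
high = average_energy_quantized(1e15, T)    # h*nu much larger than kT
print(low / kT, high / kT)
```

For the low-frequency mode the average comes out close to kT, as equipartition predicts, while the high-frequency mode is exponentially suppressed. That suppression is what keeps the total energy finite.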

This leads to the Planck distribution and eventually to the Stefan-Boltzmann law

of radiation, which states that the average intensity of radiation from a blackbody at

temperature T is proportional to $T^4$, with a proportionality constant given by the Stefan-Boltzmann constant.

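14.3. Stefan-Boltzmann Law

The Stefan-Boltzmann law gives the irradiance of a graybody at temperature T:

$$I = \varepsilon\sigma T^4, \quad \text{where} \quad \sigma = \frac{2\pi^5 k_B^4}{15 h^3 c^2} = 5.67\times 10^{-8}\ \frac{\mathrm{W}}{\mathrm{m}^2\,\mathrm{K}^4}, \tag{14.1}$$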

and ε is the emissivity of the graybody, which is a number between 0 and 1, with 1

corresponding to a perfect blackbody. On the other hand, the absorptivity, a, of an object

measures the fraction of the light incident on the object that the object absorbs. At

equilibrium, a = ε, meaning that whatever radiation the object absorbs, it emits, so that it

neither heats up (as it would if it emitted less than it absorbed) nor cools down (as it would if it emitted more than it

absorbed).

Assume that the sun is a blackbody of temperature 5800 K and radius $7\times10^{8}$ m,

located $1.5\times10^{11}$ m from the earth. Assume that the earth is a graybody, which absorbs

part of the radiation incident upon it from the sun, and then re-radiates it isotropically.

Neglect any other effects which could heat the earth. Calculate the surface temperature of

the earth under these assumptions.

SOLUTION:

Let $R_S$ be the radius of the sun, $R_{ES}$ be the earth-sun distance, $R_E$ the radius of the earth,

$T_S$ the temperature of the sun, $T_E$ the temperature of the earth, and ε the emissivity of

the earth, which, at equilibrium, is also the absorptivity of the earth. The power being

radiated by the sun is just its irradiance multiplied by its surface area:

$$P_S = (\sigma T_S^4)(4\pi R_S^2).$$

By the time this light reaches the distance of the earth, the power has spread over a sphere

with radius $R_{ES}$. Thus, the irradiance of sunlight at the earth, which we call $I_{ES}$, is

$$I_{ES} = \frac{P_S}{4\pi R_{ES}^2} = (\sigma T_S^4)\left(\frac{R_S}{R_{ES}}\right)^2.$$

This light irradiance is travelling radially outwards from the sun, and so only the cross-

sectional area of the earth from the view of the sun is actually absorbing the light. This

cross-sectional area is $\pi R_E^2$. Furthermore, not all of that light is absorbed: only a fraction ε of it is

absorbed. Thus, the power absorbed by the earth is

$$P_E^{(\mathrm{abs})} = \varepsilon\pi R_E^2\, I_{ES} = \pi\varepsilon\,(\sigma T_S^4)\left(\frac{R_S R_E}{R_{ES}}\right)^2.$$

On the other hand, the power re-radiated by the earth is isotropic and radiated by all the

surface area of the earth:

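$$P_E^{(\mathrm{rad})} = (\varepsilon\sigma T_E^4)(4\pi R_E^2).$$

At equilibrium, $P_E^{(\mathrm{abs})} = P_E^{(\mathrm{rad})}$, and solving for $T_E$ gives

$$T_E = \sqrt{\frac{R_S}{2R_{ES}}}\;T_S = 280\ \mathrm{K} \approx 7^\circ\mathrm{C}.$$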

That’s quite cold, but it’s supposed to represent an average surface temperature for the

earth. However, even if it were a good value, we would have to take it with a heap of salt

since we didn’t even take into account the fact that the earth has an atmosphere!
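The arithmetic behind this estimate can be checked with a short Python sketch. The numbers for the sun are the ones given in the problem. The emissivity and earth radius used in the balance check are hypothetical placeholders, included only to show that both cancel out of the final temperature.

```python
import math

SIGMA = 5.67e-8   # Stefan-Boltzmann constant, W / (m^2 K^4)
T_SUN = 5800.0    # K, from the problem statement
R_SUN = 7.0e8     # m, from the problem statement
R_ES = 1.5e11     # m, earth-sun distance from the problem statement

def earth_temperature(t_sun, r_sun, r_es):
    """T_E = sqrt(R_S / (2 R_ES)) * T_S."""
    return math.sqrt(r_sun / (2.0 * r_es)) * t_sun

def absorbed_power(eps, r_e):
    """Power absorbed over the earth's cross-section pi * R_E^2."""
    irradiance = SIGMA * T_SUN**4 * (R_SUN / R_ES) ** 2
    return eps * math.pi * r_e**2 * irradiance

def radiated_power(eps, r_e, t_e):
    """Power radiated isotropically from the full surface 4 * pi * R_E^2."""
    return eps * SIGMA * t_e**4 * 4.0 * math.pi * r_e**2

T_EARTH = earth_temperature(T_SUN, R_SUN, R_ES)
# Hypothetical emissivity and earth radius; both drop out of the balance.
EPS, R_E = 0.6, 6.4e6
balanced = math.isclose(absorbed_power(EPS, R_E),
                        radiated_power(EPS, R_E, T_EARTH))
print(f"T_E = {T_EARTH:.1f} K, power balance holds: {balanced}")
```

Changing `EPS` or `R_E` leaves `T_EARTH` untouched, which is why neither appears in the final formula.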

14.4. Bohr Model

Some time after Planck’s discovery of his model of blackbody radiation (1900) and

Einstein’s explanation of the photoelectric effect (1905), Niels Bohr proposed an explanation

for atomic spectra: the so-called Bohr model of the atom (1913). I will not reproduce

the derivation of the radii, speeds and energies of the electron in its various orbitals in the

Bohr model. However, I will mention the way I remember the orbital energy and radius.

A special case of the virial theorem says that for orbital paths in the presence of a

central force, which is proportional to the inverse square of the radial distance, the average

potential energy along the orbit, ⟨V⟩, is −2 times the average kinetic energy, ⟨T⟩. Therefore,

the average total energy is ⟨E⟩ = −⟨T⟩ = ⟨V⟩/2. The convention for potential energy here

is that $V \to 0^-$ as $r \to \infty$. That is, the potential energy is negative and approaches zero from

below at large distances. This statement of the virial theorem is particularly powerful

for circular orbits because these orbits have constant kinetic and potential energies and

therefore, we can just get rid of the averages and the result still holds!

An electron orbiting a proton in hydrogen is in the presence of the Coulomb force,

which is an inverse square force and therefore satisfies the conditions of the special case of

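the virial theorem discussed above. Therefore, the energy is simply the negative of the kinetic

energy or half the potential energy:

$$E = -T = -\frac{L^2}{2I} = -\frac{L^2}{2mr^2}, \quad \text{and} \quad E = \frac{V}{2} = -\frac{e^2}{8\pi\varepsilon_0 r} = -\frac{\alpha\hbar c}{2r}. \tag{14.2}$$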

I have introduced the dimensionless fine structure constant,

$$\alpha = \frac{e^2}{4\pi\varepsilon_0\hbar c} \approx \frac{1}{137}. \tag{14.3}$$

Set the above two expressions for E equal to each other and solve for r:

$$r = \frac{L^2}{\alpha\hbar mc}. \tag{14.4}$$

Finally, we use Bohr’s postulate: angular momentum comes in integer units of ħ, that is, $L = n\hbar$:

$$r_n = \frac{n^2\hbar^2}{\alpha\hbar mc} = \frac{n^2\hbar}{\alpha mc}. \tag{14.5}$$

I actually prefer writing this as

$$r_n = \frac{n^2\hbar c}{\alpha mc^2}, \tag{14.6}$$

because I always remember that $mc^2 = 0.511$ MeV for the electron. It’s not really that

important because I never remember what ħc is anyway. For future reference, the value of

ħc is ħc = 0.197 µeV·m (that’s micro-electron volts times meters); the more familiar 1.24 µeV·m is hc.

Plugging this back into the expression $E = -\alpha\hbar c/(2r)$ gives the energies

$$E_n = -\frac{\alpha^2 mc^2}{2n^2}. \tag{14.7}$$

This is the only expression for the orbital energy I ever remember because it is nice and

succinct. I always remember that the hydrogen energy levels go like $E \sim 1/n^2$. The only

sensible unit of energy in this problem is $mc^2$, the rest mass energy of the electron. In fact,

we are assuming in our analysis above that the electron is non-relativistic. This means that

the energy levels should be very small compared to the rest mass energy of the electron.

That is, they should be measured in units of the electron rest mass energy and in those

units, they should be small. Indeed, this is the case because $\alpha^2$ is a small number.

I remember the factor of $\alpha^2$ via an argument from quantum field theory. Don’t worry,

you don’t have to understand the details of the argument or how it is derived in quantum

field theory. A qualitative picture will suffice. Worst case scenario: this can just serve as

a memory aid. The interaction between the electron and the proton, or indeed any two

charged objects, happens via the exchange of a photon. For example, the simplest such

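exchange might look like

[Feynman diagram: an incoming electron (e−) and proton (p+) exchange a single photon; each of the two vertices is labeled α.] (14.8)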

This is what is called a Feynman diagram. It is supposed to denote an electron and a

proton coming in, interacting via the exchange of a photon, and then going out. In the

diagram, time goes upwards. The diagram makes it seem as though the electron and proton

go away from each other after the interaction (i.e., repel). This is just the conventional

way this diagram is drawn; in fact, the diagram does not usually live in space anyway,

but rather in momentum space. But, if you like, there is nothing wrong with drawing the

outgoing electron and proton lines to be heading towards each other rather than away.

Each vertex denotes a local interaction between a charge and the photon and counts

as one factor of α. There are two vertices in the above diagram, and therefore the overall

interaction strength goes like $\alpha^2$. Of course, you can have ever-more complicated diagrams

with more and more photon lines. You can even have internal loops consisting of electrons

and positrons and all sorts of other particles. However, these will necessarily come in with

ever-more factors of α and since α is small, these are ever-smaller effects. The energy,

(14.7), is sometimes called the tree-level energy because it is derived from the above tree-

level Feynman diagram, which contains no loops.

Finally, there is the pesky factor of 2 in the denominator. If you remember everything

else and, in addition, remember that, for hydrogen, when you plug in n = 1, you are

supposed to get −13.6 eV, then you can’t miss the factor of 2, since otherwise you would

get −27.2 eV instead.
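The memory aids above are easy to check numerically. The Python sketch below evaluates the Bohr radius and energy formulas in eV and nm. The constants are standard rounded values (α ≈ 1/137.036, mc² = 0.511 MeV, ħc ≈ 197.327 eV·nm), and the function names are made up for this note.

```python
ALPHA = 1.0 / 137.036       # fine structure constant (rounded)
MC2_EV = 0.511e6            # electron rest energy in eV
HBAR_C_EV_NM = 197.327      # hbar * c in eV*nm (rounded)

def bohr_energy_ev(n):
    """E_n = -alpha^2 m c^2 / (2 n^2)."""
    return -ALPHA**2 * MC2_EV / (2 * n**2)

def bohr_radius_nm(n):
    """r_n = n^2 hbar c / (alpha m c^2)."""
    return n**2 * HBAR_C_EV_NM / (ALPHA * MC2_EV)

E1 = bohr_energy_ev(1)
r1 = bohr_radius_nm(1)
E1_without_factor_of_2 = 2 * E1   # what you would get if you forgot the 2
print(E1, r1, E1_without_factor_of_2)
```

The n = 1 energy lands near −13.6 eV and the n = 1 radius near 0.053 nm, while dropping the factor of 2 gives roughly −27.2 eV.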

Now, consider the following problem:

(a) The power radiated by an accelerated charge e is given in classical physics by the

formula

$$P = \frac{1}{4\pi\varepsilon_0}\,\frac{2e^2}{3c^3}\,a^2 \quad \text{(SI units)},$$

where a is the acceleration.

Using this formula, calculate the power radiated by an electron in a Bohr orbit

characterized by the quantum number n. (According to the correspondence principle,

when n is very large this should agree with a proper quantum mechanical calculation.)

(b) The decay rate for an electron in an orbit may be defined to be the power radiated,

P , divided by the energy emitted in the decay. (The decay rate is the inverse of the

lifetime). Use the Bohr theory expression for the energy radiated, and the expression

for P from part (a) to calculate the “correspondence” value of the decay rate when the

electron makes a transition from orbit n to orbit n−1. What is the value of this decay

rate when n = 2? (This will not agree exactly with the true quantum theory, since

the correspondence principle will not hold when n is not ≫ 1.) What is the decay rate

when the transition is from an orbit n to an orbit n−m?

(c) Use the value of the “lifetime” of an electron in an n = 2 Bohr orbit, calculated in part

(b), to estimate the uncertainty in the energy of the n = 2 energy level. How does it

compare with the energy of that level?

SOLUTION:

(a) The acceleration in a circular orbit is related to tangential speed and radius via

$$a = \frac{v^2}{r}.$$

The radius $r_n$ of the nth orbit is in (14.5). The speed in this orbit is given by solving

for v in the equation L = mvr and plugging in $L = n\hbar$:

$$r_n = \frac{n^2\hbar}{\alpha mc}, \qquad v_n = \frac{n\hbar}{m r_n} = \frac{\alpha c}{n}. \tag{14.9}$$

The expression for $v_n$ is particularly nice because it shows you that the electron is

pretty non-relativistic, since α is a small number, so $v \ll c$.

Therefore, the acceleration in the nth orbit is

$$a_n = \frac{v_n^2}{r_n} = \frac{\alpha^3 mc^3}{n^4\hbar}. \tag{14.10}$$

We can write the power as

$$P = \frac{2\alpha\hbar}{3}\left(\frac{a}{c}\right)^2.$$

Therefore, the power radiated by an electron in the nth Bohr orbital is

$$P_n = \frac{2\alpha\hbar}{3}\left(\frac{\alpha^3 mc^2}{n^4\hbar}\right)^2 = \frac{2\alpha^7 m^2c^4}{3n^8\hbar}. \tag{14.11}$$

If we plug in n = 2, we will get

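$$P_2 = \frac{2\left(\frac{1}{137}\right)^7 (0.511\ \mathrm{MeV})^2}{3\,(2^8)\,(6.58\times10^{-16}\ \mathrm{eV\cdot s})} = 1.14\times10^{9}\ \frac{\mathrm{eV}}{\mathrm{s}}.$$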

(b) Classically, there would be a continuum of orbital states between n = 2 and n = 1

and the electron could radiate continuously and decay continuously. Its orbit would

very quickly spiral inwards and the electron would crash into the proton. There would

be no stable atoms at all and no chemistry or life could possibly exist. Clearly, that’s

wrong! You could say that our very existence is evidence for quantum mechanics.

The model we are suggesting in this problem is that the electron sort of waits until

it would have radiated away the difference in energy between the n = 2 and n = 1

orbitals had it been radiating continuously at the rate P2, and then at that point it

radiates that whole energy difference at once. At the rate P2, the time it would take

to radiate away the energy difference between n = 2 and n = 1 is

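$$\Delta t_2 \equiv \frac{\Delta E_{2\to1}}{P_2} = \frac{\dfrac{-13.6\ \mathrm{eV}}{2^2} - \dfrac{-13.6\ \mathrm{eV}}{1^2}}{1.14\times10^{9}\ \mathrm{eV/s}} \approx 10^{-8}\ \mathrm{s}. \tag{14.12}$$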

This is the average lifetime of the n = 2 orbital. The decay rate, γ, is just the inverse

of this.

(c) The energy-time uncertainty relation is

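$$\Delta E\,\Delta t \ge \frac{\hbar}{2}.$$

If you use the smallest bound for our estimate and Δt in (14.12), you get

$$\Delta E_2 = \frac{\hbar}{2\Delta t_2} = \frac{6.58\times10^{-16}\ \mathrm{eV\cdot s}}{2\times10^{-8}\ \mathrm{s}} = 3.3\times10^{-8}\ \mathrm{eV}. \tag{14.13}$$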

Since E2 is of order eV, we can say that we know the energy of the orbital to a high

precision since the uncertainty is so small in comparison.
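The three numerical steps of this solution can be reproduced with a few lines of Python. This is a sketch using the same rounded inputs as the text (α = 1/137, mc² = 0.511 MeV, ħ = 6.58 × 10⁻¹⁶ eV·s); the function and variable names are made up for this note. Because the text rounds Δt to 10⁻⁸ s before computing the uncertainty, the unrounded value of ΔE comes out slightly larger than 3.3 × 10⁻⁸ eV.

```python
ALPHA = 1.0 / 137.0        # fine structure constant, as rounded in the text
MC2_EV = 0.511e6           # electron rest energy in eV
HBAR_EV_S = 6.58e-16       # hbar in eV*s

def radiated_power_ev_per_s(n):
    """P_n = 2 alpha^7 (m c^2)^2 / (3 n^8 hbar), Eqn. (14.11)."""
    return 2 * ALPHA**7 * MC2_EV**2 / (3 * n**8 * HBAR_EV_S)

def bohr_energy_ev(n):
    """E_n = -alpha^2 m c^2 / (2 n^2), Eqn. (14.7)."""
    return -ALPHA**2 * MC2_EV / (2 * n**2)

P2 = radiated_power_ev_per_s(2)                   # part (a)
delta_E_21 = bohr_energy_ev(2) - bohr_energy_ev(1)
lifetime_2 = delta_E_21 / P2                      # part (b), Eqn. (14.12)
decay_rate_2 = 1.0 / lifetime_2
energy_uncertainty = HBAR_EV_S / (2 * lifetime_2) # part (c), Eqn. (14.13)
print(P2, lifetime_2, decay_rate_2, energy_uncertainty)
```

The same functions can be called with other values of n to explore how quickly the classical radiation rate falls off, since P_n scales like 1/n⁸.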


14.5. Time-Evolution in 1D Infinite Square Well

First, let us prove that the wavefunctions of the one-dimensional infinite square well of

length L are orthonormal. Recall that the wavefunctions and the energies of a particle of

mass m occupying the corresponding states are labeled by a positive integer, n:

$$\psi_n(x) = \sqrt{\frac{2}{L}}\,\sin\frac{n\pi x}{L}, \qquad E_n = \frac{n^2\pi^2\hbar^2}{2mL^2}. \tag{14.14}$$

We would like to prove that

$$\int_0^L \psi_m^*(x)\,\psi_n(x)\,dx = \delta_{mn}, \tag{14.15}$$

where $\delta_{mn}$ equals 1 if m = n and zero if m ≠ n. This is called the Kronecker delta. Note

that the complex conjugation is actually immaterial in this case because the wavefunctions

happen to be real. However, this is not always the case, so it’s a good idea to keep the

complex conjugation in when you write the orthonormality condition in general. Let us

write out the left hand side:

$$\int_0^L \psi_m^*(x)\,\psi_n(x)\,dx = \frac{2}{L}\int_0^L \sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{n\pi x}{L}\right)dx = 2\int_0^1 \sin(m\pi\xi)\,\sin(n\pi\xi)\,d\xi,$$

where we changed the integration variable to ξ ≡ x/L, for convenience.

We can use the trigonometric identity

2 sinα sinβ = cos(α− β)− cos(α+ β).

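Using this identity, we can write the integral we are calculating as

$$\begin{aligned} \int_0^L \psi_m^*(x)\,\psi_n(x)\,dx &= \int_0^1 \Big(\cos[(m-n)\pi\xi] - \cos[(m+n)\pi\xi]\Big)\,d\xi \\ &= \frac{\sin[(m-n)\pi\xi]}{(m-n)\pi}\bigg|_0^1 - \frac{\sin[(m+n)\pi\xi]}{(m+n)\pi}\bigg|_0^1 \\ &= \mathrm{sinc}[(m-n)\pi] - \mathrm{sinc}[(m+n)\pi]. \end{aligned} \tag{14.16}$$

Here sinc x ≡ (sin x)/x, with sinc 0 = 1 understood as the limit x → 0.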

Since m and n are both positive integers, so is m + n. Therefore, sin[(m + n)π] = 0 and

therefore sinc[(m + n)π] = 0, since the denominator, (m + n)π ≠ 0. On the other hand,

m − n can be any integer: positive, negative, or zero. If m − n ≠ 0, then we still have

sinc[(m − n)π] = 0, but if m − n = 0, then sinc[(m − n)π] = sinc 0 = 1. This proves the

desired relation: that this integral is equal to zero except when m = n, in which case it is

equal to 1. This is precisely the orthonormality condition, Eqn. (14.15).
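If you would rather see the orthonormality condition confirmed numerically, the following Python sketch integrates the product of two basis wavefunctions with a simple midpoint rule. The well length L = 1, the number of integration steps, and the function names are arbitrary choices made for this note.

```python
import math

def psi(n, x, L=1.0):
    """Infinite square well wavefunction, Eqn. (14.14)."""
    return math.sqrt(2.0 / L) * math.sin(n * math.pi * x / L)

def overlap(m, n, L=1.0, steps=2000):
    """Midpoint-rule estimate of the integral of psi_m * psi_n over [0, L]."""
    dx = L / steps
    return sum(psi(m, (k + 0.5) * dx, L) * psi(n, (k + 0.5) * dx, L)
               for k in range(steps)) * dx

same = overlap(3, 3)        # expect something very close to 1
different = overlap(2, 5)   # expect something very close to 0
print(same, different)
```

Trying other pairs (m, n) gives the same pattern: the overlap is 1 when m = n and 0 otherwise, within rounding.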

It turns out that this orthonormality condition is all we need to prove that the wavefunctions, $\psi_n(x)$, form a complete basis. The completeness condition says that any wavefunction that satisfies Schrödinger’s equation (in this case, for the one-dimensional infinite square well potential) may be written as a superposition of the basis wavefunctions. We would like to prove this now. Suppose we have an arbitrary wavefunction, ψ(x), that satisfies the one-dimensional infinite square well potential Schrödinger equation. We would

like to write it as a superposition:

$$\psi(x) = \sum_{n=1}^{\infty} C_n\,\psi_n(x). \tag{14.17}$$

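Let us multiply both sides by $\psi_m^*(x)$ and integrate from x = 0 to x = L:

$$\int_0^L \psi_m^*(x)\,\psi(x)\,dx = \sum_{n=1}^{\infty} C_n \int_0^L \psi_m^*(x)\,\psi_n(x)\,dx = \sum_{n=1}^{\infty} C_n\,\delta_{mn} = C_m. \tag{14.18}$$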

This gives us a formula for calculating the expansion coefficients, Cm. There are some

technicalities regarding whether or not the integral expression on the LHS for Cm makes

any sense, but these are mathematical qualms and do not, to my knowledge, arise in

any meaningful physical situation. Thus, we have shown that the wavefunctions, ψn(x),

furnish a complete basis for all appropriate wavefunctions. [Note: this is very similar to

Fourier’s theorem, which claims that any “sufficiently nice” function may be written as a

superposition of sines and cosines or complex exponentials.]

Now, here comes the true utility of these basis wavefunctions. By construction, they

are what are called energy eigenstates because they have well-defined energies given in Eqn.

(14.14). This is useful because it is easy to write down the time evolution of a state that

has a well-defined energy. If ψ(x) is the wavefunction of a state that has energy E, then

the time evolution of that state is

$$\psi(x,t) = e^{-iEt/\hbar}\,\psi(x). \tag{14.19}$$

Often, one defines $\omega \equiv E/\hbar$ so that the exponential can be written $e^{-i\omega t}$.

We may apply this to the basis wavefunctions. The energies are $E_n$ given earlier.

Define $\omega_n \equiv E_n/\hbar$. Then,

$$\psi_n(x,t) = e^{-i\omega_n t}\,\psi_n(x). \tag{14.20}$$

If ψ(x) does not have a well-defined energy, then this simple relation no longer holds.

However, we can write ψ(x) as a superposition of the basis states and evolve each term:

$$\psi(x,t) = \sum_{n=1}^{\infty} C_n\,e^{-i\omega_n t}\,\psi_n(x). \tag{14.21}$$

Voila! We are able to time-evolve ψ(x) even though it does not have a well-defined energy!

Let’s work out an example. Suppose the particle is located somewhere on the left hand

half of the infinite square well, but most likely to be found in the middle of the left half.

Suppose its wavefunction is

$$\psi(x) = \begin{cases} \dfrac{2}{\sqrt{L}}\,\sin\left(\dfrac{2\pi x}{L}\right), & 0 \le x \le \dfrac{L}{2}, \\[2ex] 0, & \text{elsewhere.} \end{cases} \tag{14.22}$$

I have made sure that the integral of $|\psi(x)|^2$ is 1, which has to be the case since $|\psi(x)|^2\,dx$ is supposed to represent the probability for the particle to be located in a region of size dx

around the point x and so the integral is the probability for the particle to be anywhere,

which had better be 1. Note that this wavefunction looks very much like the n = 2 basis

wavefunction, but only on the left half of the well.

Let us use the formula for the expansion coefficient, Eqn. (14.18):

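$$\begin{aligned} C_m &= \int_0^L \psi_m^*(x)\,\psi(x)\,dx \\ &= \frac{2\sqrt{2}}{L}\int_0^{L/2} \sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{2\pi x}{L}\right)dx \\ &= \sqrt{2}\int_0^1 \sin\left(\frac{m}{2}\pi\xi\right)\sin(\pi\xi)\,d\xi \\ &= \frac{1}{\sqrt{2}}\left(\mathrm{sinc}\left[\left(\tfrac{m}{2}-1\right)\pi\right] - \mathrm{sinc}\left[\left(\tfrac{m}{2}+1\right)\pi\right]\right). \end{aligned} \tag{14.23}$$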

Note that we changed the variable of integration to ξ ≡ 2x/L. Note that for m even but

m ≠ 2, this formula gives $C_m = 0$. We also have $C_2 = 1/\sqrt{2}$. There’s no real point to

simplifying this when m is odd. We have

$$\psi(x) = \sum_{n=1}^{\infty} C_n\,\psi_n(x). \tag{14.24}$$

Below is a diagram of the wavefunction and its expansion. The blue is ψ(x) and the purple

is the result of adding the first 10 terms in the expansion. Of course, the fit is not perfect

because the sum must go to ∞, but it’s not bad for just the first 10 terms.

Now, we can evolve this state through time:

$$\psi(x,t) = \sum_{n=1}^{\infty} C_n\,e^{-i\omega_n t}\,\psi_n(x). \tag{14.25}$$

We can take the complex square of this (i.e. multiply it with its complex conjugate) and

the result is supposed to be the probability density, P (x, t). Where P (x, t) is big is where

the particle is likely to be found if a measurement of its position is to be made. Below are

snapshots of P (x, t) at various moments in time. Notice that the particle tends to swish


back and forth from left to right and back again. We have shown half a period, where the

particle starts from being just on the left half to being just on the right half. This takes

t = π/ω1 worth of time.
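The swishing can be reproduced with a short numerical experiment. The Python sketch below builds the truncated sum in Eqn. (14.25) from the coefficients in Eqn. (14.23) and integrates the probability density over the left half of the well. It works with the dimensionless time τ ≡ ω₁t, so that ωₙt = n²τ. The well length L = 1, the cutoff of 200 terms, the integration grid, and all function names are assumptions made for this note.

```python
import cmath
import math

def sinc(x):
    """sin(x)/x with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def coefficient(m):
    """Expansion coefficient C_m from Eqn. (14.23)."""
    return (sinc((m / 2 - 1) * math.pi)
            - sinc((m / 2 + 1) * math.pi)) / math.sqrt(2)

def psi_t(x, tau, n_terms=200, L=1.0):
    """Truncated version of Eqn. (14.25), with tau = omega_1 * t."""
    total = 0j
    for n in range(1, n_terms + 1):
        basis = math.sqrt(2.0 / L) * math.sin(n * math.pi * x / L)
        total += coefficient(n) * cmath.exp(-1j * n * n * tau) * basis
    return total

def probability_left_half(tau, steps=400, L=1.0):
    """Midpoint-rule integral of |psi(x, t)|^2 over 0 <= x <= L/2."""
    dx = (L / 2) / steps
    return sum(abs(psi_t((k + 0.5) * dx, tau, L=L)) ** 2
               for k in range(steps)) * dx

p_start = probability_left_half(0.0)        # particle starts on the left
p_half = probability_left_half(math.pi)     # half a period later
print(p_start, p_half)
```

At τ = 0 essentially all of the probability sits in the left half, and at τ = π (that is, t = π/ω₁) essentially none of it does, which is the half period described above.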

By the way, this state has no well-defined energy. However, it does have an average energy.

The interpretation is that if one were to prepare a very very large number of identical

systems all in this initial state and then one were to take a measurement of the energy

for all the identical setups, one would get different measurements for each setup, but the

average energy is well-defined. This average energy is just the sum of the products of the

probability for the particle to be in the wavefunction ψn and the energy of that state, En:

$$\langle E\rangle_\psi = \sum_{n=1}^{\infty} |C_n|^2 E_n. \tag{14.26}$$

Since Cn is the coefficient of the wavefunction, ψn, in ψ, its complex square is the probability

for the particle to be in the wavefunction ψn. Note that this is an average energy as

described earlier. It is not the energy of the state. The state does not have a well-defined

energy. This is in contrast to the energy eigenstate with wavefunction ψn(x). If we prepared

a large number of identical systems all in the initial wavefunction ψn(x) and we measured

the energy of each system separately, we would always measure En.

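It might be tempting to define $\langle\omega\rangle_\psi \equiv \langle E\rangle_\psi/\hbar$ and then say that the time evolution of

ψ(x) is simply

$$\psi(x,t) = e^{-i\langle\omega\rangle_\psi t}\,\psi(x) \quad \text{(INCORRECT!)}. \tag{14.27}$$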

However, this is incorrect because we cannot interpret 〈E〉ψ as the energy of the wavefunc-

tion ψ(x). This wavefunction does not have a well-defined energy. The only way we can

time-evolve the state is to write it as a superposition of the basis of energy eigenstates and

then time-evolve each piece separately, as in Eqn. (14.25).


15. Final Review

15.1. Human Eye Optics

Let’s consider the optics of human eyes. A human eye can be simplified as one convex lens

projecting images on to a screen (the retina). The focal length of the human eye lens is

variable. Let’s assume that the distance between the retina and the lens of the eye is 25 mm, and

the diameter of the pupil (which is the effective diameter of the lens of an eye) is 3 mm.

(a) If you are reading a book that is 300 mm away from your eyes and an arrow of 1 cm

size on the book forms a clear image on your retina, what is the focal length of the

lens of your eye? What is the actual image size of the arrow on your retina?

(b) What is the smallest object you can identify on the book based on the diffraction limit

of the eye? Assume the illumination light wavelength to be 600 nm.

(c) In order to achieve the diffraction limited resolution in part (b), how small must the “pixel”

on your retina be?

SOLUTION:

(a) The lens equation reads $\frac{1}{f} = \frac{1}{s_o} + \frac{1}{s_i}$, and so

$$f = \frac{s_o s_i}{s_o + s_i} = \frac{(300\ \mathrm{mm})(25\ \mathrm{mm})}{(300+25)\ \mathrm{mm}} = 23.1\ \mathrm{mm}.$$

The transverse magnification is

$$M_T = -\frac{s_i}{s_o} = -\frac{25\ \mathrm{mm}}{300\ \mathrm{mm}} = -\frac{1}{12} = -0.0833.$$

The negative sign means the image is up-side-down. The size of the image on the retina

is

$$|y_i| = |M_T|\,y_o = \frac{1}{12}\cdot 1\ \mathrm{cm} = \frac{1}{12}\ \mathrm{cm} = 0.833\ \mathrm{mm}.$$

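(b) The angular resolution is set by diffraction at the pupil. Taking the full angular width of the central Airy disk, which is twice the Rayleigh angle $1.22\lambda/d$:

$$\Delta\theta = 2.44\,\frac{\lambda}{d},$$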

where d is the diameter of the pupil.

This converts to a spatial resolution, ∆x, on the book by use of the small angle

approximation: ∆x = L∆θ, where L = 300 mm is the distance from the eye to the

book (also the object distance). Thus,

$$\Delta x = 2.44\,\frac{\lambda L}{d} = 2.44\,\frac{(6\times10^{-4}\ \mathrm{mm})(300\ \mathrm{mm})}{3\ \mathrm{mm}} = 0.15\ \mathrm{mm}.$$

(c) According to part (a), the image size for an object the size in part (b) is

$$|y_i| = |M_T|\,y_o = \frac{1}{12}\cdot 0.15\ \mathrm{mm} = 12\ \mathrm{\mu m}.$$

That’s a very small pixel! However, if one rod is to serve as one pixel, measurements

done on some animals show that rods are on the order of microns in size.
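All three parts of this problem are simple enough to bundle into one short Python sketch. It uses the numbers from the problem statement and the same factor of 2.44 as the solution; the function and variable names are made up for this note.

```python
S_O_MM = 300.0          # object distance (book to eye), mm
S_I_MM = 25.0           # image distance (lens to retina), mm
PUPIL_MM = 3.0          # pupil diameter, mm
WAVELENGTH_MM = 600e-6  # 600 nm expressed in mm
ARROW_MM = 10.0         # 1 cm arrow on the page, mm

def thin_lens_focal_length(s_o, s_i):
    """Solve 1/f = 1/s_o + 1/s_i for f."""
    return s_o * s_i / (s_o + s_i)

def transverse_magnification(s_o, s_i):
    """M_T = -s_i / s_o; the sign records that the image is inverted."""
    return -s_i / s_o

focal_mm = thin_lens_focal_length(S_O_MM, S_I_MM)                 # part (a)
m_t = transverse_magnification(S_O_MM, S_I_MM)
arrow_image_mm = abs(m_t) * ARROW_MM
resolution_mm = 2.44 * WAVELENGTH_MM * S_O_MM / PUPIL_MM          # part (b)
pixel_um = abs(m_t) * resolution_mm * 1000.0                      # part (c)
print(focal_mm, arrow_image_mm, resolution_mm, pixel_um)
```

Changing `S_O_MM` shows how the required focal length shifts as the eye accommodates to nearer or farther objects, while the image distance stays fixed.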

15.2. Optical Fiber

An optical fiber can be considered as a glass waveguide guiding light through total internal

reflection. Light can be coupled in from the end surface of the fiber. What is the range of

angle θ of the input light so that it can be guided in the fiber? ($n_{\mathrm{glass}} = 1.4$).

SOLUTION:

Consider the following diagram:

In order to get total internal reflection at θ2, we must have

$$\sin\theta_2 \ge \frac{1}{n},$$

where n = 1.4 is the index of refraction of the glass. We have taken this optical fiber to be

surrounded by air with index of refraction ≈ 1. Then,

$$\cos\theta_2 = \sqrt{1-\sin^2\theta_2} \le \sqrt{1-\frac{1}{n^2}} = \frac{1}{n}\sqrt{n^2-1}.$$

Since $\theta_1 = \frac{\pi}{2} - \theta_2$, we have

$$\sin\theta_1 = \cos\theta_2 \le \frac{1}{n}\sqrt{n^2-1}.$$

Using Snell’s law, we get

$$\sin\theta = n\sin\theta_1 \le \sqrt{n^2-1} = 0.98 \implies 0 \le \theta \le 78^\circ.$$
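The acceptance angle can be double-checked by tracing a ray explicitly. The Python sketch below assumes, as in the solution, that the fiber is surrounded by air with index 1, that θ1 is the refracted angle inside the glass, and that θ2 = 90° − θ1 is the angle of incidence on the side wall. The function names are made up for this note.

```python
import math

N_GLASS = 1.4

def max_acceptance_angle_deg(n):
    """Largest input angle with sin(theta) <= sqrt(n^2 - 1), capped at 90 degrees."""
    return math.degrees(math.asin(min(1.0, math.sqrt(n**2 - 1.0))))

def is_guided(theta_deg, n):
    """Trace one ray: refract at the end face, then test the side wall for TIR."""
    theta_1 = math.asin(math.sin(math.radians(theta_deg)) / n)  # Snell's law
    theta_2 = math.pi / 2 - theta_1       # incidence angle on the side wall
    return math.sin(theta_2) >= 1.0 / n   # total internal reflection condition

theta_max = max_acceptance_angle_deg(N_GLASS)
print(theta_max, is_guided(78.0, N_GLASS), is_guided(80.0, N_GLASS))
```

A ray entering at 78° is still guided while one at 80° leaks out through the side wall, consistent with the bound found above.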


15.3. Modified Michelson Interferometer

The Michelson interferometer in the diagram has a birefringent plate in one arm. The

birefringent plate has a thickness of 20 µm and refractive index difference between the high

and low refractive index directions is 0.01.

A beam of y-polarized light with wavelength 800 nm is incident on the Michelson

interferometer. Initially, the high-refractive-index direction of the birefringent plate is

along the y-direction and an interference pattern shown in the diagram is generated at the

screen. Points A, B and C denote the interference first maximum, first minimum, and

second maximum in the pattern, respectively.

Describe the interference pattern under the following conditions. Give your reasoning.

(a) The birefringent plate is rotated by 45◦ from its original position.

(b) The birefringent plate is rotated by 90◦ from its original position.

Now the wavelength of the light is changed to 400 nm. Again, in terms of I0, what will

be the light intensity at positions A, B and C under the following conditions? Give your

reasoning.

(c) The birefringent plate is returned to its original position (high refractive index direction

along the y-direction).

(d) The birefringent plate is rotated by 45◦ from its original position.

(e) The birefringent plate is rotated by 90◦ from its original position.

[Hint: First determine what kind of waveplate the birefringent material is for 800 nm and

400 nm light, respectively.]

SOLUTION:

(a) Let us assume, for simplicity, that the bright bands near the center have roughly the

same intensity, so that the intensity at C in the diagram in the problem is also I0.

As hinted in the problem, we should first work out what type of wave-plate the

birefringent plate is for 800 nm and for 400 nm. The number of wavelengths that fit

in a plate of index of refraction n and thickness t is

\[ N = \frac{t}{\lambda/n} = \frac{tn}{\lambda}. \]

Therefore, the difference in the number of wavelengths that fit in the plate for the fast

and slow directions of the waveplate is

\[ \Delta N = \frac{t\,\Delta n}{\lambda} = \frac{(2\times 10^{-5}\,\text{m})(10^{-2})}{\lambda} = \frac{2}{\lambda/(100\,\text{nm})}. \]

For λ = 800 nm, we have ∆N = 1/4 and so the waveplate is a quarter-waveplate (qwp)

and for λ = 400 nm, we have ∆N = 1/2 and so the waveplate is a half-waveplate (hwp).

After the qwp, the light is circularly polarized. Reflection off of the mirror does not

change the rotation of the circular polarization, but it does reverse the direction of

propagation of the light. Thus, left circular polarization (lcp) turns into right circular

polarization (rcp) and vice-versa. Thus, when the light passes through the qwp again,

it becomes linearly polarized light, but polarized in the x-direction rather than the

y-direction. Upon reflection off of the center half-silvered mirror, this x-polarization

becomes z-polarization. Meanwhile, the light from arm 1 remains y-polarized. Hence,

the two light beams are polarized in different directions and cannot interfere. We do

not observe rings, just one big bright spot.

(b) The beam in arm two will be phase shifted less than before by a quarter of a wavelength

for each time it passes through the qwp. That is a total of half a wavelength. Therefore,

where the interference used to be constructive, it will now be destructive and vice-versa.

A and C will now be dark and B will be bright.

(c) Passing twice through a qwp produces the image in the problem. Passing twice
through a hwp produces a phase shift that is twice as large as with the qwp. The
bright fringes in the pattern when the plate is a qwp occur when the relative phase
shift between the two arms is 2πn for an integer n, and the dark fringes occur when it
is 2π(n + 1/2). If we multiply either one of these by 2, we will always get an integer
multiple of 2π. Thus, the bright fringes in the pattern when the plate is a hwp occur
at both the bright and the dark fringes in the pattern when the plate is a qwp. Of
course, there will be dark fringes in between. In other words, the pattern when the
plate is a hwp is twice as tight as when the plate is a qwp. A, B and C will all be
bright with one dark fringe in between A and B and in between B and C.

(d) Any component of the light perpendicular to the high-index-of-refraction axis of the

hwp will be phase shifted less relative to the component parallel to this axis by π

each time it passes through the hwp, which adds up to 2π (going twice through the

hwp). That’s the same as no phase shift at all. Therefore, there will be absolutely no

difference if we rotate the hwp by any angle: same as (c).

(e) Same as (c) for the reason stated in (d).
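The waveplate classification used throughout this solution comes down to one line of arithmetic, ∆N = t∆n/λ. Here is a small Python sketch of it; the helper name `waves_of_retardation` is just my label, and the numbers are the ones given in the problem.

```python
def waves_of_retardation(thickness_m, delta_n, wavelength_m):
    """Difference in the number of wavelengths that fit in the plate
    along the slow and fast axes: Delta N = t * Delta n / lambda."""
    return thickness_m * delta_n / wavelength_m

t = 20e-6       # plate thickness, 20 micrometres
dn = 0.01       # index difference between the two axes

for lam in (800e-9, 400e-9):
    single_pass = waves_of_retardation(t, dn, lam)
    # The beam crosses the plate twice (to the mirror and back).
    round_trip = 2 * single_pass
    print(f"lambda = {lam * 1e9:.0f} nm: {single_pass:.2f} waves per pass, "
          f"{round_trip:.2f} waves round trip")
```

A quarter wave per pass gives half a wave per round trip (bright and dark swap, as in part (b)), while half a wave per pass gives a full wave per round trip (no net relative shift, as in part (d)).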

15.4. Diffraction Grating

Diffraction gratings can separate different wavelengths into different directions. They can
be understood as multiple-slit structures. Consider a grating with 600 lines (i.e. slits)
per mm. Now we shine a red beam with wavelength 632 nm at normal incidence onto the
grating.

(a) How many strong outgoing beams will be observed? (Hint: the largest diffraction angle
will be 90◦ in this case). What are their respective outgoing angles?

(b) The outgoing beam with the smallest non-zero angle is called the first order diffraction

beam. Now, we have two incident light beams with wavelengths at 632.00 nm and

632.01 nm, respectively. What is the angular separation between their first order

diffraction beams?

SOLUTION:

(a) The condition for strong maxima for the diffraction grating is the same as that for the

double-slit: d sin θm = mλ, where d is the separation distance between the centers of

successive slits. Solve for m:

\[ m = \frac{d\sin\theta}{\lambda} \le \frac{d}{\lambda} = \frac{(1/600)\,\text{mm}}{632\,\text{nm}} = 2.64. \]

Note that we used the fact that, no matter what θ is, sin θ is always ≤ 1. Since m is
an integer, we conclude that the largest value for m is

\[ m_\text{max} = 2 \implies \text{There will be 5 strong outgoing beams.} \]

These five correspond to m = 0, m = ±1 and m = ±2. The angles are $\theta_m = \sin^{-1}(m\lambda/d)$:

\[ \theta_0 = 0^\circ, \quad \theta_{\pm 1} = \pm 22.3^\circ, \quad \theta_{\pm 2} = \pm 49.3^\circ. \]

(b) Since the differences are going to be very small, we would need quite a few significant
figures to calculate this on a calculator. Instead, we will write $\theta_1^{(1)}$ for the angle of the
first order diffraction beam for wavelength $\lambda^{(1)} = 632.00$ nm and $\theta_1^{(2)}$ for $\lambda^{(2)} = 632.01$
nm. Define $\Delta\lambda \equiv \lambda^{(2)} - \lambda^{(1)} = 0.01$ nm = 10 pm (picometers). Define $\Delta\theta_1 \equiv \theta_1^{(2)} - \theta_1^{(1)}$,
which we know is very small. Thus, we may Taylor expand $\sin\theta_1^{(2)}$ around $\theta_1^{(1)}$:

\[ \sin\theta_1^{(2)} \approx \sin\theta_1^{(1)} + (\Delta\theta_1)\cos\theta_1^{(1)}. \]

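By our formula from part (a),

\[ \sin\theta_1^{(1)} = \frac{\lambda^{(1)}}{d}, \qquad \sin\theta_1^{(2)} = \frac{\lambda^{(2)}}{d}. \]

The second equation is expanded as

\[ \sin\theta_1^{(1)} + (\Delta\theta_1)\cos\theta_1^{(1)} = \frac{\lambda^{(1)}}{d} + \frac{\Delta\lambda}{d}. \]

Using the previous equation for $\theta_1^{(1)}$, we get

\[ (\Delta\theta_1)\cos\theta_1^{(1)} = \frac{\Delta\lambda}{d}. \]

Thus,

\[ \Delta\theta_1 = \frac{\Delta\lambda}{d\cos\theta_1^{(1)}} = \frac{10^{-11}\,\text{m}}{\left((1/600)\times 10^{-3}\,\text{m}\right)\cos 22.3^\circ} = 6.5\times 10^{-6}\,\text{rad} = (3.7\times 10^{-4})^\circ. \]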

Your calculator could probably have given you the angles to within more than four
decimal places, in which case you could have just calculated $\theta_1^{(2)}$ as you did $\theta_1^{(1)}$ and
taken the difference. If you do the calculation the way I did above, you need to
remember that the $\Delta\theta_1$ you get just from plugging in the numbers is going to be in
radians, not in degrees. I had to convert to degrees at the end.
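Both parts of this problem are easy to check numerically. The Python sketch below (function names are mine, chosen for illustration) lists the allowed orders and their angles, then compares the Taylor-expansion estimate of ∆θ1 with the brute-force difference of two arcsines.

```python
import math

d = 1e-3 / 600        # slit spacing in metres (600 lines per mm)
lam = 632e-9          # wavelength in metres

def order_angles_deg(d, lam):
    """Map each allowed order m to its diffraction angle in degrees."""
    m_max = math.floor(d / lam)           # since sin(theta) <= 1
    return {m: math.degrees(math.asin(m * lam / d))
            for m in range(-m_max, m_max + 1)}

def first_order_separation_rad(d, lam, dlam):
    """Taylor estimate: Delta theta = Delta lambda / (d cos theta_1), in radians."""
    theta1 = math.asin(lam / d)
    return dlam / (d * math.cos(theta1))

angles = order_angles_deg(d, lam)
for m, theta in sorted(angles.items()):
    print(f"m = {m:+d}: theta = {theta:+.2f} degrees")

dtheta = first_order_separation_rad(d, 632.00e-9, 0.01e-9)
exact = math.asin(632.01e-9 / d) - math.asin(632.00e-9 / d)
print(f"Taylor estimate: {dtheta:.3e} rad = {math.degrees(dtheta):.3e} degrees")
print(f"direct difference: {exact:.3e} rad")
```

In double precision the direct difference is perfectly usable; the Taylor expansion matters mostly when you are limited to a handful of calculator digits.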

15.5. Optical Spectroscopy

(30 points) Optical spectroscopy is widely used to determine the properties of materials.

The figure below is a reflection spectrum from a thin transparent film. It displays the

reflectivity of the thin film as a function of light frequency for normal incident light. Based

on this spectrum, determine the thickness and refractive index of the thin film.

SOLUTION:

The thin film is surrounded by air. Let ray 1 be the ray that reflects off of the front
(air-to-film) interface and ray 2 the one that reflects off of the back (film-to-air) interface.
Then,

\[ \varphi_{\text{ref},1} = \pi, \quad \varphi_{\text{ref},2} = 0, \quad \Delta\varphi_\text{ref} = -\pi. \]

The difference in path between 1 and 2 is that in addition to the path of 1, ray 2 goes

through the thickness, t, of the film twice. Let n be the index of refraction of the film.

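Setting $\varphi_{\text{path},1} = 0$, we have

\[ \Delta\varphi_\text{path} = \frac{2\pi}{\lambda/n}\cdot 2t \implies \Delta\varphi_\text{tot} = \left(\frac{4nt}{\lambda} - 1\right)\pi. \]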

Let us write this in terms of frequency instead of wavelength. We write $\frac{1}{\lambda} = \frac{\nu}{c}$. Thus,

\[ \Delta\varphi_\text{tot} = \left(\frac{4nt\nu}{c} - 1\right)\pi. \]

For destructive interference, we set this equal to (2m− 1)π, for some integer m. Thus,

\[ \frac{2nt\nu}{c} = m \implies \frac{2nt\,\Delta\nu}{c} = \Delta m = 1, \]

where $\Delta\nu = 12.5\times 10^{12}$ Hz is the frequency separation between adjacent minima, and
∆m = 1 since m increases in unit steps (it is always an integer). Thus,

\[ nt = \frac{c}{2\,\Delta\nu} = 1.2\times 10^{-5}\,\text{m}. \tag{15.1} \]

We will work out the value of n by calculating the maximum reflectivity. We can think
of rays 1 and 2 as being two separate light sources, each of intensity $RI_0$, where $I_0$ is the
intensity of the incident light. Technically, the second light ray has intensity $TRTI_0 = (1-R)^2RI_0$.
However, assuming $RI_0$ instead, we will find R ≈ 0.1, which we will assume
is sufficiently small to justify the approximation. This is just a simplifying assumption;
you could very well do the more exact calculation if you wished. The rays are polarized
the same way, so we can write the total measured intensity from the superposition of the
two rays as

\[ I = I_1 + I_2 + 2\sqrt{I_1 I_2}\cos\Delta\varphi \approx 2RI_0(1 + \cos\Delta\varphi), \]

where we calculated ∆ϕ earlier. Let us derive this result about adding intensities. Recall
that the intensity is proportional to the (complex) square of the total electric field: $I = \frac{\varepsilon_0 c}{2}|E|^2$.
The proportionality constant will not really matter here, but there it is anyway.

The reflectance, r, tells you how much of the electric field gets reflected. Therefore, the

amplitude of the electric field in ray 1 is rE0 and in ray 2 is approximately also rE0, where

E0 is the incident electric field. However, the rays have different phases relative to each

other. The phase of ray 2 relative to ray 1 is ∆ϕ. Therefore, if the electric field in ray 1

is represented by $rE_0$, then the electric field in ray 2 is $rE_0e^{i\Delta\varphi}$. The total electric field in
the sum of the two is $rE_0\left(1 + e^{i\Delta\varphi}\right)$. The total intensity is proportional to the square of
this total electric field:

\[ I = \frac{\varepsilon_0 c}{2}|r|^2|E_0|^2\left(1 + e^{i\Delta\varphi}\right)\left(1 + e^{-i\Delta\varphi}\right) = 2RI_0\left(1 + \cos\Delta\varphi\right), \]

where the reflection coefficient, R, is the square of the reflectance, $R = |r|^2$, and where we
have denoted $\frac{\varepsilon_0 c}{2}|E_0|^2$ as $I_0$, the incident intensity.

At total constructive interference, ∆ϕ = 2mπ, for some integer m, and so $I = 4RI_0$.
Therefore, the reflectivity of the film at constructive interference is 4R, that is four times
the reflectivity of just one of the air-film interfaces. According to the graph, this reflectivity
is equal to 0.4. Therefore, R = 0.1. But, R is just the square of the reflectance, whose
formula we are given, and which simplifies at normal incidence to $r = \frac{1-n}{1+n}$:

\[ \left(\frac{n-1}{n+1}\right)^2 = r^2 = R = 0.1 \implies n = \frac{1+\sqrt{0.1}}{1-\sqrt{0.1}} = 1.925. \]

Plugging this n into Eqn. (15.1) gives the thickness:

t = 6.23 µm .
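The whole extraction of n and t from the spectrum can be sketched in a few lines of Python. The two inputs (fringe spacing 12.5 THz and peak reflectivity 0.4) are the values read off the graph in this solution; the variable names are mine, and c is rounded to 3 × 10⁸ m/s.

```python
import math

c = 3.0e8             # speed of light in m/s (rounded)
delta_nu = 12.5e12    # spacing between adjacent reflectivity minima, in Hz
R_peak = 0.4          # peak reflectivity of the film, read off the spectrum

# Optical thickness from the fringe spacing, Eqn. (15.1): n t = c / (2 Delta nu)
nt = c / (2.0 * delta_nu)

# Two equal-amplitude rays in phase give 4R, so a single interface has R = R_peak / 4
R_single = R_peak / 4.0

# Invert ((n - 1) / (n + 1))**2 = R for n > 1
root = math.sqrt(R_single)
n = (1.0 + root) / (1.0 - root)

t = nt / n
print(f"n t = {nt:.2e} m, n = {n:.3f}, t = {t * 1e6:.2f} micrometres")
```

This uses the same equal-amplitude approximation as the text; replacing the second ray's intensity by $(1-R)^2RI_0$ would shift n slightly.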

15.6. Relativity and Current-Carrying Wires

Recall from E&M that an infinite straight wire containing a linear charge density, λ,
generates an electric field whose magnitude, as a function of the radial distance, r, from the
wire, is $E = \frac{\lambda}{2\pi\varepsilon_0 r}$.

Recall, as well, that an infinite straight wire carrying a current, I, generates a magnetic
field whose magnitude is $B = \frac{\mu_0 I}{2\pi r}$. The direction of the magnetic field rotates around the

wire in a right-handed fashion (if your right thumb points in the direction of the current,

then your right fingers wrap around the wire in the direction of the magnetic field).

If a positive charge, q, is moving with speed, v, parallel to a current-carrying wire a

distance r away, then the charge experiences a magnetic force with magnitude $F_m = qvB = qv\,\frac{\mu_0 I}{2\pi r}$.
The force is attractive if the point charge moves parallel to the current and repulsive

if it moves anti-parallel (opposite). In either case, the point charge starts accelerating in

the radial direction.

So far, we have been discussing the picture in the “lab frame”. What if we were to
consider the picture from the frame that is moving relative to the lab frame along with
the point charge, so that the point charge looks to be at rest in the horizontal direction
in this frame? Call this frame, S′, the horizontal rest frame of the point charge. In this

frame, the charge is not moving initially. Therefore, it cannot experience a magnetic force.

Yet, according to our analysis in the lab frame, it has to start accelerating in the radial

direction. Let’s see how.

(a) The wire is made up of an immobile lattice of heavy positive ions and a sea of free

mobile electrons. Suppose that each atom gives up one electron so that each ion has

charge +e. The wire is neutral in the lab frame and so the linear charge densities of

the positive ions and the electrons are λ and −λ, respectively. On average, how far

apart along the wire are adjacent positive ions or adjacent electrons as measured in

the lab frame?

(b) Let v be the speed of the electrons along the wire. Calculate the current.

(c) The external point charge is at a radial distance, r, from the wire. For simplicity,

suppose that it is moving with the same speed, v, and direction as the mobile electrons

in the wire. Remember that the direction of current is opposite to the direction of motion
of the electrons. Therefore, this is the case when the external point charge moves

opposite to the current and thus experiences a repulsive magnetic force. Calculate this

magnetic force in the lab frame.

(d) In the frame S′, the external point charge and the electrons in the wire are at rest

(since they all have the same velocity). The positive ions are now moving backwards

with speed v. In S′, on average, how far apart along the wire are adjacent positive

ions? What about adjacent electrons? What is the net charge density measured in this

frame? [Note: Charge is a relativistic invariant.]

(e) Use your result from part (d) to calculate the force experienced by the external point

charge in the frame S′. Show that it also points radially away (i.e. is repulsive), but

is now purely electric rather than purely magnetic. You should find that the force in

S′ is bigger than in the lab frame. Can you think of a reason why this should be the

case? [Hint: Think time dilation. Note: You just need to argue why the force in S′

should be bigger than in S; you need not explain the factor.]

SOLUTION:

(a) The distance between adjacent positive ions is d+ = e/λ+ = e/λ . This is also the

distance between adjacent electrons: d− = −e/λ− = e/λ , since λ− = −λ.

(b) The linear density of the electrons is n = 1/d− = λ/e since there is one electron per

d− length of wire. The current is I = n(−e)v = −λv . The minus sign just reminds

us that the current is in the opposite direction to the motion of the electrons.

(c) $F_m = qv\,\frac{\mu_0 I}{2\pi r} = \frac{\mu_0 q\lambda v^2}{2\pi r} = \frac{\beta^2 q\lambda}{2\pi\varepsilon_0 r}$, where β ≡ v/c and we used the fact that $\mu_0 c^2 = 1/\varepsilon_0$.

(d) Since the positive ions are at rest in the lab frame, $d_+$ is their proper separation. Their
separation measured in S′ would be contracted by a factor of $\gamma = \left[1 - (v/c)^2\right]^{-1/2}$.
Thus, $d'_+ = d_+/\gamma = e/\gamma\lambda$. The situation for the electrons is exactly the opposite.
They are at rest in S′, so their separation in S′ is their proper separation. Their
separation in the lab frame, which is $d_- = e/\lambda$, is contracted relative to their proper
separation. Thus, the separation of the electrons in S′ is bigger than in the lab
frame: $d'_- = \gamma d_- = \gamma e/\lambda$. The wire is no longer neutral when viewed in S′: the
positive linear charge density is $\lambda'_+ = e/d'_+ = \gamma\lambda$, while the negative charge density
is $\lambda'_- = -e/d'_- = -\lambda/\gamma$. These do not cancel anymore: the net charge density is

\[ \lambda'_\text{net} = \lambda'_+ + \lambda'_- = \left(\gamma - \frac{1}{\gamma}\right)\lambda = \gamma\beta^2\lambda. \]

(e) Even though the external charge does not feel any magnetic force in the frame S′,
the wire is now positively charged with charge density $\gamma\beta^2\lambda$ in this frame. Therefore,
it produces an electric field with magnitude $\frac{\gamma\beta^2\lambda}{2\pi\varepsilon_0 r}$ radially outwards. This exerts a
repulsive force on the external point charge equal to the charge times the electric field:
$F'_e = \frac{\gamma\beta^2 q\lambda}{2\pi\varepsilon_0 r}$. Note that the force is repulsive both in the lab frame and in the frame S′,
but the force in S′ is bigger by a factor of γ. This makes sense because S′ measures the
initial proper time for the external charge. The time in the lab frame will be dilated
relative to this by a factor of γ. In S′, the force measured on the charge is greater than
in S, but the time for acceleration is also shorter.
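The claim that the S′ force exceeds the lab-frame force by exactly γ can be checked numerically. In the Python sketch below the charge, line density, distance and speed are arbitrary illustrative values I made up, and µ0 is computed from ε0 and c so that µ0c² = 1/ε0 holds exactly.

```python
import math

EPS0 = 8.8541878128e-12             # vacuum permittivity, F/m
C_LIGHT = 299792458.0               # speed of light, m/s
MU0 = 1.0 / (EPS0 * C_LIGHT**2)     # so that mu0 * c^2 = 1 / eps0 exactly

def lab_magnetic_force(q, lam, v, r):
    """Magnitude of the lab-frame magnetic force, with |I| = lam * v."""
    return MU0 * q * lam * v**2 / (2.0 * math.pi * r)

def rest_frame_electric_force(q, lam, v, r):
    """Magnitude of the electric force in S', from the net density (gamma - 1/gamma) * lam."""
    beta = v / C_LIGHT
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    lam_net = (gamma - 1.0 / gamma) * lam
    return q * lam_net / (2.0 * math.pi * EPS0 * r)

# Illustrative (made-up) values: the ratio does not depend on q, lam or r.
q, lam, r = 1.6e-19, 1.0e-9, 0.01
v = 0.6 * C_LIGHT                   # gamma = 1.25 at this speed

F_lab = lab_magnetic_force(q, lam, v, r)
F_rest = rest_frame_electric_force(q, lam, v, r)
ratio = F_rest / F_lab
print(f"F_lab = {F_lab:.3e} N, F_rest = {F_rest:.3e} N, ratio = {ratio:.4f}")
```

For realistic drift speeds (β tiny) the difference γ − 1/γ loses precision in floating point, so the equivalent form γβ²λ would be the better choice there.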

15.7. Pi Decay

The neutral π meson has a rest mass of 135 MeV/$c^2$ and a half-life of $8.2\times 10^{-17}$ s. In
one experiment, high energy π mesons are generated. Then each meson decays into two
photons: π → γ + γ. Consider the following questions in the lab frame.

(a) After traveling $10^{-6}$ m, only one percent of the π mesons are left. Calculate the
velocity, kinetic energy and momentum of the generated π mesons.

(b) If the two γ photons are produced in the forward and backward directions, respectively,

what are the energies of the two photons?

SOLUTION:

(a) Let $\tau' = 8.2\times 10^{-17}$ s be the proper half-life of the mesons (measured in their own
rest frame). Let v be their speed measured in the lab frame, with corresponding
β and γ factors. Then, the apparent half-life measured in the lab frame is dilated
relative to the meson rest frame: $\tau = \gamma\tau'$. Suppose there are $N_0$ mesons at the start.
Then, the number of un-decayed mesons remaining after time t in the lab frame is
$N = 2^{-t/\tau}N_0$. In the same time, the mesons will have traveled a distance $d = \beta ct$
and so we can write N as a function of d instead of t by solving for t in terms of d
as $t = d/\beta c$; that is, $N = 2^{-d/\beta c\tau}N_0$. Let α = 1/100 be the fraction of remaining
mesons after the mesons travel a distance $d = 10^{-6}$ m measured in the lab frame.
Then, $\alpha = 2^{-d/\beta c\tau} = 2^{-d/\beta\gamma c\tau'} = e^{-d\ln 2/\beta\gamma c\tau'}$. Solving for βγ gives

\[ \frac{v/c}{\sqrt{1 - (v/c)^2}} = \beta\gamma = -\frac{d\ln 2}{c\tau'\ln\alpha} = 6.12. \]

Do not be thrown off by the minus sign; ln α is negative since α is a number less than
1. Now, solving for v gives

\[ v = \frac{-\frac{d\ln 2}{c\tau'\ln\alpha}}{\sqrt{1 + \left(\frac{d\ln 2}{c\tau'\ln\alpha}\right)^2}}\,c = 0.987\,c = 2.96\times 10^8\ \text{m/s}. \]

The momentum is

\[ p = \beta\gamma m_\pi c = -\frac{d\ln 2}{c\tau'\ln\alpha}\,m_\pi c = 826\ \text{MeV}/c. \]

The kinetic energy is

\[ T = \sqrt{(pc)^2 + (m_\pi c^2)^2} - m_\pi c^2 = 702\ \text{MeV}. \]

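(b) Method 1: Let $E_1$ and $E_2$ be the energies of the forward- and backward-moving
photons, respectively. The energy and momentum of the pion and the two photons are

\[ \begin{pmatrix} E_\pi/c \\ p_\pi \end{pmatrix} = \begin{pmatrix} \gamma m_\pi c \\ \beta\gamma m_\pi c \end{pmatrix}, \quad \begin{pmatrix} E_1/c \\ p_1 \end{pmatrix} = \frac{E_1}{c}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} E_2/c \\ p_2 \end{pmatrix} = \frac{E_2}{c}\begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]

Conservation of energy and momentum reads

\[ E_1 + E_2 = \gamma m_\pi c^2, \qquad E_1 - E_2 = \beta\gamma m_\pi c^2. \]

Adding the two equations and dividing by 2 gives $E_1$. Subtracting the second from the
first and dividing by 2 gives $E_2$:

\[ E_1 = \sqrt{\frac{1+\beta}{1-\beta}}\,\frac{m_\pi c^2}{2} = 831\ \text{MeV}, \qquad E_2 = \sqrt{\frac{1-\beta}{1+\beta}}\,\frac{m_\pi c^2}{2} = 5.48\ \text{MeV}. \]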

Method 2: We could use the energy conservation equation from the previous method,
which reads $E_1 + E_2 = \gamma m_\pi c^2$, and couple it with the calculation of the relativistic
invariant for the pion. The relativistic invariant for the pion is

\[ E_\pi^2 - p_\pi^2c^2 = (\gamma m_\pi c^2)^2 - (\beta\gamma m_\pi c)^2c^2 = m_\pi^2c^4. \]

The relativistic invariant for the two photons together is

\[ (E_1 + E_2)^2 - (p_1 + p_2)^2c^2 = (E_1 + E_2)^2 - (E_1 - E_2)^2 = 4E_1E_2. \]

Therefore,

\[ m_\pi^2c^4 = 4E_1E_2. \]

Indeed, if you multiply the expressions for $E_1$ and $E_2$ found in method 1, you get
$\frac{1}{4}m_\pi^2c^4$, which is consistent with the above equation. So, our two equations, for the
two unknowns, $E_1$ and $E_2$, are

\[ E_1 + E_2 = \gamma m_\pi c^2, \qquad E_1E_2 = \left(\frac{m_\pi c^2}{2}\right)^2. \]

Solve for $E_2$ using the first equation: $E_2 = \gamma m_\pi c^2 - E_1$, and plug this into the second
equation. One gets a quadratic equation, which, after rearranging things, reads

\[ E_1^2 - \gamma m_\pi c^2E_1 + \left(\frac{m_\pi c^2}{2}\right)^2 = 0. \]

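Complete the square by adding and subtracting a term $\left(\frac{\gamma m_\pi c^2}{2}\right)^2$:

\[ E_1^2 - \gamma m_\pi c^2E_1 + \left(\frac{\gamma m_\pi c^2}{2}\right)^2 - \left(\frac{\gamma m_\pi c^2}{2}\right)^2 + \left(\frac{m_\pi c^2}{2}\right)^2 = \left(E_1 - \frac{\gamma m_\pi c^2}{2}\right)^2 - \left(\frac{m_\pi c^2}{2}\right)^2(\gamma^2 - 1) = 0. \]

We write the factor $\gamma^2 - 1$ as

\[ \gamma^2 - 1 = \frac{1}{1-\beta^2} - 1 = \frac{1 - (1-\beta^2)}{1-\beta^2} = \frac{\beta^2}{1-\beta^2} = (\beta\gamma)^2. \]

Therefore, our quadratic equation for $E_1$ reads

\[ \left(E_1 - \frac{\gamma m_\pi c^2}{2}\right)^2 - \left(\frac{\beta\gamma m_\pi c^2}{2}\right)^2 = 0. \]

Using the standard factorization of the difference of two squares, we get

\[ \left[E_1 - \gamma(1+\beta)\frac{m_\pi c^2}{2}\right]\left[E_1 - \gamma(1-\beta)\frac{m_\pi c^2}{2}\right] = 0. \]

The first root is the same as the $E_1$ we found in method 1 since $\gamma(1+\beta) = \sqrt{\frac{1+\beta}{1-\beta}}$.
The second root is the $E_2$ we found earlier. We know to pick the first solution for $E_1$
because it is the larger of the two and the forward-moving photon had better have the
higher energy.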

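Method 3: In the rest frame of the pion, the energy and momentum of the pion and
the photons are

\[ \begin{pmatrix} E'_\pi/c \\ p'_\pi \end{pmatrix} = m_\pi c\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} E'_1/c \\ p'_1 \end{pmatrix} = \frac{m_\pi c}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} E'_2/c \\ p'_2 \end{pmatrix} = \frac{m_\pi c}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]

The photons must go off in opposite directions with the exact same magnitude of
momentum because, in the rest frame of the pion, the initial momentum is 0. Since
the photons have the same magnitude of momentum, they have the same energy, which
is half the rest-mass energy of the pion since that is the initial energy in the rest frame
of the pion.

We simply have to transform back to the lab frame. For example, for the forward-moving
photon:

\[ \begin{pmatrix} E_1/c \\ p_1 \end{pmatrix} = \begin{pmatrix} \gamma & \beta\gamma \\ \beta\gamma & \gamma \end{pmatrix}\frac{m_\pi c}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \gamma(1+\beta)\,\frac{m_\pi c}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

Writing $\gamma(1+\beta) = \sqrt{\frac{1+\beta}{1-\beta}}$ shows that this gives the same energy, $E_1$, as found previously.
Similarly, one can find $E_2$.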
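The numbers in parts (a) and (b) can be reproduced with a short Python sketch. It works in units where energies are in MeV, momenta in MeV/c and masses in MeV/c². The variable names are mine, and c is rounded to 3 × 10⁸ m/s, which is an assumption on my part about the rounding behind the quoted values.

```python
import math

C = 3.0e8          # speed of light in m/s (rounded; an assumption)
M_PI = 135.0       # pion rest mass in MeV/c^2
TAU0 = 8.2e-17     # proper half-life in s
d = 1.0e-6         # distance travelled in the lab frame, m
alpha = 0.01       # fraction of pions surviving after distance d

# Part (a): beta*gamma from alpha = 2**(-d / (beta * gamma * c * tau0))
bg = -d * math.log(2.0) / (C * TAU0 * math.log(alpha))
gamma = math.sqrt(1.0 + bg**2)      # since gamma^2 - (beta*gamma)^2 = 1
beta = bg / gamma

p = bg * M_PI                       # momentum in MeV/c
T = (gamma - 1.0) * M_PI            # kinetic energy in MeV

# Part (b): forward and backward photon energies in MeV
E1 = 0.5 * M_PI * (gamma + bg)      # gamma * (1 + beta) * m c^2 / 2
E2 = 0.5 * M_PI * (gamma - bg)      # gamma * (1 - beta) * m c^2 / 2

print(f"beta*gamma = {bg:.2f}, beta = {beta:.3f}")
print(f"p = {p:.0f} MeV/c, T = {T:.0f} MeV")
print(f"E1 = {E1:.1f} MeV, E2 = {E2:.2f} MeV")
```

Two useful sanity checks are that E1 + E2 equals the total pion energy γmπc² and that 4E1E2 equals (mπc²)², which is the invariant used in method 2.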

15.8. Relativistic Doppler Effect

Suppose you direct a laser beam with frequency f0 at an atom moving towards you with a

velocity u.

(a) What is the light frequency felt by the atom in its rest frame?

(b) The atom will be driven by the laser beam and re-radiate. What is the frequency of the

light radiated by the atom in its reference frame? If you observe this atom radiation,

what light frequency will you see? What is the corresponding light wavelength?

(c) Suppose you now direct the laser beam towards a mirror moving towards you with a

velocity u. What will be the light frequency that you observe? Briefly explain why.

SOLUTION:

(a) The frequency is Doppler shifted upwards (i.e. it should increase) because the atom is

moving towards you, the initial source. The atom sees a frequency, f ′, given by

\[ f' = \sqrt{\frac{1 + \frac{u}{c}}{1 - \frac{u}{c}}}\,f_0. \]

(b) When the light hits the atom, the oscillating electric field will deform the electron

clouds giving the atom a dipole moment that is oscillating at the same frequency as

the light. The frequency of dipole radiation is the same as the frequency of oscillation

of the dipole. Hence, the atom radiation has frequency $f'_\text{rerad} = f'$.

Now, the atom becomes the source of the radiation and it is moving towards you,

which means that the frequency you observe is Doppler shifted upwards relative to f ′.

This shift has the same factor, assuming the atom slows down a negligible amount (due

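to radiation pressure). You observe a re-radiation frequency, $f_\text{rerad}$, of

\[ f_\text{rerad} = \sqrt{\frac{1 + \frac{u}{c}}{1 - \frac{u}{c}}}\,f'_\text{rerad} = \left(\frac{1 + \frac{u}{c}}{1 - \frac{u}{c}}\right)f_0. \]

The corresponding wavelength is

\[ \lambda_\text{rerad} = \frac{c}{f_\text{rerad}} = \left(\frac{1 - \frac{u}{c}}{1 + \frac{u}{c}}\right)\frac{c}{f_0}. \]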

(c) The mirror is nothing more than a large collection of atoms all of which do exactly the

same thing as the one atom that this problem has been about until now. Thus, the

frequency observed will be exactly the same as that for just one atom. We could use the

fact that in the mirror’s rest frame, the reflected wave has the exact same frequency

(energy) as the incident wave, just with the exact opposite momentum. However,

that fact is derived microscopically from the induced oscillating dipoles on the mirror

surface, anyway, which is what we have done here.
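The two-step Doppler shift is simple enough to write as a pair of Python functions. The names are my own, and the sketch assumes a head-on approach with β = u/c held fixed (recoil neglected, as in the solution).

```python
import math

def doppler_approaching(f, beta):
    """Frequency seen when source and receiver approach head-on at relative speed beta * c."""
    return f * math.sqrt((1.0 + beta) / (1.0 - beta))

def reflected_frequency(f0, beta):
    """Frequency you observe after an approaching atom (or mirror) re-radiates your light.

    The shift is applied twice: once into the atom's frame, once back into yours.
    """
    f_atom = doppler_approaching(f0, beta)
    return doppler_approaching(f_atom, beta)

f0 = 1.0                       # work in units of the laser frequency
for beta in (0.001, 0.1, 0.5):
    print(f"beta = {beta}: f_rerad / f0 = {reflected_frequency(f0, beta):.6f}")
```

For small β the result is approximately 1 + 2β, the familiar factor of two in the non-relativistic Doppler shift of a radar echo.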

15.9. Quantum Tunneling and Frustrated Total Internal Reflection

A plane wave with wavevector k is sent in from x = −∞ traveling to the right towards a

step potential,

\[ V(x) = \begin{cases} 0, & x < 0, \\ V_0, & x > 0. \end{cases} \]

(a) What is the value of V0 above which the region x > 0 is classically forbidden? Assume

that this is the case for the rest of the problem.

(b) The time-independent wavefunction in the region x < 0 (Region I) is given by

\[ \psi_I(x) = Ae^{ikx} + Be^{-ikx}, \]

where the A term represents the incoming plane wave moving to the right and the B

term represents the reflected plane wave moving to the left. Justify the nomenclature

here: why can we call these plane waves and why is the A term moving to the right and

the B term moving to the left? [Hint: Determine the full time-dependent wavefunction.]

(c) Determine the general form of the time-independent wavefunction in the region x > 0

(Region II), ψII(x).

(d) Calculate the transmission coefficient, defined to be the square of the ratio of the

amplitudes of the transmitted and incident plane waves:

\[ T \equiv \left|\frac{F}{A}\right|^2. \]

Also calculate the reflection coefficient R ≡ |B/A|2 and check that R+ T = 1.

CAUTION: Thanks to Carlin for reminding me of the following. The transmission

coefficient above should actually be multiplied by a factor of the ratio of the transmitted
and incident wavevectors, $k_\text{transmitted}/k_\text{incident}$. For this problem, this doesn’t make a difference

because this ratio is equal to 1, but in general you have to keep this factor in. This is

because the transmission and reflection coefficients are defined to be the ratio of the

transmitted and reflected fluxes to the incident flux. That is, the ratio of the rates

of particle flow in the transmitted and reflected beams relative to the incident beam.

Therefore, we need to multiply the square amplitudes by the speed of propagation, and

then take the ratio. Since reflected and incident waves have the same wavevector and

speed, we never need to worry about this as far as R is concerned. However, if V in the

transmitted region is not the same as V in the incident region, then the wavevectors

in the transmitted and incident regions will be different. To be honest, partly because

of this complication, I hardly ever calculate T directly. Instead, I calculate R and then

T is just 1−R.

(e) Despite your answer to part (d), the wavefunction is not zero in the region x > 0. If

the barrier has finite extent, it is possible for the incoming particles to tunnel through

the barrier to the other side. Let the barrier have length L so that for x > L, the

potential once again vanishes. Write down the general form of the time-independent

wavefunction in the region 0 < x < L (Region II), ψII(x), and in the region x > L

(Region III), ψIII(x). Write down the equations you would need to solve in order to

calculate the transmission and reflection coefficients. (I’m not asking you to actually

solve them).

Note: This is similar to the phenomenon of frustrated total internal reflection. In this

case, if you bring a refractive material close to the boundary of another where you have

total internal reflection set up, then it is possible to get some light to tunnel through the

air gap and come out the other side! Below is an image showing this effect. The green

laser comes in from the right through the first prism at an angle that should make the

beam be totally internally reflected. There is a small air gap between the triangular and

the eye-shaped prisms. Nevertheless, some of the light is able to cross that gap and emerge

in the eye-shaped prism.

Image source: University of Vermont http://www.uvm.edu/~dahammon/

SOLUTION:

(a) The kinetic energy is related to the wavevector via

\[ T = \frac{\hbar^2k^2}{2m}. \]

Since V = 0 in the region x < 0, the total energy is simply equal to the kinetic energy

there:

\[ E = \frac{\hbar^2k^2}{2m}. \]

The region x > 0 is classically forbidden if the potential energy there is greater than

the total energy E, since that technically means that the kinetic energy is negative!

Thus, the region x > 0 is classically forbidden if

\[ V_0 > E = \frac{\hbar^2k^2}{2m}. \tag{15.2} \]

(b) The full time-dependent wavefunction in region I is

\[ \Psi_I(t, x) = e^{-iEt/\hbar}\psi_I(x) = Ae^{-i(\omega t - kx)} + Be^{-i(\omega t + kx)}, \tag{15.3} \]

where

\[ \omega = \frac{E}{\hbar} = \frac{\hbar k^2}{2m}. \tag{15.4} \]

Now, these really are plane waves. If we track the zero phase (when the exponent is

zero), for the A term, as t increases the x position of the zero phase point also increases.

Therefore, this plane wave moves to the right. The opposite is true for the B term,

which therefore moves to the left.

(c) In region II, the solutions to the time-independent Schrödinger equation are real
exponentials, rather than complex exponentials. These are growing and decaying exponentials.
However, since the wavefunction cannot blow up as x → ∞, the only allowed solution
is the exponentially decaying one:

\[ \psi_{II}(x) = Ce^{-\kappa x}, \quad \text{where} \quad \kappa = \frac{\sqrt{2m(V_0 - E)}}{\hbar} = \sqrt{\frac{2mV_0}{\hbar^2} - k^2}. \tag{15.5} \]

(d) Both ψ and ψ′ (the derivative of ψ) must be continuous everywhere. In particular, we

must match ψ and ψ′ in regions I and II at x = 0:

A+B = C, (15.6a)

ik(A−B) = −κC. (15.6b)

Eliminate C and solve for B/A:

\[ \frac{B}{A} = -\frac{\kappa + ik}{\kappa - ik} \implies R = \left|\frac{B}{A}\right|^2 = 1. \]

That is, we have 100% reflection, despite the fact that $\psi_{II}(x) \neq 0$! The wavefunction

in the classically forbidden region is called the evanescent wave.

(e) Now, we are allowed to have both the exponentially decaying as well as the exponen-

tially growing solutions in region II because it is just a finite region. Meanwhile, the

solution in Region III is again the same plane wave solution as in region I. However,

we only want the transmitted wave; there is no incoming wave from the right. Thus,

\[ \psi_{II}(x) = Ce^{-\kappa x} + De^{\kappa x}, \qquad \psi_{III}(x) = Fe^{ikx}. \tag{15.7} \]

Now, we have four equations: matching ψ and ψ′ both at x = 0 and at x = L:

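\begin{align}
A + B &= C + D, \tag{15.8a} \\
ik(A - B) &= -\kappa(C - D), \tag{15.8b} \\
Ce^{-\kappa L} + De^{\kappa L} &= Fe^{ikL}, \tag{15.8c} \\
-\kappa\left(Ce^{-\kappa L} - De^{\kappa L}\right) &= ikFe^{ikL}. \tag{15.8d}
\end{align}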

Eliminate C from the first two, the middle two, and the last two equations:

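\begin{align}
(\kappa + ik)A + (\kappa - ik)B &= 2\kappa D, \tag{15.9a} \\
ik(A - B)e^{-\kappa L} + \kappa Fe^{ikL} &= \kappa D\left(e^{\kappa L} + e^{-\kappa L}\right), \tag{15.9b} \\
2\kappa De^{\kappa L} &= (\kappa + ik)Fe^{ikL}. \tag{15.9c}
\end{align}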

Eliminate D:

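\begin{align}
(\kappa + ik)Ae^{\kappa L} + (\kappa - ik)Be^{\kappa L} &= (\kappa + ik)Fe^{ikL}, \tag{15.10a} \\
ik(A - B)e^{-\kappa L} + \kappa Fe^{ikL} &= (\kappa + ik)Fe^{ikL}\,\frac{e^{\kappa L} + e^{-\kappa L}}{2e^{\kappa L}}. \tag{15.10b}
\end{align}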

Eliminate B and solve for F/A. After a lot of algebra,

\[ \frac{F}{A} = -\frac{4ik\kappa e^{-ikL}}{(\kappa - ik)^2e^{\kappa L} - (\kappa + ik)^2e^{-\kappa L}}. \tag{15.11} \]

After a lot more algebra, and plugging in the expressions for k and κ in terms of E

and V0, one finds the transmission coefficient

\[ T = \left|\frac{F}{A}\right|^2 = \left[1 + \left(\frac{mV_0}{\hbar^2}\right)^2\frac{\sinh^2(\kappa L)}{k^2\kappa^2}\right]^{-1}. \tag{15.12} \]

This is the tunneling probability and it is not zero! It has the correct behavior that

T → 0 as V0 →∞ or L→∞.
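Since the algebra here is long, it is worth checking numerically. The Python sketch below works in units where ħ = m = 1 (my choice, purely for convenience) and uses function names of my own. It evaluates F/A, with the phase $e^{-ikL}$ that the matching conditions at x = L require, back-substitutes for D, C and B, and then tests the one matching condition, (15.8b), that was not used in the back-substitution. It also compares |F/A|² with the closed-form T, written with $k^2\kappa^2$ in the denominator so that the sinh² term is dimensionless.

```python
import cmath
import math

def barrier_amplitudes(E, V0, L, hbar=1.0, m=1.0):
    """Return k, kappa and the amplitudes B, C, D, F (with A = 1) for V0 > E."""
    k = math.sqrt(2.0 * m * E) / hbar
    kappa = math.sqrt(2.0 * m * (V0 - E)) / hbar
    # F/A after eliminating B, C and D from the four matching conditions
    denom = ((kappa - 1j * k) ** 2 * math.exp(kappa * L)
             - (kappa + 1j * k) ** 2 * math.exp(-kappa * L))
    F = -4j * k * kappa * cmath.exp(-1j * k * L) / denom
    # Back-substitute: D from (15.9c), C from (15.8c), B from (15.8a)
    D = (kappa + 1j * k) * F * cmath.exp(1j * k * L) / (2.0 * kappa * math.exp(kappa * L))
    C = (F * cmath.exp(1j * k * L) - D * math.exp(kappa * L)) * math.exp(kappa * L)
    B = C + D - 1.0
    return k, kappa, B, C, D, F

def transmission_closed_form(E, V0, L, hbar=1.0, m=1.0):
    """Closed-form tunneling probability for a rectangular barrier with V0 > E."""
    k = math.sqrt(2.0 * m * E) / hbar
    kappa = math.sqrt(2.0 * m * (V0 - E)) / hbar
    return 1.0 / (1.0 + (m * V0 / hbar**2) ** 2 * math.sinh(kappa * L) ** 2 / (k * kappa) ** 2)

E, V0, L = 1.0, 2.0, 1.0        # illustrative values with E < V0
k, kappa, B, C, D, F = barrier_amplitudes(E, V0, L)

# (15.8b) was not used above, so its residual is an independent check
residual = abs(1j * k * (1.0 - B) + kappa * (C - D))
T_amp = abs(F) ** 2
R_amp = abs(B) ** 2
T_formula = transmission_closed_form(E, V0, L)
print(f"residual of (15.8b) = {residual:.2e}")
print(f"T from amplitudes = {T_amp:.6f}, T from closed form = {T_formula:.6f}")
print(f"R + T = {R_amp + T_amp:.6f}")
```

Because the wavevector is the same on both sides of the barrier, no extra flux factor is needed and R + T should come out equal to 1.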

15.10. Wavefunction Shapes

Below is a picture of an infinite potential well with a non-flat bottom. Explain your answers

to the following questions.

(a) For some arbitrary allowed energy, E, rank positions A, B and C by the classical

kinetic energy of the particle at these positions from largest to smallest.

(b) Repeat for de Broglie wavelength.

(c) Repeat for the amount of time a classical particle spends traversing an interval of width

δx at each position.

(d) Repeat for the spacings between the zeros of the wavefunction in the regions near each

point. Assume that the energy level is sufficiently high that the wavefunction oscillates

many times between the two walls.

(e) Repeat for the amplitude of the wavefunction in the region near each point.

(f) Sketch a plausible wavefunction for some high energy level.

SOLUTION:

(a) B > C > A, since K = E − V .

(b) A > C > B, since K ∝ p2 and p ∝ λ−1.

(c) A > C > B, since the particle moves slower where it has less kinetic energy.

(d) A > C > B; same as (b).

(e) A > C > B; same as (c).

(f) We want the amplitude and wavelength to get slightly larger near the sides.

CAUTION: Thanks to Yufan for bringing the following to my attention. I said that
amplitudes and wavelengths tend to be smaller in regions where the difference between

E and V is bigger. The statement about the wavelength is certainly correct. However,

the statement about the amplitude only holds for bound states; it does not hold for

plane wave states. Our intuitive arguments above for the amplitude technically require

that the particle be going back and forth many many times. This is only the case

for bound states. Then, it certainly is true that given two regions of space of the

same size, the particle is less likely to be found in the region in which it is traveling

faster. Therefore, the amplitude will tend to be smaller in those regions. However, for

one-dimensional scattering problems, where you send in a plane wave on the left and

then study the reflected and transmitted waves, this argument doesn’t really hold. The

particle has one pass; it does not go back and forth. So for example, the amplitude of

the transmitted wave for a step barrier is smaller than the amplitude of the incident

wave even though the wavevector, and therefore the speed, is smaller in the transmitted

region.

16. Final Exam Solutions

16.1. The Pole Vaulter Paradox

A pole vaulter is running with a pole at $v = \frac{\sqrt{3}}{2}\,c$. Her pole has a proper length of L. She
runs into a barn with proper length L/2 with doors on the front and back. When the pole

vaulter runs into the barn, a farmer tries to close both front and back doors at the same

time, but only for an instant, and then reopens them.

(a) What is the length of the pole from the farmer’s perspective? What is the length of the

barn from the pole vaulter’s perspective? From the farmer’s perspective, can he close

the barn doors at the same time? [15 pts]

(b) Are the doors closed at the same time for the pole vaulter? What is the expression

for the time interval of the door closings in the pole vaulter’s frame? What is the

interpretation of the sign of the expression? [10 pts]

(c) In the pole vaulter’s frame, give an expression for what the time interval would have to

be to avoid an accident. Comparing the answers of (b) and (c), is there an accident?

[5 pts]

SOLUTION:

(a) The γ factor associated with the speed $v = \frac{\sqrt{3}}{2}\,c$ is

$$\gamma = \frac{1}{\sqrt{1 - \left(\frac{v}{c}\right)^2}} = \frac{1}{\sqrt{1 - \frac{3}{4}}} = 2.$$

The length of the pole from the farmer’s perspective is

$$\text{pole length to farmer} = \frac{L}{\gamma} = \frac{L}{2}.$$

The length of the barn from the pole vaulter’s perspective is

$$\text{barn length to pole vaulter} = \frac{L/2}{\gamma} = \frac{L}{4}.$$

From the farmer’s perspective he can close the barn doors at the same time, neglecting the fact that the pole is exactly the same length as the barn from his perspective and neglecting timing and reaction time issues. In other words, there is one instant in time in the farmer’s frame of reference when the pole is entirely within the barn.
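As a quick numerical check of part (a), the sketch below evaluates the γ factor and the two contracted lengths. The helper name `lorentz_gamma` and the choice $L = 1$ (arbitrary units) are assumptions made for illustration only.

```python
import math

def lorentz_gamma(beta):
    """Lorentz factor for a speed beta = v/c, with |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta**2)

beta = math.sqrt(3) / 2   # pole vaulter's speed in units of c
L = 1.0                   # proper length of the pole (arbitrary units, assumed)

gamma = lorentz_gamma(beta)
pole_in_barn_frame = L / gamma         # contracted pole, as seen by the farmer
barn_in_pole_frame = (L / 2) / gamma   # contracted barn, as seen by the pole vaulter

print(f"gamma = {gamma:.6f}")
print(f"pole length to farmer = {pole_in_barn_frame:.6f} L")
print(f"barn length to pole vaulter = {barn_in_pole_frame:.6f} L")
```

Up to floating-point rounding, the three values should reproduce $\gamma = 2$, $L/2$ and $L/4$.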

(b) No, the doors are not closed at the same time for the pole vaulter. Let ∆t and ∆x

be the time interval and the spatial distance in the reference frame of the barn and

farmer between the two events (back door closing, then front door closing). Let ∆t′

and ∆x′ be the corresponding intervals in the reference frame of the pole vaulter.

In the reference frame of the barn and farmer, the two events are simultaneous and

are separated in space by the proper length of the barn:

$$\Delta t = 0, \qquad \Delta x = \frac{L}{2}.$$

The invariant interval is

$$(\Delta s)^2 = (c\,\Delta t)^2 - (\Delta x)^2 = 0 - \left(\frac{L}{2}\right)^2 = -\frac{L^2}{4}. \tag{16.1}$$

In the reference frame of the pole vaulter, the two events are separated in space by the

proper length of the pole:

∆x′ = L.

Therefore,

$$(\Delta s)^2 = (c\,\Delta t')^2 - (\Delta x')^2 = (c\,\Delta t')^2 - L^2. \tag{16.2}$$

Setting Eqns. (16.1) and (16.2) equal and solving for ∆t′ gives

$$\Delta t' = \frac{\sqrt{3}}{2}\,\frac{L}{c}. \tag{16.3}$$

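We could have also used a Lorentz transformation:

$$c\,\Delta t' = \gamma c\,\Delta t + \beta\gamma\,\Delta x = 0 + \frac{\sqrt{3}}{2}\cdot 2\cdot\frac{L}{2} = \frac{\sqrt{3}}{2}\,L \implies \Delta t' = \frac{\sqrt{3}}{2}\,\frac{L}{c}.$$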

Our definition for ∆t′ means that if it is positive, then the back door closes before the

front door closes. This makes sense, of course, since the front of the pole reaches the

back of the barn before the back of the pole reaches the front of the barn.
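The two routes to ∆t′ in part (b) can be cross-checked numerically. The sketch below is illustrative only: it assumes units with $c = 1$ and $L = 1$, and it copies the sign convention of the Lorentz transformation written above (the same convention is assumed for the spatial part, which the text does not write out).

```python
import math

c = 1.0   # speed of light (units chosen so c = 1, an assumption)
L = 1.0   # proper length of the pole (arbitrary units, an assumption)
beta = math.sqrt(3) / 2
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Separation of the two door closings in the barn/farmer frame
dt, dx = 0.0, L / 2

# Lorentz transformation, same sign convention as in the text
c_dt_prime = gamma * c * dt + beta * gamma * dx
dx_prime = gamma * dx + beta * gamma * c * dt

# The invariant interval should agree in both frames
interval_barn = (c * dt) ** 2 - dx**2
interval_pole = c_dt_prime**2 - dx_prime**2

print(f"c*dt' = {c_dt_prime:.6f}  (sqrt(3)/2 = {math.sqrt(3) / 2:.6f})")
print(f"dx'   = {dx_prime:.6f}  (proper length of the pole = {L})")
print(f"interval: barn frame {interval_barn:.6f}, pole frame {interval_pole:.6f}")
```

The transformed spatial separation comes out equal to the proper length of the pole, which is the input used in Eqn. (16.2).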

(c) In the pole vaulter’s reference frame, when the front of the pole aligns with the back

of the barn, the length of the pole that is inside the barn is just the length of the barn

as measured by the pole vaulter, which is $L/4$. Therefore, the length of the pole outside

of the barn is $3L/4$. The minimum time interval between the back door of the

barn closing and the front door closing to avoid an accident is then the time it takes for the

front of the barn to travel the remaining distance $3L/4$ to the back of the pole. This

time is

$$\Delta t'_{\min} = \frac{3L/4}{\sqrt{3}\,c/2} = \frac{\sqrt{3}}{2}\,\frac{L}{c}. \tag{16.4}$$

The time (16.3) is just equal to (16.4). Therefore, an accident is just about avoided.
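The comparison in part (c) can also be done with numbers. This is a minimal sketch assuming $c = 1$ and $L = 1$; the variable names are chosen for illustration.

```python
import math

c = 1.0   # units with c = 1 (assumption)
L = 1.0   # proper length of the pole (assumption)
beta = math.sqrt(3) / 2
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Pole vaulter's frame: the barn has length (L/2)/gamma and moves at speed beta*c
barn_length = (L / 2) / gamma
pole_outside = L - barn_length          # part of the pole still outside the barn

dt_prime_min = pole_outside / (beta * c)        # time needed to avoid an accident
dt_prime_doors = (math.sqrt(3) / 2) * L / c     # interval between the door closings, Eqn. (16.3)

print(f"required interval: {dt_prime_min:.6f} L/c")
print(f"actual interval:   {dt_prime_doors:.6f} L/c")
```

The two printed values agree to rounding, which is the marginal case described above.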

16.2. Pion Decay

A positive pion decays into a muon and a neutrino, $\pi^+ \to \mu^+ + \nu$. The pion rest mass is

$m_\pi = 140$ MeV/$c^2$, the muon rest mass is $m_\mu = 106$ MeV/$c^2$, and the neutrino has a mass

$m_\nu \approx 0$. Assume that the pion starts off at rest.

(a) Using conservation of relativistic momentum and energy, find an expression for the

momentum of the muon that depends only on mπ and mµ. [10 pts]

(b) Show that the following expression is correct. [20 pts]

$$\frac{u}{c} = \frac{(m_\pi/m_\mu)^2 - 1}{(m_\pi/m_\mu)^2 + 1}.$$

SOLUTION:

(a) Let $p_\mu$ and $p_\nu$ be the magnitudes of the momenta of the muon and neutrino, respectively. Since the pion starts off at rest, the initial momentum is zero. Therefore,

conservation of momentum implies that

$$p_\mu = p_\nu. \tag{16.5}$$

Since the neutrino is taken to be massless,

$$E_\nu = \sqrt{p_\nu^2 c^2 + m_\nu^2 c^4} = p_\nu c = p_\mu c, \tag{16.6}$$

where we plugged in (16.5) to get the final equality.

Energy conservation reads

$$m_\pi c^2 = E_\mu + E_\nu = \sqrt{p_\mu^2 c^2 + m_\mu^2 c^4} + p_\mu c. \tag{16.7}$$

Isolating the square root on one side and squaring gives

$$\cancel{p_\mu^2 c^2} - 2 m_\pi c^3 p_\mu + m_\pi^2 c^4 = \cancel{p_\mu^2 c^2} + m_\mu^2 c^4.$$

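Solving for $p_\mu$ gives

$$p_\mu = \frac{m_\pi^2 c^4 - m_\mu^2 c^4}{2 m_\pi c^3} = \left[\left(\frac{m_\pi}{m_\mu}\right)^2 - 1\right] \frac{m_\mu}{2 m_\pi}\, m_\mu c. \tag{16.8}$$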
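Plugging the quoted masses into Eqn. (16.8) gives a concrete number. The sketch below assumes units with $c = 1$, so masses, momenta and energies are all in MeV; the variable names are illustrative.

```python
import math

m_pi = 140.0   # pion rest mass, MeV (c = 1)
m_mu = 106.0   # muon rest mass, MeV (c = 1)

# Eqn. (16.8) with c = 1
p_mu = (m_pi**2 - m_mu**2) / (2 * m_pi)

# Check energy conservation, Eqn. (16.7), with a massless neutrino
E_mu = math.sqrt(p_mu**2 + m_mu**2)
E_nu = p_mu

print(f"p_mu = {p_mu:.3f} MeV/c")
print(f"E_mu + E_nu = {E_mu + E_nu:.3f} MeV (should equal m_pi = {m_pi} MeV)")
```

The muon momentum comes out close to 30 MeV/$c$, and the two final-state energies add back up to the pion rest energy.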

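(b) The energy of the muon is

$$\begin{aligned}
E_\mu &= \sqrt{p_\mu^2 c^2 + m_\mu^2 c^4} \\
&= \sqrt{\left[\left(\frac{m_\pi}{m_\mu}\right)^2 - 1\right]^2 \left(\frac{m_\mu}{2 m_\pi}\right)^2 m_\mu^2 c^4 + m_\mu^2 c^4} \\
&= \frac{m_\mu}{2 m_\pi}\, m_\mu c^2 \sqrt{\left[\left(\frac{m_\pi}{m_\mu}\right)^2 - 1\right]^2 + \left(\frac{2 m_\pi}{m_\mu}\right)^2} \\
&= \frac{m_\mu}{2 m_\pi}\, m_\mu c^2 \sqrt{\left(\frac{m_\pi}{m_\mu}\right)^4 - 2\left(\frac{m_\pi}{m_\mu}\right)^2 + 1 + 4\left(\frac{m_\pi}{m_\mu}\right)^2} \\
&= \frac{m_\mu}{2 m_\pi}\, m_\mu c^2 \sqrt{\left[\left(\frac{m_\pi}{m_\mu}\right)^2 + 1\right]^2} \\
&= \left[\left(\frac{m_\pi}{m_\mu}\right)^2 + 1\right] \frac{m_\mu}{2 m_\pi}\, m_\mu c^2.
\end{aligned} \tag{16.9}$$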

Let $u$ be the speed of the muon in the rest frame of the pion. Let $\beta = u/c$ and $\gamma$ be the

associated gamma factor. Then,

$$E_\mu = \gamma m_\mu c^2, \qquad p_\mu = \gamma m_\mu u = \beta\gamma m_\mu c.$$

Therefore,

$$\frac{u}{c} = \beta = \frac{p_\mu c}{E_\mu} = \frac{(m_\pi/m_\mu)^2 - 1}{(m_\pi/m_\mu)^2 + 1}.$$
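The result can be verified exactly for the quoted masses using rational arithmetic. The sketch below again assumes $c = 1$ and uses Python's `fractions.Fraction` so that the comparison involves no rounding; the variable names are illustrative.

```python
from fractions import Fraction

m_pi = Fraction(140)   # pion rest mass, MeV (c = 1)
m_mu = Fraction(106)   # muon rest mass, MeV (c = 1)

# Eqns. (16.8) and (16.9) with c = 1
p_mu = (m_pi**2 - m_mu**2) / (2 * m_pi)
E_mu = (m_pi**2 + m_mu**2) / (2 * m_pi)

# Speed of the muon from beta = p c / E, compared with the closed-form expression
beta_from_ratio = p_mu / E_mu
r2 = (m_pi / m_mu) ** 2
beta_formula = (r2 - 1) / (r2 + 1)

# The muon should also be on its mass shell: E^2 - p^2 = m^2
on_shell = (E_mu**2 - p_mu**2 == m_mu**2)

print(f"u/c = {float(beta_formula):.4f}")
print("formula matches p c / E:", beta_from_ratio == beta_formula)
print("mass-shell condition holds:", on_shell)
```

For these masses the muon leaves at roughly 27% of the speed of light.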
