Physics 7C Spring 2015 Discussion Section Notes
Kevin T. Grosvenor (a, b)
(a) Berkeley Center for Theoretical Physics and Department of Physics,
University of California, Berkeley, CA 94720-7300, USA
(b) Theoretical Physics Group, Lawrence Berkeley National Laboratory,
Berkeley, CA 94720-8162, USA
Abstract: Some discussion section notes for Physics 7C.
Contents
1. Vectors
2. The Wave Equation
3. Solving the Wave Equation
3.1. Electromagnetic Plane Waves
4. Poynting Vector and Flux
4.1. Red Laser Pointer
5. Ray Tracing Diagrams for Mirrors
6. Ray Tracing Diagrams for Lenses
7. Compound Optical Systems
7.1. Two-Lens Problem
7.2. Two-Lens Demonstration
8. Midterm 1 Quiz
9. Interference
9.1. Laser Wavelength Measurement via Metal Ruler
10. Thin-Film Interference
11. Relativity
11.1. How to Measure the Length of a Moving Object
11.2. Relativistic Train
11.3. Passing Trains
12. Midterm 2 Quiz
13. Energy and Momentum
13.1. 4-Vectors
13.2. Colliding Photons
14. Quantum Mechanics
14.1. The Wacky World of the Double Slit
14.2. Blackbody Radiation and the Ultraviolet Catastrophe
14.3. Stefan-Boltzmann Law
14.4. Bohr Model
14.5. Time-Evolution in 1D Infinite Square Well
15. Final Review
15.1. Human Eye Optics
15.2. Optical Fiber
15.3. Modified Michelson Interferometer
15.4. Diffraction Grating
15.5. Optical Spectroscopy
15.6. Relativity and Current-Carrying Wires
15.7. Pi Decay
15.8. Relativistic Doppler Effect
15.9. Quantum Tunneling and Frustrated Total Internal Reflection
15.10. Wavefunction Shapes
16. Final Exam Solutions
16.1. The Pole Vaulter Paradox
16.2. Pion Decay
1. Vectors
For our purposes, a vector will be something that has several components (usually three;
or four in relativity). We must be able to add two vectors (component by component)
and we must be able to multiply a vector by a real number. There are a slew of other
requirements, but they are usually trivially satisfied, at least for the main vector space
we will care about: $\mathbb{R}^3$, or $\mathbb{R}^n$ for general dimension. The symbol $\mathbb{R}^n$ means the set of
all n-component expressions, $\mathbf{A} = (A_1, A_2, \ldots, A_n)$, such that each component is a real
number (i.e. $A_i \in \mathbb{R}$ for $i = 1, \ldots, n$).
In three-dimensional space, let us replace x, y, z with $x_1, x_2, x_3$ in order to make it
easier to generalize to any dimension. We often denote the unit vector in the direction of
$x_i$ by $\hat{x}_i$, whose components are all zero except for the ith one, which is a one. Then, $\mathbf{A}$
may be written, in a general dimension, n,
$$\mathbf{A} = \sum_{i=1}^{n} A_i \hat{x}_i. \tag{1.1}$$
So that we don’t have to keep writing summation signs everywhere, we will usually fol-
low Einstein’s convention that repeated indices are summed over, unless stated otherwise.
Then, (1.1) becomes neater:
$$\mathbf{A} = A_i \hat{x}_i. \tag{1.2}$$
The dimensionality is nowhere to be found now, so you must make sure you know what it
is from context.
Next, we introduce the Kronecker delta symbol:
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \tag{1.3}$$
$\mathbb{R}^n$ has what’s called an inner product structure. This is a map $(\cdot,\cdot) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. That
is, you take two vectors, put one in the first slot and the other in the second slot of $(\cdot,\cdot)$,
and you will get a real number, called their inner product. It is also often called their dot
product, especially in three dimensions, and we will denote it by $\mathbf{A}\cdot\mathbf{B}$. It is defined by
$$\mathbf{A}\cdot\mathbf{B} = \delta_{ij} A_i B_j = A_i B_i, \tag{1.4}$$
where you need to keep in mind that repeated indices are summed over.
In addition, $\mathbb{R}^3$ has a special structure called a cross product. This is given by a map
$(\cdot \times \cdot) : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}^3$, so it takes two vectors and spits out another vector.[1] In order to
talk about cross products, we must introduce the Levi-Civita symbol, and to do that, we
must understand cyclic indices. There is a mnemonic for this. Think of a clock that goes
from 1 through 3 instead of 1 through 12. Starting at any of the numbers, if you traverse
the clock in a clockwise fashion, then the order is declared to be cyclic. If you traverse the
clock in the counter-clockwise direction, then the order is anti-cyclic. So, (123), (231) and
(312) are cyclic, whereas (132), (213), (321) are anti-cyclic. By the way, and for example,
(231) is the permutation that sends 1 (the first slot) to 2 (the first number appearing), 2
to 3 and 3 to 1.
The Levi-Civita symbol is defined to be
$$\epsilon_{ijk} = \begin{cases} 1 & \text{if } (ijk) \text{ is cyclic}, \\ -1 & \text{if } (ijk) \text{ is anti-cyclic}, \\ 0 & \text{if any index is repeated}. \end{cases} \tag{1.5}$$
Roughly speaking, the Levi-Civita symbol is to the cross product what the Kronecker delta
is to the dot product:
$$\mathbf{A}\times\mathbf{B} = \epsilon_{ijk}\,\hat{x}_i A_j B_k. \tag{1.6}$$
Let us take a moment to ensure that this definition of the cross-product agrees with the
definition of the cross-product that we are likely to have learned before, namely
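$$\mathbf{A}\times\mathbf{B} = (A_y B_z - A_z B_y)\,\hat{x} + (A_z B_x - A_x B_z)\,\hat{y} + (A_x B_y - A_y B_x)\,\hat{z}. \tag{1.7}$$
To aid in comparison, let us first rewrite (1.7) in terms of our new notation where $\hat{x}$ is
replaced with $\hat{x}_1$, and $\hat{y}$ with $\hat{x}_2$ and so on. Also, the x-component is the 1-component,
the y-component is the 2-component and so on. Then,
$$\mathbf{A}\times\mathbf{B} = (A_2 B_3 - A_3 B_2)\,\hat{x}_1 + (A_3 B_1 - A_1 B_3)\,\hat{x}_2 + (A_1 B_2 - A_2 B_1)\,\hat{x}_3. \tag{1.8}$$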
[1] Actually, the result of a cross product is what’s called a pseudovector, but no matter.
Now, let us expand out the right hand side of (1.6) to see that it is in fact the same as the
right hand side of (1.8):
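$$\begin{aligned}
\epsilon_{ijk}\hat{x}_i A_j B_k &= \epsilon_{123}\hat{x}_1 A_2 B_3 + \epsilon_{132}\hat{x}_1 A_3 B_2 + \epsilon_{231}\hat{x}_2 A_3 B_1 + \epsilon_{213}\hat{x}_2 A_1 B_3 \\
&\quad + \epsilon_{312}\hat{x}_3 A_1 B_2 + \epsilon_{321}\hat{x}_3 A_2 B_1 \\
&= \hat{x}_1 A_2 B_3 - \hat{x}_1 A_3 B_2 + \hat{x}_2 A_3 B_1 - \hat{x}_2 A_1 B_3 + \hat{x}_3 A_1 B_2 - \hat{x}_3 A_2 B_1.
\end{aligned} \tag{1.9}$$
Here, we used $\epsilon_{123} = \epsilon_{231} = \epsilon_{312} = 1$ and $\epsilon_{132} = \epsilon_{213} = \epsilon_{321} = -1$. Now, it is quite easy to
see that (1.8) is the same as (1.9), just organized slightly more neatly.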
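The comparison between (1.6) and (1.8) can also be checked numerically. The following is a minimal sketch in plain Python; the function names and the sample vectors are illustrative choices, not part of the notes.

```python
# Cyclic and anti-cyclic orderings of (1, 2, 3), as described in the text.
CYCLIC = {(1, 2, 3), (2, 3, 1), (3, 1, 2)}
ANTICYCLIC = {(1, 3, 2), (2, 1, 3), (3, 2, 1)}


def levi_civita(i, j, k):
    """Levi-Civita symbol (1.5) for indices in {1, 2, 3}."""
    if (i, j, k) in CYCLIC:
        return 1
    if (i, j, k) in ANTICYCLIC:
        return -1
    return 0  # some index is repeated


def cross_index(A, B):
    """Cross product from (1.6): (A x B)_i = eps_ijk A_j B_k, summed over j, k."""
    idx = (1, 2, 3)
    return [
        sum(levi_civita(i, j, k) * A[j - 1] * B[k - 1] for j in idx for k in idx)
        for i in idx
    ]


def cross_explicit(A, B):
    """Cross product written out component by component, as in (1.8)."""
    A1, A2, A3 = A
    B1, B2, B3 = B
    return [A2 * B3 - A3 * B2, A3 * B1 - A1 * B3, A1 * B2 - A2 * B1]


A = [1, 2, 3]  # arbitrary sample vectors
B = [4, 5, 6]
print(cross_index(A, B))     # [-3, 6, -3]
print(cross_explicit(A, B))  # [-3, 6, -3]
```

Both functions return the same components, which is the content of the statement that (1.9) is (1.8) reorganized.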
Okay, so far all we have done is introduce a bunch of notation in order to express the
dot product and the cross product more compactly. However, this notation is actually
useful once we have to deal with multiple products (like multiple cross-products). For such
purposes, the following identity is very useful:
$$\epsilon_{ijk}\epsilon_{i\ell m} = \delta_{j\ell}\delta_{km} - \delta_{jm}\delta_{k\ell}. \tag{1.10}$$
This identity is actually reasonably easy to understand. Remember that i, j and k have
to take on different values or else the Levi-Civita symbol would be zero anyway (also i, ℓ
and m have to take on different values). Well, since they can only take on three different
values, namely 1, 2 or 3, either j is equal to ℓ and k is equal to m, or j is equal to m and
k is equal to ℓ. There just aren’t any other possibilities! That’s what the right hand side
of the equation says, except for the signs, which you can figure out by plugging in some
specific set of values for the indices, say i = 1, j = ℓ = 2 and k = m = 3, and then try
i = 1, j = m = 2 and k = ℓ = 3.
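Since each free index only takes three values, the identity (1.10) can be verified by brute force over all 81 combinations of the free indices. A minimal sketch (the helper names are illustrative, and the closed-form expression used for the Levi-Civita symbol is just a compact stand-in for the definition (1.5)):

```python
from itertools import product


def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {1, 2, 3}.

    The product (i-j)(j-k)(k-i) is +2 for cyclic orderings, -2 for
    anti-cyclic orderings, and 0 when any index is repeated.
    """
    return (i - j) * (j - k) * (k - i) // 2


def delta(i, j):
    """Kronecker delta (1.3)."""
    return 1 if i == j else 0


def identity_holds():
    """Check eps_ijk eps_ilm = d_jl d_km - d_jm d_kl for every j, k, l, m."""
    idx = (1, 2, 3)
    for j, k, l, m in product(idx, repeat=4):
        lhs = sum(levi_civita(i, j, k) * levi_civita(i, l, m) for i in idx)
        rhs = delta(j, l) * delta(k, m) - delta(j, m) * delta(k, l)
        if lhs != rhs:
            return False
    return True


print(identity_holds())  # True
```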
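In fact, this can be extended to any dimension. In n + 1 dimensions, we write
$$\epsilon_{i i_1 \cdots i_n}\,\epsilon_{i j_1 \cdots j_n} =
\begin{vmatrix}
\delta_{i_1 j_1} & \delta_{i_1 j_2} & \cdots & \delta_{i_1 j_n} \\
\delta_{i_2 j_1} & \delta_{i_2 j_2} & \cdots & \delta_{i_2 j_n} \\
\vdots & \vdots & \ddots & \vdots \\
\delta_{i_n j_1} & \delta_{i_n j_2} & \cdots & \delta_{i_n j_n}
\end{vmatrix}, \tag{1.11}$$
where the vertical lines surrounding the matrix mean “take the determinant”.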
The identity (1.10) will allow you to deal with situations involving multiple cross products.
A number of very important vector identities can be proven using these formulas.
For example, let us prove the “BAC-CAB” rule:
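$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B}). \tag{1.12}$$
Of course, you could prove this identity component by component by literally expanding
both sides out completely. It will take some time and is tedious, but should be pretty
straightforward. On the other hand, it is quite easy to prove in our new notation. Define
$$\mathbf{D} = \mathbf{B}\times\mathbf{C} = \epsilon_{ijk}\,\hat{x}_i B_j C_k. \tag{1.13}$$
Of course, as in (1.2), we can write $\mathbf{D}$ in terms of its components:
$$\mathbf{D} = D_i \hat{x}_i. \tag{1.14}$$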
Comparing (1.13) with (1.14) gives us the components of $\mathbf{D}$ in terms of the components of
$\mathbf{B}$ and $\mathbf{C}$:
$$D_i = \epsilon_{ijk} B_j C_k. \tag{1.15}$$
Getting back to the left hand side of (1.12), we have
$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{A}\times\mathbf{D} = \epsilon_{\ell m i}\,\hat{x}_\ell A_m D_i. \tag{1.16}$$
Notice that I have used ℓ, m, i instead of the customary i, j, k. These are just dummy
indices, so it does not matter what you call them. The reason why I have written i last is
because that index is the same as the index on $\mathbf{D}$ and I have already written $D_i$ in (1.15),
so might as well keep that index as i. The reason why I have used ℓ and m instead of j and
k is because j and k already appear in (1.15), so I don’t want to use them again or else
I will get confused as to which pair of i’s are supposed to be summed over and, similarly,
which pairs of j’s are supposed to be summed over.
Plugging (1.15) into (1.16) gives
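$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \epsilon_{\ell m i}\,\hat{x}_\ell A_m\,\epsilon_{ijk} B_j C_k = \epsilon_{ijk}\epsilon_{\ell m i}\,\hat{x}_\ell A_m B_j C_k. \tag{1.17}$$
Now, we see why (1.10) might be useful since we have a product of two Levi-Civita symbols
here with a pair of indices being summed over, namely the index i. However, we have a bit
of a problem: in (1.10) the index being summed over, namely i, is the first index of both
Levi-Civita symbols. In (1.17) it is the first index in one Levi-Civita symbol, but the last
in the other. No matter: we can always cyclically permute the indices of the Levi-Civita
symbol without changing it:
$$\epsilon_{\ell m i} = \epsilon_{i \ell m} = \epsilon_{m i \ell}. \tag{1.18}$$
Think about it: this is just like rotating our 3-hour clock in the clockwise direction.
That doesn’t change anything. On the other hand, if you switch any two indices, you get
a minus sign:
$$\epsilon_{\ell m i} = -\epsilon_{\ell i m} = -\epsilon_{i m \ell} = -\epsilon_{m \ell i}. \tag{1.19}$$
In any case, we can safely replace $\epsilon_{\ell m i}$ with $\epsilon_{i \ell m}$ in (1.17). The result is
$$\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \epsilon_{ijk}\epsilon_{i\ell m}\,\hat{x}_\ell A_m B_j C_k. \tag{1.20}$$
Now, we can use (1.10) directly:
$$\begin{aligned}
\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) &= (\delta_{j\ell}\delta_{km} - \delta_{jm}\delta_{k\ell})\,\hat{x}_\ell A_m B_j C_k \\
&= \hat{x}_j A_k B_j C_k - \hat{x}_k A_j B_j C_k \\
&= (B_j \hat{x}_j)(A_k C_k) - (C_k \hat{x}_k)(A_j B_j) \\
&= \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B}).
\end{aligned} \tag{1.21}$$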
That’s the “BAC-CAB” rule proven!
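As a sanity check on (1.12), one can compare both sides for a specific choice of vectors. This is only a numerical spot check under assumed sample values, not a substitute for the index proof above; the helper functions are illustrative.

```python
def cross(A, B):
    """Cross product of two 3-component vectors."""
    return [
        A[1] * B[2] - A[2] * B[1],
        A[2] * B[0] - A[0] * B[2],
        A[0] * B[1] - A[1] * B[0],
    ]


def dot(A, B):
    """Dot product, A_i B_i summed over i."""
    return sum(a * b for a, b in zip(A, B))


def scale(s, A):
    """Multiply a vector by a real number."""
    return [s * a for a in A]


def subtract(A, B):
    """Component-by-component difference A - B."""
    return [a - b for a, b in zip(A, B)]


# Arbitrary integer sample vectors, so the comparison is exact.
A = [1, -2, 3]
B = [4, 0, -5]
C = [-7, 6, 2]

lhs = cross(A, cross(B, C))                                    # A x (B x C)
rhs = subtract(scale(dot(A, C), B), scale(dot(A, B), C))      # B(A.C) - C(A.B)

print(lhs)  # [-129, 66, 87]
print(rhs)  # [-129, 66, 87]
```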
There are some vector product identities in the back of the front cover of Griffiths’
E&M textbook. It would be great if you can try to prove some of those using these same
techniques. Also, to apply the above result, try to prove the following derivative identity:
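$$\nabla\times(\nabla\times\mathbf{A}) = \nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A}, \tag{1.22}$$
where $\nabla$ is the gradient operator
$$\nabla \equiv \hat{x}_i \partial_i \equiv \hat{x}_i \frac{\partial}{\partial x_i}, \tag{1.23}$$
and $\nabla^2$ is the Laplacian,
$$\nabla^2 = \nabla\cdot\nabla = \partial_i\partial_i = \sum_{i=1}^{3} \frac{\partial^2}{\partial x_i^2}. \tag{1.24}$$
Remember the names of these derivatives: $\nabla f$ is the gradient of a function f, and $\nabla\cdot\mathbf{A}$
is the divergence of the vector function $\mathbf{A}$, and $\nabla\times\mathbf{A}$ is the curl. The identity (1.22) will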
be useful in deriving the wave equation that electromagnetic waves satisfy from Maxwell’s
equations.
2. The Wave Equation
We will now derive the wave equation satisfied by electromagnetic waves traveling through
vacuum. Maxwell’s equations in vacuum read
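$$\begin{aligned}
\nabla\cdot\mathbf{E} &= 0, & \nabla\times\mathbf{E} &= -\dot{\mathbf{B}}, \\
\nabla\cdot\mathbf{B} &= 0, & \nabla\times\mathbf{B} &= \frac{1}{c^2}\dot{\mathbf{E}}.
\end{aligned} \tag{2.1}$$
Here an overdot denotes a partial derivative with respect to time. To compare these to
equations involving $\mu_0$ and $\epsilon_0$, remember that $\mu_0\epsilon_0 = \frac{1}{c^2}$.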
Using the two equations above involving the curl, we find
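$$\nabla\times(\nabla\times\mathbf{E}) = \nabla\times(-\dot{\mathbf{B}}) = -\partial_t(\nabla\times\mathbf{B}) = -\partial_t\left(\frac{1}{c^2}\dot{\mathbf{E}}\right) = -\frac{1}{c^2}\ddot{\mathbf{E}}. \tag{2.2}$$
Note that in the second step, I swapped the order of the curl and the time derivative. That
is perfectly fine: you are free to take these derivatives in whichever order you please.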
On the other hand, we can also use the vector derivative identity (1.22) applied to E
along with Maxwell’s equation ∇ ·E = 0:
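$$\nabla\times(\nabla\times\mathbf{E}) = \nabla(\nabla\cdot\mathbf{E}) - \nabla^2\mathbf{E} = -\nabla^2\mathbf{E}. \tag{2.3}$$
Setting (2.2) and (2.3) equal to each other derives the wave equation:
$$-\frac{1}{c^2}\ddot{\mathbf{E}} = -\nabla^2\mathbf{E} \quad\Longrightarrow\quad \Box\,\mathbf{E} = 0. \tag{2.4}$$
The differential operator, $\Box$, often called the d’Alembertian (or just “box”), is defined as
$$\Box \equiv -\frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \nabla^2. \tag{2.5}$$
The same derivation shows that $\mathbf{B}$ satisfies the exact same wave equation,
$$\Box\,\mathbf{B} = 0. \tag{2.6}$$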
In fact, you see this type of wave equation all over the place where you expect to see waves
of some sort - the wave equation is not special to electromagnetic waves. The things that
change from situation to situation are the quantity that satisfies the wave equation and
propagation speed. For electromagnetic waves in vacuum, the electric and magnetic fields
satisfy the wave equation and the speed c is the speed of light in vacuum. For sound in
air, c would be the speed of sound in air and the quantity satisfying the wave equation
would be the displacement of air molecules along the direction of propagation of the sound
wave relative to their equilibrium position. For transverse waves on a string, c would be
the speed of those particular waves, and the quantity satisfying the wave equation is the
displacement of the string up or down transverse to the direction in which it is stretched.
The propagation speed is related to properties of the material through which the wave
is propagating. For electromagnetic waves in vacuum, the speed is related to the magnetic
permeability, $\mu_0$, and electric permittivity, $\epsilon_0$, of vacuum via $c = 1/\sqrt{\mu_0\epsilon_0}$. In fact, the
discovery of this relationship between the speed of light and the electromagnetic properties
of vacuum led Maxwell to the discovery that light is an electromagnetic wave.
Let us assume for simplicity that air is a diatomic ideal gas. Let Ψ(t, x) be the dis-
placement relative to equilibrium of air molecules as a function of time along the direction
x, which is the direction of propagation of the sound wave. Then, one can show that Ψ
satisfies the wave equation
$$\Box\,\Psi = \left(-\frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \frac{\partial^2}{\partial x^2}\right)\Psi(t, x) = 0, \tag{2.7}$$
where the speed of sound in air, c, is related to the temperature, T, of the air and the
mass, m, of the air molecule via
$$c = \sqrt{\frac{7kT}{5m}}. \tag{2.8}$$
Here k is Boltzmann’s constant.
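To get a feel for (2.8), here is a short numerical sketch. The temperature (293.15 K, about room temperature) and the mean molecular mass of air (28.97 atomic mass units) are assumed illustrative values that do not appear in the notes.

```python
import math

BOLTZMANN_K = 1.380649e-23            # J/K (exact in the SI)
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def speed_of_sound(temperature_kelvin, molecule_mass_kg):
    """Speed of sound in a diatomic ideal gas, Eq. (2.8): c = sqrt(7kT / 5m)."""
    return math.sqrt(7 * BOLTZMANN_K * temperature_kelvin / (5 * molecule_mass_kg))


# Assumed values: room temperature and the mean molecular mass of air.
temperature = 293.15                  # K
mass_air = 28.97 * ATOMIC_MASS_UNIT   # kg

c_sound = speed_of_sound(temperature, mass_air)
print(f"speed of sound: {c_sound:.1f} m/s")
```

With these assumed inputs the result comes out at roughly 343 m/s, close to the commonly quoted speed of sound in room-temperature air.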
For transverse waves on a string, let Ψ(t, x) be the up and down (transverse) displace-
ment of a point at position x along the string as a function of time. One can show that
Ψ(t, x) satisfies the exact same wave equation as (2.7), but where the propagation speed is
related to the tension (a force), T, in the string and the mass per unit length,
µ, of the string via
$$c = \sqrt{\frac{T}{\mu}}. \tag{2.9}$$
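Aside: You may have learned Maxwell’s equations in integral form and with charges and
currents:
$$\oint \mathbf{E}\cdot d\mathbf{a} = \frac{Q_{\rm enc}}{\epsilon_0}, \tag{2.10a}$$
$$\oint \mathbf{B}\cdot d\mathbf{a} = 0, \tag{2.10b}$$
$$\oint \mathbf{E}\cdot d\boldsymbol{\ell} = -\dot{\Phi}_B, \tag{2.10c}$$
$$\oint \mathbf{B}\cdot d\boldsymbol{\ell} = \mu_0 I_{\rm enc} + \frac{1}{c^2}\dot{\Phi}_E. \tag{2.10d}$$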
The first two integrals are done over a closed surface, which is the boundary enclosing
some volume of space. Then, Qenc is the charge inside that volume of space. The last two
integrals are done over a closed loop, which is the boundary of some open surface. Then,
ΦE and ΦB are the electric and magnetic fluxes through that open surface (i.e., the surface
integral of the electric and magnetic fields over that open surface), and Ienc is the current
piercing that open surface. Let ρ and J be the volume charge and current densities. The
volume integral of ρ gives the total charge inside that volume, and the surface integral of
J gives the total current piercing that surface. Denote a volume of space by V and the
closed surface (or collection of closed surfaces) which is its boundary by ∂V . Denote an
open surface by S and the closed loop (or collection of closed loops) which is its boundary
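by ∂S. Then, we can write Maxwell’s equations as
$$\oint_{\partial V} \mathbf{E}\cdot d\mathbf{a} = \int_V \frac{\rho}{\epsilon_0}\, d^3x, \tag{2.11a}$$
$$\oint_{\partial V} \mathbf{B}\cdot d\mathbf{a} = 0, \tag{2.11b}$$
$$\oint_{\partial S} \mathbf{E}\cdot d\boldsymbol{\ell} = -\int_S \dot{\mathbf{B}}\cdot d\mathbf{a}, \tag{2.11c}$$
$$\oint_{\partial S} \mathbf{B}\cdot d\boldsymbol{\ell} = \int_S \left(\mu_0\mathbf{J} + \frac{1}{c^2}\dot{\mathbf{E}}\right)\cdot d\mathbf{a}. \tag{2.11d}$$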
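Now, we can make use of the divergence theorem and Stokes’ theorem,
$$\oint_{\partial V} \mathbf{E}\cdot d\mathbf{a} = \int_V (\nabla\cdot\mathbf{E})\, d^3x, \qquad \oint_{\partial S} \mathbf{E}\cdot d\boldsymbol{\ell} = \int_S (\nabla\times\mathbf{E})\cdot d\mathbf{a}, \tag{2.12}$$
to write Maxwell’s equations as
$$\int_V (\nabla\cdot\mathbf{E})\, d^3x = \int_V \frac{\rho}{\epsilon_0}\, d^3x, \tag{2.13a}$$
$$\int_V (\nabla\cdot\mathbf{B})\, d^3x = 0, \tag{2.13b}$$
$$\int_S (\nabla\times\mathbf{E})\cdot d\mathbf{a} = -\int_S \dot{\mathbf{B}}\cdot d\mathbf{a}, \tag{2.13c}$$
$$\int_S (\nabla\times\mathbf{B})\cdot d\mathbf{a} = \int_S \left(\mu_0\mathbf{J} + \frac{1}{c^2}\dot{\mathbf{E}}\right)\cdot d\mathbf{a}. \tag{2.13d}$$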
Since these equations hold for arbitrary V and S, we must have
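$$\begin{aligned}
\nabla\cdot\mathbf{E} &= \frac{\rho}{\epsilon_0}, & \nabla\times\mathbf{E} &= -\dot{\mathbf{B}}, \\
\nabla\cdot\mathbf{B} &= 0, & \nabla\times\mathbf{B} &= \mu_0\mathbf{J} + \frac{1}{c^2}\dot{\mathbf{E}}.
\end{aligned} \tag{2.14}$$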
These are Maxwell’s equations in differential form. Then, Maxwell’s equations in vacuum
are simply these without any charges or currents: ρ = 0 and J = 0. By the way, you should
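now be able to derive
$$\Box\,\mathbf{E} = \mu_0\dot{\mathbf{J}} + \frac{1}{\epsilon_0}\nabla\rho, \qquad \Box\,\mathbf{B} = -\mu_0\nabla\times\mathbf{J}. \tag{2.15}$$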
Indeed, these reduce to the wave equation in vacuum when we set ρ = 0 and J = 0.
3. Solving the Wave Equation
For simplicity, let us try to solve the one-dimensional wave equation first. Then we can
generalize our solution to three dimensions. One-dimensional wave equations look like
(2.7), where there is only one spatial dimension, which we have called x. This equation
is often described as a “linear differential equation”. That may strike you as peculiar; if
anything, the equation looks quadratic. People often get around this confusion by saying
that the more “advanced” meaning of the word “linear” differential equation is that the
sum of two arbitrary solutions to the equation is itself a solution. Even though there is
nothing wrong with that statement, there is no real need for such a redefinition. If a
differential equation is truly linear then there must exist a change of variables that really
truly makes the equation look linear. For the one-dimensional wave equation, the change
of variables is
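$$\tau \equiv \tfrac{1}{2}(x + ct), \qquad \sigma \equiv \tfrac{1}{2}(x - ct). \tag{3.1}$$
One can write the derivatives with respect to the old variables in terms of the new ones.
Define $\partial_t = \frac{\partial}{\partial t}$ and $\partial_x = \frac{\partial}{\partial x}$ as well as $\partial_\tau = \frac{\partial}{\partial\tau}$ and $\partial_\sigma = \frac{\partial}{\partial\sigma}$. Then,
$$\frac{1}{c}\partial_t = \frac{1}{c}\frac{\partial\tau}{\partial t}\partial_\tau + \frac{1}{c}\frac{\partial\sigma}{\partial t}\partial_\sigma = \frac{\partial_\tau - \partial_\sigma}{2}, \tag{3.2a}$$
$$\partial_x = \frac{\partial\tau}{\partial x}\partial_\tau + \frac{\partial\sigma}{\partial x}\partial_\sigma = \frac{\partial_\tau + \partial_\sigma}{2}. \tag{3.2b}$$
We could also invert these relations:
$$\partial_\tau = \partial_x + \tfrac{1}{c}\partial_t, \qquad \partial_\sigma = \partial_x - \tfrac{1}{c}\partial_t. \tag{3.3}$$
Therefore, the one-dimensional d’Alembertian can be written as
$$-\tfrac{1}{c^2}\partial_t^2 + \partial_x^2 = \left(\partial_x + \tfrac{1}{c}\partial_t\right)\left(\partial_x - \tfrac{1}{c}\partial_t\right) = \partial_\tau\partial_\sigma. \tag{3.4}$$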
So, we see that the one-dimensional wave equation (2.7), can be written as
∂τ∂σΨ = 0. (3.5)
Now it is clear that this is a linear differential equation - it is linear in τ and σ separately!
It is also very easy to solve now - either ∂τΨ = 0 or ∂σΨ = 0. Of course, you could have
both, but that just means Ψ is a constant, which is uninteresting, and does not actually
describe a traveling wave. In other words, Ψ can be an arbitrary function of τ , as long as
it does not depend at all on σ, or Ψ can be an arbitrary function of σ, as long as it does
not depend at all on τ . In general, Ψ can be a sum of these two things. Thus, the most
general solution can be written as
Ψ(t, x) = ΨL(τ) + ΨR(σ). (3.6)
The part of Ψ which just depends on τ is called “left-moving” (hence the subscript L) and
the part which just depends on σ is called “right-moving” (hence the subscript R).
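The claim that any sum of a function of τ and a function of σ solves the wave equation can be spot-checked numerically. The sketch below applies a central finite-difference approximation of the d’Alembertian (2.7) to one such sum. The wave speed, the two profile functions, the step size and the sample points are all arbitrary illustrative choices.

```python
import math

C = 2.0  # assumed wave speed (arbitrary units)


def psi_left(tau):
    """An arbitrary smooth function of tau only (left-moving part)."""
    return math.exp(-tau ** 2)


def psi_right(sigma):
    """An arbitrary smooth function of sigma only (right-moving part)."""
    return math.sin(3 * sigma)


def psi(t, x):
    """General solution (3.6) with tau and sigma defined as in (3.1)."""
    tau = 0.5 * (x + C * t)
    sigma = 0.5 * (x - C * t)
    return psi_left(tau) + psi_right(sigma)


def box_psi(t, x, h=1e-3):
    """Finite-difference estimate of (-1/c^2) d^2/dt^2 + d^2/dx^2 acting on psi."""
    d2_dt2 = (psi(t + h, x) - 2 * psi(t, x) + psi(t - h, x)) / h ** 2
    d2_dx2 = (psi(t, x + h) - 2 * psi(t, x) + psi(t, x - h)) / h ** 2
    return -d2_dt2 / C ** 2 + d2_dx2


# Residuals at a few sample points; they should vanish up to discretization error.
residuals = [abs(box_psi(t, x)) for t in (0.0, 0.3, 1.1) for x in (-1.0, 0.2, 2.5)]
print(max(residuals))
```

The printed residual is small compared with the individual second derivatives, which are of order one, and it shrinks further as the step h is reduced (until rounding error takes over).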
An interesting set of solutions are called plane wave solutions:
ΨL(τ) = A sin(2kτ + ϕ), ΨR(σ) = A sin(2kσ + ϕ). (3.7)
Here, |A| is the constant amplitude, k the constant wave number, and ϕ the constant phase
shift. I do not mean here that the left-moving part and the right-moving part of the general
solution have to have the same amplitude, wave number and phase shift - that certainly
need not be the case! If you want, you can put a subscript L or R on A, k and ϕ. I haven’t
done so because it will clutter these expressions unnecessarily. In general, one can treat
the left- and right-moving parts completely independently of each other.
As far as solving the wave equation is concerned, there is absolutely nothing special
about the sine function. The utility of these “plane wave” solutions is that any arbitrary
solution to the wave equation can be written as some sort of superposition of plane wave
solutions. Therefore, we don’t actually lose any generality by focusing only on these special
solutions.
In terms of the old time and position variables, (3.7) reads
ΨL(t, x) = A sin(kx+ ωt+ ϕ), ΨR(t, x) = A sin(kx− ωt+ ϕ), (3.8)
where
ω = kc. (3.9)
The wave number, k, is related to the wavelength via $k = 2\pi/\lambda$. The angular frequency is ω,
which is related to the frequency, ν, via ω = 2πν. The relation (3.9), which relates ω and
k, is in general called a dispersion relation.
We could also have used cosines instead, but cosines and sines are related by a π/2
phase shift. So, with an arbitrary phase shift, ϕ, including cosines would be redundant.
Actually, when we increase the number of space dimensions from one, it becomes more
convenient to fix the sign of the ωt term to be negative and allow k to be positive or
negative:
Ψ(t, x) = A sin(kx− ωt+ ϕ). (3.10)
With this convention, we should write $|k| = 2\pi/\lambda$ and $\omega = |k|c$ because it is possible for k
to be negative. If k is positive, then this wave is propagating in the +x direction (right-
moving) and if k is negative, then it is moving in the −x direction (left-moving). For
example, below, I have graphed the Gaussian pulse function $e^{-k^2(x+ct)^2}$ over x for
the times t = 0, 4, 8, 12. I have set k = 1 and c = 1 just for convenience.
You can see that the Gaussian pulse moves to the left over time. Indeed, $e^{-k^2(x+ct)^2}$ is a
function of x + ct, or of τ, which is the so-called left-moving coordinate, but not of x − ct,
or of σ, which is the so-called right-moving coordinate.
[Figure: four panels showing the Gaussian pulse $e^{-k^2(x+ct)^2}$ (vertical axis, 0 to 1.0) versus x (horizontal axis, −15 to 0) at t = 0, 4, 8 and 12, with k = 1 and c = 1.]
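The leftward motion of the Gaussian pulse can be reproduced without a plot by locating its maximum on a grid of x values at each of the four times. This is a minimal sketch; the grid range matches the horizontal axis of the figure, while the grid resolution and the function names are assumptions made for illustration.

```python
import math


def pulse(t, x, k=1.0, c=1.0):
    """Gaussian pulse exp(-k^2 (x + c t)^2), a function of x + ct only."""
    return math.exp(-(k ** 2) * (x + c * t) ** 2)


def peak_position(t, x_min=-15.0, x_max=0.0, steps=1500):
    """Grid point in [x_min, x_max] where the pulse is largest at time t."""
    xs = [x_min + (x_max - x_min) * n / steps for n in range(steps + 1)]
    return max(xs, key=lambda x: pulse(t, x))


for t in (0, 4, 8, 12):
    print(f"t = {t:2d}: peak at x = {peak_position(t):.2f}")
```

The peak sits at x = −ct, so with c = 1 it moves from 0 to −4, −8 and −12, consistent with a left-moving wave.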
The generalization of (3.10) to higher dimensions is
Ψ(t,x) = A sin(k · x− ωt+ ϕ), (3.11)
where the wavenumber, k, becomes a wavevector, $\mathbf{k}$, and $\mathbf{x} = (x, y, z) = x\hat{x} + y\hat{y} + z\hat{z}$. The
direction of $\mathbf{k}$, which is $\hat{k} \equiv \frac{\mathbf{k}}{|\mathbf{k}|}$, is the direction of propagation of the wave.
The cosine and sine functions may be written in terms of complex exponentials:
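$$\cos x = \frac{e^{ix} + e^{-ix}}{2}, \qquad \sin x = \frac{e^{ix} - e^{-ix}}{2i}. \tag{3.12}$$
The inverse relationships are
$$e^{\pm ix} = \cos x \pm i\sin x. \tag{3.13}$$
Therefore, instead of considering plane waves of the form (3.11), it is often computationally
simpler to consider the form
$$\Psi(t, \mathbf{x}) = A\,e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \varphi)}. \tag{3.14}$$
In this form, derivatives of plane waves simply turn into multiplication:
$$\dot{\Psi} = -i\omega\Psi, \qquad \nabla\Psi = i\mathbf{k}\Psi. \tag{3.15}$$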
3.1. Electromagnetic Plane Waves
Let us consider plane wave solutions to the wave equation satisfied by E and B:
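$$\mathbf{E} = \mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \varphi)}, \qquad \mathbf{B} = \mathbf{B}_0\, e^{i(\mathbf{k}'\cdot\mathbf{x} - \omega' t + \varphi')}. \tag{3.16}$$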
Here, |E0| and |B0| are the amplitudes of the electric and magnetic fields, respectively.
Since the fields separately satisfy the wave equation, at this point, there is no need for
their wavevectors, frequencies and phase shifts to be the same. That is why we have put
primes on those quantities in the magnetic field. However, Maxwell’s equations imply that
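they are actually the same. Maxwell’s equations read
$$\mathbf{k}\cdot\mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \varphi)} = 0, \tag{3.17a}$$
$$\mathbf{k}'\cdot\mathbf{B}_0\, e^{i(\mathbf{k}'\cdot\mathbf{x} - \omega' t + \varphi')} = 0, \tag{3.17b}$$
$$\mathbf{k}\times\mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \varphi)} = \omega'\,\mathbf{B}_0\, e^{i(\mathbf{k}'\cdot\mathbf{x} - \omega' t + \varphi')}, \tag{3.17c}$$
$$\mathbf{k}'\times\mathbf{B}_0\, e^{i(\mathbf{k}'\cdot\mathbf{x} - \omega' t + \varphi')} = -\frac{\omega}{c^2}\,\mathbf{E}_0\, e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t + \varphi)}. \tag{3.17d}$$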
These equations must hold for all t and x. Either one of the last two equations immediately
implies that
k′ = k, ω′ = ω, ϕ′ = ϕ. (3.18)
Now, we can write Maxwell’s equations more simply as
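$$\begin{aligned}
\mathbf{k}\cdot\mathbf{E} &= 0, & \mathbf{k}\times\mathbf{E} &= \omega\mathbf{B}, \\
\mathbf{k}\cdot\mathbf{B} &= 0, & \mathbf{k}\times\mathbf{B} &= -\frac{\omega}{c^2}\mathbf{E}.
\end{aligned} \tag{3.19}$$
We have already noted that the wavevector, $\mathbf{k}$, points in the direction of propagation of
the wave. Maxwell’s equations in the above form say that $\mathbf{E}$ and $\mathbf{B}$ are perpendicular to $\mathbf{k}$.
Thus, electromagnetic waves are transverse waves. In addition, $\mathbf{E}$ and $\mathbf{B}$ are perpendicular
to each other with directions related via
$$\mathbf{E}\times\mathbf{B} \propto \mathbf{k}, \tag{3.20}$$
and magnitudes related via
$$|\mathbf{k}||\mathbf{E}| = \omega|\mathbf{B}| \quad\Longrightarrow\quad |\mathbf{E}| = c|\mathbf{B}|. \tag{3.21}$$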
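The relations (3.19) through (3.21) can be illustrated with concrete amplitude vectors. In the sketch below, the wavelength (500 nm), the field amplitude (100 V/m) and the choice of axes are assumed values picked only for illustration; the magnetic amplitude is computed from k × E = ωB and the remaining relations are then checked.

```python
import math

C_LIGHT = 299_792_458.0  # m/s


def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


# Assumed setup: wave travelling along +x with E along +y.
wavelength = 500e-9                                  # m (illustrative)
k_vec = (2 * math.pi / wavelength, 0.0, 0.0)         # rad/m
omega = C_LIGHT * norm(k_vec)                        # dispersion relation (3.9)
E0 = (0.0, 100.0, 0.0)                               # V/m (illustrative)

# Magnetic amplitude from k x E = omega B, Eq. (3.19).
B0 = tuple(component / omega for component in cross(k_vec, E0))

print("k . E0 =", dot(k_vec, E0))            # transverse: zero
print("k . B0 =", dot(k_vec, B0))            # transverse: zero
print("|E0| / |B0| =", norm(E0) / norm(B0))  # close to c
print("E0 x B0 =", cross(E0, B0))            # points along +x, like k
```

The last Maxwell equation in (3.19), k × B = −(ω/c²)E, is then satisfied automatically, which can be confirmed by evaluating `cross(k_vec, B0)`.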
4. Poynting Vector and Flux
Suppose you had a pipe with cross-sectional area, A, and through which water of mass
density, ρ, is flowing at a speed, v. A reasonable question to ask would be “How much
water (i.e. mass) is passing through the pipe in a given amount of time?” If we divide
this quantity by the cross-sectional area through which the water flows, then we get the
flux (of water mass) with units of mass/(area·time). Well, in a given time, ∆t, the water travels a
distance ∆x = v∆t. This means that all the water a distance less than or equal to ∆x to
the left of a particular point along the pipe will pass through that point in the given time
∆t. The volume of this region is A∆x = Av∆t and so the mass of water contained in this
region is ρAv∆t. This is the total mass that passes through the area A in a time interval
∆t. Therefore, the flux is just this divided by the area, A, and the time interval, ∆t:
Φ = ρv. (4.1)
Let us make a formal analogy in the case of electromagnetic waves. In this case, we would
like to measure the energy flux (how much energy flows per unit area per unit time). For
the water example, when we wanted mass flux, we multiplied mass density by speed, as in
Eqn. (4.1). Therefore, if we want the energy flux, we need to multiply the energy density
by the speed. The speed of the electromagnetic wave is just c. The electric and magnetic
fields carry energy density given by
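$$u_E = \tfrac{1}{2}\epsilon_0|\mathbf{E}|^2, \qquad u_B = \tfrac{1}{2\mu_0}|\mathbf{B}|^2. \tag{4.2}$$
The total energy density is just the sum of these two. Using Eqn. (3.21), we may write
$$u \equiv u_E + u_B = \frac{1}{\mu_0 c}|\mathbf{E}||\mathbf{B}|. \tag{4.3}$$
Therefore, the energy flux is
$$\Phi = \frac{1}{\mu_0 c}|\mathbf{E}||\mathbf{B}|\,c = \frac{1}{\mu_0}|\mathbf{E}||\mathbf{B}|. \tag{4.4}$$
Since $\mathbf{E}$ and $\mathbf{B}$ are perpendicular to each other, we could write $\Phi = \frac{1}{\mu_0}|\mathbf{E}\times\mathbf{B}|$. Since we
also know that $\mathbf{E}\times\mathbf{B} \propto \mathbf{k}$, which is the direction of propagation of the wave, we define the
Poynting vector,
$$\mathbf{S} \equiv \frac{1}{\mu_0}\mathbf{E}\times\mathbf{B}, \tag{4.5}$$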
whose magnitude is simply the energy flux and whose direction is the propagation direction.
Since the momentum and energy of light are related via p = E/c, the momentum flux
is
P ≡ S/c. (4.6)
The magnitude, P, of this is the rate at which momentum is passing through some area,
per unit area. If it is perfectly absorbed by a surface, then it is equal to the radiation
pressure exerted on that surface. If it is perfectly reflected back by a surface, then the
radiation pressure exerted on the surface is twice as big.
4.1. Red Laser Pointer
The output of a red laser pointer (λ = 635 nm) has a beam power of 10.0 mW and a beam
diameter of 1.00 mm. It is propagating in vacuum in the +x direction and is polarized
in the y direction. Write down an expression for the electric and magnetic fields in the
beam and the Poynting vector as a function of time and position. If this beam illuminates
a surface that absorbs 40% and reflects 60%, find the net force on the surface due to ra-
diation pressure. [Assume uniform irradiance across the beam’s cross-section. Note: the
polarization is the direction of the electric field with +y and −y directions being counted
as the same.]
SOLUTION:
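The wavevector is $\mathbf{k} = \frac{2\pi}{\lambda}\hat{x} \approx (10^7\ \mathrm{rad/m})\,\hat{x}$. It is in the x direction because that is
the direction of propagation. Therefore, $\mathbf{k}\cdot\mathbf{x} = (10^7\ \mathrm{rad/m})\,x$. The angular frequency is
related to the wavevector via $\omega = |\mathbf{k}|c \approx 3\times10^{15}$ rad/s. Since the polarization is in the
y direction, the amplitude of the electric field is $\mathbf{E}_0 = E_0\hat{y}$. Now, $\mathbf{E}\times\mathbf{B} \propto \hat{k} = \hat{x}$, which
implies that $\mathbf{B} \propto \hat{z}$, or $\mathbf{B}_0 = B_0\hat{z}$. Since $|\mathbf{B}| = |\mathbf{E}|/c$, we can also write $\mathbf{B}_0 = \frac{1}{c}E_0\hat{z}$. The
amplitude of the Poynting vector is $\mathbf{S}_0 = \frac{1}{\mu_0}\mathbf{E}_0\times\mathbf{B}_0 = \frac{1}{\mu_0 c}E_0^2\,\hat{x} = \epsilon_0 c E_0^2\,\hat{x}$. As usual, since
$\mathbf{E}$ and $\mathbf{B}$ oscillate identically in time, the average Poynting vector is half of its amplitude:
$\bar{\mathbf{S}} = \frac{1}{2}\epsilon_0 c E_0^2\,\hat{x}$. The irradiance is simply the magnitude of the average Poynting vector:
$I = |\bar{\mathbf{S}}| = \frac{1}{2}\epsilon_0 c E_0^2$. This is equal to the power per unit area:
$$\frac{1}{2}\epsilon_0 c E_0^2 = I = \frac{P}{A} = \frac{P}{\pi r^2},$$
where r is the radius of the beam.
Solving for $E_0$ and using $\epsilon_0 = 8.9\times10^{-12}\ \mathrm{J\cdot V^{-2}\cdot m^{-1}}$ gives
$$E_0 = \sqrt{\frac{2P}{\pi r^2\epsilon_0 c}} = 3.10\times10^{3}\ \mathrm{V/m}.$$
Note that $r = 5\times10^{-4}$ m (half the diameter). We also therefore have
$$B_0 = \frac{E_0}{c} = 1.03\times10^{-5}\ \mathrm{T}.$$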
Here, T stands for Teslas, which is the metric unit for magnetic fields.
Therefore, the electric and magnetic fields are
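$$\mathbf{E} = (3.10\times10^{3}\ \mathrm{V/m})\,\hat{y}\,\cos\left[(10^7\ \mathrm{rad/m})\,x - (3\times10^{15}\ \mathrm{rad/s})\,t + \phi\right]$$
$$\mathbf{B} = (1.03\times10^{-5}\ \mathrm{T})\,\hat{z}\,\cos\left[(10^7\ \mathrm{rad/m})\,x - (3\times10^{15}\ \mathrm{rad/s})\,t + \phi\right]$$
There is an arbitrary phase, φ, which we can dial to whatever we want depending on when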
we choose t = 0 to be. Since φ is arbitrary, you could have used sines instead of cosines.
You could also have used the complex exponential form, if you wish, as long as you keep
in the back of your mind that the actual fields are the real parts or the imaginary parts
since the fields can’t be complex.
Recall that half of the amplitude of the Poynting vector is equal to the irradiance,
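$P/\pi r^2 = 1.27\times10^{4}\ \mathrm{W\cdot m^{-2}}$. Thus,
$$\mathbf{S} = (2.55\times10^{4}\ \mathrm{W/m^2})\,\hat{x}\,\cos^2\left[(10^7\ \mathrm{rad/m})\,x - (3\times10^{15}\ \mathrm{rad/s})\,t + \phi\right].$$
The average of the $\cos^2$ term is just 1/2. The average momentum flux is
$$\bar{\mathbf{P}} = \bar{\mathbf{S}}/c = \mathbf{S}_0/2c = (4.24\times10^{-5}\ \mathrm{N/m^2})\,\hat{x}.$$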
By momentum conservation, if this is absorbed by a surface, then this momentum is trans-
ferred to the surface. If it is perfectly reflected, then the momentum flux of the beam after
the reflection is −P . Conservation of momentum implies that 2P must be transferred to
the surface so that the total is still P . That is, the reflected case gives twice as much
pressure as the absorbed case. Therefore, the radiation pressure on the surface is
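$$0.4|\bar{\mathbf{P}}| + 0.6|2\bar{\mathbf{P}}| = 1.6|\bar{\mathbf{P}}| = 6.79\times10^{-5}\ \mathrm{N/m^2}.$$
The force would be this pressure multiplied by the beam area:
$$F = (6.79\times10^{-5}\ \mathrm{N/m^2})\left[\pi\times(5\times10^{-4}\ \mathrm{m})^2\right] = 5.33\times10^{-11}\ \mathrm{N}.$$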
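The arithmetic in this solution can be reproduced in a few lines. The sketch below follows the same steps as the text; the variable names are illustrative, and the values used for ε0 and c are standard reference values rather than the rounded ε0 quoted above, so the last digit of a result may differ slightly from the numbers in the text.

```python
import math

EPSILON_0 = 8.8541878128e-12   # F/m, standard reference value
C_LIGHT = 299_792_458.0        # m/s

# Problem data from the statement above.
wavelength = 635e-9            # m
power = 10.0e-3                # W
radius = 0.5e-3                # m (half of the 1.00 mm diameter)
absorbed_fraction = 0.40
reflected_fraction = 0.60

area = math.pi * radius ** 2
irradiance = power / area                                # I = P / (pi r^2)
E0 = math.sqrt(2 * irradiance / (EPSILON_0 * C_LIGHT))   # from I = (1/2) eps0 c E0^2
B0 = E0 / C_LIGHT
k = 2 * math.pi / wavelength
omega = k * C_LIGHT

# Average momentum flux; reflection transfers twice the momentum of absorption.
momentum_flux = irradiance / C_LIGHT
pressure = (absorbed_fraction + 2 * reflected_fraction) * momentum_flux
force = pressure * area

print(f"k     = {k:.3e} rad/m")
print(f"omega = {omega:.3e} rad/s")
print(f"E0    = {E0:.3e} V/m")
print(f"B0    = {B0:.3e} T")
print(f"I     = {irradiance:.3e} W/m^2")
print(f"P_rad = {pressure:.3e} N/m^2")
print(f"F     = {force:.3e} N")
```

Note that the beam area cancels in the force, F = 1.6 P/c, so the force depends only on the beam power and the absorbed and reflected fractions.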
5. Ray Tracing Diagrams for Mirrors
Consider a concave mirror whose radius of curvature is 10.0 cm. Draw ray-tracing diagrams
when the object is
(a) real and sits 20.0 cm in front of the mirror;
(b) real and sits 7.0 cm in front of the mirror;
(c) real and sits 2.0 cm in front of the mirror;
(d) virtual and sits 10.0 cm behind the mirror.
I’ll leave it to you to check that these diagrams agree numerically with the results of the
formulae $\frac{1}{d_i} + \frac{1}{d_o} = \frac{2}{R}$ and $m = -\frac{d_i}{d_o}$.
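For the numerical check suggested above, here is a minimal sketch that evaluates the mirror equation for the four cases. It assumes the usual sign convention, which the notes do not spell out here: distances are positive for real objects and images (in front of the mirror) and negative for virtual ones (behind it). The function names are illustrative.

```python
def image_distance(object_distance, focal_length):
    """Solve 1/d_i + 1/d_o = 1/f for d_i (same length units as the inputs)."""
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)


def magnification(object_distance, image_dist):
    """Linear magnification m = -d_i / d_o."""
    return -image_dist / object_distance


radius_of_curvature = 10.0                 # cm, concave mirror
focal_length = radius_of_curvature / 2.0   # 1/f = 2/R

# Object distances in cm; negative means a virtual object behind the mirror.
cases = {"a": 20.0, "b": 7.0, "c": 2.0, "d": -10.0}

results = {}
for label, d_o in cases.items():
    d_i = image_distance(d_o, focal_length)
    results[label] = (d_i, magnification(d_o, d_i))
    print(f"({label}) d_o = {d_o:6.1f} cm -> d_i = {d_i:6.2f} cm, m = {results[label][1]:5.2f}")
```

Under this convention a positive d_i indicates a real image and a negative m indicates an inverted image, so case (c) is the only one that gives a virtual image, which is upright and enlarged.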
6. Ray Tracing Diagrams for Lenses
Consider a lens whose focal length has magnitude 10.0 cm. Draw ray-tracing diagrams for
the following scenarios:
(a) The object is real and sits 20.0 cm in front of the converging lens;
(b) The object is real and sits 20.0 cm in front of the diverging lens;
(c) The object is virtual and sits 5.0 cm behind the converging lens;
(d) The object is virtual and sits 5.0 cm behind the diverging lens.
Again, I leave it to you to check that these diagrams agree with the results of the formulae
$\frac{1}{d_i} + \frac{1}{d_o} = \frac{1}{f}$ and $m = -\frac{d_i}{d_o}$.
7. Compound Optical Systems
Compound optical systems just have more than one lens and/or mirror, called optical
elements. Light goes from one optical element to the next. The image of optical element
1 becomes the object for optical element 2; the image of optical element 2 becomes the
object for optical element 3; and so on. It is only in this case that a virtual object can arise
because the image of the previous optical element may very well lie behind the following
optical element. Keep in mind that if the system contains mirrors, it is possible for one
physical lens or mirror to play the role of multiple optical elements. For example, if you
have a lens in front of and positioned parallel to a mirror, then light can come from an
object on one side of the lens, go through the lens, hit the mirror, bounce back, and then
go through the lens again! In this case, the lens plays the role of optical elements 1 and
3, while the mirror plays the role of optical element 2. In principle, you can imagine with
multiple mirrors, you can even have one physical mirror playing the role of infinitely many
optical elements. If you have ever been in a house of mirrors, you know well how this can
come about and what it’s like.
7.1. Two-Lens Problem
You have two lenses whose focal lengths have magnitude 10.0 cm, one converging and one
diverging. You want to place an object 20.0 cm in front of the first lens in such a way as
to produce a final image which is real, upright and twice as large as the object. Where can
you place the lenses in order to do this, and must you place the original object right side
up or upside down? Where is the final image?
SOLUTION:
Part (d) of the previous section produces a linear magnification of 2. Part (a) of the
previous section produces a linear magnification of −1. Combined appropriately, these
could produce a linear magnification of −2. We want the image to be upright. With a
negative linear magnification, the object would have to be upside down if we are going to
combine parts (a) and (d) to achieve the objective.
So, the first lens will be the converging one, and the object sits 20.0 cm to the left of
this lens and upside down. The image of this lens sits 20.0 cm to the right of this lens, is
right side up and equal in size to the object.
This image becomes the object for the second lens. According to part (d), we want
this object to be virtual and 5.0 cm to the right of the diverging lens. Therefore, place the
diverging lens 15.0 cm to the right of the converging lens. The final image will be 10.0 cm
to the right of the diverging lens, which is 25.0 cm to the right of the converging lens, or
45.0 cm to the right of the original object.
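The numbers in this solution can be checked by applying the thin-lens equation twice, using the image of the first lens as the object of the second. The sketch below assumes the usual sign convention (a negative object distance means a virtual object beyond the lens, and a positive image distance means a real image on the far side); the function names are illustrative.

```python
def thin_lens_image(object_distance, focal_length):
    """Solve 1/d_i + 1/d_o = 1/f for the image distance d_i."""
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)


def two_lens_system(object_distance, f1, f2, separation):
    """Chain two thin lenses; return (final image distance from lens 2, total magnification)."""
    d_i1 = thin_lens_image(object_distance, f1)
    m1 = -d_i1 / object_distance
    # The image of lens 1 is the object of lens 2. If it lies beyond
    # lens 2, the object distance is negative (virtual object).
    d_o2 = separation - d_i1
    d_i2 = thin_lens_image(d_o2, f2)
    m2 = -d_i2 / d_o2
    return d_i2, m1 * m2


# Values from the solution above (all in cm): converging lens first,
# diverging lens 15.0 cm to its right, object 20.0 cm in front.
final_image_distance, total_magnification = two_lens_system(
    object_distance=20.0, f1=10.0, f2=-10.0, separation=15.0
)

print(f"final image: {final_image_distance:.1f} cm to the right of the diverging lens")
print(f"total linear magnification: {total_magnification:.1f}")
```

The result is a real image 10.0 cm beyond the diverging lens with total magnification −2, which is why the object has to be placed upside down to get an upright final image.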
The black ray goes through the vertex of the first lens. It heads towards the would-be
image of the first lens, but hits the second lens first and gets bent upwards towards the
final image. It is not one of the rays with simple rules for the second lens, but nevertheless
we know that it must end up at the final image.
The blue ray goes parallel to the axis first, hits the first lens, gets bent towards the
focal point of the first lens. It heads towards the would-be image of the first lens, but hits
the second lens first and gets bent upwards towards the final image. Again, this is not one
of the rays with simple rules for the second lens, but nevertheless we know that it must
end up at the final image.
The red ray goes through the secondary focal point of the first lens, hits the first lens
and comes out parallel to the axis. It heads towards the would-be image of the first lens,
but hits the second lens first and gets bent upwards towards the final image. This is one of
the rays with simple rules for the second lens: the red ray on the right of the second lens
looks like it went through the focal point (the square dot) of the second lens.
The cyan and green rays are the other two with simple rules for the second lens. We
have just continued them backwards since we know they must originate from the object.
The green ray corresponds to the black ray in part (d) of the previous section and the cyan
ray corresponds to the red ray in part (d) of the previous section.
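The numbers in this solution can be checked by applying the thin-lens equation twice, feeding the image of the first lens to the second lens as its object. What follows is only a minimal sketch under the sign conventions used in these notes (real objects and real images have positive distances); the helper name `thin_lens_image` is made up for this illustration.

```python
def thin_lens_image(d_o, f):
    """Solve 1/d_o + 1/d_i = 1/f for d_i and return (d_i, m) with m = -d_i/d_o."""
    d_i = 1.0 / (1.0 / f - 1.0 / d_o)
    return d_i, -d_i / d_o


# Numbers from the problem (all in cm); the 15.0 cm separation is the one found above.
f_converging, f_diverging = 10.0, -10.0
lens_separation = 15.0
d_o1 = 20.0

# Lens 1: real object 20.0 cm in front of the converging lens.
d_i1, m1 = thin_lens_image(d_o1, f_converging)

# Lens 2: image 1 lies beyond the diverging lens, so its object distance
# comes out negative, i.e. it is a virtual object.
d_o2 = lens_separation - d_i1
d_i2, m2 = thin_lens_image(d_o2, f_diverging)

m_total = m1 * m2
print(f"image 1: d_i1 = {d_i1:.1f} cm, m1 = {m1:+.1f}")
print(f"object 2: d_o2 = {d_o2:.1f} cm")
print(f"image 2: d_i2 = {d_i2:.1f} cm, m2 = {m2:+.1f}")
print(f"total magnification = {m_total:+.1f}")
```

A positive `d_i2` means the final image is real, and the total magnification of −2 is why the object has to be placed up side down to end up with an upright final image.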
7.2. Two-Lens Demonstration
The previous problem is a warm-up to begin understanding the “demonstration” that I
brought in to section involving one converging and one diverging lens. The converging lens
will be “Lens 1” and the diverging lens will be “Lens 2”. The focal length of the converging
lens is f1 = +15.5 cm and the focal length of the diverging lens is f2 = −15.0 cm. I asked
you to hold the converging lens at arm’s length away from you and look at some object
somewhat far away across the room. What you found was that the image that you see
is smaller than and up-side-down relative to the original object. You can easily see this
with a ray-tracing diagram. Consider what happens to the diagram in part (a) of Section
6 when you move the object further and further to the left of the lens. The path of the
blue ray does not change! However, for example, the black ray gets closer and closer to the
horizontal axis. Therefore, where the black and blue rays intersect approaches the primary
focal point from the right side. The image remains up-side-down relative to the object and
it gets smaller and smaller. I had you hold the converging lens at arms length because the
image of that lens becomes the object for the lens of your eye. Therefore, it is as if your eye
is looking at a small up-side-down object whose distance in front of you is roughly equal
to your arm length minus a bit more than the focal length of the converging lens. If you
were to hold the converging lens too close to your eye, the image it produces would be very
close to your eye and your eye would have to strain as much as it does when you try to
look at any object very close to your eye.
The other reason why I had you hold the converging lens at arm’s length is that I then
wanted you to place the diverging lens very close to the converging lens but closer to you.
The main difference between this set-up and the one in the previous two-lens problem is
that the image of the first lens is a bit farther away from the second lens than the secondary
focal point of the second lens, whereas in the previous problem, the image of the first lens is
within the secondary focal point of the second lens. At this point, you now see an upright
image that is maybe a little larger than the original object and certainly much larger than
the image you saw with just the converging lens.
Then, I asked you to keep the converging lens fixed, but very slowly move the diverging
lens closer and closer towards your eye. You described the image you saw as getting larger
and larger. Then, at some point the image gets smaller and smaller and is up-side-down.
How can we make sense of this phenomenon?
Let’s do it mathematically first. Let the original object be “Object 1” and let it have
distance do1 relative to the converging lens, which is “Lens 1”. I asked you to look at an
object “far away”. But, what does that mean? Far away compared to what? The object
distance is a DISTANCE; it has units, namely meters. It can’t just be “big”, it has to be
big compared to something. In this case, we want it to be big relative to the focal length
of Lens 1. That is, we want do1 ≫ f1. Therefore, it makes sense to define the ratio
$$ a_{o1} \equiv \frac{d_{o1}}{f_1}. \tag{7.1} $$
This number is positive because the original object is real (and so do1 > 0) and the first
lens is converging (and so f1 > 0).
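A far-away object means large ao1, or ao1 ≫ 1. I can now literally say “large ao1” with
impunity because ao1 has no units; it’s just a number. Similarly, define
$$ a_{i1} \equiv \frac{d_{i1}}{f_1}. \tag{7.2} $$
Then, the lens equation for the first lens reads
$$ \frac{1}{d_{o1}} + \frac{1}{d_{i1}} = \frac{1}{f_1} \implies \frac{1}{a_{o1} f_1} + \frac{1}{a_{i1} f_1} = \frac{1}{f_1} \implies \frac{1}{a_{o1}} + \frac{1}{a_{i1}} = 1. \tag{7.3} $$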
Solving for ai1 gives
$$ a_{i1} = \frac{1}{1 - \frac{1}{a_{o1}}}. \tag{7.4} $$
Now, 1/ao1 ≪ 1 since ao1 ≫ 1. Therefore, we can Taylor expand the above result:
$$ a_{i1} = 1 + \frac{1}{a_{o1}} + \left(\frac{1}{a_{o1}}\right)^2 + \cdots. \tag{7.5} $$
Since di1 = ai1f1, we see that the image produced by the converging lens is real (that is,
di1 > 0) and the image is located just a bit further away from the lens than one focal
length. The further away the original object is, the bigger ao1 is, the closer ai1 gets to 1
(from above), the closer the image of the first lens gets to its focal point.
To see that the image is up-side-down and small, calculate the transverse magnification:
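$$ m_1 = -\frac{d_{i1}}{d_{o1}} = -\frac{a_{i1} f_1}{a_{o1} f_1} = -\frac{a_{i1}}{a_{o1}} = -\frac{\frac{1}{a_{o1}}}{1 - \frac{1}{a_{o1}}} = -\frac{1}{a_{o1}} - \left(\frac{1}{a_{o1}}\right)^2 - \cdots. \tag{7.6} $$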
Indeed, m1 is negative and small if ao1 is large.
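To get a feel for how good the expansions (7.5) and (7.6) are, one can compare the exact expression (7.4) with the first few terms of the series. This is just a sketch; the function names and the sample values of ao1 are my own choices for illustration.

```python
def a_i1_exact(a_o1):
    """Equation (7.4): dimensionless image distance of Lens 1."""
    return 1.0 / (1.0 - 1.0 / a_o1)


def a_i1_series(a_o1, n_terms=3):
    """First n_terms terms of the geometric series in equation (7.5)."""
    x = 1.0 / a_o1
    return sum(x**k for k in range(n_terms))


def m1_exact(a_o1):
    """Equation (7.6): transverse magnification of Lens 1."""
    return -a_i1_exact(a_o1) / a_o1


for a_o1 in (5.0, 50.0, 500.0):
    print(
        f"a_o1 = {a_o1:6.1f}: exact a_i1 = {a_i1_exact(a_o1):.6f}, "
        f"3-term series = {a_i1_series(a_o1):.6f}, m1 = {m1_exact(a_o1):+.6f}"
    )
```

As ao1 grows, the exact ai1 approaches 1 from above and m1 stays negative while shrinking in magnitude, which is the small up-side-down image seen in the demonstration.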
Now, we place the diverging Lens 2, with focal length f2 < 0, after Lens 1. We will write
f2 as −|f2| instead so that we never forget that it is actually negative! Note that the image
of Lens 1 may be behind Lens 2 (i.e., on the opposite side of Lens 2 as the side from which
the light is coming towards Lens 2). Define
$$ \delta \equiv \frac{\text{distance between Lens 1 and Lens 2}}{f_1}. \tag{7.7} $$
At the start of the demo, δ is small. Then,
do2 = δf1 − di1 = (δ − ai1)f1. (7.8)
Note that if δ < ai1, then do2 < 0, but if δ > ai1, then do2 > 0. That is, if the distance
between the two lenses is less than the distance of the image of Lens 1 from Lens 1, then
image 1 is a virtual object 2 for the second lens. However, if the distance between the two
lenses is greater than the distance of the image of Lens 1 from Lens 1, then image 1 is a
real object 2 for the second lens.
Again, define
$$ a_{o2} \equiv \frac{d_{o2}}{|f_2|} = (\delta - a_{i1}) \frac{f_1}{|f_2|}, \qquad a_{i2} \equiv \frac{d_{i2}}{|f_2|}. \tag{7.9} $$
A priori, we do not yet know whether the final image of Lens 2 is real or virtual. Therefore,
ai2 may be positive, in which case the image is real, or negative, in which case the image
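is virtual. Then, the lens equation for Lens 2 reads
$$ \frac{1}{d_{o2}} + \frac{1}{d_{i2}} = \frac{1}{f_2} \implies \frac{1}{a_{o2} |f_2|} + \frac{1}{a_{i2} |f_2|} = \frac{1}{-|f_2|} \implies \frac{1}{a_{o2}} + \frac{1}{a_{i2}} = -1. \tag{7.10} $$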
Solving for ai2 gives
$$ a_{i2} = -\frac{1}{1 + \frac{1}{a_{o2}}} = -\frac{1}{1 + \dfrac{1}{\left(\delta - \frac{1}{1 - \frac{1}{a_{o1}}}\right) \frac{f_1}{|f_2|}}}. \tag{7.11} $$
At this point, let us plug in some appropriate numbers,
$$ f_1 = 15.5\ \text{cm}, \qquad |f_2| = 15.0\ \text{cm}, \qquad a_{o1} \approx 50. \tag{7.12} $$
The value ao1 = 50 corresponds to an object 50× 15.5 cm = 7.75 m away from Lens 1.
Then, (7.11) becomes
$$ a_{i2} = \frac{1.02 - \delta}{\delta - 0.053} = \frac{15.8\ \text{cm} - \delta f_1}{\delta f_1 - 0.82\ \text{cm}}, \tag{7.13} $$
where we multiplied numerator and denominator by f1 = 15.5 cm to get the final expression.
When δf1 < 0.82 cm, meaning that the two lenses are less than 0.82 cm apart, we have
ai2 < 0, which means that the final image of the two-lens system is virtual. Also, ai2 has a
fairly large magnitude, starting out as about −19.4 at δ = 0 (when the lenses are right on
top of each other) and going to −∞ as we increase the separation of the two lenses towards
0.82 cm. If we separate the lenses a bit further, then ai2 becomes huge and positive;
this is a real image now very far away from the lenses. As we increase the separation, ai2
decreases until we reach a separation of 15.8 cm, at which point the image is technically
exactly at the location of the second lens. If we increase the separation even further, the
image becomes virtual again since ai2 becomes negative again.
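The behavior of ai2 described above can be tabulated directly from (7.9) and (7.11) with the numbers in (7.12). The sketch below is only an illustration: the function name `a_i2`, the names `pole_cm` and `zero_cm`, and the sample separations are all my own, and the code uses the algebraically equivalent form −ao2/(1 + ao2) so that nothing blows up when ao2 = 0.

```python
F1 = 15.5       # cm, focal length of the converging Lens 1
F2_ABS = 15.0   # cm, |f2| for the diverging Lens 2
A_O1 = 50.0     # object distance in units of f1, as in (7.12)

A_I1 = 1.0 / (1.0 - 1.0 / A_O1)   # equation (7.4)


def a_i2(separation_cm):
    """Dimensionless image distance of Lens 2, from equations (7.9) and (7.11)."""
    delta = separation_cm / F1
    a_o2 = (delta - A_I1) * F1 / F2_ABS
    # Same as -1 / (1 + 1/a_o2), but well defined at a_o2 = 0.
    return -a_o2 / (1.0 + a_o2)


# Separation at which a_i2 blows up (a_o2 = -1) and at which it vanishes (a_o2 = 0).
pole_cm = A_I1 * F1 - F2_ABS
zero_cm = A_I1 * F1

for s in (0.0, 0.5, 1.0, 5.0, 15.0, 16.0):
    print(f"separation {s:5.1f} cm -> a_i2 = {a_i2(s):8.2f}")
print(f"a_i2 diverges near {pole_cm:.2f} cm and vanishes near {zero_cm:.1f} cm")
```

The sign of `a_i2` flips from negative to positive across the first special separation and back to negative across the second, matching the virtual, real, virtual sequence described in the text.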
The transverse magnification of Lens 2 is
$$ m_2 = -\frac{d_{i2}}{d_{o2}} = -\frac{a_{i2}}{a_{o2}} = \frac{1}{1 + \left(\delta - \frac{1}{1 - \frac{1}{a_{o1}}}\right) \frac{f_1}{|f_2|}}. \tag{7.14} $$
The total transverse magnification is
$$ m = m_1 m_2 = -\frac{\frac{1}{a_{o1}}}{1 - \frac{1}{a_{o1}} + \left[\left(1 - \frac{1}{a_{o1}}\right)\delta - 1\right] \frac{f_1}{|f_2|}}. \tag{7.15} $$
Again, plugging in the numbers gives
$$ m = \frac{0.020}{0.053 - \delta} = \frac{0.31\ \text{cm}}{0.82\ \text{cm} - \delta f_1}. \tag{7.16} $$
Indeed, this agrees with our previous description of our observations. When the lenses are
very close together (δ ≈ 0), the image is upright (m > 0). As we increase the separation
towards 0.82 cm, the image gets bigger and bigger (m increases). Just beyond 0.82 cm
separation, m becomes huge and negative, and then m remains negative and approaches
zero as the separation is increased further.
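The same story can be read off numerically by multiplying (7.6) and (7.14) and comparing with the rounded formula (7.16). Again, this is only a sketch with the values from (7.12); the function names and sample separations are chosen for illustration.

```python
F1, F2_ABS, A_O1 = 15.5, 15.0, 50.0   # cm, cm, dimensionless; values from (7.12)


def total_magnification(separation_cm):
    """m = m1 * m2, using equations (7.6) and (7.14) without rounding."""
    a_i1 = 1.0 / (1.0 - 1.0 / A_O1)
    m1 = -a_i1 / A_O1
    a_o2 = (separation_cm / F1 - a_i1) * F1 / F2_ABS
    m2 = 1.0 / (1.0 + a_o2)
    return m1 * m2


def rounded_formula(separation_cm):
    """Right-hand side of (7.16), with its rounded coefficients."""
    return 0.31 / (0.82 - separation_cm)


for s in (0.0, 0.4, 0.7, 1.0, 5.0, 15.0):
    print(
        f"separation {s:5.1f} cm: m = {total_magnification(s):+8.3f}, "
        f"(7.16) gives {rounded_formula(s):+8.3f}"
    )
```

Close to 0.82 cm the two columns differ noticeably because the rounded coefficients in (7.16) shift the location of the pole slightly; away from it they agree well.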
To summarize, when the lenses are very close together, the image is virtual and pretty
far away in front of you. As you increase the separation of the lenses by moving Lens 2
towards you, the image grows and appears to move further away from you. As you cross
0.82 cm of separation, the image goes from infinitely large and upright infinitely far in
front of you to infinitely large and up-side-down infinitely far behind you. As you increase
the separation, the image remains up-side-down but gets smaller and smaller in size. At
a separation of 15.8 cm, the image goes from being real to virtual, but it still keeps on
getting smaller and remains up-side-down.
As a side note, an image in front of you is a real object for the lens of your eye. An
image behind you is a virtual object for the lens of your eye. While you cannot see a real
object behind you, it may be possible to see a virtual object behind you. A virtual object
behind you is nothing more than the image of optical elements in front of you. Please make
sure you understand this; ask us about it if this is unclear.
Now, let us try to understand this phenomenon using ray tracing diagrams. We have
already discussed what the diagram looks like for the converging lens, Lens 1. An
up-side-down and small image is formed a bit further away from Lens 1 than one focal length.
We draw this image as dashed because it is a virtual object for the diverging lens, Lens 2.
Initially, this virtual object is a bit further away from Lens 2 than one focal length. The
diagram might look like
As the lens is moved further to the right, the diagram might look like
Indeed, the image is upright further to the left and bigger. The image keeps moving further
to the left and increases in size until the virtual object sits right at the secondary focal
point of the lens. At this point, the outgoing light rays are exactly parallel and therefore
look like they are coming from a very large image infinitely far to the left:
Notice that the red light ray doesn’t really change as the lens approaches the virtual object.
The black ray just gets steeper and steeper. This continues as we move the lens even further
to the right. Clearly, this means that the black and red rays will actually
converge to the right of the lens, at a real and up-side-down image. This image is at first
very large and very far to the right, but gets smaller and smaller and closer and closer to
the lens as we move the lens to the right. The second “funky” point that we discovered
earlier, where the image again changes from being real to virtual is the point when Lens 2
passes the image formed by Lens 1 and the second object goes from being virtual to real.
8. Midterm 1 Quiz
(1) An electromagnetic wave is propagating in the +z direction. At some time and point in
space, the electric field points in the y direction. In which direction does the magnetic
field point?
Answer: B ∝ −x̂ at this point in space and at this time; that is, B points in the −x direction.
(2) Why are electromagnetic plane waves called “plane waves”? Explain with a drawing.
Answer: The electric field is identical (as is the magnetic field) at all points on the
same plane perpendicular to the direction of motion of the plane wave.
(3) A cube of index of refraction n sits in air. A light ray inside the cube hits one face
and gets totally internally reflected. It then hits an adjacent face and also gets totally
internally reflected. Calculate the minimum possible value of n.
Answer: If θ is the angle of incidence on the first face, then θ′ = π/2 − θ is the
angle of incidence on the second face. We need both θ and θ′ to be greater than or
equal to the critical angle, sin⁻¹(1/n). Thus, sin θ ≥ 1/n and sin θ′ ≥ 1/n. However,
sin θ′ = cos θ = √(1 − sin²θ) ≤ √(1 − (1/n)²). Therefore, 1/n ≤ √(1 − (1/n)²). We can square
both sides of the inequality without changing the direction of the inequality since both
sides are manifestly positive: (1/n)² ≤ 1 − (1/n)², or 1/n ≤ 1/√2. Inverting gives n ≥ √2 ≈ 1.4.
(4) If a lens is cut in half through a plane perpendicular to its surface, does it show only
half an image?
Answer: It still shows a full image, just a dimmer one.
(5) If your near-point distance is N , how close can you stand to a mirror and still be able
to focus on your image?
Answer: The image is virtual and is the same distance behind the mirror as you are
in front of the mirror. Therefore, you should stand no closer than N/2 from the mirror.
(6) When you open your eyes underwater, everything looks blurry. Explain.
Answer: Your eyes have an index of refraction roughly equal to that of water. Therefore,
if submerged in water, they cannot refract light and will not be able to focus light
rays to form real images on the retina.
(7) Would you benefit more from a magnifying glass if your near-point distance is 25 cm
or if it is 15 cm? Explain.
Answer: The angular magnification of a magnifying glass is M = N/f , where N is
the near point and f is the focal length of the lens. Therefore, the larger your near
point is, the more you can benefit from the magnifying glass.
(8) When you use a simple magnifying glass, does it matter whether you hold the object
to be examined closer to the lens than its focal length or farther away? Explain.
Answer: Yes it matters crucially. You must keep the object just within one focal
length of the lens in order to produce a large virtual image very far in front of your
eyes. If the object is beyond one focal length from the lens then a real image is produced
on your side of the lens likely behind you and therefore you will not be able to see it
clearly.
(9) Is the final image produced by a telescope real or virtual? Explain.
Answer: Virtual and far away in front of you.
(10) Two people are stranded on a deserted island. Both people wear glasses, though one
is nearsighted and the other is farsighted. Which person’s glasses should be used to
focus the rays of the Sun and start a fire? Explain.
Answer: Whoever has the converging lenses. A far-sighted person is able to converge
parallel light rays (coming from faraway objects) just fine, but is unable to focus
diverging light rays (from nearby objects) strongly enough to form a clear image at
the retina. Therefore, the far-sighted person needs converging lenses to help “beef up”
their eyes’ converging power. A near-sighted person has strongly converging eyes able
to converge the diverging light rays from nearby objects, but too strongly converges
parallel light rays from faraway objects. Therefore, the near-sighted person needs
diverging lenses to “handicap” their eyes’ converging power.
(11) You have two lenses: lens 1 with a focal length of 0.45 cm and lens 2 with a focal
length of 1.9 cm. If you construct a microscope with these lenses, which one should
you use as the objective? Explain.
Answer: You want the one with a shorter focal length to act as the eyepiece because
that acts as a magnifying glass on the image of the objective and because the angular
magnification of a magnifying glass is inversely proportional to its focal length. Therefore,
you want the 1.9 cm focal length lens to act as the objective lens and the 0.45 cm
lens to act as the eyepiece.
(12) Why is it restful to your eyes to gaze off into the distance?
Answer: I don’t know! But here’s some information on the matter. Most of the
refraction in your eye is actually performed by the cornea, which is a pocket of fluid at
the front of the eye which covers the lens and iris (Wikipedia says that 2/3 of the eye’s
refractive power comes from the cornea). As far as I know, nothing happens to the
cornea as our eyes adjust between looking at nearby and faraway things. They can be
reshaped temporarily or permanently, but by external methods. On the other hand,
the adjustments we make to clearly image objects at various distances are made to the
lens via ciliary muscles which are connected to the edge of the lens by tendons called
zonules. Muscles can only pull (i.e., contract). Pulling on the lens flattens it out and
reduces its converging power. This is what you want to do when looking at faraway
things since the light rays are reaching your eye basically parallel. Relaxing the ciliary
muscles a bit allows the lens to bulge a bit more at the center, which increases its
converging power. This is what you want to do when looking at nearby objects since
the light rays are diverging when they reach your eye. In fact, it happens that the
ciliary muscles are most contracted when looking at distant objects. So, why does
your eye feel relaxed when gazing off into the distance, which is precisely when your
ciliary muscles are most contracted? I don’t know! The best I can guess is that that’s
just the state that our eyes are used to. Also, a nonzero lever arm between the ciliary
muscles and the lens might account for this as well, but I don’t think there is one.
However, there is one thing that this helps us understand: the fact that our eyes
can only properly focus diverging light rays, not converging ones. To properly focus
already-converging light rays, we must decrease the converging power of our eye’s lens
even further compared to when we are looking at faraway things. Well that would
require the ciliary muscles to pull even harder on the lens. But, they can’t because
for some reason the eye is “designed” so that the ciliary muscles are most contracted
when looking at faraway things. In other words, we were not “designed” to see images
produced behind our eyes. I suppose that might make sense evolutionarily. I can’t
imagine an environmental stressor that would select that ability since lenses and such
are recent human inventions.
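As a numerical cross-check of the answer to question (3), here is a small sketch that asks, for a given index n, whether any incidence angle can produce total internal reflection at two adjacent faces of the cube. The function names and the two sample indices (1.5 for a typical glass and 1.33 for water) are my own choices for illustration.

```python
import math


def critical_angle(n):
    """Critical angle (radians) for light going from index n into air (index 1)."""
    return math.asin(1.0 / n)


def double_tir_possible(n):
    """True if some incidence angle gives total internal reflection at both faces.

    The two incidence angles are theta and 90 deg - theta, so both can reach the
    critical angle only if the critical angle itself is at most 45 deg.
    """
    return critical_angle(n) <= math.pi / 4 + 1e-12


n_min = 1.0 / math.sin(math.pi / 4)   # the index whose critical angle is exactly 45 deg

for n in (1.33, 1.5):
    print(f"n = {n}: critical angle = {math.degrees(critical_angle(n)):.1f} deg, "
          f"double TIR possible: {double_tir_possible(n)}")
print(f"minimum n = {n_min:.4f}")
```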
9. Interference
9.1. Laser Wavelength Measurement via Metal Ruler
Devise and explain a method for measuring the wavelength of a laser pointer chiefly using
a finely graded metal ruler (e.g. with 1/32 inch markings or smaller).
SOLUTION:
Consider reflecting the laser off of the ruler at a shallow angle. If there were no notches
on the surface of the ruler, then each point on the ruler where the light hits becomes a
source for outwardly spreading spherical waves. The superposition of these waves produces
wavefronts that travel in the direction of specular reflection. That is, on a far-away screen,
we get constructive interference only around the point of specular reflection, as expected.
Imagine we make wide notches with narrow reflective bands in between. Then, consider
the following diagram showing two adjacent light beams headed towards a far-away screen
(e.g. a wall) having reflected off of two adjacent reflective bands.
The optical path length difference is d(cosα − cosβ), which must be set equal to mλ for
some integer m for constructive interference. For a fixed α (angle at which we shine the
laser on the ruler), this gives discrete values for β where bright spots occur (i.e. we get a
diffraction pattern).
The claim is that we see the same thing if instead we have narrow non-reflective notches
with wider reflective bands. Can you think of why? Hint: superposition. This goes under
the name of Babinet’s principle, by the way.
Consider the following setup
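In terms of the distance L to the wall and the positions s0 (specular spot) and sm (m-th
order spot) on the wall, with tan α = s0/L and tan β = sm/L, this gives us an expression for
the wavelength
$$ \lambda = \frac{d}{m}\left[\frac{1}{\sqrt{1 + (s_0/L)^2}} - \frac{1}{\sqrt{1 + (s_m/L)^2}}\right]. $$
If you make α very small and use only low orders, then we can assume that sm/L ≪ 1:
$$ \lambda \approx \frac{d\,(s_m^2 - s_0^2)}{2 m L^2}. $$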
As an example, when I did this experiment at home, I used the marks on the ruler that
were d = 0.5 mm apart and the distance to the wall was L = 105 cm. I found s0 = 10.5
cm and s1 = 11.9 cm. Assuming small angles, this gives
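$$ \lambda = \frac{(5 \times 10^{-4}\ \text{m})\left[(11.9\ \text{cm})^2 - (10.5\ \text{cm})^2\right]}{2 \times 1 \times (105\ \text{cm})^2} \approx 711\ \text{nm}. $$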
That’s not bad! The wavelength should be around 635 nm. Before we rejoice, however,
we should note that a millimeter difference in any sm makes a huge difference in the final
answer. For example, if I change s1 to 11.8 cm, I get λ = 657 nm! So, unless I can
measure sm and L with very high precision, the uncertainties are likely to swamp the final
measurement of λ anyway.
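The sensitivity to the measured spot positions is easy to explore numerically. The sketch below evaluates both the full expression and the small-angle approximation for the numbers quoted above; the function names are made up for this illustration, and all lengths are converted to meters.

```python
import math


def wavelength_exact(d, L, s0, sm, m):
    """lambda = (d/m) * (cos(alpha) - cos(beta)) with tan(alpha) = s0/L, tan(beta) = sm/L."""
    cos_alpha = 1.0 / math.sqrt(1.0 + (s0 / L) ** 2)
    cos_beta = 1.0 / math.sqrt(1.0 + (sm / L) ** 2)
    return (d / m) * (cos_alpha - cos_beta)


def wavelength_small_angle(d, L, s0, sm, m):
    """Small-angle approximation: lambda ~ d * (sm^2 - s0^2) / (2 * m * L^2)."""
    return d * (sm**2 - s0**2) / (2.0 * m * L**2)


d, L, s0 = 0.5e-3, 1.05, 0.105   # ruler spacing, wall distance, specular spot (meters)

for s1 in (0.119, 0.118):        # first-order spot, as measured and shifted by 1 mm
    exact_nm = wavelength_exact(d, L, s0, s1, 1) * 1e9
    approx_nm = wavelength_small_angle(d, L, s0, s1, 1) * 1e9
    print(f"s1 = {s1 * 100:.1f} cm: small-angle {approx_nm:.0f} nm, exact {exact_nm:.0f} nm")
```

With s/L around 0.1, the full expression comes out roughly ten nanometers below the small-angle value, which is small compared with the shift caused by a one-millimeter change in s1.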
Note: In section, I claimed that if the laser light reflects off of a smooth metallic surface,
then we only get specular reflection. This is true only if the region on the surface that
is illuminated is much wider than the wavelength of the light. Well, our laser beam has
a width of a few millimeters, which will obviously do since its wavelength is on the order
of 10⁻⁴ mm! As calculated above, the extra optical path length travelled by one beam
relative to another that hits the surface a distance x to the left of it is x(cos α − cos β). So,
the phase shift is φ = 2π(x/λ)(cos α − cos β). Let a be the width of the illuminated region and
let x run from −a/2 to a/2 with the “zero phase” corresponding to x = 0. The intensity
is proportional to
$$ I \propto \left| \int_{-a/2}^{a/2} e^{2\pi i \frac{x}{\lambda}(\cos\alpha - \cos\beta)}\, dx \right|^2 \propto \frac{\sin^2 A}{A^2}, $$
where A = π(a/λ)(cos α − cos β).
In the limit a/λ ≫ 1, the intensity, suitably normalized, becomes a delta function:
$$ I \xrightarrow{\ a/\lambda \to \infty\ } \delta(\cos\alpha - \cos\beta). $$
Thus, the intensity vanishes everywhere except when A = 0, or when cos α = cos β, or
α = β, which is the condition for specular reflection!
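The integral above evaluates to (sin A/A)², and the narrowing of the pattern for a wide illuminated region can be seen numerically. The following is a rough sketch: it checks a midpoint-rule estimate of the integral against (sin A/A)² and then compares a narrow and a wide illuminated region at the same small offset from the specular direction. The function names, the variable u (standing for cos α − cos β), and the sample values are assumptions made for this illustration; a/λ = 5000 is meant to represent a beam a few millimeters wide.

```python
import cmath
import math


def intensity_numeric(a_over_lambda, u, steps=4000):
    """Midpoint-rule estimate of |(1/a) * integral of exp(2*pi*i*(x/lambda)*u) dx|^2.

    x runs over [-a/2, a/2] in units of lambda; u stands for cos(alpha) - cos(beta).
    """
    dx = a_over_lambda / steps
    total = 0j
    for k in range(steps):
        x = -a_over_lambda / 2 + (k + 0.5) * dx
        total += cmath.exp(2j * math.pi * x * u) * dx
    return abs(total / a_over_lambda) ** 2


def intensity_closed_form(a_over_lambda, u):
    """(sin A / A)^2 with A = pi * (a/lambda) * u, normalized to 1 at u = 0."""
    A = math.pi * a_over_lambda * u
    return 1.0 if A == 0 else (math.sin(A) / A) ** 2


# The numerical integral and the closed form should agree.
print(intensity_numeric(10.0, 0.03), intensity_closed_form(10.0, 0.03))

# Same small offset from specular reflection, narrow versus wide illuminated region.
u = 0.0011
for a_over_lambda in (10.0, 5000.0):
    print(f"a/lambda = {a_over_lambda:7.0f}: relative intensity = "
          f"{intensity_closed_form(a_over_lambda, u):.4f}")
```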
10. Thin-Film Interference
A piece of paper is wedged between the ends of two sheets of glass. The setup is illuminated
at normal incidence by cyan laser light (λ = 500 nm). Excluding the point where the two
glass sheets meet, you count about 400 dark interference fringes. Calculate the thickness
of the sheet of paper.
SOLUTION:
Let t be the thickness of the paper. Let ℓ be the length of the glass sheets. Let x be
the horizontal coordinate starting at 0 at the point where the two glass sheets meet and
increasing to the right up to ℓ. Let t(x) be the thickness
of the air gap between the two glass sheets as a function of the coordinate x. By similar
triangles,
$$ \frac{t(x)}{x} = \frac{t}{\ell} \implies t(x) = \frac{t}{\ell}\, x. $$
The two beams whose interference we care about are the ones shown below.
Of course, these rays are actually right on top of each other since the incidence is normal.
Also, since the paper is presumably very thin, any refraction at the interfaces is negligible.
Since ray 1 reflects off of a glass-air interface (high to low index of refraction), it does
NOT receive a π reflection phase shift. On the other hand, ray 2 does because it reflects
off of an air-glass interface (low to high index of refraction). Thus,
$$ \varphi_{\text{ref},1} = 0 \quad \text{and} \quad \varphi_{\text{ref},2} = \pi \implies \Delta\varphi_{\text{ref}} = \pi. $$
We will set ϕpath,1 = 0 since the only difference in path between 1 and 2 is that 2 goes
through a thickness, t(x), twice whereas 1 does not. Thus,
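$$ \varphi_{\text{path},1} = 0 \quad \text{and} \quad \varphi_{\text{path},2} = \frac{2\pi}{\lambda/n_{\text{air}}}\, 2t(x) = \frac{4 t(x)\pi}{\lambda} \implies \Delta\varphi_{\text{path}} = \frac{4 t(x)\pi}{\lambda}. $$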
Therefore,
$$ \Delta\varphi_{\text{tot}} = \Delta\varphi_{\text{path}} + \Delta\varphi_{\text{ref}} = \left(\frac{4 t(x)}{\lambda} + 1\right)\pi. $$
For the interference to be destructive (dark fringe), we must have
$$ \Delta\varphi_{\text{tot}} = \left(\frac{4 t(x)}{\lambda} + 1\right)\pi = (2m + 1)\pi, $$
where m is an integer. Remember: odd numbers of π are destructive while even numbers
of π are constructive. Thus,
$$ t(x) = \frac{t}{\ell}\, x = \frac{m\lambda}{2}. $$
The maximum value of t(x) is t, which occurs when x = `. The maximum value of m is
400 according to the problem statement. Therefore,
$$ t = \frac{m_{\max}\lambda}{2} = \frac{400 \times 5 \times 10^{-7}\ \text{m}}{2} = 10^{-4}\ \text{m} = 0.1\ \text{mm}. $$
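The bookkeeping of phases can be double-checked with a few lines of code. This is a minimal sketch that takes the index of refraction of the air gap to be exactly 1 (as the derivation above effectively does); the function and constant names are made up for this illustration.

```python
WAVELENGTH = 500e-9   # m, cyan laser light from the problem statement
N_AIR = 1.0           # index of refraction of the air gap, taken to be 1


def total_phase_in_units_of_pi(gap_thickness):
    """(path phase difference + reflection phase difference) / pi for the two rays."""
    path = 4.0 * gap_thickness * N_AIR / WAVELENGTH   # from 2*pi/(lambda/n) * 2t
    reflection = 1.0                                  # ray 2 picks up pi, ray 1 does not
    return path + reflection


def dark_fringe_gap(m):
    """Gap thickness of the m-th dark fringe, t(x) = m * lambda / 2."""
    return m * WAVELENGTH / (2.0 * N_AIR)


paper_thickness = dark_fringe_gap(400)
print(f"paper thickness = {paper_thickness * 1e3:.2f} mm")
print(f"phase at the 7th dark fringe = {total_phase_in_units_of_pi(dark_fringe_gap(7)):.1f} pi")
```

The phase at every dark fringe comes out as an odd multiple of π, as it should for destructive interference.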
11. Relativity
Newtonian mechanics is incorrect! Time is not just time is not just time! The passage of
time depends on your state of motion. How much time it takes to get from one point to the
next depends on the path you take. How big something is depends on its state of motion.
The order of events depends on the state of motion of the observer. All of these statements
might seem absurd, and they surely would have to most physicists before Einstein. However,
they are all direct consequences of two seemingly innocuous postulates often summarized
by the pithy statement that “the laws of physics are identical between all inertial reference
frames.” Actually, the statement that the speed of light is constant in all inertial reference
frames is already counter-intuitive because it implies that the speed of light is the same no
matter the state of motion of the source of that light. Certainly, the same cannot be said
of other projectiles like balls and bullets!
11.1. How to Measure the Length of a Moving Object
Before we explore some of the counter-intuitive consequences of the principle of special
relativity, we will need to know how to measure the length of a moving object. Suppose
an object (like a train) of unknown length is moving left-to-right relative to you and your
friends at some unknown speed. Can you devise a plan with your friends to measure the
length of the train?
Here is a method which some of you suggested in discussion section. You and one
of your friends synchronize clocks and decide to stand some fixed known distance apart
along the direction in which the train is moving such that the train reaches you before your
friend. You note the time when the front of the train passes you and when the back passes
you and your friend notes when the front of the train passes them. Then, you come back
together. You can determine the speed of the train via
$$ \text{speed of train} = \frac{\text{distance between you and your friend}}{\text{time front passes friend} - \text{time front passes you}}. \tag{11.1} $$
Then, you can determine the length of the train via
length of train = speed of train× (time back passes you− time front passes you). (11.2)
This requires you and your friend to synchronize your clocks. This is trickier than it may
seem at first. Remember that the passage of time depends on your state of motion and
certainly you and/or your friend have to be moving at some point if you first synchronize
your clocks when you are together and only then move apart! Even if you didn’t believe
in this relativity business, you must agree that your experimental method had better not
depend on your own prejudices, whether they be ultimately correct or incorrect.
Fortuitously, there is a simple way for you and your friend to synchronize clocks after
you are already apart and no longer moving relative to each other. You shine a light at
your friend at some time that you set to be “zero”. The light moves relative to either of
you at the same speed of about 3 × 10⁸ m/s. Since you know how far apart you are, when
your friend receives the light, he knows exactly how long ago your clock read “zero” and
he can set his clock appropriately.
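The two-observer procedure of (11.1) and (11.2), together with the light-signal synchronization just described, can be sketched in a few lines. Everything numerical here is hypothetical: the 100 m separation and the three recorded times are made-up values, and the function names are my own.

```python
C = 299_792_458.0   # speed of light in m/s


def friend_clock_setting(distance_m):
    """Reading your friend should set on receiving the flash you sent at your time zero."""
    return distance_m / C


def train_speed_and_length(distance_m, t_front_you, t_front_friend, t_back_you):
    """Apply (11.1) and then (11.2) to the three recorded times (in seconds)."""
    speed = distance_m / (t_front_friend - t_front_you)
    length = speed * (t_back_you - t_front_you)
    return speed, length


# Hypothetical records: you and your friend stand 100 m apart.
distance = 100.0
print(f"friend sets clock to {friend_clock_setting(distance):.3e} s on receiving the flash")

speed, length = train_speed_and_length(distance, t_front_you=0.0,
                                       t_front_friend=2.0, t_back_you=3.0)
print(f"speed = {speed:.1f} m/s, length = {length:.1f} m")
```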
Actually, we can line up you and millions of your friends (all with synchronized clocks)
along a line parallel and very close to the path of the train. Then, each one of you can
record the times when the front and back of the train pass you. If you then come together
at the end, you will find that the “front” times pair up between pairs of friends a distance
apart equal to the length of the train (as measured by you and your friends). The same
can be said of the “back times”. That is, you can pick one specific time and at that
time exactly one of you or your friends will have recorded the back passing them and one
will have recorded the front passing them. The distance between these two people is the
measured length of the train. This method allows us to measure the length of the train
without technically measuring the speed of the train first, even though the speed of the
train could easily be determined from this data as well. The following problem in the next
subsection will show why I prefer this method.
This discussion of synchronization of clocks brings up an interesting subtlety in statements
about relativity. Often questions are asked like “what do you observe?” or “what
does someone in such and such reference frame observe?” The “what” might be a time or
position or whatever else. Such statements are lazy and possibly misleading. Usually, what
is meant is not what is observed by one person at one instant in time, but rather what this
person can infer once he gathers time and position records from very many observers
scattered everywhere (technically, the limit of infinitely many such observers packed infinitely
closely). That is, you imagine a lattice of synchronized clocks everywhere in space and you
are asked what you can infer if you could take the readings from all of those clocks after
the process of interest is over. This way, you do not have to take into consideration the
finite amount of time it might take light from some event of interest to reach you, which
would drastically complicate matters.
So, for example, when you are asked what is the length of a moving object that is
observed by the person standing still, the question is really what would be measured by
the army of friends as described previously. The question is not asking what do you (the one
person standing still) actually see. That would be very different and far more complicated
because light from different points along the object takes different amounts of time to reach
your eyes!
11.2. Relativistic Train
People at rest relative to a train measure the length of the train to be L0 (this is the
train’s so-called proper length). Alice stands at the back of the train, Bob at the front
and Charlie at the middle. They have synchronized their clocks relative to each other.
The train travels at a speed v relative to the platform where the Stationmaster stands.
These people all decide to set the origin of space and time to be when Charlie passes the
Stationmaster.
(a) At the moment C passes S, C turns on a lightbulb. What time will A and B read on
their clocks when they see the light?
(b) What is the length of the train as measured in the reference frame of S? (You are not
being asked to derive the result. Just take it as an assumption.)
(c) In the reference frame of S, at what time(s) does the light reach A and B? What does
this tell you about simultaneity?
(d) In the reference frame of S, what time(s) registers on the clocks of A and B when the
light reaches A and B, respectively? What does this tell you about the synchronization
of clocks?
(e) A and B hold up mirrors to reflect the light back to C. What time does C measure
when he sees the reflections? What time is measured in the S reference frame? What
does this tell you about the ticking rate of the clocks on the train as observed by the
S reference frame?
SOLUTION:
(a) Let S′ be the rest frame of the train, which is the same as the reference frames of A,
B and C. Time and space coordinates measured in this frame will likewise be primed.
S can also stand for the reference frame of the Stationmaster and coordinates in this
frame will be unprimed.
Event 0 is when C passes S and turns on the lightbulb. By agreement, the spacetime
coordinates of this event in either reference frame are identically zero:
$$ (ct_0, x_0) = (ct'_0, x'_0) = (0, 0). \tag{11.3} $$
Event 1 is when the light reaches A. Relative to A, the light must travel a distance
of L0/2. Therefore,
$$ (ct'_1, x'_1) = \left(\frac{L_0}{2}, -\frac{L_0}{2}\right). \tag{11.4} $$
Event 2 is when the light reaches B. Similarly,
$$ (ct'_2, x'_2) = \left(\frac{L_0}{2}, \frac{L_0}{2}\right). \tag{11.5} $$
That is, the light reaches A and B at the same time, L0/(2c), as measured on the train.
(b) In reference frame S, the length of the train is contracted by a factor of γ:
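$$ L = \frac{L_0}{\gamma}, \quad \text{where} \quad \gamma = \frac{1}{\sqrt{1 - \beta^2}} \quad \text{and} \quad \beta = \frac{v}{c}. \tag{11.6} $$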
(c) In reference frame S, A actually moves towards the light that is headed towards her.
The speed of the light is still c (postulate 2). Therefore, the relative speed between A
and the light is c+ v = (1 + β)c. The distance that must be covered is not L0/2, but
L/2. Therefore, the time is
$$t_1 = \frac{L/2}{c+v} \implies ct_1 = \frac{L_0/2}{(1+\beta)\gamma} = \sqrt{\frac{1-\beta}{1+\beta}}\,\frac{L_0}{2}. \tag{11.7}$$
By the same argument, the relative speed between the light and Bob as measured in
S is c − v = (1 − β)c. The distance that must be covered is still L/2. Therefore, the
time is
$$t_2 = \frac{L/2}{c-v} \implies ct_2 = \frac{L_0/2}{(1-\beta)\gamma} = \sqrt{\frac{1+\beta}{1-\beta}}\,\frac{L_0}{2}. \tag{11.8}$$
Note that, even though ct′1 = ct′2 so that these two events (the light reaching Alice and
the light reaching Bob) occur simultaneously in the S′ reference frame, they do not
occur simultaneously in the S reference frame. In S, event 1 happens first, then event
2. The time difference is
$$c\Delta t \equiv ct_2 - ct_1 = \sqrt{\frac{1+\beta}{1-\beta}}\,\frac{L_0}{2} - \sqrt{\frac{1-\beta}{1+\beta}}\,\frac{L_0}{2} = \beta\gamma L_0. \tag{11.9}$$
Events that are simultaneous in one reference frame may not be simultaneous in another
reference frame. This is “loss of simultaneity”.
(d) Whether it is Alice looking at her watch or someone at rest on the platform immediately
by Alice when the light reaches her, they must agree on what Alice’s watch reads at
this moment, which we have already determined is $ct'_1 = L_0/2$. The same can be
said of Bob’s watch. Therefore, the reference frame S observes Alice’s watch read
$ct'_1 = L_0/2$ when their own watch reads $ct_1 = \sqrt{\frac{1-\beta}{1+\beta}}\,\frac{L_0}{2}$, and they observe Bob’s watch
read $ct'_2 = L_0/2$ when their own watch reads $ct_2 = \sqrt{\frac{1+\beta}{1-\beta}}\,\frac{L_0}{2}$. It follows that the clocks
of A and B are no longer synchronized in the S reference frame! The clock at the
back of the train (Alice’s) is systematically ahead of the clock at the front of the train
(Bob’s). In S, the two clocks show the same reading at times that differ by
$c\Delta t = \beta\gamma L_0$. Since the train clocks tick slowly by a factor of γ in S (see part (e)),
at any one instant in S the back clock’s reading leads the front clock’s by $\beta L_0$.
This is “loss of synchronicity”.
(e) In S′, after the light has reflected off of A or B’s mirror, it has to travel a further $L_0/2$
distance before returning to C. Therefore, both reflections reach C at the same time. Let us
call this event 3. Then,
$$(ct'_3, x'_3) = (L_0, 0). \tag{11.10}$$
As measured in reference frame S, the return trip of the light from B to C takes as
much time as it took for the light to go from C to A. The return trip of the light from
A to C takes as much time as it took for the light to go from C to B. Therefore,
the two reflections also come back to C at the same time in S, namely
$$ct_3 = ct_1 + ct_2 = \gamma L_0. \tag{11.11}$$
That is, between event 0, when C turns on the light, and event 3, when the light
returns to C, a time $L_0/c$ has elapsed in S′ while a time $\gamma L_0/c$ has elapsed in S. This is
“time dilation”. S observes clocks in S′ tick more slowly than his own.
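The results of parts (c) through (e) can be cross-checked numerically by Lorentz-transforming the three train-frame events into the platform frame. What follows is a minimal Python sketch, not part of the original solution. It assumes that lengths and $ct$ are measured in the same unit, that the train moves in the +x direction, and that A sits at the back of the train. The function names `to_platform` and `train_events` and the sample values β = 0.6, L0 = 1 are illustrative choices only.

```python
import math

def to_platform(ct_prime, x_prime, beta):
    """Transform train-frame coordinates (ct', x') into platform-frame (ct, x).

    The train frame S' moves in the +x direction at speed beta*c relative to S.
    """
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    ct = gamma * (ct_prime + beta * x_prime)
    x = gamma * (x_prime + beta * ct_prime)
    return ct, x

def train_events(L0, beta):
    """Platform-frame coordinates of events 1, 2 and 3 of the train problem."""
    in_train_frame = {
        1: (L0 / 2, -L0 / 2),  # light reaches A (back of the train)
        2: (L0 / 2, +L0 / 2),  # light reaches B (front of the train)
        3: (L0, 0.0),          # reflected light returns to C
    }
    return {k: to_platform(ct, x, beta) for k, (ct, x) in in_train_frame.items()}

if __name__ == "__main__":
    beta, L0 = 0.6, 1.0  # illustrative values only
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    events = train_events(L0, beta)
    ct1, ct2, ct3 = (events[k][0] for k in (1, 2, 3))
    # Compare with the closed forms (11.7), (11.8), (11.9) and (11.11).
    print("ct1:", ct1, "vs", math.sqrt((1 - beta) / (1 + beta)) * L0 / 2)
    print("ct2:", ct2, "vs", math.sqrt((1 + beta) / (1 - beta)) * L0 / 2)
    print("ct2 - ct1:", ct2 - ct1, "vs", beta * gamma * L0)
    print("ct3:", ct3, "vs", gamma * L0)
```

Each printed pair should agree up to floating-point rounding, for any β between 0 and 1.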
11.3. Passing Trains
This problem is a combination of several problems taken from “Introduction to Classical
Mechanics” by David Morin.
Charlie stands on a platform while Alice, in one train, and Bob, in another train, pass by
going in the same direction. Both trains have a proper length L. A’s speed is 4c/5, and
B’s speed is 3c/5. A starts out behind B.
(a) In C’s reference frame, how long does it take for A to overtake B (i.e., the time between
the front of A passing the back of B, and the back of A passing the front of B)?
(b) Same question, but in the reference frame of A.
(c) Same question, but in the reference frame of B.
(d) David moves from the back of B’s train to the front at a constant speed, such that he
coincides with both the event of the front of A passing the back of B and the back of
A passing the front of B. How long does the overtaking process take in D’s reference
frame?
(e) Verify that the interval between the two events E1 = front of A passes back of B, and
E2 = back of A passes front of B, is the same in all reference frames A, B, C and D.
SOLUTION:
(a) Let $x_{iS}$ and $ct_{iS}$ denote the position and time of event $E_i$ ($i = 1, 2$) in some reference
frame S, which can be A, B, C or D. Set the origin of coordinates to be event $E_1$:
$$x_{1A} = x_{1B} = x_{1C} = x_{1D} = ct_{1A} = ct_{1B} = ct_{1C} = ct_{1D} = 0. \tag{11.12}$$
Let $\gamma_{SS'}$ be the gamma factor associated with the motion of reference frame S as
viewed by reference frame S′. By definition, $\gamma_{SS} = 1$, of course, for any S. The gamma
factors of A and B as viewed by C are
$$\gamma_{AC} = \frac{1}{\sqrt{1-(4/5)^2}} = \frac{5}{3}, \qquad \gamma_{BC} = \frac{1}{\sqrt{1-(3/5)^2}} = \frac{5}{4}. \tag{11.13}$$
Let $L_{AS}$ and $L_{BS}$ be the length of A’s and B’s train, respectively, in the reference
frame S. By definition, $L_{AA} = L_{BB} = L$ are the proper lengths. $L_{AC}$ and $L_{BC}$ are
length contracted by the appropriate gamma factor:
$$L_{AC} = \frac{L}{\gamma_{AC}} = \frac{3L}{5}, \qquad L_{BC} = \frac{L}{\gamma_{BC}} = \frac{4L}{5}. \tag{11.14}$$
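Let $x^{\text{front}}_{AS}(t_S)$ and $x^{\text{front}}_{BS}(t_S)$ be the positions of the fronts of trains A and B, respectively,
in reference frame S as a function of the time in reference frame S, and similarly define
$x^{\text{back}}_{AS}(t_S)$ and $x^{\text{back}}_{BS}(t_S)$. Then,
$$x^{\text{front}}_{AC}(t_C) = \frac{4ct_C}{5}, \qquad x^{\text{back}}_{BC}(t_C) = \frac{3ct_C}{5}, \tag{11.15a}$$
$$x^{\text{back}}_{AC}(t_C) = -\frac{3L}{5} + \frac{4ct_C}{5}, \qquad x^{\text{front}}_{BC}(t_C) = \frac{4L}{5} + \frac{3ct_C}{5}. \tag{11.15b}$$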
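By definition, $t_{1C}$ is the time $t_C$ when $x^{\text{front}}_{AC} = x^{\text{back}}_{BC}$, which is indeed $t_{1C} = 0$
and happens when $x^{\text{front}}_{AC} = x^{\text{back}}_{BC} = 0$. The overtake is complete when $x^{\text{back}}_{AC} = x^{\text{front}}_{BC}$:
$$-\frac{3L}{5} + \frac{4ct_{2C}}{5} = x^{\text{back}}_{AC}(t_{2C}) = x^{\text{front}}_{BC}(t_{2C}) = \frac{4L}{5} + \frac{3ct_{2C}}{5}. \tag{11.16}$$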
Solving for $t_{2C}$ gives
$$ct_{2C} = 7L. \tag{11.17}$$
Just plug this back into $x^{\text{back}}_{AC}$ or $x^{\text{front}}_{BC}$ to get $x_{2C}$, the position where the back of A
passes the front of B as viewed by C’s reference frame. This is
$$x_{2C} = 5L. \tag{11.18}$$
Aside: We can also determine ct2C as follows. A must travel farther than B by an
excess distance equal to the sum of their lengths (as viewed by C’s reference frame),
which is 7L/5. The relative speed between A and B as viewed by C’s reference frame
is c/5. Therefore, the overtaking time is
$$t_{2C} = \frac{7L/5}{c/5} = \frac{7L}{c} \implies ct_{2C} = 7L. \tag{11.19}$$
(b) We need to know the speed of B as viewed by A. From A’s perspective, C is moving
with velocity $v_{CA} = -4c/5$. From C’s perspective, B is moving with velocity $v_{BC} = 3c/5$.
Therefore, from A’s perspective, B is moving with velocity
$$v_{BA} = \frac{v_{BC} + v_{CA}}{1 + \frac{v_{BC}v_{CA}}{c^2}} = \frac{\frac{3c}{5} + \left(-\frac{4c}{5}\right)}{1 + \frac{3}{5}\left(-\frac{4}{5}\right)} = \frac{-\frac{c}{5}}{1 - \frac{12}{25}} = -\frac{5c}{13}. \tag{11.20}$$
The associated gamma factor is
$$\gamma_{BA} = \frac{1}{\sqrt{1-\left(-\frac{5}{13}\right)^2}} = \frac{13}{12}. \tag{11.21}$$
Therefore, the length of train B as measured in A’s reference frame is
$$L_{BA} = \frac{L}{\gamma_{BA}} = \frac{12L}{13}. \tag{11.22}$$
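Then,
$$x^{\text{front}}_{AA}(t_A) = 0, \qquad x^{\text{back}}_{BA}(t_A) = -\frac{5ct_A}{13}, \tag{11.23a}$$
$$x^{\text{back}}_{AA}(t_A) = -L, \qquad x^{\text{front}}_{BA}(t_A) = \frac{12L}{13} - \frac{5ct_A}{13}. \tag{11.23b}$$
Again, $t_{2A}$ is defined to be the time when $x^{\text{back}}_{AA} = x^{\text{front}}_{BA}$:
$$-L = x^{\text{back}}_{AA}(t_{2A}) = x^{\text{front}}_{BA}(t_{2A}) = \frac{12L}{13} - \frac{5ct_{2A}}{13}. \tag{11.24}$$
Solving for $t_{2A}$ gives
$$ct_{2A} = 5L. \tag{11.25}$$
Furthermore,
$$x_{2A} = -L. \tag{11.26}$$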
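(c) From B’s perspective, A is moving with velocity
$$v_{AB} = \frac{v_{AC} + v_{CB}}{1 + \frac{v_{AC}v_{CB}}{c^2}} = \frac{\frac{4c}{5} + \left(-\frac{3c}{5}\right)}{1 + \frac{4}{5}\left(-\frac{3}{5}\right)} = \frac{\frac{c}{5}}{1 - \frac{12}{25}} = \frac{5c}{13}. \tag{11.27}$$
It should not be a surprise that $v_{AB} = -v_{BA}$!
The rest of the calculation mirrors part (b) with the roles of the two trains exchanged:
train A now has contracted length 12L/13 and moves at +5c/13 past train B, whose front
sits at x = L. Therefore, in this reference frame,
$$ct_{2B} = 5L, \quad \text{and} \quad x_{2B} = L. \tag{11.28}$$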
(d) In C’s reference frame, D must travel the distance $x_{2C} = 5L$ in the time $t_{2C} = 7L/c$.
Therefore, the velocity of D with respect to C is
$$v_{DC} = \frac{5L}{7L/c} = \frac{5c}{7}. \tag{11.29}$$
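The velocities of A and B as viewed by D are
$$v_{AD} = \frac{v_{AC} + v_{CD}}{1 + \frac{v_{AC}v_{CD}}{c^2}} = \frac{\frac{4c}{5} + \left(-\frac{5c}{7}\right)}{1 + \frac{4}{5}\left(-\frac{5}{7}\right)} = \frac{c}{5}, \tag{11.30a}$$
$$v_{BD} = \frac{v_{BC} + v_{CD}}{1 + \frac{v_{BC}v_{CD}}{c^2}} = \frac{\frac{3c}{5} + \left(-\frac{5c}{7}\right)}{1 + \frac{3}{5}\left(-\frac{5}{7}\right)} = -\frac{c}{5}. \tag{11.30b}$$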
It should not be a surprise that $v_{BD} = -v_{AD}$ (why not?). In fact, instead of determining
$v_{DC}$ as we did in (11.29), we could have determined $v_{DC}$ by insisting that $v_{AD} = -v_{BD}$.
The associated gamma factors are equal:
$$\gamma_{AD} = \gamma_{BD} = \frac{1}{\sqrt{1-(1/5)^2}} = \frac{5}{2\sqrt{6}}. \tag{11.31}$$
The lengths of A and B as viewed by D are equal and given by
$$L_{AD} = L_{BD} = \frac{2\sqrt{6}\,L}{5}. \tag{11.32}$$
From D’s perspective, each train travels a distance equal to its own (contracted) length during
the overtaking process. Thus,
$$t_{2D} = \frac{L_{AD}}{v_{AD}} = \frac{2\sqrt{6}\,L/5}{c/5} = \frac{2\sqrt{6}\,L}{c}. \tag{11.33}$$
Both events occur at the position of D, which is just the origin in D’s own reference
frame. Therefore,
$$ct_{2D} = 2\sqrt{6}\,L, \quad \text{and} \quad x_{2D} = 0. \tag{11.34}$$
(e) In A and B:
$$(ct_{2A})^2 - (x_{2A})^2 = (ct_{2B})^2 - (x_{2B})^2 = 25L^2 - L^2 = 24L^2. \tag{11.35}$$
In C:
$$(ct_{2C})^2 - (x_{2C})^2 = 49L^2 - 25L^2 = 24L^2. \tag{11.36}$$
In D:
$$(ct_{2D})^2 - (x_{2D})^2 = 24L^2 - 0 = 24L^2. \tag{11.37}$$
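As a cross-check on all five parts, one can take event $E_2$ as found in C’s frame, $(ct, x) = (7L, 5L)$, and Lorentz-transform it directly into the frames of A, B and D. The sketch below is not part of the original solution. It works in units where L = 1 and velocities are fractions of c, and the helper names (`add_velocities`, `boost`, `event2_in_all_frames`) are illustrative.

```python
import math
from fractions import Fraction

def add_velocities(u, v):
    """Relativistic composition of two collinear velocities, both in units of c."""
    return (u + v) / (1 + u * v)

def boost(ct, x, beta):
    """Coordinates of the event (ct, x) in a frame moving at beta*c along +x."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

def interval_squared(ct, x):
    """Invariant interval (ct)^2 - x^2 measured from the origin (event E1)."""
    return ct**2 - x**2

# Velocities of each frame relative to the platform (frame C), in units of c.
BETA = {"A": Fraction(4, 5), "B": Fraction(3, 5), "C": Fraction(0), "D": Fraction(5, 7)}

def event2_in_all_frames(ct2C=7.0, x2C=5.0):
    """Transform event E2, given in C's frame in units of L, into every frame."""
    return {name: boost(ct2C, x2C, float(b)) for name, b in BETA.items()}

if __name__ == "__main__":
    # Exact rational arithmetic for the velocity additions (11.20) and (11.30).
    print("v_BA/c =", add_velocities(BETA["B"], -BETA["A"]))
    print("v_AD/c =", add_velocities(BETA["A"], -BETA["D"]))
    print("v_BD/c =", add_velocities(BETA["B"], -BETA["D"]))
    for name, (ct, x) in event2_in_all_frames().items():
        print(name, "ct =", ct, "x =", x, "interval =", interval_squared(ct, x))
```

Up to rounding, the interval column should read 24 in every frame, in line with Eqs. (11.35) to (11.37).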
12. Midterm 2 Quiz
(1) The first missing order in the interference/diffraction pattern produced by a double-slit
setup is the fifth interference fringe. What is the ratio of the center-to-center distance
between the two slits and the slit width?
Answer: Center-to-center slit separation = d and slit width = a. We are asked for d/a.
Angle relation for fifth interference maximum: d sin θ = 5λ. Angle relation for first
diffraction minimum: a sin θ = λ. Take the ratio of the two equations: d/a = 5.
(2) A thin layer of oil sits on top of some water in a beaker. Looking from above, where are you
most likely to see the highest density of interference rings and why?
Answer: The interference is between reflected light off of the air-oil interface and the
light reflected off of the oil-water interface. The latter travels twice the thickness of the
oil more than the former (and you also have to take into account the different indices
of refraction, and possible reflection phase shifts).
If the thickness of the oil film were absolutely constant, then there wouldn’t be
interference fringes or rings. Instead, the entire layer would appear to be some constant
brightness somewhere between complete constructive or destructive interference.
You only get fringes or rings if the thickness of the oil changes as a function of
position. In a very clean sample, the thickness of the oil is probably going to be
changing most rapidly near the edge of the beaker due to surface tension (this is the
so-called meniscus). Therefore, one expects to see the highest density of rings near
the edge.
(3) Fighter jets flying towards a radar tower on a coast find that they can remain well
hidden if they fly very low, close to the water. Why?
Answer: The wavelength of the radar signal is large compared to the characteristic
size of structure on the surface of the water, such as waves, etc. So, the water surface
pretty much looks flat from the radar signal’s perspective and acts like a flat mirror.
Direct radar signals can now interfere with reflected ones. The reflected ones look like
they are coming from a virtual radar tower at the same horizontal position as the original
radar tower, but as far below sea level as the original tower is above
sea level. However, the signal from this virtual tower appears to be phase shifted by
π relative to the original tower from the beginning because the actual radar signal is
phase shifted by π upon reflection at the air-water interface.
In summary, this is like a two-slit interference problem, where the slit separation is
about twice the height of the tower above sea level and where the light coming from
one of the slits is already phase shifted by π relative to the other slit right from the
start. In this case, the middle point of the interference pattern directly ahead of the
two slits would have destructive rather than constructive interference. The midway
point is the surface of the water. Thus, the radar signal is weak near the surface of the
water.
Of course, there are other angles at which the radar signal is weak, but if you are in
a fighter jet, you might not know the details of the radar towers set up on the enemy
shore (e.g., their locations, their heights, etc.). The radar signal will be low near the
surface of the water regardless of such details. Therefore, that’s the pilot’s safest bet.
(4) What effect does putting a quarter-wave plate in front of just one slit in a two-slit
setup have on the interference pattern?
Answer: I’m assuming here that the incident light is polarized along the direction
of the optical axis of the quarter-wave plate. This problem would be much harder
otherwise. In this case, there is an initial phase difference of π/2 (or a quarter wave)
between the two slits. The interference fringes will look the same, just
shifted in the direction towards the slit that was covered with the quarter-wave plate
(the single-slit diffraction envelope stays where it was).
The shift is such that the new central maximum lies halfway between the old central
maximum and the old first minimum.
(5) Four lightning bolts strike a passing train, two at the front and two at the back. In the
frame where the train is passing by, all four lightning strikes happen simultaneously.
Order the events in the train’s reference frame.
Answer: In the train’s reference frame the two lightning strikes at the back happen
simultaneously and the two lightning strikes at the front happen simultaneously.
However, the two at the front happen before the two at the back. Remember that
synchronized clocks on the train are not synchronized from the perspective of the frame
in which the train is moving; the clock at the back of the train is systematically ahead
of the clock at the front of the train. In the frame where the train is passing, a clock
at the back of the train reads a later time than does a clock at the front when the
lightning bolts strike. Therefore, in the train reference frame, the lightning strikes
the front before the back.
(6) If I’m on a train traveling at 4c/5 relative to you and I shoot a rocket forwards at
speed 3c/5 relative to me, then at what speed is the rocket moving relative to you?
Answer: The train is reference frame S′ and you are in reference frame S. The speed
of S′ relative to S as measured in S is $v = \frac{4c}{5}$. The speed of the rocket as measured in
S′ is $u' = \frac{3c}{5}$. The speed of the rocket as measured in S is
$$u = \frac{u' + v}{1 + \frac{u'v}{c^2}} = \frac{\frac{3c}{5} + \frac{4c}{5}}{1 + \frac{3}{5}\cdot\frac{4}{5}} = \frac{\frac{7c}{5}}{1 + \frac{12}{25}} = \frac{35c}{37}.$$
(7) If I’m on a train traveling at 4c/5 relative to you and I shine light forward, then at
what speed is the light moving relative to you?
Answer: c. If you plug in u′ = c in the previous problem, you will get u = c.
(8) “Derive” time dilation using the relativistic clock example.
Answer: A train (reference frame S′) moves at speed v relative to reference frame
S. Transverse to the direction of motion of the train, light is sent from one side of the
train to the other and reflected back, for a total distance of, say, 2h. The time it takes
for this round trip in S′ is $t' = 2h/c$.
In S, the total speed of the light beam is still c, but the component of this velocity
along the direction in which the train is moving is now v. Therefore, the transverse
component is $\sqrt{c^2 - v^2} = c/\gamma$. The transverse distance that must be covered is still 2h.
Therefore, the round trip time in S is $t = \frac{2h}{c/\gamma} = \gamma t'$.
(9) “Derive” length contraction using time dilation.
Answer: A light signal is sent from the back of the train to the front and back. Let
$L_0$ be the proper length of the train (measured in its own rest frame). Then the round
trip time in S′ is $t' = 2L_0/c$.
Let L be the length of the train in S. The relative speed between the light beam
and the front of the train in S is c − v. The relative speed between the back of the
train and the light beam after reflection is c + v. Therefore, the round trip time in S is
$t = \frac{L}{c-v} + \frac{L}{c+v} = \frac{2Lc}{c^2-v^2} = \frac{2\gamma^2 L}{c}$.
From time dilation, we have $\frac{2\gamma^2 L}{c} = t = \gamma t' = \frac{2\gamma L_0}{c}$, which gives $L = L_0/\gamma$.
(10) What are the two effects involved in the relativistic Doppler effect?
Answer: (1) The standard Doppler effect, whereby the wavelength of the signal is
shortened if the source is moving towards you and lengthened if moving away; and
(2) Time dilation, whereby the time it takes for a new wavefront to be created by the
moving source increases relative to when it is at rest.
(11) A train and a tunnel both have proper lengths L. The train moves toward the tunnel
at speed v. A bomb is located at the front of the train. The bomb is designed to
explode when the front of the train passes the far end of the tunnel. A deactivation
sensor is located at the back of the train. When the back of the train passes the near
end of the tunnel, the sensor tells the bomb to disarm itself. Does the bomb explode?
Answer: Yes, the bomb explodes. Let us first consider the train reference frame, in
which the answer is obvious. In this frame, the train has length L and the tunnel has
length L/γ < L and is heading towards the train at speed v. Therefore, it is clear that
the back end of the tunnel will pass the front of the train before the front end of the
tunnel reaches the back of the train.
In the tunnel frame, the tunnel has length L and the train has length L/γ < L.
Therefore, the back of the train reaches the near end of the tunnel before the front of
the train reaches the back of the tunnel. You might be tempted to say that the bomb
is then deactivated before it can explode. However, you have to keep in mind that the
deactivator at the back of the train needs to send a signal to the bomb at the front
of the train saying that it has reached the front end of the tunnel and that the bomb
should therefore disarm itself. At best, that signal can travel at the speed of light. It
will take time for that signal to reach the bomb at the front of the train. If that time is
longer than the time it takes for the front of the train to reach the back of the tunnel,
then it will be too late and the bomb will explode.
Let the front of the tunnel correspond to x = 0 and let t = 0 be when the back of
the train passes the front of the tunnel. From then on, the signal sent by the deactivator
travels forward at the speed of light, its worldline described by $x_s = ct$ (the s subscript
stands for “signal”). At t = 0, the front of the train is at x = L/γ, since that is the
length of the train in the tunnel reference frame. The trajectory of the front of the
train is $x_b = \frac{L}{\gamma} + vt$ (the b subscript stands for “bomb”). Which one reaches x = L
(the back of the tunnel) first? Well, the time it takes for the signal is $t_s = L/c$, whereas
the bomb takes $t_b = \left(L - \frac{L}{\gamma}\right)/v = \frac{t_s}{\beta}\left(1 - \frac{1}{\gamma}\right)$, where $\beta \equiv v/c$. We claim that $t_b < t_s$
and so the bomb explodes.
To prove this, start with the inequality β < 1, which just says that the train must be
moving at less than the speed of light. Multiply by 2β and add 1 to both sides to get
$1 + 2\beta^2 < 1 + 2\beta$. Now, subtract $2\beta + \beta^2$ from both sides to get $1 - 2\beta + \beta^2 < 1 - \beta^2$.
Rewrite the left hand side as $(1-\beta)^2$, then take the positive square root of both sides to
get $1 - \beta < \sqrt{1-\beta^2} = 1/\gamma$. This final inequality can be rearranged to read
$\frac{1}{\beta}\left(1 - \frac{1}{\gamma}\right) < 1$.
But, the left hand side is just $t_b/t_s$, and so $t_b < t_s$.
Below are spacetime diagrams in both reference frames in the case β = 4/5. Note
that we have set t = t′ = 0 when the front of the train lines up with the front of the
tunnel. But, note that we have not set x = x′ = 0 to be the position of this event.
The spatial origins of the frames are different: x′ = 0 for the center of the train and
x = 0 for the center of the tunnel. In the train frame, the explosion happens before
the deactivation signal is sent. In the tunnel frame, those two events occur in the
opposite order. However, in both reference frames, the bomb explodes; it certainly
cannot be the case that the train explodes in one frame whereas it does not in the
other! Assuming that the explosion “signal” (i.e., the fire, etc.) travels at the speed of
light, the red shaded regions represent the region of the train that is engulfed in fire,
or at least the region that is aware of the fact that the explosion has occurred.
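The inequality $t_b < t_s$ from the tunnel-frame argument above is easy to spot-check numerically. The snippet below is a small illustrative sketch, not part of the original answer; the function name `bomb_to_signal_time_ratio` and the sampled speeds are arbitrary choices.

```python
import math

def bomb_to_signal_time_ratio(beta):
    """Return t_b / t_s = (1/beta) * (1 - 1/gamma) for train speed beta*c, 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    inv_gamma = math.sqrt(1.0 - beta**2)  # this is 1/gamma
    return (1.0 - inv_gamma) / beta

if __name__ == "__main__":
    for beta in (0.01, 0.1, 0.5, 0.8, 0.99, 0.9999):
        ratio = bomb_to_signal_time_ratio(beta)
        # The bomb reaches the far end of the tunnel first whenever ratio < 1.
        print(f"beta = {beta}: t_b/t_s = {ratio:.6f}, bomb explodes: {ratio < 1.0}")
```

The ratio is small for slow trains and approaches 1 only as β approaches 1, so the signal never wins the race.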
13. Energy and Momentum
Relativistic energy and momentum are given by
$$E = \gamma mc^2, \qquad \mathbf{p} = \gamma m\mathbf{v}, \tag{13.1}$$
where γ is the usual gamma factor associated with v.
You can argue these forms with the use of somewhat cryptic collision arguments and
energy and momentum conservation, as is done in your textbook. I would like to discuss
4-vectors instead.
13.1. 4-Vectors
We can combine the time and space coordinates of an event, as measured in some reference
frame S, into a column of four numbers:
$$\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}.$$
Then, we know how these coordinates transform when we change reference frames: they
change via a Lorentz transformation. For example, if the reference frame S′ is moving
with speed v in the +x direction relative to S, then the primed coordinates for the event,
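measured in S′, are related to the unprimed coordinates for the event, measured in S, via
$$\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix}. \tag{13.2}$$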
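Incidentally, if S′ is moving with speed v in the +y direction instead, then
$$\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \gamma & 0 & \beta\gamma & 0 \\ 0 & 1 & 0 & 0 \\ \beta\gamma & 0 & \gamma & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix},$$
and similarly if S′ is moving in the +z direction.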
Any collection of four numbers that can be combined into a column and transforms in
this way from one reference frame to the next is called a 4-vector.
If we consider two events, each with its own set of coordinates (both measured in
the same reference frame S), then we can also write down the difference between those
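coordinates. This is the spacetime displacement from event 1 to event 2:
$$\begin{pmatrix} c\Delta t \\ \Delta x \\ \Delta y \\ \Delta z \end{pmatrix} = \begin{pmatrix} c(t_2 - t_1) \\ x_2 - x_1 \\ y_2 - y_1 \\ z_2 - z_1 \end{pmatrix}.$$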
The same can be done for any 4-vector (i.e., one can consider differences in a pair of
4-vectors measured in the same reference frame). The same argument that leads to the
invariance of the interval $(c\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2$ implies that the analogous
combination of components is invariant for any 4-vector.
Furthermore, if you multiply any 4-vector by a scalar, which is some number which
does not change from one reference frame to the next (e.g., mass), then the result is still
a 4-vector. Another example of a scalar is the proper time. Between any two events that can be
connected by something traveling at a speed less than the speed of light, there
is one particular reference frame, S∗, in which the two events occur at the exact same
location in space. The time between the two events in that particular reference frame is
called the proper time, denoted ∆τ.
This proper time is invariant under Lorentz transformation. This statement is essen-
tially tautological: It is true by fiat, because the proper time is defined with respect to
the particular reference frame S∗. This is the same reason why mass is invariant: Mass is
defined as the energy (up to factors of c) in the rest frame of the object. If you were to ask
me what is the mass of an object that is moving, I would say it is the same mass that the
object would have if it were not moving. Note that we are talking about mass here, not
this bizarre thing called the relativistic mass, γm, which you should expeditiously excise
from your minds.
Therefore, the spacetime displacement between two events, measured in some reference
frame S, can be divided by the proper time between those two events and the result is still
a 4-vector, since the latter is a scalar:
$$\begin{pmatrix} c\Delta t/\Delta\tau \\ \Delta x/\Delta\tau \\ \Delta y/\Delta\tau \\ \Delta z/\Delta\tau \end{pmatrix}.$$
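The time between those two events measured in any other reference frame, S, is equal to
$\Delta t = \gamma\Delta\tau$, where γ is the gamma factor associated with the velocity at which S∗ is moving
relative to S (this is time dilation). Thus,
$$\begin{pmatrix} c\Delta t/\Delta\tau \\ \Delta x/\Delta\tau \\ \Delta y/\Delta\tau \\ \Delta z/\Delta\tau \end{pmatrix} = \begin{pmatrix} \gamma c \\ \gamma\Delta x/\Delta t \\ \gamma\Delta y/\Delta t \\ \gamma\Delta z/\Delta t \end{pmatrix}.$$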
Taking the limit as all these deltas become really small turns the ratios into derivatives.
We recognize these derivatives to be the components of the velocity of S∗ relative to S.
This defines the 4-velocity:
$$\begin{pmatrix} \gamma c \\ \gamma v_x \\ \gamma v_y \\ \gamma v_z \end{pmatrix}. \tag{13.3}$$
Finally, we can multiply by the scalar mass of some hypothetical object which moves
between the two events. The result is also a 4-vector and it is called the energy-momentum
4-vector:
$$\begin{pmatrix} E/c \\ p_x \\ p_y \\ p_z \end{pmatrix} = \begin{pmatrix} \gamma mc \\ \gamma mv_x \\ \gamma mv_y \\ \gamma mv_z \end{pmatrix}, \tag{13.4}$$
which are precisely the definitions (13.1) given at the start.
For free, we have the invariance of the interval associated with this 4-vector, which is
$$\frac{E^2}{c^2} - |\mathbf{p}|^2. \tag{13.5}$$
Usually, this is actually multiplied by the constant $c^2$ to get $E^2 - |\mathbf{p}|^2c^2$.
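To see the invariance at work, the following Python sketch builds the energy-momentum 4-vector of a particle moving along x, boosts it into another frame, and compares the invariant before and after. It is an illustration only, under stated assumptions: units with c = 1, motion along a single axis, and a boost from S into a frame moving at +β (the inverse of the matrix written in Eq. (13.2), hence the minus signs). The function names are made up for this example. It also uses the standard fact that the invariant evaluates to $m^2c^2$, since $\gamma^2(1 - \beta^2) = 1$.

```python
import math

def four_momentum(m, v):
    """Return (E/c, p_x) for mass m moving at velocity v along x, in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v**2)
    return gamma * m, gamma * m * v

def boost_x(p0, p1, beta):
    """Components of the 4-vector (p0, p1) in a frame moving at +beta along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (p0 - beta * p1), gamma * (p1 - beta * p0)

def invariant(p0, p1):
    """The combination (E/c)^2 - p^2 of Eq. (13.5), restricted to one spatial axis."""
    return p0**2 - p1**2

if __name__ == "__main__":
    m, v = 2.0, 0.6  # illustrative values only
    p_lab = four_momentum(m, v)
    for beta in (-0.9, -0.3, 0.0, 0.6, 0.95):
        p_new = boost_x(*p_lab, beta)
        print(f"beta = {beta:+.2f}: (E/c, p) = {p_new}, invariant = {invariant(*p_new)}")
```

Boosting with β equal to the particle’s own velocity lands in its rest frame, where the momentum vanishes and E/c reduces to mc.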
13.2. Colliding Photons
[Goldstein, Poole & Safko 7.22 ] A photon of energy E2 collides at angle θ with a photon
of energy E1. Determine the minimum value of E2 permitting the formation of a pair of
particles of mass m, as a function of E1, m and θ.
SOLUTION:
Expectations: We should expect that we would need to pump in more energy if we
are to create heavier particles. Therefore, if m gets larger, we expect that $E_2$ must get
larger as well: $E_2 \sim m^{\#}$, where # is some positive exponent. If the first photon already
has a lot of energy ($E_1$ is large), then the second photon shouldn’t have to have so much
energy anymore, and vice versa. Therefore, if $E_1$ is big, then $E_2$ can be small, and if $E_1$ is
small then $E_2$ should be big: $E_2 \sim 1/E_1^{\#}$. In fact, we can do a bit better than that. Since
$E_2$ must have units of energy, and $mc^2$ and $E_1$ have units of energy, we ought to have
$E_2 \sim mc^2\left(\frac{mc^2}{E_1}\right)^{\#}$, where # is some positive exponent. Actually, to be most conservative,
all we can really say is that $E_2 \sim mc^2 f\!\left(\frac{mc^2}{E_1}\right)$, where f(x) is an increasing function of x
for x > 0. If θ → 0, the collision is very weak and the incoming energies must be huge
in order to produce something. Thus, we expect that $E_2 \to \infty$ as $\theta \to 0$. The opposite scenario
is θ → π, which corresponds to a head-on collision. This is the “best-case scenario” since it
is the strongest collision. Thus, $E_2$ should be minimal at this angle: $E_2 \to \min E_2$ as $\theta \to \pi$. In
summary,
$$E_2 = \frac{mc^2 f\!\left(\frac{mc^2}{E_1}\right)}{g(\theta)}, \tag{13.6}$$
where f is an increasing function and g is a function which goes to zero as θ goes to zero
and attains a maximum as θ goes to π.
Center of Momentum Frame Picture: It is difficult to describe exactly what happens
in the lab frame, S, which is the frame in which the drawing above is drawn. The two
masses are in general moving in all sorts of possible directions with all sorts of possible
energies. However, the picture is very simple in the center of momentum frame, S′. This is
the frame where the total momentum of the system is always exactly 0. So, in this frame,
the two photons undergo a head-on collision with both photons coming in with the same
energy and equal and opposite momenta. If there is insufficient energy to produce the two
masses, m, then the photons could just pass each other, or they could turn into something
else. If there is more energy than is the minimum required, then the two masses, m, will
be produced and the remaining energy is distributed evenly between the two of them as
their kinetic energies. So, the two masses fly off in opposite directions with equal energy
and equal and opposite momenta. At the absolute critical case, the two photons collide
and all of their energy is used up to produce two masses, m, just sitting there in the center
of momentum frame... not moving.
Method 1 (Relativistic Invariant): The relativistic invariant in the COM frame, S′,
is $E'^2 - p'^2c^2$, where E′ is the total energy and p′ is the magnitude of the total momentum
vector in the COM frame. Well, by definition, p′ = 0. Thus, the relativistic invariant in
the COM frame is just $E'^2$.
The total energy in the lab frame, S, is $E = E_1 + E_2$. We have to break up the
momentum vectors of each photon into their components to calculate the magnitude of
the total momentum vector, p. The horizontal component of p is $\frac{E_1}{c} + \frac{E_2}{c}\cos\theta$ and the
vertical component is $\frac{E_2}{c}\sin\theta$. Therefore,
$$p \equiv |\mathbf{p}| = \sqrt{\left(\frac{E_1}{c} + \frac{E_2}{c}\cos\theta\right)^2 + \left(\frac{E_2}{c}\sin\theta\right)^2} = \frac{1}{c}\sqrt{E_1^2 + E_2^2 + 2E_1E_2\cos\theta}.$$
I would like to rewrite this by adding and subtracting $2E_1E_2$ under the square root. Adding
$2E_1E_2$ to $E_1^2 + E_2^2$ completes the square to give $(E_1 + E_2)^2$. Thus,
$$p = \frac{1}{c}\sqrt{(E_1 + E_2)^2 - 2E_1E_2(1 - \cos\theta)} = \frac{1}{c}\sqrt{(E_1 + E_2)^2 - 4E_1E_2\sin^2\frac{\theta}{2}},$$
where I used the trigonometric identity $\sin^2\frac{\theta}{2} = \frac{1 - \cos\theta}{2}$. This last step is certainly not
necessary; it is just my habit to do this whenever I see 1 − cos θ, even though it is not
always useful.
The relativistic invariant calculated in the lab frame, S, is thus
$$E^2 - p^2c^2 = (E_1 + E_2)^2 - (E_1 + E_2)^2 + 4E_1E_2\sin^2\frac{\theta}{2} = 4E_1E_2\sin^2\frac{\theta}{2}.$$
This is equal to the relativistic invariant in the COM frame, which we have already
determined to be just $E'^2$ because p′ = 0. Thus,
$$E' = 2\sqrt{E_1E_2}\,\sin\frac{\theta}{2}. \tag{13.7}$$
As we have already discussed above, in the critical case, all of the total energy in the COM
frame, E′, is used up to form two masses, m, at rest. Thus,
$$E' = 2\sqrt{E_1E_2}\,\sin\frac{\theta}{2} = 2mc^2 \implies E_2 = \frac{m^2c^4}{E_1\sin^2(\theta/2)}. \tag{13.8}$$
Notice that this does satisfy all of the expectations we stated in the beginning!
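A short Python sketch makes it easy to play with the threshold formula (13.8) and confirm the expectations listed at the start. It is an illustration rather than part of the original solution. All energies are assumed to be in the same unit (for instance MeV), the function names are invented here, and the electron rest energy of roughly 0.511 MeV in the usage section is just a sample input for electron-positron pair production.

```python
import math

def threshold_E2(E1, mc2, theta):
    """Minimum E2 from Eq. (13.8): E2 = (mc^2)^2 / (E1 * sin^2(theta/2))."""
    s = math.sin(theta / 2.0)
    if s == 0.0:
        return math.inf  # collinear photons can never produce a massive pair
    return mc2**2 / (E1 * s**2)

def com_energy(E1, E2, theta):
    """Total energy in the COM frame from Eq. (13.7)."""
    return 2.0 * math.sqrt(E1 * E2) * math.sin(theta / 2.0)

if __name__ == "__main__":
    mc2 = 0.511  # approximate electron rest energy in MeV (sample input)
    E1 = 1.0     # MeV, illustrative
    for degrees in (180, 120, 90, 60, 30, 10):
        theta = math.radians(degrees)
        E2 = threshold_E2(E1, mc2, theta)
        # At threshold the COM energy should equal 2 m c^2.
        print(f"theta = {degrees:3d} deg: E2_min = {E2:.4f} MeV, "
              f"E'_COM = {com_energy(E1, E2, theta):.4f} MeV")
```

The threshold is lowest for the head-on case θ = π and grows without bound as θ goes to zero.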
Method 2 (Transform to COM frame): The COM frame is moving relative to the lab
frame along the direction of the total momentum vector, p, in the lab frame. Therefore, we
will set that direction to be the +x-direction. Note that this is not the horizontal direction,
which is what you might have been tempted to call the +x-direction instead. With this
choice of coordinates, the total momentum vector, p, does not have any y or z components
and thus we can neglect y and z altogether. The x-component of the total momentum
vector is therefore just the magnitude of the total momentum vector.
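The top two components of the momentum 4-vector in the lab frame are
$$p^\mu = \begin{pmatrix} E/c \\ p \end{pmatrix} = \begin{pmatrix} \frac{E_1 + E_2}{c} \\ \frac{1}{c}\sqrt{(E_1 + E_2)^2 - 4E_1E_2\sin^2\frac{\theta}{2}} \end{pmatrix}.$$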
Notice the notation here. The upper Greek index on $p^\mu$ just indicates that this is a 4-momentum
vector and the components are $p^0 = E/c$, $p^1 = p_x$, $p^2 = p_y$ and $p^3 = p_z$.
Technically, I should write down the y and z components, but they are both 0.
Let us rewrite $p^\mu$ by factoring out $(E_1 + E_2)^2$ from the square root in p:
$$p^\mu = \frac{E_1 + E_2}{c}\begin{pmatrix} 1 \\ \sqrt{1 - \frac{4E_1E_2\sin^2(\theta/2)}{(E_1 + E_2)^2}} \end{pmatrix} \equiv \frac{E_1 + E_2}{c}\begin{pmatrix} 1 \\ A \end{pmatrix}. \tag{13.9}$$
Note that I just called the whole square root A, so that I don’t have to keep
writing it over and over again.
All we know is that the COM frame moves in the +x-direction relative to the lab
frame. But, we don’t know how fast it is moving. Let us set its speed to be βc, with
corresponding γ factor. We will have to determine what β has to be for the COM frame.
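We boost $p^\mu$ to get the 4-momentum vector in the COM frame:
$$\begin{pmatrix} E'/c \\ p' \end{pmatrix} = p'^\mu = \gamma\begin{pmatrix} 1 & -\beta \\ -\beta & 1 \end{pmatrix}\frac{E_1 + E_2}{c}\begin{pmatrix} 1 \\ A \end{pmatrix} = \gamma\,\frac{E_1 + E_2}{c}\begin{pmatrix} 1 - \beta A \\ A - \beta \end{pmatrix}.$$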
For the COM frame, we know that p′ = 0. But, we see above that p′ ∝ A − β. Therefore,
the β parameter that takes us from the lab frame to the COM frame must be β = A.
Plugging that back in to the equation above gives
$$\begin{pmatrix} E'/c \\ p' \end{pmatrix} = \frac{1}{\sqrt{1 - A^2}}\,\frac{E_1 + E_2}{c}\begin{pmatrix} 1 - A^2 \\ 0 \end{pmatrix} = \frac{E_1 + E_2}{c}\sqrt{1 - A^2}\begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
Plugging in the definition of A in Eqn. (13.9) gives
$$\begin{pmatrix} E'/c \\ p' \end{pmatrix} = \frac{2\sqrt{E_1E_2}\,\sin(\theta/2)}{c}\begin{pmatrix} 1 \\ 0 \end{pmatrix},$$
which gives precisely the same E′ as we found in method 1 in Eqn. (13.7).
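The agreement between the two methods can also be confirmed numerically. The sketch below follows the steps of Method 2 with c = 1: it forms the lab-frame pair (E, p), sets β = A = p/E, applies the 2×2 boost, and compares the resulting E′ with Eq. (13.7). It is only an illustration; the function names and sample energies are arbitrary, and it assumes 0 < θ ≤ π so that β < 1.

```python
import math

def lab_four_momentum(E1, E2, theta):
    """Return (E, p) in the lab frame with c = 1 and the x-axis along the total momentum."""
    E = E1 + E2
    p_squared = E**2 - 4.0 * E1 * E2 * math.sin(theta / 2.0)**2
    return E, math.sqrt(max(p_squared, 0.0))  # guard against tiny negative rounding

def boost_to_com(E1, E2, theta):
    """Boost the lab 4-momentum with beta = A = p/E; returns (E', p') in the COM frame."""
    E, p = lab_four_momentum(E1, E2, theta)
    beta = p / E  # the quantity A of Eq. (13.9)
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (E - beta * p), gamma * (p - beta * E)

def com_energy_method1(E1, E2, theta):
    """E' from the invariant, Eq. (13.7)."""
    return 2.0 * math.sqrt(E1 * E2) * math.sin(theta / 2.0)

if __name__ == "__main__":
    E1, E2 = 3.0, 1.0  # illustrative energies in arbitrary units
    for degrees in (30, 90, 150, 180):
        theta = math.radians(degrees)
        E_com, p_com = boost_to_com(E1, E2, theta)
        print(f"theta = {degrees:3d} deg: E' = {E_com:.6f} "
              f"(method 1: {com_energy_method1(E1, E2, theta):.6f}), p' = {p_com:.2e}")
```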
14. Quantum Mechanics
For me, the double slit experiment is the gateway to quantum mechanics. This is not
historically how the field developed. I would say that that is closer to the way your textbook
presents the material, with Planck’s discovery of the Planck distribution, derived from his
clever insight that light came in discrete units called photons, whose energy was directly
proportional to the frequency of the light, the proportionality being Planck’s constant.
Then, Einstein ran with this idea to explain the photoelectric effect, etc.
However, I think that the double slit experiment, more so than either the Planck distribution
or the photoelectric effect, really captures a broad scope of the weird and wonderful
phenomena that propelled quantum mechanics in the early days and which were the subject
of many a heated debate. I hope I’ll be able to convince you of this, but in the meantime,
please accept my apologies for presenting material now that is in a later chapter of your
textbook.
14.1. The Wacky World of the Double Slit
Imagine performing the double slit experiment with light that is weak enough so that
photons arrive at the screen at a low enough frequency that you (or, more accurately,
the detectors on a screen) can actually distinguish the arrival event of each single photon.
Surely, we would have to conclude that light is made up of bona fide particles in this case
since you can see when each one arrives at a particular point on the screen.
If you were to cover one of the slits, then photons pass through the other slit, theo-
retically one at a time, and they just go straight through to the screen. You would expect
to see dots form on the screen (if you used photographic film or something like that) right
around the point on the screen directly in front of the slit. These dots would pile up over
time as you exposed the film longer and longer.
If you were to have both slits open, you might think that you would just get two regions
on the screen, one directly in front of each one of the slits, where photons pile up over time.
After all, if one photon goes through the slits at a time, then it either goes straight in front
of one slit or the other, right? Surprisingly, that’s not what happens at all. Instead, you
will observe the same old interference pattern that you see when you shine a strong light
source through the slits; it just takes time for the pattern to build up as you expose the
film longer and longer!
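The build-up described above can be mimicked with a few lines of code. What follows is only a sketch under assumptions that are not part of these notes: an idealized two-slit intensity proportional to cos²(πu), where u is the position on the screen in units of the fringe spacing, with no single-slit envelope. The function names, the bin count and the random seed are arbitrary choices. Each simulated photon lands at a single random point, yet the histogram of many arrivals shows fringes.

```python
import math
import random


def sample_arrivals(n_photons, u_max=3.0, seed=0):
    """Draw landing positions one at a time by rejection sampling.

    Positions u are in units of the fringe spacing, and the assumed
    probability density is proportional to cos^2(pi * u).
    """
    rng = random.Random(seed)
    hits = []
    while len(hits) < n_photons:
        u = rng.uniform(-u_max, u_max)
        # Accept the candidate position with probability cos^2(pi * u).
        if rng.random() < math.cos(math.pi * u) ** 2:
            hits.append(u)
    return hits


def histogram(hits, u_max=3.0, n_bins=24):
    """Count arrivals in equal-width bins covering [-u_max, u_max]."""
    counts = [0] * n_bins
    width = 2.0 * u_max / n_bins
    for u in hits:
        index = min(int((u + u_max) / width), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    # Short, medium and long "exposures" of the film.
    for n in (20, 200, 5000):
        print(n, histogram(sample_arrivals(n)))
```

With only 20 arrivals the counts look like scattered dots; with 5000 the bins next to u = 0, ±1, ±2 are clearly fuller than those next to u = ±1/2, ±3/2.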
In real life, these experiments were first done with electrons rather than photons. For
the time being, let us postpone discussing why you can use electrons instead of
photons in the double slit experiment. Below are pictures taken from the original papers of
the first experiments to actually observe this effect.
(a) P. G. Merli, G. F. Missiroli and G. Pozzi. “On the statistical aspect of electron interference phenomena.” American Journal of Physics 44, 306 (1976).
(b) A. Tonomura, J. Endo, T. Matsuda, T. Kawasaki and H. Ezawa. “Demonstration of single-electron build-up of an interference pattern.” American Journal of Physics 57, 117 (1989).
Figure 1: Time lapse exposures in the double-slit experiment performed using electrons.
If you want to see a video of this done in
2012, see http://iopscience.iop.org/1367-2630/15/3/033018/media/njp458349movie2.mov.
Please take a moment to contemplate how amazing this is. The electrons are passing
through the slits one at a time. What on earth are they interfering with? How do they
know to land with greater probability in some regions of the screen than in others?
To me, this experiment is the definitive demonstration of the wave-particle duality.
How can something be a wave and a particle at the same time? Well, here it is, in all its
glory. To understand this phenomenon, we will develop the rudiments of the wavefunction
picture of quantum mechanics and the so-called Copenhagen interpretation. But, let us
leave that for another day. For now, consider the following thought experiment.
Suppose you put a light source behind the double slit shooting light across each of the
slits. There is then a detector on each side that detects this light. When an electron passes
through, it may interact with the light and cause a decrease in the intensity of the light that
is measured at the detectors. Basically, the electron cuts off the light beam for an instant
as it passes by. The point of this whole setup is for us to experimentally verify which slit
each electron goes through. The question is: does this have any effect on the pattern that
you observe on the wall, and, if so, what is the effect?
If you think there might be an effect, you might wonder how great an effect this might
have. Could I not just make the observation light arbitrarily weak so as to perturb the
system minimally?
The answer turns out to be pretty catastrophic. If you can determine which slit each
electron passes through, then the interference pattern will be completely destroyed. You
will end up with a wash of electrons on the screen mostly concentrated at the two points
on the screen directly in front of the slits! You can imagine turning the observation light
on and off, effectively destroying and then reviving the interference pattern at will!
We will not resolve this seeming paradox at the moment. But, let me just tell you the
punchline, and you will see how it works later on. The point is that you cannot simply
make the observation light arbitrarily weak. If you do, you will not be able to determine
the position of the passing electrons with sufficient resolution to determine which slit each
passed through. Furthermore, you will find out that you cannot really use arbitrarily high
momentum electrons in this experiment. It turns out that the momentum of the electrons
and the momentum of the photons you would have to use to observe those electrons will be
comparable; they are both very small, but nevertheless comparable in magnitude with each
other. Therefore, when they interact (e.g., collide), the photon may have a large effect on
the final momentum of the electron and may deflect it significantly. This will completely
destroy the interference pattern. Therefore, there is no hidden mini-demon whose job it
is to confound your efforts to measure the electrons and observe interference at the same
time. You yourself are destroying the interference pattern by perturbing the system too
strongly.
14.2. Blackbody Radiation and the Ultraviolet Catastrophe
We learn from the photoelectric effect that light may be thought of as being built out of
particles called photons, even though it behaves like a wave in most familiar situations.
Somehow, very many photons conspire to produce wave-like behavior. This concept of
photons is what starts us down the road towards blackbody radiation, although historically
the ideas of Planck about blackbody radiation, which we are about to describe, preceded,
and in fact inspired, Einstein’s explanation of the photoelectric effect.
One thoroughly embarrassing problem that remained before Planck came on the scene
is called the ultraviolet catastrophe. Consider a thermally insulated cavity of volume V
containing radiation. The energy associated with an electric field is proportional to the
square of the electric field. The equipartition theorem states that, at thermal equilibrium
at temperature T , the average energy associated with a quadratic degree of freedom, such
as this, is ∼ kT (or kT/2; it really doesn’t matter for this discussion). However, there
are technically infinitely many possible modes of radiation inside a cavity, with arbitrarily
short wavelength. If each mode is to possess an average energy of kT , then the total energy
would be infinite! Schroeder describes just how embarrassing this conclusion is: if it were
correct, you would expect to be blasted with an infinite amount of radiation every time
you open the oven door to check the cookies!
The classical assumption is that each mode can have any non-negative energy, E.
From 7B, we know that the probability for a mode to have energy E is proportional to the
Boltzmann factor: P (E) ∝ e−E/kT . Calculating the average energy per mode as you did
for an ideal gas in 7B for such a continuous spectrum produces the equipartition theorem
and leads to the UV catastrophe as described above.
Planck’s neat idea was that electromagnetic energy is not continuously distributed, but
is quantized in integer units of hν, where ν is the frequency of radiation and h is Planck’s
constant. He proposed that light was absorbed and emitted by matter in quanta called
photons. So, a single mode with frequency ν can have an energy of 0, or hν, or 2hν, etc.
But it cannot have an energy between these values, like hν/2, since that would correspond
to half a photon!
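To see how this quantization tames the catastrophe, here is a small numerical sketch. It compares the Boltzmann-weighted average energy of a single mode whose energies are restricted to 0, hν, 2hν, ... against the classical value kT. Everything is in units where kT = 1; the variable x = hν/kT, the function names and the cutoff n_max are my own illustrative choices, and the closed form x/(eˣ − 1) is the standard result of summing the geometric series.

```python
import math


def mean_energy_quantized(x, n_max=2000):
    """Average energy (in units of kT) of a mode with levels 0, x, 2x, ...

    Here x = h*nu / kT and level n*x is weighted by the Boltzmann
    factor exp(-n*x). The sum is truncated at n_max quanta.
    """
    weights = [math.exp(-n * x) for n in range(n_max + 1)]
    total = sum(weights)
    return sum(n * x * w for n, w in enumerate(weights)) / total


def mean_energy_closed_form(x):
    """Closed form of the untruncated sum: x / (exp(x) - 1)."""
    return x / math.expm1(x)


if __name__ == "__main__":
    for x in (0.01, 0.1, 1.0, 5.0, 20.0):
        print(x, mean_energy_quantized(x), mean_energy_closed_form(x))
```

For x much less than 1 the average is close to 1 (that is, kT, the equipartition value), while for x much greater than 1 it is exponentially small. The short-wavelength modes are therefore frozen out, and the total energy no longer diverges.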
This leads to the Planck distribution and eventually to the Stefan-Boltzmann law
of radiation, which states that the average intensity of radiation from a blackbody at
temperature T is proportional to $T^4$, with a proportionality constant given by the
Stefan-Boltzmann constant.
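14.3. Stefan-Boltzmann Law
The Stefan-Boltzmann law gives the irradiance of a graybody at temperature T:
\[ I = \varepsilon \sigma T^4, \quad \text{where} \quad \sigma = \frac{2\pi^5 k_B^4}{15 h^3 c^2} = 5.67 \times 10^{-8}\ \frac{\text{W}}{\text{m}^2\,\text{K}^4}, \tag{14.1} \]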
and ε is the emissivity of the graybody, which is a number between 0 and 1, with 1
corresponding to a perfect blackbody. On the other hand, the absorptivity, a, of an object
measures the fraction of the light incident on the object that the object absorbs. At
equilibrium, a = ε, meaning that what radiation the object absorbs, it emits, so that it
neither heats up (if it emits less than it absorbs) nor cools down (if it emits more than it
absorbs).
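As a quick sanity check on Eqn. (14.1), the sketch below evaluates σ from the fundamental constants. The numerical values of k_B, h and c are the standard SI values and are not quoted in these notes; the function names are just illustrative.

```python
import math

# Standard SI values (exact by definition since the 2019 SI redefinition).
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
C = 299792458.0      # speed of light, m/s


def stefan_boltzmann_constant():
    """sigma = 2 pi^5 k_B^4 / (15 h^3 c^2), in W m^-2 K^-4."""
    return 2.0 * math.pi**5 * K_B**4 / (15.0 * H**3 * C**2)


def irradiance(temperature_kelvin, emissivity=1.0):
    """Graybody irradiance I = emissivity * sigma * T^4, in W/m^2."""
    return emissivity * stefan_boltzmann_constant() * temperature_kelvin**4


if __name__ == "__main__":
    print(stefan_boltzmann_constant())  # compare with 5.67e-8
    print(irradiance(5800.0))           # a blackbody at 5800 K
```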
Assume that the sun is a blackbody of temperature 5800 K and radius $7 \times 10^{8}$ m,
located $1.5 \times 10^{11}$ m from the earth. Assume that the earth is a graybody, which absorbs
part of the radiation incident upon it from the sun, and then re-radiates it isotropically.
Neglect any other effects which could heat the earth. Calculate the surface temperature of
the earth under these assumptions.
SOLUTION:
Let $R_S$ be the radius of the sun, $R_{ES}$ be the earth-sun distance, $R_E$ the radius of the earth,
$T_S$ the temperature of the sun, $T_E$ the temperature of the earth, and $\varepsilon$ the emissivity of
the earth, which, at equilibrium, is also the absorptivity of the earth. The power being
radiated by the sun is just its irradiance multiplied by its surface area:
\[ P_S = (\sigma T_S^4)(4\pi R_S^2). \]
By the time this light reaches the distance of the earth, the power has spread over a sphere
with radius $R_{ES}$. Thus, the irradiance of sunlight at the earth, which we call $I_{ES}$, is
\[ I_{ES} = \frac{P_S}{4\pi R_{ES}^2} = (\sigma T_S^4)\left(\frac{R_S}{R_{ES}}\right)^2. \]
This light irradiance is travelling radially outwards from the sun, and so only the cross-
sectional area of the earth from the view of the sun is actually absorbing the light. This
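cross-sectional area is $\pi R_E^2$. Furthermore, not all of that light is absorbed: only a fraction $\varepsilon$ of it is
absorbed. Thus, the power absorbed by the earth is
\[ P_E^{(\text{abs})} = \varepsilon \pi R_E^2 I_{ES} = \pi \varepsilon (\sigma T_S^4)\left(\frac{R_S R_E}{R_{ES}}\right)^2. \]
On the other hand, the power re-radiated by the earth is isotropic and radiated by all the
surface area of the earth:
\[ P_E^{(\text{rad})} = (\varepsilon \sigma T_E^4)(4\pi R_E^2). \]
At equilibrium, $P_E^{(\text{abs})} = P_E^{(\text{rad})}$, and solving for $T_E$ gives
\[ T_E = \sqrt{\frac{R_S}{2R_{ES}}}\; T_S = 280\ \text{K} \approx 7\,^\circ\text{C}. \]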
That’s quite cold, but it’s supposed to represent an average surface temperature for the
earth. However, even if it were a good value, we would have to take it with a heap of salt
since we didn’t even take into account the fact that the earth has an atmosphere!
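The arithmetic above is easy to reproduce. The sketch below evaluates the formula for T_E with the numbers from the problem statement and then checks the power balance directly. The emissivity (0.6) and earth radius (6.4 × 10⁶ m) used in the check are placeholder values I made up; the point is that both drop out of the final answer.

```python
import math

T_SUN = 5800.0        # K, from the problem statement
R_SUN = 7.0e8         # m
R_EARTH_SUN = 1.5e11  # m
SIGMA = 5.67e-8       # W m^-2 K^-4


def earth_temperature(t_sun=T_SUN, r_sun=R_SUN, r_es=R_EARTH_SUN):
    """Equilibrium temperature T_E = sqrt(R_S / (2 R_ES)) * T_S, in kelvin."""
    return math.sqrt(r_sun / (2.0 * r_es)) * t_sun


def power_balance(t_earth, emissivity=0.6, r_earth=6.4e6):
    """Return (absorbed, radiated) power in watts.

    The emissivity and earth radius are illustrative placeholders;
    they cancel when the two powers are set equal.
    """
    i_es = SIGMA * T_SUN**4 * (R_SUN / R_EARTH_SUN) ** 2
    absorbed = emissivity * math.pi * r_earth**2 * i_es
    radiated = emissivity * SIGMA * t_earth**4 * 4.0 * math.pi * r_earth**2
    return absorbed, radiated


if __name__ == "__main__":
    t_e = earth_temperature()
    print(t_e, t_e - 273.15)   # kelvin and degrees Celsius
    print(power_balance(t_e))  # the two powers should agree
```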
14.4. Bohr Model
Some time after Planck’s discovery of his model of blackbody radiation (1900) and
Einstein’s explanation of the photoelectric effect (1905), Niels Bohr proposed an explana-
tion for atomic spectra: the so-called Bohr model of the atom (1913). I will not reproduce
the derivation of the radii, speeds and energies of the electron in its various orbitals in the
Bohr model. However, I will mention the way I remember the orbital energy and radius.
A special case of the virial theorem says that for orbital paths in the presence of a
central force, which is proportional to the inverse square of the radial distance, the average
potential energy along the orbit, $\langle V \rangle$, is $-2$ times the average kinetic energy, $\langle T \rangle$. Therefore,
the average total energy is $\langle E \rangle = -\langle T \rangle = \langle V \rangle/2$. The convention for potential energy here
is that $V \to 0^-$ as $r \to \infty$. That is, the potential energy is negative and approaches zero from
below at large distances. This statement of the virial theorem is particularly powerful
for circular orbits because these orbits have constant kinetic and potential energies and
therefore, we can just get rid of the averages and the result still holds!
An electron orbiting a proton in hydrogen is in the presence of the Coulomb force,
which is an inverse square force and therefore satisfies the conditions of the special case of
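the virial theorem discussed above. Therefore, the energy is simply the negative of the kinetic
energy or half the potential energy:
\[ E = -T = -\frac{L^2}{2I} = -\frac{L^2}{2mr^2}, \quad \text{and} \quad E = \frac{V}{2} = -\frac{e^2}{8\pi\varepsilon_0 r} = -\frac{\alpha\hbar c}{2r}. \tag{14.2} \]
I have introduced the dimensionless fine structure constant,
\[ \alpha = \frac{e^2}{4\pi\varepsilon_0 \hbar c} \approx \frac{1}{137}. \tag{14.3} \]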
Set the above two expressions for E equal to each other and solve for r:
\[ r = \frac{L^2}{\alpha\hbar m c}. \tag{14.4} \]
Finally, we use Bohr’s postulate: angular momentum comes in integer units of $\hbar$, so $L = n\hbar$:
\[ r_n = \frac{n^2\hbar^2}{\alpha\hbar m c} = \frac{n^2\hbar}{\alpha m c}. \tag{14.5} \]
I actually prefer writing this as
\[ r_n = \frac{n^2\hbar c}{\alpha m c^2}, \tag{14.6} \]
because I always remember that $mc^2 = 0.511$ MeV for the electron. It’s not really that
important because I never remember what $\hbar c$ is anyway. For future reference, the value of
$\hbar c$ is $\hbar c = 0.197$ µeV·m (that’s micro-electron volts times meters); the more familiar 1.24 µeV·m is $hc = 2\pi\hbar c$.
Plugging this back into the expression $E = -\alpha\hbar c/(2r)$ gives the energies
\[ E_n = -\frac{\alpha^2 m c^2}{2n^2}. \tag{14.7} \]
This is the only expression for the orbital energy I ever remember because it is nice and
succinct. I always remember that the hydrogen energy levels go like $E \sim 1/n^2$. The only
sensible unit of energy in this problem is $mc^2$, the rest mass energy of the electron. In fact,
we are assuming in our analysis above that the electron is non-relativistic. This means that
the energy levels should be very small compared to the rest mass energy of the electron.
That is, they should be measured in units of the electron rest mass energy and in those
units, they should be small. Indeed, this is the case because $\alpha^2$ is a small number.
I remember the factor of $\alpha^2$ via an argument from quantum field theory. Don’t worry,
you don’t have to understand the details of the argument or how it is derived in quantum
field theory. A qualitative picture will suffice. Worst case scenario: this can just serve as
a memory aid. The interaction between the electron and the proton, or indeed any two
charged objects, happens via the exchange of a photon. For example, the simplest such
exchange might look like
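[Feynman diagram: an electron $e^-$ and a proton $p^+$ exchange a single photon; each of the two vertices is labeled $\alpha$.] (14.8)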
This is what is called a Feynman diagram. It is supposed to denote an electron and a
proton coming in, interacting via the exchange of a photon, and then going out. In the
diagram, time goes upwards. The diagram makes it seem as though the electron and proton
go away from each other after the interaction (i.e., repel). This is just the conventional
way this diagram is drawn; in fact, the diagram does not usually live in position space anyway,
but rather in momentum space. But, if you like, there is nothing wrong with drawing the
outgoing electron and proton lines to be heading towards each other rather than away.
Each vertex denotes a local interaction between a charge and the photon and counts
as one factor of α. There are two vertices in the above diagram, and therefore the overall
interaction strength goes like $\alpha^2$. Of course, you can have ever-more complicated diagrams
with more and more photon lines. You can even have internal loops consisting of electrons
and positrons and all sorts of other particles. However, these will necessarily come in with
ever-more factors of α and since α is small, these are ever-smaller effects. The energy,
(14.7), is sometimes called the tree-level energy because it is derived from the above tree-
level Feynman diagram, which contains no loops.
Finally, there is the pesky factor of 2 in the denominator. If you remember everything
else and, in addition, remember that, for hydrogen, when you plug in n = 1, you are
supposed to get −13.6 eV, then you can’t miss the factor of 2, since otherwise you would
get −27.2 eV instead.
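The memory aids above are easy to check numerically. The sketch below evaluates Eqns. (14.6) and (14.7) using mc² = 0.511 MeV from these notes together with two standard values that I am supplying myself: α ≈ 1/137.036 and ħc ≈ 197.327 eV·nm. The function names are illustrative.

```python
ALPHA = 1.0 / 137.036    # fine structure constant (standard value)
MC2_EV = 0.511e6         # electron rest energy in eV
HBAR_C_EV_NM = 197.327   # hbar * c in eV*nm (standard value)


def bohr_energy_ev(n):
    """E_n = -alpha^2 m c^2 / (2 n^2), Eqn. (14.7), in eV."""
    return -ALPHA**2 * MC2_EV / (2.0 * n**2)


def bohr_radius_nm(n):
    """r_n = n^2 hbar c / (alpha m c^2), Eqn. (14.6), in nm."""
    return n**2 * HBAR_C_EV_NM / (ALPHA * MC2_EV)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, bohr_energy_ev(n), bohr_radius_nm(n))
    # Dropping the factor of 2 would give twice the ground state energy.
    print(2.0 * bohr_energy_ev(1))
```

The n = 1 values come out near −13.6 eV and 0.0529 nm, and the two expressions are consistent with E = −αħc/(2r).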
Now, consider the following problem:
(a) The power radiated by an accelerated charge e is given in classical physics by the
formula
\[ P = \frac{1}{4\pi\varepsilon_0}\,\frac{2e^2}{3c^3}\,a^2 \quad \text{(SI units)}, \]
where a is the acceleration.
Using this formula, calculate the power radiated by an electron in a Bohr orbit
characterized by the quantum number n. (According to the correspondence principle,
when n is very large this should agree with a proper quantum mechanical calculation.)
(b) The decay rate for an electron in an orbit may be defined to be the power radiated,
P , divided by the energy emitted in the decay. (The decay rate is the inverse of the
lifetime). Use the Bohr theory expression for the energy radiated, and the expression
for P from part (a) to calculate the “correspondence” value of the decay rate when the
electron makes a transition from orbit n to orbit n−1. What is the value of this decay
rate when n = 2? (This will not agree exactly with the true quantum theory, since
the correspondence principle will not hold when n is not $\gg 1$.) What is the decay rate
when the transition is from an orbit n to an orbit n−m?
(c) Use the value of the “lifetime” of an electron in an n = 2 Bohr orbit, calculated in part
(b), to estimate the uncertainty in the energy of the n = 2 energy level. How does it
compare with the energy of that level?
SOLUTION:
(a) The acceleration in a circular orbit is related to tangential speed and radius via
\[ a = \frac{v^2}{r}. \]
The radius $r_n$ of the nth orbit is in (14.5). The speed in this orbit is given by solving
for v in the equation $L = mvr$ and plugging in $L = n\hbar$:
\[ r_n = \frac{n^2\hbar}{\alpha m c}, \qquad v_n = \frac{n\hbar}{m r_n} = \frac{\alpha c}{n}. \tag{14.9} \]
The expression for vn is particularly nice because it shows you that the electron is
pretty non-relativistic, since α is a small number, so $v \ll c$.
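Therefore, the acceleration in the nth orbit is
\[ a_n = \frac{v_n^2}{r_n} = \frac{\alpha^3 m c^3}{n^4\hbar}. \tag{14.10} \]
Using $e^2/(4\pi\varepsilon_0) = \alpha\hbar c$, we can write the power as
\[ P = \frac{2\alpha\hbar}{3}\left(\frac{a}{c}\right)^2. \]
Therefore, the power radiated by an electron in the nth Bohr orbital is
\[ P_n = \frac{2\alpha\hbar}{3}\left(\frac{\alpha^3 m c^2}{n^4\hbar}\right)^2 = \frac{2\alpha^7 m^2 c^4}{3n^8\hbar}. \tag{14.11} \]
If we plug in n = 2, we will get
\[ P_2 = \frac{2\left(\frac{1}{137}\right)^7 (0.511\ \text{MeV})^2}{3(2^8)(6.58\times 10^{-16}\ \text{eV}\cdot\text{s})} = 1.14\times 10^{9}\ \frac{\text{eV}}{\text{s}}. \]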
(b) Classically, there would be a continuum of orbital states between n = 2 and n = 1
and the electron could radiate continuously and decay continuously. Its orbit would
very quickly spiral inwards and the electron would crash into the proton. There would
be no stable atoms at all and no chemistry or life could possibly exist. Clearly, that’s
wrong! You could say that our very existence is evidence for quantum mechanics.
The model we are suggesting in this problem is that the electron sort of waits until
it would have radiated away the difference in energy between the n = 2 and n = 1
orbitals had it been radiating continuously at the rate P2, and then at that point it
radiates that whole energy difference at once. At the rate P2, the time it would take
to radiate away the energy difference between n = 2 and n = 1 is
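\[ \Delta t_2 \equiv \frac{\Delta E_{2\to 1}}{P_2} = \frac{\dfrac{-13.6\ \text{eV}}{2^2} - \dfrac{-13.6\ \text{eV}}{1^2}}{1.14\times 10^{9}\ \text{eV/s}} \approx 10^{-8}\ \text{s}. \tag{14.12} \]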
This is the average lifetime of the n = 2 orbital. The decay rate, γ, is just the inverse
of this.
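(c) The energy-time uncertainty relation is
\[ \Delta E\,\Delta t \geq \frac{\hbar}{2}. \]
If you use the smallest bound for our estimate and $\Delta t$ in (14.12), you get
\[ \Delta E_2 = \frac{\hbar}{2\Delta t_2} = \frac{6.58\times 10^{-16}\ \text{eV}\cdot\text{s}}{2\times 10^{-8}\ \text{s}} = 3.3\times 10^{-8}\ \text{eV}. \tag{14.13} \]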
Since E2 is of order eV, we can say that we know the energy of the orbital to a high
precision since the uncertainty is so small in comparison.
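The three parts of this solution chain together, so it is convenient to evaluate them in one place. The sketch below uses the same inputs as above (α = 1/137, mc² = 0.511 MeV, ħ = 6.58 × 10⁻¹⁶ eV·s, 13.6 eV for the ground state binding energy). The function names are my own, and the choice to evaluate the power in the initial orbit n for a general n → n − m transition is an assumption in the spirit of the model described in part (b).

```python
ALPHA = 1.0 / 137.0    # fine structure constant, as used in the notes
MC2_EV = 0.511e6       # electron rest energy, eV
HBAR_EV_S = 6.58e-16   # hbar, eV*s
RYDBERG_EV = 13.6      # hydrogen ground state binding energy, eV


def radiated_power_ev_per_s(n):
    """P_n = 2 alpha^7 (m c^2)^2 / (3 n^8 hbar), Eqn. (14.11)."""
    return 2.0 * ALPHA**7 * MC2_EV**2 / (3.0 * n**8 * HBAR_EV_S)


def transition_energy_ev(n, m=1):
    """Energy released in going from orbit n to orbit n - m."""
    return RYDBERG_EV * (1.0 / (n - m) ** 2 - 1.0 / n**2)


def lifetime_s(n, m=1):
    """Time to radiate the transition energy at the rate P_n (assumed model)."""
    return transition_energy_ev(n, m) / radiated_power_ev_per_s(n)


def energy_uncertainty_ev(n, m=1):
    """Minimum Delta E = hbar / (2 Delta t) for that lifetime."""
    return HBAR_EV_S / (2.0 * lifetime_s(n, m))


if __name__ == "__main__":
    print(radiated_power_ev_per_s(2))  # eV/s
    print(lifetime_s(2))               # s
    print(1.0 / lifetime_s(2))         # decay rate, 1/s
    print(energy_uncertainty_ev(2))    # eV
```

Without rounding Δt to 10⁻⁸ s, the lifetime comes out slightly under 9 × 10⁻⁹ s and the uncertainty slightly above the 3.3 × 10⁻⁸ eV quoted in (14.13); the conclusion is unchanged.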
14.5. Time-Evolution in 1D Infinite Square Well
First, let us prove that the wavefunctions of the one-dimensional infinite square well of
length L are orthonormal. Recall that the wavefunctions and the energies of a particle of
mass m occupying the corresponding states are labeled by a positive integer, n:
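\[ \psi_n(x) = \sqrt{\frac{2}{L}}\,\sin\frac{n\pi x}{L}, \qquad E_n = \frac{n^2\pi^2\hbar^2}{2mL^2}. \tag{14.14} \]
We would like to prove that
\[ \int_0^L \psi_m^*(x)\,\psi_n(x)\,dx = \delta_{mn}, \tag{14.15} \]
where $\delta_{mn}$ equals 1 if $m = n$ and zero if $m \neq n$. This is called the Kronecker delta. Note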
that the complex conjugation is actually immaterial in this case because the wavefunctions
happen to be real. However, this is not always the case, so it’s a good idea to keep the
complex conjugation in when you write the orthonormality condition in general. Let us
write out the left hand side:
\[ \int_0^L \psi_m^*(x)\,\psi_n(x)\,dx = \frac{2}{L}\int_0^L \sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{n\pi x}{L}\right)dx = 2\int_0^1 \sin(m\pi\xi)\,\sin(n\pi\xi)\,d\xi, \]
where we changed the integration variable to ξ ≡ x/L, for convenience.
We can use the trigonometric identity
2 sinα sinβ = cos(α− β)− cos(α+ β).
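Using this identity, we can write the integral we are calculating as
\[ \begin{aligned} \int_0^L \psi_m^*(x)\,\psi_n(x)\,dx &= \int_0^1 \Big( \cos[(m-n)\pi\xi] - \cos[(m+n)\pi\xi] \Big)\,d\xi \\ &= \left.\frac{\sin[(m-n)\pi\xi]}{(m-n)\pi}\right|_0^1 - \left.\frac{\sin[(m+n)\pi\xi]}{(m+n)\pi}\right|_0^1 \\ &= \operatorname{sinc}[(m-n)\pi] - \operatorname{sinc}[(m+n)\pi]. \end{aligned} \tag{14.16} \]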
Since m and n are both positive integers, so is m + n. Therefore, sin[(m + n)π] = 0 and
therefore sinc[(m + n)π] = 0, since the denominator, (m + n)π ≠ 0. On the other hand,
m − n can be any integer: positive, negative, or zero. If m − n ≠ 0, then we still have
sinc[(m − n)π] = 0, but if m − n = 0, then sinc[(m − n)π] = sinc 0 = 1. This proves the
desired relation: that this integral is equal to zero except when m = n, in which case it is
equal to 1. This is precisely the orthonormality condition, Eqn. (14.15).
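If you would like to see this without doing the integral by hand, the following sketch estimates the overlap integral in (14.15) numerically. The midpoint rule, the number of steps and the function names are my own choices; nothing here depends on the particular value of L.

```python
import math


def psi(n, x, length=1.0):
    """Infinite square well wavefunction psi_n(x) of Eqn. (14.14)."""
    return math.sqrt(2.0 / length) * math.sin(n * math.pi * x / length)


def overlap(m, n, length=1.0, steps=2000):
    """Midpoint-rule estimate of the integral of psi_m * psi_n over [0, L]."""
    dx = length / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += psi(m, x, length) * psi(n, x, length)
    return total * dx


if __name__ == "__main__":
    # Print the first few entries of the overlap matrix.
    for m in range(1, 5):
        print([round(overlap(m, n), 6) for n in range(1, 5)])
```

The printed matrix is, to rounding, the identity matrix, which is exactly the Kronecker delta.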
It turns out that this orthonormality condition is all we need to prove that the wave-
functions, ψn(x), form a complete basis. The completeness condition says that any wave-
function that satisfies Schrödinger’s equation (in this case, for the one-dimensional infinite
square well potential) may be written as a superposition of the basis wavefunctions. We
would like to prove this now. Suppose we have an arbitrary wavefunction, ψ(x), that sat-
isfies the one-dimensional infinite square well potential Schrödinger equation. We would
like to write it as a superposition:
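\[ \psi(x) = \sum_{n=1}^{\infty} C_n \psi_n(x). \tag{14.17} \]
Let us multiply both sides by $\psi_m^*(x)$ and integrate from x = 0 to x = L:
\[ \int_0^L \psi_m^*(x)\,\psi(x)\,dx = \sum_{n=1}^{\infty} C_n \int_0^L \psi_m^*(x)\,\psi_n(x)\,dx = \sum_{n=1}^{\infty} C_n \delta_{mn} = C_m. \tag{14.18} \]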
This gives us a formula for calculating the expansion coefficients, Cm. There are some
technicalities regarding whether or not the integral expression on the LHS for Cm makes
any sense, but these are mathematical qualms and do not, to my knowledge, arise in
any meaningful physical situation. Thus, we have shown that the wavefunctions, ψn(x),
furnish a complete basis for all appropriate wavefunctions. [Note: this is very similar to
Fourier’s theorem, which claims that any “sufficiently nice” function may be written as a
superposition of sines and cosines or complex exponentials.]
Now, here comes the true utility of these basis wavefunctions. By construction, they
are what are called energy eigenstates because they have well-defined energies given in Eqn.
(14.14). This is useful because it is easy to write down the time evolution of a state that
has a well-defined energy. If ψ(x) is the wavefunction of a state that has energy E, then
the time evolution of that state is
\[ \psi(x, t) = e^{-iEt/\hbar}\,\psi(x). \tag{14.19} \]
Often, one defines $\omega \equiv E/\hbar$ so that the exponential can be written $e^{-i\omega t}$.
We may apply this to the basis wavefunctions. The energies are $E_n$ given earlier.
Define $\omega_n \equiv E_n/\hbar$. Then,
\[ \psi_n(x, t) = e^{-i\omega_n t}\,\psi_n(x). \tag{14.20} \]
If ψ(x) does not have a well-defined energy, then this simple relation no longer holds.
However, we can write ψ(x) as a superposition of the basis states and evolve each term:
\[ \psi(x, t) = \sum_{n=1}^{\infty} C_n e^{-i\omega_n t}\,\psi_n(x). \tag{14.21} \]
Voila! We are able to time-evolve ψ(x) even though it does not have a well-defined energy!
Let’s work out an example. Suppose the particle is located somewhere on the left hand
half of the infinite square well, but most likely to be found in the middle of the left half.
Suppose its wavefunction is
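\[ \psi(x) = \begin{cases} \dfrac{2}{\sqrt{L}}\,\sin\left(\dfrac{2\pi x}{L}\right), & 0 \leq x \leq \dfrac{L}{2}, \\[1ex] 0, & \text{elsewhere.} \end{cases} \tag{14.22} \]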
I have made sure that the integral of $|\psi(x)|^2$ is 1, which has to be the case since $|\psi(x)|^2\,dx$
is supposed to represent the probability for the particle to be located in a region of size dx
around the point x and so the integral is the probability for the particle to be anywhere,
which had better be 1. Note that this wavefunction looks very much like the n = 2 basis
wavefunction, but only on the left half of the well.
Let us use the formula for the expansion coefficient, Eqn. (14.18):
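\[ \begin{aligned} C_m &= \int_0^L \psi_m^*(x)\,\psi(x)\,dx \\ &= \frac{2\sqrt{2}}{L}\int_0^{L/2} \sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{2\pi x}{L}\right)dx \\ &= \sqrt{2}\int_0^1 \sin\left(\frac{m}{2}\pi\xi\right)\sin(\pi\xi)\,d\xi \\ &= \frac{1}{\sqrt{2}}\left( \operatorname{sinc}\left[\left(\tfrac{m}{2} - 1\right)\pi\right] - \operatorname{sinc}\left[\left(\tfrac{m}{2} + 1\right)\pi\right] \right). \end{aligned} \tag{14.23} \]
Note that we changed the variable of integration to $\xi \equiv 2x/L$. Note that for m even but
$m \neq 2$, this formula gives $C_m = 0$. We also have $C_2 = 1/\sqrt{2}$. There’s no real point to
simplifying this when m is odd. We have
\[ \psi(x) = \sum_{n=1}^{\infty} C_n \psi_n(x). \tag{14.24} \]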
Below is a diagram of the wavefunction and its expansion. The blue is ψ(x) and the purple
is the result of adding the first 10 terms in the expansion. Of course, the fit is not perfect
because the sum must go to ∞, but it’s not bad for just the first 10 terms.
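To get a feel for how good the first 10 terms are, the sketch below evaluates the coefficients from Eqn. (14.23), builds the truncated sum, and adds up how much of the total probability, the sum of the |Cₙ|², the first few terms capture. The function names are mine and I have set L = 1 by default.

```python
import math


def sinc(z):
    """sin(z)/z, with the limiting value 1 at z = 0."""
    return 1.0 if z == 0.0 else math.sin(z) / z


def coefficient(m):
    """Expansion coefficient C_m from Eqn. (14.23)."""
    half = m / 2.0
    return (sinc((half - 1.0) * math.pi) - sinc((half + 1.0) * math.pi)) / math.sqrt(2.0)


def psi_n(n, x, length=1.0):
    """Basis wavefunction psi_n(x) of Eqn. (14.14)."""
    return math.sqrt(2.0 / length) * math.sin(n * math.pi * x / length)


def psi_exact(x, length=1.0):
    """The initial wavefunction of Eqn. (14.22)."""
    if 0.0 <= x <= length / 2.0:
        return 2.0 / math.sqrt(length) * math.sin(2.0 * math.pi * x / length)
    return 0.0


def psi_partial(x, n_terms=10, length=1.0):
    """Sum of the first n_terms terms of the expansion (14.24)."""
    return sum(coefficient(n) * psi_n(n, x, length) for n in range(1, n_terms + 1))


def norm_captured(n_terms):
    """Sum of |C_n|^2 over the first n_terms terms."""
    return sum(coefficient(n) ** 2 for n in range(1, n_terms + 1))


if __name__ == "__main__":
    for x in (0.1, 0.25, 0.5, 0.75):
        print(x, psi_exact(x), psi_partial(x))
    print(norm_captured(10), norm_captured(200))
```

The first 10 terms already account for more than 99.9% of the total probability, which is why the truncated curve tracks ψ(x) so closely.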
Now, we can evolve this state through time:
\[ \psi(x, t) = \sum_{n=1}^{\infty} C_n e^{-i\omega_n t}\,\psi_n(x). \tag{14.25} \]
We can take the complex square of this (i.e. multiply it with its complex conjugate) and
the result is supposed to be the probability density, P (x, t). Where P (x, t) is big is where
the particle is likely to be found if a measurement of its position is to be made. Below are
snapshots of P (x, t) at various moments in time. Notice that the particle tends to swish
back and forth from left to right and back again. We have shown half a period, where the
particle starts from being just on the left half to being just on the right half. This takes
$t = \pi/\omega_1$ worth of time.
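The swishing can be checked numerically. The sketch below evaluates a truncated version of the time-evolved sum, using the fact that Eₙ is proportional to n², so that ωₙt = n²τ with τ ≡ ω₁t. The truncation at 60 terms, L = 1 and the function names are my own choices. At τ = π the probability density should be the mirror image of the initial one.

```python
import cmath
import math


def sinc(z):
    """sin(z)/z, with the limiting value 1 at z = 0."""
    return 1.0 if z == 0.0 else math.sin(z) / z


def coefficient(m):
    """Expansion coefficient C_m from Eqn. (14.23)."""
    half = m / 2.0
    return (sinc((half - 1.0) * math.pi) - sinc((half + 1.0) * math.pi)) / math.sqrt(2.0)


def psi_t(x, tau, n_terms=60, length=1.0):
    """Truncated time-evolved wavefunction, with tau = omega_1 * t.

    Since E_n = n^2 E_1, the phase of the nth term is exp(-i n^2 tau).
    """
    total = 0j
    for n in range(1, n_terms + 1):
        basis = math.sqrt(2.0 / length) * math.sin(n * math.pi * x / length)
        total += coefficient(n) * cmath.exp(-1j * n * n * tau) * basis
    return total


def probability_density(x, tau, n_terms=60, length=1.0):
    """P(x, t) = |psi(x, t)|^2 for the truncated sum."""
    return abs(psi_t(x, tau, n_terms, length)) ** 2


if __name__ == "__main__":
    for tau in (0.0, math.pi / 2.0, math.pi):
        print(tau, [round(probability_density(x / 8.0, tau), 3) for x in range(9)])
```

At τ = 0 the density is concentrated on the left half with its peak near x = L/4; at τ = π it is concentrated on the right half with its peak near x = 3L/4.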
By the way, this state has no well-defined energy. However, it does have an average energy.
The interpretation is that if one were to prepare a very very large number of identical
systems all in this initial state and then one were to take a measurement of the energy
for all the identical setups, one would get different measurements for each setup, but the
average energy is well-defined. This average energy is just the sum of the products of the
probability for the particle to be in the wavefunction ψn and the energy of that state, En:
\[ \langle E \rangle_\psi = \sum_{n=1}^{\infty} |C_n|^2 E_n. \tag{14.26} \]
Since Cn is the coefficient of the wavefunction, ψn, in ψ, its complex square is the probability
for the particle to be in the wavefunction ψn. Note that this is an average energy as
described earlier. It is not the energy of the state. The state does not have a well-defined
energy. This is in contrast to the energy eigenstate with wavefunction ψn(x). If we prepared
a large number of identical systems all in the initial wavefunction ψn(x) and we measured
the energy of each system separately, we would always measure En.
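For the example state above, this average can be estimated by truncating the sum. The sketch below works in units of E₁, using Eₙ = n²E₁; the function names and cutoffs are my own. The partial sums converge slowly and appear to approach 4E₁ = E₂, which is also what a direct evaluation of (ħ²/2m)∫|ψ′(x)|² dx gives for this wavefunction.

```python
import math


def sinc(z):
    """sin(z)/z, with the limiting value 1 at z = 0."""
    return 1.0 if z == 0.0 else math.sin(z) / z


def coefficient(m):
    """Expansion coefficient C_m from Eqn. (14.23)."""
    half = m / 2.0
    return (sinc((half - 1.0) * math.pi) - sinc((half + 1.0) * math.pi)) / math.sqrt(2.0)


def probabilities(n_terms):
    """|C_n|^2 for n = 1..n_terms: the chance of measuring energy E_n."""
    return [coefficient(n) ** 2 for n in range(1, n_terms + 1)]


def average_energy_over_e1(n_terms):
    """Partial sum of <E> = sum |C_n|^2 E_n, in units of E_1 (E_n = n^2 E_1)."""
    return sum(p * n**2 for n, p in enumerate(probabilities(n_terms), start=1))


if __name__ == "__main__":
    print([round(p, 4) for p in probabilities(6)])
    for n_terms in (10, 100, 2000):
        print(n_terms, average_energy_over_e1(n_terms))
```

Note that the state is most likely to be found in n = 2 (probability 1/2), then n = 1 and n = 3, yet no single measurement is guaranteed to return the average value.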
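It might be tempting to define $\langle\omega\rangle_\psi \equiv \langle E \rangle_\psi/\hbar$ and then say that the time evolution of
ψ(x) is simply
\[ \psi(x, t) = e^{-i\langle\omega\rangle_\psi t}\,\psi(x) \quad \text{(INCORRECT!)}. \tag{14.27} \]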
However, this is incorrect because we cannot interpret 〈E〉ψ as the energy of the wavefunc-
tion ψ(x). This wavefunction does not have a well-defined energy. The only way we can
time-evolve the state is to write it as a superposition of the basis of energy eigenstates and
then time-evolve each piece separately, as in Eqn. (14.25).
15. Final Review
15.1. Human Eye Optics
Let’s consider the optics of human eyes. A human eye can be simplified as one convex lens
projecting images on to a screen (the retina). The focal length of the human eye lens is
variable. Let’s assume that the distance between the retina and the lens of the eye is 25 mm, and
the diameter of the pupil (which is the effective diameter of the lens of an eye) is 3 mm.
(a) If you are reading a book that is 300 mm away from your eyes and an arrow of 1 cm
size on the book forms a clear image on your retina, what is the focal length of the
lens of your eye? What is the actual image size of the arrow on your retina?
(b) What is the smallest object you can identify on the book based on the diffraction limit
of the eye? Assume the illumination light wavelength to be 600 nm.
(c) In order to achieve the diffraction limited resolution in part (b), how small must the “pixel”
on your retina be?
SOLUTION:
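(a) The lens equation reads $\frac{1}{f} = \frac{1}{s_o} + \frac{1}{s_i}$, and so
\[ f = \frac{s_o s_i}{s_o + s_i} = \frac{(300\ \text{mm})(25\ \text{mm})}{(300 + 25)\ \text{mm}} = 23.1\ \text{mm}. \]
The transverse magnification is
\[ M_T = -\frac{s_i}{s_o} = -\frac{25\ \text{mm}}{300\ \text{mm}} = -\frac{1}{12} = -0.0833. \]
The negative sign means the image is upside down. The size of the image on the retina
is
\[ |y_i| = |M_T|\,y_o = \frac{1}{12}\cdot 1\ \text{cm} = \frac{1}{12}\ \text{cm} = 0.833\ \text{mm}. \]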
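(b) The angular resolution is taken here to be the full angular width of the central
diffraction spot, which is twice the Rayleigh angle $1.22\,\lambda/d$:
\[ \Delta\theta = 2.44\,\frac{\lambda}{d}, \]
where d is the diameter of the pupil.
This converts to a spatial resolution, $\Delta x$, on the book by use of the small angle
approximation: $\Delta x = L\,\Delta\theta$, where L = 300 mm is the distance from the eye to the
book (also the object distance). Thus,
\[ \Delta x = 2.44\,\frac{\lambda L}{d} = 2.44\,\frac{(6\times 10^{-4}\ \text{mm})(300\ \text{mm})}{3\ \text{mm}} = 0.15\ \text{mm}. \]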
(c) According to part (a), the image size for an object the size in part (b) is
\[ |y_i| = |M_T|\,y_o = \frac{1}{12}\cdot 0.15\ \text{mm} = 12\ \mu\text{m}. \]
That’s a very small pixel! However, if one rod is to serve as one pixel, measurements
done on some animals show that rods are on the order of microns in size.
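The numbers in parts (a) through (c) can be reproduced with the short sketch below. All lengths are in millimetres; the function names are mine, and the factor 2.44 is passed as a parameter so that you can see what changes if you prefer the 1.22 of the usual Rayleigh angle.

```python
def focal_length(s_o, s_i):
    """Thin lens equation 1/f = 1/s_o + 1/s_i, solved for f."""
    return s_o * s_i / (s_o + s_i)


def magnification(s_o, s_i):
    """Transverse magnification M_T = -s_i / s_o."""
    return -s_i / s_o


def resolvable_size(wavelength, distance, aperture, factor=2.44):
    """Smallest resolvable size: distance * factor * wavelength / aperture."""
    return factor * wavelength * distance / aperture


if __name__ == "__main__":
    s_o, s_i = 300.0, 25.0   # object and image distances, mm
    m_t = magnification(s_o, s_i)
    dx = resolvable_size(6.0e-4, s_o, 3.0)
    print(focal_length(s_o, s_i))   # focal length, mm
    print(abs(m_t) * 10.0)          # image of a 1 cm arrow, mm
    print(dx)                       # smallest object on the book, mm
    print(abs(m_t) * dx * 1000.0)   # required pixel size, micrometres
```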
15.2. Optical Fiber
An optical fiber can be considered as a glass waveguide guiding light through total internal
reflection. Light can be coupled in from the end surface of the fiber. What is the range of
angle θ of the input light so that it can be guided in the fiber? ($n_{\text{glass}} = 1.4$).
SOLUTION:
Consider the following diagram:
In order to get total internal reflection at $\theta_2$, we must have
\[ \sin\theta_2 \geq \frac{1}{n}, \]
where n = 1.4 is the index of refraction of the glass. We have taken this optical fiber to be
surrounded by air with index of refraction ≈ 1. Then,
\[ \cos\theta_2 = \sqrt{1 - \sin^2\theta_2} \leq \sqrt{1 - \frac{1}{n^2}} = \frac{1}{n}\sqrt{n^2 - 1}. \]
Since $\theta_1 = \frac{\pi}{2} - \theta_2$, we have
\[ \sin\theta_1 = \cos\theta_2 \leq \frac{1}{n}\sqrt{n^2 - 1}. \]
Using Snell’s law, we get
\[ \sin\theta = n\sin\theta_1 \leq \sqrt{n^2 - 1} = 0.98 \implies 0 \leq \theta \leq 78^\circ. \]
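Here is a short sketch of the same calculation. The optional n_outside argument is a generalization I have added (the problem itself takes the surrounding medium to be air, with index 1), and the function name is illustrative. If the bound on sin θ is 1 or more, every incidence angle up to 90° is guided.

```python
import math


def max_acceptance_angle_deg(n_core, n_outside=1.0):
    """Largest input angle (degrees) that is still guided by total internal reflection.

    Uses sin(theta) <= sqrt(n_core^2 - n_outside^2) / n_outside, which
    reduces to sqrt(n^2 - 1) when the fiber sits in air.
    """
    bound = math.sqrt(n_core**2 - n_outside**2) / n_outside
    if bound >= 1.0:
        return 90.0
    return math.degrees(math.asin(bound))


if __name__ == "__main__":
    print(max_acceptance_angle_deg(1.4))   # the case in this problem
    print(max_acceptance_angle_deg(1.5))   # bound exceeds 1: all angles guided
```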
15.3. Modified Michelson Interferometer
The Michelson interferometer in the diagram has a birefringent plate in one arm. The
birefringent plate has a thickness of 20 µm and refractive index difference between the high
and low refractive index directions is 0.01.
A beam of y-polarized light with wavelength 800 nm is incident on the Michelson
interferometer. Initially, the high-refractive-index direction of the birefringent plate is
along the y-direction and an interference pattern shown in the diagram is generated at the
screen. Points A, B and C denote the interference first maximum, first minimum, and
second maximum in the pattern, respectively.
Describe the interference pattern under the following conditions. Give your reasoning.
(a) The birefringent plate is rotated by 45◦ from its original position.
(b) The birefringent plate is rotated by 90◦ from its original position.
Now the wavelength of the light is changed to 400 nm. Again, in terms of I0, what will
be the light intensity at positions A, B and C under the following conditions? Give your
reasoning.
(c) The birefringent plate is returned to its original position (high refractive index direction
along the y-direction).
(d) The birefringent plate is rotated by 45◦ from its original position.
(e) The birefringent plate is rotated by 90◦ from its original position.
[Hint: First determine what kind of waveplate the birefringent material is for 800 nm and
400 nm light, respectively.]
SOLUTION:
(a) Let us assume, for simplicity, that the bright bands near the center have roughly the
same intensity, so that the intensity at C in the diagram in the problem is also I0.
As hinted in the problem, we should first work out what type of wave-plate the
birefringent plate is for 800 nm and for 400 nm. The number of wavelengths that fit
in a plate of index of refraction n and thickness t is
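\[ N = \frac{t}{\lambda/n} = \frac{tn}{\lambda}. \]
Therefore, the difference in the number of wavelengths that fit in the plate for the fast
and slow directions of the waveplate is
\[ \Delta N = \frac{t\,\Delta n}{\lambda} = \frac{(2\times 10^{-5}\ \text{m})(10^{-2})}{\lambda} = \frac{2}{\lambda/(100\ \text{nm})}. \]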
For λ = 800 nm, we have ∆N = 1/4 and so the waveplate is a quarter-waveplate (qwp)
and for λ = 400 nm, we have ∆N = 1/2 and so the waveplate is a half-waveplate (hwp).
After the qwp, the light is circularly polarized. Reflection off of the mirror does not
change the rotation of the circular polarization, but it does reverse the direction of
propagation of the light. Thus, left circular polarization (lcp) turns into right circular
polarization (rcp) and vice-versa. Thus, when the light passes through the qwp again,
it becomes linearly polarized light, but polarized in the x-direction rather than the
y-direction. Upon reflection off of the center half-silvered mirror, this x-polarization
becomes z-polarization. Meanwhile, the light from arm 1 remains y-polarized. Hence,
the two light beams are polarized in different directions and cannot interfere. We do
not observe rings, just one big bright spot.
(b) The beam in arm two will be phase shifted less than before by a quarter of a wavelength
for each time it passes through the qwp. That is a total of half a wavelength. Therefore,
where the interference used to be constructive, it will now be destructive and vice-versa.
A and C will now be dark and B will be bright.
(c) Twice passing through a qwp produces the image in the problem. Twice passing
through a hwp produces a phase shift that is twice as large as with the qwp. The
bright fringes in the pattern when the plate is a qwp occur when the relative phase
shift between the two arms is $2\pi n$ for an integer n and the dark fringes are when it
is $2\pi\left(n + \frac{1}{2}\right)$. If we multiply either one of these by 2, we will always get an integer
multiple of 2π. Thus, the bright fringes in the pattern when the plate is a hwp occur
at both the bright and the dark fringes in the pattern when the plate is a qwp. Of
course, there will be dark fringes in between. In other words, the pattern when the
plate is a hwp is twice as tight as when the plate is a qwp. A, B and C will all be
bright with one dark fringe in between A and B and in between B and C.
(d) Any component of the light perpendicular to the high-index-of-refraction axis of the
hwp will be phase shifted less relative to the component parallel to this axis by π
each time it passes through the hwp, which adds up to 2π (going twice through the
hwp). That’s the same as no phase shift at all. Therefore, there will be absolutely no
difference if we rotate the hwp by any angle: same as (c).
(e) Same as (c) for the reason stated in (d).
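The waveplate count from part (a) can be checked numerically. The following is a minimal sketch that uses only the thickness and index difference quoted in the solution; the function and constant names are illustrative choices, not part of the original notes.

```python
# Sketch: how many more wavelengths fit along the slow axis than the fast axis.
# The numbers come from the solution text: t = 2e-5 m and Delta n = 1e-2.

def delta_wavelength_count(thickness_m, delta_index, wavelength_m):
    """Return Delta N = t * Delta n / lambda for a birefringent plate."""
    return thickness_m * delta_index / wavelength_m

PLATE_THICKNESS = 2e-5   # metres
INDEX_DIFFERENCE = 1e-2  # n_slow - n_fast

for wavelength_nm in (800, 400):
    dN = delta_wavelength_count(PLATE_THICKNESS, INDEX_DIFFERENCE, wavelength_nm * 1e-9)
    print(f"lambda = {wavelength_nm} nm: Delta N = {dN:.2f}")
```

A value of 0.25 corresponds to a quarter-waveplate and 0.50 to a half-waveplate.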
15.4. Diffraction Grating
Diffraction gratings can separate different wavelengths into different directions. They can
be understood as multiple-slit structures. Consider a grating with 600 lines (i.e. slits)
per mm. Now we shine a red beam with wavelength 632 nm at normal incidence onto the
grating.
(a) How many strong outgoing beams will be observed? (Hint: the largest diffraction angle
will be 90° in this case.) What are their respective outgoing angles?
(b) The outgoing beam with the smallest non-zero angle is called the first order diffraction
beam. Now, we have two incident light beams with wavelengths at 632.00 nm and
632.01 nm, respectively. What is the angular separation between their first order
diffraction beams?
SOLUTION:
(a) The condition for strong maxima for the diffraction grating is the same as that for the
double-slit: $d \sin\theta_m = m\lambda$, where d is the separation distance between the centers of
successive slits. Solve for m:
\[ m = \frac{d\sin\theta}{\lambda} \le \frac{d}{\lambda} = \frac{(1/600)\ \mathrm{mm}}{632\ \mathrm{nm}} = 2.64. \]
Note that we used the fact that, no matter what θ is, sin θ is always ≤ 1. Since m is
an integer, we conclude that the largest value for m is
\[ m_{\max} = 2 \implies \text{There will be 5 strong outgoing beams.} \]
These five correspond to m = 0, m = ±1 and m = ±2. The angles are $\theta_m = \sin^{-1}(m\lambda/d)$:
\[ \theta_0 = 0^\circ, \qquad \theta_{\pm 1} = \pm 22.3^\circ, \qquad \theta_{\pm 2} = \pm 49.3^\circ. \]
(b) Since the differences are going to be very small, we would need quite a few significant
figures to calculate this on a calculator. Instead, we will write $\theta_1^{(1)}$ for the angle of the
first order diffraction beam for wavelength $\lambda^{(1)} = 632.00$ nm and $\theta_1^{(2)}$ for $\lambda^{(2)} = 632.01$
nm. Define $\Delta\lambda \equiv \lambda^{(2)} - \lambda^{(1)} = 0.01$ nm = 10 pm (picometers). Define $\Delta\theta_1 \equiv \theta_1^{(2)} - \theta_1^{(1)}$,
which we know is very small. Thus, we may Taylor expand $\sin\theta_1^{(2)}$ around $\theta_1^{(1)}$:
\[ \sin\theta_1^{(2)} \approx \sin\theta_1^{(1)} + (\Delta\theta_1)\cos\theta_1^{(1)}. \]
By our formula from part (a),
\[ \sin\theta_1^{(1)} = \frac{\lambda^{(1)}}{d}, \qquad \sin\theta_1^{(2)} = \frac{\lambda^{(2)}}{d}. \]
The second equation is expanded as
\[ \sin\theta_1^{(1)} + (\Delta\theta_1)\cos\theta_1^{(1)} = \frac{\lambda^{(1)}}{d} + \frac{\Delta\lambda}{d}. \]
Using the previous equation for $\theta_1^{(1)}$, we get
\[ (\Delta\theta_1)\cos\theta_1^{(1)} = \frac{\Delta\lambda}{d}. \]
Thus,
\[ \Delta\theta_1 = \frac{\Delta\lambda}{d\cos\theta_1^{(1)}} = \frac{10^{-11}\ \mathrm{m}}{\left((1/600)\times 10^{-3}\ \mathrm{m}\right)\cos 22.3^\circ} = 6.5\times 10^{-6}\ \mathrm{rad} = (3.7\times 10^{-4})^\circ. \]
Your calculator could probably have given you the angles to within more than four
decimal places, in which case you could have just calculated $\theta_1^{(2)}$ as you did $\theta_1^{(1)}$ and
taken the difference. If you do the calculation the way I did above, you need to
remember that the $\Delta\theta_1$ you get just from plugging in the numbers is going to be in
radians, not in degrees. I had to convert to degrees at the end.
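Both parts of the grating problem are easy to reproduce numerically. The sketch below, with illustrative function names that are not from the notes, lists the allowed orders and their angles, then compares the direct difference of the two first-order angles with the linearized estimate Δλ/(d cos θ).

```python
import math

LINES_PER_MM = 600
d = 1e-3 / LINES_PER_MM  # slit spacing in metres

def max_order(wavelength):
    """Largest integer m for which m * wavelength / d does not exceed 1."""
    return math.floor(d / wavelength)

def angle(m, wavelength):
    """Diffraction angle (radians) of order m from d sin(theta) = m lambda."""
    return math.asin(m * wavelength / d)

lam1, lam2 = 632.00e-9, 632.01e-9

# Part (a): every order from -m_max to +m_max gives one strong beam.
m_max = max_order(lam1)
for m in range(-m_max, m_max + 1):
    print(f"m = {m:+d}: theta = {math.degrees(angle(m, lam1)):.2f} deg")

# Part (b): direct difference versus the first-order Taylor estimate.
exact = angle(1, lam2) - angle(1, lam1)
linear = (lam2 - lam1) / (d * math.cos(angle(1, lam1)))
print(f"exact  = {exact:.3e} rad")
print(f"linear = {linear:.3e} rad = {math.degrees(linear):.2e} deg")
```

In double precision the two estimates agree closely, which is consistent with the remark that a calculator with enough digits could take the difference directly.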
15.5. Optical Spectroscopy
(30 points) Optical spectroscopy is widely used to determine the properties of materials.
The figure below is a reflection spectrum from a thin transparent film. It displays the
reflectivity of the thin film as a function of light frequency for normal incident light. Based
on this spectrum, determine the thickness and refractive index of the thin film.
SOLUTION:
The thin film is surrounded by air. Let ray 1 be the ray that reflects off of the front
(air-to-film) interface and ray 2 the one that reflects off of the back (film-to-air) interface.
Then,
\[ \varphi_{\mathrm{ref},1} = \pi, \qquad \varphi_{\mathrm{ref},2} = 0, \qquad \Delta\varphi_{\mathrm{ref}} = -\pi. \]
The difference in path between 1 and 2 is that in addition to the path of 1, ray 2 goes
through the thickness, t, of the film twice. Let n be the index of refraction of the film.
Setting $\varphi_{\mathrm{path},1} = 0$, we have
\[ \Delta\varphi_{\mathrm{path}} = \frac{2\pi}{\lambda/n}\cdot 2t \implies \Delta\varphi_{\mathrm{tot}} = \left(\frac{4nt}{\lambda} - 1\right)\pi. \]
Let us write this in terms of frequency instead of wavelength. We write $\frac{1}{\lambda} = \frac{\nu}{c}$. Thus,
\[ \Delta\varphi_{\mathrm{tot}} = \left(\frac{4nt\nu}{c} - 1\right)\pi. \]
For destructive interference, we set this equal to (2m − 1)π, for some integer m. Thus,
\[ \frac{2nt\nu}{c} = m \implies \frac{2nt\,\Delta\nu}{c} = \Delta m = 1, \]
where $\Delta\nu = 12.5\times 10^{12}$ Hz is the frequency separation between adjacent minima, and
∆m = 1 since m increases in unit steps (it is always an integer). Thus,
\[ nt = \frac{c}{2\Delta\nu} = 1.2\times 10^{-5}\ \mathrm{m}. \tag{15.1} \]
We will work out the value of n by calculating the maximum reflectivity. We can think
of rays 1 and 2 as being two separate light sources, each of intensity $RI_0$, where $I_0$ is the
intensity of the incident light. Technically, the second light ray has intensity $TRTI_0 = (1-R)^2 R I_0$.
However, assuming $RI_0$ instead, we will find R ≈ 0.1, which we will assume
is sufficiently small to justify the approximation. This is just a simplifying assumption;
you could very well do the more exact calculation if you wished. The rays are polarized
the same way, so we can write the total measured intensity from the superposition of the
two rays as
\[ I = I_1 + I_2 + 2\sqrt{I_1 I_2}\cos\Delta\varphi \approx 2RI_0(1 + \cos\Delta\varphi), \]
where we calculated ∆ϕ earlier. Let us derive this result about adding intensities. Recall
that the intensity is proportional to the (complex) square of the total electric field:
$I = \frac{\varepsilon_0 c}{2}|E|^2$. The proportionality constant will not really matter here, but there it is anyway.
The reflectance, r, tells you how much of the electric field gets reflected. Therefore, the
amplitude of the electric field in ray 1 is $rE_0$ and in ray 2 is approximately also $rE_0$, where
$E_0$ is the incident electric field. However, the rays have different phases relative to each
other. The phase of ray 2 relative to ray 1 is ∆ϕ. Therefore, if the electric field in ray 1
is represented by $rE_0$, then the electric field in ray 2 is $rE_0 e^{i\Delta\varphi}$. The total electric field in
the sum of the two is $rE_0\left(1 + e^{i\Delta\varphi}\right)$. The total intensity is proportional to the square of
this total electric field:
\[ I = \frac{\varepsilon_0 c}{2}|r|^2|E_0|^2\left(1 + e^{i\Delta\varphi}\right)\left(1 + e^{-i\Delta\varphi}\right) = 2RI_0\left(1 + \cos\Delta\varphi\right), \]
where the reflection coefficient, R, is the square of the reflectance, $R = |r|^2$, and where we
have denoted $\frac{\varepsilon_0 c}{2}|E_0|^2$ as $I_0$, the incident intensity.
At total constructive interference, ∆ϕ = 2mπ, for some integer m, and so $I = 4RI_0$.
Therefore, the reflectivity of the film at constructive interference is 4R, that is, four times
the reflectivity of just one of the air-film interfaces. According to the graph, this reflectivity
is equal to 0.4. Therefore, R = 0.1. But R is just the square of the reflectance, whose
formula we are given, and which simplifies at normal incidence to $r = \frac{1-n}{1+n}$:
\[ \left(\frac{n-1}{n+1}\right)^2 = r^2 = R = 0.1 \implies n = \frac{1+\sqrt{0.1}}{1-\sqrt{0.1}} = 1.925. \]
Plugging this n into Eqn. (15.1) gives the thickness:
\[ t = 6.23\ \mu\mathrm{m}. \]
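The chain from the two numbers read off the spectrum to n and t can be sketched as follows. The variable names are illustrative, and the speed of light is rounded to 3.0 × 10⁸ m/s, which appears to be the value used in the solution.

```python
import math

C_LIGHT = 3.0e8      # speed of light in m/s (rounded)
DELTA_NU = 12.5e12   # spacing of adjacent reflectivity minima in Hz, from the spectrum
R_PEAK = 0.4         # peak reflectivity of the film, from the spectrum

# Eqn. (15.1): optical thickness n*t from the spacing of the minima.
optical_thickness = C_LIGHT / (2 * DELTA_NU)

# Peak reflectivity is 4R in the two-ray approximation, so R = R_PEAK / 4.
R_single = R_PEAK / 4

# Invert ((n - 1) / (n + 1))^2 = R for n > 1.
root = math.sqrt(R_single)
n_film = (1 + root) / (1 - root)

thickness = optical_thickness / n_film
print(f"n*t = {optical_thickness:.2e} m, n = {n_film:.3f}, t = {thickness * 1e6:.2f} um")
```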
15.6. Relativity and Current-Carrying Wires
Recall from E&M that an infinite straight wire containing a linear charge density, λ,
generates an electric field whose magnitude, as a function of the radial distance, r, from the
wire, is $E = \frac{\lambda}{2\pi\varepsilon_0 r}$.
Recall, as well, that an infinite straight wire carrying a current, I, generates a magnetic
field whose magnitude is $B = \frac{\mu_0 I}{2\pi r}$. The direction of the magnetic field rotates around the
wire in a right-handed fashion (if your right thumb points in the direction of the current,
then your right fingers wrap around the wire in the direction of the magnetic field).
If a positive charge, q, is moving with speed, v, parallel to a current-carrying wire a
distance r away, then the charge experiences a magnetic force with magnitude
$F_m = qvB = qv\,\frac{\mu_0 I}{2\pi r}$. The force is attractive if the point charge moves parallel to the current and repulsive
if it moves anti-parallel (opposite). In either case, the point charge starts accelerating in
the radial direction.
So far, we have been discussing the picture in the “lab frame”. What if we were to
consider the picture from the frame that is moving relative to the lab frame along with
the point charge, so that the point charge looks to be at rest in the horizontal direction
in this frame? Call this frame, S′, the horizontal rest frame of the point charge. In this
frame, the charge is not moving initially. Therefore, it cannot experience a magnetic force.
Yet, according to our analysis in the lab frame, it has to start accelerating in the radial
direction. Let’s see how.
(a) The wire is made up of an immobile lattice of heavy positive ions and a sea of free
mobile electrons. Suppose that each atom gives up one electron so that each ion has
charge +e. The wire is neutral in the lab frame and so the linear charge densities of
the positive ions and the electrons are λ and −λ, respectively. On average, how far
apart along the wire are adjacent positive ions or adjacent electrons as measured in
the lab frame?
(b) Let v be the speed of the electrons along the wire. Calculate the current.
(c) The external point charge is at a radial distance, r, from the wire. For simplicity,
suppose that it is moving with the same speed, v, and direction as the mobile electrons
in the wire. Remember that the direction of current is opposite to the direction of
motion of the electrons. Therefore, this is the case when the external point charge moves
opposite to the current and thus experiences a repulsive magnetic force. Calculate this
magnetic force in the lab frame.
(d) In the frame S′, the external point charge and the electrons in the wire are at rest
(since they all have the same velocity). The positive ions are now moving backwards
with speed v. In S′, on average, how far apart along the wire are adjacent positive
ions? What about adjacent electrons? What is the net charge density measured in this
frame? [Note: Charge is a relativistic invariant.]
(e) Use your result from part (d) to calculate the force experienced by the external point
charge in the frame S′. Show that it also points radially away (i.e. is repulsive), but
is now purely electric rather than purely magnetic. You should find that the force in
S′ is bigger than in the lab frame. Can you think of a reason why this should be the
case? [Hint: Think time dilation. Note: You just need to argue why the force in S′
should be bigger than in S; you need not explain the factor.]
SOLUTION:
(a) The distance between adjacent positive ions is d+ = e/λ+ = e/λ . This is also the
distance between adjacent electrons: d− = −e/λ− = e/λ , since λ− = −λ.
(b) The linear density of the electrons is n = 1/d− = λ/e since there is one electron per
d− length of wire. The current is I = n(−e)v = −λv . The minus sign just reminds
us that the current is in the opposite direction to the motion of the electrons.
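(c) $F_m = qv\,\frac{\mu_0 I}{2\pi r} = \frac{\mu_0 q\lambda v^2}{2\pi r} = \frac{\beta^2 q\lambda}{2\pi\varepsilon_0 r}$, where $\beta \equiv v/c$ and we used the fact that $\mu_0 c^2 = 1/\varepsilon_0$.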
(d) Since the positive ions are at rest in the lab frame, $d_+$ is their proper separation. Their
separation measured in S′ would be contracted by a factor of $\gamma = \left[1 - (v/c)^2\right]^{-1/2}$.
Thus, $d'_+ = d_+/\gamma = e/(\gamma\lambda)$. The situation for the electrons is exactly the opposite.
They are at rest in S′, so their separation in S′ is their proper separation. Their
separation in the lab frame, which is $d_- = e/\lambda$, is contracted relative to their proper
separation. Thus, the separation of the electrons in S′ is bigger than in the lab
frame: $d'_- = \gamma d_- = \gamma e/\lambda$. The wire is no longer neutral when viewed in S′: the
positive linear charge density is $\lambda'_+ = e/d'_+ = \gamma\lambda$, while the negative charge density
is $\lambda'_- = -e/d'_- = -\lambda/\gamma$. These do not cancel anymore: the net charge density is
\[ \lambda'_{\mathrm{net}} = \lambda'_+ + \lambda'_- = \left(\gamma - \frac{1}{\gamma}\right)\lambda = \gamma\beta^2\lambda. \]
(e) Even though the external charge does not feel any magnetic force in the frame S′,
the wire is now positively charged with charge density $\gamma\beta^2\lambda$ in this frame. Therefore,
it produces an electric field with magnitude $\frac{\gamma\beta^2\lambda}{2\pi\varepsilon_0 r}$ radially outwards. This exerts a
repulsive force on the external point charge equal to the charge times the electric field:
$F'_e = \frac{\gamma\beta^2 q\lambda}{2\pi\varepsilon_0 r}$. Note that the force is repulsive both in the lab frame and in the frame S′,
but the force in S′ is bigger by a factor of γ. This makes sense because S′ measures the
initial proper time for the external charge. The time in the lab frame will be dilated
relative to this by a factor of γ. In S′, the force measured on the charge is greater than
in S, but the time for acceleration is also shorter.
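The relation between the two frames can be checked with a short numerical sketch. It works in arbitrary units (q, λ, r and ε₀ all default to 1, which is an assumption made purely for illustration) and confirms that the electric force in S′ is γ times the magnetic force in the lab frame.

```python
import math

def wire_forces(beta, q=1.0, lam=1.0, r=1.0, eps0=1.0):
    """Return (F_lab, F_rest, gamma) for a charge co-moving with the electrons.

    F_lab  : magnetic force in the lab frame, beta^2 q lam / (2 pi eps0 r)
    F_rest : electric force in S', from the net charge density (gamma - 1/gamma) lam
    """
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    prefactor = q / (2 * math.pi * eps0 * r)
    f_lab = beta**2 * lam * prefactor
    lam_net = (gamma - 1.0 / gamma) * lam   # equals gamma * beta^2 * lam
    f_rest = lam_net * prefactor
    return f_lab, f_rest, gamma

f_lab, f_rest, gamma = wire_forces(0.6)
print(f"gamma = {gamma:.3f}, F'/F = {f_rest / f_lab:.3f}")
```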
15.7. Pi Decay
The neutral π meson has a rest mass of 135 MeV/c² and a half-life of 8.2 × 10⁻¹⁷ s. In
one experiment, high energy π mesons are generated. Then each meson decays into two
photons: π → γ + γ. Consider the following questions in the lab frame.
(a) After traveling 10⁻⁶ m, only one percent of the π mesons are left. Calculate the
velocity, kinetic energy and momentum of the generated π mesons.
(b) If the two γ photons are produced in the forward and backward directions, respectively,
what are the energies of the two photons?
SOLUTION:
(a) Let $\tau' = 8.2\times 10^{-17}$ s be the proper half-life of the mesons (measured in their own
rest frame). Let v be their speed measured in the lab frame, with corresponding
β and γ factors. Then, the apparent half-life measured in the lab frame is dilated
relative to the meson rest frame: $\tau = \gamma\tau'$. Suppose there are $N_0$ mesons at the start.
Then, the number of un-decayed mesons remaining after time t in the lab frame is
$N = 2^{-t/\tau}N_0$. In the same time, the mesons will have traveled a distance $d = \beta ct$
and so we can write N as a function of d instead of t by solving for t in terms of d
as $t = d/\beta c$; that is, $N = 2^{-d/\beta c\tau}N_0$. Let α = 1/100 be the fraction of remaining
mesons after the mesons travel a distance $d = 10^{-6}$ m measured in the lab frame.
Then, $\alpha = 2^{-d/\beta c\tau} = 2^{-d/\beta\gamma c\tau'} = e^{-d\ln 2/\beta\gamma c\tau'}$. Solving for βγ gives
\[ \frac{v/c}{\sqrt{1-(v/c)^2}} = \beta\gamma = -\frac{d\ln 2}{c\tau'\ln\alpha} = 6.12. \]
Do not be thrown off by the minus sign; ln α is negative since α is a number less than
1. Now, solving for v gives
\[ v = \frac{-\dfrac{d\ln 2}{c\tau'\ln\alpha}}{\sqrt{1 + \left(\dfrac{d\ln 2}{c\tau'\ln\alpha}\right)^2}}\;c = 0.987\,c = 2.96\times 10^{8}\ \mathrm{m/s}. \]
The momentum is
\[ p = \beta\gamma m_\pi c = -\frac{d\ln 2}{c\tau'\ln\alpha}\,m_\pi c = 826\ \mathrm{MeV}/c. \]
The kinetic energy is
\[ T = \sqrt{(pc)^2 + (m_\pi c^2)^2} - m_\pi c^2 = 702\ \mathrm{MeV}. \]
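(b) Method 1: Let $E_1$ and $E_2$ be the energies of the forward- and backward-moving
photons, respectively. The energy and momentum of the pion and the two photons are
\[ \begin{pmatrix} E_\pi/c \\ p_\pi \end{pmatrix} = \begin{pmatrix} \gamma m_\pi c \\ \beta\gamma m_\pi c \end{pmatrix}, \qquad
\begin{pmatrix} E_1/c \\ p_1 \end{pmatrix} = \frac{E_1}{c}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\begin{pmatrix} E_2/c \\ p_2 \end{pmatrix} = \frac{E_2}{c}\begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]
Conservation of energy and momentum reads
\[ E_1 + E_2 = \gamma m_\pi c^2, \qquad E_1 - E_2 = \beta\gamma m_\pi c^2. \]
Adding the two equations and dividing by 2 gives $E_1$. Subtracting the second from the
first and dividing by 2 gives $E_2$:
\[ E_1 = \sqrt{\frac{1+\beta}{1-\beta}}\,\frac{m_\pi c^2}{2} = 831\ \mathrm{MeV}, \qquad
E_2 = \sqrt{\frac{1-\beta}{1+\beta}}\,\frac{m_\pi c^2}{2} = 5.48\ \mathrm{MeV}. \]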
Method 2: We could use the energy conservation equation from the previous method,
which reads $E_1 + E_2 = \gamma m_\pi c^2$, and couple it with the calculation of the relativistic
invariant for the pion. The relativistic invariant for the pion is
\[ E_\pi^2 - p_\pi^2 c^2 = (\gamma m_\pi c^2)^2 - (\beta\gamma m_\pi c)^2 c^2 = m_\pi^2 c^4. \]
The relativistic invariant for the two photons together is
\[ (E_1 + E_2)^2 - (p_1 + p_2)^2 c^2 = (E_1 + E_2)^2 - (E_1 - E_2)^2 = 4E_1E_2. \]
Therefore,
\[ m_\pi^2 c^4 = 4E_1E_2. \]
Indeed, if you multiply the expressions for $E_1$ and $E_2$ found in method 1, you get
$\frac{1}{4}m_\pi^2 c^4$, which is consistent with the above equation. So, our two equations, for the
two unknowns, $E_1$ and $E_2$, are
\[ E_1 + E_2 = \gamma m_\pi c^2, \qquad E_1E_2 = \left(\frac{m_\pi c^2}{2}\right)^2. \]
Solve for $E_2$ using the first equation: $E_2 = \gamma m_\pi c^2 - E_1$, and plug this into the second
equation. One gets a quadratic equation, which, after rearranging things, reads
\[ E_1^2 - \gamma m_\pi c^2 E_1 + \left(\frac{m_\pi c^2}{2}\right)^2 = 0. \]
Complete the square by adding and subtracting a term $\left(\frac{\gamma m_\pi c^2}{2}\right)^2$:
\[ E_1^2 - \gamma m_\pi c^2 E_1 + \left(\frac{\gamma m_\pi c^2}{2}\right)^2 - \left(\frac{\gamma m_\pi c^2}{2}\right)^2 + \left(\frac{m_\pi c^2}{2}\right)^2
= \left(E_1 - \frac{\gamma m_\pi c^2}{2}\right)^2 - \left(\frac{m_\pi c^2}{2}\right)^2(\gamma^2 - 1) = 0. \]
We write the factor $\gamma^2 - 1$ as
\[ \gamma^2 - 1 = \frac{1}{1-\beta^2} - 1 = \frac{1-(1-\beta^2)}{1-\beta^2} = \frac{\beta^2}{1-\beta^2} = (\beta\gamma)^2. \]
Therefore, our quadratic equation for $E_1$ reads
\[ \left(E_1 - \frac{\gamma m_\pi c^2}{2}\right)^2 - \left(\frac{\beta\gamma m_\pi c^2}{2}\right)^2 = 0. \]
Using the standard factorization of the difference of two squares, we get
\[ \left[E_1 - \gamma(1+\beta)\frac{m_\pi c^2}{2}\right]\left[E_1 - \gamma(1-\beta)\frac{m_\pi c^2}{2}\right] = 0. \]
The first root is the same as the $E_1$ we found in method 1 since $\gamma(1+\beta) = \sqrt{\frac{1+\beta}{1-\beta}}$.
The second root is the $E_2$ we found earlier. We know to pick the first solution for $E_1$
because it is the larger of the two and the forward-moving photon had better have the
higher energy.
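Method 3: In the rest frame of the pion, the energy and momentum of the pion and
the photons are
\[ \begin{pmatrix} E'_\pi/c \\ p'_\pi \end{pmatrix} = m_\pi c\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
\begin{pmatrix} E'_1/c \\ p'_1 \end{pmatrix} = \frac{m_\pi c}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\begin{pmatrix} E'_2/c \\ p'_2 \end{pmatrix} = \frac{m_\pi c}{2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]
The photons must go off in opposite directions with the exact same magnitude of
momentum because, in the rest frame of the pion, the initial momentum is 0. Since
the photons have the same magnitude of momentum, they have the same energy, which
is half the rest-mass energy of the pion since that is the initial energy in the rest frame
of the pion.
We simply have to transform back to the lab frame. For example, for the forward-moving
photon:
\[ \begin{pmatrix} E_1/c \\ p_1 \end{pmatrix} = \begin{pmatrix} \gamma & \beta\gamma \\ \beta\gamma & \gamma \end{pmatrix}\frac{m_\pi c}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \gamma(1+\beta)\frac{m_\pi c}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]
Writing $\gamma(1+\beta) = \sqrt{\frac{1+\beta}{1-\beta}}$ shows that this gives the same energy, $E_1$, as found previously.
Similarly, one can find $E_2$.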
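The numbers in parts (a) and (b) can be reproduced with a few lines of arithmetic. The sketch below works in units where energies are in MeV and momenta in MeV/c, takes c = 3.0 × 10⁸ m/s (an assumed rounding), and checks that the three methods for the photon energies agree. All variable names are illustrative.

```python
import math

M_PI = 135.0           # pion rest energy in MeV
TAU_PROPER = 8.2e-17   # proper half-life in s
C_LIGHT = 3.0e8        # m/s (rounded)
DISTANCE = 1e-6        # lab-frame distance in m
ALPHA = 0.01           # surviving fraction after DISTANCE

# Part (a): beta*gamma from alpha = 2^(-d / (beta gamma c tau')).
beta_gamma = -DISTANCE * math.log(2) / (C_LIGHT * TAU_PROPER * math.log(ALPHA))
gamma = math.sqrt(1 + beta_gamma**2)
beta = beta_gamma / gamma
momentum = beta_gamma * M_PI                       # MeV/c
kinetic = math.sqrt(momentum**2 + M_PI**2) - M_PI  # MeV

# Part (b), method 1: add and subtract the two conservation equations.
E1 = (gamma + beta_gamma) * M_PI / 2
E2 = (gamma - beta_gamma) * M_PI / 2

# Method 2: roots of E^2 - gamma*m*E + (m/2)^2 = 0.
disc = math.sqrt((gamma * M_PI) ** 2 - M_PI**2)
E1_quad = (gamma * M_PI + disc) / 2
E2_quad = (gamma * M_PI - disc) / 2

# Method 3: boost rest-frame photons with E' = m/2 and p'c = +m/2 or -m/2.
E1_boost = gamma * (M_PI / 2) + beta_gamma * (+M_PI / 2)
E2_boost = gamma * (M_PI / 2) + beta_gamma * (-M_PI / 2)

print(f"beta*gamma = {beta_gamma:.2f}, beta = {beta:.3f}")
print(f"p = {momentum:.0f} MeV/c, T = {kinetic:.0f} MeV")
print(f"E1 = {E1:.0f} MeV, E2 = {E2:.2f} MeV")
```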
15.8. Relativistic Doppler Effect
Suppose you direct a laser beam with frequency f0 at an atom moving towards you with a
velocity u.
(a) What is the light frequency felt by the atom in its rest frame?
(b) The atom will be driven by the laser beam and re-radiate. What is the frequency of the
light radiated by the atom in its reference frame? If you observe this atom radiation,
what light frequency will you see? What is the corresponding light wavelength?
(c) Suppose you now direct the laser beam towards a mirror moving towards you with a
velocity u. What will be the light frequency that you observe? Briefly explain why.
SOLUTION:
(a) The frequency is Doppler shifted upwards (i.e. it should increase) because the atom is
moving towards you, the initial source. The atom sees a frequency, f ′, given by
\[ f' = \sqrt{\frac{1 + u/c}{1 - u/c}}\; f_0. \]
(b) When the light hits the atom, the oscillating electric field will deform the electron
clouds giving the atom a dipole moment that is oscillating at the same frequency as
the light. The frequency of dipole radiation is the same as the frequency of oscillation
of the dipole. Hence, the atom radiation has frequency f ′rerad = f ′ .
Now, the atom becomes the source of the radiation and it is moving towards you,
which means that the frequency you observe is Doppler shifted upwards relative to f ′.
This shift has the same factor, assuming the atom slows down a negligible amount (due
to radiation pressure). You observe a re-radiation frequency, frerad, of
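\[ f_{\mathrm{rerad}} = \sqrt{\frac{1 + u/c}{1 - u/c}}\; f'_{\mathrm{rerad}} = \left(\frac{1 + u/c}{1 - u/c}\right) f_0. \]
The corresponding wavelength is
\[ \lambda_{\mathrm{rerad}} = \frac{c}{f_{\mathrm{rerad}}} = \left(\frac{1 - u/c}{1 + u/c}\right)\frac{c}{f_0}. \]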
(c) The mirror is nothing more than a large collection of atoms all of which do exactly the
same thing as the one atom that this problem has been about until now. Thus, the
frequency observed will be exactly the same as that for just one atom. We could use the
fact that in the mirror’s rest frame, the reflected wave has the exact same frequency
(energy) as the incident wave, just with the exact opposite momentum. However,
that fact is derived microscopically from the induced oscillating dipoles on the mirror
surface, anyway, which is what we have done here.
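The two-step Doppler shift can be sketched in a few lines. The function names below are illustrative; the code simply applies the approaching-source factor twice, once into the atom (or mirror) frame and once back to the observer, and compares the result with the single factor (1 + u/c)/(1 − u/c).

```python
import math

def doppler_approach(frequency, beta):
    """Frequency seen when source and observer approach at speed beta = u/c."""
    return math.sqrt((1 + beta) / (1 - beta)) * frequency

def reradiated_frequency(f0, beta):
    """Frequency observed after re-radiation by an atom (or mirror) approaching at beta."""
    f_atom = doppler_approach(f0, beta)    # laser frequency in the atom's rest frame
    return doppler_approach(f_atom, beta)  # atom's radiation back in the lab frame

f0 = 1.0
beta = 0.6
print(reradiated_frequency(f0, beta), (1 + beta) / (1 - beta) * f0)
```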
15.9. Quantum Tunneling and Frustrated Total Internal Reflection
A plane wave with wavevector k is sent in from x = −∞ traveling to the right towards a
step potential,
\[ V(x) = \begin{cases} 0, & x < 0, \\ V_0, & x > 0. \end{cases} \]
(a) What is the value of V0 above which the region x > 0 is classically forbidden? Assume
that this is the case for the rest of the problem.
(b) The time-independent wavefunction in the region x < 0 (Region I) is given by
\[ \psi_I(x) = Ae^{ikx} + Be^{-ikx}, \]
where the A term represents the incoming plane wave moving to the right and the B
term represents the reflected plane wave moving to the left. Justify the nomenclature
here: why can we call these plane waves and why is the A term moving to the right and
the B term moving to the left? [Hint: Determine the full time-dependent wavefunction.]
(c) Determine the general form of the time-independent wavefunction in the region x > 0
(Region II), ψII(x).
(d) Calculate the transmission coefficient, defined to be the square of the ratio of the
amplitudes of the transmitted and incident plane waves:
\[ T \equiv \left|\frac{F}{A}\right|^2. \]
Also calculate the reflection coefficient $R \equiv |B/A|^2$ and check that R + T = 1.
CAUTION: Thanks to Carlin for reminding me of the following. The transmission
coefficient above should actually be multiplied by a factor of the ratio of the transmitted
and incident wavevectors, $k_{\mathrm{transmitted}}/k_{\mathrm{incident}}$. For this problem, this doesn’t make a difference
because this ratio is equal to 1, but in general you have to keep this factor in. This is
because the transmission and reflection coefficients are defined to be the ratio of the
transmitted and reflected fluxes to the incident flux. That is, the ratio of the rates
of particle flow in the transmitted and reflected beams relative to the incident beam.
Therefore, we need to multiply the square amplitudes by the speed of propagation, and
then take the ratio. Since reflected and incident waves have the same wavevector and
speed, we never need to worry about this as far as R is concerned. However, if V in the
transmitted region is not the same as V in the incident region, then the wavevectors
in the transmitted and incident regions will be different. To be honest, partly because
of this complication, I hardly ever calculate T directly. Instead, I calculate R and then
T is just 1−R.
(e) Despite your answer to part (d), the wavefunction is not zero in the region x > 0. If
the barrier has finite extent, it is possible for the incoming particles to tunnel through
the barrier to the other side. Let the barrier have length L so that for x > L, the
potential once again vanishes. Write down the general form of the time-independent
wavefunction in the region 0 < x < L (Region II), ψII(x), and in the region x > L
(Region III), ψIII(x). Write down the equations you would need to solve in order to
calculate the transmission and reflection coefficients. (I’m not asking you to actually
solve them).
Note: This is similar to the phenomenon of frustrated total internal reflection. In this
case, if you bring a refractive material close to the boundary of another where you have
total internal reflection set up, then it is possible to get some light to tunnel through the
air gap and come out the other side! Below is an image showing this effect. The green
laser comes in from the right through the first prism at an angle that should make the
beam be totally internally reflected. There is a small air gap between the triangular and
the eye-shaped prisms. Nevertheless, some of the light is able to cross that gap and emerge
in the eye-shaped prism.
Image source: University of Vermont http://www.uvm.edu/~dahammon/
SOLUTION
(a) The kinetic energy is related to the wavevector via
\[ T = \frac{\hbar^2k^2}{2m}. \]
Since V = 0 in the region x < 0, the total energy is simply equal to the kinetic energy
there:
\[ E = \frac{\hbar^2k^2}{2m}. \]
The region x > 0 is classically forbidden if the potential energy there is greater than
the total energy E, since that technically means that the kinetic energy is negative!
Thus, the region x > 0 is classically forbidden if
\[ V_0 > E = \frac{\hbar^2k^2}{2m}. \tag{15.2} \]
(b) The full time-dependent wavefunction in region I is
\[ \Psi_I(t, x) = e^{-iEt/\hbar}\psi_I(x) = Ae^{-i(\omega t - kx)} + Be^{-i(\omega t + kx)}, \tag{15.3} \]
where
\[ \omega = \frac{E}{\hbar} = \frac{\hbar k^2}{2m}. \tag{15.4} \]
Now, these really are plane waves. If we track the zero phase (when the exponent is
zero), for the A term, as t increases the x position of the zero phase point also increases.
Therefore, this plane wave moves to the right. The opposite is true for the B term,
which therefore moves to the left.
(c) In region II, the solutions to the time-independent Schrödinger equation are real
exponentials, rather than complex exponentials. These are growing and decaying exponentials.
However, since the wavefunction cannot blow up as x → ∞, the only allowed solution
is the exponentially decaying one:
\[ \psi_{II}(x) = Ce^{-\kappa x}, \quad \text{where } \kappa = \frac{\sqrt{2m(V_0 - E)}}{\hbar} = \sqrt{\frac{2mV_0}{\hbar^2} - k^2}. \tag{15.5} \]
(d) Both ψ and ψ′ (the derivative of ψ) must be continuous everywhere. In particular, we
must match ψ and ψ′ in regions I and II at x = 0:
\[ A + B = C, \tag{15.6a} \]
\[ ik(A - B) = -\kappa C. \tag{15.6b} \]
Eliminate C and solve for B/A:
\[ \frac{B}{A} = -\frac{\kappa + ik}{\kappa - ik} \implies R = \left|\frac{B}{A}\right|^2 = 1. \]
That is, we have 100% reflection, despite the fact that $\psi_{II}(x) \neq 0$! The wavefunction
in the classically forbidden region is called the evanescent wave.
(e) Now, we are allowed to have both the exponentially decaying as well as the
exponentially growing solutions in region II because it is just a finite region. Meanwhile, the
solution in Region III is again the same plane wave solution as in region I. However,
we only want the transmitted wave; there is no incoming wave from the right. Thus,
\[ \psi_{II}(x) = Ce^{-\kappa x} + De^{\kappa x}, \qquad \psi_{III}(x) = Fe^{ikx}. \tag{15.7} \]
Now, we have four equations: matching ψ and ψ′ both at x = 0 and at x = L:
\[ A + B = C + D, \tag{15.8a} \]
\[ ik(A - B) = -\kappa(C - D), \tag{15.8b} \]
\[ Ce^{-\kappa L} + De^{\kappa L} = Fe^{ikL}, \tag{15.8c} \]
\[ -\kappa\left(Ce^{-\kappa L} - De^{\kappa L}\right) = ikFe^{ikL}. \tag{15.8d} \]
Eliminate C from the first two, the middle two, and the last two equations:
\[ (\kappa + ik)A + (\kappa - ik)B = 2\kappa D, \tag{15.9a} \]
\[ ik(A - B)e^{-\kappa L} + \kappa Fe^{ikL} = \kappa D\left(e^{\kappa L} + e^{-\kappa L}\right), \tag{15.9b} \]
\[ 2\kappa De^{\kappa L} = (\kappa + ik)Fe^{ikL}. \tag{15.9c} \]
Eliminate D using (15.9c), which gives $\kappa D = \frac{1}{2}(\kappa + ik)Fe^{ikL}e^{-\kappa L}$:
\[ (\kappa + ik)Ae^{\kappa L} + (\kappa - ik)Be^{\kappa L} = (\kappa + ik)Fe^{ikL}, \tag{15.10a} \]
\[ ik(A - B)e^{-\kappa L} + \kappa Fe^{ikL} = (\kappa + ik)Fe^{ikL}\,\frac{1 + e^{-2\kappa L}}{2}. \tag{15.10b} \]
Eliminate B and solve for F/A. After a lot of algebra,
\[ \frac{F}{A} = -\frac{4ik\kappa\, e^{-ikL}}{(\kappa - ik)^2e^{\kappa L} - (\kappa + ik)^2e^{-\kappa L}}. \tag{15.11} \]
After a lot more algebra, and plugging in the expressions for k and κ in terms of E
and $V_0$, one finds the transmission coefficient
\[ T = \left|\frac{F}{A}\right|^2 = \left[1 + \left(\frac{mV_0}{\hbar^2}\right)^2\frac{\sinh^2(\kappa L)}{k^2\kappa^2}\right]^{-1}. \tag{15.12} \]
This is the tunneling probability and it is not zero! It has the correct behavior that
T → 0 as $V_0 \to \infty$ or L → ∞.
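The matching conditions in parts (d) and (e) can be verified numerically without doing the algebra by hand. The sketch below works in units where ħ = m = 1 (an assumption made only to keep the numbers simple), sets F = 1, and back-substitutes through the matching equations at x = L and x = 0 to obtain A. The function names are illustrative.

```python
import cmath
import math

def step_reflection(k, V0):
    """R = |B/A|^2 for the infinite step with V0 > E, units hbar = m = 1."""
    kappa = math.sqrt(2 * V0 - k**2)
    ratio = -(kappa + 1j * k) / (kappa - 1j * k)
    return abs(ratio) ** 2

def barrier_transmission(k, V0, L):
    """T = |F/A|^2 for a barrier of width L, by back-substitution with F = 1."""
    kappa = math.sqrt(2 * V0 - k**2)
    F = 1.0
    edge = F * cmath.exp(1j * k * L)
    # Matching psi and psi' at x = L gives C and D in terms of F.
    C = edge * (1 - 1j * k / kappa) / 2 * math.exp(kappa * L)
    D = edge * (1 + 1j * k / kappa) / 2 * math.exp(-kappa * L)
    # Matching at x = 0: A + B = C + D and A - B = (i kappa / k)(C - D).
    A = ((C + D) + (1j * kappa / k) * (C - D)) / 2
    return abs(F / A) ** 2

def barrier_transmission_closed(k, V0, L):
    """Closed form [1 + (V0 sinh(kappa L) / (k kappa))^2]^(-1) with hbar = m = 1."""
    kappa = math.sqrt(2 * V0 - k**2)
    return 1.0 / (1.0 + (V0 * math.sinh(kappa * L) / (k * kappa)) ** 2)

print(step_reflection(1.0, 2.0))
print(barrier_transmission(1.0, 1.0, 1.0), barrier_transmission_closed(1.0, 1.0, 1.0))
```

The step gives R = 1 for any V₀ > E, while the finite barrier gives a small but nonzero T that falls off rapidly as L grows.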
15.10. Wavefunction Shapes
Below is a picture of an infinite potential well with a non-flat bottom. Explain your answers
to the following questions.
(a) For some arbitrary allowed energy, E, rank positions A, B and C by the classical
kinetic energy of the particle at these positions from largest to smallest.
(b) Repeat for de Broglie wavelength.
(c) Repeat for the amount of time a classical particle spends traversing an interval of width
δx at each position.
(d) Repeat for the spacings between the zeros of the wavefunction in the regions near each
point. Assume that the energy level is sufficiently high that the wavefunction oscillates
many times between the two walls.
(e) Repeat for the amplitude of the wavefunction in the region near each point.
(f) Sketch a plausible wavefunction for some high energy level.
SOLUTION:
(a) B > C > A, since K = E − V .
(b) A > C > B, since $K \propto p^2$ and $p \propto \lambda^{-1}$.
(c) A > C > B, since the particle moves slower where it has less kinetic energy.
(d) A > C > B; same as (b).
(e) A > C > B; same as (c).
(f) We want the amplitude and wavelength to get slightly larger near the sides.
CAUTION: Thanks to Yufan for bringing the following to my attention. I said that
amplitudes and wavelengths tend to be smaller in regions where the difference between
E and V is bigger. The statement about the wavelength is certainly correct. However,
the statement about the amplitude only holds for bound states; it does not hold for
plane wave states. Our intuitive arguments above for the amplitude technically require
that the particle be going back and forth many many times. This is only the case
for bound states. Then, it certainly is true that given two regions of space of the
same size, the particle is less likely to be found in the region in which it is traveling
faster. Therefore, the amplitude will tend to be smaller in those regions. However, for
one-dimensional scattering problems, where you send in a plane wave on the left and
then study the reflected and transmitted waves, this argument doesn’t really hold. The
particle has one pass; it does not go back and forth. So for example, the amplitude of
the transmitted wave for a step barrier is smaller than the amplitude of the incident
wave even though the wavevector, and therefore the speed, is smaller in the transmitted
region.
16. Final Exam Solutions
16.1. The Pole Vaulter Paradox
A pole vaulter is running with a pole at $v = \frac{\sqrt{3}}{2}c$. Her pole has a proper length of L. She
runs into a barn with proper length L/2 with doors on the front and back. When the pole
vaulter runs into the barn, a farmer tries to close both front and back doors at the same
time, but only for an instant, and then reopens them.
(a) What is the length of the pole from the farmer’s perspective? What is the length of the
barn from the pole vaulter’s perspective? From the farmer’s perspective can he close
the barn doors at the same time? [15 pts]
(b) Are the doors closed at the same time for the pole vaulter? What is the expression
for the time interval of the door closings in the pole vaulter’s frame? What is the
interpretation of the sign of the expression? [10 pts]
(c) In the pole vaulter’s frame give an expression for what the time interval would have to
be to avoid an accident. Comparing the answers of (b) and (c), is there an accident?
[5 pts]
SOLUTION:
(a) The γ factor associated with the speed $v = \frac{\sqrt{3}}{2}c$ is
\[ \gamma = \frac{1}{\sqrt{1 - \left(\frac{v}{c}\right)^2}} = \frac{1}{\sqrt{1 - \frac{3}{4}}} = 2. \]
The length of the pole from the farmer’s perspective is
\[ \text{pole length to farmer} = \frac{L}{\gamma} = \frac{L}{2}. \]
The length of the barn from the pole vaulter’s perspective is
\[ \text{barn length to pole vaulter} = \frac{L/2}{\gamma} = \frac{L}{4}. \]
From the farmer’s perspective he can close the barn doors at the same time, neglecting
the fact that the pole is exactly the same length as the barn from his perspective
and neglecting timing and reaction time issues. In other words, there is one instant in
time in the farmer’s frame of reference when the pole is entirely within the barn.
(b) No, the doors are not closed at the same time for the pole vaulter. Let ∆t and ∆x
be the time interval and the spatial distance in the reference frame of the barn and
farmer between the two events (back door closing, then front door closing). Let ∆t′
and ∆x′ be the corresponding intervals in the reference frame of the pole vaulter.
In the reference frame of the barn and farmer, the two events are simultaneous and
are separated in space by the proper length of the barn:
\[ \Delta t = 0, \qquad \Delta x = \frac{L}{2}. \]
The invariant interval is
\[ (\Delta s)^2 = (c\Delta t)^2 - (\Delta x)^2 = 0 - \left(\frac{L}{2}\right)^2 = -\frac{L^2}{4}. \tag{16.1} \]
In the reference frame of the pole vaulter, the two events are separated in space by the
proper length of the pole:
\[ \Delta x' = L. \]
Therefore,
\[ (\Delta s)^2 = (c\Delta t')^2 - (\Delta x')^2 = (c\Delta t')^2 - L^2. \tag{16.2} \]
Setting Eqns. (16.1) and (16.2) equal and solving for ∆t′ gives
\[ \Delta t' = \frac{\sqrt{3}}{2}\frac{L}{c}. \tag{16.3} \]
We could have also used a Lorentz transformation:
\[ c\Delta t' = \gamma c\Delta t + \beta\gamma\Delta x = 0 + \frac{\sqrt{3}}{2}\cdot 2\cdot\frac{L}{2} = \frac{\sqrt{3}}{2}L \implies \Delta t' = \frac{\sqrt{3}}{2}\frac{L}{c}. \]
Our definition for ∆t′ means that if it is positive, then the back door closes before the
front door closes. This makes sense, of course, since the front of the pole reaches the
back of the barn before the back of the pole reaches the front of the barn.
(c) In the pole vaulter’s reference frame, when the front of the pole aligns with the back
of the barn, the length of the pole that is inside the barn is just the length of the barn
as measured by the pole vaulter, which is L/4. Therefore, the length of the pole outside
of the barn is 3L/4. Therefore, the minimum time interval between the back door of the
barn closing and the front door closing to avoid an accident is the time it takes for the
front of the barn to travel the remaining distance 3L/4 to the back of the pole. This
time is
\[ \Delta t'_{\min} = \frac{3L/4}{\sqrt{3}\,c/2} = \frac{\sqrt{3}}{2}\frac{L}{c}. \tag{16.4} \]
The time (16.3) is just equal to (16.4). Therefore, an accident is just about avoided.
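The three routes to the time interval can be compared numerically. The sketch below uses units with L = c = 1 (an assumption for convenience) and illustrative variable names; it evaluates Δt′ from the invariant interval, from the Lorentz transformation, and the minimum interval from part (c).

```python
import math

L = 1.0                     # proper length of the pole (unit of length)
c = 1.0                     # speed of light (unit of speed)
beta = math.sqrt(3) / 2
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Events in the barn frame: both doors close simultaneously, L/2 apart.
dt, dx = 0.0, L / 2

# Route 1: invariant interval with dx' = L in the pole vaulter's frame.
interval_sq = (c * dt) ** 2 - dx**2
dt_prime_invariant = math.sqrt(interval_sq + L**2) / c

# Route 2: Lorentz transformation c dt' = gamma c dt + beta gamma dx.
dt_prime_lorentz = (gamma * c * dt + beta * gamma * dx) / c

# Part (c): time for the contracted barn to cover the part of the pole still outside.
barn_contracted = (L / 2) / gamma
dt_prime_min = (L - barn_contracted) / (beta * c)

print(gamma, dt_prime_invariant, dt_prime_lorentz, dt_prime_min)
```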
16.2. Pion Decay
A positive pion decays into a muon and a neutrino, $\pi^+ \to \mu^+ + \nu$. The pion rest mass is
$m_\pi = 140$ MeV/c², the muon rest mass is $m_\mu = 106$ MeV/c², and the neutrino has a mass
$m_\nu \approx 0$. Assume that the pion starts off at rest.
(a) Using conservation of relativistic momentum and energy, find an expression for the
momentum of the muon that depends only on $m_\pi$ and $m_\mu$. [10 pts]
(b) Show that the following expression for the speed, u, of the muon is correct. [20 pts]
\[ \frac{u}{c} = \frac{(m_\pi/m_\mu)^2 - 1}{(m_\pi/m_\mu)^2 + 1}. \]
SOLUTION:
(a) Let $p_\mu$ and $p_\nu$ be the magnitudes of the momenta of the muon and neutrino,
respectively. Since the pion starts off at rest, the initial momentum is zero. Therefore,
conservation of momentum implies that
\[ p_\mu = p_\nu. \tag{16.5} \]
Since the neutrino is taken to be massless,
\[ E_\nu = \sqrt{p_\nu^2c^2 + m_\nu^2c^4} = p_\nu c = p_\mu c, \tag{16.6} \]
where we plugged in (16.5) to get the final equality.
Energy conservation reads
\[ m_\pi c^2 = E_\mu + E_\nu = \sqrt{p_\mu^2c^2 + m_\mu^2c^4} + p_\mu c. \tag{16.7} \]
Isolating the square root on one side and squaring gives
\[ p_\mu^2c^2 - 2m_\pi c^3p_\mu + m_\pi^2c^4 = p_\mu^2c^2 + m_\mu^2c^4, \]
in which the $p_\mu^2c^2$ terms cancel. Solving for $p_\mu$ gives
\[ p_\mu = \frac{m_\pi^2c^4 - m_\mu^2c^4}{2m_\pi c^3} = \left[\left(\frac{m_\pi}{m_\mu}\right)^2 - 1\right]\frac{m_\mu}{2m_\pi}\,m_\mu c. \tag{16.8} \]
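(b) The energy of the muon is
\begin{align}
E_\mu &= \sqrt{p_\mu^2c^2 + m_\mu^2c^4} \nonumber\\
&= \sqrt{\left[\left(\frac{m_\pi}{m_\mu}\right)^2 - 1\right]^2\left(\frac{m_\mu}{2m_\pi}\right)^2m_\mu^2c^4 + m_\mu^2c^4} \nonumber\\
&= \frac{m_\mu}{2m_\pi}\,m_\mu c^2\sqrt{\left[\left(\frac{m_\pi}{m_\mu}\right)^2 - 1\right]^2 + \left(\frac{2m_\pi}{m_\mu}\right)^2} \nonumber\\
&= \frac{m_\mu}{2m_\pi}\,m_\mu c^2\sqrt{\left(\frac{m_\pi}{m_\mu}\right)^4 - 2\left(\frac{m_\pi}{m_\mu}\right)^2 + 1 + 4\left(\frac{m_\pi}{m_\mu}\right)^2} \nonumber\\
&= \frac{m_\mu}{2m_\pi}\,m_\mu c^2\sqrt{\left[\left(\frac{m_\pi}{m_\mu}\right)^2 + 1\right]^2} \nonumber\\
&= \left[\left(\frac{m_\pi}{m_\mu}\right)^2 + 1\right]\frac{m_\mu}{2m_\pi}\,m_\mu c^2. \tag{16.9}
\end{align}
Let u be the speed of the muon in the rest frame of the pion. Let $\beta = \frac{u}{c}$ and γ be the
associated gamma factor. Then,
\[ E_\mu = \gamma m_\mu c^2, \qquad p_\mu = \gamma m_\mu u = \beta\gamma m_\mu c. \]
Therefore, dividing (16.8) by (16.9),
\[ \frac{u}{c} = \beta = \frac{p_\mu c}{E_\mu} = \frac{(m_\pi/m_\mu)^2 - 1}{(m_\pi/m_\mu)^2 + 1}. \]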
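Plugging in the masses from the problem statement gives concrete numbers for (16.8), (16.9) and the final speed. The sketch below works in MeV with c = 1; the variable names are illustrative, and the numerical values are computed here rather than quoted from the exam solution.

```python
M_PION = 140.0   # MeV/c^2
M_MUON = 106.0   # MeV/c^2

ratio_sq = (M_PION / M_MUON) ** 2
common = M_MUON**2 / (2 * M_PION)   # the factor (m_mu / 2 m_pi) * m_mu

p_mu = (ratio_sq - 1) * common      # Eqn. (16.8), in MeV/c
E_mu = (ratio_sq + 1) * common      # Eqn. (16.9), in MeV
beta = p_mu / E_mu                  # u/c

print(f"p_mu = {p_mu:.2f} MeV/c, E_mu = {E_mu:.2f} MeV, u/c = {beta:.3f}")
```

As a consistency check, the muon and neutrino energies should add up to the pion rest energy, and the muon should satisfy E² − p² = m² in these units.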