Lecture 1: Fundamentals

Matthew Rognlie

August 12, 2013

Contents

1 Logic
    1.1 Basic definitions
    1.2 Some rules of logic
    1.3 Quantifiers

2 Proof techniques and concepts
    2.1 Direct proof
    2.2 Proving the contrapositive
    2.3 Proof by contradiction
    2.4 Equivalence proofs
    2.5 Proof by exhaustion
    2.6 Proof by induction
    2.7 Constructive and non-constructive proofs
    2.8 Necessity and sufficiency

3 Set theory
    3.1 Set basics
    3.2 Algebra of sets
    3.3 Relations
    3.4 Orders
    3.5 Functions
    3.6 Axiom of choice
    3.7 Cardinality

4 Fields
    4.1 Fields, orders, and ordered fields
    4.2 Rational numbers
    4.3 Properties of the rational numbers
    4.4 Real numbers
    4.5 Properties of the real numbers
    4.6 Complex numbers

A Dedekind cut construction of the reals

1 Logic

1.1 Basic definitions

Definition 1.1 (Statement). If a sentence can be classified as true or false, it is called a statement. (For a sentence to be a statement, it is not necessary that we actually know whether it is true or false, but it must be true that the sentence is clearly either one or the other.)

Examples.

• 2 + 2 = 4. (True statement)

• Every continuous function is differentiable. (False statement)

• x^2 − 5x + 6 = 0. (Not a statement, since we haven’t defined x, and it could be either true or false depending on the value of x. We will later be able to adapt this into a statement by using quantifiers.)

Definition 1.2 (Negation). Let p stand for a given statement. Then ¬p (read as “not p”) represents the logical opposite. This operation is called the negation of p. When p is true, then ¬p is false, and when p is false, then ¬p is true.

Example.

p : Today is Monday
¬p : Today is not Monday

Note that double negation leaves us with the original statement: ¬(¬p)⇐⇒ p.

Definition 1.3 (Conjunction). If p and q are statements, then the statement p and q is called the conjunction of p and q. It is denoted by p ∧ q.

Example. “Today is Monday and the Math Camp is fun” is the conjunction p ∧ q of p : “Today is Monday” and q : “the Math Camp is fun”.

Definition 1.4 (Disjunction). If p and q are statements, then the statement p or q is called the disjunction of p and q. It is denoted by p ∨ q.

Example. “The pencil is blue or red” is the disjunction p ∨ q of p : “The pencil is blue” and q : “The pencil is red”.

Definition 1.5 (Implication). A statement of the form “if p, then q” is called an implication or conditional statement. It is denoted by p ⇒ q.¹ Formally, it is defined to be equivalent to the statement (¬p) ∨ q, so that the only way the implication p ⇒ q can fail to be true is if p is true and q is false.

¹ Sometimes p → q is used instead, but in math we like to use the → notation to denote limits and for various other purposes.

Example. “If I regularly attend Math Camp, then in three weeks I will be an expert at math” is the implication p ⇒ q with p : “I regularly attend Math Camp” and q : “in three weeks I will be an expert at math”.

Definition 1.6 (Equivalence). The statement “p if and only if q” is the conjunction of the two implications p ⇒ q and q ⇒ p; formally, it is (p ⇒ q) ∧ (q ⇒ p). A statement of this form is called an equivalence and is denoted by p ⇐⇒ q. It is also sometimes written as “p iff q”.

Example.

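“You are the best basketball player in the world iff your name is Lebron James and you play for the Miami Heat” is the equivalence p ⇐⇒ q with p : “You are the best basketball player in the world” and q : “your name is Lebron James and you play for the Miami Heat”.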

The following table, called a truth table, shows when the preceding five concepts yield values of true or false, depending on the truth or falsity of p and q. (The left two columns show combinations of values of p and q, and the right columns show the values taken by each formula given by the values p and q in the same row on the left.)

p   q   ¬p   p ∧ q   p ∨ q   p ⇒ q   p ⇐⇒ q
T   T   F    T       T       T       T
F   T   T    F       T       T       F
T   F   F    F       T       F       F
F   F   T    F       F       T       T
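The truth table above can also be generated mechanically. The following is a minimal Python sketch, not part of the original notes; the helper names `implies`, `iff`, and `truth_table` are illustrative assumptions. It encodes implication exactly as in Definition 1.5, as (¬p) ∨ q, and equivalence as in Definition 1.6.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    # Definition 1.5: p ⇒ q is defined as (¬p) ∨ q.
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    # Definition 1.6: (p ⇒ q) ∧ (q ⇒ p).
    return implies(p, q) and implies(q, p)


def truth_table():
    """One row per assignment of (p, q): p, q, ¬p, p∧q, p∨q, p⇒q, p⇐⇒q."""
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, not p, p and q, p or q, implies(p, q), iff(p, q)))
    return rows


def fmt(value: bool) -> str:
    return "T" if value else "F"


print("p q ¬p p∧q p∨q p⇒q p⇔q")
for row in truth_table():
    print(" ".join(fmt(v) for v in row))
```

The rows come out in the order produced by `itertools.product`, which differs from the row order in the table above, but the same four assignments are covered.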

1.2 Some rules of logic

It is possible to derive many nice rules characterizing the logical language described in the previous section. Here we mention a few of the most useful.

Proposition 1.7. An implication p ⇒ q is equivalent to its contrapositive ¬q ⇒ ¬p. The converse q ⇒ p is equivalent to the inverse ¬p ⇒ ¬q:

(p ⇒ q) ⇐⇒ (¬q ⇒ ¬p)    (1)
(q ⇒ p) ⇐⇒ (¬p ⇒ ¬q)    (2)

Proof. We prove (1), the equivalence of the original implication and its contrapositive, using a truth table:

p   q   p ⇒ q   ¬p   ¬q   ¬q ⇒ ¬p
T   T   T       F    F    T
F   T   T       T    F    T
T   F   F       F    T    F
F   F   T       T    T    T

To prove (2), the equivalence of converse and inverse, replace p and q in (1) with ¬p and ¬q. Then cancel all double negations to obtain (2):

((¬p) ⇒ (¬q)) ⇐⇒ ((¬(¬q)) ⇒ (¬(¬p)))
((¬p) ⇒ (¬q)) ⇐⇒ (q ⇒ p)

We will discuss the contrapositive further in Section 2.2, where it will be used as a method of proof.

Proposition 1.8 (De Morgan’s laws). The negation of a conjunction is equivalent to the disjunction of the negations, and the negation of a disjunction is equivalent to the conjunction of the negations:

¬(p ∧ q) ⇐⇒ (¬p) ∨ (¬q)    (3)
¬(p ∨ q) ⇐⇒ (¬p) ∧ (¬q)    (4)

Proof. We prove (3) using truth tables:

p   q   p ∧ q   ¬(p ∧ q)   ¬p   ¬q   (¬p) ∨ (¬q)
T   T   T       F          F    F    F
F   T   F       T          T    F    T
T   F   F       T          F    T    T
F   F   F       T          T    T    T

To prove (4), replace p and q in (3) with ¬p and ¬q. Take the negation of both sides², then cancel all double negations to obtain (4):

¬((¬p) ∧ (¬q)) ⇐⇒ (¬(¬p)) ∨ (¬(¬q))
¬(¬((¬p) ∧ (¬q))) ⇐⇒ ¬((¬(¬p)) ∨ (¬(¬q)))
((¬p) ∧ (¬q)) ⇐⇒ ¬(p ∨ q)

These laws are quite intuitive: they say that “it is not true that the car is fast and blue” is equivalent to “the car is not fast or not blue”. Similarly, “it is not true that the car is fast or blue” is equivalent to “the car is not fast and not blue”.

De Morgan’s laws in logic have a generalization using quantifiers, as we will see in the next section, as well as a natural counterpart in set theory, as we will see in Proposition 3.8. These rules turn out to be essential in dealing with sets.

Proposition 1.9 (Distributivity). Conjunction and disjunction are distributive over both themselves and each other:

(p ∧ (q ∧ r)) ⇐⇒ (p ∧ q) ∧ (p ∧ r)    (5)
(p ∧ (q ∨ r)) ⇐⇒ (p ∧ q) ∨ (p ∧ r)    (6)
(p ∨ (q ∧ r)) ⇐⇒ (p ∨ q) ∧ (p ∨ r)    (7)
(p ∨ (q ∨ r)) ⇐⇒ (p ∨ q) ∨ (p ∨ r)    (8)

² Note that here we are implicitly using the fact that (a ⇐⇒ b) ⇐⇒ ((¬a) ⇐⇒ (¬b)). We could easily prove this, and many other such “obvious” facts, using truth tables. Ideally, we would develop and prove a rich vocabulary of rules that can be combined to obtain virtually any other rule; but we’re teaching a math-for-economists course here, not Logic 1, and we don’t have the time.

Equivalence (6), for instance, states that “the car is blue and the car is fast or broken” is equivalent to “the car is blue and fast or the car is blue and broken”. (Try to think about the intuition behind the other lines.) Like De Morgan’s laws, the distributivity property has both a generalization using quantifiers and a natural counterpart in set theory.

Finally, we have the following unsurprising proposition.

Proposition 1.10. Conjunction and disjunction are both commutative:

(p ∧ q) ⇐⇒ (q ∧ p)    (9)
(p ∨ q) ⇐⇒ (q ∨ p)    (10)

and also both associative:

((p ∧ q) ∧ r) ⇐⇒ (p ∧ (q ∧ r))    (11)
((p ∨ q) ∨ r) ⇐⇒ (p ∨ (q ∨ r))    (12)
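Since each rule in this section involves only two or three statements, it can be checked by enumerating every truth assignment, which is what a truth-table proof does. The sketch below is an illustration rather than part of the notes; the names `equivalent`, `RULES`, and `check_rules` are assumptions made for the example, and only a selection of the numbered rules is included.

```python
from itertools import product


def implies(p, q):
    # p ⇒ q is defined as (¬p) ∨ q.
    return (not p) or q


def equivalent(f, g, n_vars):
    """True if formulas f and g agree on every assignment of n_vars truth values."""
    return all(f(*vals) == g(*vals)
               for vals in product([True, False], repeat=n_vars))


# Each entry: (left-hand side, right-hand side, number of variables).
RULES = {
    "(1) contrapositive": (lambda p, q: implies(p, q),
                           lambda p, q: implies(not q, not p), 2),
    "(3) De Morgan": (lambda p, q: not (p and q),
                      lambda p, q: (not p) or (not q), 2),
    "(4) De Morgan": (lambda p, q: not (p or q),
                      lambda p, q: (not p) and (not q), 2),
    "(6) distributivity": (lambda p, q, r: p and (q or r),
                           lambda p, q, r: (p and q) or (p and r), 3),
    "(7) distributivity": (lambda p, q, r: p or (q and r),
                           lambda p, q, r: (p or q) and (p or r), 3),
    "(11) associativity": (lambda p, q, r: (p and q) and r,
                           lambda p, q, r: p and (q and r), 3),
}


def check_rules():
    return {name: equivalent(f, g, n) for name, (f, g, n) in RULES.items()}


for name, ok in check_rules().items():
    print(name, "holds" if ok else "FAILS")

# By contrast, an implication is not equivalent to its converse.
converse_equivalent = equivalent(lambda p, q: implies(p, q),
                                 lambda p, q: implies(q, p), 2)
print("p ⇒ q equivalent to q ⇒ p:", converse_equivalent)
```

The last check illustrates why Proposition 1.7 pairs the implication with its contrapositive and not with its converse.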

1.3 Quantifiers

Earlier we observed that the sentence

x^2 − 5x + 6 = 0

is not a statement, because x is not defined. We need to consider it in a particular context for it to become a statement. We usually write:

p(x) : x^2 − 5x + 6 = 0

For a specific value of x, p(x) becomes a statement that is either true or false. For example, p(2) is true and p(4) is false.

We may remove the ambiguity by using a quantifier. For instance, we may write:

For every x, x^2 − 5x + 6 = 0

This is a statement, and it is false (for instance, p(4) is false). In symbols, we write

∀x : p(x)

where the universal quantifier ∀ is read “for every...”.

The sentence

There exists an x such that x^2 − 5x + 6 = 0

is also a statement, and it is true. In symbols we write

∃x : p(x)

where the existential quantifier ∃ is read “there exists...”. (The colon : is shorthand for the phrase “such that”. The vertical bar | is also often used for this purpose.)

Occasionally, we need to express uniqueness along with existence. The sentence

There exists a unique x such that x^2 − 5x + 6 = 0

is a statement, and it is false, since the equation holds for both x = 2 and x = 3. In symbols, we write:

∃!x : p(x)

There are also many ways to write the statement above without making use of ∃! notation. For instance, the following statement is equivalent to the statement above (think about why):

∃x ∀y (p(y) ⇐⇒ x = y)

If we switch the order of a universal or existential quantifier and the negation operation, we get the opposite kind of quantifier. (This is the quantifier generalization of De Morgan’s laws from Proposition 1.8.)

Proposition 1.11 (De Morgan’s laws, quantifier version). The following equivalences hold:

∃x : ¬p(x) ⇐⇒ ¬(∀x : p(x))    (13)
∀x : ¬p(x) ⇐⇒ ¬(∃x : p(x))    (14)

Example. Let p(x) be “student x loves macro”. Then (13) says that “there exists a student who does not love macro” is equivalent to “it is not true that all students love macro”. Similarly, (14) says that “all students do not love macro” is equivalent to “it is not true that there exists a student who loves macro”. This is quite intuitive.

Additionally, if a statement does not depend on the variables being quantified, we can freely move it in and out of the quantifiers. (This is the quantifier generalization of distributivity from Proposition 1.9.)

Proposition 1.12 (Distributivity, quantifier version). The following equivalences hold:

q ∧ (∀x : p(x)) ⇐⇒ (∀x : q ∧ p(x))    (15)
q ∧ (∃x : p(x)) ⇐⇒ (∃x : q ∧ p(x))    (16)
q ∨ (∀x : p(x)) ⇐⇒ (∀x : q ∨ p(x))    (17)
q ∨ (∃x : p(x)) ⇐⇒ (∃x : q ∨ p(x))    (18)

Finally, the order of multiple quantifiers is an extremely common point of confusion in mathematical proofs, and indeed logical reasoning in general. Consider the following sentence:

∃y ∀x Student x hates day y of math camp

This says that there exists some specific day y of math camp that every student x hates. This is as opposed to the following sentence, created from switching the order of the two quantifiers:

∀x ∃y Student x hates day y of math camp

This says that for every student x, there exists some day y of math camp that x hates. But unlike before, there could be a different day y for each student x! This statement, beginning with ∀x ∃y, is in fact weaker than the first statement, which begins with ∃y ∀x. It is implied by the first statement, but not vice versa. Such problems do not exist when we are dealing with two of the same kind of quantifier, as switching ∀x with ∀y (or ∃x with ∃y) leaves the meaning of a statement unchanged. Formally:

The following implication holds for any p(x, y):

(∃y ∀x : p(x, y)) ⇒ (∀x ∃y : p(x, y))    (19)

The opposite implication does not always hold for p(x, y). In particular, for some p(x, y) we have:

(∀x ∃y : p(x, y)) ⇏ (∃y ∀x : p(x, y))    (20)

The following equivalences hold for any p(x, y):

∀x ∀y : p(x, y) ⇐⇒ ∀y ∀x : p(x, y)    (21)
∃x ∃y : p(x, y) ⇐⇒ ∃y ∃x : p(x, y)    (22)
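Over finite domains, ∀ and ∃ correspond to Python's built-in `all` and `any`, which makes the effect of quantifier order easy to see on a small example. The sketch below uses made-up data (the student names and the days they hate are purely hypothetical) and is only an illustration of (13), (19), and (20).

```python
# Hypothetical data (not from the notes): the days of math camp each student hates.
hates = {"Ana": {1, 4}, "Ben": {2}, "Cho": {3, 4}}
students = list(hates)
days = range(1, 12)  # days 1..11


def p(x, y):
    """p(x, y): student x hates day y."""
    return y in hates[x]


# ∃y ∀x : p(x, y)  (one specific day that everyone hates)
exists_forall = any(all(p(x, y) for x in students) for y in days)

# ∀x ∃y : p(x, y)  (everyone hates some day, possibly a different one each)
forall_exists = all(any(p(x, y) for y in days) for x in students)

# (13) with q(x) = p(x, 4): ∃x : ¬q(x) ⇐⇒ ¬(∀x : q(x))
de_morgan_13 = (any(not p(x, 4) for x in students)
                == (not all(p(x, 4) for x in students)))

print(exists_forall, forall_exists, de_morgan_13)
```

With this data the ∀x ∃y statement is true while the ∃y ∀x statement is false, which is an instance of (20); implication (19) is still satisfied because its hypothesis is false.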

As your lecturer, I am more worried by the first statement about math camp than the second one. It’s frightening to imagine that I will give a single talk that everyone dislikes. On the other hand, it’s perfectly reasonable that everyone will dislike some talk that I give. After all, there are 11 days of this stuff.

2 Proof techniques and concepts

We will now discuss various proof techniques. The goal here is not to define distinct “proof techniques” in some rigorous way. (Indeed, in a few instances the boundaries are fuzzy, or one technique is arguably just a special case of the other.) Instead, the idea is to convey various patterns for mathematical arguments: patterns that you will find yourself reusing over and over again.

2.1 Direct proof

Suppose we already know that A ⇒ A_1, A_1 ⇒ A_2, . . . , A_{n−1} ⇒ A_n. We may conclude that A ⇒ A_n. This “direct” proof combines “if, then” statements (implications) that we already know to be true and produces a new “if, then” statement.

Example. For instance, suppose we want to prove the following statement.

If a divides b (A_1) and b divides c (A_2), then a divides c (A_3). In symbols: (A_1 ∧ A_2) ⇒ A_3.

Proof. By definition, a dividing b means that b = a k_1 for some integer k_1, and b dividing c means that c = b k_2 for some integer k_2. Combining these results, c = b k_2 = a k_1 k_2. Let k = k_1 k_2. Now k is an integer and c = ak, so by the definition of divisibility, a divides c.

Even this very simple proof combines several “if, then” statements, some of them implicit, to arrive at its conclusion. We apply the definition of divisibility three times (x divides y ⇐⇒ ∃m : y = xm), along with the associative law of arithmetic when we define k. Note that we make no attempt here to write the actual proof in the language of formal logic; that is a pain even when a proposition is simple, and becomes extremely impractical whenever we are proving anything complicated. It is important, however, to write proofs sufficiently clear-cut that a reader could translate them into formal terms if necessary.

2.2 Proving the contrapositive

Sometimes, in order to prove an implication p ⇒ q, we prove its contrapositive ¬q ⇒ ¬p instead. This is possible thanks to the equivalence of p ⇒ q and ¬q ⇒ ¬p, as we demonstrated in Proposition 1.7.

For example, the implication

If an apple is tasty (p), then it is green (q)

has contrapositive

If an apple is not green (¬q), then it is not tasty (¬p)

The equivalence of these two statements probably seems obvious, and indeed people often pass between an implication and its contrapositive without ever having seen the formal notion of “contrapositive”. In mathematical settings, however, sometimes the equivalence between an implication and its contrapositive is not so obvious, and it is useful to remind yourself of the idea.

Example. Suppose we want to prove that

If x and y are two integers for which x + y is even (p), then x and y are either both even or both odd (q).

Proof. It is easier to think about the contrapositive.

If one of x and y is even and one is odd (¬q), then x + y is odd (¬p).

Now, either x or y is even, and the other is odd. There is no loss of generality to suppose that x is odd and y is even. Then by definition there are integers k and m for which x = 2k + 1 and y = 2m. Then we compute the sum x + y = 2k + 2m + 1 = 2(k + m) + 1, which is odd by definition.
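As a sanity check (not a proof), both the original implication and its contrapositive can be tested on a finite range of integers. The following sketch is an illustration only; the function names and the search range are arbitrary assumptions.

```python
def same_parity(x: int, y: int) -> bool:
    """q: x and y are either both even or both odd."""
    return x % 2 == y % 2


def check_parity_claims(limit: int = 20):
    """Test p ⇒ q and ¬q ⇒ ¬p for all integer pairs with |x|, |y| <= limit."""
    rng = range(-limit, limit + 1)
    # Original implication: if x + y is even, then x and y have the same parity.
    original = all(same_parity(x, y)
                   for x in rng for y in rng if (x + y) % 2 == 0)
    # Contrapositive: if x and y have different parity, then x + y is odd.
    contrapositive = all((x + y) % 2 == 1
                         for x in rng for y in rng if not same_parity(x, y))
    return original, contrapositive


print(check_parity_claims())
```

Both checks agree on every pair, as Proposition 1.7 says they must, since the two implications are logically equivalent.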

2.3 Proof by contradiction

Often, in order to prove an implication p ⇒ q, we show that p ∧ ¬q is impossible. Generally in this situation, we think of ourselves as “assuming” p and ¬q and trying to obtain a contradiction from that assumption. This is permissible because, as the following truth table demonstrates, (p ⇒ q) ⇐⇒ ¬(p ∧ ¬q):

p   q   p ⇒ q   ¬q   p ∧ ¬q
T   T   T       F    F
F   T   T       F    F
T   F   F       T    T
F   F   T       T    F

Example. There are infinitely many primes.

Proof. Suppose to the contrary that there are only finitely many primes, which we list as r_1, r_2, . . . , r_n. We construct their product plus one, s = r_1 r_2 · · · r_n + 1. Dividing s by any of the primes r_1, r_2, . . . , r_n leaves remainder 1, so s is not divisible by any of them. But every integer greater than 1 has a prime divisor, so some prime not on the list divides s, which contradicts the assumption that r_1, r_2, . . . , r_n was the full list of primes.³
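The construction in this proof can be tried on concrete lists of primes. The sketch below is an illustration, not part of the notes; `smallest_prime_factor` and `new_prime_from` are hypothetical helper names. It builds s from a given list and finds a prime divisor of s, which is never on the list.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n (for n >= 2), found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def new_prime_from(primes):
    """Return (s, q) where s = product of `primes` + 1 and q is a prime divisor of s."""
    s = prod(primes) + 1
    return s, smallest_prime_factor(s)


print(new_prime_from([2, 3]))                 # s = 7, which is itself prime
print(new_prime_from([2, 3, 5, 7, 11, 13]))   # s = 30031 = 59 * 509
```

The second call shows that s need not be prime itself; what the argument guarantees is that s has a prime divisor missing from the list.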

³ This example is slightly confusing, because at first glance we don’t seem to be proving any implication p ⇒ q; rather, we are simply showing that the statement q “there are infinitely many primes” is true. In a sense, proof by contradiction here is just proof by double negation: showing that it is impossible that q is not true, or ¬(¬q). But we can interpret p as a statement consisting of the other axioms or facts of number theory. Then the contrast between direct proof and proof by contradiction is very clear: with direct proof, we start with the truth of p and combine already-proven implications to show the truth of q, while with proof by contradiction we start with the truth of p and the falsity of q and show that something must go wrong.

2.4 Equivalence proofs

When asked to prove an equivalence p ⇐⇒ q, there are two implications to prove, and usually we will provide separate demonstrations for each of them, showing that p ⇒ q and q ⇒ p. Sometimes one step is much easier than the other, and proofs of the two might be very different.

Often we must prove that a set of more than two statements, p_1, . . . , p_n, are all equivalent. Many different approaches can be fruitful depending on the circumstances. For instance, if there are three statements p_1, p_2, p_3, we might show that p_1 ⇐⇒ p_2 (by proving each implication separately) and then p_2 ⇐⇒ p_3 (again by proving each implication separately), finally combining the two to obtain p_1 ⇐⇒ p_3. Another common approach is to establish a “circular” chain of implications, proving that p_1 ⇒ p_2, p_2 ⇒ p_3, and that p_3 ⇒ p_1.

Even more elaborate combinations are sometimes useful. For instance, if A, B, C, D, and E below are statements, proving the six implications represented by the arrows is one way to show that the five statements are equivalent. (This format for representing implications is a special case of a more general concept called a directed graph. All we need to check is that we can follow the arrows from any statement to any other statement.)

[Figure: a directed graph whose nodes are the statements A, B, C, D, and E, joined by six arrows representing the implications to be proved.]

2.5 Proof by exhaustion

A proof by exhaustion, sometimes called a proof by cases, splits up the statement to be proved into a finite number of cases and then shows that the statement holds in each particular case. Generally, this method of proof involves two stages:

1. A proof that the cases are truly exhaustive, that is, that every possible instance of the statement to be proved matches the conditions of (at least) one of the cases.

2. A proof of the statement in each of the cases.

Example. Show that for any integer n, we can write (for some integer k) either n^2 = 4k or n^2 = 4k + 1.

Proof. We split into cases. (In this instance, the exhaustiveness of the cases is obvious.)

• n is even. Then we can write n = 2m for some integer m, and we obtain n^2 = 4m^2. Then the proposition is true for k = m^2.

• n is odd. Then we can write n = 2m + 1 for some integer m, and we obtain n^2 = 4(m^2 + m) + 1. Then the proposition is true for k = m^2 + m.
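The two cases of this proof translate directly into a small function that returns the integer k for a given n. This is a sketch for illustration only; the name `square_case` is an assumption, and the loop merely spot-checks a finite range.

```python
def square_case(n: int):
    """Return (k, r) with n**2 == 4*k + r and r in {0, 1}, following the proof's cases."""
    if n % 2 == 0:
        m = n // 2           # n = 2m
        return m * m, 0      # n^2 = 4 m^2
    m = (n - 1) // 2         # n = 2m + 1
    return m * m + m, 1      # n^2 = 4 (m^2 + m) + 1


for n in range(-50, 51):
    k, r = square_case(n)
    assert n * n == 4 * k + r

print(square_case(6), square_case(7))
```

For n = 6 the even case gives k = 9, and for n = 7 the odd case gives k = 12, since 49 = 4 · 12 + 1.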

2.6 Proof by induction

In the simplest case, a proof by induction may be used to show that some property P(n) is true for all natural numbers.

Algorithm.

• First stage: Prove that the statement P(1) is true. (Base case)

• Second stage: Assume that P(n) is true, and use this to prove that P(n + 1) is also true. This completes the proof. (Induction step)

The idea here is analogous to showing that a line of dominoes will fall. First you need to prove that the first domino will fall, and second you need to prove that when each domino falls, it topples the next domino.

Example. Show that n < 2^n for any integer n ≥ 1.

Proof. The base case 1 < 2 is easy. For the induction step, we assume that n < 2^n for arbitrary n ≥ 1, and then obtain 2^{n+1} = 2 · 2^n > 2 · n ≥ n + 1. This completes the induction, and we conclude that n < 2^n for all n ≥ 1.
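Each link in the chain 2^{n+1} = 2 · 2^n > 2 · n ≥ n + 1 can be spot-checked numerically. The sketch below is only a finite check under an arbitrary bound, not a substitute for the induction; `check_induction` is a hypothetical name.

```python
def check_induction(limit: int = 200) -> bool:
    """Spot-check the base case and each link of the induction step for n = 1..limit."""
    base_case = 1 < 2 ** 1
    steps = all(
        2 ** (n + 1) == 2 * 2 ** n   # rewrite 2^{n+1} as 2 * 2^n
        and 2 * 2 ** n > 2 * n       # uses the induction hypothesis n < 2^n
        and 2 * n >= n + 1           # uses n >= 1
        for n in range(1, limit + 1)
    )
    return base_case and steps


print(check_induction())
```

The comments mark where the induction hypothesis and the assumption n ≥ 1 enter the argument.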

We also sometimes use a slight twist on this concept, called strong induction. (Sometimes our original definition of induction is called weak induction to emphasize the difference.⁴) Here the algorithm is

• First stage: Prove that the statement P(1) is true. (Base case)

• Second stage: Assume that P(k) is true for all k = 1, . . . , n, and use this to prove that P(n + 1) is also true. (Induction step)

⁴ If you’re very clever, you might realize that strong induction is actually a special case of weak induction, if we apply weak induction to the statement Q(n) : P(k) is true for all k ≤ n.

Example. Show that each positive integer can be written as a sum of distinct powers of 2.

Proof. Clearly this is true for n = 1 = 2^0. Now suppose that it is true for all k = 1, . . . , n, and consider n + 1. There exists some m such that 2^m ≤ n + 1 < 2^{m+1}. If n + 1 = 2^m, we are done. Otherwise 1 ≤ (n + 1) − 2^m ≤ n, so by the strong induction assumption, we can represent (n + 1) − 2^m as a sum of distinct powers of 2. Moreover, none of these powers will be 2^m, since (n + 1) − 2^m < 2^m. Therefore we can also represent n + 1 as a sum of distinct powers of 2, appending 2^m to the representation of (n + 1) − 2^m. This completes the induction, and we conclude that any n can be written as a sum of distinct powers of 2.
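The strong induction argument is effectively an algorithm: peel off the largest power of 2 that fits, then repeat on the remainder. The sketch below unrolls that recursion into a loop; the function name `distinct_powers_of_two` is an assumption made for illustration.

```python
def distinct_powers_of_two(n: int) -> list:
    """Write a positive integer n as a list of distinct powers of 2, largest first."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    powers = []
    while n > 0:
        m = n.bit_length() - 1   # the m with 2**m <= n < 2**(m + 1)
        powers.append(2 ** m)
        n -= 2 ** m              # remainder is < 2**m, so 2**m cannot reappear
    return powers


print(distinct_powers_of_two(13))  # [8, 4, 1]
```

The comment on the subtraction step mirrors the observation in the proof that (n + 1) − 2^m < 2^m, which is what keeps the powers distinct.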

Although proving that some P(n) is true for all natural numbers n ∈ N is the canonical case of induction, the principle may be adapted to other, related settings as well. For instance, we can start from a number other than 1 (say, 3) and after proving the n = 3 base case and the induction step for any n ≥ 3, conclude that a statement is true for any n ≥ 3. We can also move downward, establishing a base case k and an induction step P(n) ⇒ P(n − 1), and concluding that P(n) is true for all n ≤ k.

More generally, we can do induction on a sequence a_1, a_2, . . ., proving that P(a_1) is true for the base case and that P(a_n) ⇒ P(a_{n+1}) for the induction step, and concluding that P(a_n) is true for any a_n in the sequence. (This is an immediate consequence of the basic induction principle, tantamount to doing induction on Q(n) ≡ P(a_n).) Induction can be finite; perhaps we start at base case n = k, and our induction step P(n) ⇒ P(n − 1) only works when n ≥ 1. In this case, induction still shows that P(n) holds for all 0 ≤ n ≤ k. There are many other clever ways to adapt the principle of induction as well.

2.7 Constructive and non-constructive proofs

A constructive proof, or a proof by construction, demonstrates the existence of a mathematical object with certain properties by explicitly creating or providing a method to create that object. In contrast, a non-constructive proof proves the validity of a proposition without considering an example.

It may not initially be obvious that there is any way to “demonstrate the existence of a mathematical object with certain properties” without actually providing such an object. In fact, this is a very common idea.

Example 1. Prove that there exists some x > 0 solving f(x) = x^2 − 4x + 1 = 0.

Constructive proof. Evaluate the expression for x = 2 + √3.

Non-constructive proof. f(2) = −3 and f(4) = 1. Since f is a continuous function, the Intermediate Value Theorem from calculus (which we will formally cover later) then implies that there is some x ∈ (2, 4) such that f(x) = 0.

This proof is non-constructive because it tells us that some x satisfying the conditions in the statement exists, but does not specify (or offer a way to specify) a particular x.
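Both arguments can be checked numerically. The sketch below is an illustration in floating-point arithmetic, so the constructive check uses a tolerance (an assumption of the example) rather than exact equality.

```python
import math


def f(x: float) -> float:
    return x ** 2 - 4 * x + 1


# Constructive proof: plug in the explicit candidate x = 2 + sqrt(3).
x_star = 2 + math.sqrt(3)
constructive_ok = math.isclose(f(x_star), 0.0, abs_tol=1e-9)

# Non-constructive proof: only the sign change on [2, 4] is used.
sign_change = f(2) < 0 < f(4)

print(x_star, constructive_ok, sign_change)
```

The sign change certifies that a root exists in (2, 4) without naming it, while the first check exhibits one directly; here the explicit root 2 + √3 happens to lie in that same interval.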

Example 2 (taken from Wikipedia). Prove that there exist irrational numbers a and b such that a^b is rational.

Constructive proof. Take a = √2 and b = log_2 9, and observe that a^b = 3. We know that a = √2 is irrational. (We will prove this in a subsequent section.) log_2 9 is irrational because if it were equal to m/n for positive integers m and n, then we would have 9^n = 2^m, which is impossible because the left is odd while the right is even. 3 is clearly rational, making this a suitable example.

Non-constructive proof. Again use the fact that √2 is irrational. Consider the number q = (√2)^{√2}. Either it is rational or it is irrational. If q is rational, then the theorem is true for a = √2, b = √2. If q is not rational, then the theorem is true for a = (√2)^{√2} and b = √2, since

((√2)^{√2})^{√2} = (√2)^{√2·√2} = (√2)^2 = 2

This proof is non-constructive because we haven’t constructed a single example a, b of the theorem’s correctness. Instead, we have demonstrated that one of two possible examples must be valid.⁵

2.8 Necessity and sufficiency

The concepts of necessity and sufficiency are not proof “techniques” per se, but they are very important for mathematical reasoning.

Definition 2.1 (Necessary and sufficient conditions). A necessary condition for a statement must be satisfied for the statement to be true. Formally, a statement p is a necessary condition for q if q ⇒ p. A sufficient condition for a statement is one that, if satisfied, ensures that the statement is true. Formally, a statement p is a sufficient condition for q if p ⇒ q.

For instance, n ≥ 0 is a necessary condition for n to be a square. Conversely, n being a square is a sufficient condition for n ≥ 0.

We will often talk about necessity and sufficiency when discussing optimization in economics. Often a certain set of conditions will be necessary for optimality, and a strictly larger set will be both necessary and sufficient for optimality. Other times, some conditions will be merely sufficient.

⁵ For the curious, it turns out that (√2)^{√2} is irrational.

3 Set theory

3.1 Set basics

We will not try to formally or axiomatically define sets. (That becomes very complicated, very quickly.) Instead, we will intuitively think of a set as any collection of well defined and distinct objects. The objects in a set are called its elements. If x is an element of set A, we write

x ∈ A

and otherwise we write

x ∉ A

Some details of set notation:

1. We describe a set by writing its elements between two curly brackets. Order and repetition do not matter.

A = {x, y} = {y, x}
B = {x, x} = {x}
{{x}} ≠ {x}

2. We can specify a set by a property

S = {x : P(x) is a true statement}

A = {x : x ∈ N and x is even} = {2, 4, 6, . . .} = {2n : n ∈ N}

Some useful sets:

• N = {0, 1, 2, 3, . . .} or {1, 2, 3, . . .} set of natural numbers

• Z = {0,+1,−1,+2,−2, . . .} set of integers

• Q = {p/q : p ∈ Z, q ∈ N, q ≠ 0} set of rational numbers

• R = set of real numbers, R+ = [0, ∞), R++ = (0, ∞)

• ∅ = empty set

Definition 3.1 (Subset). If every member of A is also a member of B, then we say that A is a subset of B and write A ⊂ B. Formally, to prove that A ⊂ B, we must show x ∈ A ⇒ x ∈ B.

Definition 3.2 (Set equality). Two sets A and B are equal if A ⊂ B and B ⊂ A. We write this as A = B. Equivalently, two sets A and B are equal if x ∈ A ⇐⇒ x ∈ B.

Definition 3.3 (Power set). The set of all subsets of S is called the power set of S and is denoted by

2^S = {T : T ⊂ S}
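For a finite set, the power set can be enumerated directly, which also suggests where the notation 2^S comes from: a set with n elements has 2^n subsets. The sketch below is illustrative; `power_set` is a hypothetical helper, and `frozenset` is used because Python sets cannot contain ordinary mutable sets.

```python
from itertools import combinations


def power_set(s):
    """Return the set of all subsets of the finite set s, each as a frozenset."""
    items = list(s)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}


S = {"x", "y", "z"}
subsets = power_set(S)
print(len(subsets))                  # 2**3 = 8
print(frozenset() in subsets)        # the empty set is a subset of S
print(frozenset(S) in subsets)       # S is a subset of itself
```

Every element T of the result satisfies T ⊂ S in the sense of Definition 3.1, including the empty set and S itself.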

Definition 3.4 (Index set). Suppose that for each element α in a nonempty set I there corresponds an object x_α. Then we say that the collection of all objects x_α,

{x_α : α ∈ I}

is an indexed collection of objects, with I as the index set.

Often the objects x_α under consideration are themselves sets. (Usually, in this case, we use notation of the form A_α rather than x_α to denote sets.) If so, we say we have an indexed collection of sets.

Formally, we may view an indexed collection of objects as a function from the index set I to the set of possible objects. (This will be clearer once we define the concept of function, later in this section.)

Recall that when we are dealing with finitely many objects, we often number them x_1, . . . , x_n. In doing so, we are using {1, . . . , n} as an index set. The formal concept of index set allows us to generalize this notation to cover collections of arbitrarily many objects, even when there are infinitely many, possibly too many for us to even use the natural numbers N as an index set.

3.2 Algebra of sets

Definition 3.5 (Intersection). If A and B are two sets, then the intersection A ∩ B consists of those elements that are in both A and B.

x ∈ A ∩ B ⇐⇒ x ∈ A ∧ x ∈ B

If A ∩ B = ∅, we say that A and B are disjoint.

More generally, if {A_α} is a collection of sets indexed by I, we define the intersection to consist of those elements that are in all A_α:

x ∈ ⋂_{α∈I} A_α ⇐⇒ ∀α ∈ I : x ∈ A_α

Definition 3.6 (Union). If A and B are two sets, then the union A ∪ B consists of those elements that are either in A or in B.

x ∈ A ∪ B ⇐⇒ x ∈ A ∨ x ∈ B

More generally, if {A_α} is a collection of sets indexed by I, we define the union to consist of those elements that are in at least one A_α:

x ∈ ⋃_{α∈I} A_α ⇐⇒ ∃α ∈ I : x ∈ A_α

Definition 3.7 (Complement). If A and B are two sets, then the complement of B in A, also called the set-theoretic difference of A and B, is denoted by A \ B and consists of those elements that are contained in A but not B:

x ∈ A \ B ⇐⇒ x ∈ A ∧ x ∉ B

Some useful properties for the algebra of sets follow from logical rules.

Proposition 3.8 (De Morgan’s laws, set version). Let S, T_1 and T_2 be sets. Then:

S \ (T_1 ∩ T_2) = (S \ T_1) ∪ (S \ T_2)    (23)
S \ (T_1 ∪ T_2) = (S \ T_1) ∩ (S \ T_2)    (24)

More generally, if {T_α} is a collection of sets, then:

S \ (⋂_α T_α) = ⋃_α (S \ T_α)    (25)
S \ (⋃_α T_α) = ⋂_α (S \ T_α)    (26)

Proof. We will content ourselves with proving (23) and (24). (To prove (25) and (26), we would use the relevant quantifier rules (13) and (14) instead.)

For (23), we observe that by definition x ∈ S \ (T1 ∩ T2) iff (x ∈ S) ∧ ¬(x ∈ T1 ∧ x ∈ T2). This is equivalent to:

(x ∈ S) ∧ (¬(x ∈ T1) ∨ ¬(x ∈ T2)) (by (3) in Proposition 1.8)

((x ∈ S) ∧ ¬(x ∈ T1)) ∨ ((x ∈ S) ∧ ¬(x ∈ T2)) (by (6) in Proposition 1.9)

x ∈ (S \ T1) ∪ (S \ T2) (by definition of \ and ∪)

Similarly, for (24), we observe that by definition x ∈ S \ (T1 ∪ T2) iff (x ∈ S) ∧ ¬(x ∈ T1 ∨ x ∈ T2). This is equivalent to:

(x ∈ S) ∧ (¬(x ∈ T1) ∧ ¬(x ∈ T2)) (by (4) in Proposition 1.8)

((x ∈ S) ∧ ¬(x ∈ T1)) ∧ ((x ∈ S) ∧ ¬(x ∈ T2)) (by (5) in Proposition 1.9)

x ∈ (S \ T1) ∩ (S \ T2) (by definition of \ and ∩)
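The general versions (25) and (26) are not proven above, but they can at least be sanity-checked on a finite example. The sketch below is an illustration only: the sets `S` and `T` are arbitrary choices, and a finite check is of course not a proof.

```python
from functools import reduce

S = set(range(10))
T = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7, 9}]  # hypothetical family T_alpha

big_intersection = reduce(set.intersection, T)  # intersection of all T_alpha
big_union = reduce(set.union, T)                # union of all T_alpha

# (25): S \ (intersection of T_alpha) equals the union of the sets S \ T_alpha
lhs_25 = S - big_intersection
rhs_25 = reduce(set.union, [S - t for t in T])

# (26): S \ (union of T_alpha) equals the intersection of the sets S \ T_alpha
lhs_26 = S - big_union
rhs_26 = reduce(set.intersection, [S - t for t in T])

print(lhs_25 == rhs_25, lhs_26 == rhs_26)
```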

Proposition 3.9 (Distributivity, set version). Let A, B, and C be sets. Then:

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (27)

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) (28)

More generally, if {Bα} is a collection of sets, then:

A ∩ (⋃_α Bα) = ⋃_α (A ∩ Bα) (29)

A ∪ (⋂_α Bα) = ⋂_α (A ∪ Bα) (30)


Proof. For (27), by definition x ∈ A ∩ (B ∪ C) iff (x ∈ A) ∧ ((x ∈ B) ∨ (x ∈ C)), which by (6) in Proposition 1.9 is equivalent to ((x ∈ A) ∧ (x ∈ B)) ∨ ((x ∈ A) ∧ (x ∈ C)). By definition, this is equivalent to x ∈ (A ∩ B) ∪ (A ∩ C), as desired.

For (28), by definition x ∈ A ∪ (B ∩ C) iff (x ∈ A) ∨ ((x ∈ B) ∧ (x ∈ C)), which by (7) in Proposition 1.9 is equivalent to ((x ∈ A) ∨ (x ∈ B)) ∧ ((x ∈ A) ∨ (x ∈ C)). By definition, this is equivalent to x ∈ (A ∪ B) ∩ (A ∪ C), as desired.

The proofs of (29) and (30) are similar, except that we use the quantifier distributivity rules from Proposition 1.12.

Finally, we define the concept of a partition.

Definition 3.10 (Partition). A partition of a set A is a set I of non-empty subsets of A whose union is A and that are pairwise disjoint. Formally, I is a partition of A iff

• a ≠ ∅ for all a ∈ I.

• ⋃_{a∈I} a = A.

• a ∩ b = ∅ for any a, b ∈ I where a ≠ b.

A partition J is finer than partition I if ∀a ∈ J, ∃a′ ∈ I such that a ⊂ a′. J is coarser than I if I is finer than J.

Note that {{a} : a ∈ A}, the partition of A into a subset for each element of A, is the finest partition. {A} is the coarsest partition.
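For finite sets, the three conditions in Definition 3.10 and the "finer than" comparison can be checked mechanically. The sketch below is one possible way to do so; the function names `is_partition` and `is_finer` are hypothetical, not notation from these notes.

```python
def is_partition(blocks, A):
    """Check the three conditions of Definition 3.10 for a collection of sets."""
    # Treat the collection as a set of sets, so repeated blocks count once.
    blocks = list({frozenset(b) for b in blocks})
    nonempty = all(len(b) > 0 for b in blocks)
    covers = set().union(*blocks) == set(A)
    pairwise_disjoint = all(
        a.isdisjoint(b)
        for i, a in enumerate(blocks)
        for b in blocks[i + 1:]
    )
    return nonempty and covers and pairwise_disjoint


def is_finer(J, I):
    """J is finer than I if every block of J is contained in some block of I."""
    return all(any(set(a) <= set(b) for b in I) for a in J)


A = {1, 2, 3, 4}
finest = [{x} for x in A]   # one singleton block per element
coarsest = [A]              # a single block containing everything
middle = [{1, 2}, {3, 4}]

print(is_partition(middle, A), is_finer(finest, middle), is_finer(middle, coarsest))
```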

Partitions are important in studies of information and measure, and sets underlie most of the mathematics used in economics. You will encounter them directly in classes like 14.122 and 14.126.

3.3 Relations

When we defined sets, we said that the order does not matter:

{a, b} = {b, a}

There are times, however, when the order is important. For example, in analytic geometry, the coordinates of a point (x, y) represent an ordered pair.

(1, 3) ≠ (3, 1)

When we wish to indicate that the order matters, we enclose the elements in parentheses: (a, b).

Definition 3.11 (Ordered pairs and tuples). An ordered pair (a, b) is an ordered list of two (not necessarily distinct) objects. More generally, an n-tuple (a1, a2, . . . , an) is an ordered list of n objects.⁶ (An ordered pair is a 2-tuple.) Two n-tuples are equal if corresponding elements are equal:

(a1, a2, . . . , an) = (b1, b2, . . . , bn) iff ai = bi for all i = 1, 2, . . . , n

⁶ In more axiomatic treatments these concepts are built up from the foundation of set theory, with an ordered pair defined as a certain kind of set and an n-tuple defined as a nested sequence of ordered pairs. This formalism is unlikely to be useful to us.


Definition 3.12 (Cartesian product). If A and B are sets, then the Cartesian product of A and B, written A × B, is the set of all ordered pairs (a, b) such that a ∈ A and b ∈ B. In symbols,

A × B = {(a, b) : a ∈ A and b ∈ B}

Definition 3.13 (Relation). Let A and B be sets. A relation between A and B is any subset R of A × B. We say that a ∈ A and b ∈ B are related by R if (a, b) ∈ R, and we often denote this by writing “aRb”. If B = A, then we can speak of a relation R ⊂ A × A being a relation on A.

Definition 3.14 (Properties of relations). A relation R on a nonempty set X is said to be

• reflexive if xRx for each x ∈ X

• total if for any x, y ∈ X, either xRy or yRx holds

• symmetric if for any x, y ∈ X, xRy implies yRx

• antisymmetric if for any x, y ∈ X, xRy and yRx imply x = y

• transitive if xRy and yRz imply xRz for any x, y, z ∈ X

Definition 3.15 (Equivalence relation). A relation R on a set S is an equivalence relation if it has the following properties for all x, y, z in S.

• Reflexive: xRx

• Symmetric: If xRy, then yRx

• Transitive: If xRy and yRz then xRz

Definition 3.16 (Equivalence class). Let R be an equivalence relation on X. The equivalence class of x ∈ X is given by

[x]R = {y ∈ X : yRx}

The set of all equivalence classes in X forms a partition of X. (Try proving this.)

Example. Suppose that we define X and the relation R on X to be

X = {(a, b) : a, b ∈ {1, 2, . . .}}

(a, b)R(c, d) ⇐⇒ ad = bc

The equivalence class of some (a, b) ∈ X is the set

[(a, b)]R = {(c, d) ∈ X : c/d = a/b}

(In fact, we will define the rational numbers in a very similar manner.)
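To see the equivalence classes of this example concretely, we can restrict attention to a finite piece of X. The sketch below assumes a hypothetical cutoff `N`, so that a and b range over {1, . . . , N} only, and then checks that the resulting classes partition this finite set.

```python
from itertools import product

N = 4  # hypothetical cutoff: a and b range over {1, ..., N}
X = list(product(range(1, N + 1), repeat=2))


def related(p, q):
    """(a, b) R (c, d) iff a*d == b*c."""
    (a, b), (c, d) = p, q
    return a * d == b * c


def equivalence_class(p):
    """All elements of the finite set X that are related to p."""
    return frozenset(q for q in X if related(q, p))


classes = {equivalence_class(p) for p in X}

# The classes cover X, and their sizes add up to |X|, so they are pairwise disjoint.
covers = set().union(*classes) == set(X)
disjoint = sum(len(c) for c in classes) == len(X)

print(sorted(equivalence_class((1, 2))), covers, disjoint)
```

Each class collects the pairs that represent the same fraction a/b, which is the idea behind the later construction of the rational numbers.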


3.4 Orders

Definition 3.17 (Preorder). A relation R on X is called a preorder if it is reflexive and transitive.

Definition 3.18 (Partial order). A relation R on X is called a partial order if it is a preorder that is also antisymmetric. In other words, R is a partial order if it is reflexive, transitive, and antisymmetric.

Definition 3.19 (Total order). A relation R on X is called a total order if it is a partial order that is also total. In other words, R is a total order if it is transitive, antisymmetric, and total.⁷ If we denote a total order R by ≤, then we often write x < y to mean that x ≤ y and x ≠ y.

Examples:

• In individual choice theory, a preference relation ⪰ on a nonempty set X is defined as a total preorder on X. For any two alternatives x, y ∈ X we must either weakly prefer x to y (x ⪰ y) or y to x (y ⪰ x), but we are also allowed to be indifferent between x and y (x ⪰ y and y ⪰ x, written as x ∼ y) even if x ≠ y. Preferences are transitive: if we weakly prefer x to y and y to z, then we weakly prefer x to z as well.

• The inclusion relation ⊂ on sets is a partial order. It is reflexive (A ⊂ A), transitive (A ⊂ B and B ⊂ C imply A ⊂ C), and antisymmetric (A ⊂ B and B ⊂ A imply A = B). It is generally not, however, a total order, because often we cannot write either A ⊂ B or B ⊂ A.

• A natural partial order on pairs (a, b) of real numbers is given by (a1, b1) ⪰ (a2, b2) ⇐⇒ (a1 ≥ a2) ∧ (b1 ≥ b2). This is not a total order because we cannot compare, say, (1, 4) and (2, 3).

• The less-than-or-equal relation ≤ on the integers Z is a total order.
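On a finite set, the properties in Definition 3.14 can be verified by brute force, which gives a quick way to classify an order. The sketch below is illustrative only: the helper names are hypothetical, and the componentwise order is restricted to pairs of integers in {1, . . . , 4} rather than all pairs of real numbers.

```python
from itertools import product


def is_reflexive(R, X):
    return all(R(x, x) for x in X)


def is_antisymmetric(R, X):
    return all(not (R(x, y) and R(y, x)) or x == y
               for x, y in product(X, repeat=2))


def is_transitive(R, X):
    return all(not (R(x, y) and R(y, z)) or R(x, z)
               for x, y, z in product(X, repeat=3))


def is_total(R, X):
    return all(R(x, y) or R(y, x) for x, y in product(X, repeat=2))


def is_partial_order(R, X):
    return is_reflexive(R, X) and is_transitive(R, X) and is_antisymmetric(R, X)


def is_total_order(R, X):
    return is_partial_order(R, X) and is_total(R, X)


def componentwise(p, q):
    """(a1, b1) is weakly above (a2, b2) iff a1 >= a2 and b1 >= b2."""
    return p[0] >= q[0] and p[1] >= q[1]


def leq(x, y):
    return x <= y


pairs = list(product(range(1, 5), repeat=2))
integers = list(range(-3, 4))

print(is_partial_order(componentwise, pairs), is_total(componentwise, pairs))
print(is_total_order(leq, integers))
```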

Sometimes rather than defining a total order, we define a strict total order, which replaces the antisymmetric and total requirements with a slightly modified version called trichotomy.

Definition 3.20 (Strict total order). A relation R on X, which we will denote by <, is called a strict total order if it is transitive and trichotomous (meaning that for any x and y, exactly one of the following is true: x < y, y < x, or x = y).

It is easy to go back and forth between total orders and strict total orders, and we will frequently do this.

Proposition 3.21. If ≤ is a total order, then the relation < defined by x < y ⇐⇒ ((x ≤ y) ∧ (x ≠ y)) is a strict total order. If < is a strict total order, then the relation ≤ defined by x ≤ y ⇐⇒ ((x < y) ∨ (x = y)) is a total order. These two constructions are inverse to each other, so each total order is associated with a unique strict total order, and vice versa.

⁷ Note that reflexivity is implied by totality, so we can leave it off the list.


Upper and lower bounds.

Definition 3.22 (Upper and lower bounds). Let ≤ be a partial order on X. Then a subset A ⊂ X has an upper bound in X if there exists some x ∈ X such that a ≤ x for all a ∈ A. Similarly, A ⊂ X has a lower bound if there exists some x ∈ X such that a ≥ x for all a ∈ A. In these cases, x itself is often called an upper bound and a lower bound, respectively.

Definition 3.23 (Least upper bound and greatest lower bound). Let ≤ be a partial order on X, and A be a subset of X. Then a least upper bound of A, or supremum, is an element x ∈ X such that a ≤ x for all a ∈ A, and also y ≥ x for any y ∈ X such that a ≤ y for all a ∈ A. If a least upper bound of A exists, we denote it by sup A.

A greatest lower bound of A, or infimum, is an element x ∈ X such that a ≥ x for all a ∈ A, and also y ≤ x for any y ∈ X such that a ≥ y for all a ∈ A. If a greatest lower bound of A exists, we denote it by inf A.

Note that when sup A or inf A exists, the definition plus antisymmetry implies that it is unique.

Definition 3.24 (Least upper bound and greatest lower bound properties). Let ≤ be a partial order on X. ≤ has the least upper bound property if every nonempty subset A that has an upper bound also has a least upper bound. ≤ has the greatest lower bound property if every nonempty subset A that has a lower bound also has a greatest lower bound.

Usually we only hear about the least upper bound property, not its greatest lower bound counterpart. Why? It turns out that these two properties are equivalent, and the convention is to use the name “least upper bound property” when referring to this pair of equivalent properties.

Proposition 3.25. Let ≤ be a partial order on X. Then if ≤ has the least upper bound property, it has the greatest lower bound property, and vice versa.

Proof. Suppose ≤ has the least upper bound property on X, and let A be a nonempty set with a lower bound. Define B = {y ∈ X : y ≤ a for all a ∈ A} as the set of all lower bounds of A. B is nonempty (since A has a lower bound) and has an upper bound (take any element in A), and therefore it has a least upper bound, which we denote by b = sup B. We will show that b is, in fact, the greatest lower bound of A.

By construction, b ≥ y for all y ∈ B: if b is a lower bound, it is the greatest lower bound. Moreover, since every a ∈ A is an upper bound for B, we must have b ≤ a for all a ∈ A by the definition of least upper bound. Thus b is indeed a lower bound, completing the proof.

The converse is proven by modifying the argument in the obvious way: swapping all instances of ≤ with ≥ and all instances of “upper” and “least upper” with “lower” and “greatest lower”, respectively.


Examples.

• If A = (0, 1) (i.e. the set of all real numbers between 0 and 1, excluding 0 and 1 themselves), then any real number x ≥ 1 is an upper bound for A, and any real number x ≤ 0 is a lower bound for A. The least upper bound and greatest lower bound, however, are unique: sup A = 1 and inf A = 0.

• The integers Z satisfy the least upper bound property. As we will show, the real numbers R also satisfy the property, while the rationals Q do not.
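Definitions 3.22 and 3.23 can also be explored on small finite partial orders. The sketch below uses divisibility on two hypothetical sets of positive integers (an example that does not appear in these notes) to show one case where a supremum exists and one where a set has upper bounds but no least upper bound.

```python
def upper_bounds(A, X, leq):
    """All x in X with a <= x for every a in A."""
    return [x for x in X if all(leq(a, x) for a in A)]


def supremum(A, X, leq):
    """Return the least upper bound of A in X, or None if there is none."""
    ub = upper_bounds(A, X, leq)
    least = [x for x in ub if all(leq(x, y) for y in ub)]
    return least[0] if least else None


def divides(a, b):
    """Partial order on positive integers: a <= b iff a divides b."""
    return b % a == 0


X = [1, 2, 3, 4, 6, 12]  # divisors of 12, ordered by divisibility
Y = [2, 3, 12, 18]       # 12 and 18 both bound {2, 3}, but are not comparable

print(supremum([2, 3], X, divides))  # the least common multiple, which lies in X
print(supremum([2, 3], Y, divides))  # no least upper bound exists in Y
```

The second case shows that a partial order need not have the least upper bound property: {2, 3} is bounded above in Y, yet neither upper bound is below the other.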

3.5 Functions

Definition 3.26 (Function). Let X and Y be any two nonempty sets. By a function f that maps X into Y, denoted as f : X → Y, we mean a relation f ⊂ X × Y such that

1. for every x ∈ X, there exists a y ∈ Y such that x f y

2. for every x ∈ X and every y, z ∈ Y with x f y and x f z, we have y = z.

We generally write the relation x f y as y = f (x).

Some further details:

• Here X is called the domain of f and Y the codomain of f. The range of f is defined as

f(X) = {y ∈ Y : y = f(x) for some x ∈ X}

• The set of all functions that map X into Y is denoted by Y^X. For example, R^[0,1] is the set of all real-valued functions on [0, 1].

When a function consists of just a few ordered pairs, it can be described simply by listing them. Typically, however, there are too many pairs to list, and we must describe the function by specifying the domain and codomain and providing a rule for obtaining the second element in the ordered pair from the first element.

For instance, if we want to define the function f corresponding to all ordered pairs {(x, x²) : x ∈ R}, we can’t possibly “list” the infinite number of pairs. Instead we specify a rule that f takes each x ∈ R to its square, i.e. f(x) = x².

Definition 3.27 (Surjective function). A function f : A → B is called surjective (or onto) if B = f(A).

Definition 3.28 (Injective function). A function f : A → B is called injective (or one-to-one) if for all a and a′ in A, f(a) = f(a′) implies that a = a′.

Definition 3.29 (Bijective function). A function f : A → B is called bijective if it is both surjective and injective.

Definition 3.30 (Image). Suppose that f : A → B. If C ⊂ A, then we define the image f(C) to be the subset f(C) = {f(x) : x ∈ C} of B.
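A function between finite sets can be stored as a Python dictionary, which makes Definitions 3.27 to 3.30 easy to test. In the sketch below the helper names and the particular assignments in `f` and `g` are hypothetical; they are chosen only so that one map is surjective but not injective and the other is injective but not surjective.

```python
def image(f, C):
    """The image f(C) = {f(x) : x in C}."""
    return {f[x] for x in C}


def is_surjective(f, A, B):
    """f : A -> B is surjective iff f(A) = B."""
    return image(f, A) == set(B)


def is_injective(f, A):
    """f is injective iff distinct elements of A have distinct values."""
    return len(image(f, A)) == len(set(A))


def is_bijective(f, A, B):
    return is_injective(f, A) and is_surjective(f, A, B)


# Surjective, not injective: 3 and 4 share the value 'y'.
A1, B1 = {1, 2, 3, 4}, {"w", "x", "y"}
f = {1: "w", 2: "x", 3: "y", 4: "y"}

# Injective, not surjective: nothing is mapped to 'z'.
A2, B2 = {1, 2, 3}, {"w", "x", "y", "z"}
g = {1: "w", 2: "x", 3: "y"}

print(is_surjective(f, A1, B1), is_injective(f, A1))
print(is_surjective(g, A2, B2), is_injective(g, A2))
```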


Figure 3.1: Surjective, injective, and bijective functions. (Four panels, each showing a map from a set A to a set B: surjective but not injective; injective but not surjective; bijective; neither surjective nor injective.)

Definition 3.31 (Inverse image). Suppose that f : A → B. If D ⊂ B, then we define the inverse image f⁻¹(D) (also called preimage) to be the subset {x ∈ A : f(x) ∈ D} of A.

The concept of inverse image is distinct from the concept of an inverse function. For instance, given some point y ∈ B, the inverse image f⁻¹({y}) of the singleton set containing y may be empty, or it may have more than one member. Only when f is bijective can we define a true inverse function.

Note that for C ⊂ A, f⁻¹(f(C)) = C is not necessarily true when f is not injective. (For instance, consider C = {1, 2, 3} in the “surjective, not injective” example in Figure 3.1.) We can only say that C ⊂ f⁻¹(f(C)). Similarly, for D ⊂ B, f(f⁻¹(D)) = D is not necessarily true when f is not surjective; we can only say that f(f⁻¹(D)) ⊂ D. Indeed, there are many such subtleties when taking the images and inverse images of sets. We list several relevant facts in the following proposition.

Proposition 3.32. Let f : X → Y be a function, and let {Aα} be a collection of subsets of X indexed by I and {Bβ} be a collection of subsets of Y indexed by J. Also let A1, A2 be subsets of X


and B1, B2 be subsets of Y. Then

f⁻¹(f(A1)) ⊃ A1 (31)

f(f⁻¹(B1)) ⊂ B1 (32)

f(A1) ⊂ B1 ⇒ A1 ⊂ f⁻¹(B1) (33)

A1 ⊂ A2 ⇒ f(A1) ⊂ f(A2) (34)

B1 ⊂ B2 ⇒ f⁻¹(B1) ⊂ f⁻¹(B2) (35)

f(⋃_{α∈I} Aα) = ⋃_{α∈I} f(Aα) (36)

f(⋂_{α∈I} Aα) ⊂ ⋂_{α∈I} f(Aα) (37)

f⁻¹(⋃_{β∈J} Bβ) = ⋃_{β∈J} f⁻¹(Bβ) (38)

f⁻¹(⋂_{β∈J} Bβ) = ⋂_{β∈J} f⁻¹(Bβ) (39)

f⁻¹(Y \ B1) = X \ f⁻¹(B1) (40)

In (31), (32), and (37), where there are inclusion signs ⊂ rather than equalities =, there exist examples where there is strict inclusion, so that the statement with an equality sign would be false. Furthermore, there is no analog of (40) with images f rather than inverse images f⁻¹: there are cases both where f(X \ A1) ⊄ Y \ f(A1) (when f is not injective) and where f(X \ A1) ⊅ Y \ f(A1) (when f is not surjective).
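The strict inclusions mentioned for (31), (32), and (37) are easy to exhibit on a small example. The sketch below uses a hypothetical function `f` (stored as a dictionary) that is neither injective nor surjective; the specific sets are assumptions made for the illustration.

```python
def image(f, C):
    """The image f(C) = {f(x) : x in C}."""
    return {f[x] for x in C}


def preimage(f, D):
    """The inverse image: all x in the domain of f with f(x) in D."""
    return {x for x in f if f[x] in D}


X = {1, 2, 3, 4}
Y = {"w", "x", "y", "z"}
f = {1: "w", 2: "w", 3: "x", 4: "y"}  # not injective (1, 2 collide), not surjective ('z' is missed)

A1, A2 = {1, 3}, {2, 3}
B1 = {"y", "z"}

strict_31 = A1 < preimage(f, image(f, A1))                   # proper subset in (31)
strict_32 = image(f, preimage(f, B1)) < B1                   # proper subset in (32)
strict_37 = image(f, A1 & A2) < (image(f, A1) & image(f, A2))  # proper subset in (37)
equal_40 = preimage(f, Y - B1) == X - preimage(f, B1)        # equality in (40)

print(strict_31, strict_32, strict_37, equal_40)
```

Here `<` on Python sets tests for a proper subset, so each of the first three flags records a case where equality fails.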

Definition 3.33 (Inverse function). Suppose that f : A → B is bijective. The inverse function of f is the function f⁻¹ : B → A given by f⁻¹(y) = x ⇐⇒ y = f(x).⁸

Definition 3.34 (Composition). Suppose that g : A → B and f : B → C are functions. The composition of f and g is denoted by f ◦ g and given by (f ◦ g)(x) = f(g(x)).

The concepts of inverse function and composition can be combined nicely: if f : A → B is bijective, then (f ◦ f⁻¹)(y) = y for every y ∈ B and (f⁻¹ ◦ f)(x) = x for every x ∈ A.

Proposition 3.35 (Associativity of function composition). Let f : A → B, g : B → C, and h : C → D be functions. Then (h ◦ g) ◦ f = h ◦ (g ◦ f).

Proof. This follows from the definition of composition. For any x ∈ A, ((h ◦ g) ◦ f)(x) = (h ◦ g)(f(x)) = h(g(f(x))) = h((g ◦ f)(x)) = (h ◦ (g ◦ f))(x).

⁸ The requirement that f is bijective is necessary and sufficient for this function to be uniquely defined for every y ∈ B.


3.6 Axiom of choice⁹

In Definition 3.12, we defined the Cartesian product of two sets A1 and A2 as the set of all ordered pairs (a1, a2), with a1 ∈ A1 and a2 ∈ A2. What if we want to extend this notion and define the Cartesian product of an arbitrary collection {Aα} of sets, indexed by I? The natural way to generalize the Cartesian product is to define it as the set of all functions f, called choice functions, from the index set I to the union ⋃_{α∈I} Aα such that f(α) ∈ Aα for every α ∈ I. Our earlier definition of Cartesian product for two sets A1 and A2 is a special case of this general definition, for index set I = {1, 2}.

Yet it is not immediately clear that this definition is meaningful. Without additional assumptions, we cannot guarantee that such a function f will even exist. This implies that we cannot guarantee that the Cartesian product of {Aα} is nonempty, even when each set Aα is nonempty.

Now, when the index set is finite (for instance, I = {1, 2, . . . , n}), there is a clear procedure to construct a choice function f. We just list the value of f(i) for each i = 1, . . . , n, picking some element of Ai each time. Indeed, for the finite case we can formally prove the existence of a choice function by induction, assuming the existence of a choice function for index sets of size n − 1, then defining the function for one additional element to complete the proof for index sets of size n. But when the index set is infinite, no such proof is possible. The situation seems particularly hopeless when the set has too many elements to count¹⁰, like R.

Although this may seem like a technical nitpick, it turns out to be a remarkably deep issue in set theory, and mathematicians ultimately decided that the best route is simply to take the existence of f as an axiom, called the axiom of choice. They even proved that set theory remains consistent if this axiom is assumed to be false; the axiom of choice can therefore never be “proven” from the other axioms of set theory. The axiom of choice is a key step in many non-constructive proofs, since it tells us that a choice function exists, but does not tell us how to find it.

Definition 3.36 (Axiom of choice). Let {Aα} be a collection of nonempty sets indexed by I. Then there exists a function, called a choice function, f : I → ⋃_{α∈I} Aα such that f(α) ∈ Aα for every α ∈ I.

Now we can safely define the generalization of Cartesian product, knowing that the Cartesian product of nonempty sets will always be nonempty.

Definition 3.37 (Cartesian product). Let {Aα} be a collection of nonempty sets indexed by I. Then the Cartesian product ∏_{α∈I} Aα is the set of all choice functions f on {Aα}.

Despite its apparent simplicity, the axiom of choice has a number of remarkable implications, some of which are deeply unintuitive.¹¹ Two of the most useful are the following, which are actually equivalent to the axiom of choice (they can be proven from the axiom of choice, or used to prove the axiom of choice).

⁹ This is more technical than other sections (though it does not go into much detail), and it is possible to lead a perfectly satisfying life as an economist without internalizing it.

¹⁰ We will define precisely what this means in the next section.

¹¹ Look up the “Banach-Tarski paradox”.


Proposition 3.38 (Zorn’s lemma). Let (P, ≤) be a partially ordered set, and suppose that any subset T ⊂ P that is totally ordered (i.e. ∀x, y ∈ T : x ≥ y ∨ y ≥ x) has an upper bound in P. Then P has at least one maximal element, defined as an element x ∈ P such that no y ∈ P satisfies y > x (equivalently, y ≥ x implies y = x for all y ∈ P).

Zorn’s lemma is a key step for proving foundational results in many subfields of mathematics. In linear algebra, it is used to prove that every vector space has a basis; in topology, it is used to prove that the product space of compact spaces is always compact (Tychonoff’s theorem); and in functional analysis, it is used to prove the Hahn-Banach theorem, which will make a disguised cameo in 14.121 as the “separating hyperplane theorem”.

Definition 3.39 (Well-order). A well-order ≤ on a set S is a total order such that every non-empty subset T ⊂ S has a least element. (In other words, for any non-empty subset T ⊂ S, there exists some x ∈ T such that x ≤ y for all y ∈ T. A least element of a set is a lower bound contained within the set itself.) We say that the set S is well-ordered by ≤.

The fact that the usual order ≤ on the natural numbers N is a well-order is fairly obvious (given a non-empty set of positive integers, there is always a smallest one), and it turns out to be the key property that allows induction on N. Less obvious is the fact that given any set S, there exists some order ≤ that is a well-order on S.

Proposition 3.40 (Well-ordering theorem). Any set can be well-ordered.

This sometimes allows us to do induction on arbitrary sets (called transfinite induction), although we must be careful because the order ≤ that well-orders a set may be very, very different from the order that we usually use on that set. The well-ordering theorem is another example of a non-constructive result: we know from the theorem that a well-ordering exists on R, yet mathematicians have never been able to actually find such an ordering. Most of them have trouble even imagining one.

3.7 Cardinality

Suppose that we want to compare the size of two sets A and B. If A and B are both finite, this is easy: we just look at the number of elements m in A and n in B, and see which number is larger. But what if A and B are both infinite? There isn’t any clear rule to decide when one infinity is larger than another infinity.

Mathematicians like to generalize the notion of comparing size using functions. If there is a bijective function f between A and B, they say that A and B have the same size. If, on the other hand, there is an injective function from A to B but not an injective function from B to A, they say that A is smaller than B.

This notion of size is consistent with the standard definition for finite sets. For finite A and B, there is a bijective function between A and B iff A and B have the same number of elements, and there is an injective function from A to B iff A has weakly fewer elements than B. For instance, there is a bijection between {1, 2, 3} and {2, 3, 4} (f(n) = n + 1), since these sets both have 3 elements. There is an injective function from {1, 2, 3} to {1, 2, 3, 4},


but not an injective function from {1, 2, 3, 4} to {1, 2, 3}; in the latter case, a function f will inevitably send at least two elements in {1, 2, 3, 4} to the same element in {1, 2, 3}.

There are also unintuitive aspects. We might think that the set of positive even numbers {2, 4, 6, . . .} is clearly “smaller” than the set of all positive integers {1, 2, 3, . . .}, since the former is a strict subset of the latter. But since there is an obvious bijection from the even numbers onto the positive integers (f(n) = n/2), these sets are actually the same size according to our new definition.

Above all, if this is to be a rigorous notion of comparative size, we need to show that it satisfies the properties of an order. First, we define what it means for two sets to be the same size, and show that it is an equivalence relation.

Definition 3.41. Two sets A and B are said to be equinumerous, or to have the same cardinality, if there is a bijection f : A → B. We then write A ∼ B.

Proposition 3.42. ∼ is an equivalence relation.

Proof. To prove that ∼ is an equivalence relation, we must show that it is reflexive, symmetric, and transitive. For reflexivity, note that the identity function f : A → A defined by f(x) = x is a bijection. For symmetry, recall that for any bijective function f : A → B we can define an inverse f⁻¹ : B → A, which is also bijective. For transitivity, observe that the composition f ◦ g : A → C of any two bijections g : A → B and f : B → C is also a bijection.

Now that we have shown that ∼ is an equivalence relation, we define the cardinality |A| of a set A to be its equivalence class under ∼.¹² ¹³ We then define an order ≤ that compares cardinalities.

Definition 3.43 (Definition of order ≤ on cardinalities). Let A and B be sets. Then we define |A| ≤ |B| if there exists an injection from A to B.

First, we need to be sure that this definition is consistent. We define |A| ≤ |B| if there exists an injection from A to B, but A is only one member of the equivalence class |A| and B is only one member of the equivalence class |B|. What if we took some other member C of the equivalence class |A| (i.e. C ∼ A) and D of the equivalence class |B| (i.e. D ∼ B)? For the definition to be consistent, we should get the same answer. There should be an injection from C to D iff there is an injection from A to B.

Fortunately, this is true. If C ∼ A and D ∼ B, then let g1 : A → C and g2 : B → D be the bijection between A and C and the bijection between B and D, respectively. Now, if f : A → B is an injection, h : C → D defined by h = g2 ◦ f ◦ g1⁻¹ is also an injection. Similarly, we can take an injection from C to D and obtain an injection from A to B.

¹² Recall that any element x in a set with an equivalence relation is a member of an equivalence class, which consists of all members of the set that are equivalent to it. The equivalence classes of a set form a partition of that set. In this case, the “elements” x are themselves sets, which are members of the “set of all sets”. (Defining the “set of all sets” is difficult, and we run into paradoxes unless we limit the sets that are allowed to be members. For now we simply assume that this has been done properly. Look up the “Russell paradox” for more details if you are interested.)

¹³ In more in-depth treatments, objects called “cardinals” are defined that serve as representatives of these equivalence classes. This doesn’t really matter for us, though; we just care about the fact that we can compare the cardinality of any two sets.


Although we have shown that the definition of ≤ is consistent, we still have not shown that it satisfies the properties of an order, either partial or total. As it turns out, ≤ is a total order. To prove this, we need to verify the three properties of a total order.

Proposition 3.44. ≤ is a total order on cardinalities.

Proof. We need to verify transitivity, antisymmetry, and totality.

• Transitivity. This is the easiest of the three. Suppose that |A| ≤ |B| and |B| ≤ |C|. Then there exist injections g : A → B and f : B → C. The composition f ◦ g : A → C is then an injection from A to C, so that |A| ≤ |C|.

• Antisymmetry. We need to show that if |A| ≤ |B| and |B| ≤ |A|, then |A| = |B|. In other words, we need to show that if there is an injection from A to B and an injection from B to A, then there is a bijection from A to B. This is not too hard to prove, but it is involved enough that we will not do it here. This result is called the Cantor-Bernstein-Schroeder theorem, and a proof is available in its Wikipedia article.

• Totality. We need to show that any A and B are comparable; that for any A and B, either |A| ≤ |B| or |B| ≤ |A|. Like antisymmetry, this is highly nonobvious; in fact, it is even more difficult to prove, and unlike transitivity and antisymmetry it requires us to assume the axiom of choice. The usual method of proof is to show that any two well-ordered sets are comparable (making use of the nice properties of a well-ordering), and then to use the well-ordering theorem (Proposition 3.40) to well-order arbitrary A and B.

What practical use is all this? Although it seems unnecessarily complicated, showing that |S1| ≠ |S2| is sometimes one of the easiest ways to show that S1 ≠ S2. (This is a very nice way to show that Q ≠ R, as we will see.) Also note that ≤ behaves nicely with respect to set inclusion:

Proposition 3.45. Suppose A ⊂ B. Then |A| ≤ |B|.

Proof. Let f : A → B be the “inclusion” function defined by f(x) = x, which sends an element x in A to the same element x in the larger set B. f is injective, and therefore by definition |A| ≤ |B|.

We can equivalently define ≤ in terms of surjections rather than injections:

Proposition 3.46. Let A be nonempty. Then |A| ≤ |B| iff there is a surjection from B to A.

Another very important application is to compare the cardinality of arbitrary sets S with the cardinality of the natural numbers N. Often the validity of an argument depends on how |S| compares to |N|; for instance, a proof strategy may work if |S| ≤ |N|, but not otherwise. This comparison is so important that we use special terms to discuss it.


Definition 3.47 (Countable set). A set S is countable if it has cardinality weakly less than that of the natural numbers: |S| ≤ |N|. In other words, S is countable if there exists an injective function from S to N.

Definition 3.48 (Countably infinite set). A set S is countably infinite if it has cardinality equal to that of the natural numbers: |S| = |N|. (Sometimes the term countable is used specifically with this meaning as well.) In other words, S is countably infinite if there exists a bijection between S and N.

Definition 3.49 (Uncountable, or uncountably infinite, set). A set S is uncountable (also called uncountably infinite) if |S| > |N|. In other words, S is uncountable if there exists an injective function from N to S, but no bijection between N and S.

The following is a surprising and useful result about countable sets. We will use it in the next section to show that the set of rational numbers is countable!

Proposition 3.50. The union of countably many countable sets is countable.

Proof. Suppose that we have countably infinitely many countable sets. In particular, let {An} be a collection of countable sets, indexed by n ∈ N. Let us arrange the elements of each set An in a sequence {xnk}, where k ∈ N as well, and consider the array

A1 : x11 x12 x13 x14 . . .
A2 : x21 x22 x23 x24 . . .
A3 : x31 x32 x33 x34 . . .
A4 : x41 x42 x43 x44 . . .

By going up the diagonals from bottom-left to top-right, we can arrange these elements in a sequence (note that since we did not assume that the An were disjoint, there might be repeats):

x11, x21, x12, x31, x22, x13, x41, . . . (41)

This sequence is a surjective mapping from N to ∪n∈N An. Therefore by Proposition 3.46, |∪n∈N An| ≤ |N|. Yet for any n, |An| = |N|, and since An is a subset of ∪n∈N An, Proposition 3.45 implies that |N| ≤ |∪n∈N An|. We conclude that

|∪n∈N An| = |N| (42)

as desired.

The result for finitely many countable sets A1, . . . , Am (rather than countably infinitely many, as above) is implied by the above result, where we simply take An to be the empty set ∅ for all n ≥ m + 1.
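The diagonal enumeration used in this proof can be written out as a short generator. The sketch below is an illustration under the assumption that the array is indexed by pairs (n, k) with n, k ≥ 1; the function name `diagonal_indices` is hypothetical. It produces the index pairs in the order of sequence (41).

```python
from itertools import islice


def diagonal_indices():
    """Yield index pairs (n, k) in the order (1,1), (2,1), (1,2), (3,1), (2,2), (1,3), ..."""
    s = 2  # n + k is constant along each diagonal of the array
    while True:
        # Walk one diagonal from bottom-left (large n) to top-right (large k).
        for n in range(s - 1, 0, -1):
            yield (n, s - n)
        s += 1


first_seven = list(islice(diagonal_indices(), 7))
print(first_seven)

# Every pair (n, k) with n + k <= 10 appears among the first 45 terms,
# so in particular every pair with n, k <= 5 has been reached by then.
first_45 = set(islice(diagonal_indices(), 45))
```

Because every index pair is reached after finitely many steps, the sequence hits every element of the array, which is the surjectivity claim used in the proof.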

Above we found a way to show that for certain set unions A, |A| = |N|. When can we show that |A| > |N|, or more generally |A| > |B| for some B? The following proposition provides one situation where this is possible.


Proposition 3.51. For any set X, |X| < |2^X|: the cardinality of X is strictly less than the cardinality of its power set.

Proof. First, note that there is an injection f : X → 2^X that takes each x to the singleton set {x} in 2^X. Therefore we have |X| ≤ |2^X|, and we now need to rule out |X| = |2^X|.

Suppose that |X| = |2^X|, i.e. that there is a bijection between X and 2^X. Then, in particular, there is a surjection g from X onto 2^X. Let us define the element Y of 2^X (i.e. a subset Y of X) as follows:

x ∈ Y ⇐⇒ x ∉ g(x)

Since g is surjective, there must exist some y ∈ X such that g(y) = Y. Substituting y for x above, and using g(y) = Y, we obtain:

y ∈ Y ⇐⇒ y ∉ Y

which is clearly impossible. Therefore our assumption was wrong, and there cannot be a bijection between X and 2^X. We conclude that |X| < |2^X|, as desired.
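For a small finite X, the diagonal argument can be checked exhaustively: for every function g from X to its power set, the set Y built in the proof is missing from the image of g. The sketch below assumes X = {0, 1, 2}; the helper names `power_set` and `diagonal_set` are hypothetical.

```python
from itertools import combinations, product


def power_set(X):
    """All subsets of X, as frozensets."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]


def diagonal_set(g, X):
    """Y = {x in X : x not in g(x)}, the set constructed in the proof."""
    return frozenset(x for x in X if x not in g[x])


X = [0, 1, 2]
subsets = power_set(X)

# Enumerate every function g : X -> 2^X (there are 8**3 = 512 of them)
# and confirm that the diagonal set Y is never one of the values of g.
all_missed = all(
    diagonal_set(dict(zip(X, values)), X) not in values
    for values in product(subsets, repeat=len(X))
)

print(len(subsets), all_missed)
```

Since no g reaches its own diagonal set, no g is surjective, in line with the proposition.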


4 Fields

4.1 Fields, orders, and ordered fields

We now define special kinds of sets, called fields, on which the basic operations of arithmetic are defined. The formal definition of a field is important in its own right, but it will be especially important in the definition of vector spaces, which we will cover in the linear algebra lecture. Although many different fields exist, there are three specific fields that we will emphasize in these notes: the rational numbers Q, the real numbers R, and the complex numbers C.

Definition 4.1 (Field). A field is a triple (F, +, ·) consisting of a set F, an addition operation + that maps a pair of elements of F to another element of F:

F × F → F

(x, y) ↦ x + y

and a multiplication operation · that maps a pair of elements of F to another element of F:

F × F → F

(x, y) ↦ x · y

such that addition satisfies the following 4 properties:

1. Addition is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ F.

2. Addition is commutative: x + y = y + x for all x, y ∈ F.

3. Zero (additive identity): There is a unique element 0 ∈ F such that x + 0 = x for all x ∈ F.

4. Additive inverse: For each element x ∈ F, there is a unique element −x ∈ F such that x + (−x) = 0.

multiplication satisfies the following 4 properties:

1. Multiplication is associative: (x · y) · z = x · (y · z) for all x, y, z ∈ F.

2. Multiplication is commutative: x · y = y · x for all x, y ∈ F.

3. One (multiplicative identity): There is a unique element 1 ∈ F such that x · 1 = x for all x ∈ F.

4. Multiplicative inverse: For each element x ∈ F other than 0, there is a unique element 1/x ∈ F such that x · (1/x) = 1.

and addition and multiplication jointly obey the distributive law

x · (y + z) = x · y + x · z

for all x, y, z ∈ F.

Like other sets, fields can be preordered, partially ordered, or totally ordered. The totally ordered case (see Definition 3.19) is particularly interesting, because in special cases we can define an ordered field.

Definition 4.2 (Ordered field). An ordered field is a 4-tuple (F, ≤, +, ·) such that (F, +, ·) is a field, (F, ≤) is a totally ordered set, and the following additional two properties hold:

1. For any x, y, z ∈ F, y ≤ z implies x + y ≤ x + z.

2. For any x, y ∈ F, x ≥ 0 and y ≥ 0 together imply x · y ≥ 0.

Keep in mind that an ordered field is both a field and a totally ordered set, but the converse is not true: it is possible for a field to have a total order that does not satisfy properties 1 and 2 in Definition 4.2. These properties link the structure of the arithmetic operations in the field to the order on that field.

Examples.

• The integers Z = {. . . , −1, 0, 1, . . .} are a totally ordered set under the usual order but not a field, because no elements except 1 and −1 have multiplicative inverses.

• The set {0, 1} is a field if 0 and 1 have their usual meanings and we complete the definition of addition by writing 1 + 1 = 0. (Note: the outcome of all other operations is specified by the field axioms. For instance, it can be proven that in any field, 0 · x = 0 for all x.) It is not an ordered field, however, since property 1 in Definition 4.2 is violated regardless of how we define the order.

• Both the rationals Q and the reals R (which we will define rigorously soon, though you have probably already encountered them) are ordered fields under the usual order. The complex numbers C are a field, and can be made into a totally ordered set by defining the order in various ways, but cannot be made into an ordered field no matter how the order is defined.

• Although the reals R are an ordered field under the usual order, we can define the order in other ways that do not satisfy the requirements for an ordered field. For instance, suppose we define a new order $\le_R$ by reversing the usual order ≤ of R, so that $x \le_R y \iff y \le x$. Then $(R, \le_R)$ is still a totally ordered set, but it is not an ordered field because it fails property 2 of Definition 4.2. (Remember: an ordered field is more than just a field with some order!)
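For a finite set, the axioms of Definition 4.1 can be checked exhaustively. The following is a minimal sketch, assuming a hypothetical helper `is_field` that tests existence (not uniqueness) of identities and inverses. It confirms that {0, 1} with 1 + 1 = 0 passes, while the alternative rule 1 + 1 = 1 fails because 1 then has no additive inverse.

```python
from itertools import product

def is_field(elements, add, mul, zero, one):
    """Brute-force check of the axioms in Definition 4.1 on a finite set."""
    E = list(elements)
    closed = all(add(x, y) in E and mul(x, y) in E for x, y in product(E, E))
    assoc = all(add(add(x, y), z) == add(x, add(y, z)) and
                mul(mul(x, y), z) == mul(x, mul(y, z))
                for x, y, z in product(E, E, E))
    comm = all(add(x, y) == add(y, x) and mul(x, y) == mul(y, x)
               for x, y in product(E, E))
    ident = all(add(x, zero) == x and mul(x, one) == x for x in E)
    add_inv = all(any(add(x, y) == zero for y in E) for x in E)
    mul_inv = all(any(mul(x, y) == one for y in E) for x in E if x != zero)
    distrib = all(mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
                  for x, y, z in product(E, E, E))
    return all([closed, assoc, comm, ident, add_inv, mul_inv, distrib])

def xor_add(x, y):
    return (x + y) % 2      # addition with 1 + 1 = 0

def capped_add(x, y):
    return min(x + y, 1)    # hypothetical alternative with 1 + 1 = 1

def times(x, y):
    return x * y

print(is_field({0, 1}, xor_add, times, 0, 1))     # True
print(is_field({0, 1}, capped_add, times, 0, 1))  # False
```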

All the usual rules of arithmetic can be derived from the properties of a field. For instance, we have:[14]

Proposition 4.3. Suppose that (F, +, ·) is a field. Then the properties of addition imply:

• D(a): If x + y = x + z, then y = z.

• D(b): If x + y = x, then y = 0.

• D(c): If x + y = 0, then y = −x.

• D(d): −(−x) = x.

The properties of multiplication similarly imply:

• M(a): If x ≠ 0 and xy = xz, then y = z.

• M(b): If x ≠ 0 and xy = x, then y = 1.

• M(c): If x ≠ 0 and xy = 1, then y = 1/x.

• M(d): If x ≠ 0, then 1/(1/x) = x.

We also have the following four statements:

• F(a): 0 · x = 0.

• F(b): If x ≠ 0 and y ≠ 0, then xy ≠ 0.

• F(c): (−x) · y = −(x · y) = x · (−y).

• F(d): (−x) · (−y) = x · y.

[14] All these examples are taken from Rudin.

Proof. We will prove D(a), M(d), F(a), and F(b) as examples, leaving the rest as exercises. (All these statements are very simple to prove, but it is good practice to work through them formally.)

For D(a), we add −x from the left on both sides of x + y = x + z to obtain −x + (x + y) = −x + (x + z). We apply the associative property to make this (−x + x) + y = (−x + x) + z, and then apply the definition of an additive inverse to obtain 0 + y = 0 + z. Finally, we apply the definition of the additive identity 0 to obtain y = z.

For M(d), we observe that 1/x · x = x · 1/x = 1, where the first step follows from the commutative property for multiplication and the second step follows from the definition of 1/x. Therefore x is a multiplicative inverse of 1/x (and multiplicative inverses are unique by the properties of a field); we write this as 1/(1/x) = x.

For F(a), we note that 0 · x equals (0 + 0) · x by the definition of 0, and further that this equals 0 · x + 0 · x by the commutativity of multiplication and the distributive property. Now that we have 0 · x + 0 · x = 0 · x, we apply D(b) to obtain 0 · x = 0.

For F(b), we prove the contrapositive: if x · y = 0 and x ≠ 0, then we multiply both sides by 1/x to obtain 1/x · (x · y) = 1/x · 0. Using associativity on the left and F(a) on the right, we have (1/x · x) · y = 0, and from the definitions of multiplicative inverse and identity this reduces to y = 0.

Similarly, all the usual rules for arithmetic and inequalities can be derived from the properties of an ordered field:

Proposition 4.4. Suppose that (F, ≤, +, ·) is an ordered field. Then we have:

• O(a): If x ≥ 0, then −x ≤ 0, and vice versa.

• O(b): If x ≥ 0 and y ≤ z, then x · y ≤ x · z.

• O(c): If x ≤ 0 and y ≤ z, then x · y ≥ x · z.

• O(d): x · x ≥ 0. (We usually write this as $x^2 \ge 0$.)

• O(e): If 0 ≤ 1/x ≤ 1/y, then 0 ≤ y ≤ x.

4.2 Rational numbers

We’ve seen the abstract properties of ordered fields, but let’s try to imagine an ordered field. What elements might it contain, and what might it look like? By definition, we must have the elements 0 and 1, and using the addition operation we obtain 1 + 1, 1 + 1 + 1, and so on.

For fields in general, it’s possible that a sum of n 1s will equal 0 (1 + 1 + · · · + 1 = 0), but for ordered fields this is impossible. Why? By point O(d) in Proposition 4.4 above, we know that $1 = 1^2 \ge 0$. Then property 1 of Definition 4.2 implies 1 + 1 ≥ 1, 1 + 1 + 1 ≥ 1 + 1, and so on. If

\[ \underbrace{1 + 1 + \cdots + 1}_{n \text{ ones}} = 0 \]

for some n, then we find:

\[ \underbrace{1 + 1 + \cdots + 1}_{n-1 \text{ ones}} \ge 0 \implies \underbrace{1 + 1 + \cdots + 1}_{n \text{ ones}} \ge 1 \implies 0 \ge 1 \]

This is not consistent with antisymmetry: we can’t have both 0 ≥ 1 and 1 ≥ 0 without 0 = 1. But as F(a) in Proposition 4.3 demonstrates, we can’t have 0 = 1 either (if 0 = 1, then x = 1 · x = 0 · x = 0 for every x, so the field would contain only the single element 0). We conclude that it’s impossible to have 1 + 1 + · · · + 1 = 0, for any number of 1s on the left.

This has a useful consequence: it implies that $\underbrace{1 + 1 + \cdots + 1}_{n \text{ ones}}$ and $\underbrace{1 + 1 + \cdots + 1}_{m \text{ ones}}$ are not equal whenever m ≠ n. Otherwise (if, say, m < n) we could subtract the sum of m 1s from both sides, and conclude that $\underbrace{1 + 1 + \cdots + 1}_{n-m \text{ ones}} = 0$, which we just proved to be impossible.

We therefore have found a hierarchy of distinct elements in any ordered field: 1, 1 + 1, 1 + 1 + 1, and so on. In the usual manner, we denote these values by 1, 2, 3, etc. By the additive inverse property, the ordered field must also contain −1, −2, −3, etc. Essentially, we’ve demonstrated that any ordered field must contain the integers.

That’s not all. Multiplicative inverses must exist as well. We must have 1/2, 1/3, etc., as well as the products of these values with the integers: say, 3 · 1/2, or 5 · 1/7. And we need to define the sums and products of these values (say, the sum of 3 · 1/2 and 5 · 1/7) in a way that’s consistent with the properties of an ordered field. And so on...

Although we won’t prove it here, it turns out that there is really only one way to define the sums and products of these values consistently, and this gives us the system of numbers known as the rationals. Every ordered field must contain the rationals as a subset, though beyond this there are many possibilities for additional elements and structure.

Definition 4.5 (Rational numbers). The set of rational numbers Q is an ordered field consisting of equivalence classes of pairs of integers (p, q) where q > 0, which we write as p/q. Two pairs $p_1/q_1$ and $p_2/q_2$ are in the same equivalence class (i.e. represent the same rational number) if $p_1q_2 = p_2q_1$.[15]

We define the order ≤ on Q by $p_1/q_1 \le p_2/q_2 \iff p_1q_2 \le p_2q_1$. The additive identity is the equivalence class of pairs of the form 0/q, and the multiplicative identity is the equivalence class of pairs of the form q/q. Addition, multiplication, additive inverses, and multiplicative inverses are defined by:

\[ \frac{p_1}{q_1} + \frac{p_2}{q_2} \equiv \frac{p_1q_2 + p_2q_1}{q_1q_2} \]

\[ \frac{p_1}{q_1} \cdot \frac{p_2}{q_2} \equiv \frac{p_1p_2}{q_1q_2} \]

\[ -\frac{p_1}{q_1} \equiv \frac{-p_1}{q_1} \]

\[ 1 \Big/ \frac{p_1}{q_1} \equiv \begin{cases} \dfrac{q_1}{p_1} & p_1 > 0 \\[1ex] \dfrac{-q_1}{-p_1} & p_1 < 0 \end{cases} \]

It is straightforward but extremely tedious to verify that the ordered field thus defined satisfies all the properties laid out for fields, totally ordered sets, and ordered fields in Definitions 4.1, 3.19, and 4.2. Among other things, we must demonstrate that our definitions are consistent with the equivalence relation we’ve used to define Q. For instance, if $p_1'/q_1'$ and $p_2'/q_2'$ are equal to $p_1/q_1$ and $p_2/q_2$, respectively, then our rule for addition had better produce the same results for $p_1'/q_1' + p_2'/q_2'$ and $p_1/q_1 + p_2/q_2$. More concretely, we should have 2/4 + 1/3 = 1/2 + 3/9. (And we do: applying the rule for addition, this becomes 10/12 = 15/18.)
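The consistency check in the last sentence can be reproduced directly on representative pairs. This is a small sketch in which a rational is modelled as a tuple (p, q) with q > 0; the function names `add` and `equivalent` are illustrative choices, not notation from the notes.

```python
def add(r, s):
    """Addition rule from Definition 4.5 applied to representative pairs (p, q)."""
    (p1, q1), (p2, q2) = r, s
    return (p1 * q2 + p2 * q1, q1 * q2)

def equivalent(r, s):
    """Two pairs represent the same rational iff p1 * q2 == p2 * q1."""
    (p1, q1), (p2, q2) = r, s
    return p1 * q2 == p2 * q1

left = add((2, 4), (1, 3))    # (10, 12)
right = add((1, 2), (3, 9))   # (15, 18)
print(left, right, equivalent(left, right))
```

The two results are different pairs, but they fall in the same equivalence class, which is exactly what well-definedness of addition requires.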

This all sounds like basic arithmetic, because it is. But there are several ideas here that will recur in the future. The idea that we define a set by partitioning a larger set into equivalence classes, for instance, is extremely common. Perhaps we have a space of functions, and we want to say that f = g as long as f(x) = g(x) for “almost all” x, even if they disagree at a few points. This requires that we define some notion of an equivalence class, and show that our rules for the space of functions are compatible with it.

More generally, we’ll often need to define a set along with a set of operations on that set, and verify that these operations satisfy some basic axioms. Defining the rationals may seem like unnecessary formalism to an economist, but the concepts involved certainly are not.

[15] Recall that we mentioned earlier that an equivalence relation induces a partition on a set; each subset in the partition consists of elements that are equivalent to each other. The rational numbers are simply the subsets in this partition. The rational number 1/2, for instance, is really the equivalence class {1/2, 2/4, 3/6, 4/8, . . .}. This is very important, because we don’t want to define the rationals in a way that, say, 1/2 and 2/4 are treated as different; they are really just two ways of writing the same element in Q.

4.3 Properties of the rational numbers

The following property says, intuitively speaking, that Q has no “very small” or “very large” elements: any positive element of Q, added to itself sufficiently many times, can be made greater than any other element. It is one of the important properties of Q (and, as we’ll soon see, R).

Proposition 4.6 (Archimedean property for rationals). If x ∈ Q, y ∈ Q, and x > 0, then there exists some positive integer n such that n · x > y.

Proof. Suppose $x = p_1/q_1$ and $y = p_2/q_2$, where $p_1 \ge 1$ because x > 0. If y ≤ 0, then n = 1 works. Otherwise $p_2 \ge 1$, and one such n is $2q_1p_2$: we have $n \cdot x = 2p_1p_2 \ge 2p_2 > p_2 \ge p_2/q_2 = y$.

A related property is that between any two rational numbers, we can find another rational number.

Proposition 4.7. If x ∈ Q, y ∈ Q, and x < y, then there exists a z ∈ Q such that x < z and z < y.

Proof. Take z = 1/2 · (x + y).

One surprising feature of the rationals is that they are countably infinite. As stated in Definition 3.48, this means that the rationals have the same cardinality as N, i.e. that there is a bijective mapping f : N → Q from the natural numbers N to the rationals Q.

Proposition 4.8. Q is countably infinite.

Proof. First, there is an injection from N into Q defined by f(x) = x/1. Therefore the cardinality of N is weakly less than the cardinality of Q: |N| ≤ |Q|.

We now show that |Q| ≤ |N| as well, which in conjunction with |N| ≤ |Q| implies that |Q| = |N|. To do so, we use Proposition 3.50, which states that the union of countably many countable sets is countable. For n ∈ N, we define the set $A_n \subset Q$ as the set containing all rationals with denominator n.

\[ A_n = \{ m/n : m \in Z \} \qquad (43) \]

(Note that there will be some overlap between the various $A_n$: for instance, 1/2 = 2/4 = 3/6 = . . ., so that $1/2 \in A_n$ for each even n.)

Since any rational must be expressible as m/n with some denominator n, we can write the rationals as the union of all the $A_n$: $Q = \cup_{n \in N} A_n$. Thus Q can be written as the union of countably many countable sets. By Proposition 3.50, this implies that Q itself is countable, i.e. |Q| ≤ |N|.

The countability of the rationals is surprising in part because we’re accustomed to “counting” the elements of a set in a natural order. For instance, if we were asked to show that the set of positive squares {1, 4, 9, . . .} was countable, we’d simply write f(1) = 1, f(2) = 4, f(3) = 9, and so on. But we can’t do this with rationals: any bijective mapping f : N → Q cannot always have f(n) < f(n + 1), where < is the standard order on Q. To see why, just consider Proposition 4.7: if f(n) < f(n + 1), there must be some other rational z in between f(n) and f(n + 1). But if f is strictly increasing, it can’t possibly map to z; we know, therefore, that we’ll miss at least one rational number by counting in this manner. To show that Q is countable, we need to map N to Q in a less natural way.
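One such "less natural" listing can be written down explicitly. The sketch below is one possible enumeration, not the construction used in the proof above: it orders fractions p/q in lowest terms by the hypothetical "height" |p| + q, so every rational appears exactly once, at the cost of the list jumping back and forth in the usual order.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def enumerate_rationals():
    """Yield every rational exactly once, ordered by |p| + q for p/q in lowest terms."""
    yield Fraction(0)
    height = 2
    while True:
        for q in range(1, height):
            p = height - q
            if gcd(p, q) == 1:          # skip duplicates such as 2/2 or 2/4
                yield Fraction(p, q)
                yield Fraction(-p, q)
        height += 1

first = list(islice(enumerate_rationals(), 11))
print([str(x) for x in first])
```

Only finitely many fractions share a given height, which is why each rational is reached after finitely many steps.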

The fact that the rationals Q are countable is frequently useful, particularly because (as we will soon show) the reals R are not countable. Sometimes an argument only works for a countable set; a common strategy is to use this argument to prove a proposition over Q, then find some way to extend it to R.

Unfortunately, all is not well in the land of rational numbers. Although they possess desirable features like the Archimedean property and countability, they fail in other ways. For instance, we cannot always take square roots within the set of rational numbers.

Proposition 4.9. $x^2 = 2$ does not have a solution x ∈ Q.

Proof. Suppose to the contrary that there was some solution x = p/q. Take p and q to be such that not both p and q are even. (If p and q are both even, we can divide them both by 2 to get an equivalent representation of x.) Then we have:

\[ \left( \frac{p}{q} \right)^2 = 2 \]

\[ p^2 = 2q^2 \]

Therefore $p^2$ must be even, which implies that p is even as well. But then we can write $p = 2p_1$, which implies:

\[ 4p_1^2 = 2q^2 \]

\[ 2p_1^2 = q^2 \]

This implies that $q^2$ and therefore q must also be even. But this contradicts our assumption that not both p and q were even. We conclude that the premise was false, and there cannot be a solution x ∈ Q to $x^2 = 2$.

The strange part about this is that we can come arbitrarily close to obtaining a solution x ∈ Q to $x^2 = 2$. For instance, we can take x = 1.4, x = 1.41, x = 1.414, x = 1.4142, and so on. All these x are rational, and their squares seem to be approaching 2 from below. Yet we’ve proven that we can’t find an x that actually solves the equation.
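The claim about these decimal truncations is easy to verify with exact rational arithmetic. This short sketch uses Python's `fractions.Fraction` so that no rounding is involved; the variable names are illustrative.

```python
from fractions import Fraction

approximations = [Fraction(14, 10), Fraction(141, 100),
                  Fraction(1414, 1000), Fraction(14142, 10000)]

# Exact gaps 2 - x^2: each is positive, and they shrink as digits are added.
gaps = [2 - x * x for x in approximations]
for x, gap in zip(approximations, gaps):
    print(x, float(gap))
```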

We recall the least upper bound property given in Definition 3.24. When this property is true for a set X, for any nonempty subset A ⊂ X that has an upper bound, there must be a least upper bound sup A. As you can probably guess, the rationals do not satisfy this property.

Proposition 4.10. Q does not satisfy the least upper bound property.

Proof. Suppose to the contrary that Q does satisfy the least upper bound property.

Consider the set X of all x ∈ Q such that $x^2 \le 2$. This set has an upper bound (for instance, 2), and by assumption it must have a least upper bound c. Since we cannot have $c^2 = 2$, either $c^2 < 2$ or $c^2 > 2$.

Suppose $c^2 < 2$. Choose some positive r ∈ Q sufficiently small that $r \le 1/8 \cdot (2 - c^2)$. (It follows from the Archimedean property that this is possible by picking r = 1/n for some n.) Then

\[ (c + r)^2 = c^2 + 2rc + r^2 \le c^2 + \frac{2 - c^2}{2} + \frac{(2 - c^2)^2}{64} < 2 \]

(The first inequality uses c ≤ 2, which holds because 2 is an upper bound of X.) So c + r is strictly greater than c and satisfies $(c + r)^2 < 2$, which contradicts c’s status as the least upper bound of X.

We can show that the case $c^2 > 2$ leads to a contradiction by similar logic, and we conclude that the set X cannot have a least upper bound.[16] Therefore Q does not satisfy the least upper bound property.
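The key step for the case c² < 2, pushing a rational c slightly upward without its square reaching 2, can be tried numerically. The sketch below is an illustration only; it takes the largest step the proof allows, r = (2 − c²)/8, and the name `improve` is hypothetical.

```python
from fractions import Fraction

def improve(c):
    """Given rational c with 0 < c <= 2 and c^2 < 2, return c + r with r = (2 - c^2)/8."""
    r = (2 - c * c) / 8
    return c + r

c = Fraction(1)
history = [c]
for _ in range(5):
    c = improve(c)
    history.append(c)
    print(float(c), c * c < 2)
```

Each iterate is a strictly larger rational whose square is still below 2, so no rational with c² < 2 can be an upper bound for the set.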

In one sense, rational numbers are everywhere: you can find rational numbers that come closer and closer to solving an equation like $x^2 = 2$. But in another sense, there are gaps: you can’t actually solve $x^2 = 2$, and the set of rational numbers with $x^2 \le 2$ has no least upper bound. To close these gaps, we need a richer number system. This system will be R, the reals.

4.4 Real numbers

After defining the concept of an ordered field, we realized that any ordered field must contain a subset with certain properties. Focusing our attention on this subset as an ordered field of its own, we called it Q, the rationals.

It turns out that when we add the requirement that an ordered field must satisfy the least upper bound property, only one ordered field remains: the real numbers R. We can, therefore, simply define R to be an ordered field with the least upper bound property.

Definition 4.11 (R, axiomatic definition). The field of real numbers R is the unique ordered field with the least upper bound property. (In the context of R, the latter property is often called the completeness axiom.)

This definition is extremely useful: armed with only the properties of an ordered field (including the fact that it must contain Q) and the completeness axiom, we can verify many facts about the reals. This is what we will do in the next section.

[16] Observe that if we were in a field that we knew did satisfy the least upper bound property, then we would have just proven that $c^2 = 2$, by ruling out the alternatives $c^2 < 2$ and $c^2 > 2$! But

At the same time, this definition is also slightly unsatisfying. It is not easy to prove that the real numbers are the unique field satisfying these properties. Worse, it is not obvious that it is even possible for an ordered field with the completeness axiom to exist. To show that it is possible, we would need to explicitly construct the real numbers, defining them in terms of more fundamental constituents.

One standard method of doing this is to define real numbers as Dedekind cuts. Essentially, we define each real number r ∈ R to be the subset of rationals x ∈ Q where x < r. (Or, equivalently, the subset of rationals x ∈ Q where x > r.) The idea is that each real number r is equivalent to a “cut” in Q, which splits Q into a subset of numbers greater than r and a subset of numbers less than r, possibly along with r itself.

This idea is motivated by the failure of $\{x \in Q : x^2 < 2\}$ to have a least upper bound in Q; if we want some r to be the least upper bound, why not just define r by the property $r > x \iff x^2 < 2$? This is extremely unintuitive at first: it’s weird to say that we’re defining a real number to be a certain subset of rational numbers. But it turns out to be a clever and very useful way to construct the reals. We discuss this a little more in Appendix A.

4.5 Properties of the real numbers

Like the rationals, the reals satisfy the Archimedean property.

Theorem 4.12. If x ∈ R, y ∈ R, and x > 0, then there exists some positive integer n such that n · x > y.

Proof. Consider the set X = {n · x : n ∈ Z}.[17] If the Archimedean property is false, then y is an upper bound for X. Then by the completeness axiom, this set must have a least upper bound, which we call z.

Now consider z − x. Since z − x < z = sup X, there must be n · x ∈ X such that z − x < n · x, or else z would not be the least upper bound. But then we have z < (n + 1) · x, which is a contradiction. We conclude that X must not have an upper bound after all, and therefore that the Archimedean property holds.

Again like the rationals, between any two real numbers there is another real number. (If we have x, y ∈ R and x < y, we just take (x + y)/2.) More surprisingly, however, there is also a rational number between any two real numbers. This is a critical fact about the relationship between Q and R. It follows almost immediately from the Dedekind cut construction of the reals in Appendix A, but here we will prove it without the construction, which takes a little more work.

[17] This is the set {. . . , −2x, −1x, 0x, 1x, 2x, . . .}.

Proposition 4.13. If x ∈ R, y ∈ R, and x < y, then there is some z ∈ Q such that x < z and z < y.

Proof. Since y − x > 0, Theorem 4.12 implies that there exists some positive integer n such that n · (y − x) > 1.

Applying Theorem 4.12 once more, there exist some positive integers $m_1$ and $m_2$ such that $m_1 > -nx$ and $m_2 > nx$. This implies $-m_1 < nx < m_2$, which then implies that there exists some integer m satisfying $-m_1 \le m \le m_2$ such that m − 1 ≤ nx < m.

We now take z = m/n. We observe that z = m/n > nx/n = x, and also z = m/n ≤ (nx + 1)/n = x + 1/n < y, as desired.
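The construction in this proof is effectively an algorithm. Here is a minimal sketch, assuming exact inputs (integers or `Fraction` values, since floating-point rounding could break the strict inequalities); the function name `rational_between` is hypothetical.

```python
from fractions import Fraction
from math import floor

def rational_between(x, y):
    """Return a rational z with x < z < y, following the proof's choice of n and m."""
    if not x < y:
        raise ValueError("need x < y")
    n = floor(1 / (y - x)) + 1      # positive integer with n * (y - x) > 1
    m = floor(n * x) + 1            # the integer with m - 1 <= n * x < m
    return Fraction(m, n)

x, y = Fraction(141, 100), Fraction(142, 100)
z = rational_between(x, y)
print(z, x < z < y)
```

Here the floor function plays the role of the search over integers between the two bounds in the proof.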

One major difference between R and Q is that R is not countable. There are many proofs of this fact, but arguably the simplest and most generalizable is the “diagonal argument” given below, which is closely related to the argument we used to prove Proposition 3.51. It involves the notion of decimal expansions, which we haven’t formally developed in these notes (although certainly we are all familiar with it on an intuitive level!). It’s so elegant that I’ll use it anyway.

Proposition 4.14. R is uncountable.

Proof. Certainly if the interval [0, 1) is uncountable, then R is uncountable; we will prove the former.

Suppose to the contrary that [0, 1) is countable, and that there is a bijection f : N → [0, 1). We look at the decimal expansions[18] of f(1), f(2), and so on:

\[ \begin{aligned} f(1) &= 0.\mathbf{a_{11}}a_{12}a_{13}\ldots \\ f(2) &= 0.a_{21}\mathbf{a_{22}}a_{23}\ldots \\ f(3) &= 0.a_{31}a_{32}\mathbf{a_{33}}\ldots \end{aligned} \]

Now we will construct a new real number $0.b_1b_2b_3\ldots$ such that $b_n \ne a_{nn}$ for all n. In other words, we are constructing a new number whose nth digit will disagree with the nth digit in bold above. Clearly, this number cannot be in the list f(1), f(2), f(3), . . ., because it will disagree with f(n) in the nth digit. But this is impossible, because we assumed that f was a bijection, meaning that every real number in [0, 1) should equal f(n) for some n. We are forced to conclude that such a bijection is impossible, and that [0, 1) (and thus R) is not countable after all.
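The diagonal construction can be illustrated on a finite list of digit strings. This is only a sketch: a real application of the argument involves infinitely many infinite expansions, and the digit rule used here (write 5 unless the diagonal digit is 5, in which case write 4) is one arbitrary choice that also avoids producing trailing 9s.

```python
def diagonal_digits(expansions):
    """Given digit strings for f(1), f(2), ..., return digits b_n with b_n != a_nn."""
    return "".join("4" if row[n] == "5" else "5"
                   for n, row in enumerate(expansions))

# Hypothetical first seven digits of f(1), ..., f(7).
listed = ["1415926", "7182818", "5000000", "3333333",
          "9876543", "1234567", "0000005"]
b = diagonal_digits(listed)
print("0." + b)

# The new number differs from the n-th listed number in the n-th digit.
differs = all(b[n] != listed[n][n] for n in range(len(listed)))
print(differs)
```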

Alternatively, we can adapt Proposition 3.51 itself to show that R is uncountable, though this is slightly less transparent. The idea is that each element S of $2^N$ can be represented as an infinite sequence of 0s and 1s, with the nth number in the sequence indicating whether n is a member of S. For instance, (1, 0, 1, 1, 0, . . .) represents S ⊂ N such that 1 ∈ S, 2 ∉ S, 3 ∈ S, 4 ∈ S, 5 ∉ S, etc. We can interpret this sequence of 0s and 1s as base-3 notation for a real number in the interval [0, 1).[19] This is an injection from $2^N$ into R, and therefore $|2^N| \le |R|$. Since we know from Proposition 3.51 that $|N| < |2^N|$, it follows that |N| < |R|.

[18] Technically decimal expansions are not unique, since 0.999 . . . = 1. We get around this problem by banning decimal expansions with infinite strings of repeating 9s.

[19] This gets around the problem in binary where, for instance, $0.0\bar{1} = 0.1$.

Existence of roots. In Q, we couldn’t find a solution to $x^2 = 2$. As the following proposition shows, the situation is much better in R.

Proposition 4.15. Let n ∈ N, y ∈ R, and y > 0. Then there exists a solution x ∈ R to $x^n = y$.

Proof. Consider the set of all x ∈ R such that $x^n \le y$. This set is nonempty and has an upper bound (for instance, it is bounded by 1 + y). By the completeness axiom it has a least upper bound, which we denote by c.

We argue that $c^n = y$. Suppose to the contrary that $c^n < y$; we will show that for a judiciously chosen r > 0, we then have $(c + r)^n \le y$ as well, contradicting the definition of c.

First, for any r > 0, we obtain the following factorization and inequality:

\[ (c + r)^n - c^n = r\left( (c + r)^{n-1} + (c + r)^{n-2}c + \cdots + (c + r)c^{n-2} + c^{n-1} \right) \le rn(c + r)^{n-1} \]

If r ≤ c, then this becomes $(c + r)^n - c^n \le rn2^{n-1}c^{n-1}$, and setting $r = \min(c, (y - c^n)/(n2^{n-1}c^{n-1}))$, we obtain:

\[ \begin{aligned} (c + r)^n &= c^n + ((c + r)^n - c^n) \\ &\le c^n + rn2^{n-1}c^{n-1} \\ &\le c^n + (y - c^n) \\ &= y \end{aligned} \]

This contradicts our definition of c as the least upper bound of the set of x with $x^n \le y$. We can rule out the case $c^n > y$ using similar arguments.

Although this proposition exhibits a nice feature of R, it was a pain to prove; we had to do a factorization and explicitly choose an r of the right size.

The intuition for the existence of a solution is much simpler. As a function of x, $f(x) = x^n$ is continuous. Furthermore, f(0) = 0 and f(y + 1) > y. Surely somewhere in between 0 and y + 1, there is a point x where f(x) = y. Otherwise, there would be a discontinuity in the graph of $x^n$, and that doesn’t seem likely.
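This intuition also suggests a way to approximate the root numerically: repeatedly halve the interval [0, y + 1], keeping the half on which f crosses y. The sketch below is a floating-point illustration of that idea (bisection), not part of the proof; the function name `nth_root` and the tolerance are arbitrary choices.

```python
def nth_root(y, n, tol=1e-12):
    """Approximate the positive solution of x**n == y (y > 0) by bisection on [0, y + 1]."""
    lo, hi = 0.0, y + 1.0          # lo**n = 0 <= y and hi**n = (y + 1)**n > y
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** n <= y:
            lo = mid               # mid still belongs to {x : x**n <= y}
        else:
            hi = mid               # mid is an upper bound for that set
    return lo

print(nth_root(2, 2))
print(nth_root(27, 3))
```

The lower endpoint always stays inside the set {x : xⁿ ≤ y} and the upper endpoint always bounds it, so the two endpoints squeeze toward the least upper bound c from the proof.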

There is a powerful formalization of this intuition called the Intermediate Value Theorem. First, however, we need to develop more basic analytic machinery, like the formal concept of continuity itself; and before we do any of this, we will think carefully about the notion of distance in order to define the general concept of a metric space. Once these ideas have been developed (in the lectures on analysis), propositions like Proposition 4.15 will be much easier to prove.

Although it improves over Q in many ways, R has some flaws as well. For instance, we cannot take nth roots of every number. $x^2 = -1$ has no real solution. Indeed, it has no solution in any ordered field, as we can see from property O(d) in Proposition 4.4. To remedy this, we’ll need to expand the reals to make an even larger field, the complex numbers C. But again we’ll need to make some sacrifices: since $x^2 = -1$ cannot have a solution in any ordered field, we will not be able to define any order on C that behaves so nicely with respect to arithmetic operations.

4.6 Complex numbers

The complex numbers arise from the real numbers via a straightforward approach to solving the equation $x^2 = -1$. Since there does not exist any x ∈ R that solves this equation, why not just add another element that is defined to be a solution? In particular, why not create a field containing R and also some new number i, called the imaginary unit, which satisfies $i^2 = -1$?[20] It turns out that it is possible to define such a field, which we call the field of complex numbers.

Definition 4.16 (Complex numbers). The field of complex numbers consists of ordered pairs of reals (a, b), which we will typically write as a + b · i, where i denotes the imaginary unit. (Sometimes when a = 0 we simply write b · i, and when b = 0 we write a.)

The sum of $a_1 + b_1 \cdot i$ and $a_2 + b_2 \cdot i$ is defined to be $(a_1 + a_2) + (b_1 + b_2) \cdot i$. The additive identity is 0 + 0 · i, and the additive inverse of a + b · i is −a + (−b) · i. The product of $a_1 + b_1 \cdot i$ and $a_2 + b_2 \cdot i$ is defined to be $(a_1a_2 - b_1b_2) + (a_1b_2 + a_2b_1) \cdot i$, the multiplicative identity is 1 + 0 · i, and the multiplicative inverse of a nonzero a + b · i is

\[ \frac{1}{a + b \cdot i} = \frac{a - b \cdot i}{a^2 + b^2} \]
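These rules can be exercised directly on ordered pairs. The sketch below is an illustration with hypothetical function names `c_mul` and `c_inv`; it represents a + b · i as the tuple (a, b) with exact rational entries, and checks that i² = −1 and that a number times its inverse gives the multiplicative identity.

```python
from fractions import Fraction

def c_mul(z, w):
    """Product of the pairs (a1, b1) and (a2, b2) as in Definition 4.16."""
    (a1, b1), (a2, b2) = z, w
    return (a1 * a2 - b1 * b2, a1 * b2 + a2 * b1)

def c_inv(z):
    """Multiplicative inverse of a nonzero pair (a, b)."""
    a, b = z
    d = a * a + b * b
    return (a / d, -b / d)

i = (Fraction(0), Fraction(1))
z = (Fraction(3), Fraction(4))
print(c_mul(i, i))            # the pair representing -1
print(c_mul(z, c_inv(z)))     # the pair representing 1
```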

Again, it is tedious to verify that all the properties of a field formally hold. It is easier for this case than either the rationals or reals before it, however, because the definition is simpler. For one, there is a bijection between complex numbers and ordered pairs of real numbers. Unlike the rationals, we are not muddying the waters with an equivalence relation; and unlike the construction of the reals in Appendix A, the basic object we are studying is just a pair of objects we have already defined, not some weird subset obeying particular conditions. And since C cannot possibly be an ordered field, we don’t have to worry about verifying those axioms anymore.

Definition 4.17 (Notation for complex numbers). Let z = a + bi be an element of C. Then we define:

1. The real part Re z = a.

2. The imaginary part Im z = b.

3. The complex conjugate $\bar{z} = a - bi$.

4. The absolute value $|z| = \sqrt{a^2 + b^2}$. (Also called modulus.)

Complex conjugation. The complex conjugate is especially interesting, since it is based on a deep relationship between i and −i. All arithmetic operations on C can be derived from the fact that $i^2 = -1$; yet −i satisfies the exact same equation, with $(-i)^2 = -1$. How, then, can we possibly distinguish between i and −i? The answer is that we can’t; if we have some equation in C, and we replace every i in the equation with −i, then we obtain an equivalent equation. A few implications of this principle are listed in the following proposition.

[20] This sounds like a wild approach, but in fact the idea of extending a field with new elements in order to satisfy some equation is very fundamental in field theory, and happens all the time; it’s called a field extension. We won’t go into this, since abstract algebra (outside linear algebra) doesn’t matter very much to economists.

Proposition 4.18. The complex conjugate $\bar{z}$ has the following properties:

1. $\overline{z + w} = \bar{z} + \bar{w}$

2. $\overline{zw} = \bar{z}\,\bar{w}$

3. z is a root of the polynomial $p(z) = z^k + a_1z^{k-1} + \ldots + a_{k-1}z + a_k$ iff $\bar{z}$ is a root of the conjugate polynomial $\bar{p}(z) = z^k + \bar{a}_1z^{k-1} + \ldots + \bar{a}_{k-1}z + \bar{a}_k$.

Proof. We let $z = a_1 + b_1i$ and $w = a_2 + b_2i$.

1. $z + w = (a_1 + a_2) + (b_1 + b_2)i$, so $\overline{z + w} = (a_1 + a_2) - (b_1 + b_2)i = (a_1 - b_1i) + (a_2 - b_2i) = \bar{z} + \bar{w}$.

2. $zw = (a_1a_2 - b_1b_2) + (a_1b_2 + a_2b_1)i$, so $\overline{zw} = (a_1a_2 - b_1b_2) - (a_1b_2 + a_2b_1)i$. We compute $\bar{z}\,\bar{w} = (a_1 - b_1i)(a_2 - b_2i) = (a_1a_2 - b_1b_2) - (a_1b_2 + a_2b_1)i$ as well.

3. If $z^k + a_1z^{k-1} + \ldots + a_{k-1}z + a_k = 0$, then taking the conjugate of both sides and repeatedly applying both properties 1 and 2, we have $\bar{z}^k + \bar{a}_1 \cdot \bar{z}^{k-1} + \ldots + \bar{a}_{k-1} \cdot \bar{z} + \bar{a}_k = 0$.

Since the complex conjugate has such nice properties, it is fortunate that we can retrieve the real and imaginary parts of any complex number, as well as its modulus, solely with formulas involving the conjugate.

Proposition 4.19. The real part, imaginary part, and modulus of any complex number z satisfy

\[ \operatorname{Re} z = \frac{z + \bar{z}}{2} \qquad \operatorname{Im} z = \frac{z - \bar{z}}{2i} \qquad |z| = \sqrt{z\bar{z}} \]

We also have some useful formulas involving the modulus.

Proposition 4.20. The modulus |z| has the following properties:

1. |z| ≥ 0, with |z| = 0 iff z = 0.

2. |Re z| ≤ |z| and |Im z| ≤ |z|.

3. $|\bar{z}| = |z|$

4. |zw| = |z||w|

5. |z + w| ≤ |z|+ |w|

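Proof. Writing z = a + bi:

1. $|z| = \sqrt{a^2 + b^2} \ge 0$, and $\sqrt{a^2 + b^2} = 0$ iff a = b = 0, which is equivalent to z = 0.

2. $|\operatorname{Re} z| = |a| \le \sqrt{a^2 + b^2}$, and $|\operatorname{Im} z| = |b| \le \sqrt{a^2 + b^2}$.

3. $|\bar{z}| = \sqrt{\bar{z} \cdot \bar{\bar{z}}} = \sqrt{\bar{z}z} = |z|$

4. $|zw| = \sqrt{zw \cdot \overline{zw}} = \sqrt{z\bar{z}}\sqrt{w\bar{w}} = |z||w|$

5. $|z + w| = \sqrt{(z + w)(\overline{z + w})} = \sqrt{z\bar{z} + w\bar{z} + z\bar{w} + w\bar{w}} = \sqrt{|z|^2 + 2 \cdot \operatorname{Re} z\bar{w} + |w|^2} \le \sqrt{|z|^2 + 2|z||w| + |w|^2} = |z| + |w|$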
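As a sanity check (not a proof), properties 2 through 5 can be tested on random inputs with Python's built-in complex type. The function name `check_modulus_properties` and the tolerances, which absorb floating-point rounding, are assumptions of this sketch.

```python
import random

def check_modulus_properties(trials=1000, seed=0):
    """Numerically test properties 2-5 of Proposition 4.20 on random complex numbers."""
    rng = random.Random(seed)
    for _ in range(trials):
        z = complex(rng.uniform(-5, 5), rng.uniform(-5, 5))
        w = complex(rng.uniform(-5, 5), rng.uniform(-5, 5))
        ok = (
            abs(z.real) <= abs(z) + 1e-12 and abs(z.imag) <= abs(z) + 1e-12  # property 2
            and abs(abs(z.conjugate()) - abs(z)) < 1e-12                     # property 3
            and abs(abs(z * w) - abs(z) * abs(w)) < 1e-9                     # property 4
            and abs(z + w) <= abs(z) + abs(w) + 1e-12                        # property 5
        )
        if not ok:
            return False
    return True

print(check_modulus_properties())
```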

Geometric interpretation of complex numbers: the complex plane. We sometimes plot real numbers on a one-dimensional axis, which depicts the concepts of order and distance between reals. Since complex numbers are defined as pairs of real numbers, it is natural to plot them on a two-dimensional plane, called the complex plane.

Generally, if z = a + bi, we plot the real part a horizontally and the imaginary part b vertically. The horizontal axis, which consists of real numbers a, is called the “real” axis; and the vertical axis, which consists of “purely imaginary” numbers of the form bi (real multiples of i) is called the “imaginary” axis. Then the modulus |z| is the length of the line segment connecting the origin and the point (a, b), and the conjugate $\bar{z}$ is the reflection of z across the horizontal (“real”) axis. This is depicted in Figure 4.1.

Something very interesting happens when we depict complex numbers in this manner. It is a fact from elementary geometry that if we rotate the ray from the origin to (a, b) by 90 degrees counterclockwise, we get a ray from the origin to (−b, a). Since i · (a + bi) = −b + ai, therefore, multiplication by i is equivalent to a 90 degree counterclockwise rotation. This gives us a clear geometrical interpretation of the identity $i^2 = -1$: rotating twice by 90 degrees is equivalent to rotating once by 180 degrees, which is equivalent to reflecting across the origin, or multiplying by −1. See Figure 4.2.
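The correspondence between multiplication by i and a quarter turn can be checked with Python's built-in complex numbers, where `1j` denotes i. The helper `rotate90` and the sample point 3 + 2i are illustrative choices.

```python
def rotate90(point):
    """Rotate the point (a, b) by 90 degrees counterclockwise about the origin."""
    a, b = point
    return (-b, a)

z = complex(3, 2)
iz = 1j * z                       # multiplication by i
print(iz, rotate90((3, 2)))       # both describe the point (-2, 3)
print(1j * iz == -z)              # two quarter turns give -z
```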

In other words, we managed to solve the equation $z^2 = -1$ by adding numbers that correspond to rotations. This is in contrast to real numbers, which correspond to dilations: if we multiply $a + bi$ by some real number $c$, we get $ca + cbi$, which is equivalent to dilating the ray from the origin to $(a, b)$ by a factor of $c$, obtaining the ray to $(ca, cb)$.[21]
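The rotation and dilation pictures can be checked directly with Python's built-in complex type. This is only an illustrative sketch: the helper name `times_i` and the sample point are assumptions made for the example.

```python
def times_i(z: complex) -> complex:
    """Multiply by i: sends a + bi to -b + ai (a quarter turn counterclockwise)."""
    return 1j * z


z = 2 + 1j                        # the point (a, b) = (2, 1)

quarter_turn = times_i(z)         # (2, 1) goes to (-b, a) = (-1, 2)
half_turn = times_i(times_i(z))   # two quarter turns give -z
dilated = 3 * z                   # a real factor only rescales: (6, 3)

print(quarter_turn, half_turn, dilated)
```

Applying `times_i` twice gives the same result as multiplying by $-1$, which is the identity $i^2 = -1$ read geometrically.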

This geometrical view is so useful that we will extend it to all nonzero complex numbers $z$. First, it is a fact that as we vary $\theta$ from $0$ to $2\pi$ (in radians), $(x(\theta), y(\theta)) = (\cos\theta, \sin\theta)$ traces out the unit circle, the locus of points $(x, y)$ such that $x^2 + y^2 = 1$.[22] Therefore, for any $z = a + bi$ satisfying $|z| = \sqrt{a^2 + b^2} = 1$, there is a unique $\theta \in [0, 2\pi)$ such that $z = \cos\theta + i\sin\theta$. More generally, for any nonzero complex $z$, it follows from Proposition 4.20 that $z/|z|$ has modulus 1, and then there is a unique $\theta \in [0, 2\pi)$ such that $z/|z| = \cos\theta + i\sin\theta$. This $\theta$ is so important that we introduce a name for it: the argument of $z$.

Definition 4.21 (Argument, trigonometric form). Let $z$ be any nonzero complex number, and let $\theta$ be the unique value in $[0, 2\pi)$ such that $z/|z| = \cos\theta + i\sin\theta$. We say that $\theta$ is the argument of $z$, and denote this relationship by $\theta = \arg z$.

[21] If $c < 1$, we might call this “squeezing”, while if $c > 1$ we might call this “stretching”.

[22] We must depart a little from our careful, formal approach here, since to prove facts about sin and cos we need to define them first; and unfortunately, there is not a good way to do this rigorously without analysis, which we will cover later. I assume that you are already familiar with the geometric meaning of sin and cos.

Figure 4.1: The complex plane [plot showing $z$, its conjugate $\bar{z}$, $\operatorname{Re} z$, $\operatorname{Im} z$, and $|z|$]

Figure 4.2: Multiplication by $i$ [plot showing $z$, $iz$, and $i^2 z = -z$]

This allows us to write any $z$ in trigonometric form $z = r(\cos\theta + i\sin\theta)$, where $r = |z|$ is the modulus and $\theta = \arg z$ is the argument of $z$.
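A minimal Python sketch of Definition 4.21 follows. The function names `argument` and `trig_form` are hypothetical helpers; `math.atan2` returns an angle in $[-\pi, \pi]$, so the sketch shifts it into the range $[0, 2\pi)$ used in these notes.

```python
import math


def argument(z: complex) -> float:
    """Return arg z in [0, 2*pi), following Definition 4.21; z must be nonzero."""
    if z == 0:
        raise ValueError("arg z is undefined for z = 0")
    theta = math.atan2(z.imag, z.real) % (2 * math.pi)
    # Guard against rounding that could push a tiny negative angle up to 2*pi.
    return 0.0 if theta >= 2 * math.pi else theta


def trig_form(z: complex) -> tuple[float, float]:
    """Return (r, theta) with z = r (cos theta + i sin theta)."""
    return abs(z), argument(z)


z = -1 - 1j
r, theta = trig_form(z)
rebuilt = r * complex(math.cos(theta), math.sin(theta))
print(r, theta, rebuilt)
```

For this sample point the angle is $5\pi/4$, and rebuilding $z$ from $(r, \theta)$ recovers the original value up to rounding.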

The argument has a remarkable property: the argument of a product is equal to the sum of the arguments of the factors. This supplements the result we have already obtained in Proposition 4.20, that the modulus of a product is equal to the product of the moduli of the factors.

Proposition 4.22. Let $z_1 = r_1(\cos\theta_1 + i\sin\theta_1)$ and $z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$ be complex numbers expressed in trigonometric form. Then

$$z_1 z_2 = r_1 r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)]$$

and, up to a multiple of $2\pi$,

$$\arg z_1 z_2 = \arg z_1 + \arg z_2$$

Proof. Directly multiplying $z_1$ and $z_2$ we have

$$z_1 z_2 = r_1 r_2((\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2))$$

From trigonometry we know that $\cos(\theta_1 + \theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2$ and $\sin(\theta_1 + \theta_2) = \sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2$. Therefore this becomes

$$z_1 z_2 = r_1 r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)]$$

as desired. By definition, then, $\arg z_1 z_2 = \theta_1 + \theta_2$ up to a multiple of $2\pi$; or equivalently $\arg z_1 z_2 = \arg z_1 + \arg z_2$.
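The phrase "up to a multiple of $2\pi$" matters in practice, as the following Python sketch suggests. It uses the standard library's `cmath.rect` and `cmath.phase`; the helper `arg` (which shifts the phase into $[0, 2\pi)$) and the sample moduli and angles are assumptions chosen so that the two arguments sum to more than $2\pi$.

```python
import cmath
import math

TWO_PI = 2 * math.pi


def arg(z: complex) -> float:
    """Argument of z shifted into [0, 2*pi), up to floating-point rounding."""
    return cmath.phase(z) % TWO_PI


z1 = cmath.rect(2.0, 5 * math.pi / 4)   # r1 = 2, theta1 = 5*pi/4
z2 = cmath.rect(3.0, 3 * math.pi / 2)   # r2 = 3, theta2 = 3*pi/2

lhs = arg(z1 * z2)                      # argument of the product
rhs = (arg(z1) + arg(z2)) % TWO_PI      # sum of arguments, reduced mod 2*pi

print(abs(z1 * z2), lhs, rhs)
```

Here $\theta_1 + \theta_2 = 11\pi/4$, which exceeds $2\pi$, so the argument of the product is $3\pi/4$ rather than the raw sum.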

In the complex plane, the argument of $z$ is the angle between the positive real axis and the ray from the origin to $z$. What Proposition 4.22 means geometrically, therefore, is that the angle between the positive real axis and $z_1 z_2$ is the sum of the angles between the positive real axis and $z_1$ and $z_2$. Together with the fact that $|z_1 z_2| = |z_1||z_2|$, from Proposition 4.20, this is depicted in Figure 4.3.

Euler’s formula. Suppose that we want to extend the usual exponential function $e^x$ on the reals so that it takes complex values as well. How would we define this new function $e^z$? One way is to observe that $w(x) = e^x$ is a solution to the differential equation $w' = w$: the rate of change of the function equals the value of the function.[23]

Suppose that the extension to the complex plane, $w(z) = e^z$, obeys the same relationship. Let us first consider the case where $z$ is purely imaginary, so that $z = iy$ for real $y$. Then defining $v(y) \equiv w(iy) = e^{iy}$ and applying the chain rule,[24] we have $v'(y) = ie^{iy} = iv$. The derivative of $v$ equals $v$ times the imaginary unit $i$. Since multiplication by $i$ is equivalent to counterclockwise rotation by 90 degrees in the complex plane, this means that $v$ moves in a direction perpendicular to its displacement from the origin. Visually, this corresponds to rotation around the origin: satellites travelling in circular orbits about a planet, for instance, have velocity perpendicular to their position relative to the planet. Indeed, it turns out that $v(y)$ traces out exactly the unit circle in the complex plane.

[23] I apologize for bringing in calculus and differential equations before we formally cover them in math camp; this is the only way I know to obtain a clean intuition for complex exponentiation.

[24] Again, since we have not yet covered analysis, here we are sloppy and assume that some version of the chain rule holds for functions with real arguments and complex output. This turns out to be reasonable, since we can identify the space of complex outputs with the space of two-dimensional real outputs, and then our results from multivariable calculus will go through.

Figure 4.3: $\arg z_1 z_2 = \arg z_1 + \arg z_2$ [plot showing $z_1$, $z_2$, and $z_1 z_2$, with moduli $|z_1|$, $|z_2|$, $|z_1 z_2| = |z_1||z_2|$ and arguments $\arg z_1$, $\arg z_2$, $\arg z_1 z_2 = \arg z_1 + \arg z_2$]

Proposition 4.23. Let $g(y) = \cos y + i\sin y$ be the function tracing out the unit circle in the complex plane. Then $v(y) = g(y)$; in other words, $v(y)$ is the position on the unit circle that is at an angle of $y$ radians counterclockwise from the positive real axis. This result, summarized as

$$e^{iy} = \cos y + i\sin y$$

is known as Euler’s formula.

Proof. Since

$$v(0) = e^{i0} = e^0 = 1 = \cos 0 + i\sin 0 = g(0),$$

it suffices, using the uniqueness of solutions to first order ordinary differential equations,[25] to show that $v$ and $g$ satisfy the same differential equation, namely $v' = iv$ and $g' = ig$. We have already demonstrated this for $v$, and can easily do the same for $g$:

$$g'(y) = -\sin y + i\cos y = i(\cos y + i\sin y) = ig(y)$$

We conclude that $v = g$.
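The differential-equation argument can be illustrated numerically. The sketch below integrates $v' = iv$ from $v(0) = 1$ using the classical fourth-order Runge-Kutta method (a standard numerical scheme that is not part of these notes; it is used here purely as an illustration) and compares the result with $\cos y + i\sin y$. The function name `integrate_v` and the step count are assumptions.

```python
import cmath
import math


def integrate_v(y_end: float, steps: int = 1000) -> complex:
    """Approximate v(y_end), where v' = i*v and v(0) = 1, with RK4 steps."""
    h = y_end / steps
    v = 1 + 0j
    for _ in range(steps):
        k1 = 1j * v
        k2 = 1j * (v + h / 2 * k1)
        k3 = 1j * (v + h / 2 * k2)
        k4 = 1j * (v + h * k3)
        v += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return v


y = 2.0
numerical = integrate_v(y)
on_circle = complex(math.cos(y), math.sin(y))   # g(y) = cos y + i sin y

print(abs(numerical - on_circle))               # small integration error
print(abs(cmath.exp(1j * y) - on_circle))       # library exponential agrees too
```

The numerical solution stays on the unit circle (up to integration error) and lands at the point at angle $y$, which is what Euler's formula asserts.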

This proposition defines the exponential for purely imaginary inputs; we can combine this with the conventional definition for real inputs and the usual properties of the exponential function to obtain a definition of $e^z$ for any complex $z = x + iy$:

$$e^z = e^{x + iy} = e^x e^{iy}$$

(This is the same as the result obtained from solving the differential equation $w'(z) = w(z)$ for all complex $z$, with the initial condition $w(0) = 1$.)

An alternative derivation of the exponential $e^z$, which arrives at the same result, extends the characterization of the exponential as a limit:

$$e^z = \lim_{n \to \infty} \left(1 + \frac{z}{n}\right)^n$$

The plots in Figure 4.4, inspired by an animation on the Wikipedia page for Euler’s formula, show the progression of this limit in the special case $z = i\theta$.
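The convergence of this limit can also be seen numerically. The Python sketch below evaluates $(1 + i\theta/n)^n$ for the values of $n$ and $\theta$ used in Figure 4.4 and measures the distance to $e^{i\theta}$; the function name `approximation` is a hypothetical helper.

```python
import cmath
import math

theta = 7 * math.pi / 18          # the angle used in Figure 4.4
target = cmath.exp(1j * theta)    # e^{i*theta} = cos(theta) + i sin(theta)


def approximation(n: int) -> complex:
    """The n-th term (1 + i*theta/n)^n of the limit."""
    return (1 + 1j * theta / n) ** n


ns = (1, 2, 4, 8, 16)
errors = [abs(approximation(n) - target) for n in ns]
for n, err in zip(ns, errors):
    print(n, err)
```

The error shrinks as $n$ doubles, consistent with the limit; a much larger $n$ brings the approximation correspondingly closer to $e^{i\theta}$.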

Another common use of the exponential is to rewrite the trigonometric form given in Definition 4.21 in essentially identical polar form instead.[26]

Definition 4.24. Let $z$ be any complex number, and let $z = r(\cos\theta + i\sin\theta)$ be its trigonometric form. Then we define the polar form of $z$ to be $z = re^{i\theta}$, using the identity in Proposition 4.23 to replace $\cos\theta + i\sin\theta$ in the trigonometric form with $e^{i\theta}$.

In polar form, the additivity result from Proposition 4.22 becomes even simpler, as it is merely another instance of the standard properties of the exponential function. If $z_1 = r_1 e^{i\theta_1}$ and $z_2 = r_2 e^{i\theta_2}$, then $z_1 z_2 = r_1 r_2 e^{i\theta_1} e^{i\theta_2} = r_1 r_2 e^{i(\theta_1 + \theta_2)}$.

[25] Again, apologies for the bad ordering of the course here; we will formally cover ODEs, including uniqueness of solutions, later.

[26] Terminology here is used somewhat inconsistently.

Figure 4.4: $\left(1 + \frac{i\theta}{n}\right)^n$ for $n = 1, 2, 4, 8, 16$ and $\theta = 7\pi/18$

Computations with complex numbers. For many computations involving complex numbers, it is far simpler to use trigonometric or polar form. This is true in particular for roots and logarithms.

Definition 4.25. For any positive integer $n$, we define an $n$th root of unity to be some complex number $\omega$ satisfying $\omega^n = 1$.

Proposition 4.26. For any positive integer $n$, the $n$th roots of unity are the $n$ numbers taking the form $\omega = e^{2\pi i m/n}$ for $m = 0, \ldots, n - 1$.

Proof. Writing $\omega = re^{i\theta}$, $\omega$ is an $n$th root of unity iff $\omega^n = 1 \iff r^n e^{i(n\theta)} = 1 \cdot e^{i0}$. Clearly, for this to be true we must have $r = 1$. We must also have $n\theta = 0$ up to a multiple of $2\pi$, or equivalently $n\theta = 2\pi m \iff \theta = 2\pi m/n$ for some integer $m$. If $m_1 - m_2$ is a multiple of $n$, then the corresponding $\theta$ are equal up to a multiple of $2\pi$; therefore, there are only $n$ values of $m$ that lead to distinct angles $\theta$, corresponding to the $n$ possible values of $m$ modulo $n$. We can pick any $n$ representatives of these, and for convenience pick $m = 0, \ldots, n - 1$.
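A short Python sketch of Proposition 4.26: it lists the $n$th roots of unity $e^{2\pi i m/n}$ and checks that each satisfies $\omega^n = 1$ up to rounding. The function name `roots_of_unity` and the choice $n = 6$ are illustrative assumptions.

```python
import cmath
import math


def roots_of_unity(n: int) -> list[complex]:
    """The n numbers e^{2*pi*i*m/n} for m = 0, ..., n - 1."""
    return [cmath.exp(2j * math.pi * m / n) for m in range(n)]


n = 6
omegas = roots_of_unity(n)
for m, omega in enumerate(omegas):
    # The second column should be (approximately) zero for every root.
    print(m, abs(omega ** n - 1))
```

The roots are equally spaced on the unit circle, starting at $1$ and stepping by an angle of $2\pi/n$.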

Proposition 4.27. Let $z = re^{i\theta}$ be any nonzero complex number, and $n$ be any positive integer. Then there are $n$ complex $n$th roots $w$ of $z$, which take the form $w = \omega \sqrt[n]{z}$, where $\omega$ can be any of the $n$th roots of unity, and $\sqrt[n]{z} \equiv r^{1/n} e^{i\theta/n}$.

Proof. Clearly $\sqrt[n]{z}$ is an $n$th root of $z$, since $(\sqrt[n]{z})^n = (r^{1/n} e^{i\theta/n})^n = re^{i\theta}$. Likewise, $(\omega \sqrt[n]{z})^n = \omega^n z = z$ for every $n$th root of unity $\omega$. Furthermore, the ratio of any two $n$th roots of $z$ must be an $n$th root of unity, since if both $(z_1)^n = z$ and $(z_2)^n = z$, then $(z_1/z_2)^n = (z_1)^n/(z_2)^n = 1$. Therefore all $n$th roots of $z$ must equal some $n$th root of unity times $\sqrt[n]{z}$, as desired.
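The construction in Proposition 4.27 translates directly into code. In the Python sketch below, `nth_roots` is a hypothetical helper; it takes the angle $\theta$ from `cmath.polar`, which returns a value in $[-\pi, \pi]$ rather than $[0, 2\pi)$. Any valid choice of $\theta$ yields the same set of $n$ roots, although the root labelled $\sqrt[n]{z}$ may differ.

```python
import cmath
import math


def nth_roots(z: complex, n: int) -> list[complex]:
    """All n complex n-th roots of a nonzero z, as omega * r^(1/n) e^{i*theta/n}."""
    if z == 0:
        raise ValueError("z must be nonzero")
    r, theta = cmath.polar(z)                        # z = r e^{i*theta}
    base = r ** (1 / n) * cmath.exp(1j * theta / n)  # one particular n-th root
    return [cmath.exp(2j * math.pi * m / n) * base for m in range(n)]


z = -8 + 0j
cube_roots = nth_roots(z, 3)
for w in cube_roots:
    print(w, abs(w ** 3 - z))
```

For $z = -8$ the familiar real cube root $-2$ appears as one of the three roots, alongside a conjugate pair of nonreal roots.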

Proposition 4.28. For any nonzero complex number $z = re^{i\theta}$, there are infinitely many values of the natural logarithm $\log z$, taking the form $\log z = \log r + i(\theta + 2\pi k)$, where $k$ ranges over all integers.

Proof. Let $w = x + iy$. Then $w = \log z \iff e^x e^{iy} = e^w = z = re^{i\theta}$, which is equivalent to $e^x = r$ and $e^{iy} = e^{i\theta}$. This is true iff $x = \log r$, under the usual definition of natural logarithm for real numbers, and $y = \theta + 2\pi k$ for some integer $k$.
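To see that every value of the form $\log r + i(\theta + 2\pi k)$ really is a logarithm of $z$, one can exponentiate each candidate and compare with $z$. The Python sketch below does this for a few values of $k$; the function name `log_values` and the sample $z$ are assumptions.

```python
import cmath
import math


def log_values(z: complex, ks: range) -> list[complex]:
    """Values log r + i(theta + 2*pi*k) of log z for the integers k in ks."""
    r, theta = cmath.polar(z)   # z = r e^{i*theta}, with z nonzero
    return [complex(math.log(r), theta + 2 * math.pi * k) for k in ks]


z = 1 + 1j
candidates = log_values(z, range(-2, 3))
for w in candidates:
    print(w, abs(cmath.exp(w) - z))   # second column should be near zero
```

All the candidates share the same real part $\log r$ and differ only by integer multiples of $2\pi i$.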

Since it is hard to deal with multiple-valued roots or multiple-valued logarithms, it is sometimes convenient to pick a single value for each root or logarithm. The choice for logarithms is, in fact, more fundamental, since if we have a single value for the logarithm there is a corresponding choice for any root as well:

$$w = \log z \implies \frac{w}{n} = \log(z^{1/n}) \implies e^{w/n} = z^{1/n}$$

Unfortunately, it is impossible to pick a single value of $\log z$ for each $z$ such that the function $\log z$ is continuous on the complex plane. This is because there is no way to pick the “angle” $\theta$ of each complex number $z = re^{i\theta}$ such that $\theta$ is a continuous function of $z$ everywhere in the complex plane. If we use angles in the range $\theta \in [0, 2\pi)$, for instance, then there will be a discontinuity at the positive real axis, as the angle changes from just below $2\pi$ to $0$.

We must make an arbitrary choice; roughly speaking, once that choice is made, we throw away the points at which the angle would be discontinuous. Such a choice of a single value for $\log z$, which defines $\log z$ everywhere except the origin and a ray going outward from the origin, is called a branch of $\log z$. The most popular branch, called the principal branch, chooses angles in the range $\theta \in (-\pi, \pi)$, and throws away the negative real axis.[27]
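The discontinuity can be observed with Python's `cmath.log`, which computes a principal-branch logarithm with its branch cut along the negative real axis. One difference from the convention above: `cmath.log` still assigns a value on the cut itself instead of discarding those points. The offset `eps` below is an arbitrary small number chosen for illustration.

```python
import cmath
import math

eps = 1e-9   # small imaginary offset, chosen arbitrarily

# Two points on either side of the negative real axis, very close together.
above = cmath.log(complex(-1.0, eps))
below = cmath.log(complex(-1.0, -eps))
jump_at_negative_axis = above.imag - below.imag   # close to 2*pi

# The same experiment across the positive real axis shows no jump.
right_above = cmath.log(complex(1.0, eps))
right_below = cmath.log(complex(1.0, -eps))
jump_at_positive_axis = right_above.imag - right_below.imag

print(jump_at_negative_axis, jump_at_positive_axis)
```

The imaginary part (the angle) jumps by roughly $2\pi$ across the negative real axis, while it varies continuously across the positive real axis.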

Fundamental theorem of algebra. Finally we have the result that makes the field of complex numbers so special.

Proposition 4.29 (Fundamental theorem of algebra). Let $p(z)$ be a nonconstant polynomial with complex coefficients, in the single complex variable $z$. Then $p$ has at least one complex root: there exists some $a \in \mathbb{C}$ such that $p(a) = 0$.

There are many proofs of the fundamental theorem of algebra, each of which provides some algebraic or analytic insight. Unfortunately, such proofs are slightly beyond the scope of this course. Intuitively, however, the key contribution of $\mathbb{C}$ is that it offers rotations in addition to dilations; multiplication by a number $e^{i\theta}$ on the unit circle corresponds to a rotation, while multiplication by a real number $r$ corresponds to a dilation. There are many polynomials, like $z^2 + 1$, that do not have roots in $\mathbb{R}$ because multiplication in $\mathbb{R}$ consists only of dilation, and the solution must involve rotation as well.

It will be important to keep this intuition in mind when we study linear algebra, where there are many direct analogies between certain types of linear maps on a vector space and corresponding types of complex numbers. (For instance, rotation in a vector space is closely related to complex numbers $e^{i\theta}$ on the unit circle, which represent rotation in the complex plane.)

Using polynomial division, we can go one step further than the fundamental theorem of algebra, and write a degree $n$ polynomial as the product of $n$ linear factors.

Proposition 4.30. Let $p(z)$ be a polynomial of degree $n$ with complex coefficients, in the single complex variable $z$. Then we can write $p(z)$ as the product

$$p(z) = c \cdot (z - a_1) \cdots (z - a_n)$$

where $c \in \mathbb{C}$ is a constant, and $a_1, \ldots, a_n \in \mathbb{C}$ are (not necessarily distinct) roots of $p$. This factorization is unique up to the ordering of $a_1, \ldots, a_n$.

[27] This is all a little sloppy; if you have questions, please ask me or consult your friendly complex analysis textbook.

Proof. We prove this using induction on polynomial degree $n$. In the base case of degree 0, a polynomial is simply a constant $c$, and the result is immediate.

For arbitrary degree $n > 0$, from the fundamental theorem of algebra we know that $p(z)$ has a root $a_n$. Using polynomial division, we can divide $p(z)$ by $(z - a_n)$ to obtain a polynomial $q(z)$ of degree $n - 1$. Invoking the induction hypothesis, we know that $q(z)$ can be written in the form $q(z) = c \cdot (z - a_1) \cdots (z - a_{n-1})$, where $a_1, \ldots, a_{n-1}$ are roots of $q(z)$. Therefore $p(z) = q(z) \cdot (z - a_n) = c \cdot (z - a_1) \cdots (z - a_n)$, with $a_1, \ldots, a_n$ roots of $p(z)$, as desired.

To prove that the factorization is unique up to ordering of the roots, we use a similar induction on degree. Clearly the factorization is unique in the base case of constant polynomials. Now suppose that we have two factorizations $p(z) = c_1 \cdot (z - a_1) \cdots (z - a_n)$ and $p(z) = c_2 \cdot (z - b_1) \cdots (z - b_n)$ of the degree-$n$ polynomial $p$. Since $a_n$ is a root of $p$, we must have $b_k = a_n$ for some $k$. Dividing $p(z)$ by $(z - a_n)$, we have a degree $n - 1$ polynomial $q(z)$ that, by the induction hypothesis, has a unique factorization. But then our two factorizations of $p$ must be the same; if they were distinct, then factors other than $(z - a_n) = (z - b_k)$ would have to be distinct, and this would imply two distinct factorizations of $q(z)$, which is impossible.
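The inductive step, dividing $p(z)$ by $(z - a_n)$ to lower the degree by one, can be sketched in Python using synthetic division. The function name `divide_by_linear`, the coefficient convention (highest degree first), and the sample polynomial $z^3 - z^2 + z - 1 = (z - 1)(z - i)(z + i)$ are all assumptions made for this illustration; the roots are supplied by hand rather than computed.

```python
def divide_by_linear(coeffs, a):
    """Divide p(z) by (z - a) using synthetic division.

    coeffs lists the coefficients of p from highest degree to constant term.
    Returns (quotient coefficients, remainder); the remainder equals p(a).
    """
    partial = []
    carry = 0
    for c in coeffs:
        carry = carry * a + c
        partial.append(carry)
    return partial[:-1], partial[-1]


p = [1, -1, 1, -1]          # z^3 - z^2 + z - 1
known_roots = [1, 1j, -1j]  # roots supplied by hand for this example

q = p
remainders = []
for a in known_roots:
    q, remainder = divide_by_linear(q, a)   # peel off the factor (z - a)
    remainders.append(remainder)

print(q, remainders)   # q is the constant c; every remainder is zero
```

Each division leaves a zero remainder because the divisor corresponds to a root, and after $n$ divisions only the constant $c$ remains, mirroring the structure of the induction.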

Corollary 4.31. Let $p(x)$ be a polynomial of degree $n$ with real coefficients, in the single real variable $x$. Then we can write $p(x)$ as the product

$$p(x) = c \cdot (x^2 + b_1 x + g_1) \cdots (x^2 + b_k x + g_k) \cdot (x - d_1) \cdots (x - d_m)$$

where $c$, $b_i$, and $g_i$ are real, $d_1, \ldots, d_m$ are the real roots of $p(x)$, each quadratic factor has no real roots, and $2k + m = n$.

Proof. Write $p$ as a polynomial in a complex variable $z$, or $p(z)$, and apply Proposition 4.30 to obtain a factorization $c \cdot (z - a_1) \cdots (z - a_n)$. Since the coefficients of $p$ are real, $\overline{p(z)} = p(\bar{z})$, and we can write:

$$\overline{p(z)} = p(\bar{z}) = c(\bar{z} - a_1) \cdots (\bar{z} - a_n)$$

Taking the conjugate of both sides,

$$p(z) = \bar{c}(z - \bar{a}_1) \cdots (z - \bar{a}_n)$$

By the unique factorization result in Proposition 4.30, $p(z)$ must have a unique factorization, and therefore $c = \bar{c}$, and $\bar{a}_1, \ldots, \bar{a}_n$ is a rearrangement of $a_1, \ldots, a_n$. This means that we can group the nonreal $a_i$ (where $\operatorname{Im} a_i \ne 0$) into conjugate pairs.

Let $d_1, \ldots, d_m$ be the real members of the list $a_1, \ldots, a_n$; and let $e_1, \bar{e}_1, \ldots, e_k, \bar{e}_k$ be the nonreal members. Then

$$p(z) = c(z - a_1) \cdots (z - a_n)$$

$$= c(z - e_1)(z - \bar{e}_1) \cdots (z - e_k)(z - \bar{e}_k) \cdot (z - d_1) \cdots (z - d_m)$$

$$= c(z^2 - 2\operatorname{Re}(e_1) z + |e_1|^2) \cdots (z^2 - 2\operatorname{Re}(e_k) z + |e_k|^2) \cdot (z - d_1) \cdots (z - d_m)$$

Now, writing $b_i \equiv -2\operatorname{Re} e_i$ and $g_i \equiv |e_i|^2$, and assuming that $z$ is real and writing $x \equiv z = \bar{z}$, this reduces to

$$p(x) = c \cdot (x^2 + b_1 x + g_1) \cdots (x^2 + b_k x + g_k) \cdot (x - d_1) \cdots (x - d_m)$$

as desired. Each quadratic factor has no real roots, since its roots $e_i$ and $\bar{e}_i$ are nonreal.
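The key algebraic step is that a conjugate pair of linear factors multiplies out to a quadratic with real coefficients: expanding gives $(z - e)(z - \bar{e}) = z^2 - 2\operatorname{Re}(e) z + |e|^2$. The Python sketch below checks this on the nonreal roots of $x^3 - 1 = (x - 1)(x^2 + x + 1)$; the function name `real_quadratic` and the sample values are illustrative assumptions.

```python
import math


def real_quadratic(e: complex) -> tuple[float, float, float]:
    """Real coefficients (1, b, g) of (z - e)(z - conj(e)) = z^2 + b z + g."""
    return 1.0, -2 * e.real, abs(e) ** 2


# A nonreal cube root of unity: a root of x^3 - 1 together with its conjugate.
e = complex(-0.5, math.sqrt(3) / 2)
_, b, g = real_quadratic(e)

x = 2.0
from_pair = (x - e) * (x - e.conjugate())   # product of the two complex factors
from_quadratic = x ** 2 + b * x + g         # the real quadratic at the same x

print(b, g, from_pair, from_quadratic)
```

Here the pair combines into $x^2 + x + 1$, whose discriminant $b^2 - 4g$ is negative, so the quadratic has no real roots.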

Both the complex factorization in Proposition 4.30 and the real factorization in Corollary 4.31 will have important consequences when we cover canonical forms in linear algebra. In general, when writing a real matrix in a canonical form, it will be possible either to write it in a simpler complex canonical form, or a slightly more complicated real canonical form. As we will see, these are the direct analogues of the factorizations in Proposition 4.30 and Corollary 4.31.

A Dedekind cut construction of the reals.

Definition A.1 ($\mathbb{R}$, definition by Dedekind cuts). Each member of $\mathbb{R}$ is defined to be a certain subset of $\mathbb{Q}$, called a cut. We define a cut to be a subset $\alpha \subset \mathbb{Q}$ with the following properties:

1. $\alpha$ is not empty, and $\alpha \ne \mathbb{Q}$.

2. If $p \in \alpha$, $q \in \mathbb{Q}$, and $q < p$, then $q \in \alpha$.

3. If $p \in \alpha$, then $p < r$ for some $r \in \alpha$.

We define the order on cuts by set inclusion: $\alpha \le \beta$ if $\alpha \subset \beta$. We define the sum of two cuts $\alpha + \beta$ to be the cut consisting of all rationals of the form $r + s$, where $r \in \alpha$ and $s \in \beta$. We define the additive identity $0$ to be the set of all negative rationals. We define the additive inverse $-\alpha$ to be the cut consisting of all $r$ such that there exists $s > 0$ where $-r - s \notin \alpha$.

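We define the product of two positive cuts $\alpha \cdot \beta$, where $\alpha > 0$ and $\beta > 0$, to be the cut consisting of all rationals $p$ such that $p \le r \cdot s$ for some $r \in \alpha$ and $s \in \beta$ with $r > 0$ and $s > 0$. (Restricting to positive $r$ and $s$ is necessary: a positive cut contains every negative rational, and products of two negative rationals are arbitrarily large.) When either $\alpha$ or $\beta$ is $0$, we define the product to also be $0$. In other cases, we define multiplication by the following rule:

$$\alpha \cdot \beta = \begin{cases} -((-\alpha) \cdot \beta) & \alpha < 0,\ \beta > 0 \\ -(\alpha \cdot (-\beta)) & \alpha > 0,\ \beta < 0 \\ (-\alpha) \cdot (-\beta) & \alpha < 0,\ \beta < 0 \end{cases}$$

We define the multiplicative identity $1$ to be the set of all rationals less than 1. For $\alpha > 0$, we define the multiplicative inverse $1/\alpha$ to be the cut consisting of all $r \le 0$, together with all $r > 0$ such that there exists $s > 1$ where $(1/r) \cdot (1/s) \notin \alpha$. For $\alpha < 0$, we define $1/\alpha$ to be $-(1/(-\alpha))$.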
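To make the definition of a cut in Definition A.1 more concrete, the Python sketch below represents a cut by its membership test on exact rationals (`fractions.Fraction`). All names here (`rational_cut`, `sqrt2_cut`, `larger_member`) are hypothetical helpers, not notation from the notes. The formula $r = (2p + 2)/(p + 2)$ is a standard device for exhibiting a larger member of the cut for $\sqrt{2}$, which illustrates property 3: for $p \ge 0$ with $p^2 < 2$, one can check that $r > p$ and $r^2 < 2$.

```python
from fractions import Fraction


def rational_cut(x: Fraction):
    """Membership test for the cut representing the rational x: all q < x."""
    return lambda q: q < x


def sqrt2_cut(q: Fraction) -> bool:
    """Membership test for the cut representing sqrt(2)."""
    return q < 0 or q * q < 2


def larger_member(p: Fraction) -> Fraction:
    """For p >= 0 in the sqrt(2) cut, return some r > p that is still in the cut."""
    return (2 * p + 2) / (p + 2)


zero = rational_cut(Fraction(0))   # the additive identity: all negative rationals

p = Fraction(7, 5)
r = larger_member(p)

print(zero(Fraction(-1, 3)), zero(Fraction(0)))   # True False
print(sqrt2_cut(p), r, sqrt2_cut(r), r > p)       # property 3 in action
print(sqrt2_cut(Fraction(3, 2)))                  # False: 3/2 lies above sqrt(2)
```

Because the cut for $\sqrt{2}$ has no largest element, `larger_member` can be applied repeatedly to produce members ever closer to $\sqrt{2}$ without leaving the cut.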

It is very lengthy and tedious to verify that this construction actually produces an ordered field with the least upper bound property. First, we have to verify that set inclusion is a legitimate order on cuts (in particular that it is a total order), and that it gives us the least upper bound property. We then have to verify that all the properties of a field hold: that the definition of addition, the additive identity, and the additive inverse are consistent with the properties of addition; that the definition of multiplication, the multiplicative identity, and the multiplicative inverse are consistent with the properties of multiplication; and that these definitions jointly give us the distributive property. We also must show that we have an ordered field. If you have a spare day at your disposal and no sense of boredom, you can try this yourself. Otherwise, we will content ourselves by proving that the order satisfies the least upper bound property; for the rest, if you are still curious, you can check an analysis textbook.

We can see here, however, the notion of building up a new system (the reals) from a simpler one (the rationals). This is a recurrent theme in mathematics. In fact, we did it for the rationals too, which we defined as pairs of integers satisfying certain properties.