
MA3H1 Topics in Number Theory

Lecture Notes

Soma Purkait

Contents

Chapter 1. Revision
1. Divisibility
2. Rings and Ideals
3. Greatest Common Divisor
4. Primes and Irreducibles
5. $\mathrm{ord}_p$

Chapter 2. The ring Z/mZ, the unit group $U_m$, primitive roots
1. Congruences
2. Solving Linear Diophantine equations
3. Primitive Roots

Chapter 3. Quadratic Reciprocity
1. Quadratic Residues and Non-Residues
2. Law of Quadratic Reciprocity
3. Mersenne Numbers
4. A Diophantine Equation

Chapter 4. Algorithms
1. Tonelli-Shanks
2. Fermat’s Factorization
3. Quadratic Sieve

Chapter 5. Introduction to Public Key Cryptography
1. RSA
2. Digital Signature Using RSA
3. Diffie-Hellman Key Exchange

Chapter 6. p-adic numbers
1. Congruences modulo $p^m$
2. p-adic Norm on Q
3. Sequences and Series
4. Construction of $Q_p$ and Completeness
5. p-adic Digit Expansion
6. Hensel’s Lemma over $Z_p$
7. Hasse Principle

Chapter 7. Geometry of Numbers
1. Two Squares Theorem
2. Areas and Volumes
3. Four Squares Theorem
4. Proof of Minkowski’s Theorem
5. Quadratic Forms and Hasse-Minkowski

Chapter 8. Irrationality and Transcendence
1. Irrationality
2. Algebraic and Transcendental Numbers
3. Liouville’s Theorem

Acknowledgements

I would like to thank Prof. Samir Siksek and Dr. Alex Bartel for their support and for providing essential resources. Part of these lecture notes is based on earlier lecture notes by Prof. Samir Siksek. I would also like to thank my students for providing several corrections to earlier versions.

CHAPTER 1

Revision

1. Divisibility

Definition. Given a, b ∈ Z, we say that a divides b if b = an for some n ∈ Z. We write a | b.

It is clear from the above definition that the following hold for all a, b, c ∈ Z:

(1) a | a.
(2) a | b ⇒ a | bc.
(3) a | b ⇒ ac | bc.
(4) a | b and b | c ⇒ a | c.
(5) a | b and b | a ⇔ a = ±b.
(6) a | b and a | c ⇒ a | (bx + cy) for all x, y ∈ Z.
(7) (±1) | a.
(8) a | (±1) ⇒ a = ±1.
(9) a | b and b ≠ 0 ⇒ |a| ≤ |b|.

Example 1.1. $8 \mid (5^{2n} + 7)$ for all n ≥ 1.

Proof. For n = 1, we have $8 \mid 32 = 5^2 + 7$. Now assume that the statement is true for n = k. Then

$$5^{2(k+1)} + 7 = 5^2 (5^{2k} + 7) + (7 - 5^2 \cdot 7).$$

Since $8 \mid 5^2 (5^{2k} + 7)$ and $8 \mid (7 - 5^2 \cdot 7) = -168$, the statement holds for n = k + 1, and the claim follows by induction. □

Theorem 1.1. Let a, b ∈ Z with b > 0. Then there exist unique integers q, r such that a = qb + r with 0 ≤ r < b.

Proof. Uniqueness: Suppose there exist q, r, q′, r′ with 0 ≤ r, r′ < b such that a = qb + r = q′b + r′. WLOG let r′ ≥ r. Then (q − q′)b = r′ − r. Since 0 ≤ r′ − r < b and b divides r′ − r, we have r′ = r and hence q′ = q.
Existence: Consider S = {a − xb | x ∈ Z and a − xb ≥ 0} ⊆ N. Note that S ≠ ∅ since a − (−|a|)b = a + |a|b ≥ a + |a| ≥ 0. By the well-ordering principle, S has a least element, say r, and r = a − qb for some q ∈ Z. If r ≥ b then a − (q + 1)b ∈ S but a − (q + 1)b < a − qb = r, contradicting the choice of r as the least element of S. So 0 ≤ r < b. □

Corollary 1.2. Let a, b ∈ Z with b ≠ 0. Then there exist unique q, r ∈ Z such that a = qb + r with 0 ≤ r < |b|.

Proof. Apply Theorem 1.1 to a and |b| > 0. �
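As a small illustration (not part of the original notes), the division with remainder of Corollary 1.2 can be sketched in Python. The function name `division_with_remainder` is a hypothetical choice; the sketch relies on the fact that Python's built-in `divmod` returns a non-negative remainder when the divisor is positive.

```python
def division_with_remainder(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < |b|, for b != 0."""
    if b == 0:
        raise ValueError("b must be non-zero")
    # Apply Theorem 1.1 to a and |b| > 0: a = q0*|b| + r with 0 <= r < |b|.
    q0, r = divmod(a, abs(b))
    # If b < 0 then |b| = -b, so a = (-q0)*b + r.
    q = q0 if b > 0 else -q0
    return q, r


if __name__ == "__main__":
    for a, b in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
        q, r = division_with_remainder(a, b)
        print(f"{a} = {q}*({b}) + {r}")
```

Note that this differs from Python's own `divmod(a, b)` for negative b, where the remainder takes the sign of b.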


2. Rings and Ideals

Definition. A ring R is a set together with binary operations +, · such that

(i) (R, +) is an abelian group.
(ii) a · (b · c) = (a · b) · c for all a, b, c ∈ R.
(iii) ∃ 1 ∈ R such that 1 · a = a · 1 = a for all a ∈ R.
(iv) a · (b + c) = a · b + a · c and (b + c) · a = b · a + c · a for all a, b, c ∈ R.

If a · b = b · a for all a, b ∈ R then R is called a commutative ring.

Example 1.2. (Z,+, ·) is a commutative ring.

From now on we will consider only commutative rings.

Definition. A subring S ⊆ R is a subset that is a ring under the operations coming from R.

Definition. An ideal I of R is an additive subgroup of R such that a · i ∈ I for all a ∈ R, i ∈ I. We write I ⊴ R.

Definition. For $a_1, a_2, \dots, a_k$ ∈ R, the ideal generated by $a_1, a_2, \dots, a_k$ is

$$\langle a_1, a_2, \dots, a_k \rangle_R = \Big\{ \sum_{i=1}^{k} r_i a_i \;\Big|\; r_i \in R \Big\}.$$

Definition. An ideal I ⊴ R is principal if $I = \langle a \rangle_R$ for some a ∈ R. An integral domain R is called a Principal Ideal Domain (PID) iff every ideal of R is principal.

Theorem 1.3. (Z,+, ·) is a PID.

Proof. Let I ⊴ Z. If I = {0} then there is nothing to prove. Suppose I ≠ {0}. Let b ∈ I be a non-zero element with the smallest absolute value. We prove that I = ⟨b⟩. Clearly ⟨b⟩ ⊆ I. Let a ∈ I. Then by Corollary 1.2, a = qb + r for some q, r ∈ Z with 0 ≤ r < |b|. If r ≠ 0 then r = a − qb ∈ I with 0 < |r| < |b|, a contradiction to the minimality of |b|. Thus r = 0 and a ∈ ⟨b⟩. □

Note.
1. I = ⟨b⟩ = ⟨−b⟩, so we can always take a non-negative generator.
2. It follows from Theorem 1.1 that Z has a Euclidean function, given by the absolute value | · | : Z \ {0} → N, so Z is a Euclidean domain and hence a PID.
3. Other examples of Euclidean domains are the Gaussian integers Z[i] and polynomial rings K[x] over a field K.

3. Greatest Common Divisor

Theorem 1.4. Let $a_1, a_2, \dots, a_n$ ∈ Z. Then there exists a unique integer d ≥ 0 satisfying

(i) $d \mid a_i$ for all i = 1, 2, ..., n.
(ii) If $c \mid a_i$ for all i = 1, 2, ..., n, then c | d.
(iii) $d = u_1 a_1 + u_2 a_2 + \cdots + u_n a_n$ for some $u_1, u_2, \dots, u_n$ ∈ Z.

The integer d ≥ 0 is called the gcd of $a_1, a_2, \dots, a_n$ and we will denote it either by $\gcd(a_1, a_2, \dots, a_n) = d$ or simply by $(a_1, a_2, \dots, a_n) = d$.

Proof. By Theorem 1.3 and the note above, there exists a unique d ≥ 0 such that $\langle a_1, a_2, \dots, a_n \rangle = \langle d \rangle$. □

Note.
1. gcd(a, b) = 1 ⇔ ax + by = 1 for some x, y ∈ Z.
2. For an integer d > 0 dividing every $a_i$: $\gcd(a_1, a_2, \dots, a_n) = d \iff \gcd\big(\tfrac{a_1}{d}, \tfrac{a_2}{d}, \dots, \tfrac{a_n}{d}\big) = 1$.

Corollary 1.5 (Euler’s Lemma). If a | bc and gcd(a, b) = 1 then a | c.

Proof. gcd(a, b) = 1 ⇒ ax + by = 1 for some x, y ∈ Z ⇒ acx + bcy = c ⇒ a | c, since a divides both acx and bcy. □

Euclid’s algorithm is based on the following lemma.

Lemma 1.6. If a = qb+ r then gcd(a, b) = gcd(b, r).

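Proof. Any common divisor of a and b divides r = a − qb, and any common divisor of b and r divides a = qb + r. So gcd(a, b) | gcd(b, r) and gcd(b, r) | gcd(a, b). Hence gcd(a, b) = ±gcd(b, r). Since the gcd is non-negative, equality follows. □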
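Iterating Lemma 1.6 gives Euclid's algorithm, and keeping track of the coefficients along the way also produces the integers x, y of Theorem 1.4 (iii). The following is a minimal sketch, not taken from the notes; the name `extended_gcd` is an illustrative choice.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) >= 0 and a*x + b*y == d."""
    # Invariants: a*old_x + b*old_y == old_r and a*x + b*y == r.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Lemma 1.6: gcd(old_r, r) == gcd(r, old_r - q*r).
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        # Normalise so that the gcd is non-negative.
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


if __name__ == "__main__":
    d, x, y = extended_gcd(240, 46)
    print(d, x, y, 240 * x + 46 * y)
```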

Definition. $m_1, m_2, \dots, m_n$ ∈ Z are coprime if $\gcd(m_1, m_2, \dots, m_n) = 1$; they are called pairwise coprime if $\gcd(m_i, m_j) = 1$ for all i ≠ j.

Lemma 1.7. Let $m_1, m_2, \dots, m_n$ ∈ Z be pairwise coprime such that $m_i \mid x$ for all i. Then M | x where $M = m_1 m_2 \cdots m_n$.

Proof. We prove the case n = 2; the rest follows by induction. Let $m_1 \mid x$ and $m_2 \mid x$. So $x = q_1 m_1$ for some $q_1$ ∈ Z. Since $m_2 \mid x = q_1 m_1$ and $\gcd(m_2, m_1) = 1$, by Euler’s lemma $m_2 \mid q_1$, say $q_1 = q_2 m_2$ for some $q_2$ ∈ Z. It follows that $x = q_2 m_1 m_2$. □

4. Primes and Irreducibles

Definition. Let R be a ring. An element x ∈ R is called a unit iff ∃ x′ ∈ R such that xx′ = 1. We denote by $R^\times$ the group of units of R.

Definition. $p \in R \setminus (R^\times \cup \{0\})$ is called prime if whenever p | ab then p | a or p | b.

Definition. $p \in R \setminus (R^\times \cup \{0\})$ is called irreducible if whenever p = ab then either a or b is a unit.

Recall that a prime number is a natural number > 1 that has no positive divisors other than 1 and itself. So prime numbers are the irreducible elements of Z that are > 1. Composites in Z are integers > 1 that are not primes.

Theorem 1.8. In Z, prime elements and irreducible elements are the same.

Proof. Let p be a prime element and suppose p = ab. Then p | ab, so p | a or p | b. WLOG let p | a. Since also a | p we get a = ±p. But then p = ±pb and so b = ±1, a unit. Conversely let p be irreducible and let p | ab. If p | a we are done. If p ∤ a then, since the only divisors of an irreducible p are ±1 and ±p, we get gcd(a, p) = 1. Now by Euler’s lemma p | b. □

Example 1.3. Let $R = \mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$. The element 2 ∈ R is irreducible: if $2 = (a + b\sqrt{-5})(c + d\sqrt{-5})$ then taking complex conjugates and multiplying we obtain $4 = (a^2 + 5b^2)(c^2 + 5d^2)$, which implies that either $a + b\sqrt{-5} = \pm 1$ or $c + d\sqrt{-5} = \pm 1$. Note that $R^\times = \{\pm 1\}$. However, 2 is not prime, since $2 \mid 6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ but $2 \nmid (1 \pm \sqrt{-5})$.

Theorem 1.9 (Fundamental Theorem of Arithmetic (FTA)). Every non-zero integer n can be written uniquely (up to re-ordering and sign) as a finite product of primes.

Proof. If n < 0 then write n = −m for m > 0. We use induction to prove the above statement for positive integers n. If n = 1, it is the product of the empty set of primes. If n is a prime, there is nothing to prove. If n is composite then n = ab where 1 < a, b < n. By the induction hypothesis a and b can be written as finite products of primes, and hence so can n. For uniqueness, use induction again! □

Theorem 1.10 (Euclid). There are infinitely many primes.

5. ordp

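Definition. For a prime p and n ∈ Z \ {0} define $\mathrm{ord}_p(n) = e$ if $p^e \,\|\, n$, i.e. $p^e \mid n$ but $p^{e+1} \nmid n$. Define $\mathrm{ord}_p(0) = \infty$.

Extend $\mathrm{ord}_p$ to Q by defining $\mathrm{ord}_p\big(\tfrac{a}{b}\big) = \mathrm{ord}_p(a) - \mathrm{ord}_p(b)$. We can rephrase Theorem 1.9 as

Theorem 1.11 (FTA). For $\alpha \in \mathbb{Q}^\times = \mathbb{Q} \setminus \{0\}$,

$$\alpha = \pm \prod_{p \in P} p^{\mathrm{ord}_p(\alpha)},$$

where P denotes the set of primes.

Proof. Use Theorem 1.9 and $\mathrm{ord}_p(p^n) = n$. □

Corollary 1.12. Let $\alpha, \beta \in \mathbb{Q}^\times$.

(i) $\alpha = \pm\beta \iff \mathrm{ord}_p(\alpha) = \mathrm{ord}_p(\beta)$ for all p ∈ P.
(ii) If α > 0, then α is a square in Q $\iff \mathrm{ord}_p(\alpha)$ is even for all p ∈ P.
(iii) For each prime p, $\alpha = p^{\mathrm{ord}_p(\alpha)} \tfrac{u}{v}$ where u, v ∈ Z and p ∤ u, v.

Proof. Exercise (use FTA). □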

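Theorem 1.13 (Properties of $\mathrm{ord}_p$). Let α, β ∈ Q. Then

(i) $\mathrm{ord}_p(\alpha\beta) = \mathrm{ord}_p(\alpha) + \mathrm{ord}_p(\beta)$.
(ii) $\mathrm{ord}_p(\alpha + \beta) \ge \min\{\mathrm{ord}_p(\alpha), \mathrm{ord}_p(\beta)\}$, with equality if $\mathrm{ord}_p(\alpha) \ne \mathrm{ord}_p(\beta)$.

Note: the equality condition in (ii) is not an "if and only if" statement; equality can also hold when $\mathrm{ord}_p(\alpha) = \mathrm{ord}_p(\beta)$.

Proof. If either of α, β is zero then the statements clearly hold. Assume that $\alpha, \beta \in \mathbb{Q}^\times$. Then by Corollary 1.12,

$$\alpha = p^m \frac{a}{b} \quad \text{and} \quad \beta = p^n \frac{c}{d},$$

where p ∤ a, b, c, d and $\mathrm{ord}_p(\alpha) = m$ and $\mathrm{ord}_p(\beta) = n$.
(i) Clearly $\mathrm{ord}_p(\alpha\beta) = m + n$.
(ii) WLOG assume that m ≤ n. We have

$$\alpha + \beta = p^m \frac{a}{b} + p^n \frac{c}{d} = p^m \Big( \frac{a}{b} + p^{n-m} \frac{c}{d} \Big) = p^m \Big( \frac{ad + p^{n-m} bc}{bd} \Big).$$

If α + β = 0 the inequality is trivial, so assume α + β ≠ 0. Since n ≥ m we can write $ad + p^{n-m} bc = p^t e$ where t ≥ 0 and gcd(p, e) = 1. Therefore

$$\alpha + \beta = p^{m+t} \frac{e}{bd} \quad \text{where } p \nmid e, b, d,$$

and so $\mathrm{ord}_p(\alpha + \beta) = m + t \ge m = \min\{\mathrm{ord}_p(\alpha), \mathrm{ord}_p(\beta)\}$.
For the second part, suppose $\mathrm{ord}_p(\alpha) \ne \mathrm{ord}_p(\beta)$. WLOG assume that m < n, i.e. n − m ≥ 1. If $p \mid (ad + p^{n-m} bc)$ then p | ad, a contradiction. Hence t = 0 in the above argument and so $\mathrm{ord}_p(\alpha + \beta) = m = \min\{\mathrm{ord}_p(\alpha), \mathrm{ord}_p(\beta)\}$. □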

CHAPTER 2

The ring Z/mZ, the unit group Um, primitive roots

1. Congruences

Definition. Let a, b ∈ Z and m be a positive integer. We say that a is congruent to b modulo m iff m | (a − b). We write a ≡ b (mod m).

It is easy to see that for m ≥ 1, “congruence modulo m”, denoted by $\equiv_m$, is an equivalence relation on Z:

(i) reflexive: a ≡ a (mod m) for all a ∈ Z.
(ii) symmetric: If a ≡ b (mod m) then b ≡ a (mod m).
(iii) transitive: If a ≡ b (mod m) and b ≡ c (mod m) then a ≡ c (mod m).

By Theorem 1.1, given an integer a, we can write a = qm + r for unique q, r ∈ Z with 0 ≤ r < m, i.e., a ≡ r (mod m). Hence every integer is congruent modulo m to precisely one of 0, 1, ..., m − 1. Thus the equivalence classes under the relation $\equiv_m$ can be represented precisely by the remainders 0, 1, ..., m − 1.

Lemma 2.1 (Properties of $\equiv_m$).

(a) If a ≡ b (mod m) and c ≡ d (mod m), then a + c ≡ b + d (mod m) and ac ≡ bd (mod m).
(b) If a ≡ b (mod m) then a ≡ b (mod d) for every divisor d | m.
(c) If a ≡ b (mod m) then ac ≡ bc (mod m) for all c ∈ Z.
(d) If a ≡ b (mod m) then gcd(a, m) = gcd(b, m) (Euclid’s algorithm).
(e) If ac ≡ bc (mod m) then $a \equiv b \pmod{\tfrac{m}{\gcd(m, c)}}$; in particular if ac ≡ bc (mod m) and gcd(m, c) = 1 then a ≡ b (mod m).

Proof. (a), (b), (c) follow from the definition of $\equiv_m$ and the properties of divisibility. (d) is Lemma 1.6. For (e), since ac ≡ bc (mod m), we have m | (ac − bc). Let gcd(m, c) = d. Then ac − bc = mt for some t ∈ Z, so $\tfrac{c}{d}(a - b) = \tfrac{m}{d} t$. Hence $\tfrac{m}{d} \mid \tfrac{c}{d}(a - b)$. Since $\gcd\big(\tfrac{m}{d}, \tfrac{c}{d}\big) = 1$, by Euler’s lemma $\tfrac{m}{d} \mid (a - b)$. Hence $a \equiv b \pmod{\tfrac{m}{d}}$. □

Note that in (e), we cannot cancel c while keeping the modulus m unless gcd(m, c) = 1.

Example 2.1. 2 · 4 ≡ 2 · 0 (mod 8), i.e. 8 ≡ 0 (mod 8), but 4 ≢ 0 (mod 8).

Definition. A complete residue system (CRS) modulo m is a set of m integers $\{a_1, a_2, \dots, a_m\}$ such that $a_i \not\equiv a_j \pmod m$ whenever i ≠ j.

Example 2.2. The set of all remainders {0, 1, ..., m − 1} after division by m.

It is easy to check that:

• If $\{a_1, a_2, \dots, a_m\}$ is a CRS modulo m, then any integer is congruent to precisely one of the $a_i$ modulo m.
• If $\{a_1, a_2, \dots, a_m\}$ is a CRS modulo m, then so is $\{x + ca_1, x + ca_2, \dots, x + ca_m\}$ where x, c ∈ Z and gcd(c, m) = 1.

Definition. Let Z/mZ be the set of equivalence classes under $\equiv_m$,

$$\mathbb{Z}/m\mathbb{Z} = \{[0]_m, [1]_m, \dots, [m-1]_m\}.$$

Define + and · on Z/mZ as follows:

$$[a]_m + [b]_m := [a + b]_m, \qquad [a]_m \cdot [b]_m := [ab]_m.$$

Then check that these operations are well defined, that Z/mZ is a commutative ring under + and ·, and that $[0]_m$ is the additive identity and $[1]_m$ is the multiplicative identity.

The next lemma gives a criterion for the existence of an inverse modulo m.

Lemma 2.2. Suppose a, m ∈ Z with m ≥ 1. Then ∃ b ∈ Z such that ab ≡ 1 (mod m) iff gcd(a, m) = 1.

Proof. gcd(a, m) = 1 ⇔ ∃ b, c ∈ Z such that ab + mc = 1 ⇔ ∃ b ∈ Z such that ab ≡ 1 (mod m). □

Hence $[a]_m$ is invertible (i.e. ∃ b ∈ Z such that $[a]_m \cdot [b]_m = [ab]_m = [1]_m$) ⇔ gcd(a, m) = 1. Thus the group of units modulo m is

$$U_m := (\mathbb{Z}/m\mathbb{Z})^\times = \{[a]_m \in \mathbb{Z}/m\mathbb{Z} \mid \gcd(a, m) = 1\}.$$

Remark. For a prime p, $\mathbb{Z}/p\mathbb{Z} = \{[0]_p, [1]_p, \dots, [p-1]_p\}$ is a commutative ring under + and · defined above. Further $[a]_p$ is invertible iff p ∤ a, so every non-zero element of Z/pZ is invertible. Hence (Z/pZ, +, ·) is a field.

One can use Euclid’s algorithm to compute an inverse modulo m.
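To make the last remark about Euclid's algorithm concrete, here is a hedged sketch (the function name `mod_inverse` is hypothetical, not from the notes). It runs Euclid's algorithm on a and m while tracking only the coefficient of a, which is all that is needed for the inverse in Lemma 2.2.

```python
def mod_inverse(a, m):
    """Return b in {0, ..., m-1} with a*b ≡ 1 (mod m), assuming m >= 1.

    Raises ValueError when gcd(a, m) != 1 (no inverse exists, Lemma 2.2).
    """
    # Invariant: r0 ≡ x0 * a (mod m) and r1 ≡ x1 * a (mod m).
    r0, r1 = a % m, m
    x0, x1 = 1, 0
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
    if r0 != 1:
        raise ValueError(f"{a} is not invertible modulo {m}")
    return x0 % m


if __name__ == "__main__":
    print(mod_inverse(3, 7))    # 3 * 5 = 15 ≡ 1 (mod 7)
    print(mod_inverse(10, 17))  # 10 * 12 = 120 ≡ 1 (mod 17)
```

Recent Python versions (3.8 and later) also accept `pow(a, -1, m)` for the same computation.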

Definition. Euler’s totient function ϕ : N → N is defined by

$$\varphi(m) = \#U_m = \#\{[a]_m \in \mathbb{Z}/m\mathbb{Z} \mid \gcd(a, m) = 1\} = \#\{a \mid 0 \le a < m \text{ and } \gcd(a, m) = 1\}.$$

Definition. A reduced residue system (RRS) modulo m is a set of ϕ(m) integers $\{a_1, a_2, \dots, a_{\varphi(m)}\}$ such that $\gcd(a_i, m) = 1$ for all i and $a_i \not\equiv a_j \pmod m$ whenever i ≠ j.

If $\{a_1, a_2, \dots, a_{\varphi(m)}\}$ is an RRS modulo m then so is $\{ca_1, ca_2, \dots, ca_{\varphi(m)}\}$ whenever gcd(c, m) = 1.


2. Solving Linear Diophantine equations

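Lemma 2.3. Let a, b ∈ Z and let m ≥ 1 be an integer. Then

(i) ∃ $x_0$ ∈ Z such that $ax_0 \equiv b \pmod m$ ⇔ gcd(a, m) | b.
(ii) If gcd(a, m) | b, then $\#\{[x]_m \in \mathbb{Z}/m\mathbb{Z} \mid ax \equiv b \pmod m\} = \gcd(a, m)$.

Proof. (i) ∃ $x_0$ ∈ Z such that $ax_0 \equiv b \pmod m$ ⇔ ∃ $x_0, y_0$ ∈ Z such that $ax_0 - my_0 = b$ ⇔ gcd(a, m) | b.
(ii) Let gcd(a, m) = d and d | b. Then $b = ax_0 - my_0$ for some $x_0, y_0$ ∈ Z. For any solution x, y ∈ Z we have

$$ax - my = b = ax_0 - my_0 \iff \frac{a}{d}(x - x_0) = \frac{m}{d}(y - y_0) \iff x = x_0 + \frac{m}{d} t \text{ and } y = y_0 + \frac{a}{d} t \text{ for some } t \in \mathbb{Z}$$

(the last ⇔ follows by Euler’s lemma again, as $\gcd\big(\tfrac{a}{d}, \tfrac{m}{d}\big) = 1$). Hence

$$\#\{[x]_m \in \mathbb{Z}/m\mathbb{Z} \mid ax \equiv b \pmod m\} = \#\Big\{[x_0]_m + \Big[\frac{mt}{d}\Big]_m \in \mathbb{Z}/m\mathbb{Z} \;\Big|\; t \in \mathbb{Z}\Big\} = \#\Big\{t \in \mathbb{Z} \;\Big|\; 0 \le \frac{mt}{d} < m\Big\} = d.$$

□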
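The proof of Lemma 2.3 is constructive: divide through by d = gcd(a, m), solve the reduced congruence, and then add multiples of m/d. A possible sketch follows; the function name `solve_linear_congruence` is an assumption for illustration, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
from math import gcd


def solve_linear_congruence(a, b, m):
    """Return all x in {0, ..., m-1} with a*x ≡ b (mod m), for m >= 1."""
    d = gcd(a, m)
    if b % d != 0:
        return []  # Lemma 2.3 (i): no solution unless gcd(a, m) | b
    a1, b1, m1 = a // d, b // d, m // d
    if m1 == 1:
        x0 = 0  # every integer solves the reduced congruence
    else:
        # gcd(a1, m1) = 1, so a1 is invertible modulo m1.
        x0 = (b1 * pow(a1, -1, m1)) % m1
    # Lemma 2.3 (ii): the d solutions modulo m are x0 + t*(m/d), 0 <= t < d.
    return [x0 + t * m1 for t in range(d)]


if __name__ == "__main__":
    print(solve_linear_congruence(6, 4, 10))  # two solutions since gcd(6, 10) = 2
    print(solve_linear_congruence(2, 1, 4))   # no solutions since 2 does not divide 1
```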

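Theorem 2.4 (Chinese Remainder Theorem). Let $a_1, a_2, \dots, a_n$ ∈ Z and let $m_1, m_2, \dots, m_n$ be pairwise coprime positive integers. Let $M = \prod_{i=1}^{n} m_i$. Then there exists a unique $[x]_M \in \mathbb{Z}/M\mathbb{Z}$ such that $x \equiv a_i \pmod{m_i}$ for all i = 1, 2, ..., n.

Proof 1. Consider the map

$$\theta : \mathbb{Z}/M\mathbb{Z} \to \mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z} \times \cdots \times \mathbb{Z}/m_n\mathbb{Z}, \qquad [a]_M \mapsto ([a]_{m_1}, [a]_{m_2}, \dots, [a]_{m_n}).$$

Note that θ is a well-defined map since $[a]_M = [b]_M$ ⇒ a ≡ b (mod M) ⇒ $a \equiv b \pmod{m_i}$ ∀ i ⇒ $[a]_{m_i} = [b]_{m_i}$ ∀ i. Check that θ is a ring homomorphism (i.e. $\theta([a + b]_M) = \theta([a]_M) + \theta([b]_M)$ and $\theta([ab]_M) = \theta([a]_M) \cdot \theta([b]_M)$ for all $[a]_M, [b]_M \in \mathbb{Z}/M\mathbb{Z}$, and $\theta([1]_M) = ([1]_{m_1}, \dots, [1]_{m_n})$). We first see that θ is injective since

$$\begin{aligned}
[x]_M \in \ker(\theta) &\iff [x]_{m_i} = [0]_{m_i} \text{ for all } i \\
&\iff x \equiv 0 \pmod{m_i} \text{ for all } i \\
&\iff m_i \mid x \text{ for all } i \\
&\iff M \mid x \quad (\text{since the } m_i \text{ are pairwise coprime}) \\
&\iff [x]_M = [0]_M.
\end{aligned}$$

Hence

$$M = \#\mathbb{Z}/M\mathbb{Z} = \#\mathrm{im}(\theta) \le \#(\mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z} \times \cdots \times \mathbb{Z}/m_n\mathbb{Z}) = m_1 m_2 \cdots m_n = M.$$

Thus θ is injective and surjective and therefore a ring isomorphism. This gives the CRT as follows:

Given $a_1, a_2, \dots, a_n$ ∈ Z, consider $([a_1]_{m_1}, [a_2]_{m_2}, \dots, [a_n]_{m_n}) \in \mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z} \times \cdots \times \mathbb{Z}/m_n\mathbb{Z}$. Since θ is an isomorphism, there exists a unique $[x]_M \in \mathbb{Z}/M\mathbb{Z}$ such that $[x]_{m_i} = [a_i]_{m_i}$ for all i = 1, 2, ..., n. □

This is an existential proof. Let us look at a constructive proof.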

Proof 2. Let $M_i = \prod_{j \ne i} m_j$. Since the $m_i$ are pairwise coprime, we have $\gcd(M_i, m_i) = 1$ and so ∃ $x_i$ ∈ Z such that $M_i x_i \equiv 1 \pmod{m_i}$. Note that $M_i x_i \equiv 0 \pmod{m_j}$ for all j ≠ i. Define

$$x = \sum_{i=1}^{n} a_i (M_i x_i),$$

then $x \equiv a_i \pmod{m_i}$ for all i = 1, 2, ..., n. If y is any other integer satisfying the congruences, then $y \equiv a_i \equiv x \pmod{m_i}$ for all i. So $m_i \mid (x - y)$ for all i, which implies that M | (x − y). Hence x ≡ y (mod M). So $[x]_M \in \mathbb{Z}/M\mathbb{Z}$ is the unique such element. □
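The constructive proof translates almost line by line into code. The sketch below is illustrative only (the name `crt` is an assumption) and uses Python's built-in `pow(M_i, -1, m_i)`, available from Python 3.8, to find the integers $x_i$ with $M_i x_i \equiv 1 \pmod{m_i}$.

```python
from math import prod


def crt(residues, moduli):
    """Return the unique x in {0, ..., M-1} with x ≡ a_i (mod m_i) for all i.

    The moduli are assumed to be pairwise coprime positive integers;
    pow(..., -1, m_i) raises ValueError if some M_i is not invertible mod m_i.
    """
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i               # product of the other moduli
        x_i = pow(M_i, -1, m_i)      # M_i * x_i ≡ 1 (mod m_i)
        x += a_i * M_i * x_i         # ≡ a_i mod m_i, ≡ 0 mod every other m_j
    return x % M


if __name__ == "__main__":
    # x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
    print(crt([2, 3, 2], [3, 5, 7]))
```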

Corollary 2.5. If gcd(m, n) = 1 then ϕ(mn) = ϕ(m)ϕ(n).

Proof 1. By the CRT, $\mathbb{Z}/mn\mathbb{Z} \cong \mathbb{Z}/m\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z}$. So the unit groups are isomorphic, i.e., $U_{mn} \cong (\mathbb{Z}/m\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z})^\times \cong U_m \times U_n$. So we are done. □

Proof 2. Note that ϕ(mn) is the number of integers in {1, 2, ..., mn} that are coprime to both m and n. Write {1, 2, ..., mn} as follows:

$$\begin{array}{cccccc}
1 & m + 1 & \cdots & (j-1)m + 1 & \cdots & (n-1)m + 1 \\
\vdots & \vdots & & \vdots & & \vdots \\
i & m + i & \cdots & (j-1)m + i & \cdots & (n-1)m + i \\
\vdots & \vdots & & \vdots & & \vdots \\
m & 2m & \cdots & jm & \cdots & nm
\end{array}$$

For each i with 1 ≤ i ≤ m we have gcd(i, m) = 1 ⇔ gcd((j − 1)m + i, m) = 1 for all 1 ≤ j ≤ n. So to choose elements that are coprime to m and n, we need only look at the i-th rows with gcd(i, m) = 1. There are ϕ(m) such rows.

Since gcd(m, n) = 1, each such row is a CRS modulo n and hence has precisely ϕ(n) elements coprime to n. Hence we are done. □

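Lemma 2.6. If p is prime and r ≥ 1, then $\varphi(p^r) = p^r - p^{r-1} = p^{r-1}(p - 1)$.

Proof.

$$\begin{aligned}
\varphi(p^r) &= \#\{m \mid 0 \le m < p^r \text{ and } \gcd(m, p^r) = 1\} \\
&= \#\{m \mid 0 \le m < p^r \text{ and } \gcd(m, p) = 1\} \\
&= \#(\mathbb{Z}/p^r\mathbb{Z}) - \#\{m \mid 0 \le m < p^r \text{ and } p \mid m\} \\
&= \#(\mathbb{Z}/p^r\mathbb{Z}) - \#\{pt \mid t \in \mathbb{Z} \text{ and } 0 \le pt < p^r\} \\
&= p^r - p^{r-1}.
\end{aligned}$$

□

Corollary 2.7. If $m = \prod_{i=1}^{n} p_i^{r_i}$ with distinct primes $p_i$ and exponents $r_i \ge 1$, then $\varphi(m) = \prod_{i=1}^{n} p_i^{r_i - 1}(p_i - 1)$.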

Proof. Follows from Corollary 2.5 and Lemma 2.6. �
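As an illustration of Corollary 2.7, ϕ(m) can be computed from the prime factorisation of m. The sketch below (hypothetical name `euler_phi`) finds the prime factors by trial division, which is only reasonable for small m, and multiplies in a factor (1 − 1/p) for each prime p | m, which is the same formula rewritten.

```python
def euler_phi(m):
    """Euler's totient of m >= 1, via phi(m) = m * prod over p | m of (1 - 1/p)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    result = m
    n = m
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p            # strip the full power of p from n
            result -= result // p  # multiply result by (1 - 1/p)
        p += 1
    if n > 1:
        # Whatever is left is a single prime factor larger than sqrt(m).
        result -= result // n
    return result


if __name__ == "__main__":
    print([euler_phi(m) for m in range(1, 13)])
```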

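Corollary 2.8. For any n ∈ N, $\sum_{d \mid n} \varphi(d) = n$.

Proof. Partition {1, 2, ..., n} into subsets as follows:

$$\begin{aligned}
\{1, 2, \dots, n\} &= \bigsqcup_{d \mid n} \{a \in \{1, 2, \dots, n\} \mid \gcd(a, n) = d\} \\
&= \bigsqcup_{d \mid n} \Big\{a \in \{1, 2, \dots, n\} \;\Big|\; a = da', \ \gcd\Big(a', \frac{n}{d}\Big) = 1\Big\}.
\end{aligned}$$

For each d | n, the map a ↦ a′ = a/d is a bijection from the block indexed by d onto $\{a' \in \{1, 2, \dots, \tfrac{n}{d}\} \mid \gcd(a', \tfrac{n}{d}) = 1\}$, which has $\varphi(\tfrac{n}{d})$ elements. Hence

$$n = \sum_{d \mid n} \varphi\Big(\frac{n}{d}\Big) = \sum_{d \mid n} \varphi(d).$$

□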

Before moving further let us recall some basic facts from group theory that will be useful.

Lemma 2.9. Let (G, ·) be a group. Let a ∈ G.

(i) If the order of a is d and $a^e = 1$ then d | e.
(ii) If the order of a is d then the order of $a^h$ is $\tfrac{d}{\gcd(h, d)}$.
(iii) Let G be an abelian group and a, b ∈ G with orders m, n respectively. If gcd(m, n) = 1 then the order of ab is mn.
(iv) (Lagrange’s Theorem) If G is a finite group and H ≤ G is a subgroup then #H | #G. In particular if a ∈ G and the order of a is d then d | #G.
(v) If #G is n and a is an element of order n then G = ⟨a⟩ is a cyclic group.
(vi) If G is a cyclic group of order n, say G = ⟨a⟩, then for every divisor d | n, there are exactly ϕ(d) elements of order d, namely

$$\{a^{kn/d} \mid 0 \le k < d \text{ and } \gcd(k, d) = 1\}.$$

Proof. Exercise. For (vi) use Corollary 2.8. □

Theorem 2.10 (Euler’s Theorem). If a is coprime to m then $a^{\varphi(m)} \equiv 1 \pmod m$.

Proof. If gcd(a, m) = 1 then $[a]_m \in U_m$. Since $\#U_m = \varphi(m)$, applying Lagrange’s Theorem (Lemma 2.9 (iv)) to $[a]_m$ we have $[a]_m^{\varphi(m)} = [1]_m$, i.e., $a^{\varphi(m)} \equiv 1 \pmod m$. □

Theorem 2.11 (Fermat’s Little Theorem (FLT)). Let p be a prime and a ∈ Z. Then

$$a^{p-1} \equiv 1 \pmod p \quad \text{if } p \nmid a.$$

Further,

$$a^p \equiv a \pmod p \quad \text{for any } a \in \mathbb{Z}.$$

Proof. Take m = p in Theorem 2.10. Since ϕ(p) = p − 1, we have $a^{p-1} \equiv 1 \pmod p$ for all a ∈ Z such that gcd(a, p) = 1. Multiply both sides by a to get the second statement when p ∤ a; if p | a then both sides of the second statement are ≡ 0 (mod p). □

3. Primitive Roots

Definition. Let m ∈ Z such that m ≥ 1. Let a ∈ Z such that gcd(a, m) = 1. Then we define the order of a modulo m to be the order of $[a]_m$ in $U_m$.

Definition. Let a ∈ Z such that gcd(a, m) = 1. Then a is a primitive root modulo m if a has order ϕ(m) modulo m.

Note. It follows by applying Lemma 2.9 (ii), (v), (vi) to the unit group $U_m$ that
(i) A primitive root modulo m exists iff $U_m$ is cyclic.
(ii) If a is a primitive root modulo m, then so is $a^h$ if gcd(h, ϕ(m)) = 1. In that case there are precisely ϕ(ϕ(m)) distinct primitive roots modulo m in $U_m$.

Theorem 2.12 (Lagrange’s Theorem). Let p be a prime and let $f(x) = \sum_{i=0}^{n} a_i x^i \in \mathbb{Z}/p\mathbb{Z}[x]$ be a non-zero polynomial of degree n (i.e., $p \nmid a_n$). Then f(x) ≡ 0 (mod p) has at most n solutions in Z/pZ.

Proof. We prove it by induction on n. For n = 1, $f(x) = a_1 x + a_0$ with $p \nmid a_1$. By Lemma 2.3 (ii), f(x) ≡ 0 (mod p) has exactly one solution in Z/pZ.

Assume the statement is true for n = k. Let $f(x) = \sum_{i=0}^{k+1} a_i x^i$ with $p \nmid a_{k+1}$. If f(x) has no solutions in Z/pZ, then we are done. Otherwise let $[r]_p \in \mathbb{Z}/p\mathbb{Z}$ be such that f(r) ≡ 0 (mod p). So, working modulo p,

$$\begin{aligned}
f(x) = f(x) - f(r) &= \sum_{i=0}^{k+1} a_i (x^i - r^i) \\
&= \sum_{i=1}^{k+1} a_i (x - r) \sum_{j=0}^{i-1} x^{i-1-j} r^j \\
&= (x - r) \sum_{i=1}^{k+1} a_i \sum_{j=0}^{i-1} x^{i-1-j} r^j \\
&= (x - r) g(x),
\end{aligned}$$

where deg(g) = k. By the induction hypothesis, g has at most k solutions in Z/pZ. Since p is prime, f(s) ≡ 0 (mod p) forces s ≡ r (mod p) or g(s) ≡ 0 (mod p). As x − r has precisely one solution in Z/pZ, f(x) has at most k + 1 solutions in Z/pZ. □

Remark. The same proof works if we take f to be a polynomial in K[x] for any field K instead of Z/pZ.

Example 2.3. The above theorem does not hold over a ring that is not a field. Let R = (Z/8Z, +, ·). Then the polynomial $x^2 - 1$ in Z/8Z[x] has four roots in Z/8Z, namely $[1]_8, [3]_8, [5]_8$ and $[7]_8$. Note that $x^2 - 1 \equiv (x - 1)(x - 7) \equiv (x - 3)(x - 5) \pmod 8$.

Corollary 2.13. Let p be a prime and let d ≥ 1 be such that d | (p − 1). Then $x^d \equiv 1 \pmod p$ has exactly d distinct solutions in Z/pZ.

Proof. Since d | (p − 1), we have p − 1 = kd for some k ≥ 1. So

$$x^{p-1} - 1 = (x^d - 1)(x^{d(k-1)} + x^{d(k-2)} + \cdots + 1).$$

Note that $x^{p-1} - 1$ has precisely p − 1 distinct solutions in Z/pZ, namely 1, 2, ..., p − 1, by Fermat’s little theorem. Also by Lagrange’s theorem $x^{d(k-1)} + x^{d(k-2)} + \cdots + 1$ has at most d(k − 1) = p − 1 − d solutions. Hence $x^d - 1$ has at least d distinct solutions, so applying Lagrange’s theorem again we see that $x^d - 1$ has precisely d distinct solutions in Z/pZ. □

Theorem 2.14. If p is a prime then there exists a primitive root modulo p.

Proof. We want to find an integer a coprime to p such that a has order p − 1 modulo p. Write $p - 1 = q_1^{e_1} q_2^{e_2} \cdots q_r^{e_r}$ where the $q_i$ are distinct primes. For each i, by Corollary 2.13, $x^{q_i^{e_i}} \equiv 1 \pmod p$ has exactly $q_i^{e_i}$ distinct solutions in Z/pZ and $x^{q_i^{e_i - 1}} \equiv 1 \pmod p$ has exactly $q_i^{e_i - 1}$ distinct solutions in Z/pZ. Therefore there exists $a_i$ ∈ Z such that $a_i^{q_i^{e_i}} \equiv 1 \pmod p$ but $a_i^{q_i^{e_i - 1}} \not\equiv 1 \pmod p$, i.e., the order of $[a_i]_p$ is $q_i^{e_i}$. Take $a = a_1 a_2 \cdots a_r$. Then by applying Lemma 2.9 (iii) to $U_p$ we get that the order of $[a]_p$ is p − 1. □

Corollary 2.15.

(i) $U_p$ is a cyclic group of order p − 1.
(ii) There are precisely ϕ(p − 1) distinct primitive roots modulo p.
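In practice a primitive root modulo a small prime can be found by search. The sketch below is not the construction used in the proof of Theorem 2.14; it uses the standard test, which follows from Lemma 2.9 (i), that g has order p − 1 exactly when $g^{(p-1)/q} \not\equiv 1 \pmod p$ for every prime q dividing p − 1. The function names are hypothetical, and p is assumed (not checked) to be prime.

```python
def prime_factors(n):
    """Return the set of distinct prime factors of n >= 1 (trial division)."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def smallest_primitive_root(p):
    """Smallest primitive root modulo a prime p (p is assumed to be prime)."""
    if p == 2:
        return 1
    qs = prime_factors(p - 1)
    for g in range(2, p):
        # The order of g divides p - 1; it equals p - 1 iff g^((p-1)/q) != 1
        # for every prime q | p - 1.
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g
    raise ValueError("no primitive root found; is p prime?")


if __name__ == "__main__":
    for p in [3, 5, 7, 11, 13, 23]:
        print(p, smallest_primitive_root(p))
```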

Theorem 2.16. If F is a field and G is a finite subgroup of the multiplicative group F \ {0}, then G is a cyclic group.

Proof. The proof is similar to the proof of Theorem 2.14, essentially using Lagrange’s Theorem 2.12 for general fields. □

CHAPTER 3

Quadratic Reciprocity

1. Quadratic Residues and Non-Residues

Definition. Let gcd(a, m) = 1. We say that a is a quadratic residue modulo m if the congruence $x^2 \equiv a \pmod m$ has a solution. Otherwise we say that a is a quadratic non-residue modulo m.

Example 3.1. Note that

$$1^2 \equiv 6^2 \equiv 1, \quad 2^2 \equiv 5^2 \equiv 4, \quad 3^2 \equiv 4^2 \equiv 2 \pmod 7.$$

Hence the quadratic residues modulo 7 are 1, 2 and 4. The quadratic non-residues modulo 7 are 3, 5 and 6.

We will focus on quadratic residues modulo primes and return to quadratic residues modulo arbitrary positive integers later.

Lemma 3.1. Let p be an odd prime and let g be a primitive root modulo p.

• The quadratic residues modulo p are of the form $g^r$ where 0 ≤ r ≤ p − 2 and r is even.
• The quadratic non-residues are of the form $g^r$ where 0 ≤ r ≤ p − 2 and r is odd.

In particular, exactly half the non-zero residues are quadratic residues modulo p and the other half are quadratic non-residues.

Proof. Let g be a primitive root modulo p. Modulo p, the integers 1 ≤ a ≤ p − 1 are a rearrangement of the integers $1, g, \dots, g^{p-2}$, since both lists are reduced residue systems. Note that $g^r$ is certainly a quadratic residue modulo p for all even integers r. Let us prove the converse. Suppose that $g^r \equiv x^2 \pmod p$. Then we can write $x \equiv g^s \pmod p$ for some 0 ≤ s ≤ p − 2. Thus $g^{r-2s} \equiv 1 \pmod p$. As g is a primitive root, p − 1 divides r − 2s. But p − 1 is even, so r − 2s is even and so r is even. Thus we know that $g^r$ is a quadratic residue modulo p if and only if r is even. Hence the quadratic residues modulo p are $1, g^2, g^4, \dots, g^{p-3}$ and the quadratic non-residues are $g, g^3, g^5, \dots, g^{p-2}$. This proves the lemma. □

Lemma 3.2. Let p be an odd prime and g a primitive root modulo p. Then

$$g^{(p-1)/2} \equiv -1 \pmod p.$$

Proof. Let $h = g^{(p-1)/2}$. Then $h^2 = g^{p-1} \equiv 1 \pmod p$. So $p \mid (h^2 - 1) = (h + 1)(h - 1)$. Hence h ≡ ±1 (mod p). If h ≡ 1 (mod p) then $g^{(p-1)/2} \equiv 1 \pmod p$, contradicting the fact that the order of g (a primitive root) is exactly p − 1. Hence h ≡ −1 (mod p), which is what we want. □

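Definition. Let p be an odd prime. Let

$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue modulo } p, \\ -1 & \text{if } a \text{ is a quadratic non-residue modulo } p, \\ 0 & \text{if } p \mid a. \end{cases}$$

The symbol $\left(\frac{a}{p}\right)$ is called a Legendre symbol.

The Legendre symbol is extremely convenient for discussing quadratic residues.

Example 3.2. From the earlier example we have

$$\left(\frac{0}{7}\right) = 0, \qquad \left(\frac{1}{7}\right) = \left(\frac{2}{7}\right) = \left(\frac{4}{7}\right) = 1,$$

and

$$\left(\frac{3}{7}\right) = \left(\frac{5}{7}\right) = \left(\frac{6}{7}\right) = -1.$$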

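Proposition 3.3. Let p be an odd prime, and a, b integers.

(i) If a ≡ b (mod p) then $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$.
(ii) (Euler’s Criterion) $\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod p$.
(iii) For integers a, b we have $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$.

Proof. (i) follows straightaway from the definition, and (iii) follows from (ii), since both sides of (iii) lie in {−1, 0, 1} and p is odd, so they are equal as soon as they are congruent modulo p. Let’s prove (ii). Let a be an integer. If p | a then

$$\left(\frac{a}{p}\right) = 0 \equiv a^{(p-1)/2} \pmod p.$$

Hence suppose that p ∤ a. Let g be a primitive root modulo p. We know from Lemma 3.1 that $a \equiv g^r \pmod p$ for some 0 ≤ r ≤ p − 2 and that r is even if and only if a is a quadratic residue, so that $\left(\frac{a}{p}\right) = (-1)^r$. Hence

$$a^{(p-1)/2} \equiv \big(g^{(p-1)/2}\big)^r \equiv (-1)^r = \left(\frac{a}{p}\right) \pmod p$$

by Lemma 3.2. This proves (ii). □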
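Euler's Criterion gives a direct way to evaluate a Legendre symbol with one modular exponentiation. A minimal sketch follows, assuming p is an odd prime (this is not checked); the function name `legendre` is an illustrative choice.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's Criterion."""
    r = pow(a, (p - 1) // 2, p)  # r is one of 0, 1, p - 1
    return -1 if r == p - 1 else r


if __name__ == "__main__":
    # Reproduces Example 3.2: residues 1, 2, 4 and non-residues 3, 5, 6 modulo 7.
    print([legendre(a, 7) for a in range(7)])
```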

2. Law of Quadratic Reciprocity

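The main theorem on quadratic reciprocity is the Law of Quadratic Reciprocity.

Theorem 3.4. Let p and q be distinct odd primes. Then

(a) (Law of Quadratic Reciprocity)

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{(p-1)}{2} \cdot \frac{(q-1)}{2}}.$$

(b) (First Supplement to the Law of Quadratic Reciprocity)

$$\left(\frac{-1}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1 \pmod 4, \\ -1 & \text{if } p \equiv 3 \pmod 4. \end{cases}$$

(c) (Second Supplement to the Law of Quadratic Reciprocity)

$$\left(\frac{2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1, 7 \pmod 8, \\ -1 & \text{if } p \equiv 3, 5 \pmod 8. \end{cases}$$

Remark. Note that we can rephrase the Law of Quadratic Reciprocity as follows:

$$\left(\frac{p}{q}\right) = \begin{cases} -\left(\frac{q}{p}\right) & \text{if } p \equiv q \equiv 3 \pmod 4, \\ \left(\frac{q}{p}\right) & \text{if } p \equiv 1 \text{ or } q \equiv 1 \pmod 4. \end{cases}$$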

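Example 3.3. Is 94 a square modulo 257? One way to decide this is to run through the integers x = 0, 1, ..., 256 and see if $94 \equiv x^2 \pmod{257}$. It is much quicker to use Proposition 3.3 and the Law of Quadratic Reciprocity.

$$\begin{aligned}
\left(\frac{94}{257}\right) &= \left(\frac{2}{257}\right)\left(\frac{47}{257}\right) && \text{by Proposition 3.3} \\
&= \left(\frac{47}{257}\right) && \text{using the second supplement} \\
&= \left(\frac{257}{47}\right) && \text{since } 257 \equiv 1 \pmod 4 \\
&= \left(\frac{22}{47}\right) && 257 \equiv 22 \pmod{47} \\
&= \left(\frac{2}{47}\right)\left(\frac{11}{47}\right) = \left(\frac{11}{47}\right) && \text{using the second supplement} \\
&= -\left(\frac{47}{11}\right) && 11 \equiv 47 \equiv 3 \pmod 4 \\
&= -\left(\frac{3}{11}\right) = \left(\frac{11}{3}\right) && 3 \equiv 11 \equiv 3 \pmod 4 \\
&= \left(\frac{2}{3}\right) && 11 \equiv 2 \pmod 3 \\
&= -1 && \text{using the second supplement.}
\end{aligned}$$

Hence 94 is not a square modulo 257.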


Exercise. Is 1729 a square modulo 2011?

The proof of the first supplement is straightforward.

Proof of the First Supplement. By Euler’s Criterion,

$$\left(\frac{-1}{p}\right) \equiv (-1)^{\frac{p-1}{2}} \pmod p.$$

Thus $\left(\frac{-1}{p}\right) = 1$ if and only if (p − 1)/2 is even. This is the case if and only if p ≡ 1 (mod 4). □

To prove the Law of Quadratic Reciprocity we need Gauss’ Lemma.

Theorem 3.5 (Gauss’ Lemma). Let p be an odd prime and write

$$S = \Big\{1, 2, \dots, \frac{p-1}{2}\Big\}.$$

For an integer n let $\overline{n}$ be the unique integer satisfying $\overline{n} \equiv n \pmod p$ and $-p/2 < \overline{n} < p/2$. Let p ∤ a and let

$$\overline{aS} = \{\overline{as} : s \in S\}.$$

Define µ(a) to be the number of negative members of the set $\overline{aS}$. Then

$$\left(\frac{a}{p}\right) = (-1)^{\mu(a)}.$$

Example 3.4. Let us determine $\left(\frac{3}{11}\right)$ using Gauss’ Lemma. Note that

$$S = \{1, 2, 3, 4, 5\}$$

and

$$3S = \{3, 6, 9, 12, 15\}, \qquad \overline{3S} = \{3, -5, -2, 1, 4\}.$$

Thus µ(3) = 2 and so $\left(\frac{3}{11}\right) = 1$.
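The computation in Example 3.4 is mechanical, so it can be sketched in a few lines of Python. The helper names below (`least_abs_residue`, `gauss_mu`, `legendre_by_gauss`) are hypothetical, and p is assumed to be an odd prime with p ∤ a.

```python
def least_abs_residue(n, p):
    """The unique integer congruent to n modulo the odd number p in (-p/2, p/2)."""
    r = n % p                             # 0 <= r < p
    return r - p if r > p // 2 else r     # p odd, so p // 2 == (p - 1) / 2


def gauss_mu(a, p):
    """Number of negative least absolute residues a*s for s = 1, ..., (p-1)/2."""
    return sum(1 for s in range(1, (p - 1) // 2 + 1)
               if least_abs_residue(a * s, p) < 0)


def legendre_by_gauss(a, p):
    """Legendre symbol (a/p) for an odd prime p not dividing a, via Gauss' Lemma."""
    return (-1) ** gauss_mu(a, p)


if __name__ == "__main__":
    print([least_abs_residue(3 * s, 11) for s in range(1, 6)])
    print(gauss_mu(3, 11), legendre_by_gauss(3, 11))
```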

Proof of Gauss’ Lemma. We will show that $(-1)^{\mu(a)} a^{(p-1)/2} \equiv 1 \pmod p$. Gauss’ Lemma will then follow from Euler’s Criterion.

By definition, µ(a) is the number of negative elements in $\overline{aS}$. Let $|\overline{aS}| = \{|\overline{as}| : s \in S\}$. We claim that $|\overline{aS}| = S$. Let’s assume this for the moment and use it to complete the proof. We will return to prove the claim later on.

Now

$$\begin{aligned}
\prod_{s \in S} s &= \prod_{t \in |\overline{aS}|} t && \text{as } S = |\overline{aS}| \\
&= \prod_{s \in S} |\overline{as}| && \text{by definition of } |\overline{aS}| \\
&= (-1)^{\mu(a)} \prod_{s \in S} \overline{as} && \text{as } \overline{as} = -|\overline{as}| \text{ for precisely } \mu(a) \text{ values of } s \in S \\
&\equiv (-1)^{\mu(a)} \prod_{s \in S} as && \text{since } \overline{as} \equiv as \pmod p \\
&\equiv (-1)^{\mu(a)} a^{(p-1)/2} \prod_{s \in S} s \pmod p && \text{since } \#S = (p-1)/2.
\end{aligned}$$

Cancelling $\prod_{s \in S} s$, which is coprime to p, we obtain the desired conclusion that $(-1)^{\mu(a)} a^{(p-1)/2} \equiv 1 \pmod p$.

It remains to prove our claim that $|\overline{aS}| = S$. Suppose s ∈ S. Then $-p/2 < \overline{as} < p/2$ so $0 \le |\overline{as}| < p/2$. But $\overline{as} \ne 0$ since p ∤ a and p ∤ s. Hence $|\overline{as}| \in S$. This shows that $|\overline{aS}| \subseteq S$. To show that the two sets are equal, we must show that they have the same number of elements. Suppose that s, t ∈ S satisfy $|\overline{as}| = |\overline{at}|$. Then as ≡ ±at (mod p) and so s ≡ ±t (mod p). But −p/2 < s, ±t < p/2, so their difference can’t be divisible by p unless it is 0. Thus s = ±t. But s, t ∈ S so s = t. This shows that $|\overline{aS}|$ has as many elements as S, completing the proof. □

Gauss’ Lemma enables us to prove the second supplement to the Law of Quadratic Reciprocity.

Proof of the Second Supplement. We want to show that

$$\left(\frac{2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1, 7 \pmod 8, \\ -1 & \text{if } p \equiv 3, 5 \pmod 8. \end{cases}$$

Consider the case p ≡ 1 (mod 8); the other cases are similar and are left as an exercise. Then p = 8m + 1 for some integer m. Here (p − 1)/2 = 4m. We will apply Gauss’ Lemma to determine $\left(\frac{2}{p}\right)$. For this we need to compute $\overline{2x}$ where x = 1, 2, ..., 4m. Now for x = 1, 2, ..., 2m we have 0 < 2x < p/2 and so $\overline{2x} = 2x$, which is positive. However, for x = 2m + 1, 2m + 2, ..., 4m we have p/2 < 2x < p and $\overline{2x} = 2x - p$, which is negative. Hence µ(2) = 2m, so by Gauss’ Lemma

$$\left(\frac{2}{p}\right) = 1.$$

□


Proof of the Law of Quadratic Reciprocity. The first proof is due to Gauss, who altogether gave eight different proofs of the law, and there are hundreds of published proofs. The proof we give is due to Eisenstein. It starts with the following trigonometric identity. Let m be an odd positive integer and let

$$S_m = \Big\{1, 2, 3, \dots, \frac{m-1}{2}\Big\}.$$

Then

$$\frac{\sin mx}{\sin x} = (-4)^{(m-1)/2} \prod_{t \in S_m} \Big(\sin^2 x - \sin^2 \frac{2\pi t}{m}\Big). \tag{1}$$

Observe that if u ≡ v (mod p) then 2πu/p and 2πv/p differ by a multiple of 2π and so sin(2πu/p) = sin(2πv/p). Let sgn(u) denote the sign of u so that u = sgn(u)|u|. Then sin(2πu/p) = sgn(u) sin(2π|u|/p). Now for $s \in S_p$,

$$qs \equiv \overline{qs} \equiv \operatorname{sgn}(\overline{qs})\,|\overline{qs}| \pmod p.$$

Thus

$$\sin\frac{2\pi qs}{p} = \operatorname{sgn}(\overline{qs}) \sin\frac{2\pi |\overline{qs}|}{p}.$$

In the notation of Gauss’ Lemma, exactly µ(q) of the $\overline{qs}$ are negative. Hence

$$\left(\frac{q}{p}\right) = \prod_{s \in S_p} \operatorname{sgn}(\overline{qs}).$$

From the last two equations,

$$\prod_{s \in S_p} \sin\frac{2\pi qs}{p} = \left(\frac{q}{p}\right) \prod_{s \in S_p} \sin\frac{2\pi |\overline{qs}|}{p}.$$

However, from the proof of Gauss’ Lemma, $\{|\overline{qs}| : s \in S_p\} = |\overline{qS_p}| = S_p$. Hence

$$\prod_{s \in S_p} \sin\frac{2\pi qs}{p} = \left(\frac{q}{p}\right) \prod_{s \in S_p} \sin\frac{2\pi s}{p},$$

which can be rewritten as

$$\left(\frac{q}{p}\right) = \prod_{s \in S_p} \frac{\sin(2\pi qs/p)}{\sin(2\pi s/p)}.$$

Using the identity (1) with m = q and x = 2πs/p we obtain

$$\begin{aligned}
\left(\frac{q}{p}\right) &= \prod_{s \in S_p} (-4)^{(q-1)/2} \prod_{t \in S_q} \big(\sin^2(2\pi s/p) - \sin^2(2\pi t/q)\big) \\
&= (-4)^{(p-1)(q-1)/4} \prod_{s \in S_p,\, t \in S_q} \big(\sin^2(2\pi s/p) - \sin^2(2\pi t/q)\big),
\end{aligned}$$

as $S_p$ has (p − 1)/2 members. Now interchanging p and q we have

$$\left(\frac{p}{q}\right) = (-4)^{(q-1)(p-1)/4} \prod_{s \in S_p,\, t \in S_q} \big(\sin^2(2\pi t/q) - \sin^2(2\pi s/p)\big).$$

The right-hand sides of the last two equations are identical except for a minus sign for each term in the product. But there are $(\#S_p)(\#S_q) = \frac{(p-1)}{2} \cdot \frac{(q-1)}{2}$ terms in the product. Thus

$$\left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^{\frac{(p-1)}{2} \cdot \frac{(q-1)}{2}},$$

completing the proof. □

Next we are going to look at some applications of the Law of Quadratic Reciprocity.

3. Mersenne Numbers

You have met the Mersenne numbers $M_n = 2^n - 1$ in the homework, and know that if n is composite then so is $M_n$. What if n = q is prime; is $M_q$ necessarily prime? Computing the first few we find

$$M_2 = 3, \quad M_3 = 7, \quad M_5 = 31, \quad M_7 = 127,$$

which are all prime numbers. Now $M_{11} = 2047$. Is it a prime? The following theorem gives us a large supply of Mersenne numbers $M_q$ where q is prime but $M_q$ is composite.

Theorem 3.6. Let q ≡ 3 (mod 4) be a prime such that p = 2q + 1 is also prime. Then p divides $M_q$. In particular, for such q > 3, $M_q$ is composite.

Before proving Theorem 3.6 let us apply it with q = 11. Note that q ≡ 3 (mod 4) and p = 2q + 1 = 23 is prime. Then according to Theorem 3.6, p divides $M_q$, and indeed we find that $M_{11} = 2047 = 23 \times 89$. You can use the same argument to find a factor of $M_q$ for

q = 11, 23, 83, 131, 179, 191, 239, 251, 359, 419, 431, 443, 491, 659, . . .

Proof of Theorem 3.6. Since $q \equiv 3 \pmod{4}$, we have that $p = 2q + 1 \equiv 7 \pmod{8}$. Hence
\[ \left(\frac{2}{p}\right) = 1. \]
But by Euler's Criterion
\[ 2^q = 2^{\frac{p-1}{2}} \equiv \left(\frac{2}{p}\right) = 1 \pmod{p}. \]
Hence $M_q = 2^q - 1$ is divisible by $p$. To prove the last statement in Theorem 3.6, observe that $M_q$ is composite if $M_q > p$. This is the same as $2^q - 1 > 2q + 1$, which is satisfied if $q > 3$. □
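As a quick numerical check of Theorem 3.6, the following sketch tests whether $p = 2q + 1$ divides $M_q$ for the values of $q$ listed above. The function name `mersenne_factor` is our own choice for illustration; it uses modular exponentiation, so $M_q$ itself is never computed.

```python
def mersenne_factor(q):
    """Return p = 2q + 1 if p divides M_q = 2^q - 1, else None."""
    p = 2 * q + 1
    # p | 2^q - 1 is the same as 2^q = 1 (mod p).
    return p if pow(2, q, p) == 1 else None

qs = [11, 23, 83, 131, 179, 191, 239, 251, 359, 419, 431, 443, 491, 659]
factors = {q: mersenne_factor(q) for q in qs}
```

For instance `factors[11]` is 23, matching $M_{11} = 23 \times 89$.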


4. A Diophantine Equation

A Diophantine equation is one where we are interested in integer solutions. It can be very hard to determine all the solutions of a Diophantine equation (e.g. Fermat's Last Theorem). However, quadratic reciprocity can sometimes be used to show that there are no solutions. Here is an example.

Theorem 3.7. The equation
\[ y^2 = x^3 - 5 \]
has no solutions with $x, y \in \mathbb{Z}$.

Proof. We proceed by contradiction. Suppose that $x, y \in \mathbb{Z}$ satisfy $y^2 = x^3 - 5$. If $x$ is even then $y^2 \equiv -5 \equiv 3 \pmod{4}$, which is impossible as the squares modulo 4 are 0 and 1. Thus $x$ is odd. Now rewrite the equation as
\[ y^2 + 4 = x^3 - 1 = (x - 1)(x^2 + x + 1). \]
Note that $x^2 + x + 1$ = odd + odd + odd and so is odd. Let $p$ be a prime divisor of $x^2 + x + 1$. Then $p \mid (y^2 + 4)$ and so $y^2 \equiv -4 \pmod{p}$. Hence
\[ \left(\frac{-1}{p}\right) = 1. \]
Thus $p \equiv 1 \pmod{4}$. As this is true of all prime divisors of $x^2 + x + 1$ we have
\[ x^2 + x + 1 \equiv 1 \pmod{4}. \]
If $x \equiv 1 \pmod{4}$ then $x^2 + x + 1 \equiv 3 \pmod{4}$, giving a contradiction. Hence $x \equiv 3 \pmod{4}$. Hence $y^2 \equiv x^3 - 5 \equiv 3 - 5 \equiv 2 \pmod{4}$, which is impossible. □
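The theorem can be sanity-checked by a brute-force search over a finite range. This is only an illustration and proves nothing beyond the range searched; the function name `solutions` and the bound are our own choices.

```python
from math import isqrt

def solutions(bound):
    """All (x, y) with 2 <= x <= bound, y >= 0 and y^2 = x^3 - 5."""
    found = []
    for x in range(2, bound + 1):    # x^3 - 5 is negative for x <= 1
        rhs = x**3 - 5
        y = isqrt(rhs)
        if y * y == rhs:
            found.append((x, y))
    return found
```

Calling `solutions(2000)` returns an empty list, as Theorem 3.7 predicts.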

CHAPTER 4

Algorithms

In this chapter we will study algorithms to compute square roots modulo a given odd prime $p$. We will also study some factoring methods, including the Quadratic Sieve.

In previous chapters we have already seen some algorithms:

• Euclid's algorithm: (i) to compute the gcd of two numbers, (ii) to compute inverses modulo $n$.
• Chinese Remainder Theorem: to solve simultaneous linear congruences.
• Primitive roots modulo a prime $p$: write $p - 1 = q_1^{r_1} q_2^{r_2} \cdots q_k^{r_k}$. To compute a primitive root modulo $p$, either
  (i) for each $1 \le i \le k$ find an element $g_i$ of order $q_i^{r_i}$ in $U_p$; then $\prod_{i=1}^{k} g_i$ is a primitive root; or
  (ii) through a process of iteration find an element $g \in U_p$ such that $g^{(p-1)/q_i} \ne 1$ for all $i$. Then $g$ is a primitive root.

Example 4.1. We find a primitive root modulo 19. Write $19 - 1 = 2 \times 3^2$. We need to find $g_1, g_2 \in U_{19}$ of order 2 and 9 respectively. Take $g_1 = -1$; it has order 2. Note $2^9 = 512 \equiv -1 \pmod{19}$, hence 4 has order 9 modulo 19. Take $g_2 = 4$. Hence $-4 \equiv 15$ is a primitive root modulo 19.
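Both methods for finding primitive roots are easy to check by machine for small primes. The sketch below uses hypothetical helper names `order` and `is_primitive_root`; the list of prime factors of $p - 1$ is supplied by the caller rather than computed.

```python
def order(g, p):
    """Multiplicative order of g modulo the prime p (naive loop)."""
    k, x = 1, g % p
    while x != 1:
        x = x * g % p
        k += 1
    return k

def is_primitive_root(g, p, prime_factors):
    """Method (ii): g^((p-1)/q) != 1 for every prime q dividing p - 1."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors)
```

With $p = 19$, `order(18, 19)` and `order(4, 19)` give the orders of $g_1 = -1$ and $g_2 = 4$, and `is_primitive_root(15, 19, [2, 3])` confirms the conclusion of the example.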

In the previous chapter we proved LQR, which is a powerful tool to decide whether a given integer is a square modulo a given prime. Note that every number is a square modulo 2.

One can easily see by using LQR and the second supplement that
\[ \left(\frac{22}{97}\right) = \left(\frac{2}{97}\right)\left(\frac{11}{97}\right) = \left(\frac{97}{11}\right) = \left(\frac{9}{11}\right) = 1, \]
so 22 is a square modulo 97. The question is to find square roots of 22 modulo 97, i.e. to find $x \in \mathbb{Z}$ such that $x^2 \equiv 22 \pmod{97}$. In order to do this we will use the algorithm of Tonelli-Shanks.

1. Tonelli-Shanks

Let $p$ be an odd prime. For integers $a$ with $\left(\frac{a}{p}\right) = 1$, we want to find square roots of $a$ modulo $p$. Write $p - 1 = 2^e \cdot q$ with $q$ odd. Since $U_p$ is a group of order $2^e \cdot q$, by Sylow's Theorem¹, $U_p$ has a 2-Sylow subgroup, say $H$, of order $2^e$. Since $U_p$ is cyclic, $H$ is also cyclic. Let $H = \langle z \rangle$ for some $z$ of order $2^e$. Note that the squares in $H$ are precisely the elements of order dividing $2^{e-1}$, and are also the even powers of $z$. Since $\left(\frac{a}{p}\right) = 1$, by Euler's criterion $(a^q)^{2^{e-1}} = a^{\frac{p-1}{2}} \equiv 1 \pmod{p}$, and so $b = a^q \pmod{p}$ is a square in $H$. Thus $1/b$ is a square in $H$ and therefore there exists an even integer $k$ with $0 \le k < 2^e$ such that $1/b = z^k$, i.e., $a^q z^k = 1$ in $H$. Let $x = a^{\frac{q+1}{2}} z^{k/2}$. Then $x^2 = a^{q+1} z^k = a$ in $H$. Thus $x^2 \equiv a \pmod{p}$.

Thus we need to find (i) a generator $z$ of $H$ and (ii) the exponent $k$.

¹ Sylow's theorem: Let $G$ be a finite group such that $p^k \,\|\, \#G$. Then there exists a subgroup $H$ of $G$ such that $\#H = p^k$. $H$ is called a $p$-Sylow subgroup of $G$.

Exercise: $n$ is a quadratic non-residue modulo $p$ iff $n^q$ is a generator of $H$.

Algorithm 4.1 (Tonelli-Shanks algorithm). Let $p$ be an odd prime and let $a$ be an integer not divisible by $p$.

Step 1: Write $p - 1 = 2^e \cdot q$ where $q$ is odd.

Step 2: Find a quadratic non-residue $n$ modulo $p$.

Step 3: Let $z, x, b, r \in \mathbb{Z}$ be such that
\[ z \equiv n^q \pmod{p}, \quad x \equiv a^{\frac{q+1}{2}} \pmod{p}, \quad b \equiv a^q \pmod{p}, \quad r = e. \]

Step 4: If $b \equiv 1 \pmod{p}$, then output $x$ and terminate the algorithm. Else iterate the following until we obtain $b \equiv 1 \pmod{p}$.

• Find the least $m \ge 1$ such that $b^{2^m} \equiv 1 \pmod{p}$. If $m = r$, then output the message "a is a quadratic non-residue" and stop. Else $m < r$.
• Let $t = z^{2^{r-m-1}} \pmod{p}$, and let
\[ x \leftarrow tx \pmod{p}, \quad b \leftarrow bt^2 \pmod{p}, \quad z \leftarrow t^2 \pmod{p}, \quad r \leftarrow m. \]

(Note: here "$a \leftarrow b$" denotes "replace $a$ by $b$".)

Example 4.2. Find $x \in \mathbb{Z}$ such that $x^2 \equiv 22 \pmod{97}$.

Step 1: Write $p - 1 = 96 = 2^5 \cdot 3$. Let $e = 5$, $q = 3$ and $a = 22$.

Step 2: We choose $n = 5$, since
\[ \left(\frac{5}{97}\right) = \left(\frac{97}{5}\right) = \left(\frac{2}{5}\right) = -1. \]

Step 3: Let
\[ z = 5^3 \equiv 28 \pmod{97}, \quad x = 22^2 \equiv 96 \equiv -1 \pmod{97}, \quad b = 22^3 \equiv -22 \equiv 75 \pmod{97}, \quad r = 5. \]

Step 4: Since $b \not\equiv 1 \pmod{97}$, we compute the least $m \ge 1$ such that $b^{2^m} \equiv 1 \pmod{97}$. Since
\[ b^2 = 22^6 = (22^2)^3 \equiv -1 \pmod{97}, \qquad b^4 \equiv 1 \pmod{97}, \]
we have $m = 2 < 5 = r$. Thus we proceed with the iteration. Let
\[ t = z^{2^{r-m-1}} = (28^2)^2 = 28^4 \equiv 64 \pmod{97}, \]
\[ x \leftarrow tx = 64 \cdot 96 \equiv -64 \equiv 33 \pmod{97}, \]
\[ b \leftarrow bt^2 = 75 \cdot 64^2 \equiv 1 \pmod{97}. \]

Since $b \equiv 1 \pmod{97}$, the process terminates and we get that $x \equiv 33 \pmod{97}$ is a solution. Thus the two square roots are 33 and $-33 \equiv 64$ modulo 97.
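The steps of the Tonelli-Shanks algorithm translate almost line by line into code. The following is a minimal sketch, not an optimised implementation: the function name `tonelli_shanks` is ours, $p$ is assumed to be an odd prime, and the non-residue $n$ is found by trying $n = 2, 3, \ldots$ with Euler's criterion (one possible choice for Step 2).

```python
def tonelli_shanks(a, p):
    """Return x with x^2 = a (mod p), or None if a is a quadratic non-residue."""
    a %= p
    if a == 0:
        return 0
    # Step 1: p - 1 = 2^e * q with q odd.
    e, q = 0, p - 1
    while q % 2 == 0:
        e, q = e + 1, q // 2
    # Step 2: smallest quadratic non-residue n (Euler's criterion).
    n = 2
    while pow(n, (p - 1) // 2, p) != p - 1:
        n += 1
    # Step 3: initial values.
    z, x, b, r = pow(n, q, p), pow(a, (q + 1) // 2, p), pow(a, q, p), e
    # Step 4: iterate until b = 1.
    while b != 1:
        m, c = 1, b * b % p          # invariant: c = b^(2^m)
        while c != 1 and m < r:
            c = c * c % p
            m += 1
        if m == r:
            return None              # a is a quadratic non-residue
        t = pow(z, 1 << (r - m - 1), p)
        x, b, z, r = x * t % p, b * t * t % p, t * t % p, m
    return x
```

For $p = 97$ the search in Step 2 also arrives at $n = 5$, so `tonelli_shanks(22, 97)` follows the same computation as Example 4.2.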

2. Fermat’s Factorization

A naive way to find a factor of a given number $n$ is to check whether it is divisible by any prime $p$ with $1 < p \le \sqrt{n}$. The first improvement over this naive method was given by Fermat, and is essentially based on the following lemma.

Lemma 4.2. Every positive odd number $n$ can be written as $n = x^2 - y^2$, where $x, y \in \mathbb{Z}$ and $x > y \ge 0$.

Proof. Write $n = ab$ where $1 \le b \le a$. Then we can write
\[ n = \left(\frac{a+b}{2}\right)^2 - \left(\frac{a-b}{2}\right)^2. \]
Since $n$ is odd, both $a$ and $b$ are odd. Hence both $\frac{a+b}{2}$ and $\frac{a-b}{2}$ are integers. It is clear that $\frac{a+b}{2} > \frac{a-b}{2} \ge 0$. □

Exercise: Let $n$ be a positive odd integer. Show that if $n = x^2 - y^2$ where $x, y \in \mathbb{Z}$ and $x > y \ge 0$, then $\lceil \sqrt{n} \rceil \le x \le \frac{n+1}{2}$.

Algorithm 4.3 (Fermat's factorization method). Given a positive odd integer $n$, we want to find $x$ and $y$ such that $n = x^2 - y^2$. To do this we consider the values $x^2 - n$ as $x$ varies between $\lceil \sqrt{n} \rceil$ and $\frac{n+1}{2}$ until we obtain a value of $x$ such that $x^2 - n$ is a square.

Step 1: Let $s = \lceil \sqrt{n} \rceil$ (i.e. $s$ is the smallest integer such that $s^2 \ge n$).

Step 2: Define $f(d) = (s + d)^2 - n$.

Step 3: Iterate $d = 0, 1, 2, \ldots$ until $f(d)$ is a square. Note that the process will stop once we have $s + d = \frac{n+1}{2}$, since then $f(d) = \left(\frac{n+1}{2}\right)^2 - n = \left(\frac{n-1}{2}\right)^2$.

• If we find $d$ such that $s + d < \frac{n+1}{2}$ and $f(d)$ is a square, say $y^2$, then $n = (s + d)^2 - y^2 = (s + d - y)(s + d + y)$ gives a non-trivial factorization of $n$ (by the exercise above), showing that $n$ is composite.
• Otherwise, $n$ is a prime.

Example 4.3. Let $n = 221$. Let $s = \lceil \sqrt{n} \rceil = 15$ (since $15^2 = 225$). So set $f(d) = (15 + d)^2 - 221$. Then
\[ f(0) = (15 + 0)^2 - 221 = 225 - 221 = 4 = 2^2. \]
Thus $15^2 - 221 = 2^2$, i.e. $221 = 15^2 - 2^2 = (15 - 2)(15 + 2) = 13 \cdot 17$.
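Fermat's method is short enough to write out in full. In this sketch the name `fermat_factor` is our own, and $n$ is assumed to be odd and greater than 1; the loop always terminates by $x = (n+1)/2$, where it returns the trivial factorization.

```python
from math import isqrt

def fermat_factor(n):
    """Return (x - y, x + y) for the first x >= ceil(sqrt(n)) with x^2 - n = y^2."""
    s = isqrt(n)
    if s * s < n:
        s += 1                       # s = ceil(sqrt(n))
    d = 0
    while True:
        x = s + d
        y2 = x * x - n               # this is f(d)
        y = isqrt(y2)
        if y * y == y2:
            return x - y, x + y      # equals (1, n) exactly when n is prime
        d += 1
```

Here `fermat_factor(221)` reproduces Example 4.3, stopping already at $d = 0$.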

3. Quadratic Sieve

Let $n \ge 1$ be composite. In order to factor $n$, in Fermat's factorization method we look for integers $x$ and $y$ such that $n = x^2 - y^2$. In other words, we search for an integer $x$ such that $x^2 - n$ is a square, where $\lceil \sqrt{n} \rceil \le x \le \frac{n+1}{2}$. But as $n$ becomes large, this process becomes slow as we need to try many values of $x$. We can instead look for $x$ and $y$ such that $x^2 - y^2 \equiv 0 \pmod{n}$, i.e., $(x - y)(x + y) \equiv 0 \pmod{n}$. If $x \not\equiv \pm y \pmod{n}$, then it follows that $\gcd(x - y, n)$ and $\gcd(x + y, n)$ are proper divisors of $n$. Hence we obtain a nontrivial factorization of $n$. Using this idea, we have the following method.

Algorithm 4.4 (Quadratic sieve). Let $n \ge 1$ be composite.

Step 1: Pick a set of primes $B = \{p_1, p_2, \ldots, p_r\}$ such that $n$ is a square mod $p_i$ for all $i = 1, 2, \ldots, r$. Such a set $B$ is called a factor base. A number is called $B$-smooth if all of its prime factors are in $B$.

Step 2: Let $s = \lceil \sqrt{n} \rceil$ and set $f(d) = (s + d)^2 - n$. Set a "sieve interval" $[-D, D]$. We will be looking at $f(d)$ for $-D \le d \le D$. Let $V = [f(-D), \ldots, f(0), \ldots, f(D)]$. We want to find $\delta_1, \delta_2, \ldots, \delta_t$ in $[-D, D]$ such that $f(\delta_1) f(\delta_2) \cdots f(\delta_t)$ is a square in $\mathbb{Z}$. For this we first find the entries of $V$ that are $B$-smooth by a process of sieving.

Step 3 (Sieve): For each $p_i$ in $B$, apply the Tonelli-Shanks algorithm to get solutions $r_1$ and $r_2$ to the congruence $x^2 \equiv n \pmod{p_i}$. Then
\[ f(d) = (s + d)^2 - n \equiv 0 \pmod{p_i} \iff d \equiv r_1 - s \text{ or } r_2 - s \pmod{p_i} \iff d \equiv a_1 \text{ or } a_2 \pmod{p_i}, \]
where $a_1 = r_1 - s$, $a_2 = r_2 - s$. For $-D \le d \le D$, if $d \equiv a_1$ or $a_2 \pmod{p_i}$, then divide $f(d)$ by $p_i^{\operatorname{ord}_{p_i} f(d)}$ and record this power in a separate list.

After repeating this for each $p_i$, any entry of $V$ that eventually becomes 1 or $-1$ is a $B$-smooth number; let us call them $f(d_1), f(d_2), \ldots, f(d_k)$. Since we recorded the powers earlier, we already know the factorization of $f(d_i)$, i.e., we know $\operatorname{ord}_{p_j} f(d_i)$ for each $j = 1, \ldots, r$ and $i = 1, \ldots, k$.

Step 4: Form a $k \times (r + 1)$ matrix $A$ whose $(i, j)$-th entry, for $1 \le i \le k$, equals
\[ \begin{cases} \operatorname{ord}_{p_j} f(d_i) \pmod{2} & \text{for } 1 \le j \le r, \\ 1 \text{ if } f(d_i) < 0, \quad 0 \text{ if } f(d_i) > 0 & \text{for } j = r + 1. \end{cases} \]
Consider the following system of linear equations over $\mathbb{Z}/2\mathbb{Z}$:
\[ A^T \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}. \]
Check that the rank of $A^T$ is less than $k$ and hence obtain a non-trivial solution, say $(y_1, \ldots, y_k)^T$, to the above system of equations. Then $\prod_{i=1}^{k} f(d_i)^{y_i}$ is a square in $\mathbb{Z}$, say $Y^2$. Also, let $X = \prod_{i=1}^{k} (s + d_i)^{y_i}$. Since $f(d_i) \equiv (s + d_i)^2 \pmod{n}$, we have $X^2 = \prod_{i=1}^{k} (s + d_i)^{2y_i} \equiv Y^2 \pmod{n}$. Compute $\gcd(X - Y, n)$ and $\gcd(X + Y, n)$ to find a proper divisor of $n$. If we do not find a proper divisor, then try different relations (or change $B$ etc.) and repeat the process.

Example 4.4. Factorise 31309 using the Quadratic Sieve.

To do this we first look for a factor base. Note that 2, 3, 5, 11 are the first few primes such that 31309 is a square modulo these primes. So let $B = \{2, 3, 5, 11\}$. Let $s = \lceil \sqrt{31309} \rceil = 177$ and $f(d) = (177 + d)^2 - 31309$. Consider the sieve interval $[0, 5]$ and let $V = [f(0), f(1), f(2), f(3), f(4), f(5)] = [20, 375, 732, 1091, 1452, 1815]$.

For $p = 2$, clearly 1 is the only solution to $x^2 \equiv 31309 \pmod{2}$. So $f(d) \equiv 0 \pmod{2} \iff d \equiv 1 - 177 \equiv 0 \pmod{2}$. Dividing each even-indexed entry of $V$ by the highest power of 2, we get $[5, 375, 183, 1091, 363, 1815]$.

Next for $p = 3$, we get that 1 and 2 are the square roots of 31309 modulo 3. So $f(d) \equiv 0 \pmod{3} \iff d \equiv 1 - 177$ or $2 - 177 \equiv 1$ or $2 \pmod{3}$. Dividing each such entry of $V$ by the highest power of 3, we get $[5, 125, 61, 1091, 121, 605]$.

Next for $p = 5$, we get that 2 and 3 are the square roots of 31309 modulo 5. So $f(d) \equiv 0 \pmod{5} \iff d \equiv 2 - 177$ or $3 - 177 \equiv 0$ or $1 \pmod{5}$. Dividing each such entry of $V$ by the highest power of 5, we get $[1, 1, 61, 1091, 121, 121]$.

Finally for $p = 11$, we get that 5 and 6 are the square roots of 31309 modulo 11. So $f(d) \equiv 0 \pmod{11} \iff d \equiv 5 - 177$ or $6 - 177 \equiv 4$ or $5 \pmod{11}$. Dividing each such entry of $V$ by the highest power of 11, we finally get $[1, 1, 61, 1091, 1, 1]$.

Thus $f(0)$, $f(1)$, $f(4)$ and $f(5)$ are $B$-smooth and
\[ f(0) = 2^2 \cdot 5, \quad f(1) = 3 \cdot 5^3, \quad f(4) = 2^2 \cdot 3 \cdot 11^2, \quad f(5) = 3 \cdot 5 \cdot 11^2, \]
which we can record as the following table of exponents.

        2    3    5    11   -1
f(0)    2    0    1    0    0
f(1)    0    1    3    0    0
f(4)    2    1    0    2    0
f(5)    0    1    1    2    0

Reducing the exponents modulo 2, we want to find a non-trivial solution of the following system over $\mathbb{Z}/2\mathbb{Z}$:
\[ \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \]
Clearly the rank of the matrix on the left is less than 4, and $(0, 1, 0, 1)^T$ is a non-trivial solution. Thus $f(1) f(5)$ is a square and $f(1) f(5) = 3^2 \cdot 5^4 \cdot 11^2$. Hence $3^2 \cdot 5^4 \cdot 11^2 = f(1) f(5) \equiv (1 + 177)^2 (5 + 177)^2 \pmod{31309}$ and $\gcd(178 \cdot 182 + 3 \cdot 5^2 \cdot 11, 31309) = \gcd(33221, 31309) = 239$. Further $\gcd(178 \cdot 182 - 3 \cdot 5^2 \cdot 11, 31309) = \gcd(31571, 31309) = 131$. Indeed $31309 = 239 \cdot 131$.
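The arithmetic of this example can be reproduced in a few lines. The sketch below is a simplification: it finds the $B$-smooth values by trial division over $B$ rather than by sieving, and it hard-codes the relation $f(1) f(5)$ instead of solving the linear system. The helper name `exponents` is our own.

```python
from math import gcd, isqrt

n = 31309
B = [2, 3, 5, 11]
s = isqrt(n) + 1                     # ceil(sqrt(n)), valid since n is not a square

def exponents(m):
    """Exponent vector of m over B, or None if m is not B-smooth."""
    exps = []
    for p in B:
        k = 0
        while m % p == 0:
            m //= p
            k += 1
        exps.append(k)
    return exps if m == 1 else None

# f(d) = (s + d)^2 - n on the sieve interval [0, 5]
relations = {d: exponents((s + d) ** 2 - n) for d in range(6)}
smooth = {d: e for d, e in relations.items() if e is not None}

# Relation used in the example: f(1) * f(5) = Y^2.
total = [u + v for u, v in zip(smooth[1], smooth[5])]
Y = 1
for p, k in zip(B, total):
    Y *= p ** (k // 2)
X = (s + 1) * (s + 5)
factors = gcd(X + Y, n), gcd(X - Y, n)
```

The dictionary `smooth` has keys 0, 1, 4, 5, and `factors` holds the two proper divisors found in the example.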

Reading Suggestion: A Tale of Two Sieves, Carl Pomerance.
This article is an excellent read on the history of factoring algorithms and provides a beautiful exposition of the Quadratic Sieve and the Number Field Sieve.

CHAPTER 5

Introduction to Public Key Cryptography

Aims of cryptography

(1) Privacy: Encryption and Decryption of a message.
(2) Authenticity: Digital Signature.

Suppose Alice wants to send Bob a secret message. One way is to use symmetric key cryptography:

(key) Alice $\xrightarrow{\text{locked message}}$ Bob (key)

Here they share keys for the same lock. For this they need to share their keys in advance.

Another way is to use public key cryptography. The following are the main steps:

Step 0 (KeyGen): Alice and Bob each generate their own public key PK and private key SK. Public keys are known to everyone, but each private key is kept secret by its owner.

Now suppose Alice wants to send Bob a message M.

Step 1 (Encryption): Alice obtains Bob's public key PK. Alice uses it to encrypt M and sends the ciphertext C to Bob.

Step 2 (Decryption): Bob uses his SK to decrypt C to obtain the original message M.

In other words, we need to have a system with

Decrypt(Encrypt(M, PK), SK) = M.

Here Step 1 and Step 2 are for private communication, but there is the following problem. Since the public key PK is known to everyone, some third person may pose as Alice and send a message to Bob. But Bob wants to make sure that the message is indeed sent by Alice. This leads to the use of a digital signature: Alice creates a signature s by using her SK and sends it together with C. Bob then uses Alice's PK to authenticate the signature s and hence confirms that the ciphertext C is indeed sent by Alice.

We will look at the following two public key cryptosystems:

(1) RSA
(2) Diffie-Hellman key exchange


1. RSA

KeyGen: Each user chooses two large primes $p$ and $q$ of similar size. Let $N = pq$. Then $\varphi(N) = (p-1)(q-1)$. Choose $e \in \mathbb{Z}$ such that $\gcd(e, \varphi(N)) = 1$. Compute $d \in \mathbb{Z}$ such that $ed \equiv 1 \pmod{\varphi(N)}$. This generates

• Public key PK = $(N, e)$,
• Private key SK = $d$.

Encrypt(M, (N, e)): Let $(N, e)$ be the PK of the recipient, Bob. Let M be a message and assume $M < N$. Compute $C = M^e \pmod{N}$ and transmit the ciphertext C.

Decrypt(C, d): The recipient Bob uses his SK $d$ to compute $C^d \pmod{N}$ and gets M, since
\[ C^d \equiv M^{ed} \equiv M \pmod{N} \]
by Lemma 5.1 below and since $M < N$.

Lemma 5.1. If $\gcd(M, N) = 1$, then $M^{ed} \equiv M \pmod{N}$. If $\gcd(M, N) \ne 1$, then also $M^{ed} \equiv M \pmod{N}$.

Proof. If $\gcd(M, N) = 1$, then we have $M^{\varphi(N)} \equiv 1 \pmod{N}$. Since $ed \equiv 1 \pmod{\varphi(N)}$,
\[ M^{ed} \equiv M^{1 + \varphi(N)t} \equiv M (M^{\varphi(N)})^t \equiv M \pmod{N} \quad \text{for some } t \in \mathbb{Z}. \]

Now suppose $\gcd(M, N) \ne 1$. Since $N = pq$, we have $\gcd(M, N) = p$, $q$ or $N$. If $\gcd(M, N) = N$, then clearly $M^{ed} \equiv 0 \equiv M \pmod{N}$. Thus WLOG assume that $\gcd(M, N) = p$. Then we can write $M = pM_1$ for some $M_1 \in \mathbb{Z}$ such that $\gcd(M_1, q) = 1$. Hence
\[ M^{ed} - M \equiv (pM_1)^{ed} - pM_1 \equiv pM_1\left((pM_1)^{ed-1} - 1\right) = pM_1\left((pM_1)^{(p-1)(q-1)t} - 1\right) \pmod{N} \tag{2} \]
for some $t \in \mathbb{Z}$, where the last equality follows since $ed - 1 \equiv 0 \pmod{\varphi(N)}$ and $\varphi(N) = (p-1)(q-1)$. Now since $\gcd(pM_1, q) = 1$, we have $(pM_1)^{q-1} \equiv 1 \pmod{q}$. Thus
\[ (pM_1)^{(p-1)(q-1)t} - 1 \equiv 0 \pmod{q}. \tag{3} \]
From (2) and (3), it follows that $M^{ed} - M \equiv 0 \pmod{pq = N}$, i.e., $M^{ed} \equiv M \pmod{N}$. □
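A toy version of RSA makes both cases of Lemma 5.1 easy to test exhaustively. The primes 61 and 53 and the exponent 17 below are illustrative choices that do not come from the notes, and the function names are our own; real keys use primes hundreds of digits long. The three-argument `pow` with exponent $-1$ needs Python 3.8 or later.

```python
from math import gcd

def keygen(p, q, e):
    """Toy RSA key generation from distinct primes p, q and a public exponent e."""
    N, phi = p * q, (p - 1) * (q - 1)
    assert gcd(e, phi) == 1
    d = pow(e, -1, phi)              # d with e*d = 1 (mod phi(N))
    return (N, e), d

def encrypt(M, pk):
    N, e = pk
    return pow(M, e, N)

def decrypt(C, d, N):
    return pow(C, d, N)

pk, d = keygen(61, 53, 17)
N = pk[0]
```

Decrypting `encrypt(M, pk)` returns `M` for every `0 <= M < N`, including the messages that share a factor with `N`.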


2. Digital Signature Using RSA

Suppose Alice is sending a message M to Bob. Alice sends her digital signature $s$ together with M. Let Alice's public key be $(N, e)$ and her private key be $d$. She does the following:

sign(M, d):

• Compute $s = M^d \pmod{N}$.
• Send $(C, s)$ to Bob, where C is the encryption of M under Bob's public key.

Once Bob receives $(C, s)$, he does the following:

verify((C, s), (N, e)): Decrypt C to obtain M using his SK.

• Compute $s^e \pmod{N}$.
• Check whether $M \equiv s^e \pmod{N}$. If yes, then the message was indeed sent by Alice.
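The sign and verify steps can be sketched as follows, leaving out the encryption of M to Bob. The key $(N, e, d) = (3233, 17, 2753)$ is a hypothetical toy key for Alice built from the primes 61 and 53, and the function names are our own.

```python
def sign(M, d, N):
    """Textbook RSA signature s = M^d mod N under the signer's private key d."""
    return pow(M, d, N)

def verify(M, s, pk):
    """Accept iff s^e = M (mod N) under the signer's public key (N, e)."""
    N, e = pk
    return pow(s, e, N) == M % N

# Hypothetical toy key for Alice: N = 61 * 53, e = 17, d = 2753.
alice_pk, alice_d = (3233, 17), 2753
s = sign(65, alice_d, alice_pk[0])
```

Here `verify(65, s, alice_pk)` succeeds, while the same signature attached to a different message is rejected.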

Note:

• RSA security relies on the hardness of factoring a number $N$ that is a product of two very large primes $p$ and $q$ of similar size.
• In practice RSA is used mainly for transporting symmetric keys and for digital signatures.
• One would like to encrypt text messages written in an alphabet. To do that we need to convert a text message to an integer M. One standard procedure is to use the following assignment:
\[ A = 00, \ B = 01, \ C = 02, \ \ldots, \ Z = 25. \]
The integer M so obtained will most likely be greater than $N$. So M is partitioned into smaller blocks: $M = M_1 M_2 \ldots M_r$ such that each $M_i < N$.
• In 1994, RSA-129 (129-digit $N$) was factorized by using the Quadratic Sieve method with a factor base consisting of 524339 primes, using 1600 computers.
• Commonly used RSA uses $N$ ranging from 1024 bits to 2048 bits (i.e. 309 to 617 decimal digits). In 2009, RSA-768 (768-bit $N$, i.e., 232-digit $N$) was broken. There is a US$ 200000 cash prize for breaking RSA-2048!!

3. Diffie-Hellman Key Exchange

Suppose Alice and Bob want to share a secret random key $K$. Assume that they know a group $G$ and $g \in G$ of order $r$.

• Alice chooses a random $0 < a < r$ and sends $c_1 = g^a$ to Bob.
• Bob chooses a random $0 < b < r$ and sends $c_2 = g^b$ to Alice.
• Alice computes $(c_2)^a = g^{ab}$ and Bob computes $(c_1)^b = g^{ab}$. Then $K = g^{ab}$ will be their shared key.

Note that unlike RSA, all users share the same group $G$ and $g \in G$, but the value of the shared key $K$ is randomised.

This is based on the discrete logarithm problem (DLP): given $g$ and $g^a$, find $a$. In this case, the problem is to find $g^{ab}$ given the triplet $(g, g^a, g^b)$. Thus for Diffie-Hellman key exchange one needs to consider groups for which DLP is hard.
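The exchange itself is only a few modular exponentiations. The sketch below assumes the toy group $G = (\mathbb{Z}/43\mathbb{Z})^\times$ with $g = 3$ of order $r = 42$; these parameters are far too small for real use, and the variable names are our own.

```python
import secrets

p, g = 43, 3                          # toy parameters; g = 3 has order 42 mod 43
r = p - 1

a = 1 + secrets.randbelow(r - 1)      # Alice's secret, 0 < a < r
b = 1 + secrets.randbelow(r - 1)      # Bob's secret,   0 < b < r

c1 = pow(g, a, p)                     # sent from Alice to Bob
c2 = pow(g, b, p)                     # sent from Bob to Alice

K_alice = pow(c2, a, p)               # (g^b)^a
K_bob = pow(c1, b, p)                 # (g^a)^b
```

Both parties end up with the same value $g^{ab}$, while an eavesdropper sees only $g$, $c_1$ and $c_2$.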

Note:
1. For $G = (\mathbb{Z}/p\mathbb{Z}, +)$ for some prime $p$, DLP is easy since it can be solved with the Euclidean algorithm.
2. If $N$ is composite then DLP for $G = ((\mathbb{Z}/N\mathbb{Z})^\times, \cdot)$ is at least as hard as factoring.

Exercise. Suppose $N$ is a product of two primes. Suppose you can solve DLP for $((\mathbb{Z}/N\mathbb{Z})^\times, \cdot)$. Then find the factors of $N$.

Elgamal Encryption System.

Suppose Alice wants to send a message M to Bob.

KeyGen: Choose a prime $p$, and let $G = U_p = (\mathbb{Z}/p\mathbb{Z})^\times$. Compute a primitive root $g$ of $G$. Bob chooses a random $b$ such that $2 \le b \le p - 2$ and computes $h \equiv g^b \pmod{p}$, and generates his

• Public key PK = $(p, g, h)$,
• Private key SK = $b$.

Encrypt(M, PK): Let M be a message and assume that $M \le p - 1$.

• Alice obtains Bob's PK = $(p, g, h)$.
• Alice chooses a random $2 \le a \le p - 2$ and computes $c_1 = g^a \pmod{p}$ and $c_2 = M h^a \pmod{p}$.
• Alice sends the ciphertext $(c_1, c_2)$ to Bob.

Decrypt((c_1, c_2), b): Bob uses his secret key to compute $c_2 c_1^{p-1-b}$ to obtain M, since
\[ c_2 c_1^{p-1-b} = M h^a (g^a)^{p-1-b} = M g^{ba} g^{a(p-1) - ab} \equiv M (g^{p-1})^a \equiv M \pmod{p}. \]

Note that Elgamal encryption is Diffie-Hellman key exchange followed by symmetric encryption. In the above both Alice and Bob compute the shared key $K = g^{ab}$, which is then used for the symmetric encryption $M \to MK$ by Alice. Bob then computes $(MK)K^{-1}$ to obtain M. This is an example of hybrid encryption.

Digital signature using Elgamal: Bob wants to make sure that the message M is indeed coming from Alice. Alice uses her PK = $(p, g, h')$, where $h' = g^a$, and SK = $a$ to create a signature as follows.

• Alice chooses a random $1 \le j \le p - 1$ such that $\gcd(j, p - 1) = 1$.
• Compute $s = g^j \pmod{p}$.
• Compute $t$ such that $jt + as \equiv M \pmod{p - 1}$ with $1 \le t \le p - 2$. The pair $(s, t)$ is the signature for the message M.
• Alice sends Bob $(C, (s, t))$ where C is a ciphertext for M that Alice obtains using Bob's PK.
• Bob decrypts C using his SK to get M. He computes
\[ V_1 = h'^s s^t \pmod{p}, \qquad V_2 = g^M \pmod{p}. \]
The signature is verified when $V_1 = V_2$. Note that this equality should hold since
\[ V_1 = h'^s s^t = g^{as} g^{jt} = g^{as + jt} \equiv g^{M + (p-1)k} \equiv g^M = V_2 \pmod{p} \quad \text{for some } k \in \mathbb{Z}. \]
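The signing equation $jt + as \equiv M \pmod{p-1}$ is solved for $t$ by inverting $j$ modulo $p - 1$. In the sketch below the encryption step is left out, the function names are our own, and Alice's secret key $a = 23$ and nonce $j = 5$ are hypothetical values chosen for illustration in the toy group with $p = 43$, $g = 3$.

```python
from math import gcd

def elgamal_sign(M, p, g, a, j):
    """Sign M with secret key a and nonce j, where gcd(j, p - 1) = 1."""
    assert gcd(j, p - 1) == 1
    s = pow(g, j, p)
    # Solve j*t + a*s = M (mod p - 1) for t.
    t = (M - a * s) * pow(j, -1, p - 1) % (p - 1)
    return s, t

def elgamal_verify(M, sig, p, g, h):
    """Check h^s * s^t = g^M (mod p), where h = g^a is the signer's public value."""
    s, t = sig
    return pow(h, s, p) * pow(s, t, p) % p == pow(g, M, p)

p, g = 43, 3
a = 23                                # hypothetical secret key for Alice
h = pow(g, a, p)
sig = elgamal_sign(7, p, g, a, j=5)
```

The pair `sig` verifies against the message 7 and fails against any other message in the range 0 to 41, since $g$ is a primitive root.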

Example 5.1. Alice and Bob share a group $(\mathbb{Z}/p\mathbb{Z})^\times$ where $p = 43$ and $g = 3$. Bob's PK is $(43, 3, 22)$ and his SK is $b = 15$ (note that $3^{15} \equiv 22 \pmod{43}$). Alice wants to send "HELP" = 07041115 by separating it into 4 blocks $B_1 = 07$, $B_2 = 04$, $B_3 = 11$ and $B_4 = 15$. Alice chooses a random $a = 23$ and calculates $c_1 = 3^{23} \equiv 34 \pmod{43}$. Then they have the shared key $K = 22^{23} \equiv 32 \pmod{43}$. Alice then computes $B_i K$ for each $i$, so
\[ B_1 K = 7 \cdot 32 \equiv 9 \pmod{43}, \]
\[ B_2 K = 4 \cdot 32 \equiv 42 \pmod{43}, \]
\[ B_3 K = 11 \cdot 32 \equiv 8 \pmod{43}, \]
\[ B_4 K = 15 \cdot 32 \equiv 7 \pmod{43}. \]
Alice sends the pairs $(c_1, c_2) = (34, 9), (34, 42), (34, 8), (34, 7)$ to Bob. Bob then computes $34^{p-1-15} = 34^{27} \equiv 39 \pmod{43}$ and gets the message back:
\[ 9 \cdot 39 \equiv 7 \pmod{43}, \]
\[ 42 \cdot 39 \equiv 4 \pmod{43}, \]
\[ 8 \cdot 39 \equiv 11 \pmod{43}, \]
\[ 7 \cdot 39 \equiv 15 \pmod{43}. \]
Finally, by concatenating the numbers Bob obtains 07041115 = "HELP".
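The numbers in this example can be reproduced directly. The sketch below encodes each letter as a single block with A = 0, ..., Z = 25, as in the example; the variable names are our own.

```python
p, g = 43, 3
b = 15                                # Bob's secret key
h = pow(g, b, p)                      # Bob's public value
a = 23                                # Alice's random exponent

text = "HELP"
blocks = [ord(ch) - ord("A") for ch in text]       # A = 00, ..., Z = 25

c1 = pow(g, a, p)
K = pow(h, a, p)                                   # shared key g^(ab)
ciphertexts = [(c1, B * K % p) for B in blocks]

# Bob multiplies c2 by c1^(p-1-b), which is the inverse of K modulo p.
K_inv = pow(c1, p - 1 - b, p)
decoded = [c2 * K_inv % p for _, c2 in ciphertexts]
recovered = "".join(chr(B + ord("A")) for B in decoded)
```

Running it gives `h = 22`, `c1 = 34`, `K = 32` and `K_inv = 39`, in agreement with the values above.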

CHAPTER 6

p-adic numbers

1. Congruences modulo $p^m$

We looked at congruences of the form
\[ x^2 \equiv a \pmod{p} \]
where $p$ is prime and $a \in \mathbb{Z}$, and we saw how to solve such congruences modulo $p$. We now consider congruences of the form
\[ x^2 \equiv a \pmod{p^m} \quad \text{for } m \ge 2, \]
i.e., modulo higher powers of $p$.

Example 6.1. Consider the polynomial $f(x) = x^2 + 1$. Note that $f(x) = 0$ has no solution over $\mathbb{Z}$. Let us now look at $f(x)$ modulo 5. It is clear that
\[ f(x) \equiv 0 \pmod{5} \iff x^2 \equiv -1 \pmod{5} \iff x \equiv 2 \text{ or } 3 \pmod{5}. \]
Now consider the congruence
\[ x^2 \equiv -1 \pmod{5^2}. \]
Since any solution to the above congruence must also satisfy $x^2 \equiv -1 \pmod{5}$, it is of the form $x = 2 + 5t_1$ or $3 + 5t_1$ for some $t_1 \in \mathbb{Z}$.

Suppose $x = 2 + 5t_1$ for some $t_1 \in \mathbb{Z}$. Then
\begin{align*}
x^2 = (2 + 5t_1)^2 \equiv -1 \pmod{5^2} &\iff 4 + 20t_1 + (5t_1)^2 \equiv -1 \pmod{5^2} \\
&\iff 5 + 20t_1 \equiv 0 \pmod{5^2} \\
&\iff 5(1 + 4t_1) \equiv 0 \pmod{5^2} \\
&\iff 1 + 4t_1 \equiv 0 \pmod{5} \\
&\iff t_1 \equiv 1 \pmod{5}.
\end{align*}
Similarly, if $x = 3 + 5t_1$ for some $t_1 \in \mathbb{Z}$, then we get $t_1 \equiv 3 \pmod{5}$. Therefore
\[ x^2 \equiv -1 \pmod{5^2} \iff x \equiv 7 \text{ or } 18 \pmod{5^2}. \]

Now consider the congruence
\[ x^2 \equiv -1 \pmod{5^3}. \]
Since any solution to this congruence must also satisfy $x^2 \equiv -1 \pmod{5^2}$, it is of the form $x = 7 + 25t_2$ or $18 + 25t_2$ for some $t_2 \in \mathbb{Z}$. Suppose $x = 7 + 25t_2$; then
\begin{align*}
x^2 = (7 + 25t_2)^2 \equiv -1 \pmod{5^3} &\iff 49 + 2 \cdot 7 \cdot 25t_2 + (25t_2)^2 \equiv -1 \pmod{5^3} \\
&\iff 50 + 50 \cdot 7t_2 \equiv 0 \pmod{5^3} \\
&\iff 5^2(2 + 14t_2) \equiv 0 \pmod{5^3} \\
&\iff 2 + 14t_2 \equiv 0 \pmod{5} \\
&\iff t_2 \equiv 2 \pmod{5}.
\end{align*}
Similarly, if $x = 18 + 25t_2$ then $t_2 \equiv 2 \pmod{5}$. Therefore
\[ x^2 \equiv -1 \pmod{5^3} \iff x \equiv 57 \text{ or } 68 \pmod{5^3}. \]

Note that if $x = 2 + 5t_1$, then
\begin{align*}
x &= 2 + 5t_1 \\
&= 2 + 5(1 + 5t_2) = 2 + 1 \cdot 5 + 5^2 t_2 \\
&= 2 + 1 \cdot 5 + 5^2(2 + 5t_3) \quad (\text{for some } t_2, t_3 \in \mathbb{Z}) \\
&\equiv 2 + 1 \cdot 5 + 2 \cdot 5^2 \pmod{5^3}.
\end{align*}
Similarly, if $x = 3 + 5t_1$, then
\[ x \equiv 3 + 3 \cdot 5 + 2 \cdot 5^2 \pmod{5^3}. \]
Thus we can iteratively find solutions to $f(x) = 0$ modulo higher powers of 5, and from the solutions at each stage we can, as above, form a power series in 5 with coefficients in $\{0, 1, 2, 3, 4\}$.

In the above example, we saw how to construct solutions modulo $p^2$, $p^3$, etc. from solutions modulo $p$. The following is the general statement.

Theorem 6.1 (Hensel's lemma). Let $f(x) \in \mathbb{Z}[x]$. Let $p$ be a prime and $m \ge 1$ be an integer. Suppose there exists $a \in \mathbb{Z}$ such that
\[ f(a) \equiv 0 \pmod{p^m} \quad \text{and} \quad f'(a) \not\equiv 0 \pmod{p}. \]
Then there exists $b \in \mathbb{Z}$ such that
\[ b \equiv a \pmod{p^m} \quad \text{and} \quad f(b) \equiv 0 \pmod{p^{m+1}}. \]
(We say that we lift $a$ to a solution modulo $p^{m+1}$.)

Proof. By Taylor's expansion for polynomials,
\[ f(a + x) = f(a) + f'(a)x + \frac{f''(a)}{2!}x^2 + \cdots + \frac{f^{(n)}(a)}{n!}x^n \]
where $n = \deg(f)$.

We want to construct $b \in \mathbb{Z}$ such that $b \equiv a \pmod{p^m}$ and $f(b) \equiv 0 \pmod{p^{m+1}}$, so let $b = a + p^m t$ where $t$ is an integer we will determine below. Since $f(x) \in \mathbb{Z}[x]$ implies that $\frac{f^{(k)}(x)}{k!} \in \mathbb{Z}[x]$ for all $k \ge 1$ (exercise!), we have
\begin{align*}
f(b) &= f(a + p^m t) \\
&= f(a) + f'(a)p^m t + \frac{f''(a)}{2!}p^{2m}t^2 + \cdots \\
&= f(a) + f'(a)p^m t + p^{2m}K
\end{align*}
for some $K \in \mathbb{Z}$. Since $f(a) \equiv 0 \pmod{p^m}$ by assumption, write $f(a) = p^m s$ for some $s \in \mathbb{Z}$. So
\[ f(b) = p^m(s + f'(a)t) + p^{2m}K. \]
Since $2m \ge m + 1$, it follows that
\[ f(b) \equiv 0 \pmod{p^{m+1}} \iff s + f'(a)t \equiv 0 \pmod{p}. \]
Since $f'(a) \not\equiv 0 \pmod{p}$, let $h = (f'(a))^{-1} \pmod{p}$. Now take $t = -hs$; then
\[ f(b) = p^m(s + f'(a)t) + p^{2m}K \equiv 0 \pmod{p^{m+1}} \]
as required. □
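The proof is constructive, so it can be turned directly into a lifting procedure. In this sketch the name `hensel_lift` is our own, and the polynomial and its derivative are passed as Python functions; the usage lines repeat the lifting of the root 2 of $x^2 + 1$ modulo 5 from Example 6.1.

```python
def hensel_lift(f, df, a, p, m):
    """Given f(a) = 0 (mod p^m) and f'(a) != 0 (mod p), return b = a + p^m * t
    with f(b) = 0 (mod p^(m+1)), choosing t as in the proof."""
    pm = p ** m
    assert f(a) % pm == 0 and df(a) % p != 0
    s = f(a) // pm                    # f(a) = p^m * s
    h = pow(df(a), -1, p)             # inverse of f'(a) modulo p
    t = (-h * s) % p
    return a + pm * t

f = lambda x: x * x + 1
df = lambda x: 2 * x

roots = [2]                           # root of f modulo 5
for m in range(1, 4):
    roots.append(hensel_lift(f, df, roots[-1], 5, m))
```

The list `roots` then contains solutions modulo $5, 5^2, 5^3, 5^4$ in turn, the second and third being the values 7 and 57 found by hand.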

Hence, using Hensel's lemma repeatedly (under certain hypotheses) to solve a congruence modulo $p^m$ starting from $m = 1$, we obtain a solution that is a power series of the form
\[ a_0 + a_1 p + a_2 p^2 + a_3 p^3 + \cdots, \]
where $0 \le a_i \le p - 1$, as we saw in the above example. But does this series in powers of $p$ converge? In order to make this power series converge we need a setting where $p^m$ gets "smaller" in some sense as $m \to \infty$. This leads us to define a new norm.

2. p-adic Norm on Q

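First recall the function $\operatorname{ord}_p$. For all $\alpha, \beta \in \mathbb{Q}$, we have:

(1) $\operatorname{ord}_p\left(p^n \frac{a}{b}\right) = n$ where $a, b \in \mathbb{Z}$ and $p \nmid a, b$, and $\operatorname{ord}_p(0) = \infty$.
(2) $\alpha \ne 0 \implies \alpha = \pm \prod_{p \in P} p^{\operatorname{ord}_p(\alpha)}$.
(3) $\operatorname{ord}_p(\alpha\beta) = \operatorname{ord}_p(\alpha) + \operatorname{ord}_p(\beta)$.
(4) $\operatorname{ord}_p(\alpha + \beta) \ge \min\{\operatorname{ord}_p(\alpha), \operatorname{ord}_p(\beta)\}$, with equality if $\operatorname{ord}_p(\alpha) \ne \operatorname{ord}_p(\beta)$.

Definition. The $p$-adic absolute value ($p$-adic norm) of $\alpha \in \mathbb{Q}$ is given by
\[ |\alpha|_p = p^{-\operatorname{ord}_p(\alpha)} \text{ if } \alpha \ne 0, \quad \text{and} \quad |0|_p = 0. \]
(Note that $|0|_p = p^{-\operatorname{ord}_p(0)} = p^{-\infty} = 0$.)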

The following are the basic properties of the p-adic norm.

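Lemma 6.2. Let $p$ be a prime and $\alpha, \beta \in \mathbb{Q}$.

(1) $|\alpha|_p \ge 0$ and $|\alpha|_p = 0 \iff \alpha = 0$.
(2) $|\alpha\beta|_p = |\alpha|_p |\beta|_p$.
(3) $|\alpha + \beta|_p \le \max\{|\alpha|_p, |\beta|_p\}$ with equality if $|\alpha|_p \ne |\beta|_p$.

The inequality in (3) is called the ultrametric inequality. It is stronger than the usual triangle inequality since (3) implies that $|\alpha + \beta|_p \le |\alpha|_p + |\beta|_p$.

Proof. (1) Follows from the definition.
(2) Follows from property (3) of $\operatorname{ord}_p$.
(3) We have
\begin{align*}
|\alpha + \beta|_p &= p^{-\operatorname{ord}_p(\alpha + \beta)} \\
&\le p^{-\min\{\operatorname{ord}_p(\alpha), \operatorname{ord}_p(\beta)\}} \quad (\text{by property (4) of } \operatorname{ord}_p) \\
&= \max\{p^{-\operatorname{ord}_p(\alpha)}, p^{-\operatorname{ord}_p(\beta)}\} \\
&= \max\{|\alpha|_p, |\beta|_p\}.
\end{align*}
Clearly equality holds if $\operatorname{ord}_p(\alpha) \ne \operatorname{ord}_p(\beta)$, i.e., $|\alpha|_p \ne |\beta|_p$. □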

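Hence the $p$-adic norm satisfies all the properties of a norm.

Note.

• $\{|\alpha|_p : \alpha \in \mathbb{Q}\} = \{0\} \cup \{p^n : n \in \mathbb{Z}\}$.
• For all $n \in \mathbb{Z}$, $|n|_p \le |1|_p = 1$.
• Since $|p^r|_p = p^{-r}$, as $r$ gets larger $p^r$ gets smaller in the $p$-adic norm.

Example 6.2. Let $\alpha = -40/49$. Then $|\alpha|_2 = 2^{-3}$, $|\alpha|_5 = 5^{-1}$, $|\alpha|_7 = 7^2$, and $|\alpha|_p = 1$ for $p \ne 2, 5, 7$. Then
\[ \prod_{p \in P} |\alpha|_p = 49/40 = 1/|\alpha|. \]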
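The values in this example can be checked with exact rational arithmetic. In the sketch below the function names `ord_p` and `norm_p` are our own, and only the primes 2, 5, 7 are multiplied in, since every other $p$-adic norm of $\alpha$ equals 1.

```python
from fractions import Fraction

def ord_p(alpha, p):
    """p-adic valuation of a nonzero rational alpha."""
    alpha = Fraction(alpha)
    if alpha == 0:
        raise ValueError("ord_p(0) is infinite")
    n, num, den = 0, alpha.numerator, alpha.denominator
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return n

def norm_p(alpha, p):
    """p-adic absolute value |alpha|_p as an exact Fraction."""
    alpha = Fraction(alpha)
    return Fraction(0) if alpha == 0 else Fraction(p) ** (-ord_p(alpha, p))

alpha = Fraction(-40, 49)
product = abs(alpha) * norm_p(alpha, 2) * norm_p(alpha, 5) * norm_p(alpha, 7)
```

Here `product` comes out as exactly 1: the usual absolute value of $\alpha$ cancels against the product of its $p$-adic norms.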

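Lemma 6.3 (Product Formula). Let $\alpha \in \mathbb{Q} \setminus \{0\}$. Then $|\alpha| \cdot \prod_{p \in P} |\alpha|_p = 1$.

Note that the above product is a finite product since $|\alpha|_p = 1$ for all but finitely many primes $p$.

Proof. Let $\alpha$ have the prime factorization $\alpha = \pm \prod_{p \in P} p^{\operatorname{ord}_p(\alpha)}$. Then
\[ |\alpha| = \prod_{p \in P} p^{\operatorname{ord}_p(\alpha)} = \prod_{p \in P} 1/|\alpha|_p. \]
□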

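Let $\alpha \in \mathbb{Q}$. We define a disc of radius $r$ centered at $\alpha$ by
\[ D_r(\alpha) = \{\gamma \in \mathbb{Q} : |\gamma - \alpha|_p < r\}. \]
The ultrametric property of the $p$-adic norm implies that any point inside such a disc is one of its centers!

Lemma 6.4. If $\beta \in D_r(\alpha)$ then $D_r(\alpha) = D_r(\beta)$.

Proof. Let $\gamma \in D_r(\alpha)$. Then
\[ |\gamma - \beta|_p = |\gamma - \alpha + \alpha - \beta|_p \le \max\{|\gamma - \alpha|_p, |\alpha - \beta|_p\} < r \]
since $\beta, \gamma \in D_r(\alpha)$. Hence $\gamma \in D_r(\beta)$. Similarly $D_r(\beta) \subset D_r(\alpha)$. □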


3. Sequences and Series

We have defined the $p$-adic norm and so we can talk about sequences and series, their convergence, the Cauchy property etc. with respect to this norm.

Definition. A sequence $(a_n)_n$ of rational numbers converges $p$-adically to $a$ in $\mathbb{Q}$ if $\forall \varepsilon > 0$, $\exists N_0 \in \mathbb{N}$ such that $n \ge N_0 \implies |a_n - a|_p < \varepsilon$. Equivalently, $|a_n - a|_p \to 0$ as $n \to \infty$.

Definition. A series $\sum_{j=1}^{\infty} a_j$ of rational numbers converges $p$-adically if the sequence of partial sums $s_n = \sum_{j=1}^{n} a_j$ converges $p$-adically.

Definition. A sequence $(a_n)_n$ of rational numbers is called $p$-adically null if $|a_n|_p \to 0$ as $n \to \infty$.

Example 6.3. (1) Let $a \in \mathbb{Q}$. Then the constant sequence $(a)_n$ converges to $a$ $p$-adically.
(2) The sequence $(a + p^n)_n$ converges $p$-adically to $a$ since
\[ |a + p^n - a|_p = |p^n|_p = \frac{1}{p^n} \to 0 \text{ as } n \to \infty. \]
(3) $\left(\frac{p^n}{p^n + 1}\right)_n \to_p 0$ with $|\cdot|_p$. Note that this sequence converges to 1 with the usual absolute value $|\cdot|$.
(4) Consider the series $1 + c + c^2 + \cdots$. If $|c|_p < 1$, then
\[ 1 + c + c^2 + \cdots \to_p \frac{1}{1 - c}. \]

Proof. Let $s_n = 1 + c + c^2 + \cdots + c^{n-1} = \frac{1 - c^n}{1 - c}$. Since $|c|_p < 1$ we have $|c|_p \le \frac{1}{p}$ and $|1 - c|_p = 1$, so
\[ \left|s_n - \frac{1}{1 - c}\right|_p = \left|\frac{-c^n}{1 - c}\right|_p = |c|_p^n \le \frac{1}{p^n} \to 0 \text{ as } n \to \infty. \]
□

What happens if $|c|_p \ge 1$?
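For the convergent case $|c|_p < 1$ in part (4), the $p$-adic distance from the partial sums to $\frac{1}{1-c}$ can be computed exactly. The sketch below uses the illustrative choice $p = 5$, $c = 10/3$, which is not from the notes; this series diverges in the usual absolute value, yet its partial sums approach $-3/7$ in the 5-adic norm. The helper name `norm_p` is our own.

```python
from fractions import Fraction

def norm_p(alpha, p):
    """|alpha|_p for a rational alpha, as an exact Fraction."""
    alpha = Fraction(alpha)
    if alpha == 0:
        return Fraction(0)
    num, den, result = alpha.numerator, alpha.denominator, Fraction(1)
    while num % p == 0:
        num //= p
        result /= p
    while den % p == 0:
        den //= p
        result *= p
    return result

p, c = 5, Fraction(10, 3)             # |c|_5 = 1/5 < 1
limit = 1 / (1 - c)
partial, distances = Fraction(0), []
for n in range(1, 7):
    partial += c ** (n - 1)           # s_n = 1 + c + ... + c^(n-1)
    distances.append(norm_p(partial - limit, p))
```

The entries of `distances` are $5^{-1}, 5^{-2}, \ldots, 5^{-6}$, matching $|s_n - \frac{1}{1-c}|_p = |c|_p^n$ from the proof.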

Definition. A sequence $(a_n)_n$ is $p$-adically Cauchy if $\forall \varepsilon > 0$, $\exists N > 0$ such that $n, m \ge N \implies |a_n - a_m|_p < \varepsilon$.

Proposition 6.5. A sequence $(a_n)_n$ of rational numbers is $p$-adically Cauchy iff the sequence $(b_n)_n := (a_{n+1} - a_n)_n$ is $p$-adically null, i.e., $|a_{n+1} - a_n|_p \to 0$ as $n \to \infty$.

Example 6.4. This is not true for $(\mathbb{R}, |\cdot|)$ where $|\cdot|$ is the usual absolute value. For example consider the sequence $(a_n)_n = (\sqrt{n})_n$. Then
\[ |a_{n+1} - a_n| = |\sqrt{n+1} - \sqrt{n}| = \left|\frac{1}{\sqrt{n+1} + \sqrt{n}}\right| \to 0 \text{ as } n \to \infty. \]
But clearly $(a_n)_n$ is not a Cauchy sequence since $\sqrt{n} \to \infty$ as $n \to \infty$.

Proof. ($\Longrightarrow$) Clear.
($\Longleftarrow$) Suppose $|a_{n+1} - a_n|_p \to 0$ as $n \to \infty$. Let $\varepsilon > 0$ and let $N_0 \in \mathbb{N}$ be such that $n \ge N_0 \implies |a_{n+1} - a_n|_p < \varepsilon$. Then for $m > n \ge N_0$,
\begin{align*}
|a_m - a_n|_p &= |a_m - a_{m-1} + a_{m-1} - a_{m-2} + a_{m-2} - \cdots + a_{n+1} - a_n|_p \\
&\le \max\{|a_m - a_{m-1}|_p, |a_{m-1} - a_{m-2}|_p, \ldots, |a_{n+1} - a_n|_p\} \\
&< \varepsilon.
\end{align*}
Hence $(a_n)_n$ is $p$-adically Cauchy. □

Lemma 6.6. If a sequence $(a_n)_n$ where $a_n \in \mathbb{Q}$ is $p$-adically convergent, then it is $p$-adically Cauchy.

Proof. Suppose $a_n \to_p a \in \mathbb{Q}$. Then
\[ |a_m - a_n|_p = |a_m - a + a - a_n|_p \le \max\{|a_m - a|_p, |a_n - a|_p\} \to 0 \]
as $n, m \to \infty$. □

Does being $p$-adically Cauchy imply being $p$-adically convergent in $\mathbb{Q}$? Recall the first example with $f(x) = x^2 + 1 \equiv 0 \pmod{5^m}$. Using Hensel's lemma repeatedly, we can construct a sequence
\[ a_1 = 2, \quad a_2 = 2 + 1 \cdot 5, \quad a_3 = 2 + 1 \cdot 5 + 2 \cdot 5^2, \quad \ldots \]
which is 5-adically Cauchy and satisfies $a_n^2 \to_5 -1$. Since $\sqrt{-1} \notin \mathbb{Q}$, $(a_n)_n$ does not converge 5-adically in $\mathbb{Q}$.

We have similar problems when we consider Cauchy sequences of rational numbers and their convergence with respect to the usual norm $|\cdot|$. Recall that the Cauchy sequence $\left(\left(1 + \frac{1}{n}\right)^n\right)_n \to e \in \mathbb{R} \setminus \mathbb{Q}$. But what is $e$? Since $e$ is the limit of $\left(\left(1 + \frac{1}{n}\right)^n\right)_n$, one way would be to think of it as the Cauchy sequence itself. But we also have $1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots \to e$. So $e$ would also be equal to the sequence $\left(\sum_{k=0}^{n} \frac{1}{k!}\right)_n$. To avoid ambiguity, we consider the sequences $\left(\left(1 + \frac{1}{n}\right)^n\right)_n$ and $\left(\sum_{k=0}^{n} \frac{1}{k!}\right)_n$ to be equivalent! We will follow the same idea in the $p$-adic setting.

4. Construction of Qp and Completeness

Let $C$ be the set of $p$-adically Cauchy sequences of rational numbers. Define $\sim$ on $C$ by
\[ (a_n)_n \sim (b_n)_n \iff (b_n - a_n)_n \text{ is } p\text{-adically null}. \]
This is clearly an equivalence relation:

• (reflexive) $(a_n - a_n)_n = (0)_n \to 0$ as $n \to \infty$.
• (symmetric) $(a_n - b_n)_n \to 0 \implies (b_n - a_n)_n \to 0$.
• (transitive) $(a_n - b_n)_n \to 0$ and $(b_n - c_n)_n \to 0 \implies (a_n - c_n)_n \to 0$ by the triangle inequality.


Definition. Define the $p$-adic numbers $\mathbb{Q}_p$ to be the set of equivalence classes of $C$ under $\sim$, i.e.,
\[ \mathbb{Q}_p = C/\sim \]
and we denote the class of $(a_n)_n \in C$ by $[(a_n)_n]$.

Note.

(1) Both the zero sequence $(0)_n$ in $\mathbb{Q}$ and $(p^n)_n$ are $p$-adically null. Thus in $\mathbb{Q}_p$, $[(0)_n] = [(p^n)_n] = [\text{any } p\text{-adically null sequence}]$.
(2) Let $a \in \mathbb{Q}$ and consider the constant sequence $(a)_n$. Then $[(a)_n] \in \mathbb{Q}_p$. We can thus identify $\mathbb{Q}$ as a subset of $\mathbb{Q}_p$ via the injection $a \mapsto [(a)_n]$.
(3) Let $(a_n)_n \in C$ be such that $a_n \to_p a \in \mathbb{Q}$. Then $[(a_n)_n] = [(a)_n]$ in $\mathbb{Q}_p$, since $|a_n - a|_p \to 0$ as $n \to \infty$.

Now we define operations on $\mathbb{Q}_p$. Note that if $(a_n)_n, (b_n)_n \in C$ then so are $(a_n + b_n)_n$ and $(a_n b_n)_n$. Define $+, \cdot$ on $\mathbb{Q}_p$ as follows.

• Addition: $[(a_n)_n] + [(b_n)_n] := [(a_n + b_n)_n]$.
• Multiplication: $[(a_n)_n] \cdot [(b_n)_n] := [(a_n b_n)_n]$.
• Additive identity: $0 := [(0)_n]$.
• Multiplicative identity: $1 := [(1)_n]$.

Exercise. (1) For the above defined addition and multiplication to be well-defined, check that if $(a_n)_n \sim (a'_n)_n$ and $(b_n)_n \sim (b'_n)_n$ then $(a_n + b_n)_n \sim (a'_n + b'_n)_n$ and $(a_n b_n)_n \sim (a'_n b'_n)_n$.
(2) Check that $\mathbb{Q}_p$ is a commutative ring with $+, \cdot$.

Next we want to define division in $\mathbb{Q}_p$. That is, we want to define $[(a_n)_n]/[(b_n)_n]$ where $[(b_n)_n] \ne [(0)_n]$, i.e., $(b_n)_n$ is not $p$-adically null. Note that we cannot simply define $[(a_n)_n]/[(b_n)_n] := \left[\left(\frac{a_n}{b_n}\right)_n\right]$ since some of the terms of $(b_n)_n$ may be zero. But we can show that such $(b_n)_n$ contains only finitely many zero terms. We start with the following lemma.

Lemma 6.7. Let $(a_n)_n$ be a $p$-adically Cauchy sequence of rational numbers. Then $(|a_n|_p)_n$ converges in the usual norm $|\cdot|$ in $\mathbb{R}$ to some element in $\{0\} \cup \{p^r \mid r \in \mathbb{Z}\}$.

Proof. It is easy to see that any Cauchy sequence whose terms are in $\{0\} \cup \{p^r \mid r \in \mathbb{Z}\}$ converges in $|\cdot|$ to some element in the same set (exercise). Thus it is enough to show that $(|a_n|_p)_n$ is a Cauchy sequence with the norm $|\cdot|$.

Since $(a_n)_n$ is $p$-adically Cauchy, $\forall \varepsilon > 0$, $\exists N_0 \in \mathbb{N}$ such that $m, n \ge N_0 \implies |a_m - a_n|_p < \varepsilon$. Thus $\forall m, n \ge N_0$ we get
\[ \big|\, |a_m|_p - |a_n|_p \,\big| \le |a_m - a_n|_p < \varepsilon, \]
where the first inequality follows from the triangle inequality for $|\cdot|_p$. Hence $(|a_n|_p)_n$ is Cauchy. □


Corollary 6.8. Let (bn)n be a p-adically cauchy sequence of rational num-bers that is not p-adically null. Then (bn)n contains only finitely many termsthat are zero.

Proof. By Lemma 6.7, (|bn|p)n has a limit, and (|bn|p)n 6→ 0 since

(bn)n 6→p 0. If (bn)n contains infinitely many zero terms, then (|bn|p)n also

contains infinitely many zero terms. So (|bn|p)n has a subsequence that

converges to 0, which is a contradiction to the fact (|bn|p)n converges to anonzero limit. �

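Now we are ready to define division in $\mathbb{Q}_p$. Let $[(a_n)_n], [(b_n)_n] \in \mathbb{Q}_p$ where $(b_n)_n$ is not p-adically null. Then by the corollary above, there exists $N_0 \in \mathbb{N}$ such that $b_n \ne 0$ for all $n \ge N_0$. Define
\[ c_n := \begin{cases} a_n / b_n & \text{if } n \ge N_0, \\ x_n & \text{if } n < N_0 \text{ (we can take any } x_n \in \mathbb{Q}\text{).} \end{cases} \]
Note that $[(b_n)_n] \cdot [(c_n)_n] = [(a_n)_n]$, since $b_n c_n - a_n = 0$ for all $n \ge N_0$ and hence $(b_n c_n - a_n)_n \to_p 0$ as $n \to \infty$. Hence we can define
\[ [(a_n)_n]/[(b_n)_n] := [(c_n)_n]. \]
Thus we have shown that every non-zero element of $\mathbb{Q}_p$ has a multiplicative inverse, and so we have the following theorem.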

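Theorem 6.9. $(\mathbb{Q}_p, +, \cdot)$ is a field containing $\mathbb{Q}$ as a subfield.

Now we extend the p-adic norm that we defined on $\mathbb{Q}$ to $\mathbb{Q}_p$.

Definition. Let $\alpha \in \mathbb{Q}_p$. Then $\alpha = [(a_n)_n]$ for some $(a_n)_n \in C$, the set of p-adically Cauchy sequences of rational numbers. We define
\[ |\alpha|_p := \lim_{n \to \infty} |a_n|_p \]
(note that the limit is taken with the usual norm in $\mathbb{R}$).

By Lemma 6.7, the limit on the right-hand side exists and lies in $\{0\} \cup \{p^r \mid r \in \mathbb{Z}\}$. We need to show that the above definition of the norm on $\mathbb{Q}_p$ is well-defined, i.e.,
\[ \alpha = [(a_n)_n] = [(b_n)_n] \in \mathbb{Q}_p \implies \lim_{n \to \infty} |a_n|_p = \lim_{n \to \infty} |b_n|_p. \]

Proof. We have
\[ |a_n|_p = |a_n - b_n + b_n|_p \le \max\{|a_n - b_n|_p, |b_n|_p\}. \]
Recall that $[(a_n)_n] = [(b_n)_n] \implies |a_n - b_n|_p \to 0$ as $n \to \infty$. So if $|b_n|_p \to 0$ as $n \to \infty$, then $|a_n|_p \to 0$ and we are done. Now suppose $|b_n|_p \not\to 0$ as $n \to \infty$. By Lemma 6.7, $(|b_n|_p)_n$ then converges to a non-zero limit, so there exists $N_0 \in \mathbb{N}$ such that for all $n \ge N_0$, $|a_n - b_n|_p < |b_n|_p$. By the equality case of the ultrametric inequality, $|a_n|_p = |b_n|_p$ for all $n \ge N_0$ and we are done. □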

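Thus for any $\alpha \in \mathbb{Q}_p$, $|\alpha|_p = 0$ or $p^n$ for some $n \in \mathbb{Z}$. One can check that $|\cdot|_p$ on $\mathbb{Q}_p$ is indeed a norm, i.e., for all $\alpha, \beta \in \mathbb{Q}_p$,

(1) $|\alpha|_p \ge 0$, and $|\alpha|_p = 0$ iff $\alpha = 0$.

(2) $|\alpha\beta|_p = |\alpha|_p |\beta|_p$.

(3) $|\alpha + \beta|_p \le \max\{|\alpha|_p, |\beta|_p\}$, with equality if $|\alpha|_p \ne |\beta|_p$.

Proof. Exercise (use Lemma 6.2 and the definition of $|\cdot|_p$). □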
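On rational numbers the extended norm agrees with the original p-adic norm, so properties (2) and (3) can at least be spot-checked on $\mathbb{Q}$. The following is a minimal Python sketch under that restriction; the function names `ord_p` and `norm_p` are illustrative choices and do not come from the notes.

```python
from fractions import Fraction

def ord_p(x, p):
    """p-adic valuation of a non-zero rational x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("ord_p(0) is +infinity")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:   # powers of p in the numerator count positively
        num //= p
        v += 1
    while den % p == 0:   # powers of p in the denominator count negatively
        den //= p
        v -= 1
    return v

def norm_p(x, p):
    """p-adic norm |x|_p = p^(-ord_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-ord_p(x, p))

a, b = Fraction(3), Fraction(1, 3)
# multiplicativity and the equality case of the ultrametric inequality
print(norm_p(a * b, 3) == norm_p(a, 3) * norm_p(b, 3))
print(norm_p(a + b, 3) == max(norm_p(a, 3), norm_p(b, 3)))
```

Exact `Fraction` arithmetic is used so that the norms are compared without rounding.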

Since we have now defined a norm on $\mathbb{Q}_p$, we can talk of Cauchy or convergent sequences in $\mathbb{Q}_p$ with respect to this norm. In fact, if $\alpha \in \mathbb{Q}_p$ is represented as $\alpha = [(a_n)_n]$ where $(a_n)_n \in C$, then $a_n \to_p \alpha$ as $n \to \infty$ with respect to the above defined norm on $\mathbb{Q}_p$, since
\[ |a_n - \alpha|_p = \lim_{m \to \infty} |a_n - a_m|_p \to 0 \quad \text{as } n \to \infty. \]

Now we can prove the following theorem.

Theorem 6.10. $\mathbb{Q}_p$ is complete with respect to $|\cdot|_p$. That is, a sequence in $\mathbb{Q}_p$ is Cauchy if and only if it is convergent in $\mathbb{Q}_p$.

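Proof. ($\Leftarrow$) Easy and left as an exercise.

($\Rightarrow$) Let $(\alpha_n)_n$ be a Cauchy sequence in $\mathbb{Q}_p$, i.e., $\alpha_n \in \mathbb{Q}_p$ for all $n$. Let $\alpha_n = [(a_{n,m})_m]$ where $(a_{n,m})_m$ is a p-adically Cauchy sequence in $\mathbb{Q}$. We want to show that there exists $\alpha \in \mathbb{Q}_p$ such that $|\alpha_n - \alpha|_p \to 0$ as $n \to \infty$, where $|\cdot|_p$ is now the extended norm on $\mathbb{Q}_p$.

For each $n$, since $(a_{n,m})_m$ is p-adically Cauchy, there exists $N_n \in \mathbb{N}$ such that $r, s \ge N_n \implies |a_{n,r} - a_{n,s}|_p < \frac{1}{n}$. Now let $m \ge N_n$; then
\[ |\alpha_n - a_{n,m}|_p = \lim_{r \to \infty} |a_{n,r} - a_{n,m}|_p \le \frac{1}{n}. \]

We now construct $\alpha := [(c_n)_n] \in \mathbb{Q}_p$ such that $\alpha_n \to_p \alpha$ with the norm on $\mathbb{Q}_p$.

For each $n$, choose $k(n) > N_n$ such that $k(1) < k(2) < k(3) < \cdots$. Let $(c_n)_n$ be the sequence in $\mathbb{Q}$ given by $c_n = a_{n,k(n)}$.

We will show that $(c_n)_n$ is p-adically Cauchy. Given $\varepsilon > 0$, since $(\alpha_n)_n$ is a Cauchy sequence in $\mathbb{Q}_p$, there exists $N'$ such that $n_1, n_2 > N' \implies |\alpha_{n_1} - \alpha_{n_2}|_p < \varepsilon$. Then, taking $N_0 = \max\{N', \frac{1}{\varepsilon}\}$, for $n_1, n_2 > N_0$ we have
\begin{align*}
|c_{n_1} - c_{n_2}|_p &= |a_{n_1,k(n_1)} - a_{n_2,k(n_2)}|_p \\
&\le \max\{|a_{n_1,k(n_1)} - \alpha_{n_1}|_p,\ |\alpha_{n_1} - \alpha_{n_2}|_p,\ |\alpha_{n_2} - a_{n_2,k(n_2)}|_p\} \\
&\le \max\{\tfrac{1}{n_1},\ |\alpha_{n_1} - \alpha_{n_2}|_p,\ \tfrac{1}{n_2}\} \\
&< \max\{\varepsilon, \varepsilon, \varepsilon\} = \varepsilon.
\end{align*}
Therefore $(c_n)_n$ is p-adically Cauchy.

Finally, it remains to show that $\alpha_n \to_p \alpha = [(c_n)_n]$ as $n \to \infty$ in the $\mathbb{Q}_p$-norm. Let $\varepsilon > 0$. Since $(c_n)_n$ is p-adically Cauchy, there exists $M_0 > 2/\varepsilon$ such that $m, n > M_0 \implies |c_m - c_n|_p < \varepsilon/2$. Thus for all $n > M_0$ we have
\[ |\alpha - c_n|_p = \lim_{m \to \infty} |c_m - c_n|_p \le \varepsilon/2 < \varepsilon, \]
and $|c_n - \alpha_n|_p = |a_{n,k(n)} - \alpha_n|_p \le 1/n < 1/M_0 < \varepsilon/2$. Hence by the ultrametric inequality, for all $n > M_0$,
\[ |\alpha - \alpha_n|_p \le \max\{|\alpha - c_n|_p, |c_n - \alpha_n|_p\} < \varepsilon \]
and we are done. □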

Note that if $\alpha_n \to_p \alpha$ in the $\mathbb{Q}_p$-norm then $|\alpha_n|_p \to |\alpha|_p$ in the usual norm $|\cdot|$. Also, one can show as before that a sequence $(\alpha_n)_n$, now in $\mathbb{Q}_p$, is p-adically Cauchy if and only if $(\alpha_n - \alpha_{n-1})_n \to_p 0$ as $n \to \infty$.

We now have the following theorem for convergence of series in $\mathbb{Q}_p$.

Theorem 6.11. A series $\sum_{n=1}^{\infty} \alpha_n$ with $\alpha_n \in \mathbb{Q}_p$ converges p-adically in $\mathbb{Q}_p$ iff $\alpha_n \to_p 0$ as $n \to \infty$ in $\mathbb{Q}_p$.

Proof. ($\Rightarrow$) Same proof as with the usual norm, and left as an exercise.

($\Leftarrow$) To show that $\sum_{n=1}^{\infty} \alpha_n$ converges p-adically in $\mathbb{Q}_p$, it is enough to show, by completeness of $\mathbb{Q}_p$ (Theorem 6.10), that the sequence of partial sums $(s_n)_n = (\sum_{i=1}^{n} \alpha_i)_n$ is p-adically Cauchy. Now
\[ s_n - s_{n-1} = \alpha_n \to_p 0 \quad \text{as } n \to \infty \]
in $\mathbb{Q}_p$. Hence by the note above, $(s_n)_n$ is p-adically Cauchy in $\mathbb{Q}_p$. □

It is now easy to see, for example, that $\sum_{n=1}^{\infty} n p^n$ converges in $\mathbb{Q}_p$, since $|n p^n|_p \le p^{-n} \to 0$.

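Definition. The set of p-adic integers $\mathbb{Z}_p$ is the unit disc in $\mathbb{Q}_p$ centered at 0, that is,
\[ \mathbb{Z}_p := \{\alpha \in \mathbb{Q}_p : |\alpha|_p \le 1\}. \]

Proposition 6.12. $\mathbb{Z}_p$ is a ring and contains $\mathbb{Z}$ as a subring. The group of units of $\mathbb{Z}_p$ is
\[ \mathbb{Z}_p^{\times} = \{\alpha \in \mathbb{Z}_p : |\alpha|_p = 1\}. \]

Proof. It is enough to show that $\mathbb{Z}_p$ is closed under $+$ and $\cdot$. For any $\alpha, \beta \in \mathbb{Z}_p$,
\[ |\alpha + \beta|_p \le \max\{|\alpha|_p, |\beta|_p\} \le 1, \qquad |\alpha \cdot \beta|_p = |\alpha|_p |\beta|_p \le 1. \]
If $\alpha \in \mathbb{Z}$ is non-zero then $\mathrm{ord}_p(\alpha) \ge 0$ and hence $|\alpha|_p \le 1$; also $|0|_p = 0$. Hence $\mathbb{Z} \subset \mathbb{Z}_p$. Suppose $\alpha \in \mathbb{Z}_p$ is invertible; then there exists $\beta \in \mathbb{Z}_p$ such that $\alpha \cdot \beta = 1$, hence $|\alpha|_p |\beta|_p = |\alpha \cdot \beta|_p = 1$. Since $|\alpha|_p \le 1$ and $|\beta|_p \le 1$, we must have $|\alpha|_p = |\beta|_p = 1$. Conversely, if $|\alpha|_p = 1$ then $\alpha \ne 0$ and $|\alpha^{-1}|_p = 1$, so $\alpha^{-1} \in \mathbb{Z}_p$. □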

Theorem 6.13. An element $\alpha \in \mathbb{Q}_p$ is a p-adic integer iff it is the p-adic limit of a Cauchy sequence of integers.

Proof. ($\Leftarrow$) Suppose $\alpha$ is the p-adic limit of a Cauchy sequence of integers $(a_n)_n$. Since $|a_n|_p \le 1$ for all $n$, we have
\[ |\alpha|_p = \lim_{n \to \infty} |a_n|_p \le 1 \]
and so $\alpha \in \mathbb{Z}_p$.

($\Rightarrow$) Suppose $\alpha \in \mathbb{Q}_p$ is a p-adic integer. Then $\alpha = [(a_n)_n]$ for some p-adically Cauchy sequence $(a_n)_n$ of rational numbers. We need to construct a sequence $(b_n)_n$ of integers such that $(a_n)_n \sim (b_n)_n$, and hence $\alpha = [(b_n)_n]$, which is what we want. Since $a_n \to_p \alpha$ in $\mathbb{Q}_p$ we have $\lim_{n \to \infty} |a_n|_p = |\alpha|_p \le 1$, and so there exists $N_0$ such that for all $n \ge N_0$, $|a_n|_p \le 1$, i.e., $\mathrm{ord}_p(a_n) \ge 0$. If $a_n = u_n/v_n$ in lowest terms with $u_n, v_n \in \mathbb{Z}$, then for all $n \ge N_0$, $p \nmid v_n$. Hence for each such $n$ there exists $w_n \in \mathbb{Z}$ such that $v_n w_n \equiv 1 \pmod{p^n}$. Now for all $n \ge N_0$ let $b_n = u_n w_n \in \mathbb{Z}$. Then
\[ a_n = u_n/v_n \equiv u_n w_n = b_n \pmod{p^n} \quad \text{for all } n \ge N_0, \]
that is, $|a_n - b_n|_p \le p^{-n}$, and hence $(a_n)_n \sim (b_n)_n$. □

In the above proof we can in fact choose $b_n \in \mathbb{Z}$ such that $b_n \ge 0$. Thus a p-adic integer is the p-adic limit of a Cauchy sequence of non-negative integers.

5. p-adic Digit Expansion

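We would now like to describe $\mathbb{Q}_p$ explicitly using p-adic digit expansions; that is, given $\alpha \in \mathbb{Q}_p$ we want to write
\[ \alpha = \alpha_{-k}p^{-k} + \alpha_{1-k}p^{1-k} + \cdots + \alpha_0 + \alpha_1 p + \alpha_2 p^2 + \cdots, \qquad 0 \le \alpha_i \le p-1. \]

First consider $\alpha \in \mathbb{Z}_p$. By Theorem 6.13 there exists a sequence of non-negative integers $(b_n)_n$ such that $|\alpha - b_n|_p \to 0$ as $n \to \infty$. Let $B_0 := b_{n_0}$ be such that $|\alpha - B_0|_p < 1$. We can write $B_0 = pC_0 + \alpha_0$ for some $C_0, \alpha_0 \in \mathbb{Z}$ such that $0 \le \alpha_0 \le p-1$. Then $|\alpha - \alpha_0|_p \le \max\{|\alpha - B_0|_p, |pC_0|_p\} < 1$. Thus we can find $\alpha_0$ satisfying $0 \le \alpha_0 \le p-1$ such that
\[ |\alpha - \alpha_0|_p \le \frac{1}{p} < 1 \quad \text{and so} \quad \frac{\alpha - \alpha_0}{p} \in \mathbb{Z}_p. \]

Repeating the above steps we can now find $\alpha_1$ with $0 \le \alpha_1 \le p-1$ such that
\[ \left| \frac{\alpha - \alpha_0}{p} - \alpha_1 \right|_p < 1, \quad \text{i.e.,} \quad |\alpha - (\alpha_0 + p\alpha_1)|_p < \frac{1}{p}. \]
Repeating this process we get a sequence $\alpha_n \in \mathbb{Z}$ such that $0 \le \alpha_n \le p-1$ and
\[ \left| \alpha - (\alpha_0 + \alpha_1 p + \alpha_2 p^2 + \cdots + \alpha_n p^n) \right|_p < \frac{1}{p^n}. \]
Hence $\sum_{i=0}^{\infty} \alpha_i p^i \to_p \alpha$ in $\mathbb{Q}_p$ and we can write
\[ \alpha = \alpha_0 + \alpha_1 p + \alpha_2 p^2 + \cdots, \]
which we call the p-adic expansion of $\alpha \in \mathbb{Z}_p$.

Now consider $\alpha \in \mathbb{Q}_p$ such that $|\alpha|_p = p^k$ for some $k > 0$. Let $\beta = p^k \alpha$; then $|\beta|_p = 1$, so $\beta \in \mathbb{Z}_p$ and it has a p-adic expansion
\[ \beta = \beta_0 + \beta_1 p + \beta_2 p^2 + \cdots, \qquad 0 \le \beta_i \le p-1. \]
Hence
\[ \alpha = \beta_0 p^{-k} + \beta_1 p^{-k+1} + \beta_2 p^{-k+2} + \cdots. \]

So we have proved the following theorem, except for the uniqueness, which is left as an exercise!

Theorem 6.14. Every p-adic number $\alpha \in \mathbb{Q}_p$ has a unique p-adic expansion
\[ \alpha = \alpha_{-k}p^{-k} + \alpha_{1-k}p^{1-k} + \cdots + \alpha_0 + \alpha_1 p + \alpha_2 p^2 + \cdots, \qquad 0 \le \alpha_i \le p-1. \]
If $\alpha \in \mathbb{Z}_p$ then $\alpha_{-r} = 0$ for all $r > 0$.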

So in order to compute the p-adic digit expansion of $\alpha \in \mathbb{Z}_p$ we need to compute $\alpha$ modulo higher and higher powers of $p$. For example, the 3-adic expansion of $-1$ is
\[ -1 = 2 + 2 \cdot 3 + 2 \cdot 3^2 + 2 \cdot 3^3 + \cdots. \]
In fact, after computing $-1 \pmod{3^m}$ for the first few $m$, we check that indeed $2 + 2 \cdot 3 + 2 \cdot 3^2 + \cdots \to_3 \frac{2}{1-3} = -1$, which proves that it is the 3-adic expansion of $-1$. Here is another way of getting this, by adding 1 and carrying:
\begin{align*}
1 + (2 + 2 \cdot 3 + 2 \cdot 3^2 + \cdots) &= 0 + 1 \cdot 3 + 2 \cdot 3 + 2 \cdot 3^2 + \cdots \\
&= 0 + 0 + 1 \cdot 3^2 + 2 \cdot 3^2 + \cdots = \cdots = 0.
\end{align*}
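The digit-by-digit procedure above can be sketched in Python for the special case of a rational number lying in $\mathbb{Z}_p$ (denominator prime to $p$). This is only an illustration under that assumption; the function name `padic_digits` is a hypothetical choice.

```python
from fractions import Fraction

def padic_digits(x, p, count):
    """First `count` p-adic digits of a rational x whose denominator is prime to p."""
    x = Fraction(x)
    if x.denominator % p == 0:
        raise ValueError("x is not a p-adic integer")
    digits = []
    for _ in range(count):
        # the digit is x mod p, computed as numerator * denominator^(-1) mod p
        d = (x.numerator * pow(x.denominator, -1, p)) % p
        digits.append(d)
        # (x - d)/p is again in Z_p because x ≡ d (mod p)
        x = (x - d) / p
    return digits

print(padic_digits(-1, 3, 5))              # digits of -1 in Z_3
print(padic_digits(Fraction(1, 2), 3, 4))  # digits of 1/2 in Z_3
```

Each loop iteration mirrors one step of the construction: pick the digit congruent to the current value modulo $p$, subtract it, and divide by $p$.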

6. Hensel's Lemma over Zp

We need the following lemma, which is the analogue of a familiar statement for polynomials in $\mathbb{Z}[x]$.

Lemma 6.15. Let $f(x) \in \mathbb{Z}_p[x]$ be a polynomial of degree $n$. Note that if $f(x) = \sum_{k=0}^{n} a_k x^k$ then the derivative of $f(x)$ is $f'(x) = \sum_{k=1}^{n} k a_k x^{k-1}$. Thus one can recursively define $f^{(r)}(x)$.

(a) For $b \in \mathbb{Z}_p$ show that $f(b + x) = \sum_{k=0}^{n} \frac{f^{(k)}(b)}{k!} x^k$.

(b) Show that $\frac{f^{(r)}(x)}{r!} \in \mathbb{Z}_p[x]$.

Proof. The proof is left as an exercise! You can first give an algebraic proof of the statements for $f(x) \in \mathbb{Z}[x]$ and observe that exactly the same steps work for $f(x) \in \mathbb{Z}_p[x]$. We only need that $\mathbb{Z}_p$ is a commutative ring with unity and has characteristic zero (i.e., $\underbrace{1 + 1 + \cdots + 1}_{n \text{ times}} \ne 0$ in $\mathbb{Z}_p$ for all $n \in \mathbb{N}$). □


Theorem 6.16 (Hensel's Lemma). Let $f(x) \in \mathbb{Z}_p[x]$. Let $\alpha_0 \in \mathbb{Z}_p$ be such that
\[ |f(\alpha_0)|_p \le \frac{1}{p} \quad \text{and} \quad |f'(\alpha_0)|_p = 1. \]
Then the sequence $(\alpha_n)_n$ given by
\[ \alpha_{n+1} = \alpha_n - f'(\alpha_0)^{-1} f(\alpha_n) \]
satisfies, for all $n \in \mathbb{N}$,

(1) $\alpha_n \in \mathbb{Z}_p$.

(2) $|\alpha_n - \alpha_0|_p < 1$.

(3) $|f(\alpha_n)|_p < \frac{1}{p^n}$.

Moreover, if $\alpha \in \mathbb{Z}_p$ is the p-adic limit of the p-adically Cauchy sequence $(\alpha_n)_n$, then $f(\alpha) = 0$.

Before getting into the proof of the theorem let us look at how to use it.

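First note that Theorem 6.16 can be applied to polynomials in $\mathbb{Z}[x]$ since $\mathbb{Z} \subset \mathbb{Z}_p$. When applied to $f(x) \in \mathbb{Z}[x]$, it says that if there exists $\alpha_0 \in \mathbb{Z}$ such that $f(\alpha_0) \equiv 0 \pmod{p}$ and $f'(\alpha_0) \not\equiv 0 \pmod{p}$, then one can construct a p-adically Cauchy sequence $(\alpha_n)_n$ of integers such that $f(\alpha_n) \equiv 0 \pmod{p^n}$ and $\alpha_{n+1} \equiv \alpha_n \pmod{p^n}$. If $\alpha_n \to_p \alpha \in \mathbb{Z}_p$ then $f(\alpha) = 0$.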
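For an integer polynomial the iteration of Theorem 6.16 can be carried out with ordinary modular arithmetic. The sketch below works modulo a fixed power $p^N$ and represents $f$ by its coefficient list; the helper names `poly_eval` and `hensel_lift` are illustrative assumptions, not notation from the notes.

```python
def poly_eval(coeffs, x, mod):
    """Evaluate sum(coeffs[k] * x**k) modulo `mod` by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % mod
    return acc

def hensel_lift(coeffs, a0, p, N):
    """Return a with f(a) ≡ 0 (mod p^N) and a ≡ a0 (mod p).

    Requires f(a0) ≡ 0 (mod p) and f'(a0) ≢ 0 (mod p).
    """
    mod = p ** N
    deriv = [k * c for k, c in enumerate(coeffs)][1:]   # coefficients of f'
    if poly_eval(coeffs, a0, p) != 0 or poly_eval(deriv, a0, p) == 0:
        raise ValueError("need f(a0) ≡ 0 and f'(a0) ≢ 0 (mod p)")
    h = pow(poly_eval(deriv, a0, mod), -1, mod)         # f'(a0)^(-1) mod p^N
    a = a0 % mod
    for _ in range(N):   # each step gains at least one more power of p
        a = (a - h * poly_eval(coeffs, a, mod)) % mod
    return a

# a square root of 2 modulo 7^4, starting from 3^2 ≡ 2 (mod 7)
root = hensel_lift([-2, 0, 1], 3, 7, 4)
print(root, (root * root) % 7**4)
```

Note that the inverse of $f'(\alpha_0)$ is computed once and reused, exactly as in the recurrence of the theorem; Newton's method with $f'(\alpha_n)$ at each step would converge faster but is a different iteration.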

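Corollary 6.17. Let $b \in \mathbb{Z}$ with $b \ne 0$. Let $p$ be an odd prime. Then $b$ is a square in $\mathbb{Z}_p$ iff $b = p^{2r} c$ where $c \in \mathbb{Z}$ is such that $\left(\frac{c}{p}\right) = 1$.

Proof. ($\Leftarrow$) Let $b = p^{2r} c$ where $\left(\frac{c}{p}\right) = 1$. Consider $f(x) = x^2 - c \in \mathbb{Z}[x]$. Since $\left(\frac{c}{p}\right) = 1$, there exists $a$ such that $\gcd(a, p) = 1$ and $a^2 \equiv c \pmod{p}$, i.e., $f(a) \equiv 0 \pmod{p}$. Also $f'(a) = 2a \not\equiv 0 \pmod{p}$, as $p$ is odd and $\gcd(a, p) = 1$. So by Hensel's lemma there exists $\alpha \in \mathbb{Z}_p$ such that $f(\alpha) = 0$, i.e., $\alpha^2 = c$. So $b = (p^r \alpha)^2$ is a square in $\mathbb{Z}_p$.

($\Rightarrow$) Let $b = \beta^2$ for some $\beta \in \mathbb{Z}_p$. We can write $b = p^s c$ where $s \ge 0$ and $\gcd(c, p) = 1$. Now $|b|_p = p^{-s} = |\beta|_p^2$ is an even power of $p$. Hence $s$ is even, say $s = 2r$. So $c = \frac{b}{p^{2r}} = \left(\frac{\beta}{p^r}\right)^2$ is a square in $\mathbb{Q}_p$, say $c = \gamma^2$ for some $\gamma \in \mathbb{Q}_p$. Now $|\gamma|_p^2 = |c|_p = 1$, so $\gamma \in \mathbb{Z}_p$. Let $(\gamma_n)_n$ be a sequence of integers such that $\gamma_n \to_p \gamma$. Then $\gamma_n^2 \to_p \gamma^2 = c$, and so there exists $n_0$ such that $|\gamma_{n_0}^2 - c|_p \le \frac{1}{p}$, i.e., $\gamma_{n_0}^2 \equiv c \pmod{p}$. Hence $\left(\frac{c}{p}\right) = 1$. □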

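Proof of Theorem 6.16. Since $f'(x) \in \mathbb{Z}_p[x]$ and $\alpha_0 \in \mathbb{Z}_p$, we have $f'(\alpha_0) \in \mathbb{Z}_p$. Now since $|f'(\alpha_0)|_p = 1$, $f'(\alpha_0) \in \mathbb{Z}_p^{\times}$. So $h := -(f'(\alpha_0))^{-1} \in \mathbb{Z}_p^{\times}$ with $|h|_p = 1$. We will prove by induction that $\alpha_n$ satisfies (1), (2), (3) above for all $n$.

$n = 1$: Since $\alpha_1 = \alpha_0 + h f(\alpha_0)$, clearly $\alpha_1 \in \mathbb{Z}_p$. Also,
\[ |\alpha_1 - \alpha_0|_p = |h f(\alpha_0)|_p = |f(\alpha_0)|_p < 1. \]
Moreover, using Lemma 6.15, we get that if $\deg(f) = t$ then
\begin{align*}
f(\alpha_1) &= f(\alpha_0 + h f(\alpha_0)) \\
&= f(\alpha_0) + f'(\alpha_0) h f(\alpha_0) + \frac{f''(\alpha_0)}{2!} h^2 f(\alpha_0)^2 + \cdots + \frac{f^{(t)}(\alpha_0)}{t!} h^t f(\alpha_0)^t \\
&= f(\alpha_0)(1 + f'(\alpha_0) h) + f(\alpha_0)^2 K \\
&= f(\alpha_0)^2 K \qquad (\text{as } 1 + f'(\alpha_0) h = 0)
\end{align*}
where $K \in \mathbb{Z}_p$. So
\[ |f(\alpha_1)|_p = |f(\alpha_0)^2 K|_p < \frac{1}{p}, \quad \text{since } |f(\alpha_0)^2|_p \le \frac{1}{p^2}. \]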

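Induction hypothesis: Assume that $\alpha_{k-1}$ satisfies (1), (2), (3) above, where $k \ge 2$.

Clearly $\alpha_k = \alpha_{k-1} + h f(\alpha_{k-1}) \in \mathbb{Z}_p$ since $\alpha_{k-1} \in \mathbb{Z}_p$. Moreover,
\begin{align*}
|\alpha_k - \alpha_0|_p &\le \max\{|\alpha_k - \alpha_{k-1}|_p, |\alpha_{k-1} - \alpha_0|_p\} \\
&= \max\{|h f(\alpha_{k-1})|_p, |\alpha_{k-1} - \alpha_0|_p\} < 1,
\end{align*}
since $|h f(\alpha_{k-1})|_p < \frac{1}{p^{k-1}}$ and $|\alpha_{k-1} - \alpha_0|_p < 1$. Further, using Lemma 6.15, we have
\begin{align*}
f(\alpha_k) &= f(\alpha_{k-1} + h f(\alpha_{k-1})) \\
&= f(\alpha_{k-1}) + f'(\alpha_{k-1}) h f(\alpha_{k-1}) + f(\alpha_{k-1})^2 K_1 \\
&= f(\alpha_{k-1})(1 + f'(\alpha_{k-1}) h) + f(\alpha_{k-1})^2 K_1
\end{align*}
where $K_1 \in \mathbb{Z}_p$. Since $|\alpha_{k-1} - \alpha_0|_p < 1$, we can write $\alpha_{k-1} = \alpha_0 + pt$ where $t \in \mathbb{Z}_p$. Thus
\begin{align*}
f'(\alpha_{k-1}) &= f'(\alpha_0 + pt) \\
&= f'(\alpha_0) + p K_2 \qquad (\text{applying Lemma 6.15 to } f')
\end{align*}
for some $K_2 \in \mathbb{Z}_p$. Hence
\[ 1 + h f'(\alpha_{k-1}) = 1 + h f'(\alpha_0) + p h K_2 = p h K_2 \]
and so
\[ |1 + h f'(\alpha_{k-1})|_p \le \frac{1}{p}. \]
Hence it follows that
\[ |f(\alpha_k)|_p \le \max\{|f(\alpha_{k-1})(1 + f'(\alpha_{k-1}) h)|_p,\ |f(\alpha_{k-1})^2 K_1|_p\} < \frac{1}{p^k}, \]
since the first term is less than $\frac{1}{p^{k-1}} \cdot \frac{1}{p}$ and the second is less than $\frac{1}{p^{2(k-1)}} \le \frac{1}{p^k}$ for $k \ge 2$.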

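Since $|\alpha_{n+1} - \alpha_n|_p = |h f(\alpha_n)|_p < \frac{1}{p^n} \to 0$, the sequence $(\alpha_n)_n$ is a p-adically Cauchy sequence in $\mathbb{Z}_p$ and so has a limit, say $\alpha \in \mathbb{Z}_p$. Therefore,
\[ f(\alpha) = f(\lim_{n \to \infty} \alpha_n) = \lim_{n \to \infty} f(\alpha_n) = 0, \]
since $|f(\alpha_n)|_p < \frac{1}{p^n} \to 0$ as $n \to \infty$. Note that the middle equality is true since limits of sums and products are respectively sums and products of limits. □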

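Theorem 6.18 (Hensel's Lemma - Strong version). Let $f(x) \in \mathbb{Z}_p[x]$. Let $\alpha_1 \in \mathbb{Z}_p$ be such that $|f(\alpha_1)|_p \le \frac{1}{p^{2k+1}}$ and $|f'(\alpha_1)|_p = \frac{1}{p^k}$. Then there is a sequence $(\alpha_n)_n$ in $\mathbb{Z}_p$ such that
\[ \text{(i)} \ |\alpha_{n+1} - \alpha_n|_p \le \frac{1}{p^{n+k}}, \qquad \text{(ii)} \ |f(\alpha_n)|_p \le \frac{1}{p^{n+2k}}. \]
If $\alpha \in \mathbb{Z}_p$ is the p-adic limit of $(\alpha_n)_n$, then $f(\alpha) = 0$.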

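Proof. Given $\alpha_1 \in \mathbb{Z}_p$ satisfying $|f(\alpha_1)|_p \le \frac{1}{p^{2k+1}}$ and $|f'(\alpha_1)|_p = \frac{1}{p^k}$, we construct the sequence $(\alpha_n)_n$ given by
\[ \alpha_{n+1} = \alpha_n - f'(\alpha_1)^{-1} f(\alpha_n) \]
and show that for all $n \in \mathbb{N}$ we have

(1) $\alpha_n \in \mathbb{Z}_p$.

(2) $|\alpha_n - \alpha_1|_p \le \frac{1}{p^{1+k}}$.

(3) $|f(\alpha_n)|_p \le \frac{1}{p^{n+2k}}$.

Note that, unlike in Theorem 6.16, $f'(\alpha_1)^{-1} \in \mathbb{Q}_p \setminus \mathbb{Z}_p$ when $k \ge 1$; but since $|f(\alpha_1)|_p \le \frac{1}{p^{2k+1}}$ we have
\[ \left| f'(\alpha_1)^{-1} f(\alpha_1) \right|_p = \left| f'(\alpha_1)^{-1} \right|_p \cdot |f(\alpha_1)|_p \le p^k \cdot \frac{1}{p^{2k+1}} = \frac{1}{p^{k+1}}. \]
So $f'(\alpha_1)^{-1} f(\alpha_1) \in \mathbb{Z}_p$, hence $\alpha_2 = \alpha_1 - f'(\alpha_1)^{-1} f(\alpha_1) \in \mathbb{Z}_p$, and so (1) and (2) hold for $\alpha_2$. Now by Lemma 6.15,
\begin{align*}
f(\alpha_2) &= f(\alpha_1 + (-f'(\alpha_1)^{-1} f(\alpha_1))) \\
&= f(\alpha_1) - f'(\alpha_1) f'(\alpha_1)^{-1} f(\alpha_1) + p^{2k+2} K = p^{2k+2} K
\end{align*}
for some $K \in \mathbb{Z}_p$. Hence $|f(\alpha_2)|_p \le \frac{1}{p^{2+2k}}$.

Proceeding with the induction step as in the proof of Theorem 6.16, one obtains the strong version of Hensel's Lemma. Note that by the construction of $\alpha_{n+1}$ we have
\[ |\alpha_{n+1} - \alpha_n|_p = \left| f'(\alpha_1)^{-1} f(\alpha_n) \right|_p \le p^k \cdot \frac{1}{p^{2k+n}} = \frac{1}{p^{n+k}} \]
at each step. □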

Corollary 6.19. Let $b \in \mathbb{Z}$ with $b \ne 0$. Then $b$ is a square in $\mathbb{Z}_2$ iff $b = 2^{2r} c$ for some integer $r \ge 0$, where $c \equiv 1 \pmod{8}$.

Proof. ($\Leftarrow$) Consider $f(x) = x^2 - c \in \mathbb{Z}[x]$. Since $c \equiv 1 \pmod{8}$, we get that $f(c) = c(c - 1) \equiv 0 \pmod{8}$, i.e., $|f(c)|_2 \le \frac{1}{2^3}$. Moreover, since $f'(c) = 2c$ and $c$ must be odd, we have $|f'(c)|_2 = \frac{1}{2}$. So by applying Theorem 6.18 with $k = 1$ and $\alpha_1 = c$, there exists $\alpha \in \mathbb{Z}_2$ such that $f(\alpha) = 0$, i.e., $\alpha^2 = c$. Hence $b = (2^r \alpha)^2$ is a square in $\mathbb{Z}_2$.

($\Rightarrow$) Suppose $b = \beta^2$ for some $\beta \in \mathbb{Z}_2$. We can write $b = 2^s c$ where $\gcd(c, 2) = 1$. Now $|b|_2 = 2^{-s} = |\beta|_2^2$ is an even power of 2. Hence $s$ is even, say $s = 2r$. Hence $c$ is a square in $\mathbb{Z}_2$, say $c = \gamma^2$ where $|\gamma|_2 = 1$. So $\gamma$ has 2-adic expansion
\[ \gamma = 1 + a_1 \cdot 2 + a_2 \cdot 2^2 + \cdots, \qquad a_i = 0 \text{ or } 1. \]
After squaring $\gamma$ one obtains a 2-adic expansion
\[ \gamma^2 = 1 + b_1 \cdot 2^3 + b_2 \cdot 2^4 + \cdots, \qquad b_i = 0 \text{ or } 1. \]
Hence $c = \gamma^2 \equiv 1 \pmod{8}$. □
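Corollaries 6.17 and 6.19 together give a finite test for whether a non-zero integer is a square in $\mathbb{Z}_p$. The following Python sketch combines the two criteria, using Euler's criterion to evaluate the Legendre symbol; the function name `is_square_in_Zp` is a hypothetical choice and `p` is assumed to be prime.

```python
def is_square_in_Zp(b, p):
    """Decide whether the non-zero integer b is a square in Z_p (p prime)."""
    if b == 0:
        raise ValueError("b must be non-zero")
    s = 0
    while b % p == 0:      # write b = p^s * c with p not dividing c
        b //= p
        s += 1
    if s % 2 == 1:         # the power of p must be even
        return False
    if p == 2:
        return b % 8 == 1  # Corollary 6.19
    # Corollary 6.17, with the Legendre symbol computed by Euler's criterion
    return pow(b, (p - 1) // 2, p) == 1

print(is_square_in_Zp(17, 2), is_square_in_Zp(2, 17), is_square_in_Zp(3, 7))
```

For example, 17 is a square in $\mathbb{Z}_2$ and 2 is a square in $\mathbb{Z}_{17}$, which is what Example 6.5 below relies on.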

7. Hasse Principle

Let $f(x_1, x_2, \ldots, x_n) \in \mathbb{Z}[x_1, x_2, \ldots, x_n]$. We want to know if $f = 0$ has solutions in integers. Since $\mathbb{Z} \subset \mathbb{Z}_p$ for all primes $p$ and $\mathbb{Z} \subset \mathbb{R}$, we know that
\[ f = 0 \text{ has a solution in } \mathbb{Z}^n \implies f = 0 \text{ has a solution in } \mathbb{Z}_p^n \text{ for all primes } p \text{ and in } \mathbb{R}^n. \]
So to show that a Diophantine equation has no integer solutions, it is enough to find a prime $p$ such that it has no solutions in $\mathbb{Z}_p$ (or to show that it has no real solutions).

Exercise. Show that the Diophantine equation
\[ x^2 + 7y^4 = 3z^2 \]
has no non-trivial solution in integers by showing that it has no non-trivial solution in $\mathbb{Z}_7$.

Suppose $f = 0$ has a solution in $\mathbb{Z}_p^n$ for all primes $p$ and a solution in $\mathbb{R}^n$. Does $f = 0$ then have a solution in integers? The assertion that it does is called the Hasse Principle, and if it is true for $f$ we say that the Hasse principle holds for $f$. The Hasse principle holds for many polynomials. In particular we have

Theorem 6.20. (Hasse-Minkowski) Let $f \in \mathbb{Z}[x_1, x_2, \ldots, x_n]$ be a homogeneous polynomial of degree 2 (i.e., each term of $f$ has degree 2). Then the Hasse principle holds for $f$.

There are many polynomials for which the Hasse principle fails! Here is a simple counterexample.

Example 6.5. Let $f(x) = (x^2 - 2)(x^2 - 17)(x^2 - 34)$. Clearly the only solutions to $f(x) = 0$ in $\mathbb{R}$ are $\pm\sqrt{2}$, $\pm\sqrt{17}$, $\pm\sqrt{34}$. So it has no solutions in $\mathbb{Z}$. Now $17 \equiv 1 \pmod{8}$, so by Corollary 6.19 we get that $x^2 - 17$ has a solution in $\mathbb{Z}_2$. Also $\left(\frac{2}{17}\right) = 1$, so $x^2 - 2$ has a solution in $\mathbb{Z}_{17}$. Thus $f(x) = 0$ has a solution in $\mathbb{Z}_2$ and in $\mathbb{Z}_{17}$. Suppose $p$ is a prime such that $p \ne 2, 17$. We want to show that $f(x) = 0$ has a solution in $\mathbb{Z}_p$. It is enough to show that one of 2, 17 or 34 is a square modulo $p$, in order to apply Hensel's lemma. If $\left(\frac{2}{p}\right) = 1$ or $\left(\frac{17}{p}\right) = 1$ then we are done. Suppose $\left(\frac{2}{p}\right) = \left(\frac{17}{p}\right) = -1$. Then $\left(\frac{34}{p}\right) = \left(\frac{2}{p}\right)\left(\frac{17}{p}\right) = 1$ and so we are done.
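The argument of Example 6.5 can be checked numerically for small primes. The sketch below, with illustrative helper names `is_qr`, `is_prime` and `square_witness`, uses Euler's criterion to find which of 2, 17, 34 is a square modulo a given odd prime; it is a sanity check of the multiplicativity argument, not a proof.

```python
def is_qr(a, p):
    """Euler's criterion: True iff a is a non-zero square modulo the odd prime p."""
    return pow(a, (p - 1) // 2, p) == 1

def is_prime(n):
    """Trial division; adequate for the small range used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def square_witness(p):
    """First of 2, 17, 34 that is a square modulo the odd prime p, or None."""
    for a in (2, 17, 34):
        if is_qr(a, p):
            return a
    return None

odd_primes = [p for p in range(3, 200) if is_prime(p) and p != 17]
print(all(square_witness(p) is not None for p in odd_primes))
print(square_witness(3), square_witness(7), square_witness(13))
```

A prime with no witness would contradict the multiplicativity of the Legendre symbol, so the first printed value should be `True`.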

Example 6.6. The polynomial $f(x, y) = 2y^2 - x^4 + 17$ is another counterexample to the Hasse Principle. Note that $f(x, y) = 0$ has solutions over $\mathbb{R}$, for instance $(\sqrt[4]{17}, 0)$. Moreover it has solutions in $\mathbb{Z}_p$ for all primes $p$, the proof of which is not easy. Use quadratic reciprocity to check that $f(x, y) = 0$ has no integer solutions.

CHAPTER 7

Geometry of Numbers

Let $n$ be a positive integer. Consider the group $(\mathbb{R}^n, +)$.

Definition. A subgroup $L$ of $\mathbb{R}^n$ is called a (full) lattice in $\mathbb{R}^n$ if $L = \mathbb{Z}v_1 + \mathbb{Z}v_2 + \cdots + \mathbb{Z}v_n$ for some linearly independent vectors $v_1, v_2, \ldots, v_n$ of $\mathbb{R}^n$. For example, $\mathbb{Z}^n$ is a lattice in $\mathbb{R}^n$.

Definition. A sublattice of $\mathbb{Z}^n$ is a subgroup of finite index.

Example 7.1. $2\mathbb{Z}^2$ is a sublattice of $\mathbb{Z}^2$ of index 4, since the cosets in $\mathbb{Z}^2/2\mathbb{Z}^2$ are represented by
\[ \{(0, 0), (1, 0), (0, 1), (1, 1)\}. \]

Definition. A subset $S$ of $\mathbb{R}^n$ is symmetric if for every $x \in S$, $-x \in S$.

Definition. A subset $S$ of $\mathbb{R}^n$ is convex if for every pair of points $x, y \in S$, the line segment joining $x$ and $y$ is contained in $S$, i.e., $\lambda x + (1 - \lambda) y \in S$ for all $\lambda \in [0, 1]$.

Note that $0 \in S$ for every non-empty convex, symmetric subset $S$ of $\mathbb{R}^n$.

Theorem 7.1 (Minkowski's Theorem). Let $\Lambda$ be a sublattice of $\mathbb{Z}^n$ of index $m$. Let $C$ be a convex symmetric subset of $\mathbb{R}^n$ having volume $V(C)$ such that $V(C) > 2^n m$. Then $C \cap \Lambda$ contains a point other than 0.

Let us see some applications of Minkowski’s Theorem.

1. Two Squares Theorem

Theorem 7.2. Let $n = N^2 m$ be a positive integer where $m$ is square-free, i.e., $\mathrm{ord}_p(m) = 0$ or $1$ for all primes $p$. Then $n$ is a sum of two squares iff $m$ has no prime factor congruent to 3 (mod 4).

Before we continue with the proof consider the following easy lemma.

Lemma 7.3. If $m$ and $n$ are each a sum of two squares then so is $mn$.

Proof. The proof follows from the identity
\[ (a^2 + b^2)(c^2 + d^2) = (ac + bd)^2 + (ad - bc)^2. \]
□

Also note that $2 = 1^2 + 1^2$. So to prove the theorem we need to prove the following statement: a square-free $m$ is a sum of two squares iff every odd prime divisor of $m$ is congruent to 1 (mod 4).

Proof. ($\Rightarrow$): Let $m = a^2 + b^2$. Let $p$ be an odd prime divisor of $m$. If $p \mid a$ and $p \mid b$ then $p^2 \mid (a^2 + b^2) = m$, a contradiction to $m$ being square-free. So WLOG we may assume that $p \nmid b$. Let $d \in \mathbb{Z}$ be such that $bd \equiv 1 \pmod{p}$. Then we have
\begin{align*}
a^2 + b^2 \equiv 0 \pmod{p} &\iff a^2 \equiv -b^2 \pmod{p} \iff (ad)^2 \equiv -1 \pmod{p} \\
&\implies \left(\frac{-1}{p}\right) = 1 \iff p \equiv 1 \pmod{4}.
\end{align*}

($\Leftarrow$): Let $m = 2^{\alpha} p_1 p_2 \cdots p_k$ where the $p_i$ are distinct odd primes congruent to 1 (mod 4) and $\alpha = 0$ or $1$. By Proposition 7.4 below, each $p_i$ is a sum of two squares, and hence so is $m$ using Lemma 7.3. □

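Proposition 7.4. An odd prime $p$ is a sum of two squares iff $p \equiv 1 \pmod{4}$.

Proof. ($\Rightarrow$): A sum of two squares is congruent to 0, 1 or 2 modulo 4, and an odd prime is congruent to 1 or 3 modulo 4.

($\Leftarrow$): Since $p \equiv 1 \pmod{4}$ we know that $\left(\frac{-1}{p}\right) = 1$. Let $\ell \in \mathbb{Z}$ be such that $\ell^2 \equiv -1 \pmod{p}$. Let
\[ \Lambda = \{(x, y) \in \mathbb{Z}^2 : x \equiv \ell y \pmod{p}\}. \]
Consider the map $\Theta : \mathbb{Z}^2 \to \mathbb{Z}/p\mathbb{Z}$ given by $(x, y) \mapsto [x - \ell y]_p$. It is easy to see that $\Theta$ is a group homomorphism, and it is clear from the definition of $\Lambda$ that $\mathrm{Ker}(\Theta) = \Lambda$. So $\Lambda$ is a subgroup of $\mathbb{Z}^2$. Further, for any $[a]_p \in \mathbb{Z}/p\mathbb{Z}$ we have $\Theta((a, 0)) = [a]_p$, hence $\Theta$ is surjective. So by the First Isomorphism Theorem we get that
\[ \mathbb{Z}^2/\Lambda \cong \mathbb{Z}/p\mathbb{Z}. \]
Thus $\#(\mathbb{Z}^2/\Lambda) = p$ and so $\Lambda$ has index $p$ in $\mathbb{Z}^2$. Consider the disc centered at $(0, 0)$ with radius $\sqrt{2p}$,
\[ C = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 < 2p\}. \]
This is clearly convex and symmetric with volume
\[ V(C) = \mathrm{Area}(C) = 2\pi p > 2^2 p. \]
So applying Minkowski's Theorem to $\Lambda$ and $C$ we obtain a point $(x, y) \in C \cap \Lambda$ such that $(x, y) \ne (0, 0)$. Hence we have $0 < x^2 + y^2 < 2p$. Also $(x, y) \in \Lambda$ implies that
\[ x^2 + y^2 \equiv \ell^2 y^2 + y^2 \equiv (\ell^2 + 1) y^2 \equiv 0 \pmod{p}. \]
Thus $x^2 + y^2$ is a multiple of $p$ strictly between 0 and $2p$, and so $x^2 + y^2 = p$. □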

We will next apply Minkowski’s theorem to show that every positiveinteger can be written as sum of four squares! Before that we compute someareas and volumes.


2. Areas and Volumes

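Let $S$ be a subset of $\mathbb{R}^n$. Let $\chi_S$ be the characteristic function of $S$, defined by
\[ \chi_S(x) = \begin{cases} 1 & \text{if } x \in S, \\ 0 & \text{if } x \notin S. \end{cases} \]
Then the volume of $S$, denoted by $V(S)$, is defined to be
\[ V(S) := \int_S dx = \int_{\mathbb{R}^n} \chi_S(x)\, dx. \]
Note that if we consider the coordinate system $x_1, x_2, \ldots, x_n$ for $S \subset \mathbb{R}^n$ then $dx$ is simply $dx_1\, dx_2 \cdots dx_n$.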

Example 7.2. Consider the ellipse
\[ E_{a,b} = \left\{ (x, y) \in \mathbb{R}^2 : \frac{x^2}{a^2} + \frac{y^2}{b^2} < 1 \right\}, \]
where $a$ and $b$ are positive integers. Then
\[ V(E_{a,b}) = \iint_{E_{a,b}} 1\, dx\, dy. \]
Consider the substitution $x/a = u$ and $y/b = v$. Then the ellipse $E_{a,b}$ in the $xy$-plane becomes the unit disc
\[ D = \{(u, v) \in \mathbb{R}^2 : u^2 + v^2 < 1\} \]
in the $uv$-plane. Further $dx = a\, du$ and $dy = b\, dv$. Hence
\[ V(E_{a,b}) = \iint_D ab\, du\, dv = ab \iint_D 1\, du\, dv = ab\, V(D) = ab\pi. \]

Example 7.3. Consider the ellipsoid
\[ E_{a,b,c} = \left\{ (x, y, z) \in \mathbb{R}^3 : \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} < 1 \right\}. \]
Show that $V(E_{a,b,c}) = \frac{4}{3}\pi abc$.

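Example 7.4. We will next compute the volume of the ball of radius $r$ in 4 dimensions,
\[ B_r = \{(x, y, z, w) \in \mathbb{R}^4 : x^2 + y^2 + z^2 + w^2 < r^2\}. \]
Then
\[ V(B_r) = \iiiint_{x^2 + y^2 + z^2 + w^2 < r^2} dx\, dy\, dz\, dw. \]
Note that $-r < w < r$, so we can rewrite the integral as
\[ V(B_r) = \int_{w=-r}^{w=r} \left( \iiint_{x^2 + y^2 + z^2 < r^2 - w^2} dx\, dy\, dz \right) dw. \]
However $x^2 + y^2 + z^2 < r^2 - w^2$ is a ball in $xyz$-space of radius $\sqrt{r^2 - w^2}$, so
\[ \iiint_{x^2 + y^2 + z^2 < r^2 - w^2} dx\, dy\, dz = \frac{4\pi}{3} (r^2 - w^2)^{3/2}. \]
Hence
\[ V(B_r) = \frac{4\pi}{3} \int_{w=-r}^{w=r} (r^2 - w^2)^{3/2}\, dw = \frac{8\pi}{3} \int_{w=0}^{w=r} (r^2 - w^2)^{3/2}\, dw. \]
Now substitute $w = r \sin\theta$, so $dw = r \cos\theta\, d\theta$. So
\[ V(B_r) = \frac{8\pi}{3} \int_{\theta=0}^{\theta=\pi/2} (r^2 \cos^2\theta)^{3/2}\, r \cos\theta\, d\theta = \frac{8\pi r^4}{3} \int_0^{\pi/2} \cos^4\theta\, d\theta. \]
Writing $\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}$ and using the binomial expansion we obtain
\[ \cos^4\theta = \frac{1}{8}\cos 4\theta + \frac{1}{2}\cos 2\theta + \frac{3}{8}. \]
Hence
\[ V(B_r) = \frac{8\pi r^4}{3} \int_0^{\pi/2} \cos^4\theta\, d\theta = \frac{8\pi r^4}{3} \cdot \frac{3\pi}{16} = \frac{\pi^2 r^4}{2}. \]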
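As a rough numerical cross-check of the formula $V(B_r) = \pi^2 r^4 / 2$, one can integrate the three-dimensional slice volumes over $w$ with a midpoint rule. This is only a sanity check under floating-point arithmetic; the function name `ball4_volume_numeric` and the default step count are arbitrary illustrative choices.

```python
import math

def ball4_volume_numeric(r, steps=20000):
    """Midpoint-rule approximation of the integral of (4*pi/3)(r^2 - w^2)^(3/2) over [-r, r]."""
    h = 2 * r / steps
    total = 0.0
    for i in range(steps):
        w = -r + (i + 0.5) * h   # midpoint of the i-th subinterval
        total += (4 * math.pi / 3) * (r * r - w * w) ** 1.5
    return total * h

approx = ball4_volume_numeric(1.0)
exact = math.pi ** 2 / 2
print(abs(approx - exact) < 1e-6)
```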

3. Four Squares Theorem

Theorem 7.5. Every positive integer $n$ can be written as the sum of four integer squares.

We need the following lemma.

Lemma 7.6. (Euler's identity) If $m$ and $n$ are each a sum of four squares then so is $mn$.

Proof. The proof follows from the identity
\begin{align*}
(a^2 + b^2 + c^2 + d^2)(x^2 + y^2 + z^2 + w^2) = {} & (ax - by - cz - dw)^2 + (ay + bx + cw - dz)^2 \\
& + (az - bw + cx + dy)^2 + (aw + bz - cy + dx)^2.
\end{align*}
□

Proof of Theorem 7.5. Since $1$ and $2$ are sums of four squares, by the above lemma it is enough to prove the statement of the theorem for odd primes $p$. By Problem 9 of Assignment 2, we know that there exist integers $a, b$ such that $a^2 + b^2 + 1 \equiv 0 \pmod{p}$. Consider
\[ \Lambda = \{(x, y, z, w) \in \mathbb{Z}^4 : x \equiv az + bw \pmod{p},\ y \equiv bz - aw \pmod{p}\}. \]
By considering the homomorphism $\Theta : \mathbb{Z}^4 \to \mathbb{Z}/p\mathbb{Z} \times \mathbb{Z}/p\mathbb{Z}$ given by $(x, y, z, w) \mapsto ([x - az - bw]_p, [y - bz + aw]_p)$, it is easy to see that $\Lambda$ is a subgroup of $\mathbb{Z}^4$ of index $p^2$. Consider the following ball of radius $\sqrt{2p}$:
\[ C = \{(x, y, z, w) \in \mathbb{R}^4 : x^2 + y^2 + z^2 + w^2 < 2p\}. \]
$C$ is clearly convex and symmetric, and by the above example we know that
\[ V(C) = \frac{\pi^2}{2} (\sqrt{2p})^4 = 2\pi^2 p^2 > 2^4 p^2. \]
Hence we can apply Minkowski's Theorem to obtain $(x, y, z, w) \in C \cap \Lambda$ such that $(x, y, z, w) \ne (0, 0, 0, 0)$. So we have $0 < x^2 + y^2 + z^2 + w^2 < 2p$. Further, $(x, y, z, w) \in \Lambda$ implies that
\begin{align*}
x^2 + y^2 + z^2 + w^2 &\equiv (az + bw)^2 + (bz - aw)^2 + z^2 + w^2 \\
&= (a^2 + b^2 + 1)(z^2 + w^2) \equiv 0 \pmod{p}.
\end{align*}
Hence $x^2 + y^2 + z^2 + w^2$ is a multiple of $p$ strictly between 0 and $2p$, and so $x^2 + y^2 + z^2 + w^2 = p$. □

4. Proof of Minkowski’s Theorem

Theorem 7.7 (Blichfeldt's Theorem). Let $m \ge 1$ be an integer. Let $S$ be a subset of $\mathbb{R}^n$ with volume $V(S) > m$. Then there exist $m + 1$ distinct points $x_0, \ldots, x_m \in S$ such that
\[ x_j - x_i \in \mathbb{Z}^n \quad \text{for } 0 \le i, j \le m. \]

Proof. Let $W$ be the unit cube:
\[ W = \{(x_1, \ldots, x_n) : 0 \le x_i < 1\}. \]
Clearly the volume of $W$ is 1. Every vector $x \in \mathbb{R}^n$ can be decomposed uniquely as $x = z + w$ where $z \in \mathbb{Z}^n$ and $w \in W$ (note that every $x$ in $\mathbb{R}$ can be uniquely written as $\lfloor x \rfloor + \theta$ for some $0 \le \theta < 1$). Thus
\[ \mathbb{R}^n = \bigcup_{z \in \mathbb{Z}^n} (z + W), \]
where $z + W = \{z + w : w \in W\}$ and the union is disjoint. Hence,
\[ V(S) = \int_{\mathbb{R}^n} \chi_S(x)\, dx = \sum_{z \in \mathbb{Z}^n} \int_{w \in W} \chi_S(z + w)\, dw. \]
Interchanging the summation and integration, we get
\[ V(S) = \int_{w \in W} \left( \sum_{z \in \mathbb{Z}^n} \chi_S(z + w) \right) dw. \]
(Note that if $S$ is a bounded set then the summation is a finite sum and we can exchange the sum and the integral; in the general case one can use results from measure theory to justify this interchange. It is a good place to note that all the subsets of $\mathbb{R}^n$ that we are dealing with are actually Lebesgue-measurable.)

Write $f(w) = \sum_{z \in \mathbb{Z}^n} \chi_S(z + w)$. If $f(w) \le m$ for all $w \in W$ then
\[ V(S) = \int_{w \in W} f(w)\, dw \le \int_{w \in W} m\, dw = m V(W) = m, \]
a contradiction to our assumption that $V(S) > m$. Hence there is some point $w_0 \in W$ such that $f(w_0) > m$, i.e., $\sum_{z \in \mathbb{Z}^n} \chi_S(z + w_0) > m$. But the values $\chi_S(z + w_0)$ are either zeros or ones, so there are $m + 1$ distinct $z_0, \ldots, z_m \in \mathbb{Z}^n$ such that $\chi_S(z_i + w_0) = 1$. Let $x_i = z_i + w_0$, so the $x_i$ are distinct elements of $S$. Finally
\[ x_j - x_i = (z_j + w_0) - (z_i + w_0) = z_j - z_i \in \mathbb{Z}^n. \]
□

Theorem 7.8 (Minkowski's Theorem). Let $\Lambda$ be a sublattice of $\mathbb{Z}^n$ of index $m$. Let $C$ be a convex symmetric subset of $\mathbb{R}^n$ having volume $V(C)$ such that $V(C) > 2^n m$. Then $C \cap \Lambda$ contains a point other than 0.

Proof. Let
\[ S = \frac{1}{2} C = \left\{ \frac{1}{2} x : x \in C \right\}. \]
The volume of $S$ is
\[ V(S) = \frac{1}{2^n} V(C) > m. \]
By Blichfeldt's Theorem, there are $m + 1$ distinct points $x_0, \ldots, x_m \in S$ such that
\[ x_j - x_i \in \mathbb{Z}^n \quad \text{for } 0 \le i, j \le m. \]
Let $y_j = x_j - x_0 \in \mathbb{Z}^n$ for $j = 0, \ldots, m$. These are $m + 1$ distinct points in $\mathbb{Z}^n$. Since $\Lambda$ has index $m$ in $\mathbb{Z}^n$, it has $m$ cosets, and so there exist $0 \le i \ne j \le m$ such that $y_i, y_j$ lie in the same coset of $\Lambda$, i.e., $0 \ne y_j - y_i \in \Lambda$. Hence $x_j - x_i = y_j - y_i$ is a non-zero element of $\Lambda$. Next we will show that $x_j - x_i \in C$.

Since $x_j, x_i \in S$ we have $x_j = \frac{1}{2} c$ and $x_i = \frac{1}{2} c'$ for some $c, c' \in C$. Now $C$ is symmetric, so $-c' \in C$, and $\frac{1}{2} c - \frac{1}{2} c'$ is the mid-point between $c$ and $-c'$, so by convexity it must be in $C$. Hence $x_j - x_i \in C$. So $x_j - x_i$ is a non-zero element of $C \cap \Lambda$. □

5. Quadratic Forms and Hasse-Minkowski

Definition. A quadratic form over a commutative ring $R$ in $n$ variables is a homogeneous polynomial of degree 2 in $R[x_1, x_2, \ldots, x_n]$.

Example 7.5. (1) $f(x) = ax^2 \in R[x]$ is a quadratic form in one variable.

(2) $f(x, y) = ax^2 + bxy + cy^2 \in R[x, y]$ is a quadratic form in two variables. These are called binary quadratic forms.

(3) $f(x, y, z) = ax^2 + by^2 + cz^2 + dxy + exz + fyz \in R[x, y, z]$ is a quadratic form in three variables. These are called ternary quadratic forms.

Definition. A diagonal quadratic form over a ring $R$ in $n$ variables is a quadratic form of the shape $a_1 x_1^2 + a_2 x_2^2 + \cdots + a_n x_n^2$.

Fact. Every quadratic form over a field $F$ where $2 \ne 0$ (i.e., of characteristic not equal to 2) can be represented by a diagonal form after a suitable invertible linear change of variables over $F$.


In the case of a binary quadratic form $ax^2 + bxy + cy^2$ where $a \ne 0$, one uses the change of variables $z = x + \frac{b}{2a} y$ and $w = y$ to get the diagonal form $az^2 + \left(c - \frac{b^2}{4a}\right) w^2$. Note that this change of variables is invertible, i.e., we can write $x, y$ in terms of $z, w$. Similarly we can get a diagonal form in the cases $c \ne 0$, or $a = c = 0$, $b \ne 0$. For example, $x^2 + 2xy - 3y^2$ can be represented as $z^2 - 4w^2$ where $z = x + y$ and $w = y$.
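Completing the square in this way is easy to verify with exact rational arithmetic. The sketch below computes the diagonal coefficients for $a \ne 0$ and checks at a sample point that both forms take the same value; the function names `diagonalise_binary` and `same_value` are illustrative choices.

```python
from fractions import Fraction

def diagonalise_binary(a, b, c):
    """Coefficients (A, C) with a x^2 + b xy + c y^2 = A z^2 + C w^2,
    where z = x + (b / 2a) y and w = y. Requires a != 0."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("this change of variables needs a != 0")
    return a, c - b * b / (4 * a)

def same_value(a, b, c, x, y):
    """Check that the original and the diagonal form agree at the point (x, y)."""
    A, C = diagonalise_binary(a, b, c)
    z, w = Fraction(x) + Fraction(b, 2 * a) * y, Fraction(y)
    return a * x * x + b * x * y + c * y * y == A * z * z + C * w * w

print(diagonalise_binary(1, 2, -3))   # coefficients for x^2 + 2xy - 3y^2
print(same_value(2, 3, 5, 1, 1))
```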

Theorem 7.9 (Hasse-Minkowski for binary forms). A binary quadratic form $f(x, y)$ in $\mathbb{Q}[x, y]$ has a non-trivial zero in rationals iff it has non-trivial zeros in $\mathbb{Q}_p$ for all primes $p$ and in $\mathbb{R}$.

Proof. ($\Rightarrow$) Clearly having solutions in $\mathbb{Q}^2$ implies having solutions in $\mathbb{Q}_p^2$ for all primes $p$ and in $\mathbb{R}^2$.

($\Leftarrow$) As shown above, we can represent the given binary form $f(x, y)$ by a diagonal form $az^2 - bw^2$ after an invertible linear change of variables over $\mathbb{Q}$. Now note that since the change of variables is invertible,
\begin{align*}
& f(x, y) = 0 \text{ has a non-trivial solution in } \mathbb{Q}^2 \text{ or } \mathbb{Q}_p^2 \text{ or } \mathbb{R}^2 \\
\iff {} & az^2 - bw^2 = 0 \text{ has a non-trivial solution in } \mathbb{Q}^2 \text{ or } \mathbb{Q}_p^2 \text{ or } \mathbb{R}^2. \tag{4}
\end{align*}
If $a = 0$ or $b = 0$ there is a non-trivial rational solution; say if $a = 0$ then $(1, 0)$ is a non-trivial solution to $az^2 - bw^2 = 0$. So we may assume $a \ne 0$, $b \ne 0$. Then (4) is equivalent to $z^2 - cw^2 = 0$ having a non-trivial solution in $\mathbb{Q}^2$ or $\mathbb{Q}_p^2$ or $\mathbb{R}^2$, where $c = b/a \in \mathbb{Q}^{\times}$. Let $c = \pm \prod_{p \in P} p^{c_p}$ where $c_p \in \mathbb{Z}$.

Since by assumption $z^2 - cw^2 = 0$ has a non-trivial solution in $\mathbb{Q}_p^2$ for all primes $p$ and in $\mathbb{R}^2$, we get that $c$ is a square in $\mathbb{Q}_p$ for all primes $p$ and $c$ is a square in $\mathbb{R}$. Hence we must have $c > 0$ and $c_p$ even for all primes $p$. Hence $c$ is a square in $\mathbb{Q}$, and thus $z^2 - cw^2 = 0$ has a non-trivial solution over the rationals, which completes the proof. □

Theorem 7.10 (Hasse-Minkowski for ternary forms). A ternary quadratic form over $\mathbb{Q}$ has a non-trivial zero in rationals iff it has non-trivial zeros in $\mathbb{Q}_p$ for all primes $p$ and in $\mathbb{R}$.

As before, one direction is easy! For the proof of the converse, as before we can represent the given ternary form by a diagonal form $a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2$ after an invertible linear change of variables over $\mathbb{Q}$. We may assume that $a_1, a_2, a_3$ are all non-zero rationals, since otherwise there will always be a non-trivial solution over the rationals. We now have the following lemmas.

Lemma 7.11. Let $f = a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 \in \mathbb{Q}[x_1, x_2, x_3]$ where $a_1 a_2 a_3 \ne 0$. Then there is an $\alpha \in \mathbb{Q}^{\times}$ such that, after rescaling the variables,
\[ g = \alpha f = b_1 y_1^2 + b_2 y_2^2 + b_3 y_3^2 \tag{5} \]
where $b_1, b_2, b_3 \in \mathbb{Z}$ and $b_1 b_2 b_3$ is square-free.


Proof. First we can multiply f by LCM of denominators of a1, a2, a3to get coefficients in integers, to obtain a form that looks like

a′1(c1x1)2 + a′2(c2x2)

2 + a′3(c3x3)2

where a′i, ci ∈ Z and a′i are square-free. Let zi = cixi and so our new formlooks like

a′1z21 + a′2z

22 + a′3z

23 .

Finally if there is a prime p such that p divides at least two of the coefficients,say p | a′1 and p | a′2 then dividing the above form by p we obtain

a′1pz21 +

a′2pz22 + pa′3

(z3p

)2

,

which can be written as

a′′1w21 + a′′2w

22 + a′′3w

23,

where a′′1 =a′1p , a

′′2 =

a′2p , a

′′3 = pa′3 are integers and by change of variables

w1 = z1, w2 = z2, w3 = z3p .

We can continue this process to finally obtain a form of type (5) (notethat this process should eventually stop as we are decreasing the absolutevalue of product of coefficients at every step). �

Lemma 7.12. Let $g$ be of the form (5). Then

(a) Suppose $g = 0$ has a non-trivial solution in $\mathbb{Q}_p$. If $p$ is an odd prime dividing $b_1 b_2 b_3$, say $p \mid b_3$, then there is an integer $r_p$ such that $b_1 r_p^2 + b_2 \equiv 0 \pmod{p}$.

(b) Suppose $g = 0$ has a non-trivial solution in $\mathbb{Q}_2$. Then

(i) If $2 \nmid b_1 b_2 b_3$ then after permuting the indices we may assume that $b_1 + b_2 \equiv 0 \pmod{4}$.

(ii) Suppose $2 \mid b_3$. Then $b_1 + b_2 + b_3 s^2 \equiv 0 \pmod{8}$ where $s = 0$ or $1$.

Proof. For part (a), let $m_1, m_2, m_3 \in \mathbb{Q}_p$, not all zero, be such that
\[ b_1 m_1^2 + b_2 m_2^2 + b_3 m_3^2 = 0. \tag{6} \]
We may assume, by multiplying by a suitable power of $p$, that $m_1, m_2, m_3 \in \mathbb{Z}_p$ and $\max\{|m_1|_p, |m_2|_p, |m_3|_p\} = 1$. Now $|b_3 m_3^2|_p < 1$, since $p \mid b_3$. We claim that $|m_1|_p = |m_2|_p = 1$. Suppose $|m_1|_p < 1$. Then
\[ |b_2 m_2^2|_p = |b_1 m_1^2 + b_3 m_3^2|_p < 1. \]
Since $p \nmid b_2$, we get $|m_2|_p < 1$. But then
\[ |b_3 m_3^2|_p = |b_1 m_1^2 + b_2 m_2^2|_p \le 1/p^2, \]
which, as $p^2 \nmid b_3$, implies $|m_3|_p < 1$, contradicting $\max\{|m_1|_p, |m_2|_p, |m_3|_p\} = 1$. The same argument applies with the roles of $m_1$ and $m_2$ exchanged. Hence $|m_1|_p = |m_2|_p = 1$. Let $r_p = m_1/m_2 \in \mathbb{Z}_p$. Then it follows from (6) that
\[ b_1 r_p^2 + b_2 \equiv 0 \pmod{p}, \]
and we may replace $r_p$ by an integer congruent to it modulo $p$.

For part (b), let $m_1, m_2, m_3 \in \mathbb{Q}_2$, not all zero, be a solution; we may assume as before that $\max\{|m_1|_2, |m_2|_2, |m_3|_2\} = 1$.

For part (i), let $2 \nmid b_1 b_2 b_3$. Then one can show as before that if one of the $m_i$ has norm less than 1, then the other two are units in $\mathbb{Z}_2$. Also, if any two of them are units in $\mathbb{Z}_2$, say $|m_1|_2 = |m_2|_2 = 1$, then both $b_1 m_1^2$ and $b_2 m_2^2$ are 2-adic units and so their 2-adic expansions start with 1. Hence $|b_3 m_3^2|_2 = |b_1 m_1^2 + b_2 m_2^2|_2 \le 1/2$, showing $|m_3|_2 < 1$. So precisely two of the $m_i$'s are units in $\mathbb{Z}_2$, say $m_1, m_2$. Then $m_1^2 \equiv m_2^2 \equiv 1 \pmod{4}$, giving $b_1 + b_2 \equiv 0 \pmod{4}$.

For part (ii), let $2 \mid b_3$. As in part (a) we obtain $|m_1|_2 = |m_2|_2 = 1 > |b_3 m_3^2|_2$, and so $m_1^2 \equiv m_2^2 \equiv 1 \pmod{8}$. If $|m_3|_2 = 1$, we get $b_1 + b_2 + b_3 \equiv 0 \pmod{8}$; otherwise $|m_3|_2 < 1$ and we get $b_1 + b_2 \equiv 0 \pmod{8}$. □

Proof of Theorem 7.10. Suppose that $g = 0$ has a solution in $\mathbb{Q}_p$ for all primes $p$ and in $\mathbb{R}$. Then Lemma 7.12 applies to $g$. The rest of the proof is an application of Minkowski's Theorem. Let $\Lambda$ be the subgroup of $\mathbb{Z}^3$ consisting of the $(n_1, n_2, n_3) \in \mathbb{Z}^3$ satisfying the following set of linear congruences:

(i) For each odd prime $p \mid b_3$ impose the condition $n_1 \equiv r_p n_2 \pmod{p}$. In this case, indeed we have $g(n_1, n_2, n_3) = b_1 n_1^2 + b_2 n_2^2 + b_3 n_3^2 \equiv (b_1 r_p^2 + b_2) n_2^2 \equiv 0 \pmod{p}$. Similarly, for each odd prime $p \mid b_1$ or $p \mid b_2$ impose the respective conditions.

(ii) If $2 \mid b_3$, impose the conditions $n_1 \equiv n_2 \pmod{4}$ and $n_3 \equiv s n_2 \pmod{2}$, where $s = 0$ or $1$ as in part (ii) of Lemma 7.12. Then one can check that $g(n_1, n_2, n_3) \equiv 0 \pmod{8}$. Impose similar conditions if $2 \mid b_1$ or $2 \mid b_2$.

(iii) If $2 \nmid b_1 b_2 b_3$ and $b_1 + b_2 \equiv 0 \pmod{4}$, then impose the conditions $n_1 \equiv n_2 \pmod{2}$ and $n_3 \equiv 0 \pmod{2}$.

If $(n_1, n_2, n_3)$ satisfies all the above imposed congruences then

\[ g(n_1, n_2, n_3) \equiv 0 \pmod{|4 b_1 b_2 b_3|}. \]

Let $\beta = |4 b_1 b_2 b_3|$. Then it is clear that $\Lambda$ is a sublattice of $\mathbb{Z}^3$ of index $\beta$. Let $C$ be the ellipsoid given by

\[ C = \{(r_1, r_2, r_3) \in \mathbb{R}^3 : |b_1| r_1^2 + |b_2| r_2^2 + |b_3| r_3^2 < |4 b_1 b_2 b_3|\}. \]

In the above, $|\cdot|$ is the usual norm on $\mathbb{R}$. Now the volume of the ellipsoid $C$ is

\[ V(C) = \frac{\pi}{3} \cdot 2^3 \cdot |4 b_1 b_2 b_3| > 2^3 \beta. \]
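The stated value of V(C) can be checked from the standard volume formula for an ellipsoid with semi-axes A_1, A_2, A_3; the names A_i are introduced here only for this computation.

```latex
% Semi-axes of C: A_i = \sqrt{\beta/|b_i|}, where \beta = |4 b_1 b_2 b_3|,
% so that |b_1 b_2 b_3| = \beta/4.
\[
V(C) = \frac{4\pi}{3} A_1 A_2 A_3
     = \frac{4\pi}{3}\cdot\frac{\beta^{3/2}}{\sqrt{|b_1 b_2 b_3|}}
     = \frac{4\pi}{3}\cdot\frac{\beta^{3/2}}{\sqrt{\beta/4}}
     = \frac{8\pi}{3}\,\beta
     = \frac{\pi}{3}\cdot 2^{3}\cdot\beta
     > 2^{3}\beta ,
\]
% the final inequality holding because \pi > 3.
```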

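So applying Minkowski's theorem we obtain $(R_1, R_2, R_3) \in C \cap \Lambda$ such that $(R_1, R_2, R_3) \ne (0, 0, 0)$. Since $(R_1, R_2, R_3) \in \Lambda$ we get that

\[ b_1 R_1^2 + b_2 R_2^2 + b_3 R_3^2 \equiv 0 \pmod{\beta}. \]

Also $(R_1, R_2, R_3) \in C$ implies that

\[ |b_1 R_1^2 + b_2 R_2^2 + b_3 R_3^2| \le |b_1| R_1^2 + |b_2| R_2^2 + |b_3| R_3^2 < \beta. \]

An integer that is divisible by $\beta$ and smaller than $\beta$ in absolute value must be zero, so $b_1 R_1^2 + b_2 R_2^2 + b_3 R_3^2 = 0$, concluding the proof. □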
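The proof gives more than existence: when the hypotheses of the theorem hold, a non-zero integer solution already lies inside the ellipsoid C, so it can be found by a finite search. The following Python sketch carries out that search; the function name `small_solution` and the sample coefficients are illustrative choices, not taken from the notes.

```python
from itertools import product
from math import isqrt

def small_solution(b1, b2, b3):
    """Search for a non-zero integer solution of b1*x^2 + b2*y^2 + b3*z^2 = 0
    inside the ellipsoid |b1|x^2 + |b2|y^2 + |b3|z^2 < 4|b1*b2*b3|.
    Returns a triple (x, y, z), or None if the ellipsoid contains none."""
    beta = 4 * abs(b1 * b2 * b3)
    # |b_i| * r_i^2 < beta forces |r_i| <= isqrt(beta // |b_i|)
    bounds = [isqrt(beta // abs(b)) for b in (b1, b2, b3)]
    ranges = [range(-B, B + 1) for B in bounds]
    for x, y, z in product(*ranges):
        if (x, y, z) == (0, 0, 0):
            continue
        inside = abs(b1) * x * x + abs(b2) * y * y + abs(b3) * z * z < beta
        if inside and b1 * x * x + b2 * y * y + b3 * z * z == 0:
            return (x, y, z)
    return None

print(small_solution(1, 1, -2))   # some solution of x^2 + y^2 = 2 z^2
print(small_solution(1, 1, -3))   # x^2 + y^2 = 3 z^2 has no non-zero solution
```

The second call returns None, consistent with the fact that x^2 + y^2 = 3z^2 has no non-trivial rational solution.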

CHAPTER 8

Irrationality and Transcendence

1. Irrationality

A number in R is called irrational if it does not belong to Q.

Example 8.1. $\sqrt{2}$ is irrational. For if $\sqrt{2} = a/b$ where $a, b \in \mathbb{Z}$ with $\gcd(a, b) = 1$, then squaring both sides we get $2b^2 = a^2$, which implies that 2 divides both $a$ and $b$, contradicting $\gcd(a, b) = 1$.

Theorem 8.1. (Gauss) Let $f(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} + x^n$ be a monic polynomial with integer coefficients and degree $n \ge 1$. The only possible rational roots of $f$ are integers which divide $a_0$.

Proof. Let $r = c/d$ be a rational root of $f$, where $c, d \in \mathbb{Z}$, $d \ge 1$ and $\gcd(c, d) = 1$. Thus

\[ a_0 + a_1 \frac{c}{d} + \cdots + a_{n-1} \frac{c^{n-1}}{d^{n-1}} + \frac{c^n}{d^n} = 0. \tag{7} \]

Multiplying by $d^n$ and rearranging we have

\[ d\,(-a_0 d^{n-1} - a_1 c d^{n-2} - \cdots - a_{n-1} c^{n-1}) = c^n. \]

Thus $d \mid c^n$. We claim that $d = 1$. Suppose $d > 1$ and let $p$ be any prime factor of $d$. Then $p \mid d$, so $p \mid c^n$ and hence $p \mid c$, giving a contradiction to $\gcd(c, d) = 1$. Hence $d = 1$. Therefore $r = c \in \mathbb{Z}$. Moreover, by (7) with $d = 1$ we have

\[ c\,(-a_1 - a_2 c - \cdots - a_{n-1} c^{n-2} - c^{n-1}) = a_0, \]

hence $c \mid a_0$. Thus any rational root of $f$ must be an integer dividing $a_0$. □
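Gauss' Theorem turns the search for rational roots of a monic integer polynomial into a finite check over the divisors of a0. A minimal Python sketch of that check follows; the function name `rational_roots_monic` and the coefficient-list convention are assumptions made for this illustration. When a0 = 0 every integer divides a0, so the sketch records the root 0 and passes to f(x)/x.

```python
def rational_roots_monic(coeffs):
    """coeffs = [a0, a1, ..., a_{n-1}] represents the monic polynomial
    f(x) = a0 + a1*x + ... + a_{n-1}*x^(n-1) + x^n with n = len(coeffs) >= 1.
    Returns the sorted list of rational (hence integer) roots of f."""
    def f(x):
        # Horner evaluation, starting from the leading coefficient 1
        value = 1
        for a in reversed(coeffs):
            value = value * x + a
        return value

    a0 = coeffs[0]
    if a0 == 0:
        # 0 is a root; divide f by x and look for the remaining roots
        rest = rational_roots_monic(coeffs[1:]) if len(coeffs) > 1 else []
        return sorted(set([0] + rest))
    divisors = [k for k in range(1, abs(a0) + 1) if a0 % k == 0]
    candidates = divisors + [-k for k in divisors]
    return sorted(c for c in candidates if f(c) == 0)

print(rational_roots_monic([-6, 11, -6]))  # x^3 - 6x^2 + 11x - 6 -> [1, 2, 3]
print(rational_roots_monic([-2, 0]))       # x^2 - 2 -> []
```

The empty list for x^2 - 2 recovers Example 8.1: the polynomial has a real root but no rational one.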

Corollary 8.2. Let $n > 1$ be a positive integer. Suppose that $d$ is a positive integer that is not an $n$-th power. Then $\sqrt[n]{d}$ is irrational.

Proof. Let $f(x) = x^n - d$. Suppose $\sqrt[n]{d}$ is rational. By Gauss' Theorem, $\sqrt[n]{d}$ is an integer, say $\sqrt[n]{d} = c \in \mathbb{Z}$. Then $d = c^n$ is an $n$-th power, giving a contradiction. □
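Corollary 8.2 reduces the irrationality of an n-th root to an integer test: one only has to decide whether d is a perfect n-th power. A hedged Python sketch using exact integer arithmetic is given below; both function names are hypothetical, and binary search is just one convenient way to find the integer root.

```python
def integer_nth_root(d, n):
    """Largest integer c >= 0 with c**n <= d, for integers d >= 0 and n >= 1,
    found by binary search so that no floating point is involved."""
    lo, hi = 0, max(1, d)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** n <= d:
            lo = mid
        else:
            hi = mid - 1
    return lo

def nth_root_is_irrational(d, n):
    """For a positive integer d and n > 1: the n-th root of d is irrational
    unless d is a perfect n-th power (Corollary 8.2)."""
    c = integer_nth_root(d, n)
    return c ** n != d

print(nth_root_is_irrational(2, 2))   # True
print(nth_root_is_irrational(81, 4))  # False, since 81 = 3^4
```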

So far the only irrational numbers we have seen are roots of polynomials with integer coefficients. It is natural to wonder about the irrationality of naturally occurring numbers such as $e = \exp(1)$. In fact Euler proved that $e$ is irrational by considering the expansion $e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots$. It is a hard fact that $e$ is not a root of any non-zero polynomial with integer coefficients.


2. Algebraic and Transcendental Numbers

Definition. A number $\alpha \in \mathbb{C}$ is algebraic if there is some $n \ge 1$ and integers $a_0, a_1, \ldots, a_n$, not all zero, such that $\alpha$ is a root of the polynomial

\[ a_0 + a_1 x + \cdots + a_n x^n \in \mathbb{Z}[x]. \]

A number α ∈ C is transcendental if it is not algebraic.

Example 8.2.

• Any rational number $p/q$, where $p, q \in \mathbb{Z}$ and $q \ne 0$, is algebraic since it is a root of $qx - p \in \mathbb{Z}[x]$.

• For any integer $d$, $\sqrt[n]{d}$ is algebraic since it is a root of $x^n - d \in \mathbb{Z}[x]$. Thus $\sqrt{2}$ and $i$ are algebraic.

• Fact. Sums and products of algebraic numbers are algebraic.

• Hard Fact. $e$ and $\pi$ are transcendental, i.e., $e$ and $\pi$ are not roots of any non-zero polynomial with integer coefficients.

Lemma 8.3. The set of algebraic numbers is countable. Hence the set of transcendental numbers is uncountable.

Proof. It is easy to see that the set of polynomials with integer coefficients is countable (a countable union of countable sets!). Also each such non-zero polynomial has finitely many roots. Hence the set of roots of non-zero polynomials with integer coefficients, which by definition is the set of algebraic numbers, is countable. Since $\mathbb{R}$ and hence $\mathbb{C}$ are uncountable, it follows that the set of transcendental numbers is uncountable. □
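One concrete way to see the countability claim is to list the polynomials in a single sequence. The Python sketch below orders them by a "height" h = n + |a0| + ... + |an|; this particular ordering is an assumption made for the illustration (the notes do not fix an enumeration), chosen because only finitely many polynomials have any given height.

```python
from itertools import count, islice, product

def integer_polynomials():
    """Yield every polynomial in Z[x] of degree n >= 1 exactly once, as a
    coefficient tuple (a0, ..., an) with an != 0, ordered by the height
    h = n + |a0| + ... + |an|."""
    for h in count(2):
        for n in range(1, h):          # degree n
            s = h - n                  # required value of |a0| + ... + |an|, s >= 1
            for coeffs in product(range(-s, s + 1), repeat=n + 1):
                if coeffs[-1] != 0 and sum(abs(a) for a in coeffs) == s:
                    yield coeffs

first = list(islice(integer_polynomials(), 10))
print(first)
```

Each polynomial appears at a finite position in this list, and attaching its finitely many roots to it enumerates the algebraic numbers (with repetitions).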

Hence 'almost all' real and complex numbers are transcendental. However, in general it is very hard to show that a given number is transcendental. The transcendence of $e$ was proved by Charles Hermite in 1873. In 1882 Lindemann showed that $e^{\alpha}$ is transcendental for any non-zero algebraic number $\alpha$; hence $i\pi$ is transcendental, since $e^{i\pi} = -1$ is algebraic. Since $i$ is algebraic and the product of two algebraic numbers is algebraic, it follows that $\pi$ is transcendental.

In 1934, Gelfond and Schneider proved the following theorem.

Theorem 8.4. If $a$ and $b$ are algebraic numbers that are not zero or one and $b$ is not a rational number, then $a^b$ is transcendental.

This theorem allows us to prove transcendence for many numbers. For example, $\sqrt{2}^{\sqrt{2}}$ is transcendental. Also $e^{\pi} = (e^{i\pi})^{-i} = (-1)^{-i}$ and hence $e^{\pi}$ is transcendental. There are still many numbers

\[ e + \pi,\quad \pi e,\quad \pi^{e},\quad e^{e},\quad \pi^{\pi},\quad e^{e^{2}},\quad \pi^{\pi^{2}},\ \ldots \]

that are expected to be transcendental, but it is not even known whether they are rational or not!

Definition. Let $\alpha$ be an algebraic number. The degree of $\alpha$ is the smallest positive integer $d$ such that there is a polynomial $f \in \mathbb{Z}[x]$ of degree $d$ with $f(\alpha) = 0$.


Lemma 8.5. Let $\alpha$ be an algebraic number of degree $d$. Then it is a root of an irreducible polynomial $f \in \mathbb{Z}[x]$ with degree $d$.

Proof. By definition of the degree $d$, there is a polynomial $f \in \mathbb{Z}[x]$ of degree $d$ such that $f(\alpha) = 0$. We show that $f$ is irreducible. Suppose $f$ is reducible; then we can write $f(x) = g(x)h(x)$ with $g(x), h(x) \in \mathbb{Z}[x]$ having degree smaller than $d$. Now $f(\alpha) = 0$ implies either $g(\alpha) = 0$ or $h(\alpha) = 0$. This contradicts the minimality of $d$. □

3. Liouville’s Theorem

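Theorem 8.6 (Liouville's Theorem). Let $\alpha \in \mathbb{R}$ be an algebraic number of degree $d$. Then there is a constant $C > 0$, depending on $\alpha$, so that for all rational numbers $p/q$ (written with $q \ge 1$),

\[ \text{either } \alpha = \frac{p}{q}, \quad \text{or} \quad \left| \frac{p}{q} - \alpha \right| \ge \frac{C}{q^d}. \]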

Proof. We know that $f(\alpha) = 0$ for some irreducible polynomial $f \in \mathbb{Z}[x]$ of degree $d \ge 1$. Write

\[ f(x) = a_0 + a_1 x + \cdots + a_d x^d. \]

Then for any rational number $p/q$,

\[ f\!\left(\frac{p}{q}\right) = a_0 + a_1 \frac{p}{q} + \cdots + a_d \frac{p^d}{q^d} = \frac{N}{q^d}, \]

where $N = a_0 q^d + a_1 p q^{d-1} + \cdots + a_d p^d \in \mathbb{Z}$. Suppose that $N = 0$. Then $f(p/q) = 0$, so $qx - p$ is a factor of the irreducible polynomial $f$. Hence $f$ is equal to $qx - p$ up to multiplication by a non-zero constant, and so has degree 1. Now $f(\alpha) = 0$ implies that $q\alpha - p = 0$, so $\alpha = p/q$.

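What happens if $\alpha \ne p/q$? Well, for a start $N \ne 0$. As $N$ is an integer, $|N| \ge 1$. So

\[ \left| f\!\left(\frac{p}{q}\right) \right| \ge \frac{1}{q^d}. \]

Now we note that

\[
\begin{aligned}
\frac{1}{q^d} \le \left| f\!\left(\frac{p}{q}\right) \right|
  &= \left| f(\alpha) - f\!\left(\frac{p}{q}\right) \right| \qquad \text{since } f(\alpha) = 0 \\
  &= |f'(\eta)| \left| \alpha - \frac{p}{q} \right|,
\end{aligned}
\tag{8}
\]

by the Mean Value Theorem, where $\eta$ is some number between $\alpha$ and $p/q$. Let

\[ C' = \sup\{ |f'(t)| : \alpha - 1 \le t \le \alpha + 1 \}. \]

Let

\[ C = \min\left\{ 1, \frac{1}{C'} \right\}. \]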

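We shall show that

\[ \left| \alpha - \frac{p}{q} \right| \ge \frac{C}{q^d}, \]

which proves the theorem. If $\alpha - 1 \le p/q \le \alpha + 1$, then $\eta$ is also in the interval $[\alpha - 1, \alpha + 1]$. So $|f'(\eta)| \le C'$. Hence by (8),

\[ \left| \alpha - \frac{p}{q} \right| \ge \frac{1}{C'} \cdot \frac{1}{q^d} \ge \frac{C}{q^d}, \]

which is what we want. Now all we have to worry about is the case when $p/q$ is outside the interval $[\alpha - 1, \alpha + 1]$. But this is easy:

\[ \left| \alpha - \frac{p}{q} \right| \ge 1 \ge \frac{1}{q^d} \ge \frac{C}{q^d}, \]

which completes the proof. □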
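The constant C in the proof is explicit, so the inequality can be tested numerically. The Python sketch below does this for the sample case α = √2 with f(x) = x^2 - 2 (so d = 2); the choice of α, the search bound 2000 and the use of floating point are all assumptions of this illustration. For each q only the nearest numerator p needs checking, since every other p is further from α.

```python
from math import sqrt

alpha = sqrt(2)                      # root of f(x) = x^2 - 2, degree d = 2
d = 2
# |f'(t)| = |2t| on [alpha - 1, alpha + 1] is largest at t = alpha + 1
C_prime = 2 * (alpha + 1)
C = min(1.0, 1 / C_prime)

worst_ratio = float("inf")
for q in range(1, 2001):
    p = round(alpha * q)             # the closest numerator for this q
    gap = abs(alpha - p / q)
    worst_ratio = min(worst_ratio, gap * q**d)

# Liouville's Theorem predicts gap * q^d >= C for every q
print(C, worst_ratio, worst_ratio >= C)
```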

Joseph Liouville was the first to construct transcendental numbers, in 1844. Here is his example.

Corollary 8.7. Let

\[ \alpha = \sum_{i=0}^{\infty} \frac{1}{10^{i!}}. \]

Then $\alpha$ is transcendental.

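Proof. We prove this by contradiction. Suppose that $\alpha$ is algebraic of degree $d$. Let $n \ge 1$ and let $q = 10^{n!}$. Let

\[ p = q \cdot \sum_{i=0}^{n} \frac{1}{10^{i!}}. \]

Note that $p, q$ are positive integers, and that

\[
\begin{aligned}
0 < \alpha - \frac{p}{q}
  &= \frac{1}{10^{(n+1)!}} + \frac{1}{10^{(n+2)!}} + \cdots \\
  &= \frac{1}{10^{(n+1)!}} \left( 1 + \frac{1}{10^{(n+2)! - (n+1)!}} + \frac{1}{10^{(n+3)! - (n+1)!}} + \cdots \right) \\
  &< \frac{1}{10^{(n+1)!}} \left( 1 + \frac{1}{10} + \frac{1}{10^2} + \cdots \right)
     \qquad \text{since } (n+k)! - (n+1)! > k - 1 \text{ for } k \ge 2 \\
  &= \frac{10}{9 \cdot 10^{(n+1)!}}.
\end{aligned}
\]

By the first inequality $\alpha \ne p/q$. Hence by Liouville's Theorem there exists a positive constant $C$ such that

\[ \frac{10}{9 \cdot 10^{(n+1)!}} > \alpha - \frac{p}{q} \ge \frac{C}{q^d} = \frac{C}{10^{d \cdot n!}}. \]

Hence

\[ \frac{10}{9C} > 10^{(n+1)! - d \cdot n!}. \]

Note that here $d$ and $C$ are fixed, whereas we can choose $n$ as large as we like. Since $(n+1)! - d \cdot n! = n!\,(n + 1 - d) \to \infty$ as $n \to \infty$, making $n$ very large gives a contradiction. □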
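The contradiction can be watched numerically with exact rational arithmetic. In the Python sketch below, a deep partial sum stands in for α, and d = 3 is a hypothetical degree chosen only for illustration; both are assumptions of the sketch. If α were algebraic of degree d, the quantity q^d (α - p/q) would have to stay above a fixed C > 0, but it collapses towards zero.

```python
from fractions import Fraction
from math import factorial

def partial_sum(n):
    """Exact value of sum_{i=0}^{n} 10^(-i!) as a Fraction."""
    return sum(Fraction(1, 10 ** factorial(i)) for i in range(n + 1))

# Assumption: a deep partial sum stands in for alpha itself
# (the neglected tail is smaller than 10^(-5000)).
alpha_proxy = partial_sum(6)

d = 3  # hypothetical degree, chosen only for illustration
scaled_values = []
for n in range(1, 5):
    q = 10 ** factorial(n)
    gap = alpha_proxy - partial_sum(n)      # approximates alpha - p/q
    scaled_values.append(q ** d * gap)      # Liouville would force this to be >= C

for n, value in zip(range(1, 5), scaled_values):
    print(n, float(value))
```

The same collapse occurs for any fixed d once n + 1 exceeds d, which is exactly the growth of the exponent (n+1)! - d·n! used in the proof.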
