
Notes for the course in

Analytic Number Theory

G. Molteni

Fall 2019

revision 8.0

Disclaimer

These are the notes I have written for the course in Analytic Number Theory in A.Y. 2011–’20. I wish to thank my former students (in alphabetical order) Guglielmo Beretta, Alexey Beshenov, Alessandro Ghirardi, Davide Redaelli and Federico Zerbini for their careful reading and for suggestions that improved these notes. I alone am responsible for any remaining errors.

The image on the cover shows Riemann’s 1859 scratch note in which, in 1932, C. Siegel recognized the celebrated Riemann–Siegel formula (an identity that allows one to compute with extraordinary precision the values of the Riemann zeta function inside the critical strip). It is a resized version of the image in H. M. Edwards, Riemann’s Zeta Function, Dover Publications, New York, 2001, page 156. The author has not been able to discover whether this image is covered by any copyright and believes that it can appear here under some fair use rule. He will remove it should a copyright infringement be brought to his attention.

Giuseppe Molteni

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. This means that: (Attribution) You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. (NonCommercial) You may not use the material for commercial purposes. (NoDerivatives) If you remix, transform, or build upon the material, you may not distribute the modified material.


Contents

Disclaimer

Notation

Chapter 1. Prime number theorem
1.1. Preliminary facts: a warm-up
1.2. Two general formulas
1.3. The ring of arithmetical functions
1.4. Dirichlet series: as formal series
1.5. Dirichlet series: as complex functions
1.6. The analytic continuation of $\zeta(s)$
1.7. Some elementary results
1.8. The Prime Number Theorem

Chapter 2. Primes in arithmetic progressions

Chapter 3. Sieve methods
3.1. Eratosthenes-Legendre’s sieve
3.2. Selberg’s $\Lambda^2$-method
3.3. Sifting more classes
3.4. Two sets with positive density

Chapter 4. Sumsets

Chapter 5. Waring’s problem
5.1. First step: cancellation in exponential sums
5.2. Second step: integral representation

Appendix. Bibliography

Notation

• Let $f, g\colon \mathbb{R} \to [0,+\infty)$ be functions. Then:

  • $f(x) = O(g(x))$ as $x \to x_0 \in \overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ means that the quotient $f(x)/g(x)$ is locally bounded in a neighborhood of $x_0$, i.e. that there exist a constant $C \in \mathbb{R}^+$ and an open set $U(x_0)$ such that
  \[ f(x)/g(x) \le C \qquad \forall x \in U(x_0). \]

  • $f(x) \ll g(x)$ as $x \to x_0 \in \overline{\mathbb{R}}$ is an equivalent notation for $f(x) = O(g(x))$.

  • $f(x) = o(g(x))$ as $x \to x_0 \in \overline{\mathbb{R}}$ means that the quotient $f(x)/g(x)$ tends to $0$ as $x \to x_0$ (in other words, the constant $C$ in the previous item can be taken arbitrarily small).

  • $f(x) \asymp g(x)$ as $x \to x_0 \in \overline{\mathbb{R}}$ means that both quotients $f(x)/g(x)$ and $g(x)/f(x)$ are locally bounded in a neighborhood of $x_0$, i.e. that there exist a constant $C \in \mathbb{R}^+$ and an open set $U(x_0)$ such that
  \[ \frac{1}{C}\, g(x) \le f(x) \le C\, g(x) \qquad \forall x \in U(x_0). \]

  • $f(x) = \Omega(g(x))$ as $x \to x_0 \in \overline{\mathbb{R}}$ means that $f(x) = o(g(x))$ is false. This means that there exists a constant $C \in \mathbb{R}^+$ such that for every open set $U(x_0)$ there exists $x \in U(x_0)$ with
  \[ |f(x)|/|g(x)| > C. \]

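• Given $x \in \mathbb{R}$, the integer part of $x$ is defined as $\lfloor x \rfloor := \max\{n \in \mathbb{Z} : n \le x\}$. It must not be confused with $\lceil x \rceil := \min\{n \in \mathbb{Z} : n \ge x\}$. The fractional part of $x$ is $\{x\} := x - \lfloor x \rfloor$ (in signal processing this is called the sawtooth function). According to the definition, $\{x\}$ is a 1-periodic function $\mathbb{R} \to \mathbb{R}$ with a discontinuity at every $x \in \mathbb{Z}$:
\[ \lim_{x \to n^+} \{x\} = 0, \qquad \lim_{x \to n^-} \{x\} = 1, \qquad \forall n \in \mathbb{Z}. \]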

• Usually we denote by $s$ the complex argument of a complex function. In Analytic Number Theory it is customary to use $\sigma$ and $t$ to denote the real and the imaginary parts of $s$, respectively. In other words, $s = \sigma + it$ with $\sigma, t \in \mathbb{R}$.

• Given two integers $m$ and $n$, $m \mid n$ means that $m$ divides $n$. $(m,n)$ denotes their greatest common divisor, and $[m,n]$ their least common multiple, so that $mn = (m,n)[m,n]$.

• $e(x)$ denotes the function $e(x) := e^{2\pi i x}$.


Chapter 1

Prime number theorem

1.1. Preliminary facts: a warm-up

Let P be the set of prime numbers. For every x ∈ R+, let

\[ \pi(x) := \#\{p \in \mathcal{P} : p \le x\}. \]

How large can $\pi(x)$ be?

Proposition 1.1 (Euclid) $\mathcal{P}$ is not finite, therefore $\pi(x) \to \infty$ as $x \to \infty$.

Proof. Let $p_1, \dots, p_n$ be any finite set of primes. Let $N := 1 + p_1 p_2 \cdots p_n$. $N$ is an integer greater than 1, thus it has a prime factor $p$. This $p$ is not equal to any $p_j$, since $N \equiv 1 \pmod{p_j}$.

The argument can be modified so as to produce a quantitative result.

Proposition 1.2 $\pi(x) \gg \ln\ln x$ as $x \to \infty$.

Proof. Let $p_1 = 2 < p_2 < \dots$ be the complete sequence of primes (an infinite set, according to the previous result). We use the previous argument to prove that $p_n \le 2^{2^{n-1}}$ for every $n$. In fact, the claim is true for $n = 1$ (because $p_1 = 2 \le 2^{2^{1-1}}$). By induction on $n$ and following the argument proving Proposition 1.1, we know that
\[ p_n \le 1 + p_1 p_2 \cdots p_{n-1} \le 1 + \prod_{j=1}^{n-1} 2^{2^{j-1}} = 1 + 2^{\sum_{j=1}^{n-1} 2^{j-1}} = 1 + 2^{2^{n-1}-1} \le 2^{2^{n-1}}. \]
For every $x \ge 2$, let $n$ be such that $2^{2^{n-1}} \le x < 2^{2^{n}}$. Then
\[ \pi(x) \ge \pi\big(2^{2^{n-1}}\big) \ge n \ge \log_2\log_2 x \gg \ln\ln x. \]
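As a numerical sanity check of Proposition 1.2, the following Python sketch (not part of the original notes; the helper names `primes_up_to` and `prime_count` are illustrative choices) verifies the bound $p_n \le 2^{2^{n-1}}$ for the first twelve primes and compares $\pi(x)$ with $\log_2\log_2 x$ at a few sample points.

```python
from bisect import bisect_right
from math import log2


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # cross out the multiples of i starting from i*i
            sieve[i * i :: i] = [False] * len(range(i * i, limit + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


primes = primes_up_to(100_000)


def prime_count(x):
    """pi(x) for x <= 100000, read off the precomputed list."""
    return bisect_right(primes, x)


# p_n <= 2^(2^(n-1)) for n = 1, ..., 12 (Python integers are exact)
bound_holds = all(
    p <= 2 ** (2 ** (n - 1)) for n, p in enumerate(primes[:12], start=1)
)

for x in (2, 10, 1000, 100_000):
    print(x, prime_count(x), log2(log2(x)))
```

The printed values make it evident how weak the doubly logarithmic lower bound is compared with the actual size of $\pi(x)$.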

There are several alternative and elementary proofs of these facts.

Pólya. For every $n \in \mathbb{N}$ let $F_n := 2^{2^n} + 1$, the $n$th Fermat number. These numbers are pairwise coprime, i.e.
\[ (F_n, F_m) = 1 \qquad \forall n \ne m. \]

Proof. The sequence of Fermat numbers satisfies a kind of recursive formula; in fact $\prod_{k=0}^{m-1} F_k = F_m - 2$ for every $m \ge 1$, an identity which can be easily proved by induction. The formula shows that $F_n$ divides $F_m - 2$ whenever $n < m$; in particular every prime dividing both $F_n$ and $F_m$ is also a factor of $F_m$ and $F_m - 2$, hence it must be 2. Nevertheless, Fermat numbers are odd, therefore they cannot have 2 as a common factor.


The coprimality implies that every $F_n$ has a special prime factor, $p_n$, which divides $F_n$ and does not divide any $F_m$ with $m \ne n$. In particular, there are infinitely many primes (because there are infinitely many Fermat numbers) and, for $n \ge 2$, the $n$th prime is at most $F_{n-2} = 2^{2^{n-2}} + 1 \le 2^{2^{n-1}}$, as proved before. (Note that we use here the fact that $p_1 = 2$ and $p_2 = 3 = F_0$.)
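The two facts used in Pólya's argument can be checked on the first few Fermat numbers. The sketch below is an illustration only (the function name `fermat` is a hypothetical helper); it relies on Python's exact integer arithmetic and on `math.prod`, available from Python 3.8.

```python
from math import gcd, prod


def fermat(n):
    """The n-th Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


F = [fermat(n) for n in range(8)]

# Recursive identity: F_0 * F_1 * ... * F_{m-1} = F_m - 2 for m >= 1
identity_ok = all(prod(F[:m]) == F[m] - 2 for m in range(1, 8))

# Pairwise coprimality: gcd(F_n, F_m) = 1 for n != m
coprime_ok = all(gcd(F[i], F[j]) == 1 for i in range(8) for j in range(i))

print(identity_ok, coprime_ok)
```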

Pólya (variation). For every $n \in \mathbb{N}$ let $M_n := 2^n - 1$, the $n$th Mersenne number. These numbers satisfy the relation $M_m = 2^{m-n} M_n + M_{m-n}$ for every $m \ge n$, so that the greatest common divisor of $M_m$ and $M_n$ is $M_{(m,n)}$. Let $p_1 = 2, \dots, p_k$ be any set of distinct primes. Then $M_{p_1}, \dots, M_{p_k}$ are pairwise coprime, therefore there are at least $k$ distinct odd prime numbers (because every $M_n$ is an odd number); in particular there is at least one odd prime number which is greater than $p_k$, and hence one more prime. The argument proves that if $2 = p_1 < p_2 < p_3 < \cdots$ is the sequence of primes, then $p_{n+1} < 2^{p_n}$. This upper bound for $p_n$ can be used to produce a lower bound for $\pi(x)$, but of incredibly low quality.

Erdős. Every integer $n$ can be written in a unique way as a product of a square $m^2$ and a squarefree $q$. Fix $x > 2$ and apply that decomposition to every integer $n \le x$. There are at most $\sqrt{x}$ possible values for $m$, and at most $2^{\pi(x)}$ values for $q$. Hence
\[ \lfloor x \rfloor = \#\{n \in \mathbb{N} : n \le x\} \le \#\{m\} \cdot \#\{q\} \le \sqrt{x} \cdot 2^{\pi(x)}, \]
implying that $\pi(x) \gg \ln x$. Note also that this simple argument already improves Proposition 1.2.

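Euler. Consider the product
\[ \prod_{p \le x} (1 - 1/p)^{-1} = \prod_{p \le x} \Big(1 + \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \cdots\Big). \]
Every integer $n$ can be written in a unique way as a product of prime powers, and when $n \le x$ the primes appearing in its factorization are also $\le x$ (trivial). Therefore the previous product gives
\[ \prod_{p \le x} (1 - 1/p)^{-1} = \prod_{p \le x} \Big(1 + \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \cdots\Big) \ge \sum_{n \le x} \frac{1}{n}. \tag{1.1} \]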

The right-hand side diverges as $x \to \infty$, hence this inequality already proves the existence of infinitely many primes: only in this case can the product on the left-hand side diverge.
The following argument deduces an interesting lower bound from this inequality. The inequality $1 - y \ge e^{-2y}$ holds whenever $y \in [0, 1/2]$, hence $(1 - 1/p)^{-1} \le e^{2/p}$ for every prime $p$, so that
\[ e^{\sum_{p \le x} 2/p} = \prod_{p \le x} e^{2/p} \ge \prod_{p \le x} (1 - 1/p)^{-1} \ge \sum_{n \le x} \frac{1}{n}, \]
i.e.,
\[ \sum_{p \le x} \frac{1}{p} \ge \frac{1}{2} \ln\Big(\sum_{n \le x} \frac{1}{n}\Big). \tag{1.2} \]
This inequality proves that $\sum_{p \le x} \frac{1}{p} \gg \ln\ln x$ because $\sum_{n \le x} \frac{1}{n} \sim \ln x$.
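Inequality (1.2) is easy to test numerically. The following sketch is only an illustration under stated assumptions (the helpers `primes_up_to` and `both_sides` are hypothetical names, and floating-point sums are used): it evaluates both sides for a few integer values of $x$.

```python
from math import log


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, limit + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def both_sides(x):
    """Return (sum_{p<=x} 1/p, (1/2) ln sum_{n<=x} 1/n) for an integer x >= 2."""
    prime_sum = sum(1 / p for p in primes_up_to(x))
    harmonic = sum(1 / n for n in range(1, x + 1))
    return prime_sum, 0.5 * log(harmonic)


for x in (10, 100, 10_000):
    lhs, rhs = both_sides(x)
    print(x, round(lhs, 4), round(rhs, 4))
```

In these samples the left-hand side is roughly twice the right-hand side, which is consistent with the constant $1/2$ in (1.2) not being optimal.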

Exercise. 1.1 The following steps improve the lower bound (1.2) by removing the constant $1/2$ appearing there.

1) Prove that $(1 - x)e^x = 1 + O(x^2)$ for $x \to 0$ and use this equality to prove that
\[ e^{\sum_{p \le x} 1/p} = \prod_{p \le x} e^{1/p} = \prod_{p \le x} \Big(1 - \frac{1}{p}\Big)^{-1} \cdot \prod_{p \le x} \Big(1 + \frac{O(1)}{p^2}\Big). \]

2) Prove that $\prod_{p \le x} (1 + c \cdot p^{-2})$ converges to a nonzero constant for every fixed constant $c \ge 0$; deduce that
\[ \sum_{p \le x} \frac{1}{p} = \ln\Big(\prod_{p \le x} \Big(1 - \frac{1}{p}\Big)^{-1}\Big) + O(1). \]

3) Use (1.1) to deduce that
\[ \sum_{p \le x} \frac{1}{p} \ge \ln\Big(\sum_{n \le x} \frac{1}{n}\Big) + O(1). \]

4) Conclude that
\[ \sum_{p \le x} \frac{1}{p} \ge \ln\ln x + O(1). \tag{1.3} \]

In Proposition 1.14 we will see that $\ln\ln x$ is the right behavior for the sum of the inverses of the primes, since
\[ \sum_{p \le x} \frac{1}{p} \sim \ln\ln x. \]
This is part of a very famous set of results proved with totally elementary tools by Mertens long before the proof of the Prime Number Theorem.


1.2. Two general formulas

The following formula is due to Abel. It is essentially the discrete version of the well-known formula for integration by parts.

Proposition 1.3 (Partial summation formula, first version) Let $f, g\colon \mathbb{N} \to \mathbb{C}$. Let $F(x) := \sum_{1 \le k \le x} f(k)$ for $x \ge 1$, and $F(x) := 0$ if $x < 1$. Then
\[ \sum_{n=1}^{N} f(n)g(n) = F(N)g(N) - \sum_{n=1}^{N-1} F(n)\big(g(n+1) - g(n)\big). \]

Proof. $f(n) = F(n) - F(n-1)$ for every $n \ge 1$, hence
\[ \sum_{n=1}^{N} f(n)g(n) = \sum_{n=1}^{N} \big(F(n) - F(n-1)\big) g(n) = \sum_{n=1}^{N} F(n)g(n) - \sum_{n=1}^{N} F(n-1)g(n) \]
\[ = \sum_{n=1}^{N} F(n)g(n) - \sum_{n=1}^{N-1} F(n)g(n+1) \]
\[ = F(N)g(N) - \sum_{n=1}^{N-1} F(n)\big(g(n+1) - g(n)\big). \]
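Since Proposition 1.3 is an exact algebraic identity, it can be verified with integer arithmetic. The sketch below is illustrative (the function name `partial_summation_sides` and the two test sequences are arbitrary choices, not taken from the notes); it computes both sides for given $f$, $g$ and $N$.

```python
def partial_summation_sides(f, g, N):
    """Return (left side, right side) of the partial summation formula."""
    # F[n] = f(1) + ... + f(n), with F[0] = 0
    F = [0]
    for n in range(1, N + 1):
        F.append(F[-1] + f(n))
    lhs = sum(f(n) * g(n) for n in range(1, N + 1))
    rhs = F[N] * g(N) - sum(F[n] * (g(n + 1) - g(n)) for n in range(1, N))
    return lhs, rhs


f = lambda n: n * n - 3   # arbitrary integer-valued test sequence
g = lambda n: 2 * n + 1   # arbitrary integer-valued test sequence

print(partial_summation_sides(f, g, 10))
```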

Suppose now that $g \in C^1([0,+\infty))$; then $g(n+1) - g(n) = \int_n^{n+1} g'(x)\,dx$, so that the formula becomes
\[ \sum_{n=1}^{N} f(n)g(n) = F(N)g(N) - \sum_{n=1}^{N-1} F(n) \int_n^{n+1} g'(x)\,dx. \]
Here $F(x) = F(n)$ for $x \in [n, n+1)$, therefore
\[ = F(N)g(N) - \sum_{n=1}^{N-1} \int_n^{n+1} F(x)g'(x)\,dx \]
\[ = F(N)g(N) - \int_1^N F(x)g'(x)\,dx. \]
In this way we have proved the following useful formula.

Proposition 1.4 (Partial summation formula, second version) Let $f\colon \mathbb{N} \to \mathbb{C}$, $g\colon [0,+\infty) \to \mathbb{C}$, $g \in C^1([0,+\infty))$. Let $F(x) := \sum_{1 \le k \le x} f(k)$ for $x \ge 1$, and $F(x) := 0$ if $x < 1$. Then
\[ \sum_{n=1}^{N} f(n)g(n) = F(N)g(N) - \int_1^N F(x)g'(x)\,dx. \tag{1.4} \]


The importance of the partial summation formula comes from the fact that the function $F(x) := \sum_{n \le x} f(n)$ (sometimes called the cumulating function of $f$) is no less regular than $f$. The following result is a simple instance of this fact.

Proposition 1.5 (Cesàro mean value) Let $f\colon \mathbb{N} \to \mathbb{R}$ and suppose that $f(k) \to \ell \in \overline{\mathbb{R}}$ as $k$ diverges. Then $\frac{1}{x}F(x) \to \ell$, too.

Note that $\frac{1}{x}F(x)$ is the mean value of the set $\{f(k)\}_{k \le x}$, thus the proposition claims that the mean value of $f$ is at least as regular as the original sequence $f$.

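Proof. Suppose $\ell \in \mathbb{R}$. Let $\varepsilon > 0$ be fixed. There exists an integer $K$ such that $k \ge K$ implies $f(k) \in (\ell - \varepsilon, \ell + \varepsilon)$. Let $x \ge K$ with $x \in \mathbb{N}$; then
\[ \frac{1}{x}F(x) - \ell = \frac{1}{x}\sum_{k \le x} f(k) - \ell = \frac{1}{x}\sum_{k \le x} \big(f(k) - \ell\big), \]
so that
\[ \Big|\frac{1}{x}F(x) - \ell\Big| \le \frac{1}{x}\sum_{k \le x} |f(k) - \ell| = \frac{1}{x}\Big(\sum_{k < K} |f(k) - \ell| + \sum_{K \le k \le x} |f(k) - \ell|\Big) \]
\[ \le \frac{c}{x} + \frac{1}{x}\sum_{K \le k \le x} \varepsilon \le \frac{c}{x} + \varepsilon, \]
where $c$ is independent of $x$. This formula proves the claim when $x$ diverges in $\mathbb{N}$. Trivial bounds involving $x$ and $\lfloor x \rfloor$ prove the claim for the general case $x \in \mathbb{R}$. Finally, the statement for $\ell \in \{\pm\infty\}$ can be proved in a similar way.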

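Usually $F(x)$ has a better behavior (in some sense) than $f(n)$. The following exercise shows this principle in action: there $f(n) = \exp(2\pi i n\theta)$ oscillates as a function of $n$, but $F$ has a finite mean value, and this allows the convergence of the series.

Exercise. 1.2 Recall that $e(x) := \exp(2\pi i x)$.

1) Prove that
\[ F(x, \theta) := \sum_{0 \le n \le x} e(n\theta) = \sum_{0 \le n \le \lfloor x \rfloor} e^{2\pi i n\theta} = \frac{e^{2\pi i(\lfloor x \rfloor + 1)\theta} - 1}{e^{2\pi i\theta} - 1} = e^{\pi i \lfloor x \rfloor \theta}\, \frac{\sin(\pi(\lfloor x \rfloor + 1)\theta)}{\sin(\pi\theta)} \]
for every $\theta \in \mathbb{R} \setminus \mathbb{Z}$.

2) Deduce that for every $\theta \in \mathbb{R} \setminus \mathbb{Z}$ there is a constant $c_\theta > 0$ such that $|F(x, \theta)| \le c_\theta$ independently of $x$.

3) Using Proposition 1.4 deduce that the series
\[ \sum_{n=1}^{\infty} \frac{e(n\theta)}{n} \]
converges for every $\theta \in \mathbb{R} \setminus \mathbb{Z}$.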
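Before attempting the proof, the closed formula of step 1) and the uniform bound of step 2) can be explored numerically. The sketch below is an illustration only: the function names are hypothetical, the sample value $\theta = 0.3$ is arbitrary, and agreement is tested up to floating-point rounding.

```python
import cmath
import math


def e(x):
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def F_direct(x, theta):
    """Sum of e(n*theta) over 0 <= n <= x, term by term."""
    return sum(e(n * theta) for n in range(math.floor(x) + 1))


def F_closed(x, theta):
    """Closed form of step 1), valid for theta not an integer."""
    N = math.floor(x)
    return (
        cmath.exp(1j * math.pi * N * theta)
        * math.sin(math.pi * (N + 1) * theta)
        / math.sin(math.pi * theta)
    )


theta = 0.3
print(abs(F_direct(57.5, theta) - F_closed(57.5, theta)))
# |F(x, theta)| never exceeds 1 / |sin(pi * theta)|
print(max(abs(F_direct(x, theta)) for x in range(200)), 1 / abs(math.sin(math.pi * theta)))
```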


Exercise. 1.3 Let $\{z_j\}_{j=1}^{N}$ be any finite set of distinct complex numbers, with $|z_j| = 1$ for every $j$. Let $\alpha_1, \dots, \alpha_N$ be complex numbers, not all equal to zero. Prove that
\[ s_k := \alpha_1 z_1^k + \cdots + \alpha_N z_N^k = \Omega(1) \quad \text{as } k \to \infty, \]
i.e., that the claim $\lim_{k \to \infty} s_k = 0$ is false.
Hint: by contradiction, suppose that the limit exists and is 0. Suppose $\alpha_1 \ne 0$. Deduce a contradiction by computing the limit of $\frac{1}{x}\sum_{k \le x} s_k z_1^{-k}$ in two different ways.

The following formula compares a sum with the corresponding integral.

Proposition 1.6 (Euler, Maclaurin) Let $c$ be an integer and $f\colon [c,+\infty) \to \mathbb{C}$ be a $C^1$ function. Then
\[ \sum_{k=c}^{N} f(k) = \int_c^N f(x)\,dx + \frac{1}{2}\big(f(N) + f(c)\big) + \int_c^N f'(x)\big(\{x\} - \tfrac{1}{2}\big)\,dx. \tag{1.5} \]

Proof. It is immediate to verify that
\[ \frac{1}{2}\big(g(0) + g(1)\big) = \int_0^1 g(x)\,dx + \int_0^1 g'(x)\big(x - \tfrac{1}{2}\big)\,dx \]
for every $g \in C^1([0,1])$. Now write this formula for the functions $g_k(x) := f(x + k)$, where $k \in \mathbb{N}$ is arbitrarily fixed; we get
\[ \frac{1}{2}\big(f(k) + f(k+1)\big) = \int_0^1 f(x + k)\,dx + \int_0^1 f'(x + k)\big(x - \tfrac{1}{2}\big)\,dx. \]
In the integrals we make the shift $x + k \to x$, so that
\[ \frac{1}{2}\big(f(k) + f(k+1)\big) = \int_k^{k+1} f(x)\,dx + \int_k^{k+1} f'(x)\big(x - k - \tfrac{1}{2}\big)\,dx. \]
For $x \in [k, k+1)$ we have $x - k = \{x\}$, hence the equality can also be written as
\[ \frac{1}{2}\big(f(k) + f(k+1)\big) = \int_k^{k+1} f(x)\,dx + \int_k^{k+1} f'(x)\big(\{x\} - \tfrac{1}{2}\big)\,dx. \]
Now we add the equalities for $k = c, \dots, N-1$, obtaining
\[ \sum_{k=c}^{N} f(k) - \frac{1}{2}\big(f(N) + f(c)\big) = \int_c^N f(x)\,dx + \int_c^N f'(x)\big(\{x\} - \tfrac{1}{2}\big)\,dx, \]
which is the claim.

Exercise. 1.4 We have proved Proposition 1.6 directly, but it can also be deduced from the partial summation formula (1.4).
Hint: set in that formula $f(x) = 1$, so that $F(x) = \lfloor x \rfloor = x - \{x\}$, and integrate by parts the integral containing the factor $x$.


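Example. 1.1 Applying the formula to $f(x) = 1/x$ we have
\[ \sum_{k=1}^{N} \frac{1}{k} = \ln N + \gamma + O(1/N), \tag{1.6} \]
where $\gamma := 1 - \int_1^{\infty} \frac{\{x\}}{x^2}\,dx = 0.5772\ldots$ is the Euler–Mascheroni constant. In fact,
\[ \sum_{k=1}^{N} \frac{1}{k} = \int_1^N \frac{1}{x}\,dx + \frac{1}{2}\Big(1 + \frac{1}{N}\Big) - \int_1^N \frac{\{x\} - \frac{1}{2}}{x^2}\,dx. \]
The integral $\int_1^{\infty} \frac{\{x\} - 1/2}{x^2}\,dx$ converges absolutely because $|\{x\} - \frac{1}{2}| \le \frac{1}{2}$, thus we can write the equality as
\[ = \ln N + \frac{1}{2}\Big(1 + \frac{1}{N}\Big) - \int_1^{\infty} \frac{\{x\} - \frac{1}{2}}{x^2}\,dx + \int_N^{\infty} \frac{\{x\} - \frac{1}{2}}{x^2}\,dx. \tag{1.7} \]
Now the claim follows by setting $\gamma := \frac{1}{2} - \int_1^{\infty} \frac{\{x\} - 1/2}{x^2}\,dx = 1 - \int_1^{\infty} \frac{\{x\}}{x^2}\,dx$ and noticing that
\[ \Big|\int_N^{\infty} \frac{\{x\} - \frac{1}{2}}{x^2}\,dx\Big| \le \frac{1}{2}\int_N^{\infty} \frac{1}{x^2}\,dx = \frac{1}{2N}. \tag{1.8} \]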

The constant $\gamma$ is probably the most famous and important constant in Mathematics after 0, 1, $\pi$, $e$ and $i$, and it is still not well understood. For instance, the known algorithms for its computation are not very efficient (when compared to the analogous algorithms for other constants, $\pi$ for instance). It is conjectured that $\gamma$ is a transcendental number, but it is still unknown whether $\gamma \in \mathbb{Q}$.

Exercise. 1.5 From (1.7) and (1.8), deduce that
\[ 0 \le \sum_{k=1}^{N} \frac{1}{k} - \ln N - \gamma \le \frac{1}{N} \qquad \forall N \ge 1. \]
This formula can be used to compute the value of $\gamma$, but it is not very efficient (it needs $N$ terms to get $\gamma$ with an approximation $1/N$).
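The slow convergence described in Exercise 1.5 is visible in a short experiment. The sketch below is only an illustration: the function name `harmonic_gap` is hypothetical, and the reference value of $\gamma$ is hard-coded to double precision rather than computed.

```python
from math import log

# Reference value of the Euler-Mascheroni constant (hard-coded, double precision)
EULER_GAMMA = 0.5772156649015329


def harmonic_gap(N):
    """Return sum_{k<=N} 1/k - ln N - gamma."""
    return sum(1 / k for k in range(1, N + 1)) - log(N) - EULER_GAMMA


for N in (1, 10, 100, 1000):
    print(N, harmonic_gap(N), 1 / N)
```

In these samples the gap is close to $1/(2N)$, so each additional correct decimal digit of $\gamma$ costs about ten times more terms.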

Exercise. 1.6 Use (1.6) to prove that:
\[ \sum_{\substack{k=1 \\ k \text{ even}}}^{N} \frac{1}{k} = \frac{1}{2}\ln(N/2) + \frac{\gamma}{2} + O(1/N), \qquad \sum_{\substack{k=1 \\ k \text{ odd}}}^{N} \frac{1}{k} = \frac{1}{2}\ln(2N) + \frac{\gamma}{2} + O(1/N). \]
Use this result to prove that
\[ -\sum_{k=1}^{N} \frac{(-1)^k}{k} = \ln 2 + O(1/N). \]
With the same tool, prove that for every $m, n \ge 1$ the alternating sum
\[ \underbrace{1 + \tfrac{1}{3} + \cdots + \tfrac{1}{2m-1}}_{m \text{ terms}} - \underbrace{\big(\tfrac{1}{2} + \tfrac{1}{4} + \cdots + \tfrac{1}{2n}\big)}_{n \text{ terms}} + \underbrace{\tfrac{1}{2m+1} + \tfrac{1}{2m+3} + \cdots + \tfrac{1}{4m-1}}_{m \text{ terms}} - \underbrace{\big(\tfrac{1}{2n+2} + \tfrac{1}{2n+4} + \cdots + \tfrac{1}{4n}\big)}_{n \text{ terms}} + \cdots = \frac{1}{2}\ln(4m/n). \]
This is a concrete example of the phenomenon proved by Riemann: a series converges unconditionally (i.e. the convergence and the value of the series are independent of any reordering of its terms) if and only if it converges absolutely. In other words, if a series converges only simply, then there are reorderings of the same numbers producing new series which converge to different values.

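Exercise. 1.7 Use Proposition 1.6 to prove that for every $\ell \in \mathbb{N}$,
\[ \sum_{k=1}^{N} (\ln k)^{\ell} = N(\ln N)^{\ell} + O\big(N(\ln N)^{\ell-1}\big). \]
With an inductive argument on $\ell$ and using also Proposition 1.4, prove the more precise equality
\[ \sum_{k=1}^{N} (\ln k)^{\ell} = N P_{\ell}(\ln N) + O\big((\ln N)^{\ell}\big), \]
where $P_{\ell}(x)$ is the polynomial $\sum_{n=0}^{\ell} (-1)^{\ell-n} \frac{\ell!}{n!} x^n$.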

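The formula (1.5) can be considerably extended. For every integer $n \ge 0$, let $B_n(x)$ be the Bernoulli polynomials, i.e. the family of polynomials which are defined recursively as:
\[ B_0(x) = 1, \qquad B_n'(x) = n B_{n-1}(x), \qquad \int_0^1 B_n(x)\,dx = 0, \qquad \forall n \ge 1. \]
For example:
\[ B_0(x) = 1, \]
\[ B_1(x) = x - \tfrac{1}{2}, \]
\[ B_2(x) = x^2 - x + \tfrac{1}{6}, \]
\[ B_3(x) = x^3 - \tfrac{3}{2}x^2 + \tfrac{1}{2}x, \]
\[ B_4(x) = x^4 - 2x^3 + x^2 - \tfrac{1}{30}, \]
\[ B_5(x) = x^5 - \tfrac{5}{2}x^4 + \tfrac{5}{3}x^3 - \tfrac{1}{6}x, \]
\[ B_6(x) = x^6 - 3x^5 + \tfrac{5}{2}x^4 - \tfrac{1}{2}x^2 + \tfrac{1}{42}, \]
\[ B_7(x) = x^7 - \tfrac{7}{2}x^6 + \tfrac{7}{2}x^5 - \tfrac{7}{6}x^3 + \tfrac{1}{6}x. \]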
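The recursive definition translates directly into a computation with exact rational coefficients. The following sketch is an illustration, not part of the original notes: the function name `bernoulli_polynomials` and the coefficient-list representation (constant term first) are choices made here. At each step it integrates $nB_{n-1}$ and then fixes the constant of integration so that the integral over $[0,1]$ vanishes.

```python
from fractions import Fraction


def bernoulli_polynomials(n_max):
    """Return [B_0, ..., B_{n_max}], each as a list of coefficients
    [c_0, c_1, ...] meaning c_0 + c_1 x + c_2 x^2 + ..."""
    polys = [[Fraction(1)]]
    for n in range(1, n_max + 1):
        prev = polys[-1]
        # antiderivative of n * B_{n-1}, with constant term still zero
        poly = [Fraction(0)] + [n * c / (k + 1) for k, c in enumerate(prev)]
        # choose the constant term so that the integral over [0, 1] is zero
        poly[0] = -sum(c / (k + 1) for k, c in enumerate(poly))
        polys.append(poly)
    return polys


for n, coeffs in enumerate(bernoulli_polynomials(7)):
    print(n, [str(c) for c in coeffs])
```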

Exercise. 1.8 The following steps give a uniform bound for $B_n(x)$.

1) Deduce from $\int_0^1 B_n(x)\,dx = 0$ that $B_n$ has at least one root in $(0,1)$ when $n \ge 1$.

2) Using Lagrange’s mean value theorem deduce that $\|B_n\|_\infty \le \|B_n'\|_\infty$ when $n \ge 1$, where the sup norms are for $x \in [0,1]$.

3) Use the recursive definition of $B_n$ to deduce that $\|B_n\|_\infty \le n\|B_{n-1}\|_\infty$, and by induction conclude that $\|B_n\|_\infty \le n!$.

This is not the correct order of growth for $\|B_n\|_\infty$, since it is known that $\|B_n\|_\infty \asymp \frac{n!}{(2\pi)^n}$, but the simple argument captures the main feature of the polynomials, i.e. their super-exponential growth. I am indebted to Guglielmo Beretta for this argument (Thanks!).

Exercise. 1.9 The following steps prove some relations among Bernoulli polynomials.

1) Let $F(x,t) := \sum_{n=0}^{\infty} B_n(x) \frac{t^n}{n!}$. Prove that
\[ F(x,t) = \frac{t e^{xt}}{e^t - 1}. \]
Hint: use the recursive formula to deduce that $\partial_x F(x,t) = tF(x,t)$, so that $F(x,t) = a(t)e^{xt}$ for some function $a(t)$. Then, integrating term by term the definition of $F(x,t)$, prove that $a(t)\frac{e^t - 1}{t} = \int_0^1 F(x,t)\,dx = 1$.

2) From 1) deduce that $F(1-x, t) = F(x, -t)$, i.e. that $B_n(1-x) = (-1)^n B_n(x)$.

3) From 1) deduce that $F(0,t) + \frac{t}{2}$ is an even function of $t$, and that therefore $B_n(0) = 0$ when $n$ is odd and $n > 2$.

4) Conclude that $B_n(1) = B_n(0)$ for every $n \ge 2$.

5) Prove that $B_n(x) = \sum_{k=0}^{n} \binom{n}{k} B_k(0)\, x^{n-k}$ for every $n$.
Hint: use the equality $F(x,t) = e^{xt}F(0,t)$.

6) Specializing the previous formula to $x = 1$ and using 4), deduce a recursive formula for the sequence $B_n(0)$, $n \in \mathbb{N}$.

7) Prove that $|B_n(x)| \le n!\,e^{|x|}$ for every $x \in \mathbb{R}$ and every $n$.
Hint: use the formula in Step 5 and the bound in Ex. 1.8.

8) Prove that $\sum_{k=0}^{n} \binom{n+1}{k} B_k(x) = (n+1)x^n$ for every $n$.
Hint: use the equality $(e^t - 1)F(x,t) = te^{xt}$.

9) Prove that $B_n(x+1) - B_n(x) = n x^{n-1}$ for every $n$.
Hint: use the equality $F(x+1,t) - F(x,t) = te^{xt}$.

10) For every $N \ge 0$ and $n \in \mathbb{N}$, $n \ge 1$, let $S_n(N) := \sum_{k=0}^{N} k^n$. Prove that
\[ S_n(N) = \frac{1}{n+1}\big(B_{n+1}(N+1) - B_{n+1}(0)\big). \]
This is Faulhaber’s formula, giving the value of the sum of the $n$th powers of the integers up to $N$ as a polynomial in $N$.
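Faulhaber's formula can be tested with exact arithmetic. The sketch below is an illustration under stated assumptions (all function names are hypothetical): it computes the numbers $B_k(0)$ from the recursion obtained by setting $x = 0$ in step 8), builds $B_n(x)$ through the formula of step 5), and compares the result of step 10) with the direct sum of powers.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """B_k(0) for k = 0..n_max, from sum_{k=0}^{n} C(n+1, k) B_k(0) = 0 (n >= 1)."""
    b = [Fraction(1)]
    for n in range(1, n_max + 1):
        b.append(-sum(comb(n + 1, k) * b[k] for k in range(n)) / (n + 1))
    return b


def bernoulli_poly(n, x, b):
    """B_n(x) = sum_{k=0}^{n} C(n, k) B_k(0) x^(n-k), evaluated exactly."""
    return sum(comb(n, k) * b[k] * Fraction(x) ** (n - k) for k in range(n + 1))


def power_sum(n, N, b):
    """S_n(N) = (B_{n+1}(N+1) - B_{n+1}(0)) / (n+1)."""
    return (bernoulli_poly(n + 1, N + 1, b) - b[n + 1]) / (n + 1)


b = bernoulli_numbers(6)
print(power_sum(3, 10, b), sum(k ** 3 for k in range(11)))
```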

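Note that $B_1(x) = x - 1/2$, so that $\{x\} - 1/2$ is $B_1(\{x\})$. Suppose that $f \in C^2$ and let $k$ be any integer; then
\[ \int_k^{k+1} f'(x)\big(\{x\} - \tfrac{1}{2}\big)\,dx = \lim_{\varepsilon \to 0^+} \int_{k+\varepsilon}^{k+1-\varepsilon} f'(x) B_1(\{x\})\,dx, \]
where we have introduced the parameter $\varepsilon$ because $\frac{1}{2}B_2(x)$ is a primitive for $B_1(x)$ for every $x$, but $\frac{1}{2}B_2(\{x\})$ is a primitive for $B_1(\{x\})$ only for $x \in \mathbb{R} \setminus \mathbb{Z}$. Now we can integrate by parts, getting
\[ = \lim_{\varepsilon \to 0^+} \Big[ f'(x)\frac{B_2(\{x\})}{2}\Big|_{k+\varepsilon}^{k+1-\varepsilon} - \int_{k+\varepsilon}^{k+1-\varepsilon} f''(x)\frac{B_2(\{x\})}{2}\,dx \Big] \]
\[ = \lim_{\varepsilon \to 0^+} f'(x)\frac{B_2(\{x\})}{2}\Big|_{k+\varepsilon}^{k+1-\varepsilon} - \int_k^{k+1} f''(x)\frac{B_2(\{x\})}{2}\,dx \]
\[ = \lim_{\varepsilon \to 0^+} \Big( f'(k+1-\varepsilon)\frac{B_2(1-\varepsilon)}{2} - f'(k+\varepsilon)\frac{B_2(\varepsilon)}{2} \Big) - \int_k^{k+1} f''(x)\frac{B_2(\{x\})}{2}\,dx \]
\[ = f'(k+1)\frac{B_2(1)}{2} - f'(k)\frac{B_2(0)}{2} - \int_k^{k+1} f''(x)\frac{B_2(\{x\})}{2}\,dx \]
\[ = \frac{B_2(0)}{2}\big(f'(k+1) - f'(k)\big) - \int_k^{k+1} f''(x)\frac{B_2(\{x\})}{2}\,dx. \]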
Summing for $k = c, \dots, N-1$ we get
\[ \int_c^N f'(x)\big(\{x\} - \tfrac{1}{2}\big)\,dx = \frac{B_2(0)}{2}\big(f'(N) - f'(c)\big) - \int_c^N f''(x)\frac{B_2(\{x\})}{2}\,dx, \]
so that (1.5) becomes
\[ \sum_{k=c}^{N} f(k) = \int_c^N f(x)\,dx - B_1(0)\big(f(N) + f(c)\big) + \frac{B_2(0)}{2}\big(f'(N) - f'(c)\big) - \frac{1}{2}\int_c^N f''(x) B_2(\{x\})\,dx. \tag{1.9} \]
Now it is evident how the formula can be further iterated for sufficiently regular functions.

Remark. 1.1 Why are the polynomials $B_n$ normalized by setting $\int_0^1 B_n(x)\,dx = 0$? Because this condition says that the integral mean value of $B_n$ is zero, and this is convenient for the estimation of the remainder term $\int_0^N f''(x)B_2(\{x\})\,dx$ (or even the more general $\int_0^N f^{(n)}(x)B_n(\{x\})\,dx$).


Example. 1.2 Applying (1.9) to $f(x) = \ln x$ we have
\[ \ln(N!) = \sum_{k=1}^{N} \ln k = N\ln N - N + \frac{1}{2}\ln N + c + O(1/N), \tag{1.10} \]
where $c$ is a constant that the method does not determine but which can be found with a different argument (for instance as a consequence of Wallis’ formula): $c = \frac{1}{2}\ln(2\pi)$. The resulting formula for $\ln(N!)$ is due to Stirling.
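Formula (1.10) with $c = \frac{1}{2}\ln(2\pi)$ can be compared with the exact value of $\ln(N!)$. The sketch below is an illustration only (the function name `stirling_error` is hypothetical); it uses `math.lgamma`, which returns $\ln\Gamma(N+1) = \ln(N!)$ in floating point.

```python
from math import lgamma, log, pi


def stirling_error(N):
    """ln(N!) minus the approximation N ln N - N + (1/2) ln N + (1/2) ln(2 pi)."""
    approx = N * log(N) - N + 0.5 * log(N) + 0.5 * log(2 * pi)
    return lgamma(N + 1) - approx


for N in (5, 50, 500):
    print(N, stirling_error(N), 1 / N)
```

In these samples the error is positive and clearly smaller than $1/N$, in agreement with the $O(1/N)$ term in (1.10).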

Exercise. 1.10 Probably you are curious about Wallis’ formula we mentioned before, i.e. about a possible way to identify the constant $c$ in (1.10). Here is a sketch of the proof.

1) Let $I_n := \int_0^{\pi} (\sin x)^n\,dx$. Prove that $I_0 = \pi$, $I_1 = 2$ and that $I_n = \frac{n-1}{n} I_{n-2}$ for every $n \ge 2$.

2) Recall that the double factorial $!!$ is defined as $0!! = 1!! = 1$ and $n!! := n(n-2)(n-4)\cdots$ for $n \ge 2$, where the product decreases down to 1 or 2, according to the parity of $n$. Prove that
\[ (2k)!! = 2^k k!, \qquad (2k+1)!! = \frac{(2k+1)!}{(2k)!!} = \frac{(2k+1)!}{2^k k!}. \]

3) Prove that $I_{2k+1} = \int_0^{\pi} (\sin x)^{2k+1}\,dx \le \int_0^{\pi} (\sin x)^{2k}\,dx = I_{2k} < I_{2k-1} = \frac{2k+1}{2k} I_{2k+1}$ and deduce that $I_{2k}/I_{2k+1} \to 1$ as $k$ diverges.

4) From 1) and 2) deduce that
\[ \frac{I_{2k}}{I_{2k+1}} = \frac{(2k-1)!!\,(2k+1)!!}{(2k)!!\,(2k)!!} \cdot \frac{I_0}{I_1} = \frac{(2k-1)!\,(2k+1)!\,k}{2^{4k}\,k!^4}\,\pi. \]

5) The result in Example 1.2 can be written as $N! = (N/e)^N e^c \sqrt{N}\, e^{O(1/N)}$, so that $N! \sim (N/e)^N e^c \sqrt{N}$. Inserting this asymptotic into 4), after some simplifications prove that the right-hand side tends to $2\pi e^{-2c}$. This constant must be 1, by 3). This proves that $c = \frac{\ln(2\pi)}{2}$, i.e. that
\[ N! \sim (N/e)^N \sqrt{2\pi N}. \]

There are several alternative proofs. Some of them are collected in the first chapter of [EL].

Exercise. 1.11 Use (1.9) and the fact that $|B_2(x)| \le \frac{1}{6}$ for $x \in [0,1]$ to prove that
\[ -\frac{1}{6N^2} \le \sum_{k=1}^{N} \frac{1}{k} - \ln N - \frac{1}{2N} - \gamma \le 0 \qquad \forall N \ge 1. \]
This formula can be used to compute the value of $\gamma$ and is more efficient than the one in Ex. 1.5 (it needs $\sqrt{N}$ terms to get $\gamma$ with an approximation $1/N$). This formula for $\gamma$ can be further improved using the Euler-Maclaurin formula at higher levels (involving Bernoulli polynomials of higher order), and was used by Euler exactly for this purpose.

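Exercise. 1.12 Let $\alpha \in (0,1)$. Prove that there exists a constant $c_\alpha$ such that
\[ \sum_{n \le N} \frac{1}{n^{\alpha}} = \frac{N^{1-\alpha}}{1-\alpha} + c_\alpha + \frac{\xi(\alpha, N)}{N^{\alpha}} \qquad \forall N \ge 1, \]
with $\xi(\alpha, N) \in [0,1]$ for every $N$.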

1.3. The ring of arithmetical functions

In Analytic Number Theory it is customary to call arithmetical function every function $f\colon \mathbb{N} \setminus \{0\} \to \mathbb{C}$; the notion of arithmetical function therefore overlaps with that of sequence. The set of arithmetical functions has a natural structure of commutative and associative ring with unit, with respect to the pointwise sum:
\[ (f + g)(n) := f(n) + g(n) \qquad \forall n \in \mathbb{N} \setminus \{0\}, \]
and the Dirichlet product:
\[ (f * g)(n) := \sum_{d \mid n} f(d)g(n/d) = \sum_{\substack{d, d' \\ dd' = n}} f(d)g(d'). \]
The second representation shows immediately the equality $f * g = g * f$. The unit with respect to this product is the function $\delta$ which is defined as: $\delta(1) = 1$, $\delta(n) = 0$ for every $n > 1$.

Exercise. 1.13 Prove that $f$ is invertible in the ring of arithmetical functions if and only if $f(1) \ne 0$.

A function $f$ is called:

-) multiplicative, when $f(mn) = f(m)f(n)$ for every pair of coprime integers $m, n$;

-) completely multiplicative, when $f(mn) = f(m)f(n)$ for every pair of integers $m, n$;

-) additive, when $f(mn) = f(m) + f(n)$ for every pair of coprime integers $m, n$;

-) completely additive, when $f(mn) = f(m) + f(n)$ for every pair of integers $m, n$.


We will see several examples of arithmetical functions having (one of) these properties. The most interesting is certainly multiplicativity, as a consequence of the following fact.

Proposition 1.7 Let $f$ and $g$ be multiplicative; then $f * g$ is multiplicative, too.

Proof. Let $m$ and $n$ be coprime. Then every divisor of $mn$ can be factorized in a unique way as a product of two integers $d$ and $d'$, with $d$ dividing $m$ and $d'$ dividing $n$. Vice versa, every pair $d, d'$ of divisors of $m$ and $n$ respectively produces a divisor $dd'$ of $mn$. As a consequence
\[ (f * g)(mn) = \sum_{\substack{d \mid m \\ d' \mid n}} f(dd')\, g\Big(\frac{m}{d}\frac{n}{d'}\Big) = \sum_{d \mid m} \sum_{d' \mid n} f(dd')\, g\Big(\frac{m}{d}\frac{n}{d'}\Big) \]
\[ = \sum_{d \mid m} \sum_{d' \mid n} f(d)f(d')\, g\Big(\frac{m}{d}\Big) g\Big(\frac{n}{d'}\Big) = \sum_{d \mid m} f(d)\, g\Big(\frac{m}{d}\Big) \sum_{d' \mid n} f(d')\, g\Big(\frac{n}{d'}\Big) \]
\[ = (f * g)(m)\,(f * g)(n), \]
where for the intermediate equality we have used the fact that $(m,n) = 1$, $d \mid m$, $d' \mid n$ imply $(d, d') = 1 = (m/d, n/d')$, and the multiplicativity of $f$ and $g$.
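The Dirichlet product and Proposition 1.7 are easy to experiment with. The sketch below is an illustration under stated assumptions (the names `divisors`, `dirichlet`, `one`, `identity` are hypothetical helpers, and the divisor search is a naive loop): it builds $\mathbf{1} * \mathbf{1}$ and $\mathbf{1} * I$ and tests multiplicativity on coprime pairs.

```python
from math import gcd


def divisors(n):
    """All positive divisors of n (naive O(n) search, fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet(f, g):
    """Return the Dirichlet product f * g as a function of n >= 1."""
    def product(n):
        return sum(f(d) * g(n // d) for d in divisors(n))
    return product


one = lambda n: 1        # the constant function 1
identity = lambda n: n   # the function I(n) = n

d = dirichlet(one, one)           # number of divisors
sigma = dirichlet(one, identity)  # sum of divisors

# multiplicativity on coprime pairs, as stated in Proposition 1.7
multiplicative_ok = all(
    d(m * n) == d(m) * d(n) and sigma(m * n) == sigma(m) * sigma(n)
    for m in range(1, 30)
    for n in range(1, 30)
    if gcd(m, n) == 1
)
print(d(12), sigma(12), multiplicative_ok)
```

The coprimality hypothesis cannot be dropped: for instance $d(4) = 3$ while $d(2)\,d(2) = 4$.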

Exercise. 1.14 Prove that if $f$ is invertible and multiplicative, then $f^{-1}$ is also multiplicative; this proves that the multiplicative and invertible arithmetical functions form an abelian group.

Here we recall some of the most useful arithmetical functions.

$\delta_{\mathcal{P}}$, the characteristic function of the primes, defined as: $\delta_{\mathcal{P}}(n) = 1$ when $n \in \mathcal{P}$, 0 otherwise;

$\delta$, the delta function, defined as: $\delta(1) = 1$, $\delta(n) = 0$ for all $n > 1$;

$\mathbf{1}$, the unit function, defined as: $\mathbf{1}(n) = 1$ for all $n \ge 1$;

$I$, the identity function, defined as: $I(n) := n$ for all $n$;

$\omega$, the omega function, defined as: $\omega(n) := \#\{p : p \mid n\}$ (i.e., the number of distinct primes dividing $n$);

$\mu$, the Möbius function, defined as: $\mu(n) := 0$ if $n$ is not squarefree, $\mu(n) := (-1)^{\omega(n)}$ if $n$ is squarefree;

$\Lambda$, the von Mangoldt function, defined as: $\Lambda(n) := 0$ if $n$ is not a prime power, $\Lambda(n) := \ln p$ if $n = p^k$ for some prime $p$ and some power $k > 0$;

$\sigma_\tau$, the $\tau$-divisor function, defined as: $\sigma_\tau(n) := \sum_{d \mid n} d^\tau$ for every $n$, when $\tau$ is a fixed parameter in $\mathbb{C}$;

$d$, the divisor function, defined as: $d(n) := \sigma_0(n) = \sum_{d \mid n} 1$ for every $n$;

$\sigma$, the sum of divisors function, defined as: $\sigma(n) := \sigma_1(n) = \sum_{d \mid n} d$ for every $n$;

$\varphi$, the Euler totient function, defined as: $\varphi(n) :=$ the cardinality of the set of integers in $\{1, \dots, n\}$ which are coprime to $n$; in other words, $\varphi(n) = \#(\mathbb{Z}/n\mathbb{Z})^*$.

The following facts can be verified directly:

1) $\omega$ is additive;

2) $\mu$, $\mathbf{1}$, $\sigma_\tau$, $d$, $\sigma$ are multiplicative;

3) $\varphi$ is multiplicative, for instance as a consequence of the Chinese remainder theorem, and $\varphi(n) = n\prod_{p \mid n}(1 - 1/p)$;

4) $d = \mathbf{1} * \mathbf{1}$, $\sigma = \mathbf{1} * I$, $\sigma_\tau = \mathbf{1} * I^\tau$.

The following equality is less trivial and is fundamental for our purposes. It is called the second form of the Möbius identity (the first one being a slightly different identity):
\[ \mathbf{1} * \mu = \delta, \tag{1.11} \]
i.e., $\sum_{d \mid n} \mu(d) = 0$ for every $n > 1$. This equality shows that $\mu$ is the inverse of $\mathbf{1}$ in the ring of arithmetical functions.

Proof. Both $\delta$ and $\mathbf{1} * \mu$ are multiplicative, therefore it is sufficient to prove their equality for prime powers. Let $k > 0$ and let $p$ be any prime number; then
\[ (\mathbf{1} * \mu)(p^k) = \sum_{d \mid p^k} \mu(d) = \mu(1) + \mu(p) = 0 = \delta(p^k), \]
which proves the claim.

The associativity of the ring implies that
\[ F = \mathbf{1} * f \iff f = \mu * F, \tag{1.12} \]
i.e., that
\[ F(n) = \sum_{d \mid n} f(d) \iff f(n) = \sum_{d \mid n} F(d)\mu(n/d). \tag{1.13} \]
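The Möbius identity (1.11) and the inversion formula (1.13) can be checked on small integers. The sketch below is illustrative only: the function names are hypothetical, $\mu$ is computed by trial division, and the test sequence $f$ is an arbitrary choice.

```python
def divisors(n):
    """All positive divisors of n (naive search)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:   # p^2 divides the original n: not squarefree
                return 0
            result = -result
        p += 1
    if n > 1:                # one prime factor left
        result = -result
    return result


def summatory(f):
    """F(n) = sum_{d | n} f(d), i.e. F = 1 * f."""
    return lambda n: sum(f(d) for d in divisors(n))


def inverted(F):
    """f(n) = sum_{d | n} F(d) mu(n/d), i.e. f = mu * F."""
    return lambda n: sum(F(d) * mobius(n // d) for d in divisors(n))


f = lambda n: n * n + 1          # arbitrary test sequence
recovered = inverted(summatory(f))

print([sum(mobius(d) for d in divisors(n)) for n in range(1, 13)])
print([recovered(n) == f(n) for n in range(1, 13)])
```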

Exercise. 1.15 (Erdős) Let $f\colon \mathbb{N} \to (0,+\infty)$ be a multiplicative and monotone function. Then there exists $\alpha \in \mathbb{R}$ such that $f(n) = n^\alpha$ for every $n$. [1]

[1] The original proof was very difficult but now it has been considerably simplified; for instance see E. Howe: A new proof of Erdős’ theorem on monotone multiplicative functions, Amer. Math. Monthly 93(8), 593–595, 1986. See also [T], Ch. 1.2 Ex. 10.

Exercise. 1.16 Prove that $I = \mathbf{1} * \varphi$ and that $\varphi = I * \mu$, i.e. that $n = \sum_{d \mid n} \varphi(d)$ and $\varphi(n) = \sum_{d \mid n} d\,\mu(n/d)$.

Exercise. 1.17 Prove that $\ln = \mathbf{1} * \Lambda$ and that $\Lambda = \mu * \ln$, i.e. that $\ln n = \sum_{d \mid n} \Lambda(d)$ and $\Lambda(n) = \sum_{d \mid n} \mu(d)\ln(n/d)$. Deduce that $\Lambda(n) = -\sum_{d \mid n} \mu(d)\ln d$.

Exercise. 1.18 Let $h$ be a completely additive map. Let $D_h$ be defined on the ring of arithmetical functions by setting $D_h f\colon n \mapsto (D_h f)(n) := f(n)h(n)$ (i.e., the pointwise multiplication by the values of $h$). Prove that $D_h$ is a derivation, i.e. that
\[ D_h(f * g) = (D_h f) * g + f * D_h g. \]
The ring of arithmetical functions also supports other derivations which are not of this kind. [2]

Exercise. 1.19 The ring of arithmetical functions is a Unique Factorization Domain, i.e. every arithmetical function can be written in a unique way (up to reordering) as a product of irreducible arithmetical functions. [3] This is not known for the ring of somewhere converging Dirichlet series, and this is a pity because, if proved, such a unique factorization property would have many important consequences for number theory.

1.4. Dirichlet series: as formal series

To every arithmetical function $f$ we associate a formal series $F$ and vice versa, in the following way:
\[ f\colon \mathbb{N} \setminus \{0\} \to \mathbb{C} \iff F(s) := \sum_{n=1}^{\infty} \frac{f(n)}{n^s}. \]
Note that we are not assuming any hypothesis about the convergence of the series defining $F(s)$, so that we consider it (for the moment) only as a formal series. Any series of the form $F$ is called a Dirichlet series. For instance
\[ \mathbf{1} \iff \sum_{n=1}^{\infty} \frac{1}{n^s} =: \zeta(s) \quad \text{(Riemann’s zeta function)}, \]
\[ \delta \iff \sum_{n=1}^{\infty} \frac{\delta(n)}{n^s} = 1, \]
\[ I \iff \sum_{n=1}^{\infty} \frac{n}{n^s} = \zeta(s-1), \]
\[ I^\tau \iff \sum_{n=1}^{\infty} \frac{n^\tau}{n^s} = \zeta(s-\tau). \]

[2] See H. N. Shapiro: On the convolution ring of arithmetic functions, Comm. Pure Appl. Math. 25, 287–336, 1972.

[3] See E. D. Cashwell, C. J. Everett: The ring of number-theoretic functions, Pacific J. Math. 9, 975–985, 1959; and Formal power series, Pacific J. Math. 13, 45–64, 1963.

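As usual when dealing with formal series, we consider two Dirichlet series $F$ and $G$ as equal if and only if they have the same coefficients. The set of formal Dirichlet series is also a ring, with respect to the pointwise sum
\[ F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}, \quad G(s) = \sum_{n=1}^{\infty} \frac{g(n)}{n^s} \implies (F + G)(s) = \sum_{n=1}^{\infty} \frac{f(n) + g(n)}{n^s}, \]
and product
\[ (FG)(s) := \Big(\sum_{m=1}^{\infty} \frac{f(m)}{m^s}\Big)\Big(\sum_{n=1}^{\infty} \frac{g(n)}{n^s}\Big) = \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} \frac{f(m)g(n)}{(mn)^s} = \sum_{n=1}^{\infty} \frac{(f * g)(n)}{n^s}. \]
The last formula shows that the ring of arithmetical functions (with the Dirichlet product $*$) and the ring of formal Dirichlet series (with pointwise sum and product) are isomorphic, so that we can prove identities about arithmetical functions simply by multiplying the corresponding Dirichlet series (and vice versa, of course). For instance, we have
\[ d = \mathbf{1} * \mathbf{1} \implies \sum_{n=1}^{\infty} \frac{d(n)}{n^s} = \zeta^2(s), \]
\[ \sigma = \mathbf{1} * I \implies \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^s} = \zeta(s)\zeta(s-1), \]
\[ \sigma_\tau = \mathbf{1} * I^\tau \implies \sum_{n=1}^{\infty} \frac{\sigma_\tau(n)}{n^s} = \zeta(s)\zeta(s-\tau), \]
\[ \mu = \mathbf{1}^{-1} \implies \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \zeta^{-1}(s). \]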

For multiplicative arithmetical functions $f$ an alternative representation is possible (for the moment only as a formal identity, without any notion of convergence). In fact, from the unique factorization of every integer as a product of prime powers we have
\[ \prod_p \Big(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \frac{f(p^3)}{p^{3s}} + \cdots\Big) = \sum_{n=1}^{\infty} \frac{f(p_1^{\nu_1}) \cdot f(p_2^{\nu_2}) \cdots f(p_k^{\nu_k})}{n^s}, \]
where $n = p_1^{\nu_1} \cdot p_2^{\nu_2} \cdots p_k^{\nu_k}$ is the factorization of $n$ in prime powers (for $n = 1$ the product is empty and its value is taken equal to 1, by definition). When $f$ is multiplicative we have $f(p_1^{\nu_1}) \cdot f(p_2^{\nu_2}) \cdots f(p_k^{\nu_k}) = f(n)$, so that we get the final equality
\[ \prod_p \Big(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \frac{f(p^3)}{p^{3s}} + \cdots\Big) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}, \tag{1.14} \]
which is called the representation as Euler product. When $f$ is completely multiplicative the identity can be further elaborated because in this case
\[ 1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \frac{f(p^3)}{p^{3s}} + \cdots = 1 + \frac{f(p)}{p^s} + \frac{f(p)^2}{p^{2s}} + \frac{f(p)^3}{p^{3s}} + \cdots = \Big(1 - \frac{f(p)}{p^s}\Big)^{-1}, \]
again as a formal identity between power series in the variable $f(p)/p^s$. In this case the Euler product becomes
\[ \prod_p \Big(1 - \frac{f(p)}{p^s}\Big)^{-1} = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}. \tag{1.15} \]

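The multiplicativity of the functions $\mathbf{1}$ and $\mu$ gives the representations
\[ \prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1} = \sum_{n=1}^{\infty} \frac{1}{n^s} = \zeta(s), \qquad \prod_p \Big(1 - \frac{1}{p^s}\Big) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \zeta^{-1}(s), \]
so that now the equality (1.11) saying that $\mathbf{1} * \mu = \delta$ simply becomes
\[ \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \zeta(s)\zeta^{-1}(s) = 1, \]
and Equalities (1.12) and (1.13) can be formulated by saying that
\[ \zeta(s)F(s) = G(s) \iff F(s) = \zeta^{-1}(s)G(s). \]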
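The identities above are formal, but at a point where everything converges absolutely they can also be observed numerically. The sketch below is an illustration under that assumption: it takes $s = 2$, uses the classical value $\zeta(2) = \pi^2/6$, and compares a truncated Dirichlet series with a truncated Euler product (the function names and the truncation level $10^4$ are arbitrary choices).

```python
from math import pi


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, limit + 1, i))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def zeta_partial_sum(s, N):
    """Truncated Dirichlet series: sum of n^(-s) for n <= N."""
    return sum(n ** -s for n in range(1, N + 1))


def euler_partial_product(s, P):
    """Truncated Euler product: product of (1 - p^(-s))^(-1) for primes p <= P."""
    value = 1.0
    for p in primes_up_to(P):
        value /= 1 - p ** -s
    return value


print(zeta_partial_sum(2, 10_000), euler_partial_product(2, 10_000), pi ** 2 / 6)
```

Both truncations approach $\zeta(2)$ from below, since in each case only positive contributions have been discarded.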

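Exercise. 1.20 Using the multiplicativity and the representation as Euler product it is now easy to prove that:
\[
\sum_{n=1}^{\infty} \frac{\varphi(n)}{n^{s}} = \frac{\zeta(s-1)}{\zeta(s)}.
\]

Exercise. 1.21 Prove that
\[
\sum_{n \text{ squarefree}} \frac{1}{n^{s}} = \prod_p \Big(1 + \frac{1}{p^{s}}\Big);
\]
deduce that
\[
\sum_{n \text{ squarefree}} \frac{1}{n^{s}} = \frac{\zeta(s)}{\zeta(2s)}.
\]

Exercise. 1.22 This is a generalization of the previous exercise. Let $r$ be a fixed positive integer. An integer $n$ is called $r$-power free when $p \mid n$ implies that $p^r \nmid n$ (i.e., when $n$ is not divisible by any $r$-th power which is not equal to 1). Prove that
\[
\sum_{n \ r\text{-power free}} \frac{1}{n^{s}} = \frac{\zeta(s)}{\zeta(rs)}.
\]

Exercise. 1.23 Again, using the multiplicativity and the representation as Euler product prove that for every pair of arbitrarily fixed $\tau, \nu \in \mathbb{C}$, one has
\[
\sum_{n=1}^{\infty} \frac{\sigma_\tau(n)\sigma_\nu(n)}{n^{s}} = \frac{\zeta(s)\zeta(s-\tau)\zeta(s-\nu)\zeta(s-\tau-\nu)}{\zeta(2s-\tau-\nu)}.
\]
This identity is due to Ramanujan. As special cases we have
\[
\sum_{n=1}^{\infty} \frac{|\sigma_\tau(n)|^2}{n^{s}} = \frac{\zeta(s)\zeta(s-\tau)\zeta(s-\overline{\tau})\zeta(s-\tau-\overline{\tau})}{\zeta(2s-\tau-\overline{\tau})} \quad \forall\, \tau \in \mathbb{C},
\]
\[
\sum_{n=1}^{\infty} \frac{|\sigma_{i\tau}(n)|^2}{n^{s}} = \frac{\zeta^2(s)\zeta(s-i\tau)\zeta(s+i\tau)}{\zeta(2s)} \quad \forall\, \tau \in \mathbb{R},
\]
\[
\sum_{n=1}^{\infty} \frac{d(n)^2}{n^{s}} = \frac{\zeta^4(s)}{\zeta(2s)}.
\]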
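The last special case of Exercise 1.23 lends itself to a quick numerical sanity check. The sketch below compares a partial sum of $\sum d(n)^2/n^s$ with $\zeta^4(s)/\zeta(2s)$ at $s = 4$, where the closed forms $\zeta(4) = \pi^4/90$ and $\zeta(8) = \pi^8/9450$ are available. The function names and the cut-off $2000$ are assumptions made for this illustration only; the check is of course not a proof.

```python
import math

def divisor_counts(limit):
    """Return a list d with d[n] = number of divisors of n (d[0] is unused)."""
    d = [0] * (limit + 1)
    for k in range(1, limit + 1):
        for multiple in range(k, limit + 1, k):
            d[multiple] += 1
    return d

def partial_sum_d_squared(s, limit):
    """Partial sum of d(n)^2 / n^s over 1 <= n <= limit."""
    d = divisor_counts(limit)
    return sum(d[n] ** 2 / n**s for n in range(1, limit + 1))

# Closed forms of zeta at the two even integers needed here.
ZETA_4 = math.pi**4 / 90
ZETA_8 = math.pi**8 / 9450

lhs = partial_sum_d_squared(4, 2000)
rhs = ZETA_4**4 / ZETA_8

if __name__ == "__main__":
    print(lhs, rhs)  # the two numbers should nearly agree
```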

1.5. Dirichlet series: as complex functions

We have seen that Dirichlet series are already useful when they are considered as formal series. Nevertheless, their full strength appears when they are considered as complex functions, but for this we have to discuss their convergence as series. There is a very good introduction to this topic in [Titch], Ch. IX, and Hardy and Riesz [HR] have dedicated an entire book to this subject.

Theorem 1.1 Suppose that the Dirichlet series $F(s) := \sum_{n=1}^{\infty} f(n)/n^s$ converges for $s = s_0 \in \mathbb{C}$. Then it converges for every $s$ with $\mathrm{Re}(s) > \mathrm{Re}(s_0)$ and the convergence is uniform in the sector $S_\ell := \{s : |\mathrm{Im}(s - s_0)| \le \ell\, \mathrm{Re}(s - s_0)\}$, for every $\ell > 0$.

Figure 1.1. [A sector with vertex $s_0$ containing a point $s$, drawn in the complex plane with axes $\mathrm{Re}(s)$ and $\mathrm{Im}(s)$.]

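Proof. Without loss of generality we can assume that $F(s_0) = 0$, because we can satisfy this condition simply by changing the value of $f(1)$ into $f(1) - F(s_0)$. Let $S(x) := \sum_{n \le x} f(n)/n^{s_0}$, with $S(x) = 0$ when $x < 1$. By Proposition 1.4 we get that for every $M < N \in \mathbb{N}$
\[
\sum_{n=M+1}^{N} \frac{f(n)}{n^{s}} = \sum_{n=M+1}^{N} \frac{f(n)}{n^{s_0}} \frac{1}{n^{s-s_0}} = \frac{S(N)}{N^{s-s_0}} - \frac{S(M)}{M^{s-s_0}} + (s - s_0) \int_M^N S(x)\, x^{s_0-s-1}\, dx.
\]
Now, fix $\varepsilon > 0$ and take $M$ large enough to have $|S(m)| < \varepsilon$ for every $m \ge M$: such an $M$ exists because $S(m) \to F(s_0) = 0$, by hypothesis. Moreover, we take $s$ with $\mathrm{Re}(s) > \mathrm{Re}(s_0)$, so that we deduce that
\[
\Big|\sum_{n=M+1}^{N} \frac{f(n)}{n^{s}}\Big| \le \frac{|S(N)|}{N^{\mathrm{Re}(s-s_0)}} + \frac{|S(M)|}{M^{\mathrm{Re}(s-s_0)}} + |s - s_0| \int_M^N |S(x)|\, x^{\mathrm{Re}(s_0-s)-1}\, dx
\]
\[
\le 2\varepsilon + \varepsilon |s - s_0| \int_M^N x^{\mathrm{Re}(s_0-s)-1}\, dx \le 2\varepsilon + \varepsilon |s - s_0| \int_1^{+\infty} x^{\mathrm{Re}(s_0-s)-1}\, dx
\]
\[
= \varepsilon \Big(2 + \frac{|s - s_0|}{\mathrm{Re}(s - s_0)}\Big) \le \varepsilon \Big(3 + \frac{|\mathrm{Im}(s - s_0)|}{\mathrm{Re}(s - s_0)}\Big).
\]
When $s \in S_\ell$ the last bound is $\le \varepsilon(3 + \ell)$, so that the convergence (uniform in $S_\ell$) follows by the Cauchy test.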

In analogy with power series, the previous result motivates the introduction of the notion of abscissa of convergence, which is
\[
\sigma_c := \inf\{\mathrm{Re}(s_0) : F(s) \text{ converges at } s = s_0\},
\]
with $\sigma_c = +\infty$ when the series does not converge at any point, and $\sigma_c = -\infty$ when the series converges everywhere. In fact, from the previous theorem we deduce immediately the following fact.

Corollary 1.1 Suppose that the Dirichlet series $F(s) := \sum_{n=1}^{\infty} f(n)/n^s$ converges somewhere (hence $\sigma_c \in [-\infty, +\infty)$). Then the series converges in the half-plane $\mathrm{Re}(s) > \sigma_c$ and the convergence is uniform in every compact subset.

Each function $1/n^s = \exp(-s \ln n)$ is holomorphic in $\mathbb{C}$, so that from Morera's theorem we deduce immediately the following regularity result.

Corollary 1.2 Suppose that the Dirichlet series $F(s) := \sum_{n=1}^{\infty} f(n)/n^s$ converges somewhere (hence $\sigma_c \in [-\infty, +\infty)$). Then $F(s)$ is holomorphic in the half-plane $H := \{s \in \mathbb{C} : \mathrm{Re}(s) > \sigma_c\}$ and its derivative can be computed termwise, so that
\[
F'(s) = -\sum_{n=1}^{\infty} \frac{f(n) \ln n}{n^{s}} \qquad \forall s : \mathrm{Re}(s) > \sigma_c.
\]

Proof. $F(s)$ is continuous in $\mathrm{Re}(s) > \sigma_c$, because the convergence is uniform in every compact set (so we have uniform convergence in a suitable open neighborhood of every fixed point) and the summands are evidently continuous. Let $\Gamma$ be any closed (simple and regular) curve contained in $H$. Then
\[
\int_\Gamma F(s)\, ds = \int_\Gamma \sum_{n=1}^{\infty} \frac{f(n)}{n^{s}}\, ds = \sum_{n=1}^{\infty} \int_\Gamma \frac{f(n)}{n^{s}}\, ds
\]
where the exchange of the sum and the integral is allowed by the uniform convergence (the curve $\Gamma$ is evidently a compact set). Each integral $\int_\Gamma \frac{f(n)}{n^{s}}\, ds$ is null, because the map $n^{-s}$ is holomorphic in $\mathbb{C}$, hence
\[
\int_\Gamma F(s)\, ds = 0.
\]
Therefore $F(s)$ is a complex map which is continuous and whose integral over every closed curve is zero: Morera's theorem allows us to conclude that $F(s)$ is holomorphic in $\mathrm{Re}(s) > \sigma_c$.
The proof of the formula for $F'(s)$ runs as follows. Let $s_0 \in H$ be fixed, and let $\Gamma$ be a circle centered at $s_0$, sufficiently small to be contained in $H$ and positively oriented. Then the Cauchy formula for the derivative of a holomorphic function says that
\[
F'(s_0) = \frac{1}{2\pi i} \int_\Gamma \frac{F(z)}{(z - s_0)^2}\, dz = \frac{1}{2\pi i} \int_\Gamma \sum_{n=1}^{\infty} \frac{f(n)\, n^{-z}}{(z - s_0)^2}\, dz.
\]
The uniform convergence on $\Gamma$ allows us to exchange the integral and the series, so that
\[
F'(s_0) = \sum_{n=1}^{\infty} f(n)\, \frac{1}{2\pi i} \int_\Gamma \frac{n^{-z}}{(z - s_0)^2}\, dz.
\]
The inner integral (including the factor $\frac{1}{2\pi i}$) is the derivative of the map $s \mapsto n^{-s}$ at $s_0$ (because this map is holomorphic), thus its value is $-n^{-s_0} \ln n$, so that
\[
F'(s_0) = \sum_{n=1}^{\infty} \frac{-f(n) \ln n}{n^{s_0}},
\]
which is the claim.

Exercise. 1.24 Note that Corollary 1.2 shows that when $F(s)$ is a Dirichlet series, then $F'(s)$ is a Dirichlet series too, and that if we denote by $\sigma'_c$ and $\sigma_c$ the abscissa of convergence for $F'(s)$ and $F(s)$ respectively, then $\sigma'_c \le \sigma_c$. Prove that actually $\sigma'_c = \sigma_c$.
Hint: $F(s)$ is a primitive of $F'(s)$, and the integration along every compact path can be done termwise, by the uniform convergence of $F'(s)$ in $\mathrm{Re}(s) > \sigma'_c$.

The notion of abscissa of convergence is similar to the one of radius of convergence for power series; nevertheless there is an important difference: a Dirichlet series can have a finite abscissa $\sigma_c$ without having any singularity along the vertical line $\mathrm{Re}(s) = \sigma_c$. For power series, there is always a singularity on the critical circle. This different behavior is due essentially to the lack of compactness (the vertical line is not compact, while the critical circle is evidently a compact set). An example of this phenomenon is the series $\sum_{n=1}^{\infty} (-1)^n/n^s$, for which $\sigma_c = 0$ (prove it, for instance by using Proposition 1.3 with $f(n) = (-1)^n$ and $g(n) = n^{-s}$ to prove that $\lim_{N\to\infty} \sum_{n=1}^{N} (-1)^n/n^s$ exists and is finite whenever $\mathrm{Re}(s) > 0$) and which admits an analytic continuation to $\mathbb{C}$ as a holomorphic function (see Ex. 1.28). However, for Dirichlet series with non-negative coefficients this phenomenon cannot happen: this is the claim of the following result, due to Landau.

Theorem 1.2 (Landau) Suppose that the Dirichlet series $F(s) := \sum_{n=1}^{\infty} f(n)/n^s$ converges somewhere, and that $F(s)$ has an analytic continuation in $\Omega \setminus \{\sigma_c\}$, where $\Omega$ is an open set containing the point $s = \sigma_c$. If $f(n) \ge 0$ for every $n$, then $\sigma_c$ is a singularity for $F(s)$.

Proof. Without loss of generality we can assume that $\sigma_c = 0$ (because we can always translate the problem to the analogous problem for $F(s + \sigma_c)$, whose abscissa of convergence is 0). By contradiction, suppose that $F(s)$ is holomorphic in an open neighborhood $U$ of $s = 0$. Since $F$ is also holomorphic in $\mathrm{Re}(s) > 0$, its Taylor power series centered at 1
\[
F(s) = \sum_{k=0}^{\infty} \frac{F^{(k)}(1)}{k!} (s - 1)^k
\]
has a convergence radius strictly greater than 1. By Corollary 1.2
\[
F^{(k)}(1) = (-1)^k \sum_{n=1}^{\infty} \frac{f(n) \ln^k n}{n} \qquad \forall k,
\]
hence
\[
F(s) = \sum_{k=0}^{\infty} \sum_{n=1}^{\infty} \frac{(1 - s)^k}{k!} \frac{f(n) \ln^k n}{n}.
\]
Let $s \in U$ be real and negative (and close enough to 0 to lie inside the disk of convergence of the Taylor series); then each term appearing in this double series is non-negative (here we use the assumption $f(n) \ge 0$ for all $n$), so that the order of the two sums can be exchanged without modifying the value; in this way we get
\[
F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n} \sum_{k=0}^{\infty} \frac{(1 - s)^k \ln^k n}{k!}.
\]
The inner sum here is $\exp((1 - s) \ln n) = n^{1-s}$, hence we have proved that
\[
F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^{s}}
\]
holds for some negative $s$. This means that the Dirichlet series converges for some negative $s$, which is impossible since we have assumed that $\sigma_c = 0$.

In its essence, the previous theorem holds because a non-negative double series can be reordered without assuming its convergence. The following exercise provides another instance of this fact, this time in the more familiar setting of power series.

Exercise. 1.25 (Pringsheim's Theorem) Let $F(z) := \sum_{n=0}^{\infty} a(n) z^n$ be a power series with convergence radius equal to 1. Prove that if $a(n) \ge 0$ for every $n$ then the point $z = 1$ is a singularity for $F(z)$.
Hint: Imitate the proof of the Landau theorem. Suppose (by contradiction) that $F(z)$ has an analytic continuation which is holomorphic in an open set containing $z = 1$. Consider the representation of $F$ as a power series centered at $z = 1/2$. The convergence radius of this power series is strictly greater than $1/2$ (why?). Use the representation as power series at $z = 0$ to find $F^{(k)}(1/2)$ and substitute in the power series at $z = 1/2$. In this way you get a double series whose sums can be exchanged (because its terms are non-negative).


A Dirichlet series $\sum_{n=1}^{\infty} f(n)/n^s$ is called absolutely convergent at $s_0 \in \mathbb{C}$ when
\[
\sum_{n=1}^{\infty} \Big|\frac{f(n)}{n^{s_0}}\Big| = \sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\mathrm{Re}(s_0)}} < \infty.
\]
Note that the absolute convergence depends only on the behavior of the series at $\mathrm{Re}(s_0)$, so that if the series converges absolutely at $s_0$ then it converges absolutely at every point of the vertical line $\mathrm{Re}(s) = \mathrm{Re}(s_0)$. The absolute convergence implies the usual convergence (a simple application of the Cauchy test). Moreover, it is immediate to prove that if a Dirichlet series converges absolutely at $s_0$, then it converges absolutely at every point $s$ with $\mathrm{Re}(s) \ge \mathrm{Re}(s_0)$ and that the convergence is uniform in every half-plane $H_\varepsilon := \{s : \mathrm{Re}(s) > \mathrm{Re}(s_0) + \varepsilon\}$. Therefore, it is useful to introduce the notion of abscissa of absolute convergence, which is
\[
\sigma_a := \inf\{\sigma \in \mathbb{R} : F(\sigma) \text{ converges absolutely}\}.
\]
Evidently $\sigma_c$ is always $\le \sigma_a$.

Exercise. 1.26 Let $F(s) = \sum_{n=1}^{\infty} f(n)/n^s$. Suppose that it converges somewhere, so that $\sigma_c < +\infty$. Prove that $\sigma_a \le \sigma_c + 1$, i.e. that the convergence is absolute at every point $s$ with $\mathrm{Re}(s) > \sigma_c + 1$.

The previous exercise proves that $\sigma_c \le \sigma_a \le \sigma_c + 1$; in general it is not possible to be more precise, in fact for every choice of $u \in [0, 1]$ it is possible to define a Dirichlet series with $\sigma_c = 0$ and $\sigma_a = u$.

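Exercise. 1.27 Let $\ell \in \mathbb{N}$. Let $F_\ell(s) := \sum_{n=1}^{\infty} (-1)^n/n^{\ell s}$. Note that this is a Dirichlet series, since $\ell \in \mathbb{N}$. Prove that for this series $\sigma_c = 0$ (using the partial summation formula) and $\sigma_a = 1/\ell$.
As a more elaborate example, let $\alpha \ge 1$. Let $F_\alpha$ be the Dirichlet series
\[
F_\alpha(s) = \sum_{n=1}^{\infty} \frac{(-1)^n}{\lceil n^\alpha \rceil^{s}}
\]
where $\lceil x \rceil := \inf\{n \in \mathbb{Z} : x \le n\}$. Prove that for this series $\sigma_c = 0$ and $\sigma_a = 1/\alpha$.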

For multiplicative functions the alternative representation as Euler product is possible. We see now that this representation is valid whenever the Dirichlet series converges absolutely.

Theorem 1.3 Let $F(s) = \sum_{n=1}^{\infty} f(n)/n^s$ converge absolutely at $s_0$. Let $f(n)$ be multiplicative; then the infinite product
\[
\prod_p \Big(1 + \sum_{k=1}^{\infty} \frac{f(p^k)}{p^{ks}}\Big) \tag{1.16}
\]
converges absolutely at $s_0$ to $F(s_0)$.


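Proof. Recall that an infinite product $\prod_n (1 + a_n)$ converges absolutely (by definition) when $\prod_n (1 + |a_n|)$ converges, and that this happens if and only if $\sum_n |a_n|$ converges too. In fact, the inequality $1 + y \le e^y$ implies that
\[
\prod_n (1 + |a_n|) \le \exp\Big(\sum_n |a_n|\Big)
\]
so that
\[
\sum_n |a_n| < \infty \implies \prod_n (1 + |a_n|) < \infty.
\]
On the other hand, for $y \in [0, 1]$ we have $e^{y/2} \le 1 + y$. If $\prod_n (1 + |a_n|) < +\infty$ then $|a_n|$ goes to zero (see footnote 4), so that $|a_n| < 1$ if $n$ is large enough, $n > N$ say. Then
\[
\exp\Big(\frac{1}{2} \sum_{n>N} |a_n|\Big) \le \prod_{n>N} (1 + |a_n|) \tag{1.17}
\]
so that
\[
\prod_n (1 + |a_n|) < \infty \implies \sum_n |a_n| < \infty.
\]
Therefore, in order to prove the absolute convergence of (1.16) at $s_0$ it is sufficient to prove that
\[
\sum_p \Big|\sum_{k=1}^{\infty} \frac{f(p^k)}{p^{ks_0}}\Big| < \infty.
\]
This is almost immediate, since
\[
\sum_p \Big|\sum_{k=1}^{\infty} \frac{f(p^k)}{p^{ks_0}}\Big| \le \sum_p \sum_{k=1}^{\infty} \frac{|f(p^k)|}{p^{k\mathrm{Re}(s_0)}} \le \sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\mathrm{Re}(s_0)}} < \infty,
\]
by hypothesis.
Now we have to prove that the product converges to $F(s_0)$. We fix $P > 0$ and consider the finite product
\[
\prod_{p \le P} \Big(1 + \sum_{k=1}^{\infty} \frac{f(p^k)}{p^{ks_0}}\Big).
\]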

In this product each factor is a series converging absolutely, therefore we can expand the product and rearrange the terms without modifying its value. The multiplicativity of $f$ shows that the rearrangement is the sum
\[
\sum_{n \in A} \frac{f(n)}{n^{s_0}}
\]
where $A$ denotes the set of integers whose prime factors are all $\le P$. We note that
\[
F(s_0) - \sum_{n \in A} \frac{f(n)}{n^{s_0}} = \sum_{n \in B} \frac{f(n)}{n^{s_0}}
\]
where $B = A^c$ is the set of integers having at least one prime factor greater than $P$. Hence
\[
\Big|\prod_{p \le P} \Big(1 + \sum_{k=1}^{\infty} \frac{f(p^k)}{p^{ks_0}}\Big) - F(s_0)\Big| = \Big|\sum_{n \in B} \frac{f(n)}{n^{s_0}}\Big| \le \sum_{n \in B} \Big|\frac{f(n)}{n^{s_0}}\Big| \le \sum_{n \ge P} \Big|\frac{f(n)}{n^{s_0}}\Big|,
\]
since each integer in $B$ is greater than $P$. The last sum tends to 0 when $P \to \infty$, since $\sum_n \big|\frac{f(n)}{n^{s_0}}\big|$ converges, by hypothesis.

Footnote 4. In fact, the sequence $\prod_{n=1}^{N} (1 + |a_n|)$ increases and is not smaller than 1. Thus its limit $\ell$, say, is not zero and
\[
1 + |a_N| = \frac{\prod_{n=1}^{N} (1 + |a_n|)}{\prod_{n=1}^{N-1} (1 + |a_n|)}
\]
goes to $\ell/\ell = 1$. Notice that this argument does not apply to simply convergent products, since in that case it is not possible to exclude the case that the limit $\ell$ is 0.

The following proposition proves an important fact about the location of the zeros of an absolutely convergent product; its relevance for Dirichlet series with multiplicative coefficients comes from the previous theorem.

Proposition 1.8 Let $\prod_n (1 + a_n)$ be an absolutely converging infinite product. Then its value is 0 if and only if some factor $1 + a_n$ is zero.

Proof. For $y \in [0, 1/2]$ we have $e^{-2y} \le 1 - y$. By hypothesis $\sum_n a_n$ converges absolutely, thus $|a_n| < 1/2$ if $n$ is large enough, $n > N$ say. Then
\[
\exp\Big(-2 \sum_{n>N} |a_n|\Big) \le \prod_{n>N} (1 - |a_n|) \le \prod_{n>N} |1 + a_n| = \Big|\prod_{n>N} (1 + a_n)\Big|.
\]
In particular $\big|\prod_{n>N} (1 + a_n)\big|$ is strictly positive. This proves that $\prod_n (1 + a_n)$ can be equal to 0 if and only if the finite product $\prod_{n \le N} (1 + a_n)$ is equal to zero, which is the claim.

When applied to the Dirichlet series defining the Riemann zeta function, the previous theorems prove the following.

Corollary 1.3 In the half-plane $\mathrm{Re}(s) > 1$ the Dirichlet series $\zeta(s) = \sum_n 1/n^s$ converges absolutely, is a holomorphic function, has the representation
\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}} = \prod_p \Big(1 - \frac{1}{p^{s}}\Big)^{-1}
\]
and is not equal to zero.


Remark. 1.2 It is sufficient to modify even only the sign of a finite set of coefficients of $\zeta$ to produce a new function having zeros in $\mathrm{Re}(s) > 1$, and therefore badly violating the Riemann hypothesis. The new function no longer has a representation as Euler product: this fact suggests that if RH is true, then at some point of its proof the arithmetic will have to give a fundamental contribution (see Ex. 1.53).

There is still one important aspect of Dirichlet series that we have to discuss. When we consider them as formal series, we define the equality $\sum_{n=1}^{\infty} f(n)/n^s = \sum_{n=1}^{\infty} g(n)/n^s$ by saying that this happens if and only if $f(n) = g(n)$ for every $n$. What happens to this condition when the series converge somewhere and therefore can be considered as functions of a complex variable? The following proposition will allow us to show that the series are equal as complex functions if and only if they are equal as formal series, i.e. that they are equal if and only if $f(n) = g(n)$ for every $n$.

Proposition 1.9 Let $F(s) = \sum_{n=1}^{\infty} f(n)/n^s$ converge somewhere. Let $\sigma_0$ be any real number greater than $\sigma_a$, the abscissa of absolute convergence. Then
\[
\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |F(\sigma_0 + it)|^2\, dt = \sum_{n=1}^{\infty} \frac{|f(n)|^2}{n^{2\sigma_0}} < \infty.
\]
As a consequence, if $F(s)$ is the null function then $f(n) = 0$ for every $n$.

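Proof. The absolute convergence at $\sigma_0$ implies the absolute convergence of the series at every point of the vertical line $\{s : s = \sigma_0 + it,\ t \in \mathbb{R}\}$, uniformly in $t$. The theorem about the absolute convergence of the product of series proves that
\[
|F(\sigma_0 + it)|^2 = \sum_{m=1}^{\infty} \frac{f(m)}{m^{\sigma_0+it}} \cdot \sum_{n=1}^{\infty} \frac{\overline{f(n)}}{n^{\sigma_0-it}} = \sum_{m,n=1}^{\infty} \frac{f(m)\overline{f(n)}}{(mn)^{\sigma_0}} \Big(\frac{n}{m}\Big)^{it}
\]
and that this double series converges absolutely and uniformly for $t \in \mathbb{R}$. The uniform convergence gives
\[
\frac{1}{2T} \int_{-T}^{T} |F(\sigma_0 + it)|^2\, dt = \sum_{m,n=1}^{\infty} \frac{f(m)\overline{f(n)}}{(mn)^{\sigma_0}}\, \frac{1}{2T} \int_{-T}^{T} \Big(\frac{n}{m}\Big)^{it} dt.
\]
A simple computation shows that
\[
\frac{1}{2T} \int_{-T}^{T} \Big(\frac{n}{m}\Big)^{it} dt =
\begin{cases}
1 & \text{if } n = m, \\
\dfrac{\sin(T \ln(n/m))}{T \ln(n/m)} & \text{if } n \ne m.
\end{cases}
\]
Moreover,
\[
\frac{1}{2T} \Big|\int_{-T}^{T} \Big(\frac{n}{m}\Big)^{it} dt\Big| \le \frac{1}{2T} \int_{-T}^{T} \Big|\Big(\frac{n}{m}\Big)^{it}\Big|\, dt = \frac{1}{2T} \int_{-T}^{T} dt = 1,
\]
so that the convergence of the double series in $m, n$ is uniform in $T$. Hence the limit as $T \to \infty$ can be computed termwise, and we get
\[
\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |F(\sigma_0 + it)|^2\, dt = \sum_{m,n=1}^{\infty} \frac{f(m)\overline{f(n)}}{(mn)^{\sigma_0}} \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \Big(\frac{n}{m}\Big)^{it} dt
= \sum_{m,n=1}^{\infty} \frac{f(m)\overline{f(n)}}{(mn)^{\sigma_0}}\, \delta_{m,n} = \sum_{n=1}^{\infty} \frac{|f(n)|^2}{n^{2\sigma_0}}.
\]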

1.6. The analytic continuation of ζ(s)

From the previous section we know that $\zeta(s) = \sum_{n=1}^{\infty} 1/n^s$ is a holomorphic function in $\mathrm{Re}(s) > 1$. In this section we prove that $\zeta(s)$ admits a 'natural' extension as a function in $\mathbb{C}$.
For every fixed $s \ne 1$, from Proposition 1.6 we get that
\[
\sum_{n=1}^{N} \frac{1}{n^{s}} = \int_1^N \frac{dx}{x^{s}} + \frac{1}{2}\Big(\frac{1}{N^{s}} + 1\Big) - s \int_1^N \frac{B_1(x)}{x^{s+1}}\, dx
= \frac{1}{s-1} + \frac{N^{1-s}}{1-s} + \frac{1}{2}\Big(\frac{1}{N^{s}} + 1\Big) - s \int_1^N \frac{B_1(x)}{x^{s+1}}\, dx.
\]
Now, suppose that $\mathrm{Re}(s) > 1$; then we can take the limit $N \to \infty$, getting
\[
\zeta(s) = \frac{1}{s-1} + \frac{1}{2} - s \int_1^{\infty} \frac{B_1(x)}{x^{s+1}}\, dx.
\]
This relation is an identity for $\mathrm{Re}(s) > 1$, but the integral exists in the larger region $\mathrm{Re}(s) > 0$ (the integrand decays at $\infty$ like $x^{-\mathrm{Re}(s)-1}$, because $B_1(x)$ is bounded). Moreover, this integral defines a holomorphic function in $\mathrm{Re}(s) > 0$ (standard argument, again based upon Morera's theorem), and $(s - 1)^{-1}$ is evidently a meromorphic function in $\mathbb{C}$ with a unique pole at $s = 1$, which is simple and where the residue is equal to 1. Thus, the previous equality can be used to define $\zeta(s)$ in the larger region $\mathrm{Re}(s) > 0$, as a meromorphic function in $\mathrm{Re}(s) > 0$ having a unique pole (simple, and with residue equal to 1) at $s = 1$. The general theory of analytic continuation allows us to conclude that this definition is the unique one which extends $\zeta(s)$ as a meromorphic function.
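The two displayed formulas can be combined into a practical way of evaluating the continued $\zeta(s)$ for $\mathrm{Re}(s) > 0$, $s \ne 1$. Subtracting the finite-$N$ identity from the formula for $\zeta(s)$ gives
\[
\zeta(s) = \sum_{n \le N} \frac{1}{n^{s}} - \frac{N^{1-s}}{1-s} - \frac{1}{2N^{s}} - s \int_N^{\infty} \frac{B_1(x)}{x^{s+1}}\, dx,
\]
and the last integral is small when $N$ is large. The sketch below simply drops it; the function name `zeta_truncated` and the default $N = 10^4$ are choices made for this illustration, and the reference value $\zeta(1/2) \approx -1.4603545$ is quoted from standard tables rather than derived in these notes.

```python
import math

def zeta_truncated(s, N=10_000):
    """Approximate zeta(s) for Re(s) > 0, s != 1.

    Uses zeta(s) ~ sum_{n<=N} n^{-s} - N^{1-s}/(1-s) - N^{-s}/2,
    i.e. the formula above with the tail integral over [N, oo) neglected.
    """
    partial = sum(n ** (-s) for n in range(1, N + 1))
    return partial - N ** (1 - s) / (1 - s) - 0.5 * N ** (-s)

if __name__ == "__main__":
    print(zeta_truncated(2), math.pi**2 / 6)   # inside the region of convergence
    print(zeta_truncated(0.5))                 # inside the strip 0 < Re(s) < 1
```

At $s = 1/2$ the original Dirichlet series diverges, so the value printed there is produced only by the continuation formula.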

We can pursue this argument further. With an integration by parts we get
\[
\int_1^{\infty} \frac{B_1(x)}{x^{s+1}}\, dx = -\frac{B_2(0)}{2} + \frac{s+1}{2} \int_1^{\infty} \frac{B_2(x)}{x^{s+2}}\, dx, \qquad \text{when } \mathrm{Re}(s) > 0.
\]
As before, this is an equality in $\mathrm{Re}(s) > 0$ but the integral on the right hand side exists in $\mathrm{Re}(s) > -1$ and defines a holomorphic function there. Thus this equality can be used to define the function on the left hand side in this larger region. Iterating this argument $m$ times we get a formula providing the analytic extension of $\zeta(s)$ in $\mathrm{Re}(s) > 1 - m$. Concluding, we have proved the following result.

Corollary 1.4 The Riemann zeta function $\zeta(s)$ admits an analytic continuation in $\mathbb{C}$ as a meromorphic function, with a unique pole at $s = 1$ which is simple and with $\mathrm{Res}_{s=1}\, \zeta(s) = 1$.

There are more elegant ways to prove the same conclusion. For example we can extend $\zeta$ to $\mathrm{Re}(s) > 0$ with the previous argument and then use the functional equation satisfied by $\zeta$ (a relation connecting the values of $\zeta(s)$ and $\zeta(1 - s)$) to get the analytic continuation. The functional equation is a fundamental tool for the comprehension of the deep analytical properties of $\zeta(s)$, but it is not useful for our limited purpose (the proof of the Prime Number Theorem). We will not mention it anymore in these notes.

Exercise. 1.28 Prove that $f(s) := \sum_{n=1}^{\infty} (-1)^n/n^s = (2^{1-s} - 1)\zeta(s)$ when $\mathrm{Re}(s) > 1$. Notice that this equality provides the analytic continuation to $\mathbb{C}$ of $f(s)$, as a holomorphic function.
Hint: Look for a representation of $\sum_{n \text{ even}} 1/n^s$ and $\sum_{n \text{ odd}} 1/n^s$ in terms of $\zeta(s)$, and subtract. For the second part, note that $(2^{1-s} - 1)$ is holomorphic in $\mathbb{C}$ and has a zero at $s = 1$.

1.7. Some elementary results

Before facing the Prime Number Theorem it is a good idea to begin with a more modest target. Mertens' result (in its simplest formulation) and Dirichlet's result about the mean value of the divisor function will be two good tests. We need a preliminary upper bound for $\pi(x)$.

When we split the integers $\le x$ in pairs $\{1,2\}, \{3,4\}, \ldots$, each pair contains at most one prime; this simple remark gives the upper bound $\pi(x) \le \lfloor x/2 \rfloor + 1$, which in a more analytical language we write as
\[
\pi(x) \le \frac{x}{2} + O(1).
\]
This idea can be pushed much further. Let us take another integer, say 6. We split the integers up to $x$ into distinct blocks, each one containing six consecutive integers. Apart from the first block, in each block at most two primes can appear, because any term which is not coprime to 6 cannot be a prime unless it is a divisor of 6 itself, and there are only $\varphi(6) = 2$ integers coprime to 6 in every block. This proves that $\pi(x) \le \lfloor \frac{2x}{6} \rfloor + R$ where $R \le 6$. In other words,
\[
\pi(x) \le \frac{2x}{6} + O(1).
\]

Repeating this argument with a generic integer $N$, we get that
\[
\pi(x) \le \frac{\varphi(N)}{N}\, x + O(N), \tag{1.18}
\]
where $O(N)$ denotes a quantity which is $\le N$. This innocent bound shows that
\[
\limsup_{x\to\infty} \frac{\pi(x)}{x} \le \frac{\varphi(N)}{N} \qquad \forall N,
\]
which is an interesting upper bound, since now we have the possibility to act on the parameter $N$ in order to bound $\pi(x)$ in a non-trivial way. For instance, let $N$ be the product of all prime numbers up to $L$ (a new parameter): $N = \prod_{p \le L} p$. Then
\[
\frac{\varphi(N)}{N} = \prod_{p \mid N} \Big(1 - \frac{1}{p}\Big) = \prod_{p \le L} \Big(1 - \frac{1}{p}\Big).
\]
The inequality $1 - y \le e^{-y}$ shows that
\[
\frac{\varphi(N)}{N} \le e^{-\sum_{p \le L} \frac{1}{p}}. \tag{1.19}
\]
We have already proved that $\sum_{p \le L} \frac{1}{p} \to \infty$ as $L$ diverges; this implies that
\[
\liminf_{N\to\infty} \frac{\varphi(N)}{N} = 0
\]
so that from (1.18) we conclude that
\[
\pi(x) = o(x). \tag{1.20}
\]
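The mechanism behind (1.18) to (1.20) can be observed numerically: for $N = \prod_{p \le L} p$ the density $\varphi(N)/N$ decreases (slowly) as $L$ grows and stays below the bound (1.19). The sketch below is only an illustration under that reading; `primes_up_to` and `primorial_density` are names introduced here, not objects defined in these notes.

```python
import math

def primes_up_to(limit):
    """Return the list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(2, limit + 1) if sieve[i]]

def primorial_density(L):
    """For N = product of the primes <= L, return the pair
    (phi(N)/N, exp(-sum_{p<=L} 1/p)); the second entry is the bound (1.19)."""
    ratio = 1.0
    reciprocal_sum = 0.0
    for p in primes_up_to(L):
        ratio *= 1.0 - 1.0 / p
        reciprocal_sum += 1.0 / p
    return ratio, math.exp(-reciprocal_sum)

if __name__ == "__main__":
    for L in (10, 100, 1000, 10**4):
        print(L, primorial_density(L))
    # Instance of (1.18) with N = 30 (phi(30) = 8) and x = 10^4.
    x, N, phi_N = 10**4, 30, 8
    print(len(primes_up_to(x)), "<=", phi_N / N * x + N)
```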

With a slightly bigger effort we can improve this result. From (1.18), (1.19) and (1.3) with $N = \prod_{p \le L} p$ we get
\[
\frac{\pi(x)}{x} \le \frac{\varphi(N)}{N} + O\Big(\frac{N}{x}\Big) \le e^{-\sum_{p \le L} \frac{1}{p}} + O\Big(\frac{N}{x}\Big) \le \frac{O(1)}{\ln L} + O\Big(\frac{L^L}{x}\Big)
\]
because $N = \prod_{p \le L} p \le L^L$. The bound suggests selecting $L$ in such a way that the two terms have (approximately) the same size, in other words in such a way that $L^L/x \approx 1/\ln L$, i.e., $L^L \ln L \approx x$. The equation $L^L \ln L = x$ has a complicated solution $L = L(x)$, but it is easy to see that $L(x)$ is asymptotic to $\frac{\ln x}{\ln\ln x}$. Setting $L = \frac{\ln x}{\ln\ln x}$ the bound becomes
\[
\frac{\pi(x)}{x} \le \frac{O(1)}{\ln\ln x} + O(\exp(L \ln L - \ln x))
= \frac{O(1)}{\ln\ln x} + O\Big(\exp\Big(\frac{\ln x}{\ln\ln x} \ln\Big(\frac{\ln x}{\ln\ln x}\Big) - \ln x\Big)\Big)
\]
\[
= \frac{O(1)}{\ln\ln x} + O\Big(\exp\Big(-\frac{\ln x\, \ln\ln\ln x}{\ln\ln x}\Big)\Big)
= \frac{O(1)}{\ln\ln x} + O\Big(\frac{1}{(\ln\ln x)^{\frac{\ln x}{\ln\ln x}}}\Big).
\]
This bound proves the following result.

Proposition 1.10 $\pi(x) \ll \frac{x}{\ln\ln x}$ as $x \to \infty$.

The previous claim is interesting, but it is still far from the truth. The next result is still elementary in its tools but represents a major improvement on all previous results, because it claims that $\pi(x) \ll \frac{x}{\ln x}$: it is due to Chebyshev.

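Let $n$ be any positive integer; then the binomial coefficient $\binom{2n}{n}$ is at most $4^n$, because
\[
\binom{2n}{n} \le \sum_{k=0}^{2n} \binom{2n}{k} = (1 + 1)^{2n} = 4^n.
\]
On the other hand, every prime in $(n, 2n]$ divides $(2n)!$ but does not divide $n!$, hence
\[
\prod_{n < p \le 2n} p \le \binom{2n}{n} \le 4^n
\]
so that
\[
\sum_{n < p \le 2n} \ln p \le n \ln 4. \tag{1.21}
\]
Now, for every $x \ge 2$ let $k$ be such that $2^{k-1} < x \le 2^k$. Then, applying the previous bound to every interval $(2^{\ell-1}, 2^\ell]$, we get
\[
\sum_{p \le x} \ln p \le \sum_{\ell=1}^{k} \sum_{2^{\ell-1} < p \le 2^\ell} \ln p \le \ln 4 \sum_{\ell=1}^{k} 2^{\ell-1} \le 2^k \ln 4 \le x \ln(16). \tag{1.22}
\]
We also have
\[
\sum_{n \le x} \Lambda(n) = \sum_{p^k \le x} \ln p = \sum_{p \le x} \ln p + \sum_{p^2 \le x} \ln p + \sum_{k \ge 3} \sum_{p^k \le x} \ln p
\]
\[
= \sum_{p \le x} \ln p + O(\sqrt{x}) + O\Big(\ln x \sum_{k \ge 3} \pi(x^{1/k})\Big)
= \sum_{p \le x} \ln p + O(\sqrt{x}) + O(x^{1/3} \ln^2 x)
\]
\[
= \sum_{p \le x} \ln p + O(\sqrt{x}) \tag{1.23}
\]
where for the intermediate step we have used (1.22) with $x \to \sqrt{x}$, and for the last result we have noticed that $\pi(x)$ is zero when $x < 2$, so that in the sum over $k$ only $\ll \ln x$ terms appear, each one bounded by $\pi(x^{1/k}) \le x^{1/3}$. From (1.22) and (1.23) we deduce the following upper bound.

Proposition 1.11 $\sum_{n \le x} \Lambda(n) \ll x$ as $x \to \infty$.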
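The inequalities (1.22) and (1.23) are explicit enough to be tested on a computer. The following sketch computes $\sum_{p \le x} \ln p$ and $\sum_{n \le x} \Lambda(n)$ directly and compares them with $x \ln 16$ and with each other. The names `theta` and `psi` follow a common convention for these two sums but are not used in these notes, and the constant 2 in the last check is an assumption chosen only to make the $O(\sqrt{x})$ term concrete for the tested values.

```python
import math

def primes_up_to(limit):
    """Return the list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(2, limit + 1) if sieve[i]]

def theta(x):
    """Sum of ln p over the primes p <= x."""
    return sum(math.log(p) for p in primes_up_to(x))

def psi(x):
    """Sum of Lambda(n) over n <= x, i.e. ln p counted once for each p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total

if __name__ == "__main__":
    for x in (100, 1000, 10**4, 10**5):
        print(x, theta(x), psi(x), x * math.log(16))
```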

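From (1.21) alone we can also deduce the following upper bound, which improves the conclusion of Proposition 1.10.

Proposition 1.12 $\pi(x) \le (4 + o(1)) \frac{x}{\ln x}$ as $x \to \infty$.

Proof. From (1.21) we have
\[
(\pi(2^\ell) - \pi(2^{\ell-1})) \ln(2^{\ell-1}) \le \sum_{2^{\ell-1} < p \le 2^\ell} \ln p \le 2^\ell \ln 2
\]
for every integer $\ell \ge 1$. In other words, we have
\[
\pi(2^\ell) - \pi(2^{\ell-1}) \le \frac{2^\ell}{\ell - 1}, \qquad \forall \ell \ge 2.
\]
Adding these inequalities for $\ell = 2, \ldots, k$ and then shifting the index $\ell \to \ell - 1$ we get
\[
\pi(2^k) - \pi(2) \le 2 \sum_{\ell=1}^{k-1} \frac{2^\ell}{\ell}, \qquad \forall k.
\]
Now we can compare the sum and the integral (the function $\frac{2^\ell}{\ell}$ is monotonically increasing in $[2, +\infty)$), and $\pi(2) = 1$, thus we get
\[
\pi(2^k) \le 5 + 2 \int_2^k \frac{2^\ell}{\ell}\, d\ell \qquad \forall k.
\]
Since $\int_2^k \frac{2^\ell}{\ell}\, d\ell \sim \frac{2^k}{\ln(2^k)}$ as $k \to \infty$, we obtain
\[
\pi(2^k) \le (2 + o(1)) \frac{2^k}{\ln(2^k)} \qquad k \to \infty.
\]
Let $x \ge 3$, and let $k$ be such that $2^{k-1} < x \le 2^k$. Then
\[
\pi(x) \le \pi(2^k) \le (2 + o(1)) \frac{2^k}{\ln(2^k)} \le (2 + o(1)) \frac{2x}{\ln(2x)} \le (4 + o(1)) \frac{x}{\ln x},
\]
which is the claim.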

The previous proposition claims that $\pi(x) \le c \frac{x}{\ln x}$ holds for any $c > 4$ when $x$ is large enough. With some care in some steps we can improve the result to $c = \ln 4 = 1.386\ldots$, for every $x$. With some extra tricks Chebyshev succeeded in further refining it, proving that $\frac{6}{5} \ln(2^{1/2} 3^{1/3} 5^{1/5} / 30^{1/30}) = 1.105\ldots$ is a possible value for $c$.
Chebyshev was also able to apply his method to deduce a lower bound for $\pi(x)$, namely that there exists a constant $c' > 0$ such that $\pi(x) \ge c' \frac{x}{\ln x}$ for every $x$, and that when $x$ is large enough one can take $c' = \ln(2^{1/2} 3^{1/3} 5^{1/5} / 30^{1/30}) = 0.921\ldots$.
Following a very nice argument of Nair (see footnote 5) we prove an only slightly weaker bound.

Proposition 1.13 $\pi(n) \ge \ln 2\, \frac{n}{\ln n}$ for every integer $n \ge 7$; hence $\pi(x) \ge (\ln 2 + o(1)) \frac{x}{\ln x}$ as $x \to \infty$.

Proof. Let $d_n$ denote the least common multiple of $1, 2, \ldots, n$ and for every pair of integers $m, n$ with $1 \le m \le n$ let
\[
I(m, n) := \int_0^1 x^{m-1} (1 - x)^{n-m}\, dx.
\]
Using the binomial formula we have
\[
I(m, n) = \int_0^1 \sum_{k=0}^{n-m} \binom{n-m}{k} (-1)^k x^{k+m-1}\, dx = \sum_{k=0}^{n-m} \binom{n-m}{k} \frac{(-1)^k}{k + m},
\]
proving that $d_n I(m, n)$ is an integer. On the other hand, with an integration by parts one gets that
\[
I(m, n) = -\frac{x^{m-1} (1 - x)^{n-m+1}}{n - m + 1}\Big|_0^1 + \frac{m - 1}{n - m + 1} \int_0^1 x^{m-2} (1 - x)^{n-m+1}\, dx
= \frac{m - 1}{n - m + 1}\, I(m - 1, n) \qquad \forall m \ge 2,
\]
and since it is evident that $I(1, n) = \frac{1}{n}$, by induction (on $m$) one proves that $I(m, n) = [m \binom{n}{m}]^{-1}$ for every $m$ and $n$. Thus we have proved that $m \binom{n}{m} \mid d_n$ so that, in particular,
\[
d_n \ge c_n := \lceil n/2 \rceil \binom{n}{\lceil n/2 \rceil} \qquad \forall n.
\]
A direct computation proves that $c_{n+2}/c_n \ge 4$ for every $n$ (see footnote 6), and since $c_7 = 140 > 2^7$ and $c_8 = 280 > 2^8$, we deduce that $c_n \ge 2^n$ holds for every $n \ge 7$. On the other hand, $d_n \le n^{\pi(n)}$ because there are $\pi(n)$ distinct primes dividing $d_n$ and when $p^a \| d_n$ then $p^a \le n$. Thus
\[
\pi(n) \ln n \ge \ln d_n \ge \ln c_n \ge n \ln 2 \qquad \forall n \ge 7,
\]
which is the claim.

Footnote 5. See M. Nair: On Chebyshev-type inequalities for primes, Amer. Math. Monthly 89(2), 126–129, 1982.

Footnote 6. In order to check the inequality it is convenient to split the cases according to the parity of $n$. When $n = 2k$ (hence $k \ge 1$) one gets
\[
\frac{c_{2k+2}}{c_{2k}} = \frac{(k + 1) \binom{2k+2}{k+1}}{k \binom{2k}{k}} = \frac{(k + 1)(2k + 2)!\, k!^2}{k\, (k + 1)!^2\, (2k)!} = \frac{2(2k + 1)}{k} > 4,
\]
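Every step of Nair's argument is a statement about explicit integers, so it can be checked mechanically for small $n$. The sketch below verifies, for $n$ in a small range, that $m\binom{n}{m}$ divides $d_n$, that $c_n \ge 2^n$ for $n \ge 7$, and that $\pi(n) \ln n \ge n \ln 2$. The helper names are introduced for this illustration only; `math.lcm` requires Python 3.9 or later.

```python
import math

def lcm_up_to(n):
    """d_n = least common multiple of 1, 2, ..., n."""
    d = 1
    for k in range(1, n + 1):
        d = math.lcm(d, k)
    return d

def c(n):
    """c_n = ceil(n/2) * binomial(n, ceil(n/2))."""
    half = (n + 1) // 2  # ceil(n/2) for integers
    return half * math.comb(n, half)

def prime_count(n):
    """pi(n), computed by trial division (adequate for small n)."""
    return sum(
        1
        for k in range(2, n + 1)
        if all(k % q for q in range(2, math.isqrt(k) + 1))
    )

def check_nair(n):
    """Return True if all three steps of the argument hold for this n >= 7."""
    d_n = lcm_up_to(n)
    divisibility = all(d_n % (m * math.comb(n, m)) == 0 for m in range(1, n + 1))
    growth = c(n) >= 2**n
    conclusion = prime_count(n) * math.log(n) >= n * math.log(2)
    return divisibility and growth and conclusion

if __name__ == "__main__":
    print(c(7), c(8))  # the two base cases used in the proof
    print(all(check_nair(n) for n in range(7, 150)))
```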

We can now prove Mertens' result. We need the elementary equality
\[
N! = \prod_p p^{\nu_p}, \qquad \nu_p = \sum_{k=1}^{\infty} \Big\lfloor \frac{N}{p^k} \Big\rfloor.
\]
Note that in this product only finitely many factors are different from 1 (in fact $\lfloor N/p^k \rfloor = 0$ when $p^k > N$), so that there is no need to discuss its convergence. This equality comes from the remark that there are $\lfloor N/p \rfloor$ integers among $1, \ldots, N$ which are divisible by $p$, and this is the first contribution to $\nu_p$. Among these ones, there are $\lfloor N/p^2 \rfloor$ which are divisible by $p^2$, so that also $\lfloor N/p^2 \rfloor$ must be added to produce $\nu_p$, and so on.
In terms of the von Mangoldt $\Lambda$-function we deduce that
\[
\ln(N!) = \sum_p \nu_p \ln p = \sum_p \sum_{k=1}^{\infty} \Big\lfloor \frac{N}{p^k} \Big\rfloor \ln p = \sum_n \Big\lfloor \frac{N}{n} \Big\rfloor \Lambda(n).
\]
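Both identities above are finite and exact, so they are easy to test. The following sketch rebuilds $N!$ from the exponents $\nu_p$ and checks $\ln(N!) = \sum_n \lfloor N/n \rfloor \Lambda(n)$ up to floating point error; the function names are hypothetical helpers written for this example.

```python
import math

def legendre_exponent(N, p):
    """nu_p = sum over k >= 1 of floor(N / p^k), the exponent of p in N!."""
    exponent, power = 0, p
    while power <= N:
        exponent += N // power
        power *= p
    return exponent

def is_prime(n):
    return n >= 2 and all(n % q for q in range(2, math.isqrt(n) + 1))

def factorial_from_primes(N):
    """Rebuild N! as the product of p^{nu_p} over the primes p <= N."""
    result = 1
    for p in range(2, N + 1):
        if is_prime(p):
            result *= p ** legendre_exponent(N, p)
    return result

def von_mangoldt(n):
    """Lambda(n) = ln p if n = p^k with p prime and k >= 1, and 0 otherwise."""
    for p in range(2, n + 1):
        if n % p == 0:  # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0  # reached only for n = 1

if __name__ == "__main__":
    N = 30
    print(factorial_from_primes(N) == math.factorial(N))
    print(sum((N // n) * von_mangoldt(n) for n in range(1, N + 1)),
          math.lgamma(N + 1))  # lgamma(N + 1) = ln(N!)
```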

The sum is naturally restricted to integers $n \le N$. We approximate $\lfloor N/n \rfloor$ by $N/n$, with an error which is bounded by 1 (it is the fractional part of $N/n$), so that
\[
\ln(N!) = N \sum_{n \le N} \frac{\Lambda(n)}{n} + O\Big(\sum_{n \le N} \Lambda(n)\Big)
\]
so that by Proposition 1.11 we get
\[
\ln(N!) = N \sum_{n \le N} \frac{\Lambda(n)}{n} + O(N).
\]
At last, the Stirling formula (1.10) gives $\ln(N!) = N \ln N + O(N)$, therefore
\[
\sum_{n \le N} \frac{\Lambda(n)}{n} = \ln N + O(1),
\]

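which is an interesting formula giving the exact asymptotic of a sum intimately connected to the prime numbers (actually we have obtained also an explicit bound for the remainder term). We can further modify this equality. In fact,
\[
\sum_p \sum_{k \ge 2} \frac{\ln p}{p^k} = \sum_p \ln p \sum_{k \ge 2} \frac{1}{p^k} = \sum_p \frac{\ln p}{p^2} \frac{1}{1 - 1/p} = \sum_p \frac{\ln p}{p(p - 1)} < +\infty
\]
therefore
\[
\sum_{n \le N} \frac{\Lambda(n)}{n} = \sum_{p \le N} \frac{\ln p}{p} + \sum_{\substack{p^k \le N \\ k \ge 2}} \frac{\ln p}{p^k} = \sum_{p \le N} \frac{\ln p}{p} + O(1)
\]
so that the previous result gives
\[
\sum_{p \le N} \frac{\ln p}{p} = \ln N + O(1). \tag{1.24}
\]
From this asymptotic we can extract the asymptotic of $\sum_{p \le N} 1/p$, which is the celebrated result of Mertens.

Footnote 6 (continued). ... and when $n = 2k - 1$ (hence $k \ge 1$, again) one gets
\[
\frac{c_{2k+1}}{c_{2k-1}} = \frac{(k + 1) \binom{2k+1}{k+1}}{k \binom{2k-1}{k}} = \frac{(k + 1)(2k + 1)!\, k!\, (k - 1)!}{k\, (k + 1)!\, k!\, (2k - 1)!} = \frac{2(2k + 1)}{k} > 4.
\]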

Proposition 1.14 There exists a constant $c \in \mathbb{R}$ such that $\sum_{p \le N} \frac{1}{p} = \ln\ln N + c + O\big(\frac{1}{\ln N}\big)$. Moreover, there exists another constant $c' \in \mathbb{R}$ such that
\[
\prod_{p \le N} \Big(1 - \frac{1}{p}\Big) = \frac{e^{c'}}{\ln N} \Big(1 + O\Big(\frac{1}{\ln N}\Big)\Big).
\]
A different and more complicated argument shows that $c' = -\gamma$, the (opposite of the) Euler–Mascheroni constant (see [HW], Theorem 429, or [MV] Theorem 2.7).
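Both statements of Proposition 1.14 can be observed numerically. The sketch below computes, for a given $N$, the quantity $\sum_{p \le N} 1/p - \ln\ln N$ (which should stabilize near the constant $c$) and the product $\ln N \prod_{p \le N}(1 - 1/p)$ (which should approach $e^{c'} = e^{-\gamma}$). The numerical value of $\gamma$ is hard-coded as an assumption, and the helper names are introduced for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)

def primes_up_to(limit):
    """Return the list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(2, limit + 1) if sieve[i]]

def mertens_data(N):
    """Return (sum_{p<=N} 1/p - ln ln N, ln N * prod_{p<=N} (1 - 1/p))."""
    reciprocal_sum = 0.0
    log_product = 0.0
    for p in primes_up_to(N):
        reciprocal_sum += 1.0 / p
        log_product += math.log1p(-1.0 / p)  # ln(1 - 1/p), computed accurately
    c_estimate = reciprocal_sum - math.log(math.log(N))
    scaled_product = math.exp(log_product) * math.log(N)
    return c_estimate, scaled_product

if __name__ == "__main__":
    for N in (10**3, 10**4, 10**5, 10**6):
        print(N, mertens_data(N), math.exp(-EULER_GAMMA))
```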

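Proof. We write $\sum_{p \le N} \frac{1}{p}$ as $\sum_{n \le N} \delta_{\mathcal{P}}(n) \frac{\ln n}{n} \cdot \frac{1}{\ln n}$ and we use the partial summation formula (Proposition 1.4) with $f(n) = \delta_{\mathcal{P}}(n) \frac{\ln n}{n}$ and $g(n) = 1/\ln n$, setting $F(x) := \sum_{n \le x} f(n)$. Then
\[
\sum_{p \le N} \frac{1}{p} = \sum_{n \le N} f(n) g(n) = F(N) g(N) - \int_2^N F(x) g'(x)\, dx = \frac{F(N)}{\ln N} + \int_2^N \frac{F(x)}{x \ln^2 x}\, dx.
\]
By (1.24) we know that $F(N) = \ln N + R(N)$ with $R(N) = O(1)$, so that
\[
\sum_{p \le N} \frac{1}{p} = \frac{\ln N + R(N)}{\ln N} + \int_2^N \frac{\ln x + R(x)}{x \ln^2 x}\, dx
= 1 + O\Big(\frac{1}{\ln N}\Big) + \int_2^N \frac{dx}{x \ln x} + \int_2^N \frac{R(x)}{x \ln^2 x}\, dx
\]
\[
= \ln\ln N + 1 - \ln\ln 2 + O\Big(\frac{1}{\ln N}\Big) + \int_2^N \frac{R(x)}{x \ln^2 x}\, dx.
\]
The last integral converges absolutely as $N \to +\infty$, since $R(x) = O(1)$, thus
\[
\sum_{p \le N} \frac{1}{p} = \ln\ln N + 1 - \ln\ln 2 + \int_2^{+\infty} \frac{R(x)}{x \ln^2 x}\, dx + O\Big(\frac{1}{\ln N}\Big) - \int_N^{+\infty} \frac{R(x)}{x \ln^2 x}\, dx
\]
which is the claim with $c := 1 - \ln\ln 2 + \int_2^{+\infty} \frac{R(x)\, dx}{x \ln^2 x}$, since $\int_N^{+\infty} \frac{R(x)\, dx}{x \ln^2 x} \ll \frac{1}{\ln N}$.
The second claim comes from the identity
\[
\sum_{p \le N} \ln\Big(1 - \frac{1}{p}\Big) = -\sum_{p \le N} \frac{1}{p} + \sum_{p \le N} \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big].
\]
In fact the second sum converges as $N \to \infty$, since $\ln(1 - x) + x = O(x^2)$, so that
\[
\sum_{p \le N} \ln\Big(1 - \frac{1}{p}\Big) = -\sum_{p \le N} \frac{1}{p} + \sum_p \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big] - \sum_{p > N} \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big]
\]
\[
= -\sum_{p \le N} \frac{1}{p} + \sum_p \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big] + O\Big(\sum_{p > N} \frac{1}{p^2}\Big)
= -\sum_{p \le N} \frac{1}{p} + \sum_p \Big[\ln\Big(1 - \frac{1}{p}\Big) + \frac{1}{p}\Big] + O\Big(\frac{1}{N}\Big),
\]
and now the conclusion comes from the first claim, with $c' := -c + \sum_p \big[\ln\big(1 - \frac{1}{p}\big) + \frac{1}{p}\big]$.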

Mertens spent a great effort in improving his result. In its stronger formulation he proved that
\[
\Big|\sum_{p \le N} \frac{1}{p} - \ln\ln N - c\Big| \le \frac{d}{\ln N},
\]
for a pair of explicit constants $c$ and $d$.

The second 'preparatory' result is the following statement, due to Dirichlet, giving the mean value of the divisor function.

Proposition 1.15 Recall that $d(n) := \sigma_0(n) = \sum_{d \mid n} 1$. Then
\[
\sum_{n \le N} d(n) = N \ln N + (2\gamma - 1) N + O(\sqrt{N}) \qquad \text{as } N \to \infty.
\]
By Stirling's Formula (1.10), this result can be written also as
\[
\Delta(N) \ll \sqrt{N}, \qquad \text{where } \Delta(N) := \sum_{n \le N} (\ln n - d(n) + 2\gamma).
\]

This result shows that $d(n)$ is $\ln n + 2\gamma$ 'in mean', i.e., that a 'typical' integer $n$ has $\ln n + 2\gamma$ divisors: this claim acquires interest when compared to the fact that $\ln n/\ln p + 1$ is exactly the number of divisors of $n$ when $n$ is a power of $p$, and is the first manifestation of a general phenomenon (that can be rigorously proved but that is out of reach of this course): the 'typical' integer is a product of many small primes.

Proof. The result follows quite easily from a very clever idea, called the hyperbola method, devised by Dirichlet. We note that
\[
\sum_{n \le N} d(n) = \sum_{n \le N} \sum_{d \mid n} 1 = \sum_{\substack{m, n \\ mn \le N}} 1,
\]
so that the sum counts the number of points in the lattice $\mathbb{Z} \times \mathbb{Z}$ which are in the first quadrant and below the hyperbola $XY = N$. The symmetry of the problem shows that this number is 2 times the number of points whose abscissa is $\le \sqrt{N}$, diminished by the number of points which are contained in the square of side length $\sqrt{N}$, with an error of order $\sqrt{N}$ due to the points which could sit on the border of this square and that are counted twice.

Figure 1.2. [The hyperbola $XY = N$ in the first quadrant of the $(X, Y)$ plane, with the square of side $\sqrt{N}$ marked on both axes.]

In other words, we have
\[
\sum_{n \le N} d(n) = 2 \sum_{n \le \sqrt{N}} \Big\lfloor \frac{N}{n} \Big\rfloor - N + O(\sqrt{N}).
\]
Approximating $\lfloor \frac{N}{n} \rfloor$ with $\frac{N}{n}$ we introduce an error term of order 1 in each summand and hence of order $\sqrt{N}$ in the sum, getting
\[
\sum_{n \le N} d(n) = 2N \sum_{n \le \sqrt{N}} \frac{1}{n} - N + O(\sqrt{N}),
\]
and the claim follows by the formula (1.6).
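The hyperbola method is also an efficient algorithm: it evaluates $\sum_{n \le N} d(n)$ with about $\sqrt{N}$ operations instead of $N$. The sketch below uses the exact form of the counting argument, in which the points of the square are counted as $\lfloor \sqrt{N} \rfloor^2$ (this equals $N + O(\sqrt{N})$, which is how the term $-N + O(\sqrt{N})$ arises above). The function names are hypothetical, and the value of $\gamma$ is hard-coded as an assumption.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)

def divisor_sum_direct(N):
    """Sum of d(n) for n <= N, computing every d(n) with a sieve."""
    d = [0] * (N + 1)
    for k in range(1, N + 1):
        for multiple in range(k, N + 1, k):
            d[multiple] += 1
    return sum(d)

def divisor_sum_hyperbola(N):
    """Sum of d(n) for n <= N via 2 * sum_{n <= sqrt(N)} floor(N/n) - floor(sqrt(N))^2."""
    r = math.isqrt(N)
    return 2 * sum(N // n for n in range(1, r + 1)) - r * r

def main_term(N):
    """N ln N + (2 gamma - 1) N, the main term of Proposition 1.15."""
    return N * math.log(N) + (2 * EULER_GAMMA - 1) * N

if __name__ == "__main__":
    N = 10**6
    exact = divisor_sum_hyperbola(N)
    print(exact, main_term(N), (exact - main_term(N)) / math.sqrt(N))
```

The last printed number is the remainder divided by $\sqrt{N}$; Proposition 1.15 says that this ratio stays bounded.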

The exact size of $\Delta(N)$ is not known and its determination is called the divisor problem. Hardy and Landau proved in 1916 that $\Delta(N) = \Omega(N^{1/4})$, and it is believed that $N^{1/4}$ is the 'right' order of $\Delta(N)$. The best bound known is due to Huxley [2003]: $\Delta(N)$ is $\ll N^{131/416} (\ln N)^{2.26}$. This result is the culminating point of a very complicated strategy with fundamental contributions from Voronoi, Littlewood, Chen, Vinogradov and Iwaniec (just to cite a few of them).

Exercise. 1.29 Let $f(z) := \sum_{k=1}^{\infty} \frac{z^k}{1 - z^k}$.

1) Prove that this function is well defined in $\{z \in \mathbb{C} : |z| < 1\}$ and that the circle $|z| = 1$ is its natural boundary (i.e., $f$ cannot be analytically continued to any open set containing the disk $|z| \le 1$).

2) Prove that $f(z) = \sum_{n=1}^{\infty} d(n) z^n$ for $|z| < 1$.

3) Prove that $f(z) \sim -\frac{\ln(1 - z)}{1 - z}$ for $z \to 1^-$, $z \in \mathbb{R}$.

Hint: For the last step use the result in Proposition 1.15 and the result in [Titch], Sec. 7.5 p. 224.

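Exercise. 1.30 Let $\tau \in \mathbb{C}$, $\mathrm{Re}(\tau) > 1$. Prove that
\[
\sum_{n \le N} \sigma_\tau(n) = \frac{\zeta(\tau + 1)}{\tau + 1} N^{\tau+1} + O(N^{\mathrm{Re}(\tau)}) \qquad \text{as } N \to \infty.
\]
What happens if $\mathrm{Re}(\tau) \in (0, 1)$?
Hint: Use the identity $\sum_{n \le N} \sigma_\tau(n) = \sum_{m, n \,:\, mn \le N} n^\tau = \sum_{m \le N} \sum_{n \le N/m} n^\tau$ and Proposition 1.6 to deal with the inner sum.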

Exercise. 1.31 Let $g : \mathbb{N} \to \mathbb{C}$ be the arithmetical function giving the Dirichlet coefficients of the Dirichlet series $1/\zeta(2s)$.

1) Prove that $g(k^2) = \mu(k)$ for every integer $k$, and that $g(n) = 0$ for every non-square $n$.

2) Recalling Ex. 1.21, prove that $|\mu(n)| = \sum_{d|n} g(d)$. Deduce that
$$\sum_{n\le x} |\mu(n)| = \sum_{\substack{n,m\\ nm\le x}} g(m) = \sum_{\substack{n,k\\ nk^2\le x}} \mu(k) = \sum_{k\le\sqrt{x}} \mu(k) \sum_{n\le x/k^2} 1 = \sum_{k\le\sqrt{x}} \mu(k) \Big\lfloor \frac{x}{k^2} \Big\rfloor.$$

3) Use the previous equality to deduce that
$$\#\{n \in \mathbb{N} : n \le x,\ n \text{ is squarefree}\} = \sum_{n\le x} |\mu(n)| = \frac{x}{\zeta(2)} + O(\sqrt{x}).$$

Roughly speaking, this result shows that a randomly chosen integer lower than $x$ is squarefree with a probability which is approximately 61%, if $x$ is large enough.
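The identity in step 2 of Ex. 1.31 gives a fast way to count squarefree integers. The sketch below is an illustration with hypothetical function names, not a solution of the exercise: it evaluates $\sum_{k\le\sqrt{x}} \mu(k)\lfloor x/k^2\rfloor$ with a sieve for $\mu$ (the sieve is my own choice of method) and compares the result with $x/\zeta(2) = 6x/\pi^2$.

```python
import math


def mobius_sieve(n: int) -> list[int]:
    """Return the list mu[0..n] of values of the Moebius function."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # one more prime factor
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0               # divisible by a square
    return mu


def squarefree_count(x: int) -> int:
    """Number of squarefree n <= x, as sum_{k<=sqrt(x)} mu(k) floor(x/k^2)."""
    r = math.isqrt(x)
    mu = mobius_sieve(r)
    return sum(mu[k] * (x // (k * k)) for k in range(1, r + 1))


if __name__ == "__main__":
    for x in (10**2, 10**4, 10**6):
        q = squarefree_count(x)
        print(x, q, round(q - 6 * x / math.pi**2, 3))
```

Only the values $\mu(k)$ with $k \le \sqrt{x}$ are needed, so the count for $x = 10^6$ uses a sieve of length 1000.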

1.8. The Prime Number Theorem

Propositions 1.12 and 1.13 together show that
$$\pi(x) \asymp \frac{x}{\ln x};$$
the Prime Number Theorem claims that they are asymptotically equal, i.e. that
$$\pi(x) \sim \frac{x}{\ln x} \quad\text{as } x \to \infty. \tag{1.25}$$

It was conjectured by Legendre and independently by Gauss$^{7}$. We will prove it in the following stronger form.

Theorem 1.4 (Prime Number Theorem) Let $\mathrm{li}(x) := \int_2^x \frac{du}{\ln u}$. For every constant $A > 0$,
$$\pi(x) = \mathrm{li}(x) + O_A\Big(\frac{x}{\ln^A x}\Big) \quad\text{as } x \to \infty.$$

In its asymptotic form (1.25), this result was proved by Hadamard and de la Vallée-Poussin (independently) in 1896, following the ideas proposed by Riemann in his celebrated (and only) paper in number theory, published in 1859. A proof which does not involve complex analysis was found in 1949 by Erdős and Selberg (it was an 'unintentional collaboration'$^{8}$). We will follow a more recent approach, devised by Bombieri and Wirsing, which has the merit of producing the stated estimate for the remainder term with a level of difficulty substantially equivalent to that of the previous asymptotic proofs. The best estimate known for the remainder is $x \exp\big(-c(\ln x)^{3/5}/(\ln\ln x)^{1/5}\big)$ for some $c > 0$, which is due to Vinogradov, but its proof is too difficult for a first course in Analytic Number Theory.

$^{7}$ With only an intuitive meaning for the symbol $\sim$, since the notion of asymptotic equality was not yet formalized at that time. Actually, Legendre's claim was that
$$\pi(x) \sim \frac{x}{\ln x + c}$$
with an explicit $c = -1.08366$, while Gauss' claim was that
$$\pi(x) \sim \int_2^x \frac{du}{\ln u}.$$
When $\sim$ is used with the modern meaning, both the claims become equivalent to (1.25).

$^{8}$ To fully understand this claim see for example Goldfeld's paper freely available at the web address: www.math.columbia.edu/~goldfeld/ErdosSelbergDispute.pdf


Remark. 1.3 We notice that $\mathrm{li}(x) = \frac{x}{\ln x} - \frac{2}{\ln 2} + \int_2^x \frac{du}{\ln^2 u}$ with $\int_2^x \frac{du}{\ln^2 u} \sim \frac{x}{\ln^2 x}$. As a consequence the theorem both proves that
$$|\pi(x) - \mathrm{li}(x)| \ll_A \frac{x}{\ln^A x}$$
for every $A$, and that
$$\Big|\pi(x) - \frac{x}{\ln x}\Big| \sim \frac{x}{\ln^2 x},$$
while the bound as conjectured by Legendre would imply that
$$\Big|\pi(x) - \frac{x}{\ln x}\Big| \ge (|c| + o(1)) \frac{x}{\ln^2 x} = (1.08366 + o(1)) \frac{x}{\ln^2 x}.$$

This shows that Gauss' conjecture is closer to the truth than Legendre's one.
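To get a feeling for the sizes involved, one can tabulate $\pi(x)$, $\mathrm{li}(x)$ and $x/\ln x$ for moderate $x$. The following sketch is only illustrative: the sieve of Eratosthenes and the Simpson rule (applied after the substitution $u = e^t$) are my own choices, and the function names are hypothetical. Small values of $x$ say nothing rigorous about the asymptotic statements above.

```python
import math


def prime_count(x: int) -> int:
    """pi(x) by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # cross out the multiples of p starting from p^2
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)


def li(x: float, steps: int = 20000) -> float:
    """li(x) = int_2^x du/ln(u), i.e. int_{ln 2}^{ln x} e^t/t dt, by Simpson's rule.
    `steps` must be even."""
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps
    f = lambda t: math.exp(t) / t
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        p = prime_count(x)
        print(x, p, round(li(x) - p, 2), round(p - x / math.log(x), 2),
              round(x / math.log(x) ** 2, 2))
```

The third printed column is $\mathrm{li}(x) - \pi(x)$, the last two are $\pi(x) - x/\ln x$ and $x/\ln^2 x$, which the remark claims to be asymptotically equal.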

The aim of this section is to prove this theorem. First we reformulate the problem.

Proposition 1.16 (Chebyshev) The following claims are equivalent:

1A. $\pi(x) \sim \frac{x}{\ln x}$,

2A. $\vartheta(x) \sim x$, where $\vartheta(x) := \sum_{p\le x} \ln p$,

3A. $\psi(x) \sim x$, where $\psi(x) := \sum_{n\le x} \Lambda(n)$.

Also the following claims are equivalent:

1B. $\pi(x) - \mathrm{li}(x) = O_A\big(\frac{x}{\ln^A x}\big)$, for every $A > 1$,

2B. $\vartheta(x) - x = O_A\big(\frac{x}{\ln^A x}\big)$, for every $A > 1$,

3B. $\psi(x) - x = O_A\big(\frac{x}{\ln^A x}\big)$, for every $A > 1$.

Proof. Equality (1.23) says that $\psi(x) = \vartheta(x) + O(\sqrt{x})$, so that the equivalences 2A $\iff$ 3A and 2B $\iff$ 3B immediately follow.

1A $\implies$ 2A. The definition of $\vartheta$ and the monotonicity of $\ln$ produce the double bound
$$(\pi(x) - \pi(x/\ln x)) \ln(x/\ln x) \le \sum_{\frac{x}{\ln x} < p \le x} \ln p \le \vartheta(x) \le \pi(x) \ln x. \tag{1.26}$$

Assuming 1A we get that
$$(\pi(x) - \pi(x/\ln x)) \ln(x/\ln x) = \Big((1+o(1))\frac{x}{\ln x} - (1+o(1))\frac{x}{\ln^2 x}\Big)(\ln x - \ln\ln x) \sim x$$
and $\pi(x) \ln x \sim x$, therefore 2A follows.

2A $\implies$ 1A. From (1.26) we have
$$\frac{\vartheta(x)}{\ln x} \le \pi(x) \le \frac{\vartheta(x)}{\ln x - \ln\ln x} + \pi\Big(\frac{x}{\ln x}\Big).$$

Using the result in Proposition 1.12 (but also the weaker Proposition 1.10 or the even weaker bound (1.20) suffice) we get
$$\frac{\vartheta(x)}{\ln x} \le \pi(x) \le \frac{\vartheta(x)}{\ln x - \ln\ln x} + O\Big(\frac{x}{\ln^2 x}\Big)$$
so that it is now clear that 2A implies 1A.

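Now the proof of the other set of relations.

1B $\implies$ 2B. The partial summation formula and the decomposition $\vartheta(x) = \sum_{p\le x} \ln p = \sum_{n\le x} \delta_{\mathcal{P}}(n) \cdot \ln n$ give
$$\vartheta(x) = \pi(x) \ln x - \int_2^x \frac{\pi(u)}{u}\,du.$$

Moreover,
$$\int_2^x \frac{\mathrm{li}(u)}{u}\,du = \ln(x)\,\mathrm{li}(x) - x + 2,$$
(because in our definition, $\mathrm{li}(2) = 0$) therefore
$$\vartheta(x) - x = (\pi(x) - \mathrm{li}(x)) \ln(x) - \int_2^x (\pi(u) - \mathrm{li}(u)) \frac{du}{u} + O(1),$$
which shows immediately that 1B implies 2B.

2B $\implies$ 1B. The partial summation formula and the decomposition $\pi(x) = \sum_{p\le x} 1 = \sum_{n\le x} \delta_{\mathcal{P}}(n) \ln n \cdot \frac{1}{\ln n}$ give
$$\pi(x) = \frac{\vartheta(x)}{\ln x} + \int_2^x \frac{\vartheta(u)}{u} \frac{du}{\ln^2 u}.$$

Moreover,
$$\mathrm{li}(x) = \frac{x}{\ln x} + \int_2^x \frac{du}{\ln^2 u} - \frac{2}{\ln 2},$$
so that
$$\pi(x) - \mathrm{li}(x) = \frac{\vartheta(x) - x}{\ln x} + \int_2^x \Big(\frac{\vartheta(u)}{u} - 1\Big) \frac{du}{\ln^2 u} + O(1),$$
which shows immediately that 2B implies 1B.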
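The relation $\psi(x) = \vartheta(x) + O(\sqrt{x})$ used at the beginning of the proof, and the fact that both functions stay close to $x$, can be observed numerically. This is a small sketch with hypothetical function names; the direct summation over primes is chosen for transparency, not speed.

```python
import math


def primes_up_to(x: int) -> list[int]:
    """List of primes p <= x (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if sieve[n]]


def theta(x: int) -> float:
    """theta(x) = sum of ln p over primes p <= x."""
    return sum(math.log(p) for p in primes_up_to(x))


def psi(x: int) -> float:
    """psi(x) = sum of Lambda(n) for n <= x: each prime p contributes
    ln p once for every power p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        power, k = p, 0
        while power <= x:
            k += 1
            power *= p
        total += k * math.log(p)
    return total


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        t, s = theta(x), psi(x)
        print(x, round(t, 2), round(s, 2), round((s - t) / math.sqrt(x), 3))
```

The last column is $(\psi(x) - \vartheta(x))/\sqrt{x}$, which should remain bounded.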

The following statement is similar to the previous ones, but has a more complicated proof.

Proposition 1.17 Let $M(x) := \sum_{n\le x} \mu(n)$. Then:

1. $\psi(x) \sim x$ if and only if $M(x) = o(x)$;

2. $\psi(x) - x = O_A\big(\frac{x}{\ln^A x}\big)$ if and only if $M(x) \ll_A \frac{x}{\ln^A x}$, for every $A > 1$.


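Proof. We prove only the implications $\Longleftarrow$; for the opposite implications see [IK], p. 33. We have proved that $\Lambda = \mu * \ln$, $1 = \mu * d$ and $\delta = \mu * 1$. Therefore,
$$\Lambda(n) - 1 + 2\gamma\delta(n) = \sum_{k|n} \mu(k)\big(\ln(n/k) - d(n/k) + 2\gamma\big).$$

Adding this equality for $n \le x$ we have
$$\psi(x) - x + O(1) = \sum_{n\le x} \sum_{k|n} \mu(k)\big(\ln(n/k) - d(n/k) + 2\gamma\big) = \sum_{\substack{m,n\\ mn\le x}} \mu(m)\big(\ln(n) - d(n) + 2\gamma\big),$$
where the Big-O term appears to take account of the $2\gamma$ constant and of the fact that $x$ can be a non-integer. We can write this sum in two different ways, namely:
$$\text{as}\quad \sum_{n\le x} \big(\ln(n) - d(n) + 2\gamma\big) \sum_{m\le x/n} \mu(m), \qquad\text{and as}\quad \sum_{m\le x} \mu(m) \sum_{n\le x/m} \big(\ln(n) - d(n) + 2\gamma\big).$$

Actually, neither of them is useful for us per se; however, we introduce a new parameter $K$ and we split the original sum according to the first formula when $n \le K$, and to the second formula when $K < n \le x$. In this way we get:
$$\begin{aligned}
\psi(x) - x + O(1) &= \sum_{\substack{m,n\\ mn\le x}} \mu(m)\big(\ln(n) - d(n) + 2\gamma\big)\\
&= \sum_{\substack{m,n\\ mn\le x\\ n\le K}} \mu(m)\big(\ln(n) - d(n) + 2\gamma\big) + \sum_{\substack{m,n\\ mn\le x\\ n>K}} \mu(m)\big(\ln(n) - d(n) + 2\gamma\big)\\
&= \sum_{n\le K} \big(\ln(n) - d(n) + 2\gamma\big) \sum_{m\le x/n} \mu(m) + \sum_{m\le x/K} \mu(m) \sum_{K<n\le x/m} \big(\ln(n) - d(n) + 2\gamma\big),
\end{aligned}$$
hence
$$\psi(x) - x + O(1) = \sum_{n\le K} \big(\ln(n) - d(n) + 2\gamma\big) M(x/n) + \sum_{m\le x/K} \mu(m)\big(\Delta(x/m) - \Delta(K)\big).$$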

In absolute value this means that
$$|\psi(x) - x| \le O(1) + \sum_{n\le K} |\ln(n) - d(n) + 2\gamma| \cdot |M(x/n)| + \sum_{m\le x/K} \big(|\Delta(x/m)| + |\Delta(K)|\big).$$

Recalling the result in Proposition 1.15 we get
$$|\psi(x) - x| \le O(1) + \sum_{n\le K} |\ln(n) - d(n) + 2\gamma| \cdot |M(x/n)| + O\Big(\sum_{m\le x/K} \big(\sqrt{x/m} + \sqrt{K}\big)\Big),$$
and recalling the result in Ex. 1.12 we conclude that there exists a constant $c > 0$ independent of $K$ such that
$$|\psi(x) - x| \le O(1) + \sum_{n\le K} |\ln(n) - d(n) + 2\gamma| \cdot |M(x/n)| + cx/\sqrt{K}. \tag{1.27}$$

Suppose that $M(x) = o(x)$. Let $\varepsilon > 0$ be arbitrarily fixed. We first choose $K$ such that $c/\sqrt{K} < \varepsilon$, then we choose $x_\varepsilon$ large enough to have
$$\sum_{n\le K} |\ln(n) - d(n) + 2\gamma| \cdot |M(x/n)| \le \varepsilon x \qquad \forall x \ge x_\varepsilon:$$
such an $x_\varepsilon$ exists, because $M(x) = o(x)$ and $K$ is fixed. As a consequence
$$|\psi(x) - x| \le 2\varepsilon x + O(1) \qquad \forall x > x_\varepsilon,$$
and this means that $\psi(x) = x + o(x)$, i.e. $\psi(x) \sim x$.

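Now, suppose $M(x) \ll_A \frac{x}{\ln^A x}$. Fix an (arbitrary) value for $A$, and take $K = \ln^{A/2} x$ in (1.27). Then
$$\begin{aligned}
|\psi(x) - x| &\le \sum_{n\le \ln^{A/2} x} |\ln(n) - d(n) + 2\gamma| \cdot |M(x/n)| + O(x/\ln^{A/4} x)\\
&\le O_A\Big(\sum_{n\le \ln^{A/2} x} \big(\ln(n) + d(n) + 2\gamma\big) \frac{x/n}{(\ln x - \ln n)^A}\Big) + O(x/\ln^{A/4} x)\\
&= O_A\Big(\frac{x}{(\ln x - A\ln\ln x)^A} \sum_{n\le \ln^{A/2} x} \big(\ln(n) + d(n) + 2\gamma\big)\Big) + O(x/\ln^{A/4} x).
\end{aligned}$$

Recalling that $\sum_{n\le N} \ln n \sim N\ln N$ (Stirling) and that $\sum_{n\le N} d(n) \sim N\ln N$ (Dirichlet, see Prop. 1.15), we have
$$\begin{aligned}
&= O_A\Big(\frac{x}{(\ln x - A\ln\ln x)^A} \ln^{A/2} x\, \ln\ln x\Big) + O(x/\ln^{A/4} x)\\
&= O_A(x/\ln^{A/4} x),
\end{aligned}$$
which is the claim.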

Exercise. 1.32 Prove that if $\ell \in \mathbb{N}$ and $\mathrm{Re}(s) > 1$, then
$$\int_1^{+\infty} \frac{(\ln x)^\ell}{x^s}\,dx = \frac{\ell!}{(s-1)^{\ell+1}}.$$

Hint: Prove the equality for $s \in \mathbb{R}$, $s > 1$ with the substitution $x = e^{u/(s-1)}$. Then observe that both RHS and LHS are holomorphic functions in $\mathrm{Re}(s) > 1$ and apply the identity principle for holomorphic functions to conclude that the equality holds in the entire half-plane $\mathrm{Re}(s) > 1$.

Alternatively, apply the substitution $x = e^{u/(s-1)}$ to deduce that
$$\int_1^{+\infty} \frac{(\ln x)^\ell}{x^s}\,dx = \frac{1}{(s-1)^{\ell+1}} \int_\Gamma u^\ell e^{-u}\,du$$
where $\Gamma$ is a ray from $0$ to $\infty$ contained in the right side of $\mathbb{C}$ (here the hypothesis $\mathrm{Re}(s) > 1$ is used), then prove that $\Gamma$ can be deformed to the positive real line and recall that $\int_0^{+\infty} u^\ell e^{-u}\,du = \ell!$.

Alternatively (again!) prove the claim for $\ell = 0$ directly, and deduce the claim for general $\ell$ taking the $\ell$th derivative of the claim for $\ell = 0$ (but you need a good argument allowing you to pass the derivative inside the integral).

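Exercise. 1.33 Prove that if $\ell \in \mathbb{N}$ and $N \to \infty$, then
$$\int_N^{+\infty} \frac{(\ln x)^\ell}{x^2}\,dx \ll_\ell \frac{(\ln N)^\ell}{N}.$$

Hint: The claim is true when $\ell = 0$. Then apply an integration by parts to get
$$\int_N^{+\infty} \frac{(\ln x)^\ell}{x^2}\,dx = \frac{(\ln N)^\ell}{N} + \ell \int_N^{+\infty} \frac{(\ln x)^{\ell-1}}{x^2}\,dx$$
and use the inductive hypothesis.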

Exercise. 1.34 (Perron) Prove that for every $\sigma > 0$ and for every $y > 0$,
$$\frac{1}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} \frac{y^s}{s^2}\,ds = \ln^+ y := \begin{cases} \ln y & \text{if } y \ge 1,\\ 0 & \text{if } y \in (0, 1]. \end{cases}$$

Hint: Let $T > 0$. If $y < 1$ take the integral along the path $\sigma - iT \to T - iT \to T + iT \to \sigma + iT \to \sigma - iT$ (this means that the original integration path is deformed to a path extending itself to the right half-plane: this is useful because $y^s$ is small here). The integral is zero by Cauchy, then let $T \to +\infty$. If $y > 1$, take the integral along the path $\sigma - iT \to -T - iT \to -T + iT \to \sigma + iT \to \sigma - iT$ (this means that the original integration path is deformed to a path extending itself to the left half-plane: this is useful because $y^s$ is small here). The integral is equal to $\ln y$, which is the residue at $0$, then let $T \to +\infty$.

Exercise. 1.35 (Perron) Prove that for every $\sigma > 0$, for every $y > 0$ and for every integer $\kappa \in \mathbb{N}$,
$$\frac{\kappa!}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} \frac{y^s}{s^{\kappa+1}}\,ds = (\ln^+ y)^\kappa.$$

The formula is correct also for $\kappa = 0$, but in this case the integral is not absolutely convergent and must be defined as $\lim_{T\to\infty} \int_{\sigma-iT}^{\sigma+iT}$ (and in that case produces the value $0$ for $y < 1$, $1/2$ when $y = 1$ and $1$ for $y > 1$). What happens if $\kappa$ is taken in $\mathbb{C}$? (difficult).

Exercise. 1.36 Let $f$ be a function admitting derivatives of every order. Then
$$f \cdot \Big(\frac{1}{f}\Big)^{(\ell)} = \ell! \sum_{\substack{a_1,a_2,a_3,\ldots\ge 0\\ a_1+2a_2+3a_3+\cdots=\ell}} \frac{(a_1 + a_2 + a_3 + \cdots)!}{a_1!\,a_2!\,a_3!\cdots} \Big(\frac{-f'}{1!\,f}\Big)^{a_1} \Big(\frac{-f''}{2!\,f}\Big)^{a_2} \Big(\frac{-f'''}{3!\,f}\Big)^{a_3} \cdots$$

Hint: The claim is evident for $\ell = 0$; then differentiate the equality for $\ell$, obtaining on the LHS
$$f \cdot \Big(\frac{1}{f}\Big)^{(\ell+1)} + f\,\frac{f'}{f} \Big(\frac{1}{f}\Big)^{(\ell)}.$$

Now move the second term to the RHS; applying the inductive hypothesis, the claim for $\ell + 1$ follows (after a very tedious computation). Alternatively, this is the case $g(x) = 1/x$ of Faà di Bruno's formula for the $\ell$th derivative of $g(f(x))$.$^{9}$

A few words are necessary in order to understand the main points of the proof of the Prime Number Theorem. We will prove it by proving a bound for $M(x) = \sum_{n\le x} \mu(n)$, and using Proposition 1.17 to deduce an analogous bound for $\psi(x) - x$ and Proposition 1.16 to get the claim for $\pi(x) - \mathrm{li}(x)$. The sum $M(x)$ can be represented as a complex integral (via a celebrated formula due to Perron) involving the Dirichlet series associated with the Möbius function. This function is $1/\zeta(s)$, thus in order to get an upper-bound for $M(x)$ we need an upper-bound for $1/\zeta(s)$, i.e. a lower-bound for $\zeta(s)$. This means that we need some kind of control on the positions of the zeros of $\zeta(s)$ (besides the control on the singularities of $\zeta(s)$). The result in Proposition 1.18 here below is the key ingredient allowing one to deduce a lower-bound for $\zeta(s)$ from an analogous upper-bound for $\zeta(s)$: although given with a different language, it was already present in the original works of Hadamard and de la Vallée-Poussin. The argument will be concluded with a standard application of the Euler-Maclaurin formula giving the necessary upper-bound for $\zeta(s)$ (see Proposition 1.19). The good bound for the remainder term will come out applying this approach to the generic derivative of $\zeta(s)$ and not to $\zeta(s)$ alone.

$^{9}$ For a nice presentation of this result see the paper of Warren P. Johnson: The Curious History of Faà di Bruno's Formula, Amer. Math. Monthly 109(3), 217–234, 2002.

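Proposition 1.18 Let $\sigma > 1$ and $t \in \mathbb{R}$, then
$$\zeta(\sigma)^3 |\zeta(\sigma + it)|^4 |\zeta(\sigma + 2it)|^2 \ge 1.$$

Proof. In the region $\mathrm{Re}(s) > 1$ the Riemann zeta function is represented by its Euler product and has no zeros, therefore $\ln\zeta(s)$ is well defined here and can be computed as the series (over $p$) of the logarithms of the Euler factors, thus
$$\begin{aligned}
&\ln\big(\zeta(\sigma)^3 |\zeta(\sigma + it)|^4 |\zeta(\sigma + 2it)|^2\big)\\
&\quad= \ln\big(\zeta(\sigma)^3\, \zeta^2(\sigma + it)\, \overline{\zeta^2(\sigma + it)}\, \zeta(\sigma + 2it)\, \overline{\zeta(\sigma + 2it)}\big)\\
&\quad= \ln\big(\zeta(\sigma)^3\, \zeta^2(\sigma + it)\, \zeta^2(\sigma - it)\, \zeta(\sigma + 2it)\, \zeta(\sigma - 2it)\big)\\
&\quad= -\sum_p \Big[3\ln\Big(1 - \frac{1}{p^\sigma}\Big) + 2\ln\Big(1 - \frac{p^{-it}}{p^\sigma}\Big) + 2\ln\Big(1 - \frac{p^{it}}{p^\sigma}\Big) + \ln\Big(1 - \frac{p^{-2it}}{p^\sigma}\Big) + \ln\Big(1 - \frac{p^{2it}}{p^\sigma}\Big)\Big]\\
&\quad= \sum_p \sum_{m=1}^{\infty} \frac{1}{m p^{m\sigma}} \big(3 + 2p^{-imt} + 2p^{imt} + p^{-2imt} + p^{2imt}\big)\\
&\quad= \sum_p \sum_{m=1}^{\infty} \frac{1}{m p^{m\sigma}} \big(1 + p^{imt} + p^{-imt}\big)^2 \ge 0.
\end{aligned}$$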
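The last step of the computation may be easier to recognize in trigonometric form. The following short derivation is only a restatement of the final two lines of the proof, with the auxiliary (hypothetical) notation $\theta := mt\ln p$, so that $p^{imt} = e^{i\theta}$.

```latex
\begin{align*}
3 + 2p^{-imt} + 2p^{imt} + p^{-2imt} + p^{2imt}
  &= 3 + 4\cos\theta + 2\cos 2\theta \\
  &= 3 + 4\cos\theta + 2(2\cos^2\theta - 1) \\
  &= 1 + 4\cos\theta + 4\cos^2\theta \\
  &= (1 + 2\cos\theta)^2
   = \bigl(1 + p^{imt} + p^{-imt}\bigr)^2 \ge 0 .
\end{align*}
```

The square is real because $p^{imt} + p^{-imt} = 2\cos\theta$ is real, which is why every term of the double series is non-negative.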

Corollary 1.5 $\zeta(1 + it) \ne 0$ for every $t \in \mathbb{R}$.

Proof. By contradiction, suppose that $\zeta$ has a zero along the vertical line $\sigma = 1$, so that there exists $t_0$ such that $\zeta(1 + it_0) = 0$. We have $t_0 \ne 0$, because $\zeta$ has a simple pole at $s = 1$. The function $\zeta$ is holomorphic in an open neighborhood of $1 + it_0$, therefore $\zeta(s) \sim c(s - 1 - it_0)^n$ for some constant $c \in \mathbb{C}^*$ and some integer $n \ge 1$. Moreover, $\zeta(s)$ is regular in a neighborhood of $1 + 2it_0$, hence from the inequality proved in the previous proposition we deduce that
$$1 \le \zeta(\sigma)^3 |\zeta(\sigma + it_0)|^4 |\zeta(\sigma + 2it_0)|^2 \ll (\sigma - 1)^{-3} (\sigma - 1)^{4n} \quad\text{as } \sigma \to 1^+,$$
which is impossible because $4n - 3 > 0$.

Remark. 1.4 There are other ways to prove the previous corollary. One of them, actually a particularly elegant one, is due to Ingham and is worth mentioning here. Suppose by contradiction that $\zeta(1 + it_0) = 0$. Evidently $t_0 \ne 0$ and by the Schwarz reflection principle it follows that $\zeta(1 - it_0)$ is $0$, as well. The function $F(s) := \zeta(2s) \sum_{n=1}^{\infty} |\sigma_{it_0}(n)|^2 n^{-s}$ is a Dirichlet series with positive Dirichlet coefficients. The Ramanujan identity in Ex. 1.19 shows that this function equals $\zeta^2(s)\zeta(s - it_0)\zeta(s + it_0)$, so that it is holomorphic in $\mathbb{C}$ (the double pole of $\zeta^2(s)$ at $s = 1$ is cancelled out by the two zeros $\zeta(1 + it_0)$ and $\zeta(1 - it_0)$). By the Landau Theorem 1.2 the Dirichlet series representing $F(s)$ converges at every complex point, but this is impossible since the representation $F(s) := \zeta(2s) \sum_{n=1}^{\infty} |\sigma_{it_0}(n)|^2 n^{-s}$ shows that the Dirichlet coefficients $f(n)$ in $F(s)$ corresponding to squares are greater than $1$, so that the series cannot converge when $\mathrm{Re}(s) \le 1/2$.

Proposition 1.19 Let $\mathrm{Re}(s) > 1$, then
$$(-1)^\ell \zeta^{(\ell)}(s) = \frac{\ell!}{(s-1)^{\ell+1}} + O_\ell\big((\ln(3|s|))^{\ell+1}\big)$$
where the symbol $O_\ell$ means that the implicit constant depends on $\ell$, so that the result is not uniform in this parameter.

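Proof. In the region $\mathrm{Re}(s) > 1$ we have
$$(-1)^\ell \zeta^{(\ell)}(s) = \sum_{n=1}^{\infty} \frac{(\ln n)^\ell}{n^s}.$$

We introduce a parameter $N$ whose value will be chosen later, and we split the series as $\sum_{n<N}$ and $\sum_{n\ge N}$, since we bound these two terms in different ways:
$$(-1)^\ell \zeta^{(\ell)}(s) = \sum_{n<N} \frac{(\ln n)^\ell}{n^s} + \sum_{n\ge N} \frac{(\ln n)^\ell}{n^s}.$$

In the first sum we see that $\ln n \le \ln N$, and $\mathrm{Re}(s) > 1$, therefore
$$\Big|\sum_{n<N} \frac{(\ln n)^\ell}{n^s}\Big| \le \sum_{n\le N} \frac{(\ln n)^\ell}{n^\sigma} \le (\ln N)^\ell \sum_{n\le N} \frac{1}{n} \ll (\ln N)^{\ell+1}.$$

For the second sum we use the Euler-Maclaurin formula, giving
$$\sum_{n\ge N} \frac{(\ln n)^\ell}{n^s} = \int_N^\infty \frac{(\ln x)^\ell}{x^s}\,dx + \frac{1}{2} \frac{(\ln N)^\ell}{N^s} - s\int_N^\infty \frac{(\ln x)^\ell}{x^s} \frac{B_1(x)}{x}\,dx + \ell \int_N^\infty \frac{(\ln x)^{\ell-1}}{x^s} \frac{B_1(x)}{x}\,dx,$$
where the last term appears only if $\ell \ge 1$. Recalling the bound we have proved in Ex. 1.33 we have
$$= \int_N^\infty \frac{(\ln x)^\ell}{x^s}\,dx + O\Big(\frac{(\ln N)^\ell}{N}\Big) + O_\ell\Big(|s| \frac{(\ln N)^\ell}{N}\Big) + O_\ell\Big(\frac{(\ln N)^{\ell-1}}{N}\Big),$$
(here we have used the assumption $\mathrm{Re}(s) > 1$ and the bound $|x^{-s-1}| \le x^{-2}$). Recalling Ex. 1.32 we can write this bound also as
$$\begin{aligned}
&= \int_1^\infty \frac{(\ln x)^\ell}{x^s}\,dx - \int_1^N \frac{(\ln x)^\ell}{x^s}\,dx + O_\ell\Big(|s| \frac{(\ln N)^\ell}{N}\Big)\\
&= \frac{\ell!}{(s-1)^{\ell+1}} + O\big((\ln N)^{\ell+1}\big) + O_\ell\Big(|s| \frac{(\ln N)^\ell}{N}\Big).
\end{aligned}$$

Collecting the results, we see that
$$(-1)^\ell \zeta^{(\ell)}(s) = \frac{\ell!}{(s-1)^{\ell+1}} + O\big((\ln N)^{\ell+1}\big) + O_\ell\Big(|s| \frac{(\ln N)^\ell}{N}\Big),$$
when $\mathrm{Re}(s) > 1$, for every choice of the parameter $N \ge 1$. We get the claim by setting $N = 3|s|$.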

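Finally we can attack the proof of the bound
$$M(x) \ll_A \frac{x}{\ln^A x} \qquad \forall A > 1.$$

Suppose $s \in \mathbb{C}$, with $\mathrm{Re}(s) > 1$ and $\ell \in \mathbb{N}$. Let
$$G(s) := (-1)^\ell (1/\zeta)^{(\ell)}(s) = \sum_{n=1}^{\infty} \mu(n)(\ln n)^\ell / n^s.$$

Perron's formula in Ex. 1.34 gives$^{10}$
$$\begin{aligned}
\frac{1}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} G(s) X^s \frac{ds}{s^2} &= \frac{1}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} \sum_{n=1}^{\infty} \mu(n)(\ln n)^\ell \Big(\frac{X}{n}\Big)^s \frac{ds}{s^2}\\
&= \sum_{n=1}^{\infty} \mu(n)(\ln n)^\ell \frac{1}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} \Big(\frac{X}{n}\Big)^s \frac{ds}{s^2}\\
&= \sum_{n\le X} \mu(n)(\ln n)^\ell \ln(X/n) =: F(X).
\end{aligned}$$

$^{10}$ Recall that
$$\frac{1}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} \sum_{n=1}^{\infty} \mu(n)(\ln n)^\ell \Big(\frac{X}{n}\Big)^s \frac{ds}{s^2} = \frac{1}{2\pi} \int_{\mathbb{R}} \sum_{n=1}^{\infty} \mu(n)(\ln n)^\ell \Big(\frac{X}{n}\Big)^{\sigma+it} \frac{dt}{(\sigma + it)^2}$$
so that
$$\begin{aligned}
\frac{1}{2\pi} \int_{\mathbb{R}} \sum_{n=1}^{\infty} \Big|\mu(n)(\ln n)^\ell \Big(\frac{X}{n}\Big)^{\sigma+it} \frac{1}{(\sigma + it)^2}\Big|\,dt &\le \frac{1}{2\pi} \int_{\mathbb{R}} \sum_{n=1}^{\infty} (\ln n)^\ell \Big(\frac{X}{n}\Big)^\sigma \frac{dt}{\sigma^2 + t^2}\\
&= X^\sigma \cdot \sum_{n=1}^{\infty} \frac{(\ln n)^\ell}{n^\sigma} \cdot \frac{1}{2\pi} \int_{\mathbb{R}} \frac{dt}{\sigma^2 + t^2} < +\infty,
\end{aligned}$$
because $\sigma > 1$ by hypothesis. By Fubini's Theorem, the convergence of this integral allows one to exchange the integral and the series.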

We use this equality to deduce a bound for $F(X)$ from a bound for $G(s)$. We need some intermediate bounds. Proposition 1.19 gives
$$\zeta(s) = \frac{1}{s-1} + O(\ln(3|s|)) \qquad \mathrm{Re}(s) > 1.$$

From Proposition 1.18 we have for $|t| > 1$ and $\mathrm{Re}(s) \in (1, 2)$
$$1 \le \zeta(\sigma)^3 |\zeta(\sigma + it)|^4 |\zeta(\sigma + 2it)|^2 \ll \frac{1}{(\sigma - 1)^3} |\zeta(s)|^4 (\ln(3|s|))^2$$
(the bound $\zeta(\sigma) \ll (\sigma - 1)^{-1}$ holds because $\mathrm{Re}(s) \in (1, 2)$, and $\zeta(\sigma + 2it) \ll \ln(3|s|)$ because $|t| > 1$) so that
$$\frac{1}{|\zeta(s)|} \ll (\sigma - 1)^{-3/4} (\ln(3|s|))^{1/2}, \qquad \mathrm{Re}(s) \in (1, 2),\ |t| > 1. \tag{1.28}$$

This bound has been proved under the assumption that $|t| > 1$, but it is true also for $|t| \le 1$ (and $\mathrm{Re}(s) \in [1, 2]$), since in this set the function $\frac{1}{|\zeta(s)|}$ is bounded (the pole of $\zeta(s)$ at $1$ produces a zero for $1/\zeta(s)$, there are no zeros for $\zeta(s)$ here by Corollaries 1.3 and 1.5, so that $1/\zeta(s)$ is continuous and hence bounded in the compact $[1, 2] \times [-1, 1]$). Therefore, we can conclude that
$$\frac{1}{|\zeta(s)|} \ll (\sigma - 1)^{-3/4} (\ln(3|s|))^{1/2}, \qquad \mathrm{Re}(s) \in (1, 2), \tag{1.29}$$
which is the lower-bound for $|\zeta(s)|$ that we have mentioned in the introduction to the proof of the Prime Number Theorem.

Let $\zeta^*(s) := (s - 1)\zeta(s)$. This function is essentially equivalent to the Riemann zeta function, but has a better behavior for $s \to 1$. Since
$$(\zeta^*)^{(\ell)}(s) = (s - 1)\zeta^{(\ell)}(s) + \ell\zeta^{(\ell-1)}(s),$$
from Proposition 1.19 (applied two times, to $\zeta^{(\ell)}(s)$ and $\zeta^{(\ell-1)}(s)$) we get that
$$(\zeta^*)^{(\ell)}(s) \ll_\ell |s| (\ln(3|s|))^{\ell+1} \qquad \mathrm{Re}(s) \in (1, 2), \tag{1.30}$$
and from (1.28) that
$$\frac{1}{|\zeta^*(s)|} \ll |s|^{-1} (\sigma - 1)^{-3/4} (\ln(3|s|))^{1/2}, \qquad \mathrm{Re}(s) \in (1, 2),\ |t| > 1.$$

As before, this bound is proved for $|t| > 1$, but it holds also in $|t| \le 1$, because $\frac{1}{|\zeta^*(s)|}$ is continuous in the compact $[1, 2] \times [-1, 1]$. Therefore, we have also
$$\frac{1}{|\zeta^*(s)|} \ll |s|^{-1} (\sigma - 1)^{-3/4} (\ln(3|s|))^{1/2}, \qquad \mathrm{Re}(s) \in (1, 2),$$
and using this bound and (1.30) we deduce
$$\frac{(\zeta^*)^{(\ell)}(s)}{\zeta^*(s)} \ll_\ell (\sigma - 1)^{-\frac{3}{4}} (\ln(3|s|))^{\ell+\frac{3}{2}} \qquad \mathrm{Re}(s) \in (1, 2).$$

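The formula in Ex. 1.36 shows that
$$|(1/\zeta^*)^{(\ell)}(s)| \ll_\ell \frac{1}{|\zeta^*(s)|} \sum_{\substack{a_1,a_2,a_3,\ldots\ge 0\\ a_1+2a_2+3a_3+\cdots=\ell}} \Big(\frac{|(\zeta^*)'(s)|}{|\zeta^*(s)|}\Big)^{a_1} \Big(\frac{|(\zeta^*)''(s)|}{|\zeta^*(s)|}\Big)^{a_2} \cdots$$
so that
$$|(1/\zeta^*)^{(\ell)}(s)| \ll_\ell \frac{(\sigma - 1)^{-\frac{3}{4}}}{|s|} (\ln(3|s|))^{\frac{1}{2}} \cdot \sum_{\substack{a_1,a_2,a_3,\ldots\ge 0\\ a_1+2a_2+3a_3+\cdots=\ell}} (\sigma - 1)^{-\frac{3}{4}(a_1+a_2+\cdots)} (\ln(3|s|))^{(1+\frac{3}{2})a_1+(2+\frac{3}{2})a_2+\cdots}.$$

We take $\sigma \in (1, 2)$; in this range $(\sigma - 1)^{-\frac{3}{4}} > 1$ and $a_1 + a_2 + \cdots \le \ell$, hence $(\sigma - 1)^{-\frac{3}{4}(a_1+a_2+\cdots)} \le (\sigma - 1)^{-\frac{3}{4}\ell}$. Moreover, $(1 + \frac{3}{2})a_1 + (2 + \frac{3}{2})a_2 + \cdots \le (1 + \frac{3}{2})a_1 + 2(1 + \frac{3}{2})a_2 + \cdots = \frac{5}{2}\ell$, therefore $(\ln(3|s|))^{(1+\frac{3}{2})a_1+(2+\frac{3}{2})a_2+\cdots} \le (\ln(3|s|))^{\frac{5}{2}\ell}$.

In this way we obtain the upper-bound
$$(1/\zeta^*)^{(\ell)}(s) \ll_\ell |s|^{-1} (\sigma - 1)^{-\frac{3}{4}(\ell+1)} (\ln(3|s|))^{\frac{5}{2}(\ell+1)} \qquad \mathrm{Re}(s) \in (1, 2).$$

At last, using the representation $1/\zeta(s) = (s - 1)/\zeta^*(s)$ giving
$$\Big(\frac{1}{\zeta(s)}\Big)^{(\ell)} = (s - 1) \Big(\frac{1}{\zeta^*(s)}\Big)^{(\ell)} + \ell \Big(\frac{1}{\zeta^*(s)}\Big)^{(\ell-1)},$$
and the previous bound, we get
$$G(s) = (-1)^\ell (1/\zeta)^{(\ell)}(s) \ll_\ell (\sigma - 1)^{-\frac{3}{4}(\ell+1)} (\ln(3|s|))^{\frac{5}{2}(\ell+1)} \qquad \mathrm{Re}(s) \in (1, 2). \tag{1.31}$$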

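Plugging this bound into the integral formula for $F(X)$ we have
$$\begin{aligned}
F(X) = \frac{1}{2\pi i} \int_{\sigma-i\infty}^{\sigma+i\infty} G(s) X^s \frac{ds}{s^2} &\ll \int_{\mathbb{R}} |G(s)| X^\sigma \frac{dt}{\sigma^2 + t^2}\\
&\ll_\ell \int_{\mathbb{R}} (\sigma - 1)^{-\frac{3}{4}(\ell+1)} (\ln(3|\sigma + it|))^{\frac{5}{2}(\ell+1)} X^\sigma \frac{dt}{\sigma^2 + t^2}\\
&\ll_\ell X^\sigma (\sigma - 1)^{-\frac{3}{4}(\ell+1)}.
\end{aligned}$$

This bound being uniform in $\sigma \in (1, 2)$, for $\sigma = 1 + 1/\ln X$ we have
$$\sum_{n\le X} \mu(n)(\ln n)^\ell \ln(X/n) = F(X) \ll_\ell X (\ln X)^{\frac{3}{4}(\ell+1)}, \tag{1.32}$$
uniformly in $X$.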


Remark. 1.5 We take a break, now, to discuss the result we have just proved and its importance for our purpose. Consider the quantity $\sum_{n\le X} \mu(n)(\ln n)^\ell \ln(X/n)$. In this sum there are $X$ terms; many of them have order $\ln X$, but they have different signs as a consequence of the factor $\mu(n)$. If we leave out the cancellation coming from the different signs, i.e. if we consider the sum
$$\sum_{n\le X} |\mu(n)| (\ln n)^\ell \ln(X/n),$$
then we have a sum which has order $X(\ln X)^\ell$: the upper bound follows by the Euler-Maclaurin summation formula, while the lower bound follows by noticing that
$$\begin{aligned}
\sum_{n\le X} |\mu(n)| (\ln n)^\ell \ln(X/n) &\ge \sum_{\frac{X}{3} < n < \frac{X}{2}} |\mu(n)| (\ln n)^\ell \ln(X/n)\\
&\ge (\ln(X/3))^\ell \ln 2 \sum_{\frac{X}{3} < n < \frac{X}{2}} |\mu(n)| \gg X(\ln X)^\ell
\end{aligned}$$
(for the last step, recall the result in Ex. 1.31). The bound in (1.32) shows that the original sum with the restored $\mu$ is significantly smaller: $X(\ln X)^{\frac{3}{4}(\ell+1)}$ against $X(\ln X)^\ell$, and the first one is smaller than the second one when $\ell \ge 4$. This means that the approach has been able to prove that the presence of the oscillating term $\mu(n)$ produces an effective cancellation in the sum.

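We have almost finished; now we extract from this bound an estimate (again!) for the simpler sum
$$H(X) := \sum_{n\le X} \mu(n)(\ln n)^\ell.$$

We obtain such a bound comparing the values of $F$ computed at $X$ and $X + Y$, where $Y > 0$ is a parameter that we will choose later. In fact,
$$F(X + Y) - F(X) = \sum_{n\le X} \mu(n)(\ln n)^\ell \ln\Big(1 + \frac{Y}{X}\Big) + \sum_{X<n\le X+Y} \mu(n)(\ln n)^\ell \ln((X + Y)/n),$$
and choosing $Y = o(X)$ we can take advantage of the equality $\ln(1 + Y/X) = Y/X + O(Y^2/X^2)$, so that the equality becomes
$$\begin{aligned}
F(X + Y) - F(X) &= \frac{Y}{X} H(X) + O\Big(\frac{Y^2}{X^2} \sum_{n\le X} (\ln n)^\ell\Big) + O\Big(\sum_{X<n\le X+Y} (\ln n)^\ell \ln((X + Y)/n)\Big)\\
&= \frac{Y}{X} H(X) + O\Big(\frac{Y^2}{X} (\ln X)^\ell\Big).
\end{aligned}$$

For $Y = o(X)$ we have also
$$F(X) \ll_\ell X(\ln X)^{\frac{3}{4}(\ell+1)},$$
$$F(X + Y) \ll_\ell (X + Y)(\ln(X + Y))^{\frac{3}{4}(\ell+1)} \ll_\ell X(\ln X)^{\frac{3}{4}(\ell+1)},$$
so that
$$\frac{Y}{X} H(X) = O_\ell\big(X(\ln X)^{\frac{3}{4}(\ell+1)}\big) + O\Big(\frac{Y^2}{X} (\ln X)^\ell\Big),$$
i.e.,
$$H(X) \ll_\ell \frac{X^2}{Y} (\ln X)^{\frac{3}{4}(\ell+1)} + Y(\ln X)^\ell.$$

The two terms have the same order when $Y = X(\ln X)^{(3-\ell)/8}$, giving
$$H(X) \ll_\ell X(\ln X)^{\frac{7}{8}\ell+\frac{3}{8}}.$$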

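Finally, we can extract the behavior of $M(x)$ from this bound using the partial summation formula. We have
$$\begin{aligned}
M(x) = 1 + \sum_{2\le n\le x} \mu(n) &= 1 + \sum_{2\le n\le x} \mu(n)(\ln n)^\ell \cdot \frac{1}{(\ln n)^\ell}\\
&= 1 + H(x) \frac{1}{(\ln x)^\ell} + \ell \int_2^x \frac{H(u)}{u(\ln u)^{\ell+1}}\,du\\
&\ll_\ell x \frac{(\ln x)^{\frac{7}{8}\ell+\frac{3}{8}}}{(\ln x)^\ell} + \ell \int_2^x \frac{(\ln u)^{\frac{7}{8}\ell+\frac{3}{8}}}{(\ln u)^{\ell+1}}\,du\\
&\ll_\ell x(\ln x)^{-\frac{\ell}{8}+\frac{3}{8}} + \ell \int_2^x (\ln u)^{-\frac{\ell}{8}-\frac{5}{8}}\,du \ll_\ell x(\ln x)^{-\frac{\ell}{8}+\frac{3}{8}}
\end{aligned}$$
which is the claim, since $\ell$ can be taken arbitrarily large.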
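The cancellation in $M(x)$ that the proof establishes is easy to observe in practice. The following sketch (hypothetical function names, sieve chosen by me) tabulates $M(x)$ and the ratio $M(x)/x$ for a few values of $x$; it is only a numerical illustration and plays no role in the proof.

```python
def mobius_sieve(n: int) -> list[int]:
    """Return the list mu[0..n] of values of the Moebius function."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0
    return mu


def mertens_table(n: int) -> list[int]:
    """Return the list M[0..n] with M[x] = sum_{k<=x} mu(k)."""
    mu = mobius_sieve(n)
    table = [0] * (n + 1)
    for k in range(1, n + 1):
        table[k] = table[k - 1] + mu[k]
    return table


if __name__ == "__main__":
    M = mertens_table(10**6)
    for x in (10, 10**2, 10**3, 10**4, 10**5, 10**6):
        print(x, M[x], M[x] / x)
```

In this range $|M(x)|$ stays far below $x/\ln^A x$ for moderate $A$, in agreement with (but of course not a proof of) the bound just obtained.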

We have finally reached our purpose: the proof of the Prime Number Theorem with a convenient estimate for the remainder term. What next? We cannot leave the present topic without an explicit mention of the Riemann Hypothesis and of its relevance for the Prime Number Theorem. What is the 'right' estimate for the remainder term in the P.N.Th.? Using the argument in Propositions 1.16 and 1.17 it can be proved that
$$\pi(x) = \mathrm{li}(x) + O(x^\delta) \quad\text{for some fixed } \delta \in (1/2, 1),$$
if and only if
$$\psi(x) = x + O(x^\delta) \quad\text{for some fixed } \delta \in (1/2, 1),$$
if and only if
$$M(x) \ll x^\delta \quad\text{for some fixed } \delta \in (1/2, 1),$$
if and only if
$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} \quad\text{converges for every } s \text{ with } \mathrm{Re}(s) > \delta,$$
if and only if
$$\zeta(s) \ne 0 \quad\text{for every } s \text{ with } \mathrm{Re}(s) > \delta.$$

The general theory of Hadamard on Weierstrass products (or the more complete Nevanlinna theory about the distribution of values of entire functions) connecting the growth of an entire function to the number of its zeros implies that $\zeta(s)$ has infinitely many zeros in the vertical strip $\mathrm{Re}(s) \in [0, 1]$. The functional equation connecting $\zeta(s)$ with $\zeta(1 - s)$ proves that the vertical strip $\mathrm{Re}(s) \in [1/2, 1]$ contains infinitely many zeros of $\zeta(s)$, therefore the best (smallest) possible value for $\delta$ is $1/2 + \varepsilon$, with $\varepsilon > 0$ arbitrarily small. The Riemann Hypothesis (RH, in brief) actually claims that the prime number theorem holds with such a choice for $\delta$, or in other words, that $\zeta(s) \ne 0$ if $\mathrm{Re}(s) > 1/2$.

To date, no admissible value $\delta < 1$ is known: the best (largest) proved zero-free region for $\zeta(s)$ is simply the set
$$\Big\{\sigma + it \,:\, \sigma > 1 - \frac{c}{(\ln|t|)^{2/3} (\ln\ln|t|)^{1/3}},\ |t| > 3\Big\}$$
for some $c > 0$, and is due to a joint work of Korobov and Vinogradov.

Remark. 1.6 There are also other possible formulations of RH. The next two are quite striking because they come from apparently unrelated fields.

Nyman. For every $\alpha \in (0, 1)$, let $\rho_\alpha(x) := \{\alpha/x\} - \alpha\{1/x\}$, where $\{\cdot\}$ denotes the fractional part. Let $B$ be the closure in $L^2((0, 1))$ of the $\mathbb{C}$-span of the functions $\rho_\alpha$. Then, RH holds iff $B = L^2((0, 1))$.$^{11}$

Lagarias. Let $H_n := \sum_{j\le n} j^{-1}$ (the $n$th harmonic number). Then, RH holds iff
$$\sigma(n) = \sum_{d|n} d \le H_n + \exp(H_n) \ln(H_n) \qquad \forall n,$$
with equality only for $n = 1$.$^{12}$

In spite of the research that these conditions have triggered, at present they have not been of any concrete utility towards a proof of RH.

$^{11}$ B. Nyman, On some groups and semigroups of translations, Ph. D. Thesis, Uppsala, 1950.

$^{12}$ J. C. Lagarias, An elementary problem equivalent to the Riemann hypothesis, Amer. Math. Monthly 109(6), 534–543 (2002).
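The Lagarias inequality is elementary enough to be tested directly for small $n$. The sketch below (hypothetical names) computes the gap $H_n + \exp(H_n)\ln(H_n) - \sigma(n)$ for all $n$ up to a limit. A numerical check of this kind obviously gives no information about RH; it only shows how tight the inequality is for small, highly composite $n$.

```python
import math


def lagarias_gaps(limit: int) -> list[float]:
    """gaps[n] = H_n + exp(H_n) * ln(H_n) - sigma(n) for 1 <= n <= limit
    (gaps[0] is an unused placeholder)."""
    # sigma[m] = sum of the divisors of m, by adding d to all its multiples
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for m in range(d, limit + 1, d):
            sigma[m] += d
    gaps = [0.0]
    harmonic = 0.0
    for n in range(1, limit + 1):
        harmonic += 1.0 / n
        gaps.append(harmonic + math.exp(harmonic) * math.log(harmonic) - sigma[n])
    return gaps


if __name__ == "__main__":
    gaps = lagarias_gaps(5000)
    # the five values of n >= 2 where the inequality is tightest
    for n in sorted(range(2, 5001), key=lambda k: gaps[k])[:5]:
        print(n, round(gaps[n], 4))
```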

Remark. 1.7 Let two bounded arithmetical functions $f, g$ be given and suppose that
$$\frac{1}{x} \sum_{n\le x} f(n)g(n)$$
tends to a constant as $x$ diverges. This limit is a kind of scalar product of $f$ and $g$. In this setting, it is natural to call orthogonal every pair of functions for which the limit is $0$. With this language, the fact that $M(x) = o(x)$ can be stated by saying that the Möbius function and the constant function are orthogonal, and it would be very interesting to determine/classify the set of functions which are orthogonal to $\mu$. In the next chapter we will see that $\mu$ is orthogonal to every Dirichlet character; for every pair of coprime integers $m$ and $q$ the function $e(\frac{m}{q}\,\cdot) : n \mapsto e(\frac{m}{q} n) := e^{2\pi i mn/q}$ can be written as a linear combination of Dirichlet characters modulo $q$. This fact shows immediately that $\mu$ and $e(\frac{m}{q}\,\cdot)$ are orthogonal. H. Davenport in '37 was able to prove that this is a particular case of a more general result saying that $\mu$ and every $e(\alpha\,\cdot) : n \mapsto e(\alpha n) := e^{2\pi i \alpha n}$ are orthogonal for each $\alpha \in \mathbb{R}$.$^{13}$ In 2010 P. Sarnak proposed the conjecture that actually every function $g$ having 'low complexity' is orthogonal to $\mu$ (in a technical sense connected to the entropy of the dynamical system generated by $g$). As a sub-conjecture, Sarnak proposed a concrete class of functions that should be orthogonal to $\mu$, and this is exactly what B. Green has proved$^{14}$.

At last, it is important to realize that RH is not the ‘definitive’ hypothesis,i.e. the key tool which will give an answer to all problems in Analytical NumberTheory. For example, some sharp predictions about the behavior of the gaps ofprime numbers are not consequence of RH in itself but follow from conjecturesabout the distribution of the zeros of ζ(s) along the vertical line Re(s) = 1/2.

We conclude the chapter with some exercises.

Exercise. 1.37 Let $p_n$ be the sequence of prime numbers. Prove that $\pi(x) \sim x/\ln x$ if and only if $p_n \sim n\ln n$.

Exercise. 1.38 The Prime Number Theorem, in the form we have proved it, implies that $\pi(x) = x/\ln x + x(1 + o(1))/(\ln x)^2$. Deduce that $p_n = n(\ln n + \ln\ln n - 1 + o(1))$.
Remark: A stronger conclusion is proposed as Exercise 5 to Ch. 6.2 in [MV].

$^{13}$ See for example Th. 13.10 in [IK].

$^{14}$ B. Green, On (not) computing the Möbius function using bounded depth circuits, Combin. Probab. Comput. 21(6), 2012, 942–951.


Exercise. 1.39 Fix $w > 0$. Prove that the set $S_w := \{p^w/q^w : p, q \in \mathcal{P}\}$ is dense in $\mathbb{R}^+$.
Hint: first prove the claim for $w = 1$ using Ex. 1.37, then use the fact that the map $x \mapsto x^w$ is locally Lipschitz in $(0, +\infty)$.

Exercise. 1.40 Let $f : \mathbb{R} \to \mathbb{R}$ be a locally Hölder map (i.e., a map for which for every compact set $K$ there exist $c > 0$ and $\alpha > 0$ (in general depending on $K$) such that $|f(x) - f(y)| \le c|x - y|^\alpha$ holds for every $x, y \in K$). Prove that if $S$ is dense in $\mathbb{R}$ then $f(S)$ is dense in $f(\mathbb{R})$. Use this remark to extend the claim in Ex. 1.39.

Exercise. 1.41 (Primes in ‘fat’ intervals) Use the Prime Number Theorem in theform π(x) ∼ x/ lnx to prove that in [x, x + h] there are asymptotically h/ lnxprime numbers when h = h(x) x.

Exercise. 1.42 (Primes in 'not so fat' intervals) Let $B$ be a fixed positive number. Use the Prime Number Theorem in the form $\pi(x) = \mathrm{li}(x) + O_A(x/(\ln x)^A)$ to prove that in $[x, x + h]$ there are asymptotically $h/\ln x$ prime numbers when $h = h(x) \gg_B x/(\ln x)^B$.

Exercise. 1.43 (Primes in 'extra short' intervals) Prove that under RH in $[x, x + h]$ there are asymptotically $h/\ln x$ prime numbers whenever $h = h(x) \gg_\varepsilon x^{1/2+\varepsilon}$.

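Exercise. 1.44 Let $M(x) := \sum_{n\le x} \mu(n)$.

1) Prove that for every $N \ge 1$, $K \ge 0$ and $s \in \mathbb{C}$:
$$\sum_{n=N+1}^{N+K} \frac{\mu(n)}{n^s} = \frac{M(N + K)}{(N + K)^s} - \frac{M(N)}{N^s} + s\int_N^{N+K} \frac{M(x)}{x^{s+1}}\,dx,$$
so that for $\sigma \ge 1$
$$\Big|\sum_{n=N+1}^{N+K} \frac{\mu(n)}{n^s}\Big| \le \frac{|M(N + K)|}{N + K} + \frac{|M(N)|}{N} + |s| \int_N^{N+K} \frac{|M(x)|}{x^2}\,dx.$$

2) Using the upper-bound for $M(x)$, deduce that the series $\sum_{n=1}^{\infty} \mu(n)/n^s$ converges in the closed half-plane $\sigma \ge 1$, uniformly in every bounded subset.

3) Deduce that $\sum_{n=1}^{\infty} \mu(n)/n^s = 1/\zeta(s)$ in $\sigma \ge 1$. In particular
$$\sum_{n=1}^{\infty} \frac{\mu(n)}{n^{1+it}} = \frac{1}{\zeta(1 + it)}, \qquad \forall t \in \mathbb{R}.$$

4) By Step 3 deduce that
$$\sum_{n=1}^{\infty} \frac{\mu(n)}{n} = 0.$$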


Exercise. 1.45 Suppose $\mathrm{Re}(s) > 1$. How to compute the value of $\zeta(s)$? The first idea is to use the representation $\zeta(s) = \sum_{n=1}^{\infty} 1/n^s$, but its rate of convergence is not very good.

1) Using the Euler-Maclaurin formula, in fact, prove that
$$\Big|\sum_{n=S}^{\infty} \frac{1}{n^s}\Big| \le \sum_{n=S}^{\infty} \frac{1}{n^\sigma} \le \Big(1 + \frac{S}{\sigma - 1}\Big) \frac{1}{S^\sigma},$$
so that
$$\Big|\zeta(s) - \sum_{n=1}^{S-1} \frac{1}{n^s}\Big| \le \Big(1 + \frac{S}{\sigma - 1}\Big) \frac{1}{S^\sigma}$$
and in order to get the value of $\zeta(s)$ with an error below $10^{-4}$, for example, you need to take $S > 10^{4/(\sigma-1)}$.

2) Recall that $\zeta(s) = (1 - 2^{1-s})^{-1}\xi(s)$, where $\xi(s) := \sum_{n=1}^{\infty} (-1)^{n+1}/n^s$. Let $F(x) := \sum_{n\le x} (-1)^{n+1}$. Using the partial summation formula prove that
$$\sum_{n=S}^{\infty} \frac{(-1)^{n+1}}{n^s} = -\frac{F(S)}{S^s} + s\int_S^{+\infty} \frac{F(x)}{x^{s+1}}\,dx$$
so that
$$\Big|\sum_{n=S}^{\infty} \frac{(-1)^{n+1}}{n^s}\Big| \le \frac{1 + |s|/\sigma}{S^\sigma}.$$
It follows that
$$\Big|\zeta(s) - (1 - 2^{1-s})^{-1} \sum_{n=1}^{S-1} \frac{(-1)^{n+1}}{n^s}\Big| \le \frac{(1 + |s|/\sigma)\,|1 - 2^{1-s}|^{-1}}{S^\sigma}. \tag{1.33}$$
With this formula, to get the value of $\zeta(s)$ with an error below $10^{-4}$ you need to take $S > 10^{4/\sigma}$: the convenience with respect to the previous formula is evident. (For example, in order to compute $\zeta(1.1 + i)$ within $10^{-4}$ using the formula in Step 1 you need $S = 10^{50}$, while $S = 14000$ suffices with the second formula).

3) The equality $\zeta(s) = (1 - 2^{1-s})^{-1}\xi(s)$ holds for every $s \in \mathbb{C}$ and the representation $\xi(s) = \sum_{n=1}^{\infty} (-1)^{n+1}/n^s$ holds for every $s$ with $\mathrm{Re}(s) > 0$. As a consequence, we can still compute $\zeta(s)$ with $\mathrm{Re}(s) > 0$ using that formula. In particular, the bound in (1.33) holds for every $\mathrm{Re}(s) > 0$, $s \ne 1$.
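The approximation (1.33) translates into a few lines of code. The sketch below (hypothetical function name) returns both the approximate value of $\zeta(s)$ and the right-hand side of (1.33), so that the a priori error bound can be compared with the true error in a case where $\zeta(s)$ is known, such as $\zeta(2) = \pi^2/6$. It assumes $\mathrm{Re}(s) > 0$ and $1 - 2^{1-s} \ne 0$.

```python
import math


def zeta_alternating(s: complex, S: int) -> tuple[complex, float]:
    """Approximate zeta(s) by (1 - 2^(1-s))^(-1) * sum_{n<S} (-1)^(n+1) / n^s.
    Returns (value, bound), where bound is the right-hand side of (1.33)."""
    s = complex(s)
    sigma = s.real
    partial = sum((-1) ** (n + 1) * n ** (-s) for n in range(1, S))
    factor = 1 / (1 - 2 ** (1 - s))
    bound = (1 + abs(s) / sigma) * abs(factor) / S ** sigma
    return factor * partial, bound


if __name__ == "__main__":
    value, bound = zeta_alternating(2, 2000)
    print(value.real, abs(value - math.pi ** 2 / 6), bound)
    # the example of the text: zeta(1.1 + i) with S = 14000
    value, bound = zeta_alternating(1.1 + 1j, 14000)
    print(value, bound)
```

For $s = 1.1 + i$ and $S = 14000$ the returned bound is slightly below $10^{-4}$, in agreement with the example in Step 2.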

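Exercise. 1.46 The equality $\zeta(s) = (1 - 2^{1-s})^{-1}\xi(s)$ and its approximated version in Step 2 of Ex. 1.45 is not very useful for the computation of $\zeta(1 + it)$ for $t \in \mathbb{R}$ (because the factor $1 - 2^{1-s}$ has zeros when $t = 2\pi k/\ln 2$ for any $k \in \mathbb{Z}$, while $\zeta$ is regular here. This means that $\xi(1 + it)$ is zero at the same points so that the formula gives $\zeta(1 + it)$ here as quotient of two very small quantities, a fact which introduces instabilities in numerical computations). An alternative formula is the following.

1) Recall that $\zeta(s) = \frac{1}{s-1} + \frac{1}{2} - s\int_1^\infty \frac{B_1(x)}{x^{s+1}}\,dx$ for $\mathrm{Re}(s) > 0$, and that
$$\sum_{n=1}^{N} \frac{1}{n^s} = \frac{1}{s-1} + \frac{N^{1-s}}{1-s} + \frac{1}{2}\Big(\frac{1}{N^s} + 1\Big) - s\int_1^N \frac{B_1(x)}{x^{s+1}}\,dx.$$
Subtracting these identities prove that
$$\Big|\zeta(s) - \sum_{n=1}^{N} \frac{1}{n^s} + \frac{N^{1-s}}{1-s} + \frac{1}{2N^s}\Big| \le \frac{|s|}{2\sigma N^\sigma} \qquad \mathrm{Re}(s) = \sigma > 0.$$

2) Deduce that
$$|\zeta(\sigma + it)| \ll \sum_{n=1}^{N} \frac{1}{n^\sigma} + \frac{N^{1-\sigma}}{|t|} + \frac{|s|}{\sigma N^\sigma} \qquad \forall \sigma > 0,\ \forall |t| > 1.$$
For $\sigma \in (0, 1)$ this bound gives
$$|\zeta(\sigma + it)| \ll_\sigma |t|^{1-\sigma} \qquad \forall \sigma > 0,\ \forall |t| > 1,$$
proving that $\zeta$ grows along the vertical lines at most as a power of $t$ (but note that the bound is not uniform in $\sigma$).

3) For $\sigma = 1$ the formula gives
$$|\zeta(1 + it)| \ll \sum_{n=1}^{N} \frac{1}{n} + O\Big(\frac{|t|}{N}\Big) \qquad \forall |t| > 1,$$
which with $N = |t|$ shows that
$$|\zeta(1 + it)| \ll \ln|t| \qquad \forall |t| > 1.$$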
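The inequality in Step 1 of Ex. 1.46 can also be used as a numerical recipe, and it remains stable on the line $\sigma = 1$. The following is a minimal sketch under that reading (hypothetical function name): the approximation is $\sum_{n\le N} n^{-s} - \frac{N^{1-s}}{1-s} - \frac{1}{2N^s}$, and the returned bound is $|s|/(2\sigma N^\sigma)$.

```python
import math


def zeta_euler_maclaurin(s: complex, N: int) -> tuple[complex, float]:
    """Approximate zeta(s), Re(s) > 0, s != 1, by
    sum_{n<=N} n^(-s) - N^(1-s)/(1-s) - 1/(2 N^s).
    Returns (value, bound) with bound = |s| / (2 sigma N^sigma)."""
    s = complex(s)
    sigma = s.real
    partial = sum(n ** (-s) for n in range(1, N + 1))
    value = partial - N ** (1 - s) / (1 - s) - 0.5 * N ** (-s)
    bound = abs(s) / (2 * sigma * N ** sigma)
    return value, bound


if __name__ == "__main__":
    value, bound = zeta_euler_maclaurin(2, 1000)
    print(value.real, abs(value - math.pi ** 2 / 6), bound)
    # a point on the line sigma = 1, where the formula of Ex. 1.45 is less convenient
    print(zeta_euler_maclaurin(1 + 1j, 1000))
```

Increasing $N$ by a factor 4 on the line $\sigma = 1$ should change the value by no more than the sum of the two bounds, which gives a cheap consistency check.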

Exercise. 1.47 The function $P(s) := \sum_p 1/p^s$ is called Prime zeta function; this Dirichlet series also converges for $\sigma > 1$. The following steps will give a formula for a quick computation of the values of $P(s)$.

1) Let $\sigma > 1$. Using the representation of the Riemann zeta function as Euler product, prove that
$$P(s) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n} \ln(\zeta(ns)).$$

2) Recall the result in Ex. 1.45 Step 1:
$$\Big|\sum_{k=2}^{\infty} \frac{1}{k^s}\Big| \le \sum_{k=2}^{\infty} \frac{1}{k^\sigma} \le \Big(1 + \frac{2}{\sigma - 1}\Big) \frac{1}{2^\sigma}.$$

3) Using the inequality $\ln(1 + y) \le y$, prove that for every $L \ge 3$
$$\Big|\sum_{n=L}^{\infty} \frac{\mu(n)}{n} \ln(\zeta(ns))\Big| \le \sum_{n=L}^{\infty} \frac{1}{n} \ln\Big(1 + \sum_{k=2}^{\infty} \frac{1}{k^{n\sigma}}\Big) \le \frac{2(\sigma - 1)^{-1}}{2^{L\sigma} - 1},$$
so that
$$\Big|P(s) - \sum_{n=1}^{L-1} \frac{\mu(n)}{n} \ln(\zeta(ns))\Big| \le \frac{2(\sigma - 1)^{-1}}{2^{L\sigma} - 1}.$$

4) Use the previous formula to compute $P(2)$ with an error lower than $10^{-4}$ (the necessary values for $\zeta$ can be computed using the approximation in Ex. 1.45, or using their exact values $\zeta(2) = \pi^2/6$, $\zeta(4) = \pi^4/90$ and so on).
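Step 4 can be carried out with a short program. The sketch below is one possible way to do it, not a prescribed solution: the names are hypothetical, $\mu$ is computed by trial division, and the values $\zeta(ns)$ are approximated for real $s > 1$ by $\sum_{k\le N} k^{-s} + \frac{N^{1-s}}{s-1} - \frac{1}{2N^s}$ (my choice; any sufficiently accurate method for $\zeta$ would do).

```python
import math


def mobius(n: int) -> int:
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0            # p^2 divides the original n
            result = -result
        p += 1
    return -result if n > 1 else result


def zeta_real(s: float, N: int = 1000) -> float:
    """zeta(s) for real s > 1: partial sum plus the first Euler-Maclaurin terms."""
    return sum(k ** -s for k in range(1, N + 1)) + N ** (1 - s) / (s - 1) - 0.5 * N ** -s


def prime_zeta(s: float, L: int) -> float:
    """Truncated series sum_{n<L} mu(n)/n * ln(zeta(n s))."""
    return sum(mobius(n) / n * math.log(zeta_real(n * s))
               for n in range(1, L) if mobius(n) != 0)


if __name__ == "__main__":
    s, L = 2.0, 8
    truncation_bound = 2 / (s - 1) / (2 ** (L * s) - 1)
    print(prime_zeta(s, L), truncation_bound)
```

With $s = 2$ the choice $L = 8$ makes the truncation bound of Step 3 about $3 \cdot 10^{-5}$, so only $\zeta(2), \zeta(4), \zeta(6), \zeta(10), \zeta(12), \zeta(14)$ are actually needed.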

Exercise. 1.48 How to compute the value of $c := \prod_{p>2} \big(1 - \frac{1}{(p-1)^2}\big)$? The following argument is due to Wrench.$^{15}$

1. Check that $-\ln\big(1 - \frac{x^2}{(1-x)^2}\big) = \sum_{m=2}^{\infty} \frac{2^m - 2}{m} x^m$ for $|x| < 1/2$.

2. Deduce that $-\ln c = \sum_{m=2}^{\infty} \frac{2^m - 2}{m} P_{\mathrm{odd}}(m)$, where $P_{\mathrm{odd}}(m) := \sum_{p>2} \frac{1}{p^m}$ is the odd primes zeta function, i.e. the restriction to the odd primes of the usual Prime zeta function already introduced in Ex. 1.47.

3. Prove that $P_{\mathrm{odd}}(m) = \sum_{p>2} p^{-m} \le 3^{-m} + \sum_{n>4} n^{-m} \le 3^{-m} + m4^{1-m}$ for every $m \ge 2$, so that
$$\Big|-\ln c - \sum_{m=2}^{K-1} \frac{2^m - 2}{m} P_{\mathrm{odd}}(m)\Big| \le \frac{3}{K} \Big(\frac{2}{3}\Big)^K + \frac{8}{2^K}.$$
The values of $P_{\mathrm{odd}}(m) = P(m) - 2^{-m}$ can be quickly computed using the result in Ex. 1.47.

Exercise. 1.49 Use the Prime Number Theorem to reprove Proposition 1.14 (but notice that we have proved that proposition without the PNT).

¹⁵ John W. Wrench Jr., Evaluation of Artin's constant and the twin-prime constant, Math. Comp. 15, 396–398 (1961). On a possible solution of similar computational problems see also P. Moree, Approximation of singular series and automata, Manuscripta Math. 101(3), 385–399 (2000).


Exercise. 1.50 Use the Prime Number Theorem to prove that
\[
\sum_{p \le x} \frac{\ln p}{p} = \ln x + d + O\Big(\frac{1}{\ln x}\Big)
\]
for a suitable constant d. This relation improves (1.24).

Exercise. 1.51 Let α ∈ [0, 1). Use the Prime Number Theorem to prove that
\[
\sum_{p \le x} \frac{1}{p^{\alpha}} = \Big(\frac{1}{1 - \alpha} + O\Big(\frac{1}{\ln x}\Big)\Big) \frac{x^{1-\alpha}}{\ln x}.
\]
Is the result uniform in α?

Exercise. 1.52 The following steps prove that if $\psi(x) = x + O(x^{\delta})$ for some δ > 0, then ζ(s) ≠ 0 for Re(s) > δ.

1) Let
\[
H(s) := -\frac{\zeta'(s)}{\zeta(s)} - \zeta(s) = \sum_{n=1}^{\infty} \frac{\Lambda(n) - 1}{n^s}.
\]
Note that H(s) is meromorphic in C, is holomorphic in an open neighborhood of s = 1 (the simple pole of ζ(s) is cancelled out) and has simple poles at every zero of ζ.

2) Using the partial summation formula prove that
\[
H(s) = s\int_1^{+\infty} \frac{\psi(x) - \lfloor x \rfloor}{x^{s+1}}\,dx
\]
when Re(s) > 1.

3) By the principle of identity for meromorphic functions the previous equality holds not only in Re(s) > 1, but also in every open and connected set of C where both LHS and RHS are holomorphic. Assume that $|\psi(x) - x| \ll x^{\delta}$; then the RHS of the previous equality is holomorphic in Re(s) > δ. This means that H(s) is holomorphic there, i.e. ζ has no zeros there.

Exercise. 1.53 Let $g(s) := 1 - \sum_{n=2}^{1000} n^{-s}$ and $h(s) := \sum_{n=1001}^{\infty} n^{-s}$.

1) Prove that g(s) has a real zero $s_0$ in the interval [1.72, 1.73].

2) Let $\Gamma := \{s \in \mathbb{C} : |s - s_0| = 0.03\}$. Prove that |h(s)| ≤ 0.012 and that |g(s)| ≥ 0.04 for every s ∈ Γ.

3) Let f(s) := g(s) + h(s). Using Rouché's Theorem, deduce that f(s) has a zero in the disk encircled by Γ.


Comments: The function f(s) differs from the Riemann zeta function only in the sign of the coefficients whose index n is in {2, . . . , 1000}; this 'little' change does not modify the general properties of the function (analytic continuation to C as a meromorphic function, unique pole at s = 1 with residue equal to 1, a kind of functional equation, and so on), thus under this aspect we could consider f(s) as a 'small' change of ζ(s). Nevertheless, f(s) completely loses its connection with arithmetic: the coefficients of f(s) are no longer a multiplicative function, so f(s) does not have a representation as an Euler product. Hence, in this sense f(s) is a 'great' variation of the Riemann zeta function. Besides, f(s) has zeros in σ > 1, a behavior which is in total contrast with that of ζ: this example suggests that, probably, the arithmetic (i.e., the existence of an Euler product representation) will have a fundamental role in the proof (?) of the Riemann Hypothesis.

Hint: the Euler-Maclaurin formula should be used (as in Ex. 1.45) to get some explicit formulas giving good approximations of h(s). Software can be used to perform the (great quantity of) computations needed for Items 1 and 2.
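For Item 1 of Ex. 1.53 the computation can be delegated to a few lines of code. The sketch below is only a numerical experiment, not a proof: it evaluates g in floating point and locates the sign change in [1.72, 1.73] by bisection. The helper `bisect` and the tolerance are choices made here.

```python
def g(s: float, M: int = 1000) -> float:
    """g(s) = 1 - sum_{n=2}^{M} n^{-s}, as in Ex. 1.53 (M = 1000)."""
    return 1.0 - sum(n ** (-s) for n in range(2, M + 1))

def bisect(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Bisection for a continuous f with a sign change on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # the zero lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # the zero lies in [mid, hi]
    return 0.5 * (lo + hi)

s0 = bisect(g, 1.72, 1.73)
print(s0, g(s0))
```

A rigorous solution still has to control the rounding errors, or replace the floating point evaluation with interval arithmetic.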

Chapter 2

Primes in arithmetic progressions

Let a and q be fixed positive integers: how many primes p satisfy the congruence p ≡ a (mod q)? Note that if (a, q) > 1 there are only finitely many of them, thus the question is interesting only under the assumption that a and q are coprime. For special values of a and q there are several elementary ways to prove that the set is actually infinite. For example, the numbers $N_k := 4p_1 \cdots p_k - 1$ can be used to prove that there are infinitely many primes p with p ≡ −1 (mod 4),¹ and the numbers $N_k := (2p_1 \cdots p_k)^2 + 1$ can be used to prove that there are infinitely many primes p with p ≡ 1 (mod 4).² Similar arguments, for example based upon the cyclotomic polynomials, give the analogous result for other choices of a and q.³ Therefore there is good evidence in favor of the conjecture that every arithmetic progression a mod q contains infinitely many primes whenever (a, q) = 1. In spite of the partial results mentioned above, the conjecture resisted for several years, until it was finally proved, in a much stronger version, by Dirichlet. His proof was a far-reaching extension of the original argument of Euler proving the divergence of $\sum_p 1/p$, and in fact he proved the conjecture in the following form.

Theorem 2.1 (Dirichlet) Let (a, q) = 1, then
\[
\sum_{p \equiv a\ (q)} \frac{1}{p^{\sigma}} \sim \frac{1}{\varphi(q)} \sum_{p} \frac{1}{p^{\sigma}} \qquad \text{as } \sigma \to 1^{+}.
\]

¹ Every odd prime is congruent to 1 or to −1 modulo 4. Let $p_1, \dots, p_k$ be primes, all congruent to −1 modulo 4. Take $N_k := 4p_1 \cdots p_k - 1$; $N_k$ is ≡ −1 (mod 4), so it has a prime factor which is congruent to −1 modulo 4 (otherwise $N_k$ would be a product of numbers congruent to 1, so that it would be congruent to 1, too). This prime factor cannot be equal to any $p_j$, because $N_k$ is congruent to −1, not to 0, modulo $p_j$.

² Let p be a prime divisor of $N_k$; $N_k$ is odd, hence p is odd too. By its definition we have $-1 \equiv (2p_1 \cdots p_k)^2$ (mod p), in particular −1 is a square modulo p, so that p ≡ 1 (mod 4). The proof concludes because it is evident that $p \ne p_1, \dots, p_k$.

³ In 1918 Schur proved that if $a^2 \equiv 1$ (mod q) then there is a polynomial P ∈ Z[x] such that for all integers n large enough, all prime divisors of P(n) are ≡ a (mod q): this fact immediately proves that there are infinitely many primes p ≡ a (mod q), and the argument is essentially what we have used before for primes in −1 (mod 4) (set P(x) = 4x − 1) and 1 (mod 4) (set $P(x) = 4x^2 + 1$). In 1988 M. Ram Murty proved that this is also a necessary condition for such a kind of proof. In other words, he proved that if such a polynomial exists then $a^2 \equiv 1$ (mod q). Murty's original paper Primes in certain arithmetic progressions, J. Madras Univ. (1988), 161–169 is not easily accessible, but he reprinted it with some improvements in Prime numbers in certain arithmetic progressions, Funct. Approx. Comment. Math. 35 (2006), 249–259. For a nice exposition of his theorem see also these notes of Keith Conrad: www.math.uconn.edu/~kconrad/blurbs/gradnumthy/dirichleteuclid.pdf .


As attested by the appearance of the variable σ, his proof was essentially real analytic.⁴ After the proof of the Prime Number Theorem, Landau was able to merge the two techniques and prove the result of Dirichlet in the stronger form
\[
\pi(x; a, q) := \sum_{\substack{p \equiv a\ (q) \\ p \le x}} 1 = \frac{1}{\varphi(q)}\,\mathrm{li}(x) + O_{a,q}\big(x e^{-c\sqrt{\ln x}}\big).
\]
His proof was complex analytic in essence. In this chapter we prove the following intermediate result (stronger than Dirichlet's result but weaker than Landau's one).

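Theorem 2.2 Let (a, q) = 1, then
\[
\pi(x; a, q) := \sum_{\substack{p \equiv a\ (q) \\ p \le x}} 1 = \frac{1}{\varphi(q)}\,\mathrm{li}(x) + O_{a,q,A}\Big(\frac{x}{(\ln x)^{A}}\Big), \qquad \forall A > 0.
\]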
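The equidistribution predicted by Theorem 2.2 is already visible in small ranges. The following Python sketch (an illustration only; the functions `primes_up_to` and `count_by_residue` and the choice x = 10^5, q = 10 are assumptions made here) counts the primes p ≤ x in each residue class a mod q with (a, q) = 1, so that the counts can be compared with π(x)/ϕ(q).

```python
from math import gcd

def primes_up_to(n: int) -> list[int]:
    """Plain sieve of Eratosthenes returning all primes <= n (n >= 1)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # erase p^2, p^2 + p, ... (smaller multiples were erased earlier)
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_prime[i]]

def count_by_residue(x: int, q: int) -> dict[int, int]:
    """pi(x; a, q) for every residue a coprime to q."""
    counts = {a: 0 for a in range(q) if gcd(a, q) == 1}
    for p in primes_up_to(x):
        if p % q in counts:
            counts[p % q] += 1
    return counts

counts = count_by_residue(10**5, 10)
for a, c in sorted(counts.items()):
    print(f"pi(10^5; {a}, 10) = {c}")
```

The four classes 1, 3, 7, 9 modulo 10 receive nearly the same number of primes; the primes 2 and 5 are the only ones left out.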

Actually, its proof is very similar to the one for the Prime Number Theorem, thus we will skip the entire set of analytic lemmas and we will concentrate our attention only upon the new tools.

The first step is to introduce a way to distinguish the integers belonging to a given arithmetic progression a (mod q), with (a, q) = 1. This is done by using the characters of the group (Z/qZ)∗, a tool which was devised by Dirichlet exactly for this purpose and has now become fundamental in the theory of groups. Here we give a self-contained proof of the main result we need, but for a better comprehension of this topic we refer the reader to specialized texts.⁵

Let G be a finite abelian group. A character of G is a map χ : G → C∗ which is multiplicative, i.e. such that χ(ab) = χ(a)χ(b). The constant map χ0 defined by χ0(a) = 1 for every a is the simplest character.

Proposition 2.1

1) χ(1) = 1.

2) χ(a) is a ♯G-th root of unity.

3) $\chi(a^{-1}) = \chi(a)^{-1} = \overline{\chi(a)}$.

⁴ Dirichlet said that the idea for the proof came to him during a visit to Florence, while watching the chandelier suspended at the top of the cathedral. Considering that the same chandelier inspired Galilei, four centuries before, to formulate his law on the (almost) independence of the period of the oscillations of a pendulum from the amplitude, we can deduce that in case of a strong mathematical difficulty a visit to Florence could be a major step toward the solution.

⁵ For example [H] and [I].


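4) The set of characters of G is itself a finite abelian group, which is denoted as $\widehat{G}$, with $\sharp\widehat{G} = \sharp G$, and χ0 is the unit of this group.

5) $\frac{1}{\sharp G}\sum_{a \in G} \chi(a)$ is 1 when χ = χ0, 0 otherwise.

6) $\frac{1}{\sharp G}\sum_{a \in G} \chi(a)\overline{\eta(a)}$ is 1 when χ = η, 0 otherwise.

7) $\frac{1}{\sharp G}\sum_{\chi \in \widehat{G}} \chi(a)\overline{\chi(b)}$ is 1 when a = b, 0 otherwise.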
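The orthogonality relations in Items 5, 6 and 7 can be checked numerically on a small example. The sketch below assumes that q is prime (here q = 7), so that (Z/qZ)∗ is cyclic and its characters can be written explicitly through a primitive root; the helper names (`order`, `dlog`, `character`, `item5`, `item6`, `item7`) are introduced here only for illustration.

```python
import cmath
from math import gcd

q = 7                                    # assumed prime, so (Z/qZ)* is cyclic
G = [a for a in range(1, q) if gcd(a, q) == 1]
d = len(G)                               # order of the group

def order(a: int) -> int:
    """Multiplicative order of a modulo q."""
    k, x = 1, a % q
    while x != 1:
        x = x * a % q
        k += 1
    return k

g = next(a for a in G if order(a) == d)      # a primitive root modulo q
dlog = {pow(g, k, q): k for k in range(d)}   # discrete logarithm in base g

def character(r: int) -> dict[int, complex]:
    """The character sending g^k to exp(2 pi i r k / d)."""
    return {a: cmath.exp(2j * cmath.pi * r * dlog[a] / d) for a in G}

chars = [character(r) for r in range(d)]     # chars[0] is the trivial character

def item5(chi):                              # average of chi over G
    return sum(chi[a] for a in G) / d

def item6(chi, eta):                         # orthogonality of two characters
    return sum(chi[a] * eta[a].conjugate() for a in G) / d

def item7(a, b):                             # orthogonality of two elements
    return sum(chi[a] * chi[b].conjugate() for chi in chars) / d

print(abs(item5(chars[0])), abs(item5(chars[1])), abs(item7(2, 2)), abs(item7(2, 3)))
```

For a general finite abelian group the same construction works on each cyclic factor; this sketch does not cover that case.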

Proof.

1) Evident.

2) $a^{\sharp G} = 1$ for every a ∈ G (this is the little Fermat theorem), therefore $(\chi(a))^{\sharp G} = \chi(a^{\sharp G}) = \chi(1) = 1$.

3) The first claim follows from the multiplicativity, the second claim follows from the second item.

4) Let $\widehat{G}$ be the set of characters of G. In $\widehat{G}$ we introduce a composition law defining the product of two characters χ and η pointwise, i.e. setting (χη)(a) := χ(a)η(a) for every a ∈ G. It is evident that χη is a character, that χ0 is the unit and that $\overline{\chi}$ acts as the inverse of χ in $\widehat{G}$: this proves that $\widehat{G}$ has the structure of an abelian group. The difficult point is proving that $\widehat{G}$ is finite and that its order is equal to the one of G. There are several possible proofs; the simplest one consists in proving the claim for cyclic groups (in this case the set of characters can be concretely produced) and then using the fundamental theorem giving the structure of every finite abelian group G as a direct product of suitable cyclic subgroups. Here we reproduce a different proof, which is longer but more interesting.
Let V(G,C) := {f : G → C} be the set of all maps from G to C. It is a vector space with respect to the pointwise operations. Since G is finite, it is evidently finite dimensional with d := dim(V(G,C)) = ♯G.

Lemma 2.1 (Weil) Any set $\chi_1, \dots, \chi_n$ of distinct characters of G is a set of linearly independent elements in V(G,C).

Proof. By contradiction, suppose that a set of linearly dependent characters exists, and let $\chi_1, \dots, \chi_n$ be such a set with minimal cardinality n. This means that there exist $c_1, \dots, c_n \in \mathbb{C}$ such that $\sum_{j=1}^{n} c_j\chi_j(a) = 0$ for every a ∈ G, and the minimality of n gives $c_j \ne 0$ for every j. The values of any character are non-zero, hence in such a sum at least two characters appear. Let α be an element in G such that $\chi_1(\alpha) \ne \chi_2(\alpha)$. Then
\[
\sum_{j=1}^{n} c_j\chi_1(\alpha)\chi_j(a) = \chi_1(\alpha)\sum_{j=1}^{n} c_j\chi_j(a) = 0 \qquad \forall a \in G
\]
and
\[
\sum_{j=1}^{n} c_j\chi_j(\alpha)\chi_j(a) = \sum_{j=1}^{n} c_j\chi_j(\alpha a) = 0 \qquad \forall a \in G.
\]
Subtracting them we get
\[
\sum_{j=1}^{n} c_j(\chi_1(\alpha) - \chi_j(\alpha))\chi_j(a) = 0 \qquad \forall a \in G.
\]
In this sum the j = 1 term does not appear, and the one with j = 2 has a non-zero coefficient, therefore the equality shows that the characters $\chi_2, \dots, \chi_n$ are C-linearly dependent. This is impossible since this set contains only n − 1 characters, so that it violates the minimality of n.

The lemma implies that any collection of distinct characters contains at most d elements, because d is the dimension of V(G,C). As a consequence $\widehat{G}$ is a finite set, with $\hat{d} := \sharp\widehat{G} \le d$. The argument may be repeated for the new group $\widehat{G}$, and proves that $\hat{\hat{d}} \le \hat{d}$, where $\hat{\hat{d}} := \sharp\widehat{\widehat{G}}$.

Now, we prove that given any subgroup H ⊆ G and any character $\psi \in \widehat{H}$, there is always a character $\psi' \in \widehat{G}$ whose restriction to H coincides with ψ. In fact, let x be any element in G\H. Consider the group $H_x := \langle H, x\rangle$. Let q be the order of $[x]_H$ (the class of x modulo H) as an element in G/H (here we are using the assumption that G is abelian to ensure that H is normal in G). Then $x^q \in H$, so that $\psi(x^q)$ is well defined. Let ξ be any q-th root of $\psi(x^q)$, and set
\[
\psi'(hx^m) := \psi(h)\xi^m
\]
for every h ∈ H and m ∈ Z. We notice that ψ′ is well defined. In fact, if $hx^m = h'x^n$ for some h, h′ ∈ H and m, n ∈ Z, then n = m + kq for some k ∈ Z (because $x^{n-m} = hh'^{-1} \in H$), so that
\[
\psi(h) = \psi(h'x^{n-m}) = \psi(h'(x^q)^k) = \psi(h')\psi(x^q)^k = \psi(h')(\xi^q)^k = \psi(h')\xi^{n-m},
\]
i.e., $\psi(h)\xi^m = \psi(h')\xi^n$. Since G is abelian, this also proves that ψ′ is multiplicative in $H_x$ (here again we use this fundamental hypothesis on G), so that it is a character of $H_x$. Its definition also shows that ψ′ coincides with ψ on H. If $G = H_x$ the proof terminates, otherwise we repeat the argument, producing an even larger group. After finitely many steps we arrive at G (because G is finite), so that the claim is proved in full generality.

Now, let a ∈ G be an element which is not the unit. Then there exists a character $\chi \in \widehat{G}$ such that χ(a) ≠ 1. In fact, let q be the order of a in G, let $H := \langle a\rangle$ and let $\xi := e^{2\pi i/q}$. Then setting $\psi(a^n) := \xi^n = e^{2\pi i n/q}$ we produce a character on H which is not trivial at a (because a ≠ e, hence q ≠ 1), and the previous argument proves that ψ can be lifted to a character of G having the same value at a.

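Finally, for every a ∈ G, let $\mathrm{ev}_a$ be the map $\widehat{G} \to \mathbb{C}^*$ such that $\mathrm{ev}_a(\chi) = \chi(a)$. This is evidently multiplicative, so that it is an element in $\widehat{\widehat{G}}$. Let the evaluation map $\mathrm{ev} : G \to \widehat{\widehat{G}}$ be defined as $\mathrm{ev}(a) := \mathrm{ev}_a$. This is multiplicative, too, and the claim we have just proved states that it is injective. This shows that $d \le \hat{\hat{d}}$. Thus
\[
d \le \hat{\hat{d}} \le \hat{d} \le d,
\]
which proves that $d = \hat{d} = \hat{\hat{d}}$.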

5) The claim for χ = χ0 is evident. Let χ be not equal to χ0. Then there exists b ∈ G such that χ(b) ≠ 1. Let $S := \sum_{a \in G} \chi(a)$. The multiplicativity implies that $\chi(b)S = \sum_{a \in G} \chi(ab) = \sum_{a \in G} \chi(a) = S$ (because the product by b simply permutes the elements of G), so that S = 0.

6) A simple consequence of Item 5, since $\chi\overline{\eta}$ is a character.

7) Let d be the cardinality of G (and hence of $\widehat{G}$, by Item 4). Let $\chi_1, \dots, \chi_d$ be the complete set of characters and let $a_1, \dots, a_d$ be the complete set of elements in G. Let M be the square matrix
\[
M = \frac{1}{\sqrt{d}}
\begin{pmatrix}
\chi_1(a_1) & \chi_2(a_1) & \dots & \chi_d(a_1) \\
\chi_1(a_2) & \chi_2(a_2) & \dots & \chi_d(a_2) \\
\vdots & \vdots & & \vdots \\
\chi_1(a_d) & \chi_2(a_d) & \dots & \chi_d(a_d)
\end{pmatrix}.
\]
Item 6 proves that M∗M = I. Then MM∗ = I too,⁶ which is the statement of Item 7.

⁶ We use here the fact that if the product AB of two square matrices A and B is the identity, then B is the inverse of A (and vice versa); in other words, a right-inverse of A is necessarily its (bilateral) inverse. The proof is the following. Let AB = I. Then the determinant of A is not zero (because det(A) det(B) = det(AB) = 1), hence A is invertible. Multiplying by $A^{-1}$ we get the equality $A^{-1} = A^{-1}AB = B$, proving the claim.


For every a ∈ G, let $\delta_a : G \to \mathbb{C}$ be the Dirac delta at a, i.e. the function whose value $\delta_a(b)$ is 1 when b = a and 0 otherwise. The family $\{\delta_a\}_{a \in G}$ is evidently a basis for V(G,C), with the identity $f = \sum_{a \in G} f(a)\delta_a$ giving the basic formula showing the decomposition of f on the basis of deltas. Proposition 2.1 shows that the elements in $\widehat{G}$, i.e. the characters, are an alternative basis for the same space, with claim 6) giving a way to compute the coordinates of f with respect to this new basis:
\[
f = \sum_{\chi \in \widehat{G}} \hat{f}(\chi)\chi, \qquad \text{where } \hat{f}(\chi) := \frac{1}{\sharp G}\sum_{a \in G} f(a)\overline{\chi(a)}.
\]
This is an example (the easiest one, actually) of Fourier duality and can be extended to every locally compact abelian group (but the extension needs several sophisticated tools).

Let G be the group of integers modulo q which are coprime with q, i.e. (Z/qZ)∗. This is an abelian group whose cardinality is ϕ(q). We extend every character χ of G to all integers by setting
\[
\chi(n) =
\begin{cases}
0 & \text{if } (n, q) \ne 1, \\
\chi(n \ (\mathrm{mod}\ q)) & \text{if } (n, q) = 1.
\end{cases}
\]
The arithmetical function we have generated in this way is called a Dirichlet character modulo q and by abuse of notation it is still denoted as χ. It is totally multiplicative, i.e. χ(mn) = χ(m)χ(n) for every pair of integers m, n, not necessarily coprime.
The key idea for the proof of the theorem is the following identity, which is an immediate application of the previous proposition (Item 7):
\[
\sum_{\substack{n \equiv a\ (q) \\ n \le x}} \Lambda(n) = \frac{1}{\varphi(q)} \sum_{\chi\ (\mathrm{mod}\ q)} \overline{\chi(a)} \sum_{n \le x} \chi(n)\Lambda(n).
\]

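In this way the original sum on the arithmetic progression has been written as a (sum of a) more conventional sum on the entire set of integers n ≤ x, but for χ(n)Λ(n). In this sum we single out the term coming from the trivial character χ0:
\[
\sum_{\substack{n \equiv a\ (q) \\ n \le x}} \Lambda(n) = \frac{1}{\varphi(q)} \sum_{n \le x} \chi_0(n)\Lambda(n) + \frac{1}{\varphi(q)} \sum_{\substack{\chi\ (\mathrm{mod}\ q) \\ \chi \ne \chi_0}} \overline{\chi(a)} \sum_{n \le x} \chi(n)\Lambda(n). \tag{2.1}
\]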

The effect of the presence of χ0(n) in the first term is the exclusion from the sum of the integers which are not coprime with q (because χ0(n) = 0 for them), and therefore the first sum is essentially equal to the well-known sum ψ(x). Hence, the proof consists in proving that
\[
\psi_{\chi}(x) := \sum_{n \le x} \chi(n)\Lambda(n) =
\begin{cases}
x + O_A\Big(\dfrac{x}{(\ln x)^A}\Big) & \text{if } \chi = \chi_0, \\[2ex]
O_A\Big(\dfrac{x}{(\ln x)^A}\Big) & \text{if } \chi \ne \chi_0.
\end{cases} \tag{2.2}
\]

The Dirichlet series
\[
L(s, \chi) := \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}
\]
is called the Dirichlet L-function associated with the character χ. Since |χ(n)| ≤ 1, the series converges for Re(s) > 1, and since χ is a totally multiplicative map, we get also that
\[
L(s, \chi) = \prod_{p} \Big(1 - \frac{\chi(p)}{p^s}\Big)^{-1}, \qquad \mathrm{Re}(s) > 1.
\]
From this representation as an Euler product it is clear that
\[
L(s, \chi_0) = \prod_{p | q} \Big(1 - \frac{1}{p^s}\Big) \cdot \zeta(s);
\]
this equality shows that L(s, χ0) admits an analytic continuation to C as a meromorphic function with a unique pole (simple) at s = 1, and that its zeros in Re(s) > 0 coincide with the zeros of ζ(s).

Let χ ≠ χ0 and let $A_{\chi}(x) := \sum_{n \le x} \chi(n)$. From Item 5 in Proposition 2.1 we have $\sum_{n=a+1}^{a+q} \chi(n) = 0$ for every integer a, therefore $A_{\chi}(x)$ is bounded (for example by ϕ(q)/2, but the exact bound is not important here⁷). Hence by partial summation it follows that the Dirichlet series defining L(s, χ) converges in Re(s) > 0 (not absolutely in Re(s) ∈ (0, 1]) and that in this half-plane we have
\[
L(s, \chi) = s\int_1^{+\infty} \frac{A_{\chi}(x)}{x^{s+1}}\,dx \qquad \text{if } \mathrm{Re}(s) > 0. \tag{2.3}
\]

Summarizing, we have proved that the Dirichlet L-functions are represented by a Dirichlet series and an Euler product in Re(s) > 1, admit a continuation to Re(s) > 0 as meromorphic functions, and that actually only L(s, χ0) has a pole in this region; this pole is reminiscent of the one of the Riemann zeta function and in fact it is simple and located at s = 1.
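A small numerical illustration for the non-trivial character modulo 4 may help: its partial sums take only the values 0 and 1 (in agreement with the bound ϕ(q)/2 = 1), and the conditionally convergent series at s = 1 is the Leibniz series, whose value is π/4. The function names below are chosen here for the sketch.

```python
import math

def chi(n: int) -> int:
    """Non-trivial Dirichlet character modulo 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def A(x: int) -> int:
    """Partial sum A_chi(x) = sum_{n <= x} chi(n)."""
    return sum(chi(n) for n in range(1, x + 1))

def L1_partial(N: int) -> float:
    """Partial sum of the Dirichlet series of L(s, chi) at s = 1."""
    return sum(chi(n) / n for n in range(1, N + 1))

print([A(x) for x in range(1, 13)])          # only the values 0 and 1 appear
print(L1_partial(10**5), math.pi / 4)
```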

Our ability to prove the Prime Number Theorem was based upon our ability to find upper bounds for the derivatives of the inverse of the Riemann zeta function, i.e. a lower bound for the zeta itself and an upper bound for its derivatives. In particular, we have deduced the lower bound for ζ from an upper bound, using the special identity in Proposition 1.18. The upper bound for the derivatives of L(s, χ) can actually be proved following word by word the argument we have used for ζ; the existence of a special identity for L(s, χ) is more delicate, but we are lucky and such a formula exists.

⁷ The Pólya–Vinogradov inequality says that $\max_x |A_{\chi}(x)| < \sqrt{q}\,\ln q$.

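Proposition 2.2 Let σ > 1 and t ∈ R, then
\[
L(\sigma, \chi_0)^3\,|L(\sigma + it, \chi)|^4\,|L(\sigma + 2it, \chi^2)|^2 \ge 1.
\]

Proof. In the region Re(s) > 1 all the involved functions have a representation as an (absolutely convergent) Euler product, so that
\begin{align*}
\ln(\mathrm{LHS}) &= -\sum_{p} \Big[3\ln\Big(1 - \frac{\chi_0(p)}{p^{\sigma}}\Big) + 2\ln\Big(1 - \frac{\chi(p)p^{-it}}{p^{\sigma}}\Big) + 2\ln\Big(1 - \frac{\overline{\chi}(p)p^{it}}{p^{\sigma}}\Big) \\
&\qquad\qquad + \ln\Big(1 - \frac{\chi^2(p)p^{-2it}}{p^{\sigma}}\Big) + \ln\Big(1 - \frac{\overline{\chi}^2(p)p^{2it}}{p^{\sigma}}\Big)\Big] \\
&= \sum_{p}\sum_{m=1}^{\infty} \frac{1}{mp^{m\sigma}}\Big(3\chi_0(p^m) + 2\chi(p^m)p^{-imt} + 2\overline{\chi}(p^m)p^{imt} + \chi^2(p^m)p^{-2imt} + \overline{\chi}^2(p^m)p^{2imt}\Big) \\
&= \sum_{p}\sum_{m=1}^{\infty} \frac{1}{mp^{m\sigma}}\Big(\chi_0(p^m) + \chi(p^m)p^{-imt} + \overline{\chi}(p^m)p^{imt}\Big)^2 \ge 0.
\end{align*}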
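The last step of the proof rests on a perfect-square identity. As a short worked expansion (the symbol z is introduced here only for illustration), take a prime p not dividing q and set $z := \chi(p^m)p^{-imt}$, so that |z| = 1 and $\chi_0(p^m) = 1$:

```latex
\begin{align*}
(1 + z + \bar{z})^2
  &= 1 + z^2 + \bar{z}^2 + 2z + 2\bar{z} + 2z\bar{z} \\
  &= 3 + 2z + 2\bar{z} + z^2 + \bar{z}^2
  \qquad (\text{since } z\bar{z} = |z|^2 = 1).
\end{align*}
```

Since $1 + z + \bar{z} = 1 + 2\,\mathrm{Re}(z)$ is a real number, its square is nonnegative. When p divides q every term vanishes, so those primes give no contribution.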

As we have done for the Riemann zeta function, we can use the previous inequality to prove the non-existence of zeros for the Dirichlet L-functions along the line Re(s) = 1, with the exception of the point s = 1. For the point s = 1 a different argument will be needed.

Corollary 2.1 L(1 + it, χ) ≠ 0 for every t ∈ R.

Proof. By contradiction, suppose that L(s, χ) has a zero along the vertical line σ = 1, so that there exists $t_0$ such that $L(1 + it_0, \chi) = 0$. When χ is real, i.e. when $\chi^2 = \chi_0$, we further suppose that $t_0 \ne 0$. L(s, χ) is holomorphic in an open neighborhood of $1 + it_0$, therefore $L(s, \chi) \sim c(s - 1 - it_0)^n$ for some constant c ∈ C∗ and some integer n ≥ 1. Moreover, $L(s, \chi^2)$ is regular in a neighborhood of $1 + 2it_0$ (here we use the assumption $t_0 \ne 0$ for the characters χ with $\chi^2 = \chi_0$), hence from the inequality proved in the previous proposition we deduce that
\[
1 \le L(\sigma, \chi_0)^3\,|L(\sigma + it_0, \chi)|^4\,|L(\sigma + 2it_0, \chi^2)|^2 \ll (\sigma - 1)^{-3}(\sigma - 1)^{4n} \qquad \text{as } \sigma \to 1^{+},
\]
which is impossible because 4n − 3 > 0.
To complete the proof we need to prove that L(1, χ) ≠ 0 when χ is a real character.


There are several ways to prove this fact, both algebraic (for example considering L(s, χ) as a factor of the Dedekind zeta function of a quadratic field and using the class number formula) and analytic. We reproduce here a classical analytic proof, based upon Landau's Theorem 1.2. Let H(s) := ζ(s)L(s, χ). This function is meromorphic in Re(s) > 0 and is represented by a Dirichlet series $\sum_{n=1}^{+\infty} a(n)n^{-s}$ with nonnegative coefficients. In fact, $a(n) = \sum_{d | n} \chi(d)$ (because H is the product of two Dirichlet series, so that its coefficients are the ∗-product of their coefficients), hence it is multiplicative (because χ is multiplicative), and
\[
a(p^k) = \sum_{j=0}^{k} \chi(p^j) = \sum_{j=0}^{k} \chi(p)^j =
\begin{cases}
1 & \text{if } \chi(p) = 0, \\
k + 1 & \text{if } \chi(p) = 1, \\
\frac{1}{2}(1 + (-1)^k) & \text{if } \chi(p) = -1,
\end{cases}
\]
which proves that $a(p^k) \ge 0$ in any case. This computation also shows that $a(p^{2k}) \ge 1$ whenever p ∤ q, so that $a(n^2) \ge 1$ for every integer n which is coprime to q, yielding the lower bound
\[
H(\sigma) \ge \sum_{\substack{n=1 \\ (n,q)=1}}^{+\infty} \frac{1}{n^{2\sigma}}. \tag{2.4}
\]
Now, suppose that L(1, χ) = 0. Then H(s) is regular for σ > 0 (the zero of L(s, χ) kills the simple pole of ζ at s = 1) and it is a Dirichlet series with nonnegative coefficients. Therefore its Dirichlet series converges in σ > 0 (by Landau's Theorem 1.2). However this is impossible, since by (2.4) the Dirichlet series diverges when σ = 1/2.
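The two properties of the coefficients a(n) used in the proof can be observed directly on an example. The sketch below takes the real non-trivial character modulo 4 (a choice made here for illustration) and computes $a(n) = \sum_{d|n} \chi(d)$ by brute force.

```python
def chi(n: int) -> int:
    """Real non-trivial Dirichlet character modulo 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def a(n: int) -> int:
    """Coefficient of n^{-s} in zeta(s) L(s, chi): sum of chi(d) over d | n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)

print([a(n) for n in range(1, 26)])
# a(n) is never negative, and a(n^2) >= 1 for n coprime to q = 4
print(all(a(n) >= 0 for n in range(1, 301)))
print(all(a(n * n) >= 1 for n in range(1, 42, 2)))
```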

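The proof of Theorem 2.1 is almost immediate. In fact, let σ → 1+. By the Euler product
\begin{align*}
\ln(L(\sigma, \chi)) &= -\sum_{p} \ln\Big(1 - \frac{\chi(p)}{p^{\sigma}}\Big) = \sum_{p} \frac{\chi(p)}{p^{\sigma}} - \sum_{p} \Big[\ln\Big(1 - \frac{\chi(p)}{p^{\sigma}}\Big) + \frac{\chi(p)}{p^{\sigma}}\Big] \\
&= \sum_{p} \frac{\chi(p)}{p^{\sigma}} + O\Big(\sum_{p} \frac{1}{p^{2\sigma}}\Big) = \sum_{p} \frac{\chi(p)}{p^{\sigma}} + O(1).
\end{align*}
When χ ≠ χ0 we know that L(1, χ) ≠ 0, thus ln(L(σ, χ)) = O(1) and the previous computation shows that
\[
\sum_{p} \frac{\chi(p)}{p^{\sigma}} = O(1) \qquad \text{as } \sigma \to 1^{+}.
\]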


This bound and the orthogonality of characters give
\begin{align*}
\sum_{p \equiv a\ (q)} \frac{1}{p^{\sigma}} &= \frac{1}{\varphi(q)} \sum_{\chi} \overline{\chi(a)} \sum_{p} \frac{\chi(p)}{p^{\sigma}} = \frac{1}{\varphi(q)} \sum_{p} \frac{\chi_0(p)}{p^{\sigma}} + \frac{1}{\varphi(q)} \sum_{\chi \ne \chi_0} \overline{\chi(a)} \sum_{p} \frac{\chi(p)}{p^{\sigma}} \\
&= \frac{1}{\varphi(q)} \sum_{p} \frac{1}{p^{\sigma}} + O(1),
\end{align*}
which is Theorem 2.1. In particular, we notice that its claim depends only on the behavior of Dirichlet L-functions on the real set (1, +∞).
On the contrary, the proof of Theorem 2.2 is longer and starts with formulas (2.1) and (2.2), and then reproduces in this new setting the tools we have already used for the proof of the Prime Number Theorem: all key ingredients hold also in this new case, and only ancillary (yet fundamental!) computations are needed. They are proposed in the following exercises.
Concluding, we notice that the proof of Theorem 2.2 relies on the behavior of Dirichlet L-functions on the closed half-plane Re(s) ≥ 1.

Exercise. 2.1 Let q > 1 and let χ be a character modulo q, χ ≠ χ0. Prove that $\sum_{n \le x} \chi(n)\ln n \ll_q \ln x$.

Exercise. 2.2 Let q > 1 and let χ be a character modulo q, χ ≠ χ0. Let $M_{\chi}(x) := \sum_{n \le x} \chi(n)\mu(n)$ and $\psi_{\chi}(x) := \sum_{n \le x} \chi(n)\Lambda(n)$. Prove that
\[
M_{\chi}(x) \ll_A \frac{x}{(\ln x)^A} \quad \forall A > 0 \qquad \Longrightarrow \qquad \psi_{\chi}(x) \ll_A \frac{x}{(\ln x)^A} \quad \forall A > 0.
\]
Suggestion: imitate the proof of Proposition 1.17 and use the previous exercise.

Exercise. 2.3 Let q > 1 and let χ be a character modulo q, χ ≠ χ0. Prove that
\[
L^{(\ell)}(s, \chi) \ll_{\ell,q} (\ln(3|s|))^{\ell+1} \qquad \text{for } \mathrm{Re}(s) > 1,
\]
where the symbol $\ll_{\ell,q}$ means that the implicit constant depends on ℓ and q, so that the result is not uniform in these parameters.
Suggestion: imitate the proof of Proposition 1.19.

Exercise. 2.4 Let q > 1 and let χ be a character modulo q, χ ≠ χ0. Prove that $|L(1, \chi)| \le \max_x |A_{\chi}(x)|$, so that
\[
|L(1, \chi)| \le \varphi(q)/2.
\]
The better bound $|L(1, \chi)| \le \sqrt{q}\,\ln q$ comes from the Pólya–Vinogradov inequality.
Suggestion: use (2.3).

Chapter 3

Sieve methods

With the exception of some original divagations and some clarifications, this chapter is based upon [MV], Ch. 3.

3.1. Eratosthenes-Legendre’s sieve

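Sylvester's inclusion–exclusion principle has many formulations, one of them is the following: let $\chi_A$ be the characteristic function of the set A and let $A_1, \dots, A_n$ be sets of some kind, then
\[
\chi_{A_1 \cup A_2 \cup \cdots \cup A_n} = \sum_{j} \chi_{A_j} - \sum_{j_1 < j_2} \chi_{A_{j_1} \cap A_{j_2}} + \sum_{j_1 < j_2 < j_3} \chi_{A_{j_1} \cap A_{j_2} \cap A_{j_3}} - \cdots. \tag{3.1}
\]

Proof. The characteristic function of $A^c$ is $1 - \chi_A$ and the characteristic function of A ∩ B is $\chi_A\chi_B$, thus
\begin{align*}
1 - \chi_{A_1 \cup A_2 \cup \cdots \cup A_n} &= \chi_{(A_1 \cup A_2 \cup \cdots \cup A_n)^c} = \chi_{A_1^c \cap A_2^c \cap \cdots \cap A_n^c} = \chi_{A_1^c}\chi_{A_2^c} \cdots \chi_{A_n^c} \\
&= (1 - \chi_{A_1})(1 - \chi_{A_2}) \cdots (1 - \chi_{A_n}) \\
&= 1 - \sum_{j} \chi_{A_j} + \sum_{j_1 < j_2} \chi_{A_{j_1}}\chi_{A_{j_2}} - \sum_{j_1 < j_2 < j_3} \chi_{A_{j_1}}\chi_{A_{j_2}}\chi_{A_{j_3}} + \cdots \\
&= 1 - \sum_{j} \chi_{A_j} + \sum_{j_1 < j_2} \chi_{A_{j_1} \cap A_{j_2}} - \sum_{j_1 < j_2 < j_3} \chi_{A_{j_1} \cap A_{j_2} \cap A_{j_3}} + \cdots.
\end{align*}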

Let (X, µ, Σ) be any measure space and suppose that the sets $\{A_j\}_j$ belong to the algebra Σ. Integrating (3.1) with respect to µ we get
\[
\mu(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{j} \mu(A_j) - \sum_{j_1 < j_2} \mu(A_{j_1} \cap A_{j_2}) + \sum_{j_1 < j_2 < j_3} \mu(A_{j_1} \cap A_{j_2} \cap A_{j_3}) - \cdots. \tag{3.2}
\]
When µ is a probability measure (i.e., a measure for which µ(X) = 1) the previous identity becomes the classical formula for the probability of the event $\bigcup_j A_j$, and when µ is the counting measure the identity becomes the well known formula for the number of elements in $\bigcup_j A_j$. Formula (3.2) can also be used to deduce upper and lower estimates for $\mu(\bigcup_j A_j)$. This is exactly the content of Bonferroni's inequalities. To state them in a compact way we simplify our notation a bit: let $Y := \bigcup_{j=1}^{n} A_j$ and let
\[
\Xi_k(x) :=
\begin{cases}
\chi_Y(x) & \text{if } k = 0, \\
\sum_{j_1 < j_2 < \dots < j_k} \chi_{A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_k}}(x) & \text{if } k > 0,
\end{cases}
\qquad S_k := \int_Y \Xi_k(x)\,d\mu,
\]


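then

Proposition 3.1 (Bonferroni) $(-1)^{\ell}\sum_{k=0}^{\ell} (-1)^k S_k \ge 0$ for every ℓ, so that
\begin{align*}
S_1 - S_2 &\le S_0 \le S_1 \\
S_1 - S_2 + S_3 - S_4 &\le S_0 \le S_1 - S_2 + S_3 \\
S_1 - S_2 + S_3 - S_4 + S_5 - S_6 &\le S_0 \le S_1 - S_2 + S_3 - S_4 + S_5 \\
\dots &\le S_0 \le \dots
\end{align*}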

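Proof. For every x ∈ Y, let I(x) be the number of sets $A_j$ containing x. Then when k > 0
\[
\int_Y \Xi_k(x)\,d\mu(x) = \int_Y \sum_{j_1 < j_2 < \dots < j_k} \chi_{A_{j_1} \cap A_{j_2} \cap \cdots \cap A_{j_k}}(x)\,d\mu(x) = \int_Y \binom{I(x)}{k}\,d\mu(x)
\]
because we can extract $\binom{I(x)}{k}$ k-tuples of ordered indexes from a set of I(x) indexes. The same equality holds also for k = 0, by direct inspection. As a consequence,
\begin{align*}
\sum_{k=0}^{\ell} (-1)^k S_k &= \sum_{k=0}^{\ell} (-1)^k \int_Y \Xi_k(x)\,d\mu(x) \\
&= \sum_{k=0}^{\ell} (-1)^k \int_Y \binom{I(x)}{k}\,d\mu(x) = \int_Y \sum_{k=0}^{\ell} (-1)^k \binom{I(x)}{k}\,d\mu(x).
\end{align*}
Since
\[
\sum_{k=0}^{\ell} (-1)^k \binom{\omega}{k} = (-1)^{\ell}\binom{\omega - 1}{\ell} \qquad \forall \ell,\ \forall \omega, \tag{3.3}
\]
the claim is now evident.¹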
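Both the binomial identity (3.3) and the resulting inequalities can be checked by brute force on small examples. The following Python sketch uses the counting measure on random finite sets; the function names and the sizes of the random sets are choices made here for illustration.

```python
import random
from itertools import combinations
from math import comb

def alternating_binomial_sum(omega: int, ell: int) -> int:
    """Left-hand side of (3.3): sum_{k=0}^{ell} (-1)^k C(omega, k)."""
    return sum((-1) ** k * comb(omega, k) for k in range(ell + 1))

def S(sets: list[set], k: int) -> int:
    """S_k for the counting measure: |Y| if k = 0, otherwise the sum of the
    cardinalities of all k-fold intersections."""
    if k == 0:
        return len(set().union(*sets))
    return sum(len(set.intersection(*combo)) for combo in combinations(sets, k))

def bonferroni_holds(sets: list[set]) -> bool:
    """Check (-1)^ell * sum_{k<=ell} (-1)^k S_k >= 0 for every ell."""
    n = len(sets)
    return all(
        (-1) ** ell * sum((-1) ** k * S(sets, k) for k in range(ell + 1)) >= 0
        for ell in range(n + 1)
    )

random.seed(0)
sets = [set(random.sample(range(30), 10)) for _ in range(5)]
print([S(sets, k) for k in range(6)], bonferroni_holds(sets))
```

For ℓ equal to the number of sets the alternating sum vanishes, which is the inclusion–exclusion identity (3.2) itself.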

¹ Prove it, for example by induction on ℓ, or using the power series identity
\[
\Big(\sum_{\ell=0}^{\infty} x^{\ell}\Big)\Big(\sum_{\ell=0}^{\infty} \binom{\omega}{\ell}(-x)^{\ell}\Big) = \sum_{\ell=0}^{\infty} \binom{\omega - 1}{\ell}(-x)^{\ell},
\]
which is only another way to write $(1 - x)^{-1} \cdot (1 - x)^{\omega} = (1 - x)^{\omega - 1}$. Equating the coefficients of $x^{\ell}$ on the left-hand side and on the right-hand side, we get the equality
\[
\sum_{j=0}^{\ell} (-1)^j \binom{\omega}{j} = (-1)^{\ell}\binom{\omega - 1}{\ell}, \qquad \forall \omega,\ \forall \ell.
\]
Note that if $\binom{\omega}{j}$ is considered as $\frac{\omega \cdot (\omega - 1) \cdots (\omega - j + 1)}{j!}$, then the same argument proves the identity also for non-integer values of ω.

As we see, Formula (3.2) shows that it is possible to detect the measure of a certain set by a kind of over-counting/under-counting strategy. At this level of generality the Eratosthenes' sieve is a very similar way to reach the same purpose, for the set of primes.
In order to appreciate this remark, recall that the Eratosthenes' sieve is a method to build a complete list of all prime numbers which are lower than a fixed integer N. This algorithm runs as follows:

1. write down all the integers 2, 3, . . . , N in an ordered list,

2. save the first element of the list and erase all its proper multiples,

3. repeat the second step until you have reached the integer $\lceil\sqrt{N}\rceil$,

4. the saved integers and the remaining entries are the complete list of primes ≤ N.

For example, let N = 20, so that we begin with

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20.

We save the number two and we delete every proper multiple of two (i.e. numbers 2n with n > 1); marking the saved numbers with ↓ and the erased ones with brackets, we have

↓2 3 [4] 5 [6] 7 [8] 9 [10] 11 [12] 13 [14] 15 [16] 17 [18] 19 [20].

Now we repeat the step, with the previous list: we save the number three and we delete every multiple of three which is not equal to 3 itself: we have

↓2 ↓3 [4] 5 [[6]] 7 [8] [9] [10] 11 [[12]] 13 [14] [15] [16] 17 [[18]] 19 [20].

Note that some integers have been excluded (bracketed) twice. One more run, with the previous list: we save the number five and we delete every multiple of five which is not equal to 5 itself. We get

↓2 ↓3 [4] ↓5 [[6]] 7 [8] [9] [[10]] 11 [[12]] 13 [14] [[15]] [16] 17 [[18]] 19 [[20]].

The algorithm stops here, because we have reached $\lceil\sqrt{20}\rceil = 5$. Hence the prime numbers lower than 20 are:

2 3 5 7 11 13 17 19.
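The four steps of the algorithm translate almost literally into code. The following is a minimal Python sketch (the function name `eratosthenes` is chosen here); it keeps a list of "erased" flags instead of physically deleting entries.

```python
import math

def eratosthenes(N: int) -> list[int]:
    """Primes <= N, following the four steps described in the text."""
    if N < 2:
        return []
    erased = [False] * (N + 1)             # step 1: the list 2, 3, ..., N
    limit = math.isqrt(N - 1) + 1          # ceil(sqrt(N)) for an integer N >= 2
    for p in range(2, limit + 1):          # step 3: repeat up to ceil(sqrt(N))
        if not erased[p]:                  # step 2: p is saved ...
            for multiple in range(2 * p, N + 1, p):
                erased[multiple] = True    # ... and its proper multiples erased
    # step 4: saved integers and surviving entries are the primes
    return [n for n in range(2, N + 1) if not erased[n]]

print(eratosthenes(20))
```

Starting the inner loop from 2p, as in the description above, erases some integers more than once; starting from p² would avoid part of the repeated work without changing the result.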

Why is there no need to repeat the second step for every integer below N, and why is it sufficient to reach $\sqrt{N}$? Because if n ≤ N is not a prime number, then it is a product of the form uv with u, v ≥ 2 and at least one of them, say u, is not larger than $\lceil\sqrt{N}\rceil$. Thus, the number n has already been erased when a prime divisor of u has been considered.
Now, how can we cast this algorithm in a formula for π(N)? The first step gives an evident upper bound π(N) ≤ N − 1 (we subtract one because 1 is not a prime). The second step erases the proper multiples of all primes which are below $\sqrt{N}$, hence
\[
\pi(N) \approx N - 1 - \sum_{p \le \sqrt{N}} \Big(\Big\lfloor \frac{N}{p} \Big\rfloor - 1\Big);
\]

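note that ⌊N/p⌋ − 1 appears here, not simply ⌊N/p⌋, because we subtract the number of proper multiples. This difference actually provides a lower bound for π(N), i.e.
\[
\pi(N) \ge N - 1 + \pi(\sqrt{N}) - \sum_{p \le \sqrt{N}} \Big\lfloor \frac{N}{p} \Big\rfloor,
\]
because the integers which are divisible by two primes lower than $\sqrt{N}$ have been removed twice in $\sum_p$. To restore them we add a new term:
\[
\pi(N) \approx N - 1 + \pi(\sqrt{N}) - \sum_{p \le \sqrt{N}} \Big\lfloor \frac{N}{p} \Big\rfloor + \sum_{p_1 < p_2 \le \sqrt{N}} \Big\lfloor \frac{N}{p_1p_2} \Big\rfloor.
\]
This time the sum provides an upper bound for π(N), i.e.
\[
\pi(N) \le N - 1 + \pi(\sqrt{N}) - \sum_{p \le \sqrt{N}} \Big\lfloor \frac{N}{p} \Big\rfloor + \sum_{p_1 < p_2 \le \sqrt{N}} \Big\lfloor \frac{N}{p_1p_2} \Big\rfloor,
\]
because now the integers which are products of three primes lower than $\sqrt{N}$ have been counted once in N, cancelled three times in $\sum_p$ and restored three times in $\sum_{p_1 < p_2}$. Thus we can now refine our bound by inserting another term, which produces a lower bound:
\[
\pi(N) \ge N - 1 + \pi(\sqrt{N}) - \sum_{p \le \sqrt{N}} \Big\lfloor \frac{N}{p} \Big\rfloor + \sum_{p_1 < p_2 \le \sqrt{N}} \Big\lfloor \frac{N}{p_1p_2} \Big\rfloor - \sum_{p_1 < p_2 < p_3 \le \sqrt{N}} \Big\lfloor \frac{N}{p_1p_2p_3} \Big\rfloor.
\]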

The analogy with the inclusion-exclusion procedure should now be evident.² The formula we get in this way can be written in a more compact way using the Möbius function as
\[
\pi(N) + 1 = \pi(\sqrt{N}) + \sum_{d \in D(N)} \mu(d)\Big\lfloor \frac{N}{d} \Big\rfloor, \qquad D(N) := \{n \in \mathbb{N} : p | n \implies p \le \sqrt{N}\}.
\]
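The compact formula can be evaluated directly for small N. In the sketch below (function names chosen here) only the squarefree elements of D(N) are generated, since µ(d) = 0 for the others; they are exactly the products of distinct primes not exceeding $\sqrt{N}$, and µ(d) is (−1) raised to the number of prime factors.

```python
from itertools import combinations
from math import isqrt, prod

def small_primes(n: int) -> list[int]:
    """Primes <= n by trial division (enough for this illustration)."""
    return [m for m in range(2, n + 1)
            if all(m % p for p in range(2, isqrt(m) + 1))]

def legendre_pi(N: int) -> int:
    """pi(N) from pi(N) + 1 = pi(sqrt N) + sum_{d in D(N)} mu(d) floor(N/d)."""
    ps = small_primes(isqrt(N))
    total = 0
    for k in range(len(ps) + 1):
        for combo in combinations(ps, k):
            d = prod(combo)                 # squarefree d with k prime factors
            total += (-1) ** k * (N // d)   # mu(d) = (-1)^k
    return len(ps) + total - 1

print(legendre_pi(20), legendre_pi(100), legendre_pi(1000))
```

The number of terms is $2^{\pi(\sqrt{N})}$, so this is a formula to understand rather than an efficient way of counting primes.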

² This is much more than a simple analogy. Gian-Carlo Rota proved in the late '60s that they are simply two realizations of a general principle which is valid for partially ordered sets. See E. A. Bender and J. R. Goldman, On the applications of Möbius inversion in combinatorial analysis, Amer. Math. Monthly 82(8), 789–803, 1975.


Legendre organized this argument in a more flexible way. Given an integer P, let S(x, y; P) be the number of integers in (x, x + y] which are coprime with P: in a formula
\[
S(x, y; P) := \sum_{\substack{x < n \le x + y \\ (n, P) = 1}} 1.
\]
Since P enters in the definition only via the condition (n, P) = 1, there is no loss of generality in assuming that P is squarefree. P is a parameter whose presence gives flexibility to our argument: for example, when P = 1 we are simply counting the integers in (x, x + y]; when P is the product of the primes below $\sqrt{x + y}$ then S(x, y; P) counts the primes in (x, x + y] (Why?). The expected value for S(x, y; P) is $\frac{\varphi(P)}{P}y$, since there are ϕ(P) integers which are coprime to P in every set of P consecutive integers. Actually the validity of this claim depends on the size of x, y and P. The following proposition gives a first result in this direction.

Proposition 3.2 (Eratosthenes–Legendre) Let P be squarefree. Then
\[
S(x, y; P) = \frac{\varphi(P)}{P}\,y + O(2^{\omega(P)}).
\]

Proof. The Möbius function satisfies the identity 1 ∗ µ = δ, so that
\[
\sum_{\substack{d:\ d | n,\ d | P}} \mu(d) =
\begin{cases}
1 & \text{if } (n, P) = 1, \\
0 & \text{otherwise.}
\end{cases}
\]
We use this equality to write the restriction (n, P) = 1 appearing in the definition of S(x, y; P) in a more convenient way:
\[
S(x, y; P) = \sum_{x < n \le x + y}\ \sum_{\substack{d:\ d | n,\ d | P}} \mu(d) = \sum_{d | P} \mu(d) \sum_{\substack{x < n \le x + y \\ d | n}} 1.
\]
Note that this equality realizes the 'inclusion-exclusion' principle, via an over-under counting process. The inner sum can be computed exactly, thus
\[
S(x, y; P) = \sum_{d | P} \mu(d)\Big(\Big\lfloor \frac{x + y}{d} \Big\rfloor - \Big\lfloor \frac{x}{d} \Big\rfloor\Big).
\]
Substituting the integer parts with their arguments we introduce an O(1) error in each summand, so that
\[
S(x, y; P) = y\sum_{d | P} \frac{\mu(d)}{d} + O\Big(\sum_{d | P} |\mu(d)|\Big).
\]
The proof concludes recalling that ϕ(P) = (µ ∗ I)(P), i.e. $\varphi(P) = \sum_{d | P} \mu(d)\frac{P}{d}$, and that $\sum_{d | P} |\mu(d)| = 2^{\omega(P)}$.³

The previous result is non-trivial only when the explicit term $\frac{\varphi(P)}{P}\,y$ represents the 'main term', i.e. when it is actually larger than the error term $2^{\omega(P)}$. This is a condition on $P$ with respect to $y$.
We have already mentioned that Eratosthenes' sieve corresponds to the claim

\[ \pi(x+y)-\pi(x) = S(x,y;P) \qquad\text{for } P := \prod_{p\le\sqrt{x+y}} p. \]

This holds as an equality, but it needs a fixed and very large value for $P$: so large, actually, that in this case the error term in Proposition 3.2 becomes exponentially larger than the explicit term appearing there, so that nothing useful can be deduced from this equality.
On the other hand, it is still possible to deduce something interesting from this tool: we simply need to give up the demand for an equality. Independently of our choice of $P$, each prime number has only two possibilities: either it divides $P$ or it is coprime with $P$. As a consequence

\[ \pi(x+y)-\pi(x) \le \omega(P) + S(x,y;P). \tag{3.4} \]

This is now only an upper bound, but it holds for every $P$. Suppose $P=\prod_{p\le z} p$, for a convenient $z$ that we will choose later. Then $\omega(P)=\pi(z)$, so that (3.4) and Proposition 3.2 give

\[ \pi(x+y)-\pi(x) \le \pi(z) + y\prod_{p\le z}\Big(1-\frac1p\Big) + O\big(2^{\pi(z)}\big). \]

Recalling Mertens' result $\prod_{p\le z}\big(1-\frac1p\big)^{-1} \sim e^{\gamma}\ln z$ and setting $z=\ln y$ we conclude that

\[ \pi(x+y)-\pi(x) \le (e^{-\gamma}+o(1))\,\frac{y}{\ln\ln y}. \]

This is a weak result (for example when $y=x$, the left-hand side has order $x/\ln x$ while the right-hand side is $x/\ln\ln x$, so that it does not capture the real order of growth of the left-hand side), but it is still interesting since it is totally uniform in $x$. For example, it gives an upper bound for primes in very short intervals which are not accessible with the prime number theorem, not even assuming the Riemann Hypothesis. Finally, for $x=1$ it also produces the upper bound

\[ \pi(y) \le (e^{-\gamma}+o(1))\,\frac{y}{\ln\ln y} \]

which is not optimal but has been obtained without great effort.

3 Let $F(m) := \sum_{d|m}|\mu(d)|$; then $F$ is multiplicative, since $|\mu|$ is multiplicative, and $F(p^k)=2$ for every prime $p$ and every power $k\ge1$.

3.2. Selberg’s $\Lambda^2$-method

How can one improve this result? In 1915 Brun realized that in order to get an upper bound for $S(x,y;P)$ we can substitute the Möbius function $\mu$ with any function $\lambda^+$ such that

\[ \lambda^+_1 = 1, \qquad \sum_{d|m}\lambda^+_d \ge 0 \quad \forall m. \tag{3.5} \]

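In fact, repeating with this generic function what we have done for the Möbius function, we get

\[ \begin{aligned}
S(x,y;P) &= \sum_{\substack{x<n\le x+y\\ (n,P)=1}} 1 \le \sum_{x<n\le x+y}\ \sum_{d:\ d|n,\ d|P} \lambda^+_d \\
&= \sum_{d|P}\lambda^+_d \sum_{\substack{x<n\le x+y\\ d|n}} 1 = \sum_{d|P}\lambda^+_d\Big(\Big\lfloor\frac{x+y}{d}\Big\rfloor-\Big\lfloor\frac{x}{d}\Big\rfloor\Big) \\
&= y\sum_{d|P}\frac{\lambda^+_d}{d} + O\Big(\sum_{d|P}|\lambda^+_d|\Big).
\end{aligned} \]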

An analogous lower bound can be deduced from any function $\lambda^-$ such that

\[ \lambda^-_1 = 1, \qquad \sum_{d|m}\lambda^-_d \le 0 \quad \forall m>1. \tag{3.6} \]

Therefore, we have the opportunity to improve the previous result (i.e., to get a smaller error term) with a suitable choice of $\lambda^\pm$. Brun's concrete realization of this principle was to set

\[ \lambda^+_d = \begin{cases}\mu(d) & \text{if } d\in D(2r),\\ 0 & \text{otherwise,}\end{cases} \qquad \lambda^-_d = \begin{cases}\mu(d) & \text{if } d\in D(2r-1),\\ 0 & \text{otherwise,}\end{cases} \]

where $D(\ell) := \{n : \omega(n)\le\ell\}$ and $r$ is a new parameter. In fact $\lambda^\pm_1=1$ and for each squarefree $m>1$ we have (see Equation (3.3) and Footnote 1)

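\[ \sum_{d|m}\lambda^+_d = \sum_{j=0}^{2r}\ \sum_{\substack{d|m\\ \omega(d)=j}}\mu(d) = \sum_{j=0}^{2r}(-1)^j\binom{\omega(m)}{j} = \binom{\omega(m)-1}{2r} \]

and

\[ \sum_{d|m}\lambda^-_d = \sum_{j=0}^{2r-1}\ \sum_{\substack{d|m\\ \omega(d)=j}}\mu(d) = \sum_{j=0}^{2r-1}(-1)^j\binom{\omega(m)}{j} = -\binom{\omega(m)-1}{2r-1}, \]

proving that the conditions (3.5)–(3.6) are satisfied.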
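The two binomial identities above are instances of $\sum_{j=0}^{k}(-1)^j\binom{n}{j} = (-1)^k\binom{n-1}{k}$ for $n\ge1$. A minimal Python sketch (our own helper names, given only as a sanity check and not as part of the notes) verifies this for small values, together with the sign pattern on which Brun's choice relies.

```python
from math import comb

def truncated_alternating_sum(n, k):
    """Return sum_{j=0}^{k} (-1)^j * C(n, j)."""
    return sum((-1) ** j * comb(n, j) for j in range(k + 1))

def closed_form(n, k):
    """Return (-1)^k * C(n-1, k), the closed form valid for n >= 1."""
    return (-1) ** k * comb(n - 1, k)

for n in range(1, 12):          # n plays the role of omega(m) >= 1
    for k in range(0, 12):      # k = 2r (upper weights) or k = 2r - 1 (lower weights)
        value = truncated_alternating_sum(n, k)
        assert value == closed_form(n, k)
        # even truncation gives a nonnegative sum, odd truncation a nonpositive one
        assert (value >= 0) if k % 2 == 0 else (value <= 0)
```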

Exercise 3.1 Let $0\le a_0\le a_1\le a_2\le\cdots$. Prove that $(-1)^\ell\sum_{j=0}^{\ell}(-1)^ja_j\ge0$ for every $\ell$.
Let $a_0,a_1,a_2,\dots,a_n$ be a unimodal sequence (see Footnote 4) of nonnegative numbers such that $\sum_{j=0}^{n}(-1)^ja_j=0$. Prove that $(-1)^\ell\sum_{j=0}^{\ell}(-1)^ja_j\ge0$ for every $\ell$.

Remark: Since $a_j=\binom{\omega(P)}{j}$ is unimodal and $\sum_{j=0}^{n}(-1)^ja_j=(1-1)^{\omega(P)}=0$, we can use this argument to prove that $(-1)^\ell\sum_{j=0}^{\ell}(-1)^j\binom{\omega(P)}{j}\ge0$ for every $\ell$.

With Brun's choice of $\lambda^\pm$ it is a bit difficult (but perfectly possible) to estimate the asymptotic behavior of $\sum_{d|P}\lambda^\pm_d/d$. In 1946 Selberg had a clever idea circumventing this difficulty. He remarked that for every sequence $\Lambda_n$ of real numbers with $\Lambda_1=1$, one has

\[ \Big(\sum_{d|m}\Lambda_d\Big)^2 \ge \begin{cases}1 & \text{if } m=1,\\ 0 & \text{if } m>1,\end{cases} \]

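so that the argument giving an upper bound for $S(x,y;P)$ becomes:

\[ \begin{aligned}
S(x,y;P) &= \sum_{\substack{x<n\le x+y\\ (n,P)=1}} 1 \le \sum_{x<n\le x+y}\Big(\sum_{d:\ d|n,\ d|P}\Lambda_d\Big)^2 \\
&= \sum_{\substack{d|P\\ e|P}}\Lambda_d\Lambda_e\sum_{\substack{x<n\le x+y\\ d|n,\ e|n}} 1 = \sum_{\substack{d|P\\ e|P}}\Lambda_d\Lambda_e\Big(\Big\lfloor\frac{x+y}{[d,e]}\Big\rfloor-\Big\lfloor\frac{x}{[d,e]}\Big\rfloor\Big) \\
&= y\sum_{\substack{d|P\\ e|P}}\frac{\Lambda_d\Lambda_e}{[d,e]} + O\Big(\Big(\sum_{d|P}|\Lambda_d|\Big)^2\Big).
\end{aligned} \]

4 i.e., a sequence for which there exists an index $h$ such that $0\le a_0\le a_1\le\cdots\le a_h\ge a_{h+1}\ge a_{h+2}\ge\cdots\ge a_n$.

This fits with Brun's remark: it amounts to taking $\lambda^+_n = \sum_{d,e:\ [d,e]=n}\Lambda_d\Lambda_e$.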

Our purpose is to find the sequence $\Lambda_d$ minimizing the quadratic form

\[ \sum_{\substack{d|P\\ e|P}}\frac{\Lambda_d\Lambda_e}{[d,e]} \]

under the restriction $\Lambda_1=1$, with the hope that for this sequence we also have a small error term $\big(\sum_{d|P}|\Lambda_d|\big)^2$. We also assume that $\Lambda_d=0$ for $d\ge z$, where $z$ is a parameter that we will set later: this assumption will help to control the size of the error term. With suitable transformations we can diagonalize the quadratic form. In fact, the least common multiple $[d,e]$ and the greatest common divisor $(d,e)$ satisfy the identity $[d,e](d,e)=de$, therefore

\[ \sum_{\substack{d|P\\ e|P}}\frac{\Lambda_d\Lambda_e}{[d,e]} = \sum_{\substack{d|P\\ e|P}}\frac{\Lambda_d}{d}\,\frac{\Lambda_e}{e}\,(d,e). \]

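Since $I=\varphi*1$, i.e. $m=\sum_{f|m}\varphi(f)$ for every $m$, we can write this sum as

\[ \sum_{\substack{d|P\\ e|P}}\frac{\Lambda_d\Lambda_e}{[d,e]} = \sum_{\substack{d|P\\ e|P}}\frac{\Lambda_d}{d}\,\frac{\Lambda_e}{e}\sum_{\substack{f|d\\ f|e}}\varphi(f) = \sum_{f|P}\varphi(f)\Big(\sum_{d:\ f|d|P}\frac{\Lambda_d}{d}\Big)^2 = \sum_{f|P}\varphi(f)\,y_f^2, \]

with

\[ y_f := \sum_{d:\ f|d|P}\frac{\Lambda_d}{d}. \]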

This is a linear transformation $\{\Lambda_d\}_{d|P}\to\{y_f\}_{f|P}$ which is non-singular, since we can easily invert the previous relation, obtaining that

\[ \Lambda_d = d\sum_{f:\ d|f|P} y_f\,\mu(f/d). \]

In fact,

\[ \sum_{f:\ d|f|P} y_f\,\mu(f/d) = \sum_{f:\ d|f|P}\mu(f/d)\sum_{h:\ f|h|P}\frac{\Lambda_h}{h} = \sum_{h:\ d|h|P}\frac{\Lambda_h}{h}\sum_{f:\ d|f|h}\mu(f/d). \]

Here both $h$ and $f$ are divisible by $d$, thus setting $f=du$ and $h=dh'$, the inner sum becomes $\sum_{u|h'}\mu(u)$, which is $1$ when $h'=1$ (i.e. $h=d$) and $0$ otherwise, hence the sum is equal to $\Lambda_d/d$.


In terms of the new variables $y_f$ the restriction $\Lambda_1=1$ becomes $\sum_{f|P}y_f\,\mu(f)=1$, and the assumption $\Lambda_d=0$ for $d\ge z$ becomes $y_f=0$ for $f\ge z$.

Exercise 3.2 For a finite set of indexes $f$, let $c_f$, $d_f$ be real numbers, with $c_f>0$ for every $f$. Prove that the minimum of $\sum_f c_fy_f^2$ restricted to $\sum_f d_fy_f=1$ is reached when $y_f=\dfrac{d_f/c_f}{\sum_h d_h^2/c_h}$ and is equal to $\big(\sum_h d_h^2/c_h\big)^{-1}$.

Hint: complete the squares, or use the Lagrange condition for the stationary points (to get the $y_f$) and the convexity (to prove that they produce a minimum).

Let

\[ L_P(z) := \sum_{\substack{n\le z\\ n|P}}\frac{\mu(n)^2}{\varphi(n)}; \]

then for $y_f=\dfrac{\mu(f)}{\varphi(f)L_P(z)}$ when $f\le z$ the quadratic form assumes its minimum value, which is $1/L_P(z)$. Now we have to find a convenient bound for the size of the error term. We have

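\[ \Lambda_d = d\sum_{f:\ d|f|P}y_f\,\mu(f/d) = \frac{d}{L_P(z)}\sum_{\substack{f:\ d|f|P\\ f\le z}}\frac{\mu(f)\mu(f/d)}{\varphi(f)}. \]

We are assuming that $P$ is squarefree, therefore $d$ and $f/d$ are coprime, so that $\mu(f)=\mu(d)\mu(f/d)$, $\varphi(f)=\varphi(d)\varphi(f/d)$ and $\mu(f/d)^2=1$, hence

\[ \Lambda_d = \frac{d\,\mu(d)}{L_P(z)\varphi(d)}\sum_{\substack{f:\ d|f|P\\ f\le z}}\frac{1}{\varphi(f/d)} = \frac{d\,\mu(d)}{L_P(z)\varphi(d)}\sum_{\substack{m:\ m|(P/d)\\ m\le z/d}}\frac{1}{\varphi(m)}. \]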

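Therefore

\[ \begin{aligned}
\sum_{d\le z}|\Lambda_d| &\le \frac{1}{L_P(z)}\sum_{\substack{d:\ d|P\\ d\le z}}\frac{d}{\varphi(d)}\sum_{\substack{m:\ m|(P/d)\\ m\le z/d}}\frac{1}{\varphi(m)} \le \frac{1}{L_P(z)}\sum_{d\le z}\frac{d}{\varphi(d)}\sum_{m\le z/d}\frac{1}{\varphi(m)} \\
&= \frac{1}{L_P(z)}\sum_{m\le z}\frac{1}{\varphi(m)}\sum_{d\le z/m}\frac{d}{\varphi(d)}.
\end{aligned} \]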

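Note that $\dfrac{d}{\varphi(d)}=\sum_{r|d}\dfrac{\mu(r)^2}{\varphi(r)}$ (prove it, for example using the multiplicativity) so that

\[ \sum_{d\le y}\frac{d}{\varphi(d)} = \sum_{d\le y}\sum_{r|d}\frac{\mu(r)^2}{\varphi(r)} = \sum_{r\le y}\frac{\mu(r)^2}{\varphi(r)}\sum_{\substack{d\le y\\ r|d}}1 \le \sum_{r\le y}\frac{1}{\varphi(r)}\Big\lfloor\frac{y}{r}\Big\rfloor \le y\sum_{r\le y}\frac{1}{r\,\varphi(r)} \ll y \]

because $\sum_r\dfrac{1}{r\,\varphi(r)}=\prod_p\Big(1+\dfrac{p}{(p-1)(p^2-1)}\Big)<\infty$ (see Footnote 5). Hence

\[ \sum_{d\le z}|\Lambda_d| \ll \frac{z}{L_P(z)}\sum_{m\le z}\frac{1}{m\,\varphi(m)} \ll \frac{z}{L_P(z)}. \]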

Summarizing, we have proved the following result.

Theorem 3.1 Let $P$ be squarefree, and $z\ge1$. Then

\[ S(x,y;P) \le \frac{y}{L_P(z)} + O\Big(\Big(\frac{z}{L_P(z)}\Big)^2\Big), \qquad\text{where } L_P(z) := \sum_{\substack{n\le z\\ n|P}}\frac{\mu(n)^2}{\varphi(n)}. \]
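The construction leading to Theorem 3.1 can be followed numerically. The Python sketch below is an illustration under our own assumptions (the choice $P=\prod_{p\le z}p$, the function names, and the use of exact rational arithmetic are ours): it builds $y_f$ and $\Lambda_d$ from the formulas above and checks that $\Lambda_1=1$, that $\Lambda_d=0$ for $d>z$, and that the quadratic form equals $1/L_P(z)$.

```python
from fractions import Fraction
from math import gcd

def small_primes(z):
    """Primes p <= z by trial division (adequate for small z)."""
    return [p for p in range(2, z + 1) if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def selberg_weights(z):
    """Return (Lam, L) for P = prod_{p<=z} p, with
    y_f = mu(f) / (phi(f) L) for f <= z and Lam_d = d * sum_{d|f|P} y_f mu(f/d)."""
    mu, phi = {1: 1}, {1: 1}
    for p in small_primes(z):                 # build all squarefree divisors of P
        for d in list(mu):
            mu[d * p], phi[d * p] = -mu[d], phi[d] * (p - 1)
    L = sum(Fraction(1, phi[n]) for n in mu if n <= z)
    y = {f: Fraction(mu[f], phi[f]) / L if f <= z else Fraction(0) for f in mu}
    Lam = {d: d * sum(y[f] * mu[f // d] for f in mu if f % d == 0) for d in mu}
    return Lam, L

def quadratic_form(Lam):
    """Return sum_{d,e} Lam_d Lam_e / [d, e], using 1/[d,e] = (d,e)/(d e)."""
    return sum(Lam[d] * Lam[e] * Fraction(gcd(d, e), d * e) for d in Lam for e in Lam)

Lam, L = selberg_weights(10)
print(Lam[1], quadratic_form(Lam) == 1 / L)
```

Exact fractions are used so that the two identities can be tested with `==` rather than up to rounding; for larger $z$ the number of divisors of $P$ grows exponentially and this direct approach stops being practical.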

The claim of the theorem is useful only if we have a lower bound for $L_P(z)$. The following lemma gives such a bound.

Lemma 3.1
\[ \sum_{n\le z}\frac{\mu(n)^2}{\varphi(n)} \ge \ln z. \]

Exercise 3.3 below gives a way to prove that the sum is asymptotic to ln z.

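Proof. Given an integer $m$, let $s(m)$ denote the squarefree part of $m$, i.e. the greatest squarefree integer dividing $m$. Let $n$ be squarefree; then

\[ \frac{1}{\varphi(n)} = \frac1n\prod_{p|n}\Big(1-\frac1p\Big)^{-1} = \frac1n\prod_{p|n}\Big(1+\frac1p+\frac1{p^2}+\cdots\Big) = \sum_{m:\ s(m)=n}\frac1m. \]

Since $s(m)\le m$, we have that $m\le z$ implies $s(m)\le z$, therefore

\[ \sum_{n\le z}\frac{\mu(n)^2}{\varphi(n)} = \sum_{n\le z}\ \sum_{m:\ s(m)=n}\frac1m = \sum_{m:\ s(m)\le z}\frac1m \ge \sum_{m\le z}\frac1m \ge \ln z. \]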
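A quick numerical check of Lemma 3.1 can be made with a sieve. The following Python sketch is ours (function name and sample values of $z$ are arbitrary): it computes $\sum_{n\le z}\mu(n)^2/\varphi(n)$ and compares it with $\ln z$.

```python
from math import log

def mu_squared_over_phi_sum(z):
    """Return sum_{n <= z} mu(n)^2 / phi(n), sieving phi and squarefreeness together."""
    phi = list(range(z + 1))
    squarefree = [True] * (z + 1)
    for p in range(2, z + 1):
        if phi[p] == p:                          # p is prime: no smaller prime has touched phi[p]
            for m in range(p, z + 1, p):
                phi[m] -= phi[m] // p            # multiply phi[m] by (1 - 1/p)
            for m in range(p * p, z + 1, p * p):
                squarefree[m] = False            # m is divisible by p^2
    return sum(1.0 / phi[n] for n in range(1, z + 1) if squarefree[n])

for z in (10, 100, 1000, 10000):
    print(z, mu_squared_over_phi_sum(z), log(z))
```

The printed difference between the two columns stays bounded, in agreement with the asymptotic stated in Exercise 3.3.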

However, the lemma gives a lower bound for $L_P(z)$ only if $P$ is such that every squarefree $n\le z$ divides $P$; a way to realize this condition is to set $P=\prod_{p\le z}p$. In this way we have:

Corollary 3.1 Let $z\ge1$ and let $P=\prod_{p\le z}p$. Then

\[ S(x,y;P) \le \frac{y}{\ln z} + O\Big(\frac{z^2}{\ln^2z}\Big), \]

in particular, for $z=\sqrt y$ we have

\[ S(x,y;P) \le \frac{2y}{\ln y} + O\Big(\frac{y}{\ln^2y}\Big). \]

5 Use the multiplicativity to write $\sum_r\dfrac{1}{\varphi(r)\,r^s}$ as an Euler product, and then take $s=1$.

The second claim is a remarkable improvement on the result proved with the Eratosthenes–Legendre sieve. Using the bound (3.4) we get:

Corollary 3.2

\[ \pi(x+y)-\pi(x) \le \frac{2y}{\ln y} + O\Big(\frac{y}{\ln^2y}\Big). \]

Apart from the constant 2, this upper bound is essentially the best we can obtain without imposing additional conditions on the size of $x$ and $y$. We can combine the previous result with a general lemma, obtaining a more general conclusion. The lemma will be given as a part of the proof.

Theorem 3.2 Let $P$ be squarefree. Then

\[ S(x,y;P) \le e^{\gamma}\,y\,\Big(\prod_{\substack{p|P\\ p\le\sqrt y}}\Big(1-\frac1p\Big)\Big)\Big(1+O\Big(\frac{1}{\ln y}\Big)\Big). \]

Proof. Let $q$ be a squarefree integer, coprime with $P$. Then

\[ \varphi(q)\,S(x,y;P) = \sum_{m=1}^{q}S(x+mP,y;qP). \tag{3.7} \]

In fact, by definition the right-hand side is equal to

\[ \sum_{m=1}^{q}S(x+mP,y;qP) = \sum_{m=1}^{q}\sum_{\substack{x+mP<n\le x+mP+y\\ (n,P)=1\\ (n,q)=1}}1 = \sum_{m=1}^{q}\sum_{\substack{x<n-mP\le x+y\\ (n-mP,P)=1\\ (n,q)=1}}1, \]

because $n$ is coprime with $P$ iff $n-mP$ is coprime with $P$. Setting $r:=n-mP$ the sum becomes

\[ \sum_{m=1}^{q}\sum_{\substack{x<r\le x+y\\ (r,P)=1\\ (r+mP,q)=1}}1 = \sum_{\substack{x<r\le x+y\\ (r,P)=1}}\ \sum_{\substack{1\le m\le q\\ (r+mP,q)=1}}1. \]

The inner sum is equal to $\varphi(q)$, because the map $j:\mathbb Z/q\mathbb Z\to\mathbb Z/q\mathbb Z$, defined by $j([m]_q)=[r+mP]_q$, is injective, hence bijective (here we use the hypothesis $(q,P)=1$), thus the sum equals

\[ \varphi(q)\sum_{\substack{x<r\le x+y\\ (r,P)=1}}1 = \varphi(q)\,S(x,y;P). \]
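Identity (3.7) is exact, so it can be tested on small integer data. The Python sketch below is only a brute-force illustration (the helper names and the sample values of $x$, $y$, $P$, $q$ are our own choices).

```python
from math import gcd

def S(x, y, P):
    """Number of integers n in (x, x+y] with gcd(n, P) = 1."""
    return sum(1 for n in range(x + 1, x + y + 1) if gcd(n, P) == 1)

def euler_phi(q):
    """Euler's function by direct counting (fine for small q)."""
    return sum(1 for m in range(1, q + 1) if gcd(m, q) == 1)

x, y, P, q = 17, 40, 6, 35        # q squarefree and coprime with P
lhs = euler_phi(q) * S(x, y, P)
rhs = sum(S(x + m * P, y, q * P) for m in range(1, q + 1))
print(lhs, rhs)                   # the two sides of (3.7)
```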


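Let $M(y;P):=\max_xS(x,y;P)$; then from (3.7) we get

\[ M(y;P) \le \frac{q}{\varphi(q)}\,M(y;qP). \tag{3.8} \]

Now, let $P$ be given as in the statement of the theorem. Let

\[ P_1 := \prod_{\substack{p|P\\ p\le\sqrt y}}p, \qquad q_1 := \prod_{\substack{p\nmid P\\ p\le\sqrt y}}p, \]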

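with $P_1$ and/or $q_1$ equal to $1$ if the corresponding product is empty. The product $q_1P_1$ is the product of the complete set of primes not exceeding $\sqrt y$, therefore the bound of Corollary 3.1 applies to $M(y;q_1P_1)$. By (3.8) we get

\[ \begin{aligned}
M(y;P_1) &\le \frac{q_1}{\varphi(q_1)}\,M(y;q_1P_1) \le \Big(\prod_{p|q_1}\Big(1-\frac1p\Big)^{-1}\Big)\cdot\Big(\frac{2y}{\ln y}+O\Big(\frac{y}{\ln^2y}\Big)\Big) \\
&= y\,\Big(\prod_{p|q_1}\Big(1-\frac1p\Big)^{-1}\Big)\cdot e^{\gamma}\Big(\prod_{p\le\sqrt y}\Big(1-\frac1p\Big)\Big)\Big(1+O\Big(\frac{1}{\ln y}\Big)\Big),
\end{aligned} \]

where we have used the result in Ex. 1.49. The primes $p\le\sqrt y$ are exactly those dividing $q_1P_1$, so the factors with $p|q_1$ cancel and therefore

\[ M(y;P_1) \le y\,e^{\gamma}\Big(\prod_{\substack{p|P\\ p\le\sqrt y}}\Big(1-\frac1p\Big)\Big)\Big(1+O\Big(\frac{1}{\ln y}\Big)\Big). \]

Now the claim follows, because $S(x,y;P)\le S(x,y;P_1)\le M(y;P_1)$ (the first inequality holds because $P_1|P$, the second one by the definition of $M(y;P_1)$).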

3.3. Sifting more classes

In the previous section we have sifted (eliminated) the integers which are not coprime with an integer $P$, i.e. which are in the class $0$ modulo some prime dividing $P$. It is possible to modify the argument in order to consider the more general case where for every prime $p|P$ there are several classes which must be avoided. Let us fix our notation. For each prime $p$ dividing the squarefree $P$ let $B(p)$ be a set of classes modulo $p$ which are the 'bad' classes. We look for an upper bound of

\[ S_B(x,y;P) := \sum_{\substack{x<n\le x+y\\ n\notin B(p)\ \forall p|P}}1. \]

Let $b(p)$ denote the cardinality of $B(p)$: we can assume that $b(p)$ is strictly lower than $p$, otherwise the problem evidently has no solution, and that $b(p)\ge1$, otherwise $p$ can be removed from $P$. Let

\[ a(n) := \prod_{\substack{p|P\\ n\in B(p)}}p; \]

then $n$ is not in a class $B(p)$ for any $p|P$ if and only if $a(n)=1$, and this happens if and only if $(a(n),P)=1$. Therefore,

\[ S_B(x,y;P) = \sum_{\substack{x<n\le x+y\\ (a(n),P)=1}}1 \le \sum_{x<n\le x+y}\ \sum_{\substack{d|a(n)\\ d|P}}\lambda^+_d = \sum_{d|P}\lambda^+_d\sum_{\substack{x<n\le x+y\\ d|a(n)}}1. \]

In order to proceed we need a way to estimate the value of the inner sum. Let $b:\mathbb N\to\mathbb N$ be the completely multiplicative extension of $b$, i.e. $b\big(\prod_jp_j^{\nu_j}\big):=\prod_jb(p_j)^{\nu_j}$. Actually, this formula defines $b$ only at integers $n$ which are divisible only by primes dividing $P$; this is sufficient for us, because only integers of this type will appear as arguments of $b$ in our applications, but to ease some computations it will be useful to define $b(p)$ also for $p\nmid P$ in a proper way (see the proof of Proposition 3.3, for example).
Suppose $d$ is squarefree. Consider a set of $d$ consecutive integers; then for an integer $n$ in this set we have

\[ \begin{aligned}
d\,|\,a(n) &\iff d\ \Big|\prod_{\substack{p|P\\ n\in B(p)}}p \\
&\iff n\in B(p)\ \text{for every } p|d \\
&\iff n\in\text{one of the classes modulo } d \text{ counted by } b(d).
\end{aligned} \]

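Among the integers in $(x,x+y]$ there are $\lfloor y/d\rfloor$ blocks of $d$ consecutive integers, plus an incomplete block containing at most $b(d)$ integers $n$ with $d\,|\,a(n)$, therefore continuing the previous computation we get

\[ \begin{aligned}
S_B(x,y;P) &\le \sum_{d|P}\lambda^+_d\sum_{\substack{x<n\le x+y\\ d|a(n)}}1 = \sum_{d|P}\lambda^+_d\Big(b(d)\Big\lfloor\frac yd\Big\rfloor+O(b(d))\Big) \\
&= y\sum_{d|P}\frac{b(d)}{d}\lambda^+_d + O\Big(\sum_{d|P}b(d)|\lambda^+_d|\Big).
\end{aligned} \]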

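Using Selberg's $\Lambda^2$-method, this bound is written as

\[ \begin{aligned}
S_B(x,y;P) &= \sum_{\substack{x<n\le x+y\\ n\notin B(p)\ \forall p|P}}1 = \sum_{\substack{x<n\le x+y\\ (a(n),P)=1}}1 \le \sum_{x<n\le x+y}\Big(\sum_{\substack{d|a(n)\\ d|P}}\Lambda_d\Big)^2 \\
&= \sum_{\substack{d|P\\ e|P}}\Lambda_d\Lambda_e\sum_{\substack{x<n\le x+y\\ d|a(n),\ e|a(n)}}1 = \sum_{\substack{d|P\\ e|P}}\Lambda_d\Lambda_e\Big(b([d,e])\Big\lfloor\frac{y}{[d,e]}\Big\rfloor+O(b([d,e]))\Big) \\
&= y\sum_{\substack{d|P\\ e|P}}\frac{b([d,e])}{[d,e]}\Lambda_d\Lambda_e + O\Big(\sum_{\substack{d|P\\ e|P}}b([d,e])\,|\Lambda_d\Lambda_e|\Big).
\end{aligned} \tag{3.9} \]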

Now we can proceed with the usual approach, i.e. diagonalizing and minimizing the first quadratic sum. We have

\[ \sum_{\substack{d|P\\ e|P}}\frac{b([d,e])}{[d,e]}\Lambda_d\Lambda_e = \sum_{\substack{d|P\\ e|P}}\frac{b(d)\Lambda_d}{d}\,\frac{b(e)\Lambda_e}{e}\,\frac{(d,e)}{b((d,e))}, \]

because $[d,e](d,e)=de$ and $b([d,e])\,b((d,e))=b(d)\,b(e)$. Now, let $g:\mathbb N\to\mathbb Q$ be the completely multiplicative map such that

\[ g(p) := \frac{b(p)}{p-b(p)}. \]

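Then, for $m$ squarefree

\[ \frac{m}{b(m)} = \prod_{p|m}\frac{p}{b(p)} = \prod_{p|m}\Big(1+\frac{p-b(p)}{b(p)}\Big) = \prod_{p|m}\Big(1+\frac{1}{g(p)}\Big) = \sum_{f|m}\frac{1}{g(f)}, \]

thus

\[ \begin{aligned}
\sum_{\substack{d|P\\ e|P}}\frac{b([d,e])}{[d,e]}\Lambda_d\Lambda_e &= \sum_{\substack{d|P\\ e|P}}\frac{b(d)\Lambda_d}{d}\,\frac{b(e)\Lambda_e}{e}\,\frac{(d,e)}{b((d,e))} = \sum_{\substack{d|P\\ e|P}}\frac{b(d)\Lambda_d}{d}\,\frac{b(e)\Lambda_e}{e}\sum_{\substack{f|d\\ f|e}}\frac{1}{g(f)} \\
&= \sum_{f|P}\frac{1}{g(f)}\Big(\sum_{d:\ f|d|P}\frac{b(d)\Lambda_d}{d}\Big)^2 = \sum_{f|P}\frac{1}{g(f)}\,y_f^2
\end{aligned} \]

where

\[ y_f := \sum_{d:\ f|d|P}\frac{b(d)\Lambda_d}{d}. \]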


Once again this linear transformation can be inverted, obtaining

\[ \frac{b(d)\Lambda_d}{d} = \sum_{f:\ d|f|P}y_f\,\mu(f/d), \tag{3.10} \]

so that the condition $\Lambda_1=1$ becomes equivalent to the restriction $\sum_{f|P}y_f\,\mu(f)=1$, and the cut-off condition $\Lambda_d=0$ for $d\ge z$ to the cut-off condition $y_f=0$ for $f\ge z$.

Using the general result in Ex. 3.2 we conclude that the minimum for the quadratic form in the $\Lambda_d$ is equal to $1/L_P(z)$, where

\[ L_P(z) := \sum_{\substack{n\le z\\ n|P}}\mu(n)^2g(n), \]

and is reached when

\[ y_f = \mu(f)\,g(f)/L_P(z)\ \text{ for } f\le z,\qquad 0 \text{ otherwise.} \tag{3.11} \]

In order to get the result for twin primes we need to analyze in greater detailthe situation where b(2) = 1 and b(p) = 2 for every p > 2.

Proposition 3.3 Let $z\ge1$ and let $P=\prod_{p\le z}p$; suppose $b(2)=1$ and $b(p)=2$ for every odd prime $p$ dividing $P$. Then,

\[ S_B(x,y;P) \le \frac{2c\,y}{\ln^2z} + O\Big(\frac{y}{\ln^3z}\Big) + O\Big(\frac{z^2}{\ln^2z}\Big), \]

where $c := 2\prod_{p>2}\Big(1-\dfrac{1}{(p-1)^2}\Big) = 1.320323\ldots$.

Proof. Our choice for $P$ gives

\[ L_P(z) = L(z) := \sum_{n\le z}\mu(n)^2g(n). \]

Main term. Our first task is to deduce a lower bound for $L(z)$. The definition of $g(p)$ in our case becomes

\[ g(p) = \frac{b(p)}{p-b(p)} = \begin{cases}1 & \text{if } p=2,\\[2pt] \dfrac{2}{p-2} & \text{if } 2<p\le z,\end{cases} \]

but we extend $g(p)=\frac{2}{p-2}$ to all odd primes (hence, also to those larger than $z$). This assumption does not affect the number $L_P(z)$, but simplifies the proof of its asymptotic behavior. We consider the Dirichlet series associated with $\mu(n)^2g(n)$; since this function is multiplicative, the Dirichlet series has an Euler product:

\[ G(s) := \sum_{n=1}^{\infty}\frac{\mu(n)^2g(n)}{n^s} = (1+2^{-s})\prod_{p>2}\Big(1+\frac{2}{p-2}\,\frac{1}{p^s}\Big). \]


The main term in this Euler product has order $2/p^{s+1}$, which is also the main order in $\zeta^2(s+1)$, thus we write $G(s)$ as $\zeta^2(s+1)H(s)$, where $H(s)$ is hence equal to

\[ H(s) = \sum_{n=1}^{\infty}h(n)\,n^{-s} = (1+2^{-s})(1-2^{-1-s})^2\prod_{p>2}\Big(1+\frac{2}{p-2}\,\frac{1}{p^s}\Big)\Big(1-\frac1p\,\frac{1}{p^s}\Big)^2. \]

Note that the Euler factors in $H(s)$ for $p>2$ are equal to

\[ 1+\frac{4}{p-2}\,\frac{1}{p^{s+1}}-\frac{4}{p-2}\,\frac{1}{p^{2s+1}}+\frac{1}{p^{2s+2}}+\frac{2}{p-2}\,\frac{1}{p^{3s+2}}, \]

thus their product converges absolutely for $\mathrm{Re}(s)>-1/2$. The representation of $\zeta^2(s+1)$ as a Dirichlet series is $\sum_{n=1}^{\infty}[d(n)/n]/n^s$, where $d(n)$ denotes the divisor function, therefore the identity $G(s)=\zeta^2(s+1)H(s)$ gives

\[ \mu(n)^2g(n) = \sum_{m|n}\frac{d(m)}{m}\,h\Big(\frac nm\Big). \]

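As a consequence

\[ L(z) = \sum_{n\le z}\mu(n)^2g(n) = \sum_{uv\le z}\frac{d(u)}{u}\,h(v) = \sum_{v\le z}h(v)\sum_{u\le z/v}\frac{d(u)}{u}. \]

By partial summation we see that

\[ \sum_{u\le m}\frac{d(u)}{u} = \frac{\sum_{u\le m}d(u)}{m} + \int_1^m\frac{\sum_{r\le u}d(r)}{u^2}\,du \]

and recalling Dirichlet's result $\sum_{u\le m}d(u)=m\ln m+O(m)$ (see Prop. 1.15) we get

\[ \sum_{u\le m}\frac{d(u)}{u} = \int_1^m\frac{u\ln u}{u^2}\,du+O(\ln m) = \frac12\ln^2m+O(\ln m), \]

therefore

\[ \begin{aligned}
L(z) &= \sum_{v\le z}h(v)\sum_{u\le z/v}\frac{d(u)}{u} = \sum_{v\le z}h(v)\Big(\frac12\ln^2(z/v)+O(\ln(z/v))\Big) \\
&= \sum_{v\le z}h(v)\Big(\frac12\big(\ln^2z-2\ln z\ln v+\ln^2v\big)+O(\ln z)\Big).
\end{aligned} \]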

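Since the Euler product converges absolutely for $\mathrm{Re}(s)>-1/2$, we have

\[ \sum_{v\le z}|h(v)| \ll 1, \quad\text{since } \sum_{n=1}^{\infty}\frac{|h(n)|}{n^s} \text{ converges at } s=0, \]

\[ \sum_{v\le z}|h(v)|\ln v \ll 1, \quad\text{since it gives the 1st derivative of } \sum_{n=1}^{\infty}\frac{|h(n)|}{n^s} \text{ at } s=0, \]

\[ \sum_{v\le z}|h(v)|\ln^2v \ll 1, \quad\text{since it gives the 2nd derivative of } \sum_{n=1}^{\infty}\frac{|h(n)|}{n^s} \text{ at } s=0, \]

\[ \sum_{v\le z}h(v) = H(0)+O(1/z^{\theta}) \quad \forall\theta<1/2, \]

so that

\[ L(z) = \frac{H(0)}{2}\ln^2z+O(\ln z). \]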

Note that

\[ H(0) = \sum_{n=1}^{\infty}h(n) = \frac12\prod_{p>2}\Big(1+\frac{2}{p-2}\Big)\Big(1-\frac1p\Big)^2 = \frac12\prod_{p>2}\Big(1-\frac{1}{(p-1)^2}\Big)^{-1} = c^{-1}. \]
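The numerical value of $c$ quoted in Proposition 3.3 can be reproduced from its Euler product. The Python sketch below is ours (the truncation point of the product is an arbitrary choice): it evaluates the partial product $2\prod_{2<p\le N}\big(1-\frac{1}{(p-1)^2}\big)$, which decreases to $c$ as $N$ grows.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes returning the list of primes p <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # clear the multiples of p starting from p*p
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def twin_constant_c(limit):
    """Partial product 2 * prod_{2 < p <= limit} (1 - 1/(p-1)^2)."""
    c = 2.0
    for p in primes_up_to(limit):
        if p > 2:
            c *= 1.0 - 1.0 / (p - 1) ** 2
    return c

print(twin_constant_c(10 ** 6))
```

The neglected tail is of size about $\sum_{p>N}(p-1)^{-2}$, so a truncation at $N=10^6$ already agrees with the six decimals given in the text.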

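Error term. According to (3.9) the error term is equal to

\[ \sum_{\substack{d|P\\ e|P}}b([d,e])\,|\Lambda_d\Lambda_e| = \sum_{\substack{d|P\\ e|P}}\frac{b(d)\,b(e)}{b((d,e))}\,|\Lambda_d\Lambda_e| \le \sum_{\substack{d|P\\ e|P}}b(d)\,b(e)\,|\Lambda_d\Lambda_e| = \Big(\sum_{d|P}b(d)|\Lambda_d|\Big)^2, \]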

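where we have used the fact that $b(m)\ge1$ for every $m$ dividing $P$ to simplify the bound.
Recalling the relation (3.10) giving $\Lambda_d$ as a function of $y_f$ and (3.11) giving our choice for $y_f$, we have

\[ \sum_{d|P}b(d)|\Lambda_d| = \sum_{d|P}d\,\Big|\sum_{f:\ d|f|P}y_f\,\mu(f/d)\Big| = \sum_{d|P}d\,\Big|\sum_{\substack{f:\ f\le z\\ d|f|P}}\mu(f)\mu(f/d)\,g(f)/L(z)\Big|, \]

therefore, using the coprimality of $d$ and $f/d$, we have

\[ \begin{aligned}
\sum_{d|P}b(d)|\Lambda_d| &\le \frac{1}{L(z)}\sum_{\substack{d\le z\\ d|P}}\mu^2(d)\,d\,g(d)\sum_{f/d\le z/d}\mu^2(f/d)\,g(f/d) \\
&\le \frac{1}{L(z)}\sum_{d\le z}\mu^2(d)\,d\,g(d)\sum_{m\le z/d}\mu^2(m)\,g(m) \\
&= \frac{1}{L(z)}\sum_{m\le z}\mu^2(m)\,g(m)\sum_{d\le z/m}\mu^2(d)\,d\,g(d).
\end{aligned} \tag{3.12} \]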

Hence we need an upper bound for $\sum_{d\le X}\mu^2(d)\,d\,g(d)$. This bound can be obtained in the usual way, considering the Dirichlet series associated with the arithmetical function $\mu^2(n)\,n\,g(n)$:

\[ G(s) := \sum_{n=1}^{\infty}\frac{\mu^2(n)\,n\,g(n)}{n^s} = (1+2^{1-s})\prod_{p>2}\Big(1+\frac{2p}{p-2}\,\frac{1}{p^s}\Big). \]

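The main term appearing in the Euler product decays as $2p^{-s}$, which is the same decay as for $\zeta^2(s)$, therefore we write $G(s)$ as $\zeta^2(s)H(s)$, where

\[ H(s) := \sum_{n=1}^{\infty}\frac{h'(n)}{n^s} = (1+2^{1-s})(1-2^{-s})^2\prod_{p>2}\Big(1+\frac{2p}{p-2}\,\frac{1}{p^s}\Big)\Big(1-\frac{1}{p^s}\Big)^2, \]

which converges absolutely for $\mathrm{Re}(s)>1/2$. Writing $\sum_{d\le X}\mu^2(d)\,d\,g(d)$ in terms of $d(n)$ and $h'(n)$, we have

\[ \begin{aligned}
\sum_{d\le X}\mu^2(d)\,d\,g(d) &= \sum_{uv\le X}d(u)\,h'(v) = \sum_{v\le X}h'(v)\sum_{u\le X/v}d(u) \\
&\ll \sum_{v\le X}|h'(v)|\,\frac Xv\ln(X/v) \ll X\sum_{v\le X}\frac{|h'(v)|}{v}\ln X \ll X\ln X,
\end{aligned} \]

where we have used Proposition 1.15 to bound $\sum_{u\le X/v}d(u)$ and the absolute convergence of $H(s)$ at $s=1$ to conclude that $\sum_{v\le X}\frac{|h'(v)|}{v}\ll1$.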

With this result (3.12) becomes

\[ \sum_{d|P}b(d)|\Lambda_d| \ll \frac{1}{L(z)}\sum_{m\le z}\mu^2(m)\,g(m)\,\frac zm\ln(z/m) \ll \frac{z\ln z}{L(z)}\sum_{m\le z}\frac{\mu^2(m)\,g(m)}{m}. \]

The last sum converges to $G(2)$ for $z\to\infty$, and $L(z)$ has order $\ln^2z$, thus the result follows.

Recall that two primes $p$, $q$ are called twin primes when $p-q=\pm2$. At present it is not known whether the set of twin primes is finite or infinite. Indeed, the first modern application of sieve methods was given by Brun precisely in connection with this problem.

Theorem 3.3 (Brun) Let $S_2(x,y)$ denote the number of twin prime pairs in $(x,x+y]$. Then,

\[ S_2(x,y) \le \frac{8c\,y}{\ln^2y}\Big(1+O\Big(\frac{\ln\ln y}{\ln y}\Big)\Big). \]

In particular, let $\pi_2(x)$ be the number of twin prime pairs in $(1,x)$. Then

\[ \pi_2(x) \le \frac{8c\,x}{\ln^2x}\Big(1+O\Big(\frac{\ln\ln x}{\ln x}\Big)\Big). \]


Proof. Let $P=\prod_{p\le z}p$ for some $z>1$. Let $p$, $p+2$ be a twin prime pair in $(x,x+y]$. There are three possibilities:

1. $p\not\equiv0\pmod{p_j}$ and $p\not\equiv-2\pmod{p_j}$ for every $p_j$ dividing $P$,
2. $p\equiv0\pmod{p_j}$ for some $p_j$ dividing $P$,
3. $p\equiv-2\pmod{p_j}$ for some $p_j$ dividing $P$.

The primes in case 1 are evidently at most $S_B(x,y;P)$, those in case 2 are at most $\pi(z)$, since then $p=p_j$, and those in case 3 are at most $\pi(z)$ as well, because $p+2$ is by hypothesis a prime, so that $p\equiv-2\pmod{p_j}$ implies that $p+2=p_j$. As a consequence

\[ S_2(x,y) \le 2\pi(z)+S_B(x,y;P) \]

so that the claim follows from Proposition 3.3 with $z=\sqrt y/\ln y$. The claim for $\pi_2(x)$ follows immediately, because $\pi_2(x)\le S_2(1,x)$.
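To get a feeling for the strength of Theorem 3.3 one can compare actual counts with the main term $8c\,x/\ln^2x$. The Python sketch below is an illustration only: the function names are ours, we count the primes $p\le x$ with $p+2$ prime (which differs from $\pi_2(x)$ as defined above by at most one), and we ignore the factor $1+O(\ln\ln x/\ln x)$.

```python
from math import log

def prime_flags(n):
    """flags[m] == 1 exactly when m <= n is prime (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return flags

def twin_prime_count(x):
    """Number of primes p <= x such that p + 2 is also prime."""
    flags = prime_flags(x + 2)
    return sum(1 for p in range(2, x + 1) if flags[p] and flags[p + 2])

C = 1.320323  # the constant c of Proposition 3.3, as quoted in the text
for x in (10 ** 3, 10 ** 4, 10 ** 5):
    print(x, twin_prime_count(x), 8 * C * x / log(x) ** 2)
```

In these ranges the count is well below the main term of the bound, which is consistent with the sieve bound losing a constant factor.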

As we see, we do not know whether there are infinitely many twin primes, but for sure they constitute a set of primes having zero density with respect to the full set of primes, because

\[ \frac{\pi_2(x)}{\pi(x)} = \frac{\#\{p:\ p,p+2\in\mathcal P,\ p\le x\}}{\#\{p:\ p\in\mathcal P,\ p\le x\}} \ll \frac{x/\ln^2x}{x/\ln x} \to 0. \]

Actually, this set is small enough to produce a converging series when the inverses of its elements are summed. In fact

\[ \sum_{\substack{p:\ p,p+2\in\mathcal P\\ p\le x}}\frac1p \le \frac{\pi_2(x)}{x}+\int_2^x\frac{\pi_2(u)}{u^2}\,du \ll O(1)+\int_2^x\frac{u}{u^2\ln^2u}\,du \ll 1. \]
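The convergence just proved can be observed on partial sums. The following Python sketch is ours (function names and cut-offs are arbitrary): it sums $1/p$ over the primes $p\le x$ for which $p+2$ is also prime, and shows how slowly the partial sums grow.

```python
def is_prime(n):
    """Trial-division primality test, adequate for the small n used here."""
    if n < 2:
        return False
    return all(n % q for q in range(2, int(n ** 0.5) + 1))

def twin_reciprocal_sum(x):
    """Sum of 1/p over primes p <= x such that p + 2 is also prime."""
    return sum(1.0 / p for p in range(2, x + 1) if is_prime(p) and is_prime(p + 2))

for x in (10 ** 2, 10 ** 3, 10 ** 4):
    print(x, twin_reciprocal_sum(x))
```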

Let $r$ be a positive, even integer, and let $\mathcal P_r$ be the set of primes $p$ such that $p+r$ is a prime, too. In this setting the twin primes correspond to $r=2$. To what extent do we have to modify our argument in order to get an upper bound also for the density of $\mathcal P_r$? We notice that the classes $0\pmod p$ and $-r\pmod p$ are distinct if and only if $p\nmid r$. This means that $b(p)$ equals $1$ (not $2$) for those odd primes dividing $r$. Hence, when we repeat the proof of Proposition 3.3 we meet a different function $H_r(s)$ which, in terms of the previous one, is equal to

\[ H_r(s) = H(s)\prod_{\substack{p|r\\ p>2}}\frac{1+\frac{1}{(p-1)p^s}}{1+\frac{2}{(p-2)p^s}}. \]

The change involves only a finite set of primes, therefore the argument runs exactly as before, the only difference being now that

\[ H_r(0) = H(0)\prod_{\substack{p|r\\ p>2}}\frac{p-2}{p-1}. \]


This proves the following result.

Theorem 3.4 Let $r$ be a positive and even integer; let $S_r(x,y)$ denote the number of primes $p$ in $(x,x+y]$ such that also $p+r$ is a prime. Then,

\[ S_r(x,y) \le \frac{8c(r)\,y}{\ln^2y}\Big(1+O\Big(\frac{\ln\ln y}{\ln y}\Big)\Big), \]

where $c(r) := c\prod_{\substack{p|r\\ p>2}}\dfrac{p-1}{p-2}$.

Tracing back the proof of this result, we can see that the claim is uniform in $r$, i.e., that it is correct also when $r$ changes its value with $x$ and/or $y$. This is an important remark: for Theorem 3.5 we will need this claim with $r\asymp y$, indeed.

3.4. Two sets with positive density

We conclude the chapter with two typical results which will be further analyzed in the next chapter. The first one involves the sum of two primes, a topic which is intimately connected with Goldbach's conjecture: is it true that every integer $N$ can be written as a sum of two primes? Since there is only one even prime, the question is interesting only when $N$ is even. Apart from this restriction, there are many good reasons to believe that the answer should be positive and that, besides, the number of representations should be quite large. In fact, there are about $N/\ln N$ primes up to $N$, hence there are about $(N/\ln N)^2$ couples of primes and $\approx2N$ integers, therefore we expect that 'on average' each integer $2N$ can be represented in about $2N/\ln^2N$ different ways. Let

\[ R_{\mathcal P+\mathcal P}(2N) := \sum_{\substack{p,q\in\mathcal P\\ p+q=2N}}1, \]

then this simple probabilistic model suggests that

\[ \text{(Conjecture)}\qquad R_{\mathcal P+\mathcal P}(2N) \asymp \frac{2N}{\ln^2N}. \]

The previous argument assumes that the choice for $p$ and $q$ can be considered as independent, but this is unrealistic, since the equation $p+q=2N$ shows that if $p$ is randomly taken, then $q$ must belong to a well defined class modulo any prime dividing $2N$. A more refined probabilistic model which takes account of these dependencies shows that

\[ \text{(refined Conjecture)}\qquad R_{\mathcal P+\mathcal P}(2N) \asymp \prod_{\substack{p|N\\ p\text{ odd}}}\Big(1+\frac1p\Big)\frac{2N}{\ln^2N}, \]

where the implicit constant is independent of $2N$. The known results are quite distant from the statement of the conjecture, but still surprising; for example:

1) In 1975 Montgomery and Vaughan proved that the exceptions to the conjecture, i.e. the set of even integers for which the conjecture fails (if it is not empty), contains at most $N^{1-\delta}$ integers up to $N$, where $\delta$ is a small positive constant. Several people worked on this problem producing larger and larger values for $\delta$. Up to now the record goes to Pintz, who proved this claim with $\delta=0.28$ in 2018.

2) In 1973 Chen proved that every even (and large enough) integer can be written as a sum of two primes or as a sum of a prime and a product of two primes.

3) In 1937 Vinogradov proved that the analogous conjecture about the representability of an odd integer as a sum of three primes (the so-called Goldbach's ternary problem) is true for integers larger than $m_0$, and the number of representations agrees with the conjectured one.

4) The original argument of Vinogradov does not produce an explicit value for $m_0$, but later explicit bounds have been found, up to the spectacular result of Helfgott in 2014 completely solving the problem: one can take $m_0=5$, i.e. every odd number larger than 5 may be written as a sum of three prime numbers.

Our results in sieve theory are sufficiently strong to prove the following fact.

Theorem 3.5

\[ R_{\mathcal P+\mathcal P}(2N) \ll \prod_{\substack{p|N\\ p\text{ odd}}}\Big(1+\frac1p\Big)\frac{2N}{\ln^2N}. \]

In other words, we do not know if 2N can be written as sum of two primes,but if this is possible, then the number of such representations is essentially of thesame order which is foreseen by the conjecture.

Proof. As in Proposition 3.3 take $P=\prod_{p\le z}p$, with $z=\sqrt N/\ln N$. Let $p$ be a prime such that $2N-p$ is a prime as well. There are three cases:

Case 1. $p$ divides $P$;
Case 2. $2N-p$ divides $P$;
Case 3. both $p$ and $2N-p$ are coprime to $P$.

An upper bound for primes in Cases 1 and 2 is $\pi(z)$, while an upper bound for primes in Case 3 is $S_B(1,2N;P)$. Proposition 3.3 gives a bound for $S_B(1,2N;P)$, which we must correct by a factor $\prod_{p|N,\ p>2}\frac{p-1}{p-2}$ to take account (as in the proof of Theorem 3.4) of the fact that in the present case the classes $0\pmod p$ and $2N\pmod p$ coincide when $p$ is a divisor of $N$.

The proof concludes by noticing that

\[ {\prod}_1 := \prod_{\substack{p|N\\ p\text{ odd}}}\frac{p-1}{p-2} \qquad\text{and}\qquad {\prod}_2 := \prod_{\substack{p|N\\ p\text{ odd}}}\Big(1+\frac1p\Big) \]

are of the same size, i.e. that there exist two positive constants $c_1<c_2$ independent of $N$ such that

\[ c_1{\prod}_1 \le {\prod}_2 \le c_2{\prod}_1. \]
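The size of $R_{\mathcal P+\mathcal P}(2N)$ can be compared with the right-hand side of Theorem 3.5 on small examples. The Python sketch below is an illustration with our own function names; it counts ordered pairs $(p,q)$, as in the definition of $R_{\mathcal P+\mathcal P}$, and evaluates the product and the quotient $2N/\ln^2N$ without any implicit constant.

```python
from math import log

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % q for q in range(2, int(n ** 0.5) + 1))

def goldbach_count(n):
    """R(n): number of ordered pairs of primes (p, q) with p + q = n."""
    return sum(1 for p in range(2, n - 1) if is_prime(p) and is_prime(n - p))

def refined_size(two_n):
    """prod_{p | N, p odd} (1 + 1/p) * 2N / ln^2 N, with N = two_n / 2."""
    N = two_n // 2
    product = 1.0
    for p in range(3, N + 1, 2):
        if N % p == 0 and is_prime(p):
            product *= 1.0 + 1.0 / p
    return product * two_n / log(N) ** 2

for two_n in (100, 1000, 2310, 4096):
    print(two_n, goldbach_count(two_n), refined_size(two_n))
```

Values of $2N$ with many small odd prime factors (such as $2310$) have visibly more representations than powers of $2$ of comparable size, which is the effect described by the product over $p|N$.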

Theorem 3.6 (Schnirelmann) Let $S_{\mathcal P+\mathcal P}$ be the set of integers which can be written as a sum of two primes. Then $\#S_{\mathcal P+\mathcal P}\cap(0,x)\gg x$.

Proof. The result follows quite directly from a clever use of the Cauchy–Schwarz inequality and the upper bound for $R_{\mathcal P+\mathcal P}$ we have found in Theorem 3.5. In fact, let $\{a_n\}_{n=1}^{k}$ be a set of complex numbers, and set $\delta_n=1$ when $a_n\ne0$, $\delta_n=0$ otherwise. Then

\[ \sum_{n=1}^{k}|a_n| = \sum_{n=1}^{k}\delta_n|a_n| \le \Big[\sum_{n=1}^{k}\delta_n^2\Big]^{1/2}\Big[\sum_{n=1}^{k}|a_n|^2\Big]^{1/2} = \big[\#\{n:\ a_n\ne0\}\big]^{1/2}\Big[\sum_{n=1}^{k}|a_n|^2\Big]^{1/2} \]

thus

\[ \#\{n:\ a_n\ne0\} \ge \Big[\sum_{n=1}^{k}|a_n|\Big]^2\Big/\Big[\sum_{n=1}^{k}|a_n|^2\Big]. \tag{3.13} \]

This inequality shows that a lower bound for $\#\{n:\ a_n\ne0\}$ comes from a lower bound for the $\ell^1$ norm of the sequence and an upper bound for its $\ell^2$ norm. In the present case we apply the previous idea with $a_n:=R_{\mathcal P+\mathcal P}(n)$, so that the number of $a_n\ne0$ with $n\le x$ becomes the number of integers below $x$ which can be written as a sum of two primes.

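The number $p+q$ is lower than $X$ whenever $p$, $q$ are both lower than $X/2$. Hence there are $\pi(X/2)^2\gg X^2/\ln^2X$ such couples, so that

\[ \sum_{n\le X}R_{\mathcal P+\mathcal P}(n) \ge \sum_{p,q\le X/2}1 = \pi(X/2)^2 \gg X^2/\ln^2X. \tag{3.14} \]

On the other hand, using the upper bound in Theorem 3.5 we have:

\[ \sum_{n\le X}R^2_{\mathcal P+\mathcal P}(n) \ll \sum_{n\le X}\prod_{\substack{p|n\\ p\text{ odd}}}\Big(1+\frac1p\Big)^2\frac{n^2}{\ln^4n} \ll \frac{X^2}{\ln^4X}\sum_{n\le X}\prod_{p|n}\Big(1+\frac1p\Big)^2 \]

(in the last inequality we have used the fact that $x/\ln^2x$ grows for $x$ large enough).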

Let $f$ be the arithmetical function which is defined by $f(n):=\prod_{p|n}\big(1+\frac1p\big)^2$. To get an upper bound for $\sum_{n\le X}f(n)$ we consider the Dirichlet series

\[ \sum_{n=1}^{\infty}\frac{f(n)}{n^s} = \prod_p\Big(1+\frac{(1+1/p)^2}{p^s-1}\Big) \]

that we write as $\zeta(s)G(s)$. The Euler product defining $G(s)$ converges absolutely for $\mathrm{Re}(s)>0$. Let $G(s)$ be written as a Dirichlet series, $G(s)=\sum_{n=1}^{\infty}g(n)\,n^{-s}$; then

\[ \sum_{n\le X}f(n) = \sum_{uv\le X}g(v) = \sum_{v\le X}g(v)\Big\lfloor\frac Xv\Big\rfloor = X\sum_{v\le X}\frac{g(v)}{v}+O\Big(\sum_{v\le X}|g(v)|\Big) \ll X, \]

where the last bound follows from the fact that the series defining $G(s)$ converges absolutely for $\mathrm{Re}(s)>0$.
In this way we have proved that

\[ \sum_{n\le X}R^2_{\mathcal P+\mathcal P}(n) \ll \frac{X^3}{\ln^4X}. \tag{3.15} \]

Now we can cast (3.14) and (3.15) into (3.13), to get

\[ \#S_{\mathcal P+\mathcal P}\cap(0,x) \gg \frac{x^4/\ln^4x}{x^3/\ln^4x} \gg x. \]
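The mechanism of the proof, inequality (3.13) applied to $a_n=R_{\mathcal P+\mathcal P}(n)$, can be observed directly. The Python sketch below is ours (function names and the value of $X$ are arbitrary choices): it tabulates $R_{\mathcal P+\mathcal P}(n)$ for $n\le X$ and compares the number of representable integers with the Cauchy–Schwarz lower bound.

```python
def primes_up_to(n):
    """Primes p <= n by trial division (adequate for small n)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def representation_counts(X):
    """R[n] = number of ordered pairs of primes (p, q) with p + q = n, for 0 <= n <= X."""
    primes = primes_up_to(X)
    R = [0] * (X + 1)
    for p in primes:
        for q in primes:
            if p + q > X:
                break            # primes are increasing, so no further q can work
            R[p + q] += 1
    return R

def cauchy_schwarz_lower_bound(a):
    """Right-hand side of (3.13): (sum |a_n|)^2 / (sum |a_n|^2)."""
    s1 = sum(abs(v) for v in a)
    s2 = sum(v * v for v in a)
    return s1 * s1 / s2 if s2 else 0.0

X = 2000
R = representation_counts(X)
representable = sum(1 for v in R if v != 0)
print(representable, cauchy_schwarz_lower_bound(R), X)
```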

An analogous argument proves the following result.

Theorem 3.7 (Romanoff) Let $S_{\mathcal P+2^{\mathbb N}}$ be the set of integers which can be written as a sum of a prime and a power of 2. Then $\#S_{\mathcal P+2^{\mathbb N}}\cap(0,x)\gg x$.

In some sense it is still more surprising than Schnirelmann’s result: adding tothe zero density set of primes an even thinner set (the powers of 2) we still get aset of positive density.

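Proof. Again we use the Cauchy–Schwarz inequality to deduce the claim from an upper bound for $\sum_{n\le X}R^2_{\mathcal P+2^{\mathbb N}}(n)$ and a lower bound for $\sum_{n\le X}R_{\mathcal P+2^{\mathbb N}}(n)$.
The lower bound can be deduced easily: there are $\pi(X/2)$ primes up to $X/2$ and $\gg\ln X$ powers of 2 below $X/2$, therefore

\[ \sum_{n\le X}R_{\mathcal P+2^{\mathbb N}}(n) \ge \sum_{\substack{p\le X/2\\ 2^y\le X/2}}1 \gg \pi(X/2)\ln X \gg X. \tag{3.16} \]

Besides, we have

\[ \sum_{n\le X}R^2_{\mathcal P+2^{\mathbb N}}(n) = \sum_{\substack{p_1,p_2,k,j\\ p_1+2^k\le X\\ p_2+2^j\le X\\ p_1+2^k=p_2+2^j}}1. \]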


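The contribution to this sum of the 'diagonal' terms, i.e. those terms with $k=j$ (and hence $p_1=p_2$) is $\ll X$, since there are $\pi(X)\sim X/\ln X$ choices for $p_1$ and $\ll\ln X$ choices for $k$. The non-diagonal terms contribute by

\[ \sum_{k<j\le\log_2X}\ \sum_{\substack{p_1,p_2\le X\\ p_1-p_2=2^j-2^k}}1 = \sum_{k<j\le\log_2X}\pi_2(X,2^j-2^k) \]

where $\pi_2(X,r)$ denotes the number of couples $p$, $q$ of primes lower than $X$ such that $p-q=r$. Using the result in Theorem 3.4 we obtain

\[ \sum_{\substack{k,j\\ k<j\le\log_2X}}\ \sum_{\substack{p_1,p_2\le X\\ p_1-p_2=2^j-2^k}}1 \ll \sum_{\substack{k,j\\ k<j\le\log_2X}}\ \prod_{\substack{p|2^{j-k}-1\\ p\text{ odd}}}\Big(1+\frac1p\Big)\frac{X}{\ln^2X} \ll \frac{X\ln X}{\ln^2X}\sum_{m\le\log_2X}\ \prod_{\substack{p|2^m-1\\ p\text{ odd}}}\Big(1+\frac1p\Big). \]

By multiplicativity $\prod_{p|2^m-1,\ p\text{ odd}}\big(1+\frac1p\big)=\sum_{d|2^m-1,\ d\text{ odd}}\frac{\mu(d)^2}{d}\le\sum_{d|2^m-1,\ d\text{ odd}}\frac1d$, thus the previous quantity is

\[ \ll \frac{X}{\ln X}\sum_{m\le\log_2X}\ \sum_{\substack{d|2^m-1\\ d\text{ odd}}}\frac1d. \]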

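Finally, exchanging the order of the sums and recalling the contribution of the diagonal terms we get

\[ \sum_{n\le X}R^2_{\mathcal P+2^{\mathbb N}}(n) \ll \frac{X}{\ln X}\sum_{\substack{d\le X\\ d\text{ odd}}}\frac1d\sum_{\substack{m\le\log_2X\\ d|2^m-1}}1+O(X). \]

Let $h_2(d)$ be the order of 2 modulo $d$; then $\sum_{m\le\log_2X,\ d|2^m-1}1\ll\ln X/h_2(d)$, therefore

\[ \sum_{n\le X}R^2_{\mathcal P+2^{\mathbb N}}(n) \ll X\sum_{\substack{d\le X\\ d\text{ odd}}}\frac{1}{d\,h_2(d)}+O(X). \]

The difficult part of the proof is proving that $\sum_{d\text{ odd}}\frac{1}{d\,h_2(d)}<+\infty$. The original proof of this fact was quite difficult, but Erdős simplified it in 1950, and now it runs as follows. We notice that the definition of $h_2(d)$ implies that $d\,|\,(2^{h_2(d)}-1)$. Thus, set

\[ a_n := \sum_{\substack{d\text{ odd}\\ h_2(d)=n}}\frac1d, \]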


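and let $Q:=\prod_{n\le X}(2^n-1)$. If $h_2(d)\le X$, then $d\,|\,Q$, so that

\[ \sum_{n\le X}a_n \le \sum_{m|Q}\frac1m \le \prod_{p|Q}\Big(1+\frac1p+\frac1{p^2}+\cdots\Big) = \frac{Q}{\varphi(Q)} \ll \ln\ln Q \]

where for the last step we have used the bound $\frac{\varphi(n)}{n}\gg\frac{1}{\ln\ln n}$ which holds as $n$ diverges (see Footnote 6).

The definition of $Q$ shows that $Q\le2^{\sum_{n\le X}n}\le2^{X^2}$, therefore

\[ \sum_{n\le X}a_n \ll \ln X. \]

By partial summation we conclude that

\[ \sum_{\substack{d\le X\\ d\text{ odd}}}\frac{1}{d\,h_2(d)} \le \sum_{n\le X}\frac{a_n}{n} \ll 1. \]

In this way we have proved that

\[ \sum_{n\le X}R^2_{\mathcal P+2^{\mathbb N}}(n) \ll X. \tag{3.17} \]

By (3.16), (3.17) and the Cauchy–Schwarz trick (3.13), we finally conclude that

\[ \#S_{\mathcal P+2^{\mathbb N}}\cap(0,x) \gg \frac{x^2}{x} \gg x. \]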
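Romanoff's theorem can be illustrated by counting the integers of the form $p+2^k$ directly. The Python sketch below is ours; it assumes that the powers of 2 start from $2^0=1$ (so $k\ge0$), and the function names and the cut-off are arbitrary.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % q for q in range(2, int(n ** 0.5) + 1))

def romanoff_representable(x):
    """flags[n] is True when n <= x can be written as p + 2^k, p prime, k >= 0."""
    primes = [p for p in range(2, x + 1) if is_prime(p)]
    flags = [False] * (x + 1)
    power = 1                      # assumption: the smallest admitted power of 2 is 2^0
    while power <= x:
        for p in primes:
            if p + power > x:
                break
            flags[p + power] = True
        power *= 2
    return flags

x = 10 ** 4
flags = romanoff_representable(x)
print(sum(flags) / x)              # proportion of integers n <= x of the form p + 2^k
```

The smallest odd integer greater than 3 which is missed is 127, a classical example going back to de Polignac's question.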

Let $\underline d$ and $\overline d$ be respectively the liminf and the limsup of the quotient

\[ \frac1x\,\#S_{\mathcal P+2^{\mathbb N}}\cap(0,x) \]

6 Its proof is the following. We split the set of primes dividing $n$ into two sets, those which are 'small' (i.e. smaller than $\ln n$) and those which are 'large' (i.e. at least $\ln n$), getting

\[ \frac{\varphi(n)}{n} = \prod_{p|n}\Big(1-\frac1p\Big) = \prod_{\substack{p|n\\ p<\ln n}}\Big(1-\frac1p\Big)\cdot\prod_{\substack{p|n\\ p\ge\ln n}}\Big(1-\frac1p\Big). \]

The first term $\prod_{p|n,\ p<\ln n}\big(1-\frac1p\big)$ is $\gg\frac{1}{\ln\ln n}$, by Mertens' result (see Prop. 1.14). Let $p_1,\dots,p_\rho$ be the distinct prime factors of $n$ which are $\ge\ln n$. Then

\[ \ln^{\rho}n \le p_1\cdots p_\rho \le n \]

proving that $\rho$ is bounded by $\ln n/\ln\ln n$. Therefore we have

\[ \prod_{\substack{p|n\\ p\ge\ln n}}\Big(1-\frac1p\Big) \ge \Big(1-\frac{1}{\ln n}\Big)^{\rho} \ge \exp\Big(\frac{\ln n}{\ln\ln n}\ln\Big(1-\frac{1}{\ln n}\Big)\Big) = \exp\big(-(1+o(1))/\ln\ln n\big) \gg 1 \]

and the claim follows. Actually, a more precise result is known, for example see [HW] Th. 328, pages 267 and 353.


as $x\to\infty$. Romanoff's theorem shows that $\underline d>0$, and the bound $\overline d\le1/2$ is immediate, since the even integers that can be represented in that way are only $\ll x/\ln x$ (for them either $p$ is 2, or $2^k$ is 1). Van der Corput and Erdős proved that $\overline d<1/2$, strictly. Recent results of Pintz (Footnote 7), Habsieger and Roblot (Footnote 8), and Elsholtz and Schlage-Puchta (Footnote 9) proved that

\[ 0.107648 < \underline d \le \overline d < 0.49095. \]

Based on some numerical evidence and a probabilistic model, Romani conjectured that $\underline d=\overline d=0.434\ldots$.

7 J. Pintz, A note on Romanov's constant, Acta Math. Hungar. 112(1-2), 2006, p. 1–14.
8 L. Habsieger and X.-F. Roblot, On integers of the form $p+2^k$, Acta Arith. 122(1), 2006, p. 45–50.
9 C. Elsholtz and J.-C. Schlage-Puchta, On Romanov's constant, Math. Z. 288(3-4), 2018, p. 713–724.


Exercise 3.3 The following steps provide the asymptotic behavior for the sum $\sum_{n\le X}\mu(n)^2/\varphi(n)$.

1) Let $c_n:=n\,\mu(n)^2/\varphi(n)$ and let $F(s):=\sum_nc_n/n^s$. Prove that

\[ F(s) = \zeta(s)\,G(s), \quad\text{with } G(s) := \prod_p\Big(1+\frac{1}{(p-1)p^s}-\frac{1}{(p-1)p^{2s-1}}\Big), \qquad \sigma>1. \]

The Euler product defining $G(s)$ converges absolutely in $\mathrm{Re}(s)>1/2$, therefore the previous equality gives the meromorphic continuation of $F(s)$ in $\mathrm{Re}(s)>1/2$, with a unique, simple pole at $s=1$, coming from the pole of $\zeta$. The residue of $F(s)$ at $s=1$ is $1$, since $G(1)=1$.

2) Let $g(n)$ denote the Dirichlet coefficients of $G(s)$, so that

\[ c_n = \sum_{k|n}g(k). \]

Deduce that

\[ \sum_{n\le X}c_n = \sum_{n\le X}\sum_{k|n}g(k) = \sum_{k\le X}g(k)\Big\lfloor\frac Xk\Big\rfloor = X\sum_{k\le X}g(k)/k+O\Big(\sum_{k\le X}|g(k)|\Big). \]

3) Prove that:∑k≤X

g(k)/k → 1 as X →∞,

∑k≤X|g(k)| ≤ Xθ

∑k≤X|g(k)|k−θ Xθ ∀θ > 1/2,

∑k>X

|g(k)|/k ≤ Xθ−1∑k>X

|g(k)|/kθ Xθ−1 ∀θ > 1/2,

so that ∑n≤X

cn = X +Oθ(Xθ), ∀θ > 1/2.

4) Let dn := µ(n)2/ϕ(n), so that cn = ndn. The result in Step 3 can be writtenas ∑

n≤Xn(dn − 1/n) = Oθ(X

θ) ∀θ > 1/2.

By partial summation deduce that∑n≤X

(dn − 1/n) = O(1).


Conclude that ∑n≤X

dn = lnX +O(1).
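The conclusion of the exercise can be sanity-checked numerically. The following is only a sketch: the function names and the cut-off $10^5$ are our own choices, and a finite computation of course proves nothing about the $O(1)$ term; it only shows that the difference $\sum_{n\le X}\mu(n)^2/\varphi(n) - \ln X$ does not visibly drift in this range.

```python
from math import log

def phi_and_squarefree(limit):
    """Sieve Euler's phi and a squarefree indicator for 0..limit."""
    phi = list(range(limit + 1))
    squarefree = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if phi[p] == p:  # phi[p] still untouched, hence p is prime
            for m in range(p, limit + 1, p):
                phi[m] -= phi[m] // p
            for m in range(p * p, limit + 1, p * p):
                squarefree[m] = False
    return phi, squarefree

def partial_sums(limit):
    """Return D with D[X] = sum_{n <= X} mu(n)^2 / phi(n)."""
    phi, squarefree = phi_and_squarefree(limit)
    total, sums = 0.0, [0.0] * (limit + 1)
    for n in range(1, limit + 1):
        if squarefree[n]:  # mu(n)^2 = 1 exactly for squarefree n
            total += 1.0 / phi[n]
        sums[n] = total
    return sums

D = partial_sums(10**5)
for X in (10**2, 10**3, 10**4, 10**5):
    print(X, D[X] - log(X))
```

The printed differences should stay in a bounded window, in agreement with the statement $\sum_{n\le X} d_n = \ln X + O(1)$.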

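Exercise. 3.4 The following steps provide a different argument giving the asymptotic behavior for the sum $\sum_{n\le X}\mu(n)^2/\varphi(n)$.

1) Repeat Step 1 in Ex. 3.3, i.e. let $c_n := n\mu(n)^2/\varphi(n)$, let $F(s) := \sum_n c_n/n^s$ and prove that
\[ F(s) = \zeta(s)G(s), \quad\text{with}\quad G(s) := \prod_p\Big(1 + \frac{1}{(p-1)p^{s}} - \frac{1}{(p-1)p^{2s-1}}\Big), \qquad \sigma > 1. \]
The Euler product defining $G(s)$ converges absolutely in $\mathrm{Re}(s) > 1/2$, therefore the previous equality gives the meromorphic continuation of $F(s)$ in $\mathrm{Re}(s) > 1/2$, with a unique, simple pole at $s = 1$, coming from the pole of $\zeta$. The residue of $F(s)$ at $s = 1$ is 1, since $G(1) = 1$.

2) Let $\sigma > 0$. Prove that
\[ \frac{2}{2\pi i}\int_{(\sigma)} \frac{X^s}{s(s+1)(s+2)}\,ds = (1-1/X)^2 \ \text{ if } X > 1, \quad 0 \text{ otherwise.} \]

3) Using the previous identity, prove that
\[ H(X) := \frac{2}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty} F(s)\frac{X^s}{s(s+1)(s+2)}\,ds = \sum_{n\le X} c_n(1-n/X)^2, \]
for every $\sigma > 1$.

4) Let $\sigma > 0$. Using the representation $\zeta(s) = \frac{1}{s-1} + \frac{1}{2} - s\int_1^\infty \frac{B_1(x)}{x^{s+1}}\,dx$ prove that $\zeta(s) - \frac{1}{s-1} \ll (1+|t|)$ uniformly in $\mathrm{Re}(s) > 1/2$.

5) Let $\sigma > 1$. Write $H(X)$ as $\Sigma_1 + \Sigma_2 + \Sigma_3$, where
\[ \Sigma_1 = \frac{2}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty} \frac{1}{s-1}\,\frac{X^s}{s(s+1)(s+2)}\,ds, \]
\[ \Sigma_2 = \frac{2}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty} \Big(\zeta(s)-\frac{1}{s-1}\Big)\frac{X^s}{s(s+1)(s+2)}\,ds, \]
\[ \Sigma_3 = \frac{2}{2\pi i}\int_{\sigma-i\infty}^{\sigma+i\infty} (F(s)-\zeta(s))\frac{X^s}{s(s+1)(s+2)}\,ds. \]
Note that $\Sigma_1 = X/3 + O(1)$ (move $\sigma\to-\infty$ and collect the residues at $1$, $0$, $-1$ and $-2$). Use Step 4 to prove that $\Sigma_2$ and $\Sigma_3$ both are $O(X^\theta)$ for every $\theta > 1/2$ (move the integral line to $\mathrm{Re}(s) = \theta$). Therefore,
\[ H(X) = \frac{X}{3} + O_\theta(X^{\theta}), \qquad \forall\theta\in(1/2,1). \]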


6) Let $1 < a' < a < 2$ be parameters, and consider the linear combination
\[ K(X) := (1+b'+b'')^{-1}\big(H(aX) + b'H(a'X) + b''H(X)\big), \]
where
\[ b' := -\frac{a'^2}{a^2}\,\frac{a-1}{a'-1}, \qquad b'' := -\frac{1}{a} + \frac{a'}{a^2}\,\frac{a-1}{a'-1}. \]
Note that $1+b'+b'' = (1-1/a)(1-a'/a)$, thus it is strictly positive. Prove that
\[ \sum_{n\le X} c_n + \sum_{X<n\le a'X} c_n\,\frac{(1-\frac{n}{aX})^2 + b'(1-\frac{n}{a'X})^2}{1+b'+b''} + \sum_{a'X<n\le aX} c_n\,\frac{(1-\frac{n}{aX})^2}{1+b'+b''} = K(X) = \frac{a+a'b'+b''}{1+b'+b''}\cdot\frac{X}{3} + O\Big(\frac{1+|b'|+|b''|}{1+b'+b''}X^{\theta}\Big). \tag{3.18} \]

7) Note that $c_n \le n/\varphi(n) = \prod_{p|n}(1-1/p)^{-1}$, so that (use Mertens)
\[ c_n \le \exp\Big(-\sum_{p|n}\ln(1-1/p)\Big) = \exp\Big(\sum_{p|n} 1/p + O(1)\Big) \ll \exp\Big(\sum_{p\le n} 1/p\Big) \ll \ln n. \]

8) Use the fact that $c_n \ge 0$ to deduce from (3.18) that
\[ \sum_{n\le X} c_n + O\Big(\frac{(1-\frac{1}{a})^2 + |b'|(1-\frac{1}{a'})^2}{1+b'+b''}\sum_{X<n\le a'X} c_n\Big) + O\Big(\frac{(1-\frac{a'}{a})^2}{1+b'+b''}\sum_{a'X<n\le aX} c_n\Big) = \frac{a+a'b'+b''}{1+b'+b''}\cdot\frac{X}{3} + O\Big(\frac{1+|b'|+|b''|}{1+b'+b''}X^{\theta}\Big) \]
and from the bound in Step 7 deduce that
\[ \sum_{n\le X} c_n + O\Big(\frac{(1-\frac{1}{a})^2 + |b'|(1-\frac{1}{a'})^2}{1+b'+b''}(a'-1)X\ln X\Big) + O\Big(\frac{(1-\frac{a'}{a})^2}{1+b'+b''}(a-a')X\ln X\Big) = \frac{a+a'b'+b''}{1+b'+b''}\cdot\frac{X}{3} + O\Big(\frac{1+|b'|+|b''|}{1+b'+b''}X^{\theta}\Big). \]
In this equation take $a = 1+X^{-\eta}$, $a' = 1+X^{-\eta}/2$ for an $\eta > 0$, and deduce that
\[ \sum_{n\le X} c_n = X + O(X^{1-\eta}\ln X) + O(X^{\theta+2\eta}). \]
Setting $\theta = 1/2+\varepsilon$ and $\eta = 1/6$, conclude that
\[ \sum_{n\le X} c_n = X + O_\varepsilon(X^{\frac{5}{6}+\varepsilon}) \qquad \forall\varepsilon > 0. \]

9) Repeat Step 4 of Ex. 3.3 to deduce that
\[ \sum_{n\le X}\mu(n)^2/\varphi(n) = \ln X + O(1). \]

Remarks:

1. This exercise reaches the same conclusion as Ex. 3.3, with a more complicated argument: where is its convenience? The techniques employed here are more sophisticated and could be used to get stronger conclusions. For example we could deduce an explicit representation of the terms which here are represented as $O(1)$ (see [MV], Ex. 2.1.17).

2. The complicated definition of $K(X)$ in Step 6 comes from the necessity to write a linear combination of $H$ computed at multiples of $X$ such that for this sum the coefficient of $c_n$ with $n \le X$ is equal to 1.

3. In Step 3 we have used the kernel $\frac{X^s}{s(s+1)(s+2)}$, which produces the complicated weighted sum $\sum_n c_n(1-n/X)^2$, because in Step 4 only the very poor bound $\zeta(s) \ll 1+|t|$ is proved, so that we need a term of order $s^3$ in the denominator of the integral $\int_{\sigma-i\infty}^{\sigma+i\infty}$ in order to ensure the convergence. Actually $\zeta(s)$ grows along the vertical lines in $\sigma > 0$ at most as $|t|^{1/2+\varepsilon}$, thus a better kernel $\frac{X^s}{s(s+1)}$ (giving $\sum_n c_n(1-n/X)$) could be used, but the proof of this stronger result about $\zeta(s)$ involves the Lindelöf theorem and is more difficult.
With a bit of care we could also use the kernel $\frac{X^s}{s}$ (giving directly $\sum_{n\le X} c_n$), but then the existence of the integral $\int_{\sigma-i\infty}^{\sigma+i\infty} F(s)\frac{X^s}{s}\,ds$ is more delicate, because the integrand is no longer absolutely integrable.
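The role of the coefficients in Step 6 (see Remark 2) can be verified with exact rational arithmetic. The sketch below assumes the coefficients $b' = -\frac{a'^2}{a^2}\frac{a-1}{a'-1}$ and $b'' = -\frac{1}{a} + \frac{a'}{a^2}\frac{a-1}{a'-1}$; the function names and the sample values $a = 3/2$, $a' = 5/4$ are our own. It checks that $1+b'+b'' = (1-1/a)(1-a'/a)$ and that the normalized weight of $c_n$ in $K(X)$ is exactly 1 whenever $n = tX$ with $0 \le t \le 1$.

```python
from fractions import Fraction

def weights(a, ap):
    """Coefficients b', b'' of Step 6 for parameters 1 < ap < a < 2."""
    b1 = -(ap**2 / a**2) * (a - 1) / (ap - 1)
    b2 = -1 / a + (ap / a**2) * (a - 1) / (ap - 1)
    return b1, b2

def combined_weight(t, a, ap):
    """Normalized weight of c_n in K(X) when n = t*X with 0 <= t <= 1."""
    b1, b2 = weights(a, ap)
    numerator = (1 - t / a)**2 + b1 * (1 - t / ap)**2 + b2 * (1 - t)**2
    return numerator / (1 + b1 + b2)

a, ap = Fraction(3, 2), Fraction(5, 4)
b1, b2 = weights(a, ap)
print(1 + b1 + b2 == (1 - 1 / a) * (1 - ap / a))
print([combined_weight(Fraction(j, 10), a, ap) for j in range(11)])
```

The two conditions behind this are that the coefficients of $t$ and of $t^2$ in the combination vanish, which is precisely what determines $b'$ and $b''$.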

Exercise. 3.5 Prove that
\[ \sum_{n\le X}\frac{1}{\varphi(n)} = c\ln X + O(1) \quad\text{with}\quad c := \prod_p\Big(1+\frac{1}{p(p-1)}\Big) = 1.943596\ldots. \]
Hint: repeat the argument in Ex. 3.3. The value of $c$ has been computed using the argument in Ex. 1.48.

Exercise. 3.6 Recall that $\sigma_1(n) = (1 * I)(n) = \sum_{d|n} d$. Prove that
\[ \sum_{n\le X}\frac{\sigma_1(n)}{\varphi(n)} = cX + O(1) \quad\text{with}\quad c := \prod_p\Big(1+\frac{2p^2-1}{p(p+1)(p-1)^2}\Big) = 1.075822\ldots. \]
Hint: repeat the argument in Ex. 3.3. The value of $c$ has been computed using the argument in Ex. 1.48.


Exercise. 3.7 Recall that $\sigma_a(n) = (1 * I^a)(n) = \sum_{d|n} d^a$. Let $a_1,\dots,a_u$ and $b_1,\dots,b_v$ be positive integers, and let $f(n) := \prod_i \sigma_{a_i}(n)/\prod_j \sigma_{b_j}(n)$. Suppose that $a_1+a_2+\cdots+a_u = b_1+b_2+\cdots+b_v$. Prove that $\sum_{n\le X} f(n) = cX + O(1)$ with a positive constant
\[ c := \prod_p\Big(1+\sum_{k=1}^{\infty}\frac{\prod_i \sigma_{a_i}(p^k)}{\prod_j \sigma_{b_j}(p^k)}\,\frac{1}{p^k}\Big)\Big(1-\frac{1}{p}\Big), \qquad \sigma_a(p^k) = \frac{p^{a(k+1)}-1}{p^a-1}. \]
Remark: the product defining $c$ converges, since $\sum_i a_i = \sum_j b_j$.
Hint: repeat the argument in Ex. 3.3.

Exercise. 3.8 Recall that an integer $n$ is $r$-power free when $p^r \nmid n$ for every prime $p$. Prove that
\[ \sum_{\substack{n\le X\\ n\ r\text{-power free}}} 1 \sim \frac{X}{\zeta(r)}. \]
Hint: Use the identity in Ex. 1.17 and repeat the argument in Ex. 3.3. Alternatively, imitate the approach suggested for Ex. 1.31.

Exercise. 3.9 Let $a$, $b$ be integers, with $1 < a \le b$. Let
\[ R_{a,b} := \{n\in\mathbb{N} : \text{if } p^a|n, \text{ then } p^b|n, \text{ for every prime } p\}. \]
In other words, $R_{a,b}$ is the set of integers which are divisible by $p^b$ whenever they are divisible by $p^a$. Prove that
\[ F(s) := \sum_{n\in R_{a,b}}\frac{1}{n^s} = \zeta(s)G(s), \quad\text{with}\quad G(s) := \prod_p\Big(1-\frac{1}{p^{as}}+\frac{1}{p^{bs}}\Big), \qquad \sigma > 1. \]
Prove that
\[ \sum_{\substack{n\le X\\ n\in R_{a,b}}} 1 \sim cX, \quad\text{where}\quad c := G(1) = \prod_p\Big(1-\frac{1}{p^{a}}+\frac{1}{p^{b}}\Big). \]
Remark: What happens for $a = b$? And for $b = \infty$?
Hint: repeat the argument in Ex. 3.3.

Exercise. 3.10 Take $b = 2a$ in the previous exercise, so that
\[ c := G(1) = \prod_p\Big(1-\frac{1}{p^{a}}+\frac{1}{p^{2a}}\Big) = \prod_p\Big(1+\frac{\omega}{p^{a}}\Big)\Big(1+\frac{\omega^2}{p^{a}}\Big), \]
where $\omega$ is a non-trivial cubic root of unity. How can you use this equality to quickly compute $c$ with 10 correct digits? Test your computations with $a = 2$.
Hint: Use the Prime zeta function as in Ex. 1.47–1.48; the value for $a = 2$ is $0.66922021803\ldots$.
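As a cross-check of the value quoted in the hint for $a = 2$, one can simply truncate the Euler product. This is not the method suggested by the hint (the Prime zeta function approach converges far faster); it is a brute-force sketch with our own function names, and truncating at the primes $\le 10^6$ leaves a relative error of the order of $\sum_{p>10^6} p^{-2}$, i.e. only about seven digits are reliable.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes returning the list of primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

def truncated_constant(a, limit):
    """Product of (1 - p^-a + p^-2a) over the primes p <= limit."""
    c = 1.0
    for p in primes_up_to(limit):
        c *= 1.0 - p**(-a) + p**(-2 * a)
    return c

c_two = truncated_constant(2, 10**6)
print(c_two)
```

Reaching 10 correct digits in this way would require an impractically large cut-off, which is the reason for the factorization with $\omega$ in the exercise.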

Chapter 4

Sumsets

Given two sets $A$ and $B$ in $\mathbb{N}\cup\{0\}$, we denote by $A+B$ the set $\{a+b : a\in A,\ b\in B\}$, which is called the sumset of $A$ and $B$. Intuitively, we expect that $A+B$ is in some sense larger than $A$ and $B$ alone. Nevertheless, this is not always true. For example, when $A = B = \mathbb{N}$, then evidently $A+B = \mathbb{N}$ too, and more generally, if $A$ and $B$ are the same arithmetical sequence $\{nq : n\in\mathbb{N}\}$ for a fixed $q$, then $A+B = \{nq : n\in\mathbb{N}\}$. As we see, the sumset can fail to be larger than the single sets, but this happens only when in $A$ (and $B$) there is a ‘structure’ of some kind (for example, in the previous cases $A$ and $B$ are semigroups). This is the subject of intense research nowadays, with astonishing consequences and applications (for example the result of Green and Tao about the existence of arithmetic progressions of arbitrary length in the sequence of prime numbers is an application of some of these ideas, and Bourgain has used these techniques to get amazing consequences about the cancellation in a class of short exponential sums). For the moment we do not pursue the study of these ideas¹, but we stress that from the qualitative point of view $A+B$ is ‘small’ (with respect to the individual $A$ and $B$) only if in $A$ and $B$ there is a structure, a fact which is not in any way ‘typical’. In other words, for ‘generic’ $A$ and $B$ we expect that $A+B$ is large.

There are many ways to measure the ‘size’ of a set of integers $A$; one of the most useful, although not very intuitive, is Shnirelman’s density, which is defined as follows. Let $A(n) := \#\{a\in A : 1\le a\le n\}$. Note that $A(n)$ counts the number of positive integers in $A$ but that we do not assume that $0\notin A$: we simply do not count 0 even in case $0\in A$. Then

Shnirelman density: $\delta_A := \inf_{n\ge 1} A(n)/n$.

A minor role, for the moment at least, is held by the asymptotic densities:

Lower density: $\underline{d}_A := \liminf_{n\to\infty} A(n)/n$,

Upper density: $\overline{d}_A := \limsup_{n\to\infty} A(n)/n$.

Note that $0 \le \delta_A \le 1$ and that if $1 \le k \notin A$, then $\delta_A \le 1 - 1/k$. In particular $\delta_A = 1$ only if $\mathbb{N}\subseteq A$ and $\delta_A = 0$ if $1\notin A$. This means that $\delta_A$ is sensitive to the whole set $A$: modifying $A$ even in a single member generally changes its Shnirelman density. This is an important difference with the asymptotic densities.
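The sensitivity of Shnirelman's density to single elements is easy to see on examples. The following sketch computes only a truncated quantity, the minimum of $A(n)/n$ over $1 \le n \le N$ (an upper bound for the true infimum); the function name and the cut-off $N = 1000$ are our own choices.

```python
from fractions import Fraction

def truncated_shnirelman_density(A, N):
    """min over 1 <= n <= N of A(n)/n, where A(n) = #{a in A : 1 <= a <= n}."""
    members = set(A)
    count, best = 0, Fraction(1)
    for n in range(1, N + 1):
        if n in members:
            count += 1
        best = min(best, Fraction(count, n))
    return best

N = 1000
odd = {n for n in range(N + 1) if n % 2 == 1}
everything_but_4 = set(range(N + 1)) - {4}

print(truncated_shnirelman_density(odd, N))               # 1/2
print(truncated_shnirelman_density(odd - {1}, N))         # 0
print(truncated_shnirelman_density(everything_but_4, N))  # 3/4
```

Removing the single element 1 from the odd numbers drops the value from 1/2 to 0, and removing $k = 4$ from all the integers gives exactly the bound $1 - 1/k$.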

Lemma 4.1 Suppose that $0\in A\cap B$ and that $A(n)+B(n)\ge n$ for a given $n$. Then there exist $a\in A$ and $b\in B$ such that $a+b = n$.

¹ but the interested reader can consult the paper of Imre Z. Ruzsa: Generalized arithmetical progressions and sumsets, Acta Math. Hungarica 65(4), 379–388, 1994, and the astonishing book of Terence Tao and Van Vu [TV].

104 CAP. 4: SUMSETS

Proof. This is an application of the pigeon-hole principle. By hypothesis $0\in A\cap B$, so that $A$ and $B$ contain $A(n)+1$ and $B(n)+1$ integers in $[0,n]$, respectively. Let $0 = a_0 < a_1 < a_2 < \cdots < a_{A(n)} \le n$ and $0 = b_0 < b_1 < b_2 < \cdots < b_{B(n)} \le n$ be the elements of the two sets. The numbers $n-b_j$ are $B(n)+1$ and are in $[0,n]$. In $[0,n]$ there are $n+1$ integers and by hypothesis $(A(n)+1)+(B(n)+1)\ge n+2$, so that in $\{a_j\}_{j=0}^{A(n)}$ and $\{n-b_j\}_{j=0}^{B(n)}$ there is at least one common integer, i.e. there are indexes $u$ and $v$ such that $a_u = n-b_v$, which is the claim.

The lemma has an immediate consequence.

Proposition 4.1 Suppose that $0\in A\cap B$ and that $\delta_A+\delta_B\ge 1$. Then each integer can be written as a sum of an element in $A$ and an element in $B$, i.e. $A+B = \mathbb{N}\cup\{0\}$.

Proof. The claim follows from the previous lemma, because $A(n)\ge\delta_A n$ and $B(n)\ge\delta_B n$ for every $n$. Thus, $A(n)+B(n)\ge(\delta_A+\delta_B)n\ge n$ for every $n$.

Proposition 4.2 Suppose that $0\in A\cap B$. Then
\[ \delta_{A+B} \ge \delta_A+\delta_B-\delta_A\delta_B. \]

Proof. The claim is evident if $\delta_A$ or $\delta_B$ is zero, hence we can assume that $\delta_A,\delta_B > 0$; in particular we can assume that $1\in A\cap B$. Let $1 = a_1 < a_2 < \ldots < a_{A(n)} \le n$ be the sequence of integers in $A\cap[1,n]$. For every index $j < A(n)$, let $g_j := a_{j+1}-a_j-1$, and let $g_{A(n)} := n-a_{A(n)}$. Each $g_j$ gives the size of the gap between $a_j$ and $a_{j+1}$ in $A$ (and of the gap between $n$ and $a_{A(n)}$, respectively). This gap produces at least $B(g_j)$ numbers in $A+B$, since if $b\in B\cap[1,g_j]$ then $a_j+b\in A+B$. The numbers produced in this way with different $j$’s are distinct. We also have $A\subseteq A+B$, because $0\in B$. These are also distinct from the previous ones, therefore
\[ (A+B)(n) \ge A(n)+\sum_{j=1}^{A(n)} B(g_j) \ge A(n)+\delta_B\sum_{j=1}^{A(n)} g_j = A(n)+\delta_B(n-A(n)) = A(n)(1-\delta_B)+n\delta_B \ge n(\delta_A(1-\delta_B)+\delta_B). \]

The claim follows, because n is arbitrary.
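The inequality of Proposition 4.2 can be tested on random sets. Since a true infimum over all $n$ is not computable, the sketch below works with truncated densities (minimum of $S(n)/n$ over $n \le N$), for which the same counting argument applies verbatim; the sample size, the probabilities and all names are our own assumptions.

```python
import random
from fractions import Fraction

def truncated_density(S, N):
    """min over 1 <= n <= N of S(n)/n, with S(n) = #{s in S : 1 <= s <= n}."""
    count, best = 0, Fraction(1)
    for n in range(1, N + 1):
        count += n in S
        best = min(best, Fraction(count, n))
    return best

def sumset(A, B, N):
    """Elements of A + B not exceeding N."""
    return {a + b for a in A for b in B if a + b <= N}

def random_set(N, prob, rng):
    """Random subset of [0, N] containing 0 and 1."""
    return {0, 1} | {n for n in range(2, N + 1) if rng.random() < prob}

def check_proposition(trials=200, N=60, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        A, B = random_set(N, 0.4, rng), random_set(N, 0.3, rng)
        dA, dB = truncated_density(A, N), truncated_density(B, N)
        if truncated_density(sumset(A, B, N), N) < dA + dB - dA * dB:
            return False
    return True

print(check_proposition())
```

Both sets are forced to contain 0 and 1, matching the reduction made at the beginning of the proof.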

We notice that the conclusion of the previous proposition can also be written as
\[ 1-\delta_{A+B} \le (1-\delta_A)(1-\delta_B). \]
In this form the claim can be extended by induction to an arbitrary number of sets; in particular it can be extended to the case where several copies of the same set $A$ are considered, thus proving that
\[ 1-\delta_{hA} \le (1-\delta_A)^h, \quad\text{i.e.}\quad \delta_{hA} \ge 1-(1-\delta_A)^h, \qquad \forall h. \tag{4.1} \]


It is customary to call additive basis a set $A$ for which there exists an integer $h$ such that $\mathbb{N}\subseteq hA$, i.e. such that every integer can be written as a sum of $h$ elements of $A$, and the smallest $h$ with $\mathbb{N}\subseteq hA$ is called the order of the basis.

Theorem 4.1 (Shnirelman) Suppose that $0\in A$ and that $\delta_A > 0$. Then $A$ is a basis.

Proof. In fact, by (4.1) there exists $h$ such that $\delta_{hA}\ge 1/2$, so that the claim for $2hA = hA+hA$ follows from Lemma 4.1.

How can we prove that Shnirelman’s density of a set $A$ is positive? The following fact is a simple criterion:
\[ \delta_A > 0 \quad\text{iff}\quad 1\in A \ \text{ and } \ \underline{d}_A > 0. \]
Note that the claim does not state that $\delta_A = \underline{d}_A$, thus applying this criterion we can prove that $\delta_A > 0$ and that therefore every integer is a sum of a fixed number $h$ of integers in $A$, but we do not have an immediate way to compute $h$ from $\underline{d}_A$.

Exercise. 4.1 Prove that there exists $h$ such that every integer can be written as a sum of $h$ squarefree numbers.

An application, which was devised by Shnirelman himself and was the first indication toward the Goldbach conjecture: the set of primes $P$ has asymptotic density equal to zero, but the set $P+P$ (the set of integers which can be written as a sum of two primes) has a positive asymptotic density (this is exactly the claim of Theorem 3.6). Adding 0 and 1 to this set we have a new set having positive Shnirelman’s density, so that we have proved that

Corollary 4.1 (Shnirelman) There exists an integer $h$ such that every integer $n$ can be written as a sum of $w$ primes and $h-w$ ones for some $w\in\{0,1,2,\dots,h\}$.

Later we will see that it is possible to write every $n$ as a sum of a finite set of primes alone (i.e., without any need of adding ones) if $n$ is large enough (see Corollary 4.2).

The claim in Proposition 4.2 has already served our purposes, but it is not optimal. For example, when $\delta_A\ge 1/2$ we know from Lemma 4.1 that $\delta_{2A} = 1$, while the inequality in (4.1) only says that $\delta_{2A}\ge 3/4$. Actually, the following stronger result holds.

Theorem 4.2 (Mann) Let $0\in A\cap B$. Then
\[ \delta_{A+B} \ge \min\{1,\delta_A+\delta_B\}. \]


This inequality was conjectured by Khinchin and, after some partial results obtained by several people², it was finally proved by Mann³.

Suppose we know the following fact.

Lemma 4.2 (Dyson) Let $A$ and $B$ be two collections of integers in $[0,n]$, with $0\in A\cap B$. Suppose that there exists $c\in(0,1]$ such that
\[ A(m)+B(m) \ge cm, \qquad \forall m\le n, \tag{4.2} \]
then
\[ (A+B)(m) \ge cm, \qquad \forall m\le n. \tag{4.3} \]

Then the theorem immediately follows, since under the hypotheses of the theorem we can take $c = \min\{1,\delta_A+\delta_B\}$ in (4.2). Therefore our real task is proving the lemma.

Proof. We proceed by induction on $n$. When $n = 1$ the unique value for $m$ is 1 and (4.2) in this case says that $A(1)+B(1)\ge c$, i.e. that $1\in A\cup B$. Hence $1\in A+B$ (because $0\in A\cap B$) so that $(A+B)(1) = 1\ge c$, which is (4.3).
Now suppose the claim of the lemma holds for every $n' < n$, but that it is false for $n$. Choose a counterexample $A$, $B$ with $B(n)$ as small as possible. Then $B(n)\ge 1$, otherwise $B$ would be equal to $\{0\}$, $(A+B)(m)$ would be equal to $A(m)$ and (4.3) would hold since it coincides with (4.2). We construct two sets $A'$, $B'$ with

i. A′(m) +B′(m) ≥ cm for every m ≤ n,

ii. A′ + B′ ⊆ A+ B,

iii. B′(n) < B(n).

Together, these facts prove that $A'$, $B'$ are a counterexample to the claim with $n$ (exactly as $A$, $B$) but with $B'(n) < B(n)$, which is impossible since it contradicts the minimality of $B(n)$.
For every $a\in A$, let $B''(a) := \{b\in B : a+b\notin A\}$ and let $a_0$ be the smallest integer for which $B''(a)\ne\emptyset$ (such $a_0$ exists, since $A+B\not\subseteq A$, for example because $\max A+\max B\notin A$, being $B\ne\{0\}$). For the sake of brevity we denote $B''(a_0)$ by $B''$. The minimality of $a_0$ implies that for every $r < a_0$
\[ b+(A\cap[0,r]) \subseteq A, \qquad \forall b\in B, \tag{4.4} \]
thus
\[ (A+B)\cap[0,r] \subseteq A. \tag{4.5} \]

² Khinchin himself proved that the claim holds under the restriction $\delta_A = \delta_B \le 1/2$ (Zur additiven Zahlentheorie, Rec. Math. Soc. Math. Moscou 39(3), 27–34, 1932), Erdős proved the inequality $\delta_{A+B}\ge\delta_A+\frac{1}{2}\delta_B$ (On the asymptotic density of the sum of two sequences, Annals of Math. 42(1), 65–68, 1942) and Besicovitch proved a slightly weaker inequality (On the density of the sum of two sequences of integers, J. London Math. Soc. 10, 246–248, 1935).

³ The original proof was quite complicated; I reproduce here the proof as it is given in [P] and which is due to F. Dyson. For several extensions see [HR].

Let now
\[ A' := A\cup\big((a_0+B'')\cap[0,n]\big), \qquad B' := B\setminus B''. \]
Note that the union defining $A'$ is disjoint, since the definition of $B''$ ensures that $a_0+b''\notin A$ for every $b''\in B''$. With these definitions, $0\in A\subseteq A'$, and $0\in B'$ as well, since $0\in B$ and $0\notin B''$. Now, $A+B'\subseteq A+B$ and
\[ (a_0+B'')+B' = \{a_0+b''+b' : b''\in B'',\ b'\in B'\} \subseteq \{(a_0+b')+b : b\in B,\ b'\in B'\} \subseteq A+B, \]
because $a_0+b'\in A$ (because $b'\notin B''$). This proves that $A'+B'\subseteq A+B$, which is ii). Moreover, condition iii) is immediate since $B''\ne\emptyset$, thus it remains to prove i). Recall that the union defining $A'$ is disjoint, therefore
\begin{align*}
A'(m)+B'(m) &= A(m)+\#\{b''\in B'' : 1\le a_0+b''\le m\}+B(m)-\#\{b''\in B'' : 1\le b''\le m\} \\
&= A(m)+B(m)-\#\{b''\in B'' : m-a_0+1\le b''\le m\} \\
&\ge A(m)+B(m)-\#\{b\in B : m-a_0+1\le b\le m\}. \tag{4.6}
\end{align*}

Let $b_0$ be the smallest positive number in $B\cap[m-a_0+1,m]$. If this integer does not exist then the last set in the previous bound is empty and the inequality becomes $A'(m)+B'(m)\ge A(m)+B(m)$ and i) follows from (4.2). Hence, suppose that $b_0$ exists. Then (4.6) implies that
\[ A'(m)+B'(m) \ge A(m)+B(b_0-1). \]
Write $m = b_0+r$, so that $0\le r < a_0\le n$. By inductive hypothesis (here we use the fact that $r < n$)
\[ (A+B)(r) \ge cr. \]
Since $0\in A\cap B$ and $c\le 1$,
\[ \#((A+B)\cap[0,r]) = 1+(A+B)(r) \ge 1+cr \ge c(r+1). \]
Moreover, since $r < a_0$, this result together with (4.5) shows that $\#(A\cap[0,r])\ge c(r+1)$. But (4.4) implies that $[b_0,b_0+r]$ contains at least as many elements of $A$ as $[0,r]$, so that
\[ \#(A\cap[b_0,b_0+r]) \ge c(r+1). \]
Consequently
\begin{align*}
A'(m)+B'(m) &\ge A(m)+B(b_0-1) \\
&= A(b_0+r)-A(b_0-1)+A(b_0-1)+B(b_0-1) \\
&\ge c(1+r)+c(b_0-1) = c(b_0+r) = cm.
\end{align*}
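Dyson's lemma is a statement about finite sets, so it can be tested directly. The sketch below draws random pairs of subsets of $[0,n]$ containing 0, takes for $c$ the largest admissible value in (4.2) (capped at 1), and checks (4.3); the names and the random model are our own assumptions, and a successful run is of course only an illustration, not a proof.

```python
import random
from fractions import Fraction

def count_upto(S, m):
    """S(m) = number of elements s of S with 1 <= s <= m."""
    return sum(1 for s in S if 1 <= s <= m)

def dyson_constant(A, B, n):
    """Largest c <= 1 with A(m) + B(m) >= c*m for all 1 <= m <= n."""
    ratios = [Fraction(count_upto(A, m) + count_upto(B, m), m) for m in range(1, n + 1)]
    return min(ratios + [Fraction(1)])

def dyson_conclusion_holds(A, B, n):
    """Check (A+B)(m) >= c*m for all m <= n, with c as above."""
    c = dyson_constant(A, B, n)
    S = {a + b for a in A for b in B}
    return all(count_upto(S, m) >= c * m for m in range(1, n + 1))

def random_subset(n, prob, rng):
    return {0} | {m for m in range(1, n + 1) if rng.random() < prob}

rng = random.Random(1)
n = 40
results = [
    dyson_conclusion_holds(random_subset(n, 0.3, rng), random_subset(n, 0.3, rng), n)
    for _ in range(300)
]
print(all(results))
```

When $c$ turns out to be 0 the check is vacuous, in agreement with the hypothesis $c\in(0,1]$ of the lemma.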


Exercise. 4.2 Let $A\subseteq\mathbb{N}\cup\{0\}$ be an additive basis with $\delta_A > 0$. Using (4.1), deduce that the order $h_A$ of $A$ is bounded by
\[ h_A \le \Big\lceil\frac{-2\ln 2}{\ln(1-\delta_A)}\Big\rceil. \]
The inequality proved by Mann (Th. 4.2) improves this bound to
\[ h_A \le \Big\lceil\frac{1}{\delta_A}\Big\rceil. \]

An asymptotic basis is a set $A\subseteq\mathbb{N}\cup\{0\}$ for which there exists an integer $h$ such that every integer which is sufficiently large can be written as a sum of $h$ elements in $A$. As we see, this notion is weaker than that of basis, every basis being also an asymptotic basis. It is very frequent for a set $A$ to be an asymptotic basis without being a basis, and even for sets which are bases in the ordinary sense, their dimension as asymptotic basis can be considerably smaller than their dimension as ordinary basis⁴. This means that there is a considerable interest in the weak notion of basis and its relation with the ordinary notion. Theorem 4.3 here below gives a way to recognize an asymptotic basis, deducing the claim from Shnirelman’s Theorem 4.1 and from the following general result.

Proposition 4.3 (Schur) Let $0 < a_1 < a_2 < \cdots < a_k$ be integers, with $(a_1,a_2,\dots,a_k) = 1$. Then there exists $n_k\in\mathbb{N}$ such that the equation
\[ x_1a_1+\cdots+x_ka_k = n \]
has a solution $(x_1,\dots,x_k)\in\mathbb{N}^k$ for every $n\ge n_k$.

We provide two independent proofs of this result. The first one is constructive, the second one is shorter. This apparently simple problem is actually very complicated: the exact dependence of $n_k$ on the $a_j$’s is known only for $k = 2$.⁵

Proof. The proof is by induction on $k$ and as a first step we prove that $n_2 = a_1a_2$ has the desired property for $k = 2$. In fact, let $n\ge a_1a_2$. The coprimality assumption $(a_1,a_2) = 1$ shows that
\[ x_1a_1+x_2a_2 = n \]
has a solution with $x_1,x_2\in\mathbb{Z}$. Evidently it is impossible that $x_1$, $x_2$ are both negative, and the claim is already true if both are nonnegative, thus assume that $x_1 < 0 < x_2$ (the case $x_2 < 0 < x_1$ can be treated with a similar argument). Then $x_2 > n/a_2\ge a_1$. The transformation $(x_1,x_2)\to(x_1+a_2,x_2-a_1)$ produces a new solution, with a positive second component, and a first component which is strictly greater than the previous one. We can iterate this argument until the first component also becomes nonnegative.
Now, let $k\ge 3$ and suppose we have already proved the claim for $k-1$. Let $D := (a_1,\dots,a_{k-1})$ and $a'_j := a_j/D$ for $j\le k-1$. By hypothesis $(D,a_k) = 1$, therefore the equation
\[ xD+x_ka_k = n \]
has a nonnegative solution if $n\ge Da_k$. Suppose that $x_k\ge 0$ but $x < n'_{k-1}$, where $n'_{k-1}$ denotes the constant $n_{k-1}$ associated with the numbers $a'_1,\dots,a'_{k-1}$ whose existence is proved by the inductive hypothesis. Then $x_k > (n-Dn'_{k-1})/a_k$, which is at least $D$ whenever $n\ge D(a_k+n'_{k-1})$. Thus, let $n_k$ be this number $D(a_k+n'_{k-1})$; then $x_k$ is greater than $D$ whenever $x$ is lower than $n'_{k-1}$, so that the transformation $(x,x_k)\to(x+a_k,x_k-D)$ gives a new solution with a nonnegative value for $x_k$ and a strictly larger value for $x$. Iterating this process we get a solution with a positive $x_k$ and an $x$ not lower than $n'_{k-1}$. By induction the equation
\[ x_1a'_1+\cdots+x_{k-1}a'_{k-1} = x \]
admits a nonnegative solution, thus we get a nonnegative solution of the original equation, in the form
\[ x_1a_1+\cdots+x_ka_k = D(x_1a'_1+\cdots+x_{k-1}a'_{k-1})+x_ka_k = Dx+x_ka_k = n. \]

⁴ For example, every integer can be written as a sum of nine cubes but there is only a finite set of integers requiring nine terms, it being proved that every integer large enough can already be represented with seven cubes.

⁵ See A. Nijenhuis and H. Wilf, Representations of integers by linear forms in nonnegative integers, J. Number Theory 4, 98–106, 1972.

Exercise. 4.3 Prove that if the integers $a_j$ are pairwise coprime, then in the previous proposition we can take $n_k = \prod_{j\le k} a_j$.

A different proof of Proposition 4.3.

Proof. Let $y_1,\dots,y_k\in\mathbb{Z}$ be such that $y_1a_1+\cdots+y_ka_k = 1$; such a set of integers exists because the hypothesis implies that the ideal generated by $a_1,a_2,\dots,a_k$ is trivial. Let $N_k := \sum_{j=1}^{k}|y_j|a_j$ and let $n\ge N_k(N_k-1)$. Write $n$ as $n = qN_k+r$, with $0\le r < N_k$. In this representation we have $q\ge N_k-1$, otherwise we have $n = qN_k+r\le(N_k-2)N_k+N_k-1\le N_k^2-N_k-1$, contradicting the assumption about $n$. Then
\[ n = qN_k+r = q\sum_{j=1}^{k}|y_j|a_j+r\sum_{j=1}^{k}y_ja_j = \sum_{j=1}^{k}(q|y_j|+ry_j)a_j \]
and in this representation each $q|y_j|+ry_j$ is nonnegative, since $q\ge N_k-1\ge r$. This argument proves the claim with $n_k := N_k(N_k-1)$.
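The second proof is effective once a Bézout relation $y_1a_1+\cdots+y_ka_k = 1$ is available. The sketch below obtains one by iterating the extended Euclidean algorithm (this is our own choice, the proof does not prescribe how the $y_j$ are found, and different choices give different thresholds $N_k(N_k-1)$) and then returns the nonnegative solution $x_j = q|y_j|+ry_j$; all function names are hypothetical.

```python
def bezout(coeffs):
    """Return (g, ys) with sum(y * a for y, a in zip(ys, coeffs)) == g == gcd(coeffs)."""
    def ext_gcd(a, b):
        # returns (g, u, v) with u*a + v*b == g == gcd(a, b)
        if b == 0:
            return a, 1, 0
        g, u, v = ext_gcd(b, a % b)
        return g, v, u - (a // b) * v

    g, ys = coeffs[0], [1]
    for a in coeffs[1:]:
        g, u, v = ext_gcd(g, a)
        ys = [u * y for y in ys] + [v]
    return g, ys

def threshold(coeffs):
    """The bound N_k (N_k - 1) of the proof, for the Bezout relation found by bezout()."""
    _, ys = bezout(coeffs)
    Nk = sum(abs(y) * a for y, a in zip(ys, coeffs))
    return Nk * (Nk - 1)

def represent(n, coeffs):
    """Nonnegative x_j with sum(x_j * a_j) == n, valid for n >= threshold(coeffs)."""
    g, ys = bezout(coeffs)
    if g != 1:
        raise ValueError("the coefficients must have gcd 1")
    Nk = sum(abs(y) * a for y, a in zip(ys, coeffs))
    if n < Nk * (Nk - 1):
        raise ValueError("n is below the threshold N_k(N_k - 1)")
    q, r = divmod(n, Nk)
    return [q * abs(y) + r * y for y in ys]

coeffs = [6, 10, 15]
n = threshold(coeffs) + 7
xs = represent(n, coeffs)
print(n, xs, sum(x * a for x, a in zip(xs, coeffs)))
```

The threshold obtained in this way is usually far from optimal, consistently with the remark that the exact value of $n_k$ is a difficult problem.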

Theorem 4.3 Let $A\subseteq\mathbb{N}\cup\{0\}$, with $0\in A$. If $\underline{d}_A > 0$ and $\mathrm{GCD}(A) = 1$, then $A$ is an asymptotic basis.

Proof. The $\mathbb{Z}$-ideal generated by $A$ is a principal ideal, actually generated by $\mathrm{GCD}(A)$. $\mathbb{Z}$ being a Noetherian ring, there exists a finite set $a_1,\dots,a_k\in A$ such that $1 = \mathrm{GCD}(A) = (a_1,\dots,a_k)$. Let $n_0$ be large enough so that
\[ n_0 = \sum_{j=1}^{k} a_jx_j, \qquad n_0+1 = \sum_{j=1}^{k} a_jx'_j, \]
both have solutions in nonnegative integers $x_j$, $x'_j$ (here we use Proposition 4.3).
Let $\ell := \max\{\sum_{j=1}^{k}x_j,\ \sum_{j=1}^{k}x'_j\}$; the previous equalities show that $n_0$ and $n_0+1$ belong to $\ell A$ (also in case some $x_j$ or $x'_j$ is zero, because $0\in A$ by hypothesis).
Let $A' := (\ell A-n_0)\cap\mathbb{N}$ (i.e. the set $\{n\in\mathbb{N} : n+n_0 = a'_1+\cdots+a'_\ell,\ a'_j\in A\ \forall j\}$).
Then $1\in A'$ and $\underline{d}_{A'} = \underline{d}_{\ell A}\ge\underline{d}_A > 0$ (because $0\in A$ implies that $A\subseteq\ell A$). Therefore $A'$ is a basis. Let $h_1$ be the order of $A'$. Then
\[ \mathbb{N} \subseteq h_1A' \subseteq h_1(\ell A-n_0) = h_1\ell A-h_1n_0. \]
This proves that every integer $\ge h_1n_0$ is a sum of $h_1\ell$ integers in $A$.

As a corollary, from Theorem 3.6 we can finally deduce that

Corollary 4.2 $P$ is an asymptotic basis, i.e. there exists $h$ such that every integer large enough can be written as a sum of at most $h$ primes.

The original version of this result does not quantify explicitly the constant $h$. Later the constant has been estimated by several authors and now it is known that $h\le 4$ (see Table 1 here below).

Exercise. 4.4

1) Let $A$ be an asymptotic basis and let $h$ be its dimension. Prove that
\[ \#(A\cap[0,N]) \gg N^{1/h}, \qquad N\to\infty. \]

2) Let $P$ be an integer and let $A_P$ be the set of integers multiplicatively generated by primes dividing $P$, i.e.
\[ A_P := \{n\in\mathbb{N} : \text{every prime } p \text{ dividing } n \text{ is a divisor of } P\}. \]
Prove that $A_P$ is not an asymptotic basis.

Hint: for the first point, let $R_h(n)$ be the number of representations of $n$ as a sum of $h$ elements in $A$, and prove that $(\#A\cap[0,N])^h\ge\sum_{n\le N}R_h(n)\ge N-c$, where $c$ is a constant (independent of $N$). For the second step, prove that $\#(A\cap[0,N])\le\ln^{\omega(P)}N$, where $\omega(P)$ is the number of prime divisors of $P$.

Exercise. 4.5 Let $a$, $b$ be integers, with $1 < a\le b$. The asymptotic density of the set
\[ R_{a,b} := \{n\in\mathbb{N} : \text{if } p^c|n \text{ with } c\ge a, \text{ then } p^b|n, \text{ for every prime } p\} \]
has been computed in Ex. 3.9. What can we say about its Shnirelman density? How does the density depend on $a$ and $b$? Is there some monotonicity in these parameters? Is it possible to compute its value?
Remark: This is an exercise only in a broad sense: actually it would be qualified more properly as a Research Project, since some of these questions are not well understood. The case $b = \infty$ (i.e., for $a$-power free integers) has been quite extensively studied, but it is still open in several aspects. See Diananda–Subbarao, On the Shnirelman density of the k-free integers, Proc. Amer. Math. Soc. 62(1), 1976, 7–10, and Erdős–Hardy–Subbarao, On the Shnirelman density of k-free integers, Indian J. Math. 20(1), 1978, 45–56.
Some non-trivial lower bounds for the Shnirelman density of $R_{a,b}$ may be deduced from a general result by Siva Rama Prasad–Bhramarambica, On the Shnirelman density of M-free integers, Fibonacci Quart. 27(4), 1989, 366–368.

Table 1. Records for Goldbach-type problems: bounds for the dimension of the primes as an asymptotic basis (i.e. as a basis for integers large enough) and as a basis for integers ≥ 6.

  bound        year   author
  = 3?         1742   Goldbach-Euler (conjecture)
  exists       1933   Shnirelman
  ≤ 4          1937   Vinogradov⁶
  ≤ 6 · 10⁹    1969   Klimov⁷
  ≤ 115        1972   Klimov, Pil′tjaĭ and Šeptickaja⁸
  ≤ 55         1975   Klimov⁹
  ≤ 27         1977   Vaughan¹⁰
  ≤ 26         1977   Deshouillers¹¹
  ≤ 19         1983   Riesel and Vaughan¹²
  ≤ 7          1995   Ramaré¹³
  ≤ 6          2012   Tao¹⁴
  ≤ 4          2013   Helfgott¹⁵

⁶ Representation of an odd number as a sum of three primes, C. R. (Dokl.) Acad. Sci. URSS, n. Ser. 15, 169–172, 1937, and Some theorems concerning the theory of primes, Rec. Math. Moscou, n. Ser. 2, 179–195, 1937.
⁷ Apropos the computations of Šnirel′man’s constant, Volž. Mat. Sb. Vyp. 7, 32–40, 1969.
⁸ An estimate of the absolute constant in the Goldbach-Šnirel′man problem, in Studies in number theory, No. 4, 35–51, Izdat. Saratov. Univ., Saratov, 1972.
⁹ Kuĭbyšev. Gos. Ped. Inst. Naučn. Trudy 158, 14–30, 1975.
¹⁰ On the estimation of Schnirelman’s constant, J. Reine Angew. Math. 290, 93–108, 1977.
¹¹ Sur la constante de Šnirel′man, Séminaire Delange-Pisot-Poitou, 17e année: (1975/76), Théorie des nombres: Fasc. 2, Exp. No. G16, Paris, 1977.
¹² On sums of primes, Ark. Mat. 21(1), 46–74, 1983.
¹³ On Šnirel′man’s constant, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 22(4), 645–706, 1995.
¹⁴ Every odd number greater than 1 is the sum of at most five primes, Math. Comp. 83(286), 997–1038, 2014.
¹⁵ The ternary Goldbach problem, preprint 2014, see http://arxiv.org/pdf/1404.2224.pdf.

Chapter 5

Waring’s problem

In a letter to Euler, Waring suggested the possibility that for every positive integer $k$ there exists an integer $h_k$ such that every integer $n$ can be written as a sum of at most $h_k$ $k$-th powers. With the language we have introduced in the previous chapter, Waring’s conjecture says that the set $G_k := \{n^k : n\in\mathbb{N}\}$ is an additive basis. This problem attracted the attention of several authors and some instances were proved with algebraic tools¹. The first complete proof of the conjecture is due to Hilbert; it was nevertheless a pure existence proof, without an effective way to determine (or even bound) the value of $h_k$. Later a considerable effort has been directed towards the determination of $h_k$, but a complete comprehension of the behavior of these constants as a function of $k$ is still lacking. This chapter is devoted to the proof of Waring’s conjecture in its weak formulation:

Theorem 5.1 (Waring’s conjecture) $G_k := \{n^k : n\in\mathbb{N}\}$ is an additive basis for every $k\in\mathbb{N}$.

The proof that we will reproduce here is due to Weyl, Linnik and Newman, and is probably the shortest one. Several steps are tailored to the specific problem but in its general aspects (Hardy–Littlewood’s circle method, Farey arcs, approximations of exponential sums, etc.) it is a good introduction to general and widely used techniques in Analytic Number Theory.
According to Theorem 4.1 we could attack the theorem by trying to prove that Shnirelman’s density of $G_k$ is positive, and (since $1\in G_k$) we could deduce this fact by proving that its lower density is positive. There is a difficulty in this argument: the density of $G_k$ is zero whenever $k\ge 2$! We have already faced a similar problem with the set of primes (Theorem 3.6), thus we know how we can overcome the difficulty: proving that there exists an integer $h$ such that $hG_k$ has a positive lower density. If we are able to prove it then we can conclude that its Shnirelman’s density is positive too (because 0 is a $k$-th power, therefore $1\in hG_k$), and Theorem 5.1 follows from Theorem 4.1. Hence our true goal is the proof of the following proposition.

¹ Lagrange proved that every integer can be written as a sum of four squares using an identity which we now recognize as giving the multiplicativity of the quaternionic norm to reduce the problem to the representability of primes, and the structure of the solutions of quadratic equations modulo primes to conclude that every prime is a sum of four squares. Moreover, Hilbert’s first proof of the full Waring’s problem comes from a suitable polynomial identity: for a modern proof of this identity see Nesterenko, On a Hilbert identity, Mat. Zametki, 66(4), 1999, p. 527–532. His original approach is not constructive, but it can be refined, see Pollack, On Hilbert’s solution of Waring’s problem, Cent. Eur. J. Math., 9(2), 2011, p. 294–301. The elementary cases $k = 3, 4, 6, 8$ are also discussed in [HW], Ch. XXI.

114 CAP. 5: WARING’S PROBLEM

Proposition 5.1 Let $d_{h,k}$ be the lower density of $hG_k$. For every $k$ there exists $h$ such that $d_{h,k} > 0$.

Analogously to our way to prove the positivity of the density for the sets $P+P$ and $P+2^{\mathbb{N}}$ (i.e. Theorems 3.6 and 3.7), also Proposition 5.1 can be recovered from an upper bound for the number of representations of an integer $n$ as a sum of $k$-th powers. More explicitly, let $R_{h,k}(n)$ be the number of representations of $n$ as a sum of $h$ $k$-th powers; in a formula:
\[ R_{h,k}(n) := \#\{(n_1,\dots,n_h)\in\mathbb{N}^h : n = n_1^k+\cdots+n_h^k\}. \]
It is easy to prove that
\[ \sum_{n\le x} R_{h,k}(n) \gg_{h,k} x^{h/k}. \tag{5.1} \]

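Proof. In fact,
\[ \sum_{n\le x} R_{h,k}(n) = \sum_{n\le x}\ \sum_{\substack{n_1,\dots,n_h\in\mathbb{N}\\ n_1^k+\cdots+n_h^k = n}} 1 = \sum_{\substack{n_1,\dots,n_h\in\mathbb{N}\\ n_1^k+\cdots+n_h^k\le x}} 1 \ge \sum_{\substack{n_1,\dots,n_h\in\mathbb{N}\\ n_j\le(x/h)^{1/k}\ \forall j}} 1 \gg (x/h)^{h/k}. \]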
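The counting argument behind (5.1) can be reproduced on small examples. The sketch below tabulates $R_{h,k}(n)$ by a simple dynamic programming over ordered $h$-tuples and compares $\sum_{n\le x}R_{h,k}(n)$ with the number of tuples in the box $n_j\le(x/h)^{1/k}$. We assume here that the $n_j$ range over the positive integers (if 0 is allowed, the sum only gets larger); the names and the parameters $h = 4$, $k = 2$, $x = 400$ are our own.

```python
def representation_counts(h, k, x):
    """R[n] = #{(n_1, ..., n_h), n_j >= 1 : n_1^k + ... + n_h^k = n} for 0 <= n <= x."""
    powers = []
    m = 1
    while m**k <= x:
        powers.append(m**k)
        m += 1
    counts = [0] * (x + 1)
    counts[0] = 1  # the empty sum, before any term is added
    for _ in range(h):
        nxt = [0] * (x + 1)
        for n, r in enumerate(counts):
            if r == 0:
                continue
            for p in powers:
                if n + p > x:
                    break
                nxt[n + p] += r
        counts = nxt
    return counts

def box_side(h, k, x):
    """Largest integer m >= 0 with h * m^k <= x, i.e. floor((x/h)^(1/k))."""
    m = 0
    while h * (m + 1)**k <= x:
        m += 1
    return m

h, k, x = 4, 2, 400
R = representation_counts(h, k, x)
print(sum(R), box_side(h, k, x)**h)
```

Every tuple in the box has $n_1^k+\cdots+n_h^k\le x$, which is exactly the inequality used in the proof.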

Suppose we know the following bound.

Theorem 5.2 (Linnik) Let $h$, $k$ be fixed with $h\ge 6k2^k$, then $R_{h,k}(n)\ll_{h,k} n^{h/k-1}$.

Let
\begin{align*}
S_{h,k}(x) :=&\ \#\{n\le x : R_{h,k}(n)\ge 1\} \\
=&\ \#\{n\le x : \exists(n_1,\dots,n_h)\in\mathbb{N}^h \text{ with } n = n_1^k+\cdots+n_h^k\}.
\end{align*}
Then, under the assumption that $h\ge 6k2^k$, we have the lower bound
\[ x^{h/k-1}S_{h,k}(x) \overset{(1)}{\gg}_{h,k} \sum_{n\le x} n^{h/k-1}\delta_{n \text{ is a sum of } h\ k\text{-th powers}} \overset{(2)}{\gg}_{h,k} \sum_{n\le x} R_{h,k}(n) \overset{(3)}{\gg}_{h,k} x^{h/k}, \]
where we have used in (1) the definition of $S_{h,k}(x)$ (and the positivity of the exponent $h/k-1$, to write $x^{h/k-1}\ge n^{h/k-1}$), Theorem 5.2 in (2) and (5.1) in (3). This proves that
\[ S_{h,k}(x) \gg_{h,k} x, \]
i.e. that $hG_k$ has a positive lower density when $h\ge 6k2^k$. Thus, our goal now is the proof of Theorem 5.2. We split the proof into two major steps: the first one is the proof of a cancellation in certain exponential sums (Theorem 5.3 here below), then we use the representation of $R_{h,k}(n)$ as a complex integral over the interval $[0,1)$ and a decomposition of $[0,1)$ in Farey arcs (a typical tool from Analytic Number Theory), to deduce the claim from a bound for the integrand in each arc.

5.1. First step: cancellation in exponential sums

Let $f(x) = \alpha x^k+\cdots$ be a polynomial with real coefficients, and suppose $\alpha\ne 0$, so that its degree is $k$. Let
\[ S_f(N) := \sum_{n=1}^{N} e(f(n)), \]
where $e(x) := \exp(2\pi ix)$. The absolute value of every summand is one, hence $|S_f|\le N$, and this holds as an equality when $f$ is a constant. We need a result proving that $S_f$ is actually smaller as soon as the degree of $f$ is positive. The proof will be by induction on the degree, so we first need a result describing what happens for linear polynomials.

Lemma 5.1 Let $N\in\mathbb{N}$, $\alpha,\beta\in\mathbb{R}$. Then
\[ \Big|\sum_{n=1}^{N} e(\alpha n+\beta)\Big| \le \min\{N, \|\alpha\|^{-1}\}, \]
where $\|\alpha\| := \min\{\{\alpha\}, 1-\{\alpha\}\}$ is the distance of $\alpha$ from the closest integer.

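Proof. The bound $\le N$ is trivial and the inequality is meaningful only when $\|\alpha\| \ge 1/N$. In particular, we can assume that $\alpha$ is not an integer. In this case
\[
\sum_{n=1}^{N} e(\alpha n + \beta) = e(\alpha + \beta)\, \frac{e^{2\pi i \alpha N} - 1}{e^{2\pi i \alpha} - 1},
\]
so that
\[
\Bigl|\sum_{n=1}^{N} e(\alpha n + \beta)\Bigr|
= \frac{|e^{2\pi i \alpha N} - 1|}{|e^{2\pi i \alpha} - 1|}
= \frac{|\sin(\pi \alpha N)|}{|\sin(\pi \alpha)|}
\le \frac{1}{|\sin(\pi \alpha)|}.
\]
The function $\alpha \mapsto |\sin(\pi\alpha)|$ is 1-periodic and even, therefore $|\sin(\pi\alpha)| = |\sin(\pi\|\alpha\|)| = \sin(\pi\|\alpha\|)$. Note that $\|\alpha\| \in [0, 1/2]$, by its definition; in this range the concavity of the sine function shows that $\sin(\pi\|\alpha\|) \ge 2\|\alpha\|$, so that the sum is at most $(2\|\alpha\|)^{-1} \le \|\alpha\|^{-1}$ and the inequality in the claim follows.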
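The bound of Lemma 5.1 is easy to test numerically. The following is a minimal sketch in Python (not part of the original notes; the function names are illustrative choices) which evaluates the linear exponential sum directly and compares it with $\min\{N, \|\alpha\|^{-1}\}$.

```python
import cmath
import math


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x), as in the notes."""
    return cmath.exp(2j * math.pi * x)


def dist_to_nearest_integer(alpha: float) -> float:
    """||alpha||: distance of alpha from the closest integer."""
    frac = alpha - math.floor(alpha)
    return min(frac, 1.0 - frac)


def linear_sum(alpha: float, beta: float, N: int) -> complex:
    """Direct evaluation of sum_{n=1}^{N} e(alpha*n + beta)."""
    return sum(e(alpha * n + beta) for n in range(1, N + 1))


def lemma_bound(alpha: float, N: int) -> float:
    """min{N, ||alpha||^{-1}}, reading 1/0 as +infinity."""
    d = dist_to_nearest_integer(alpha)
    return float(N) if d == 0.0 else min(float(N), 1.0 / d)
```

For instance, `abs(linear_sum(0.37, 0.2, 1000))` stays below `lemma_bound(0.37, 1000)`, which is about 2.7, although the sum has 1000 terms of modulus one.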

Now we can state the next general result.


Proposition 5.2 Let $f = \alpha x^k + \cdots$ be a polynomial with degree $k$ and real coefficients. Then
\[
S_f(N) \ll_k N \Bigl( N^{-k} \sum_{\substack{d_1,\dots,d_{k-1} \\ -N \le d_j \le N}} \min\{N, \|\alpha\, k!\, d_1 \cdots d_{k-1}\|^{-1}\} \Bigr)^{1/K},
\tag{5.2}
\]
where $K := 2^{k-1}$ and the implied constant depends only on $k$ (in particular it is independent of all other terms in $f$).

Note that once the proposition is proved for a given $k$, it immediately generalizes to the analogous claim for the same $k$ where the sum over $n$ runs over any set of $N$ consecutive integers: in fact, the shift $n \mapsto n - n_0$, where $n_0$ is any integer, simply changes $f(x)$ to $f(x - n_0)$, which differs from $f(x)$ only in terms of degree strictly lower than $k$. Moreover, the function appearing on the right-hand side is increasing in $N$: this implies that the claim can be further extended to sums over at most $N$ consecutive integers. This remark will be used in the inductive step of the proof.

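Proof. When $k = 1$ the claim is intended as $S_f \ll \min\{N, \|\alpha\|^{-1}\}$ and holds by Lemma 5.1. Assume $k > 1$. We have
\[
|S_f(N)|^2 = S_f(N) \cdot \overline{S_f(N)} = \sum_{1 \le n, m \le N} e(f(m) - f(n))
\]
and reordering the terms we deduce that
\[
|S_f|^2 \le \sum_{|d_{k-1}| \le N} \Bigl| \sum_{\substack{1 \le n \le N \\ 1 \le n + d_{k-1} \le N}} e(f(n + d_{k-1}) - f(n)) \Bigr|.
\]
The inner term is again an exponential sum over a set of consecutive indexes which contains $N$ terms at most, for a new polynomial $f(n + d_{k-1}) - f(n) = \alpha k d_{k-1} n^{k-1} + \cdots$ whose degree is one less than the previous one: we have the opportunity to trigger an induction here (this remark is due to Weyl). By inductive hypothesis we deduce
\begin{align*}
|S_f|^2 &\ll_k N \sum_{|d_{k-1}| \le N} \Bigl( N^{1-k} \sum_{\substack{d_1,\dots,d_{k-2} \\ -N \le d_j \le N}} \min\{N, \|\alpha k d_{k-1} (k-1)!\, d_1 \cdots d_{k-2}\|^{-1}\} \Bigr)^{2/K} \\
&= N \sum_{|d_{k-1}| \le N} \Bigl( N^{1-k} \sum_{\substack{d_1,\dots,d_{k-2} \\ -N \le d_j \le N}} \min\{N, \|\alpha\, k!\, d_1 \cdots d_{k-1}\|^{-1}\} \Bigr)^{2/K}.
\end{align*}
Using the inequality $\sum_{|d| \le N} |a_d|^{2/K} \ll N^{1-2/K} \bigl( \sum_{|d| \le N} |a_d| \bigr)^{2/K}$ (which is evident for $K = 2$, while for $K > 2$ it follows from $(p,q)$-Hölder's inequality with $p := K/(K-2)$ and $q := K/2$) we get
\begin{align*}
|S_f|^2 &\ll_k N \cdot N^{1-2/K} \Bigl( \sum_{|d_{k-1}| \le N} N^{1-k} \sum_{\substack{d_1,\dots,d_{k-2} \\ -N \le d_j \le N}} \min\{N, \|\alpha\, k!\, d_1 \cdots d_{k-1}\|^{-1}\} \Bigr)^{2/K} \\
&= N^2 \Bigl( N^{-k} \sum_{\substack{d_1,\dots,d_{k-1} \\ -N \le d_j \le N}} \min\{N, \|\alpha\, k!\, d_1 \cdots d_{k-1}\|^{-1}\} \Bigr)^{2/K},
\end{align*}
which gives the claim when a square root is taken.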
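The differencing step at the start of the proof is an exact identity before absolute values are taken: $|S_f(N)|^2 = \sum_{d} \sum_{n} e(f(n+d) - f(n))$, where $n$ and $n+d$ both run in $[1, N]$. The sketch below (Python, illustrative names, an arbitrary test polynomial chosen here and not taken from the notes) checks this numerically.

```python
import cmath
import math


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def weyl_sum(f, N: int) -> complex:
    """S_f(N) = sum_{n=1}^{N} e(f(n))."""
    return sum(e(f(n)) for n in range(1, N + 1))


def differenced_sum(f, N: int) -> complex:
    """sum over d of sum over n with 1 <= n, n+d <= N of e(f(n+d) - f(n))."""
    total = 0j
    for d in range(-(N - 1), N):
        lo, hi = max(1, 1 - d), min(N, N - d)
        total += sum(e(f(n + d) - f(n)) for n in range(lo, hi + 1))
    return total


def cubic(x: int) -> float:
    """Hypothetical test polynomial of degree 3 with irrational leading coefficient."""
    return math.sqrt(2) * x ** 3 + 0.3 * x
```

For each fixed $d$ the inner sum involves the polynomial $f(n+d) - f(n)$ of degree $k-1$ in $n$, which is what makes the induction possible.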

Now we are able to prove our first tool, which is due to Weyl.

Theorem 5.3 (Weyl) Let $f(x) = ax^k + \cdots$ be a polynomial of degree $k$ with real coefficients. Let the leading coefficient $a$ be an integer coprime to the positive integer $q$. Then
\[
S_f(N) := \sum_{n=1}^{N} e\Bigl(\frac{f(n)}{q}\Bigr) \ll_{k,\varepsilon} (qN)^{\varepsilon} N \Bigl( \frac{1}{q} + \frac{1}{N} + \frac{q}{N^k} \Bigr)^{1/K},
\tag{5.3}
\]
where $K := 2^{k-1}$ and the implied constant depends only on $k$ and $\varepsilon$ (in particular it is independent of $a$ and of all other terms in $f$).

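This result shows that the sum is considerably smaller than $N$ (in absolute value) when $q$ is large with respect to $N$ but not too large: the cancellation appears when $q = q(N)$ diverges while remaining $o(N^k)$. Finally, we notice that for $k \ge 2$ the bound (5.3) may also be written as
\[
S_f(N) \ll_{k,\varepsilon} (qN)^{\varepsilon} N \cdot
\begin{cases}
\dfrac{1}{q^{1/K}} & \text{if } q \le N, \\[2mm]
\dfrac{1}{N^{1/K}} & \text{if } N \le q \le N^{k-1}, \\[2mm]
\Bigl( \dfrac{q}{N^k} \Bigr)^{1/K} & \text{if } N^{k-1} \le q.
\end{cases}
\]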

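Proof. By Proposition 5.2 we have
\[
S_f(N) \ll_k N \Bigl( N^{-k} \sum_{\substack{d_1,\dots,d_{k-1} \\ -N \le d_j \le N}} \min\Bigl\{N, \Bigl\| \frac{a}{q}\, k!\, d_1 \cdots d_{k-1} \Bigr\|^{-1} \Bigr\} \Bigr)^{1/K}.
\]
The contribution to the sum of the terms with $d_1 \cdots d_{k-1} = 0$ is $\ll_k N^{k-2} \cdot N$ (because one of the $d_j$ is zero and the other $k-2$ run in a set with $2N+1$ terms, producing $\le k(2N+1)^{k-2}$ elements, each one contributing to the sum by $N$ at most), thus
\[
S_f(N) \ll_k N \Bigl( N^{-1} + N^{-k} \sum_{\substack{d_1,\dots,d_{k-1} \\ 1 \le d_j \le N}} \min\Bigl\{N, \Bigl\| \frac{a}{q}\, k!\, d_1 \cdots d_{k-1} \Bigr\|^{-1} \Bigr\} \Bigr)^{1/K}.
\]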


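Let
\[
\tau_\ell(m) := \sum_{\substack{d_1,\dots,d_\ell \\ d_1 \cdots d_\ell = m}} 1,
\]
i.e. the number of ways we can write $m$ as a product of $\ell$ positive integers. Then $\tau_\ell(m) \ll_{\ell,\varepsilon} m^{\varepsilon}$.$^2$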

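Proof. In fact, $\tau_\ell$ can also be recursively defined as $\tau_1 := 1$ and $\tau_\ell := \tau_{\ell-1} * 1$ when $\ell > 1$, and the claim can be proved by induction on the index $\ell$. The claim is evident for $\tau_1$. Let $m = \prod_j p_j^{\nu_j}$ be the decomposition of $m$ as a product of primes. Then $\tau_2(m) = \sum_{u | m} 1 = \prod_j (1 + \nu_j)$ (a fact which can be verified, for example, using the multiplicativity). Fix $\varepsilon > 0$. Then
\[
\frac{\tau_2(m)}{m^{\varepsilon}} = \prod_j \frac{1 + \nu_j}{p_j^{\varepsilon \nu_j}}.
\]
The factor $\frac{1+\nu}{p^{\varepsilon\nu}}$ is greater than 1 only for a finite set of primes $p$ and exponents $\nu$. In fact, $1 + \nu \ge p^{\varepsilon\nu}$ forces $p^{\varepsilon} \le (1+\nu)^{1/\nu} \le 2$, and $1 + \nu \ge p^{\varepsilon\nu} \ge 2^{\varepsilon\nu}$ forces $\nu \le \nu_0(\varepsilon)$. Hence the right-hand side in the previous equality is bounded by a constant which depends on $\varepsilon$ but is independent of $m$. This shows that $\frac{\tau_2(m)}{m^{\varepsilon}} \ll_{\varepsilon} 1$, which is the claim for $\tau_2$. For the generic $\tau_\ell$ it is sufficient to remark that
\[
\tau_\ell(m) = \sum_{d | m} \tau_{\ell-1}(d) \ll_{\ell,\varepsilon} \sum_{d | m} d^{\varepsilon} \ll_{\ell,\varepsilon} m^{\varepsilon} \tau_2(m).
\]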
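The recursion $\tau_\ell = \tau_{\ell-1} * 1$ used in the proof translates directly into code. The following Python sketch (illustrative, with names chosen here; it is a naive divisor loop, not an efficient algorithm) computes $\tau_\ell(m)$ and compares it with a direct enumeration of ordered factorizations.

```python
from itertools import product
from math import prod


def tau(ell: int, m: int) -> int:
    """tau_ell(m) via tau_1 = 1 and tau_ell(m) = sum_{d | m} tau_{ell-1}(d)."""
    if ell == 1:
        return 1
    return sum(tau(ell - 1, d) for d in range(1, m + 1) if m % d == 0)


def tau_by_enumeration(ell: int, m: int) -> int:
    """Count ordered ell-tuples of positive integers whose product is m."""
    divisors = [d for d in range(1, m + 1) if m % d == 0]
    return sum(1 for ds in product(divisors, repeat=ell) if prod(ds) == m)
```

For example $\tau_2(12) = 6$ (the divisors of 12) and $\tau_3(12) = 18$, in agreement with the product formula $\prod_j \binom{\nu_j + 2}{2}$ for $12 = 2^2 \cdot 3$.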

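Therefore we get
\[
S_f(N) \ll_{k,\varepsilon} N^{1+\varepsilon} \Bigl( N^{-1} + N^{-k} \sum_{m \le N^{k-1}} \min\Bigl\{N, \Bigl\| \frac{a}{q}\, k!\, m \Bigr\|^{-1} \Bigr\} \Bigr)^{1/K}.
\]
Extending the range for $m$ to $k!\,N^{k-1}$ it becomes
\[
S_f(N) \ll_{k,\varepsilon} N^{1+\varepsilon} \Bigl( N^{-1} + N^{-k} \sum_{m \le k! N^{k-1}} \min\{N, \|am/q\|^{-1}\} \Bigr)^{1/K}.
\]
Thus the claim follows immediately from the bound
\[
\sum_{m \le M} \min\{N, \|am/q\|^{-1}\} \ll_{\varepsilon} q^{\varepsilon} (N + q + MN/q + M) \qquad \forall M.
\tag{5.4}
\]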

We split the sum into blocks of $q$ consecutive integers: there are $\lfloor M/q \rfloor$ such blocks, plus possibly one more which is incomplete. All complete blocks contribute the same to the sum (by $q$-periodicity of the summand), and the contribution of the incomplete block is not larger than the one of a full block. As a consequence, it is sufficient to bound the contribution of the first block, corresponding to $m = 1, \dots, q$: the final bound will be produced by multiplying this bound by $1 + \lfloor M/q \rfloor$. The fraction $am/q$ is an integer only for $m = q$, because $a$ and $q$ are coprime, and in this case $\min\{N, \|am/q\|^{-1}\} = N$. For every other $m$ we can bound the minimum with $\|am/q\|^{-1}$, thus
\[
\sum_{m=1}^{q} \min\{N, \|am/q\|^{-1}\} \le N + \sum_{\substack{m=1 \\ q \nmid m}}^{q} \|am/q\|^{-1}.
\]

$^2$ Actually better bounds are known. See [HW] Th. 317 p. 262.

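We partition the sum further according to the values of $(m, q)$, the greatest common divisor of $m$ and $q$. Thus, let $u$ be a divisor of $q$; we have $(m, q) = u$ if and only if $m = um'$ with $(m', q/u) = 1$, and $m \le q$ if and only if $m' \le q/u$, thus we have
\[
\sum_{m=1}^{q} \min\{N, \|am/q\|^{-1}\} \le N + \sum_{\substack{u | q \\ u < q}} \sum_{\substack{m' \le q/u \\ (m', q/u) = 1}} \|am'/(q/u)\|^{-1}.
\]
The inner sum is independent of $a$: in fact the map $m' \mapsto \|am'/(q/u)\|$ is $q/u$-periodic and the sum runs over a full set of representatives of the classes modulo $q/u$ which are coprime with $q/u$, a set which is preserved by the multiplication by $a$ (because $a$ and $q/u$ are coprime). Hence
\[
\sum_{m=1}^{q} \min\{N, \|am/q\|^{-1}\} \le N + \sum_{\substack{u | q \\ u < q}} \sum_{\substack{m' \le q/u \\ (m', q/u) = 1}} \|m'/(q/u)\|^{-1}.
\]
The map $m' \mapsto \|m'/(q/u)\|$ is also even; as a consequence
\[
\sum_{\substack{m' \le q/u \\ (m', q/u) = 1}} \|m'/(q/u)\|^{-1}
\le 2 \sum_{\substack{m' \le q/(2u) \\ (m', q/u) = 1}} \|m'/(q/u)\|^{-1}
= 2 \sum_{\substack{m' \le q/(2u) \\ (m', q/u) = 1}} \frac{q/u}{m'},
\]
so that
\[
\sum_{m=1}^{q} \min\{N, \|am/q\|^{-1}\}
\le N + 2 \sum_{\substack{u | q \\ u < q}} \sum_{\substack{m' \le q/(2u) \\ (m', q/u) = 1}} \frac{q/u}{m'}
\le N + 2q \sum_{u < q} \frac{1}{u} \sum_{m'=1}^{q} \frac{1}{m'}
\le N + 2q \ln^2 q \ll_{\varepsilon} N + q^{1+\varepsilon}.
\]
Thus,
\[
\sum_{m \le M} \min\{N, \|am/q\|^{-1}\} \ll_{\varepsilon} (1 + M/q)(N + q^{1+\varepsilon}),
\]
which is equivalent to (5.4).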
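Two steps of this argument can be checked with exact rational arithmetic: the independence from $a$ of the sum over reduced residues, and the size of the first block. The Python sketch below is illustrative only (names are chosen here); for the block bound it uses the slightly weaker but safe comparison value $N + 2q(1 + \ln q)^2$, which is an assumption of this sketch and replaces the $\ln^2 q$ of the text.

```python
import math
from fractions import Fraction


def reduced_residue_sum(a: int, q: int) -> Fraction:
    """Sum of ||a*m/q||^{-1} over 1 <= m <= q with gcd(m, q) = 1 (q >= 2, gcd(a, q) = 1)."""
    total = Fraction(0)
    for m in range(1, q + 1):
        if math.gcd(m, q) == 1:
            r = (a * m) % q
            total += Fraction(q, min(r, q - r))
    return total


def first_block(a: int, q: int, N: int) -> Fraction:
    """Sum of min{N, ||a*m/q||^{-1}} over m = 1, ..., q."""
    total = Fraction(0)
    for m in range(1, q + 1):
        r = (a * m) % q
        total += N if r == 0 else min(Fraction(N), Fraction(q, min(r, q - r)))
    return total


def comparison_value(q: int, N: int) -> float:
    """N + 2q(1 + ln q)^2, a safe stand-in for the bound N + 2q ln^2 q."""
    return N + 2 * q * (1 + math.log(q)) ** 2
```

With $q = 36$, every $a$ coprime to 36 gives the same value of `reduced_residue_sum(a, 36)`, as the permutation argument predicts.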


5.2. Second step: integral representation

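Let $\beta \in \mathbb{Z}$. The integral $\int_0^1 e(\alpha\beta)\, d\alpha$ detects when $\beta = 0$, since it is 1 when $\beta = 0$ and 0 otherwise. We can use this remark to give an integral representation of $R_{h,k}(n)$. Let $f(\alpha) := \sum_{m=0}^{N} e(\alpha m^k)$ for a suitable integer $N$; then the previous computation shows that
\begin{align*}
\int_0^1 f(\alpha)^h e(-\alpha n)\, d\alpha
&= \sum_{\substack{m_1,\dots,m_h \\ 0 \le m_j \le N}} \int_0^1 e\bigl(\alpha (m_1^k + \cdots + m_h^k - n)\bigr)\, d\alpha \\
&= \#\{(m_1,\dots,m_h) \in \mathbb{N}^h : m_j \le N\ \forall j,\ m_1^k + \cdots + m_h^k = n\}.
\end{align*}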
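The integral representation can be tested on small cases. In the Python sketch below (illustrative names; $\mathbb{N}$ is taken to include 0, as in the sum defining $f$) the integral is replaced by an average over the points $t/M$, $t = 0, \dots, M-1$, with $M = hN^k + 1$: this is an assumption of the sketch, justified by the fact that the integrand is a trigonometric polynomial all of whose frequencies are smaller than $M$ in absolute value, so that the average equals the integral up to rounding. Here $N$ is the least integer with $N^k \ge n$, so that no representation is lost.

```python
import cmath
import math
from itertools import product


def smallest_N(n: int, k: int) -> int:
    """Least integer N >= 0 with N**k >= n."""
    N = 0
    while N ** k < n:
        N += 1
    return N


def count_by_enumeration(n: int, h: int, k: int) -> int:
    """Number of h-tuples (m_1, ..., m_h), m_j >= 0, with sum of k-th powers equal to n."""
    N = smallest_N(n, k)
    return sum(1 for ms in product(range(N + 1), repeat=h)
               if sum(m ** k for m in ms) == n)


def count_by_integral(n: int, h: int, k: int) -> int:
    """Same count via the average of f(alpha)^h e(-alpha n) over alpha = t/M."""
    N = smallest_N(n, k)
    M = h * N ** k + 1
    total = 0j
    for t in range(M):
        # Reduce the phases modulo M before dividing, to limit rounding errors.
        f = sum(cmath.exp(2j * math.pi * ((t * m ** k) % M) / M)
                for m in range(N + 1))
        total += f ** h * cmath.exp(-2j * math.pi * ((t * n) % M) / M)
    return round((total / M).real)
```

For $n = 25$, $h = 2$, $k = 2$ both functions return 4, corresponding to $(0,5)$, $(5,0)$, $(3,4)$, $(4,3)$.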

The condition $m_j \ge 0$ for every $j$ forces each $m_j$ satisfying the equation $m_1^k + \cdots + m_h^k = n$ to be at most $n^{1/k}$; therefore the full set of representations of $n$ is computed by the integral as soon as $N \ge n^{1/k}$, i.e.
\[
R_{h,k}(n) = \int_0^1 f(\alpha)^h e(-\alpha n)\, d\alpha
\]
whenever $N \ge n^{1/k}$. As a consequence, Linnik's result (Theorem 5.2) is equivalent to the claim
\[
\int_0^1 f(\alpha)^h e(-\alpha n)\, d\alpha \ll_{h,k} N^{h-k}
\tag{5.5}
\]
for $N = \lceil n^{1/k} \rceil$. The simple upper bound
\[
\Bigl| \int_0^1 f(\alpha)^h e(-\alpha n)\, d\alpha \Bigr| \le \|f^h\|_{\infty} \cdot 1 = \|f\|_{\infty}^h
\]
is not strong enough, since $\|f\|_{\infty} = f(0) = N + 1$, so that it simply gives $N^h$, while we need $N^{h-k}$. We could try to get around this difficulty by splitting $[0,1)$ into two regions: a (small) interval $I$ containing 0 and its complementary set $I^c := [0,1) \setminus I$, estimating the part of the integral which is in $I$ with $\|f\|_{\infty}^h \cdot \mu(I)$ (which is small when the measure of $I$ is small enough), and estimating the part of the integral which is outside $I$ with $\|f\|_{\infty, I^c}^h$, in the hope that the sup of $f$ in $I^c$ is significantly smaller than $N$. Unfortunately this is not true, essentially because the $k$-th powers of integers are unevenly distributed modulo $q$. For example, only 0 and 1 are squares modulo 3, and $m^2$ is $0 \pmod 3$ when $3 | m$, and equals $1 \pmod 3$ otherwise. Thus if we set $k = 2$ then

\[
f\Bigl(\frac{1}{3}\Bigr) = \sum_{m=0}^{N} e(m^2/3) = e(0)\, \frac{N}{3} + e(1/3)\, \frac{2N}{3} + O(1) = \frac{iN}{\sqrt{3}} + O(1).
\]
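The size of $f(1/3)$ for $k = 2$ can be confirmed numerically. A minimal Python sketch (the function name is an illustrative choice) that uses only the residue of $m^2$ modulo 3:

```python
import cmath
import math


def f_at_one_third(N: int) -> complex:
    """f(1/3) = sum_{m=0}^{N} e(m^2/3) for k = 2, using m^2 mod 3 in {0, 1}."""
    omega = cmath.exp(2j * math.pi / 3)  # e(1/3)
    return sum(1 if (m * m) % 3 == 0 else omega for m in range(N + 1))
```

For $N = 3000$ the value is approximately $1 + 1732.05\,i$, at distance about 1 from $iN/\sqrt{3}$: no cancellation of order $N$ occurs at $\alpha = 1/3$.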


The same phenomenon appears for every $q$ for which $k$ and $\varphi(q)$ are not coprime (for example, try with $k = 3$ and $q = 7$). As a consequence, in order to bound the integral in $[0,1)$ we need to split its domain into a convenient set of arcs (called Farey arcs) centered around certain fractions (see Equation (5.6) below); then we will be able to prove that the integrand is small in size in each arc by Weyl's result. We start with a second lemma which we need in order to check that the collection of Farey arcs actually covers the full interval $[0,1)$.

Lemma 5.2 (Dirichlet approximation lemma) Let $Q$ be an arbitrary positive integer. For every $\alpha \in [0,1)$ there are integers $p, q$ with $0 < q \le Q$ such that
\[
\Bigl| \alpha - \frac{p}{q} \Bigr| \le \frac{1}{qQ}.
\]

Proof. Let $A := \{\{j\alpha\} : j = 1, \dots, Q+1\}$. The elements in $A$ are in $[0,1)$, by definition of fractional part, and are $Q+1$ in number$^3$. Hence (by the pigeon-hole principle) two of them are less than $1/Q$ apart, i.e. there are two indexes $j_1 < j_2$ such that $|\{j_2\alpha\} - \{j_1\alpha\}| \le 1/Q$. Let $p := \lfloor j_2\alpha \rfloor - \lfloor j_1\alpha \rfloor$ and $q := j_2 - j_1$. Note that $q \le Q$. Then
\[
|q\alpha - p| = |j_2\alpha - j_1\alpha - \lfloor j_2\alpha \rfloor + \lfloor j_1\alpha \rfloor| = |\{j_2\alpha\} - \{j_1\alpha\}| \le \frac{1}{Q},
\]
which is the claim.
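The pigeon-hole argument is constructive and can be followed step by step in code. The Python sketch below is illustrative (the function name is chosen here) and assumes a rational $\alpha$, given as a `Fraction`, so that all comparisons are exact: it sorts the $Q+1$ fractional parts, picks the two closest neighbours and returns the resulting $p$ and $q$.

```python
import math
from fractions import Fraction


def dirichlet_approximation(alpha: Fraction, Q: int) -> tuple[int, int]:
    """Return (p, q) with 1 <= q <= Q and |alpha - p/q| <= 1/(q*Q)."""
    # Fractional parts {j*alpha} for j = 1, ..., Q+1, paired with their index j.
    points = sorted((j * alpha - math.floor(j * alpha), j) for j in range(1, Q + 2))
    # Among Q+1 points of [0, 1), two consecutive ones are less than 1/Q apart.
    (_, ja), (_, jb) = min(zip(points, points[1:]),
                           key=lambda pair: pair[1][0] - pair[0][0])
    j1, j2 = min(ja, jb), max(ja, jb)
    q = j2 - j1
    p = math.floor(j2 * alpha) - math.floor(j1 * alpha)
    return p, q
```

The returned fraction $p/q$ is not necessarily in lowest terms; reducing it can only improve the inequality.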

Note that the integers $p$ and $q$ appearing in the previous lemma can be taken coprime. In fact, let $p', q'$ be coprime integers with $p'/q' = p/q$. Then evidently $q' \le q \le Q$, so that
\[
\Bigl| \alpha - \frac{p'}{q'} \Bigr| = \Bigl| \alpha - \frac{p}{q} \Bigr| \le \frac{1}{qQ} \le \frac{1}{q'Q}.
\]

Let $k \ge 2$ and let $\nu$ be a positive 'small' parameter that we will set later (our choice will be any $\nu \in (0, 1/3]$, with $\nu = 1/3$ producing our best result). For every pair $0 \le a < q \le N^{k-\nu}$ of coprime integers and for every $j \in \mathbb{N}$ let $I_j(q,a)$ be the set
\[
I_j(q,a) := \Bigl\{ \alpha \in [0,1) : \Bigl| \alpha - \frac{a}{q} \Bigr| \le \frac{1}{qN^{k-\nu}},\ \Bigl| \alpha - \frac{a}{q} \Bigr| \in \Bigl[ \frac{j}{N^k}, \frac{j+1}{N^k} \Bigr) \Bigr\}.
\]

Essentially, $I_j(q,a)$ is the set of real numbers which we consider as well approximated by the rational number $a/q$. For fixed coprime $a$ and $q$ we have
\[
\bigcup_j I_j(q,a) = \Bigl\{ \alpha \in [0,1) : \Bigl| \alpha - \frac{a}{q} \Bigr| \le \frac{1}{qN^{k-\nu}} \Bigr\};
\]
therefore, from the Approximation Lemma 5.2 (with $Q = N^{k-\nu}$) we get that
\[
[0,1) \subseteq \bigcup_{q \le N^{k-\nu}}\ \bigcup_{\substack{a=0 \\ (a,q)=1}}^{q-1}\ \bigcup_{j=0}^{N^{\nu}/q} I_j(q,a),
\tag{5.6}
\]
where $1 \le q \le N^{k-\nu}$, and $j \le N^{\nu}/q$ (otherwise $I_j(q,a)$ is empty). The next three lemmas give a bound for the value of $f(\alpha)$ in $I_j(q,a)$ according to the range of $q$.

$^3$ Some of them could coincide when $\alpha$ is rational.

Lemma 5.3 Let $\nu \in (0, 1/3]$. Let $\alpha \in I_j(q,a)$. If $q \le N$, then
\[
f(\alpha) \ll_k \frac{N}{q^{1/2^k} (j+1)^{1/k}}.
\]

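Proof. We split the proof into two cases.

Case 1: small $q$. Suppose $q < N^{2\nu}$. Let
\[
A := \frac{1}{q} \sum_{m=1}^{q} e\Bigl(\frac{a}{q} m^k\Bigr).
\]
Its definition immediately implies that $|A| \le 1$, while from Theorem 5.3 we have $A \ll_{k,\varepsilon} \frac{1}{q}\, q\, (q \cdot q)^{\varepsilon} q^{-2^{1-k}}$, which for $\varepsilon = 2^{-k-1}$ becomes
\[
A \ll_k q^{-1/2^k}.
\tag{5.7}
\]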

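We write $f(\alpha)$ as $1 + F_1(\alpha) + A F_2(\alpha)$, with
\begin{align*}
F_1(\alpha) &:= \sum_{m=1}^{N} \Bigl( e\Bigl(\frac{a}{q} m^k\Bigr) - A \Bigr) \cdot e((\alpha - a/q) m^k), \\
F_2(\alpha) &:= \sum_{m=1}^{N} e((\alpha - a/q) m^k).
\end{align*}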

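Bound for $F_1$. The term $F_1$ is $\sum_{m=1}^{N} a_m g(m)$ with
\[
a_m := e\Bigl(\frac{a}{q} m^k\Bigr) - A, \qquad g(m) := e((\alpha - a/q) m^k).
\]
Let $S(x) := \sum_{1 \le m \le x} a_m$, with $S(0) := 0$. The sequence $a_m$ is $q$-periodic and $S(q) = S(2q) = \cdots = 0$ (because $S(q) = \sum_{m=1}^{q} e(am^k/q) - qA = 0$), therefore $S(x)$ is $q$-periodic too. In particular it is bounded by the maximum that it assumes for $x \in [0, q]$, which is $\le 2q$ (since for $x \in [0,q]$ there are $q$ terms at most and the addends are $\le 2$ in absolute value). Therefore $\|S(\cdot)\|_{\infty} \le 2q$. By partial summation (Formula (1.4)) we have
\[
F_1 = S(N) g(N) - \int_1^N S(x) g'(x)\, dx,
\]
so that
\begin{align*}
|F_1| &\le \|S(\cdot)\|_{\infty} + \|S(\cdot)\|_{\infty} \int_1^N |g'(x)|\, dx
\le 2q + 2q \int_1^N 2\pi |\alpha - a/q|\, k x^{k-1}\, dx \\
&\ll q \Bigl( 1 + N^k \cdot \frac{1}{qN^{k-\nu}} \Bigr) = q + N^{\nu} \ll N^{2\nu}.
\end{align*}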

Bound for $F_2$. From the Euler–Maclaurin formula (1.5) it follows that
\[
F_2 = \int_1^N g(x)\, dx + O(1) + O\Bigl( \int_1^N |g'(x)|\, dx \Bigr).
\]
Setting $v(\beta) := \int_1^N e(\beta x^k)\, dx$ and recalling that above we have proved that $\int_1^N |g'(x)|\, dx \ll N^{\nu}/q \le N^{\nu}$, we conclude that
\[
F_2 = v(\alpha - a/q) + O(N^{\nu}).
\]
Consequently
\begin{align*}
f(\alpha) = 1 + F_1 + A F_2 &= 1 + O(N^{2\nu}) + A v(\alpha - a/q) + O(|A| N^{\nu}) \\
&= A v(\alpha - a/q) + O(N^{2\nu})
\end{align*}
so that (by (5.7))
\[
|f(\alpha)| \ll_k N^{2\nu} + q^{-1/2^k} |v(\alpha - a/q)|.
\tag{5.8}
\]
We need a bound for $v(\alpha - a/q)$. Trivially, we see that $|v(\alpha - a/q)| \le N$: this fact already proves that
\[
|f(\alpha)| \ll_k N^{2\nu} + \frac{N}{q^{1/2^k}}.
\tag{5.9}
\]
We have $N^{2\nu} \ll \frac{N}{q^{1/2^k}}$ whenever $q^{1/2^k} \ll N^{1-2\nu}$. By hypothesis we have $q < N^{2\nu}$, therefore the bound certainly holds if $2\nu \le (1 - 2\nu) 2^k$, i.e. if $\nu \le \frac{1}{2} \cdot \frac{2^k}{2^k + 1}$. Since we are assuming $k \ge 2$, this is certainly true when $\nu \le 2/5$. Under this assumption (5.9) becomes
\[
|f(\alpha)| \ll_k \frac{N}{q^{1/2^k}},
\]

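which is the claim for $j = 0$. Suppose that $j \ge 1$. Then the change of variable $x = y/|\beta|^{1/k}$ gives
\[
v(\beta) = \int_1^N e(\beta x^k)\, dx = |\beta|^{-1/k} \int_{|\beta|^{1/k}}^{N |\beta|^{1/k}} e(\operatorname{sgn}(\beta)\, y^k)\, dy.
\]
Since $F_{\pm}(z) := \int_0^z e(\pm y^k)\, dy$ is bounded$^4$ as a function of $z \in \mathbb{R}$, we conclude that $|v(\beta)| \ll |\beta|^{-1/k}$. Therefore, when $j \ge 1$, from (5.8) we have
\[
|f(\alpha)| \ll N^{2\nu} + q^{-1/2^k} |\alpha - a/q|^{-1/k}
\le N^{2\nu} + q^{-1/2^k} \Bigl( \frac{j}{N^k} \Bigr)^{-1/k}
\le N^{2\nu} + \frac{N}{q^{1/2^k} j^{1/k}}.
\]
We have already noticed that $j \le N^{\nu}$. Using this restriction for $j$ and recalling that we are assuming $q \le N^{2\nu}$, it is immediate to verify that $N^{2\nu} \ll \frac{N}{q^{1/2^k} j^{1/k}}$ for $\nu \le 1/3$. Concluding, we have proved that
\[
|f(\alpha)| \ll_k \frac{N}{q^{1/2^k} j^{1/k}}
\]
also in this case.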

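Case 2: large $q$. Suppose $N^{2\nu} \le q \le N$. Then by Theorem 5.3 we have
\[
f\Bigl(\frac{a}{q}\Bigr) = 1 + \sum_{m=1}^{N} e\Bigl(\frac{am^k}{q}\Bigr) \ll_{k,\varepsilon} N (qN)^{\varepsilon} q^{-2^{1-k}}
\]
which for any $\varepsilon \le \nu 2^{-k}$ and under the assumption $N^{2\nu} \le q \le N$ gives
\[
f\Bigl(\frac{a}{q}\Bigr) \ll_k N q^{-1/2^k}.
\tag{5.10}
\]
Moreover, $\|f'\|_{\infty} \ll N^{k+1}$ uniformly in $\mathbb{R}$, therefore
\[
\Bigl| f(\alpha) - f\Bigl(\frac{a}{q}\Bigr) \Bigr| \ll \|f'\|_{\infty} \cdot \Bigl| \alpha - \frac{a}{q} \Bigr| \le N^{k+1} \cdot \frac{1}{qN^{k-\nu}} = \frac{N^{1+\nu}}{q}.
\tag{5.11}
\]
We are assuming that $N^{2\nu} \le q \le N$, therefore the bound proves that
\[
\Bigl| f(\alpha) - f\Bigl(\frac{a}{q}\Bigr) \Bigr| \ll N q^{-1/2^k}.
\tag{5.12}
\]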

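$^4$ In fact, the change of variable $y \to z^{1/k}$ and an integration by parts show that
\[
\pm \int e(\pm y^k)\, dy = \pm \int \frac{e(\pm z)}{k z^{1-1/k}}\, dz = \frac{e(\pm z)}{2\pi i k z^{1-1/k}} + \frac{1 - 1/k}{2\pi i k} \int \frac{e(\pm z)}{z^{2-1/k}}\, dz,
\]
from which it is easy to deduce that $\lim_{z \to \pm\infty} F_{\pm}(z)$ exists and is finite whenever $k > 1$.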


(In other words, we are claiming that $\frac{N^{1+\nu}}{q} \ll N q^{-1/2^k}$. This happens if and only if $N^{\nu} \ll q^{1-1/2^k}$. Since $N^{2\nu} \le q$, this certainly happens as soon as $N^{\nu} \ll N^{2\nu(1-1/2^k)}$, i.e., as soon as $1/2^k \le 1/2$, which is true.)

Bounds (5.10) and (5.12) together imply that $f(\alpha) \ll_k N q^{-1/2^k}$. This concludes the proof of this case, since the assumptions $q \ge N^{2\nu}$ and $j \le N^{\nu}/q$ imply that 0 is the unique possible value for $j$.

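Lemma 5.4 Let $\nu \in (0, 1/3]$. Let $\alpha \in I_j(q,a)$. If $N < q \le N^{k-1}$, then $j = 0$ and
\[
f(\alpha) \ll_k \frac{N}{N^{1/2^k}}.
\]

Proof. $j$ is zero because this happens for every $q > N^{\nu}$. Moreover, in the given range for $q$, by Theorem 5.3 we have
\[
f\Bigl(\frac{a}{q}\Bigr) = 1 + \sum_{m=1}^{N} e\Bigl(\frac{am^k}{q}\Bigr) \ll_{k,\varepsilon} N (qN)^{\varepsilon} N^{-2/2^k}
\]
which for any $\varepsilon \le 2^{-k}/k$ and under the hypothesis $N < q \le N^{k-1}$ gives
\[
f\Bigl(\frac{a}{q}\Bigr) \ll_k N^{1-1/2^k}.
\]
We get the claim since $\bigl| f(\alpha) - f\bigl(\frac{a}{q}\bigr) \bigr| \ll \frac{N^{1+\nu}}{q} \ll \frac{N}{N^{1/2^k}}$ (for the first bound recall (5.11); for the second inequality notice that it can be written as $N^{\nu + 1/2^k} \ll q$, that $\nu + 1/2^k < 1$ and that $N \le q$ by hypothesis).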

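Lemma 5.5 Let $\nu \in (0, 1/3]$. Let $\alpha \in I_j(q,a)$. If $N^{k-1} < q \le N^{k-\nu}$, then $j = 0$ and
\[
f(\alpha) \ll_k N (q/N^k)^{1/2^k}.
\]

Proof. $j$ is zero because this happens for every $q > N^{\nu}$. Moreover, in the given range for $q$, by Theorem 5.3 we have
\[
f\Bigl(\frac{a}{q}\Bigr) = 1 + \sum_{m=1}^{N} e\Bigl(\frac{am^k}{q}\Bigr) \ll_{k,\varepsilon} N (qN)^{\varepsilon} (q/N^k)^{2/2^k}
\]
which for any $\varepsilon \le \frac{\nu}{k+1} 2^{-k}$ and under the assumption $N^{k-1} < q \le N^{k-\nu}$ gives
\[
f\Bigl(\frac{a}{q}\Bigr) \ll_k N (q/N^k)^{1/2^k},
\]
and the claim follows since $\bigl| f(\alpha) - f\bigl(\frac{a}{q}\bigr) \bigr| \ll \frac{N^{1+\nu}}{q} \ll N (q/N^k)^{1/2^k}$ (for the first bound recall (5.11); for the second inequality notice that it can be written as $N^{\nu + k/2^k} \ll q^{1+1/2^k}$, that $N^{k-1} < q$ by hypothesis, and that $\nu + k/2^k \le (k-1)(1 + 1/2^k)$ holds for $\nu \le 1/3$ and $k \ge 2$).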


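We are now in a position to complete the proof of Linnik's result (Theorem 5.2). The size of $I_j(q,a)$ is $2/N^k$ at most, therefore from Lemma 5.3 we get
\[
\int_{I_j(q,a)} f(\alpha)^h e(-n\alpha)\, d\alpha \ll_k \Bigl( \frac{N}{q^{1/2^k} (j+1)^{1/k}} \Bigr)^h \frac{1}{N^k} \ll_k \frac{N^{h-k}}{q^{h/2^k} (j+1)^{h/k}}
\]
when $q \le N$,
\[
\sum_j \Bigl| \int_{I_j(q,a)} f(\alpha)^h e(-n\alpha)\, d\alpha \Bigr| \ll_k \Bigl( \frac{N}{N^{1/2^k}} \Bigr)^h \frac{1}{N^k} \ll_k \frac{N^{h-k}}{N^{h/2^k}}
\]
when $N < q \le N^{k-1}$ from Lemma 5.4, and
\[
\sum_j \Bigl| \int_{I_j(q,a)} f(\alpha)^h e(-n\alpha)\, d\alpha \Bigr| \ll_k \bigl( N (q/N^k)^{1/2^k} \bigr)^h \frac{1}{N^k} \ll_k \frac{N^{h-k}}{N^{h\nu/2^k}}
\]
when $N^{k-1} < q \le N^{k-\nu}$ from Lemma 5.5. As a consequence, using the decomposition (5.6) we conclude that
\begin{align*}
\int_0^1 f(\alpha)^h e(-n\alpha)\, d\alpha
&\ll_k \sum_{q \le N} \sum_{a=0}^{q-1} \sum_j \frac{N^{h-k}}{q^{h/2^k} (j+1)^{h/k}}
+ \sum_{N < q \le N^{k-1}} \sum_{a=0}^{q-1} \frac{N^{h-k}}{N^{h/2^k}}
+ \sum_{N^{k-1} < q \le N^{k-\nu}} \sum_{a=0}^{q-1} \frac{N^{h-k}}{N^{h\nu/2^k}} \\
&\le N^{h-k} \Bigl[ \sum_{q \le N} \sum_j \frac{q}{q^{h/2^k} (j+1)^{h/k}}
+ \sum_{N < q \le N^{k-1}} \frac{q}{N^{h/2^k}}
+ \sum_{N^{k-1} < q \le N^{k-\nu}} \frac{q}{N^{h\nu/2^k}} \Bigr] \\
&\le N^{h-k} \Bigl[ \sum_q \frac{q}{q^{h/2^k}} \sum_j \frac{1}{(j+1)^{h/k}} + \frac{N^{2k-2\nu}}{N^{h\nu/2^k}} \Bigr].
\end{align*}
Setting $h \ge 2k2^k/\nu$ the double sum converges and the last term is infinitesimal. This proves that
\[
\int_0^1 f(\alpha)^h e(-n\alpha)\, d\alpha \ll_{h,k} N^{h-k},
\]
which is (5.5). The choice $\nu = 1/3$ proves Theorem 5.2 with $h = 6k2^k$.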

It is customary to denote with $g(k)$ the dimension of $G_k$ as additive basis and with $G(k)$ its dimension as asymptotic additive basis. The value of $g(k)$ is strongly affected by the arithmetic of $k$. In fact, Johann Albrecht Euler (son of Leonhard Euler) noticed that the number $2^k \lfloor (3/2)^k \rfloor - 1$ is smaller than $3^k$, thus in its representation as a sum of $k$-th powers only $1^k$ and $2^k$ may appear. The shortest possible representation needs $\lfloor (3/2)^k \rfloor - 1$ terms of kind $2^k$ and $2^k - 1$ terms of kind $1^k$, and therefore needs $\lfloor (3/2)^k \rfloor + 2^k - 2$ powers, thus proving that
\[
g(k) \ge 2^k + \Bigl\lfloor \Bigl( \frac{3}{2} \Bigr)^k \Bigr\rfloor - 2.
\]
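Euler's remark can be verified for small $k$ by computing the least number of $k$-th powers needed for the integer $2^k \lfloor (3/2)^k \rfloor - 1$. The Python sketch below is only illustrative (names chosen here) and uses a plain dynamic programme over all integers up to the target, which is feasible only for small $k$.

```python
def euler_number(k: int) -> int:
    """The integer 2^k * floor((3/2)^k) - 1 considered by J. A. Euler."""
    return 2 ** k * (3 ** k // 2 ** k) - 1


def euler_lower_bound(k: int) -> int:
    """The value 2^k + floor((3/2)^k) - 2."""
    return 2 ** k + 3 ** k // 2 ** k - 2


def min_kth_powers(n: int, k: int) -> int:
    """Least number of positive k-th powers with sum n (dynamic programming)."""
    best = [0] + [n] * n  # n copies of 1^k always suffice
    for t in range(1, n + 1):
        m = 1
        while m ** k <= t:
            best[t] = min(best[t], best[t - m ** k] + 1)
            m += 1
    return best[n]
```

For $k = 2, 3, 4$ the integers are 7, 23, 79 and they need exactly 4, 9, 19 powers respectively, matching the lower bound.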

He conjectured that this is the correct value of $g(k)$. The joint work of Dickson, Pillai, Rubugunday and Niven proved that the conjecture holds for $k$ whenever
\[
2^k \Bigl\{ \Bigl( \frac{3}{2} \Bigr)^k \Bigr\} + \Bigl\lfloor \Bigl( \frac{3}{2} \Bigr)^k \Bigr\rfloor \le 2^k.
\tag{5.13}
\]
This condition may be rephrased by noticing that if we write $3^k$ as $q2^k + r$ with $0 \le r < 2^k$, then $r$ is $2^k \{(3/2)^k\}$ and $q$ is $\lfloor (3/2)^k \rfloor$, and the inequality simply says that $q + r \le 2^k$. The numbers $q$ and $r$ may be quickly recovered from the binary representation of $3^k$ (because $r$ is simply the number represented by the $k$ lower binary digits, and $q$ the number represented by the other binary digits), and the condition $q + r \le 2^k$ may be tested directly from their binary representation. Moreover, the binary representation of $q$ contains only $\lceil k \log_2 3 \rceil - k$ digits, so it is shorter than $r$ by $2k - \lceil k \log_2 3 \rceil$ digits. As a consequence, the claim certainly holds when among the $2k - \lceil k \log_2 3 \rceil$ most significant digits of $r$ at least one 0 appears. For example:
\[
3^{17} = (\underbrace{1111011001}_{q}\ \underbrace{01000010111000011}_{r})_2
\]
thus
\[
q + r = (1111011001)_2 + (01000010111000011)_2
\]
and this is lower than $2^k$ since
\[
r = (\underbrace{0100001}_{\text{zeros here}} 0111000011)_2.
\]
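The binary reformulation of (5.13) amounts to one shift and one mask on the integer $3^k$. A minimal Python sketch (function names are illustrative, and arbitrary-precision integers are assumed, as Python provides):

```python
def split_power_of_three(k: int) -> tuple[int, int]:
    """Write 3^k = q * 2^k + r with 0 <= r < 2^k and return (q, r)."""
    power = 3 ** k
    q = power >> k                 # binary digits above the k lowest ones
    r = power & ((1 << k) - 1)     # the k lowest binary digits
    return q, r


def condition_5_13(k: int) -> bool:
    """Test q + r <= 2^k, i.e. condition (5.13)."""
    q, r = split_power_of_three(k)
    return q + r <= 1 << k
```

For $k = 17$ this gives $q = 985$ and $r = 34243$, whose binary expansions are the two blocks displayed above, and the condition holds.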

With this procedure the statement may be checked by any machine with a very modest ability to manipulate integers, provided that it has a very large memory to store the binary representation of $3^k$ and that it is specially tuned to quickly manipulate extremely long binary sequences. In this way the equality $g(k) = \lfloor (3/2)^k \rfloor + 2^k - 2$ has been checked for every $k \le 4 \cdot 10^8$ by Kubina and Wunderlich$^5$. In its essence (5.13) is a Diophantine inequality, claiming a property of the distribution of $(3/2)^k$ modulo 1. In this sense it is not surprising that in 1957 Mahler$^6$ was able to adapt a previous result of his to the then very recent result of Roth on Diophantine inequalities, to deduce that (5.13) has only finitely many exceptions, at most. Unfortunately Mahler's result is not constructive, thus at the moment we do not know any bound either for the number of possible exceptions or for the largest possible exception.

$^5$ Extending Waring's conjecture to 471,600,000, Math. Comp. 55(192), 815–820 (1990).

$^6$ On the fractional parts of the powers of a rational number. II, Mathematika 4, 122–124 (1957).

$G(k)$ is in some sense more regular and apparently it should depend only on the size of $k$; the argument we have proposed here may be modified to provide a lower bound for the part of the integral corresponding to $q < N^{2\nu}$ (the so-called major arcs), and in this way one proves that $G(k) \le 6k2^k$. Much stronger upper bounds of the kind $G(k) \ll_{\varepsilon} k^{1+\varepsilon}$ have been proved; in our opinion the completely explicit result of Karatsuba$^7$ deserves a special mention, but other and even better bounds are known. A good and complete account of the history of the problem and of the recent results is contained in Vaughan–Wooley, Waring's problem: a survey, in Number theory for the millennium, III (Urbana, IL, 2000), 301–340 (2002).

Lastly, we mention that Linnik in 1943 provided a completely elementary proof of the Waring problem. Actually, his argument is called elementary because no complex analysis, cancellation in exponential sums, or integration methods are involved. However, it is by no means trivial. Moreover it can be used to treat more general additive problems, for example $\sum_{j=1}^{h} f(x_j) = n$ (where $f$ is any fixed polynomial of degree $k$ assuming positive integer values at positive integers, and we are looking for a solution in $x_1, \dots, x_h \in \mathbb{N}$, for every $n$) or $\sum_{j=1}^{h} f_j(x_j) = n$ where the polynomials $f_j$ share the degree but change with $j$. A 'passionate' exposition of Linnik's method is in Khinchin's book: Three pearls of number theory$^8$. A more recent exposition is in Nathanson's book: Elementary methods in number theory, GTM 195, Springer-Verlag, 2000, Chapters 11 and 12.

$^7$ See Th. 3.4 in Arkhipov–Chubarikov–Karatsuba, Trigonometric sums in number theory and analysis, de Gruyter Expositions in Mathematics 39, Walter de Gruyter GmbH & Co. KG, Berlin 2004.

$^8$ Khinchin composed the book in Spring 1945 for a friend of his who asked for some interesting mathematics to study during his recovery from a wound in WWII (the preface of this book is very instructive about the general feeling of that period in the USSR).

Bibliography

Books used for the main parts of the course:

[IK] H. Iwaniec, E. Kowalski: Analytic number theory, American Mathematical Society Colloquium Publications 53, American Mathematical Society, Providence RI, 2004.

[MV] H. L. Montgomery, R. C. Vaughan: Multiplicative number theory. I. Classical theory, Cambridge Studies in Advanced Mathematics 97, Cambridge University Press, Cambridge, 2007.

[P] P. Pollack: Not always buried deep, A second course in elementary number theory, American Mathematical Society, Providence RI, 2009.

Books cited somewhere:

[ACK] G. I. Arkhipov, V. N. Chubarikov and A. A. Karatsuba: Trigonometric sums in number theory and analysis, de Gruyter Expositions in Mathematics 39, Walter de Gruyter GmbH & Co. KG, Berlin 2004.

[EL] P. Eymard and J.-P. Lafon: The number π, American Mathematical Society, Providence, 2004.

[HR] H. Halberstam and K. F. Roth: Sequences, 2 ed. Springer-Verlag, New York, 1983.

[H] B. Huppert: Character theory of finite groups, de Gruyter Expositions in Mathematics 25, Berlin, 1998.

[HR] G. H. Hardy and M. Riesz: The general theory of Dirichlet's series, Stechert-Hafner, New York, 1964.

[HW] G. H. Hardy and E. M. Wright: An introduction to the theory of numbers, V ed., The Clarendon Press Oxford University Press, 1979.

[I] I. M. Isaacs: Character theory of finite groups, Dover Publications Inc., New York, 1994.

[T] G. Tenenbaum: Introduction to analytic and probabilistic number theory, Cambridge Studies in Advanced Mathematics 46, Cambridge, 1995.

[Titch] E. C. Titchmarsh: The theory of functions, 2d ed., Oxford University Press, London, 1975.

[TV] T. Tao and V. Vu: Additive combinatorics, Cambridge Studies in Advanced Mathematics 105, Cambridge University Press, Cambridge, 2006.


Characters of this drama

• Niels Henrik Abel. Frindoe (Norway) 5-8-1802, Froland (Norway) 6-4-1829.
• Jacob Bernoulli. Basel 27-12-1654, Basel 16-8-1705.
• Enrico Bombieri. Milan 26-11-1940.
• Carlo Emilio Bonferroni. Bergamo (Italy) 28-1-1892, Florence 18-8-1960.
• Jean Bourgain. Ostend (Belgium) 28-2-1954.
• Viggo Brun. Lier (Norway) 13-10-1885, Drøbak (Norway) 15-8-1978.
• Edmond Darrel Cashwell.
• Augustin Louis Cauchy. Paris 21-8-1789, Sceaux (France) 23-5-1857.
• Ernesto Cesàro. Naples 12-3-1859, Torre Annunziata (Italy) 12-9-1906.
• Pafnuty Lvovich Chebyshev. Okatovo (Russia) 16-5-1821, St. Petersburg 8-12-1894.
• Jingrun Chen. Fuzhou (Fujian Province, China) 22-5-1933, 22-3-1996.
• Harold Davenport. Huncoat (England) 30-10-1907, Cambridge 9-6-1969.
• Julius Richard Dedekind. Braunschweig (now Germany) 6-10-1831, Braunschweig 12-2-1916.
• Jean-Marc Deshouillers.
• Johann Peter Gustav Lejeune Dirichlet. Düren (Germany) 13-2-1805, Göttingen 5-5-1859.
• Leonard Eugene Dickson. Independence (Iowa) 22-1-1874, Harlingen (Texas) 17-1-1954.
• Freeman John Dyson. Crowthorne (England) 15-12-1923.
• Eratosthenes of Cyrene. (now Shahhat, Libya) 276 BC, Alexandria (Egypt) 194 BC.
• Paul Erdős. Budapest 26-3-1913, Warsaw 20-9-1996.
• Euclid of Alexandria. About 325 BC, Alexandria (Egypt) about 265 BC.
• Johann Albrecht Euler. St. Petersburg 27-11-1734, St. Petersburg 17-9-1800.
• Leonhard Euler. Basel 15-4-1707, St. Petersburg 18-9-1783.
• Cornelius Joseph Everett.
• Francesco Faà di Bruno. Alessandria (Italy) 29-3-1825, Torino (Italy) 27-3-1888.
• Johann Faulhaber. Ulm (Germany) 5-5-1580, Ulm 10-9-1635.
• John Farey. Woburn (England) 1766, London 6-1-1826.
• Pierre de Fermat. Beaumont-de-Lomagne (France) 17-8-1601, Castres (France) 12-1-1665.
• Guido Fubini. Venezia 19-1-1879, New York 6-6-1943.
• Johann Carl Friedrich Gauss. Brunswick-Lüneburg (Germany) 30-4-1777, Göttingen 23-2-1855.
• Christian Goldbach. Königsberg, Prussia (now Kaliningrad, Russia) 18-3-1690, Moscow 20-11-1764.
• Dorian Goldfeld. Marburg (Germany) 21-1-1947.
• Ben Joseph Green. Bristol (England) 27-2-1977.
• Jacques Salomon Hadamard. Versailles 8-12-1865, Paris 17-10-1963.
• Harald Andres Helfgott. Lima (Peru) 25-11-1977.
• David Hilbert. Königsberg, Prussia (now Kaliningrad, Russia) 23-1-1862, Göttingen 14-2-1943.
• Otto Ludwig Hölder. Stuttgart 22-12-1859, Leipzig 29-8-1937.
• Martin Neil Huxley (England).
• Albert Edward Ingham. Northampton (England) 3-4-1900, Chamonix (France) 6-9-1967.
• Henryk Iwaniec. 9-10-1947 (Poland).
• Anatolii Alexeevitch Karatsuba. Grozny (Russia) 31-1-1937, Moscow 28-9-2008.
• Aleksandr Yakovlevich Khinchin. Kondrovo (Russia) 19-7-1894, Moscow 18-11-1959.
• K. I. Klimov.
• Nikolai Mikhailovich Korobov. 23-11-1917.
• Jeffrey M. Kubina.
• Jeffrey Clark Lagarias. Pittsburgh 11-1949.
• Joseph Louis Lagrange. Turin 25-1-1736, Paris 10-4-1813.
• Edmund Georg Hermann Landau. Berlin 14-2-1877, Berlin 19-2-1938.
• Adrien-Marie Legendre. Paris 18-9-1752, Paris 10-1-1833.
• Ernst Lindelöf. Helsingfors (Russian Empire, now Helsinki, Finland) 7-3-1870, Helsinki 4-6-1946.
• Yuri Linnik. Belaya Tserkov (Ukraine) 21-1-1915, Leningrad (now St Petersburg, Russia) 30-6-1972.
• Rudolf Otto Lipschitz. Königsberg, Prussia (now Kaliningrad, Russia) 14-5-1832, Bonn 7-10-1903.


• John Edensor Littlewood. Rochester (England) 9-6-1885, Cambridge (England) 6-9-1977.
• Colin Maclaurin. Kilmodan (Great Britain) 2-1698, Edinburgh 14-6-1746.
• Kurt Mahler. Krefeld (Prussian Rhineland) 26-7-1903, Canberra (Australia) 23-2-1988.
• Henry Berthold Mann. Vienna 27-10-1905, 1-2-2000.
• Lorenzo Mascheroni. Bergamo (Italy) 13-5-1750, Paris 14-7-1800.
• Franz Carl Joseph Mertens. Schroda (Poland) 20-3-1840, Vienna 5-3-1927.
• August Ferdinand Möbius. Schulpforta (Germany) 17-1-1790, Leipzig 26-9-1868.
• Hugh Lowell Montgomery.
• Giacinto Morera. Novara (Italy) 18-7-1856, Turin (Italy) 8-2-1907.
• Maruti Ram Pedaprolu Murty. Guntur (India) 10-16-1953.
• Mohan K. N. Nair.
• Rolf Herman Nevanlinna. Joensuu (Finland, then Russia) 22-10-1895, Helsinki 28-5-1980.
• Maxwell Herman Newman. Chelsea (England) 7-2-1897, Comberton (England) 22-2-1984.
• Ivan Morton Niven. Vancouver (Canada) 25-10-1915, Eugene (Oregon) 9-5-1999.
• Bertil Nyman.
• Oskar Perron. Frankenthal (Germany) 7-5-1880, Munich 22-2-1975.
• Subbayya Sivasankaranarayana Pillai. Nagercoil (Tamil Nadu) 5-4-1901, Cairo (Egypt) 31-8-1950.
• George Pólya. Budapest 13-12-1887, Palo Alto (California) 7-9-1985.
• Alfred Pringsheim. Ohlau (Germany) 2-9-1850, Zurich 25-6-1941.
• Srinivasa Aiyangar Ramanujan. Erode (India) 22-12-1887, Kumbakonam (India) 26-4-1920.
• Olivier Ramaré.
• Georg Friedrich Bernhard Riemann. Breselenz (Germany) 17-9-1826, Selasca (Italy) 20-6-1866.
• Hans Riesel.
• Marcel Riesz. Győr (Hungary) 16-11-1886, Lund (Sweden) 4-9-1969.
• Giancarlo Rota. Vigevano (Italy) 27-4-1932, Cambridge 18-4-1999.
• Klaus Friedrich Roth. Breslau (Germany, now Wrocław, Poland) 29-10-1925.
• Raghunath Krishna Rubugunday. Madras (India) 1918, 2000.
• Imre Z. Ruzsa. Budapest 23-7-1953.
• Issai Schur. Mogilev (Russia, now Belarus) 01-10-1875, Tel Aviv (Palestine, now Israel) 01-10-1941.
• Hermann Amandus Schwarz. Hermsdorf, Silesia (now Poland) 25-1-1843, Berlin 30-11-1921.
• Peter Sarnak. (South Africa) 18-12-1953.
• Atle Selberg. Langesund (Norway) 14-6-1917, Princeton 6-8-2007.
• Lev Genrikhovich Shnirelman. Gomel (Belarus) 2-1-1905, Moscow 24-9-1938.
• James Stirling. Garden (Scotland) 5-1692, Edinburgh 5-12-1770.
• James Joseph Sylvester. London 3-9-1814, London 15-3-1897.
• Brook Taylor. Edmonton (Great Britain) 18-8-1685, London 29-12-1731.
• Charles Jean Baron de la Vallée Poussin. Louvain (Belgium) 14-8-1866, Louvain 2-3-1966.
• N. P. Romanoff.
• Terence Chi-Shen Tao. Adelaide (Australia) 17-7-1975.
• Robert Charles Vaughan. 24-3-1945.
• Ivan Matveevich Vinogradov. Milolyub (Russia) 14-9-1891, Moscow 20-3-1983.
• Hans Carl Friedrich von Mangoldt. Weimar 18-5-1854, Gdansk 27-10-1925.
• Georgy Fedoseevich Voronoi. Zhuravka (Russia, now Ukraine) 20-04-1868, Warsaw 20-11-1908.
• John Wallis. Ashford (England) 23-11-1616, Oxford 28-10-1703.
• Edward Waring. Shrewsbury (England) 1736, Pontesbury (England) 15-8-1798.
• Karl Theodor Wilhelm Weierstrass. Ostenfelde (Germany) 31-10-1815, Berlin 19-2-1897.
• André Weil. Paris 6-5-1906, Princeton 6-8-1998.
• Hermann Klaus Hugo Weyl. Elmshorn (Germany) 9-11-1885, Zurich 9-12-1955.
• Eduard Wirsing.
• Trevor Wooley.
• John William Wrench Jr. Westfield (New York) 13-10-1911, Frederick (Maryland) 27-2-2009.
• Marvin C. Wunderlich.
