
Lecture Notes on Random Matrices

(for use within CSU)

(not for distribution)

Elton P. Hsu


Contents

Chapter 1. Introduction
  1.1. Random matrices
  1.2. Sources

Chapter 2. Probability Measures on Matrix Groups
  2.1. Gaussian ensembles
  2.2. Generalizations
  2.3. Haar measure
  2.4. Distribution of eigenvalues
  2.5. Entropy consideration

Chapter 3. Semicircle Law
  3.1. Semicircle law
  3.2. Eigenvalue distribution for circular unitary ensemble
  3.3. Hermite polynomials
  3.4. Semicircle law for GUE
  3.5. 2-point correlation function
  3.6. Zeros of the Riemann zeta-function
  3.7. Universality
  3.8. General orthogonal polynomial ensembles

Chapter 4. Large Deviations of Random Matrices
  4.1. Definitions
  4.2. Large deviation of sums of i.i.d. random variables
  4.3. Large deviation for the empirical distribution
  4.4. Some other interesting large deviation results
  4.5. Review of results concerning random matrices
  4.6. Equilibrium measure and the rate function
  4.7. Proof of the upper bound
  4.8. Proof of the lower bound
  4.9. Logarithmic potential and the semicircle law

Bibliography

Index


CHAPTER 1

Introduction

1.1. Random matrices

The theory of random matrices investigates asymptotic properties of matrices of large size whose entries are random variables, for example the distributions and other statistical properties of their eigenvalues and entries. The theory is closely related to free probability, orthogonal polynomials, potential theory, statistical mechanics, nuclear physics, quantum mechanics, and other branches of mathematics and physics.

Probability measures on matrix groups are assumed to have certain independence and invariance properties. In most cases, these invariance conditions determine the probability measure uniquely. For example, if the matrix group is compact, then we can use the Haar measure.

1.2. Sources

These notes are based on the following sources, some of which are not publicly available at this point.

(1) Madan Lal Mehta, Random Matrices, Academic Press (1990).
(2) Percy Deift, Orthogonal Polynomials and Random Matrices: A Riemann-Hilbert Approach, AMS (2000).
(3) D. V. Voiculescu et al., Free Random Variables, AMS (1992).
(4) Fumio Hiai and Denes Petz, The Semicircle Law, Free Random Variables and Entropy, AMS (2000).
(5) D. V. Voiculescu (ed.), Free Probability Theory, AMS.
(6) Philippe Biane, Free probability for probabilists (1998).
(7) Kurt Johansson, Determinantal processes with number variance saturation.
(8) Mark Adler and Pierre van Moerbeke, PDEs for the joint distributions of the Dyson, Airy and sine processes.
(9) Jonas Gustavsson, Gaussian fluctuations of eigenvalues in the GUE.
(10) Craig Tracy and Harold Widom, Introduction to Random Matrices (notes).
(11) Z. D. Bai, Circular law, Ann. Probab., 25, no. 1, 494–529 (1997).
(12) Wlodzimierz Bryc et al., Spectral measure of large random Hankel, Markov and Toeplitz matrices.
(13) Z. D. Bai, Methodologies in spectral analysis of large dimensional random matrices, a review (survey), Statistica Sinica, 9, 611–677 (1999).
(14) J. Baik et al., Phase transition of the largest eigenvalue for non-null complex sample covariance matrices.
(15) Tiefeng Jiang, Maxima of entries of Haar distributed matrices (2003).
(16) Greg Anderson and Ofer Zeitouni, Lectures on Random Matrices (lecture notes).
(17) Greg Anderson, Alice Guionnet, and Ofer Zeitouni, Lectures on Random Matrices (lecture notes).
(18) Alice Guionnet, Large deviations and stochastic calculus for random matrices.
(19) Tiefeng Jiang, How many entries of a typical orthogonal matrix can be approximated by independent normals?
(20) Brian Rider, Deviation from the circle law, Ann. Probab. (2004).
(21) Forrester, Random Matrix Theory.
(22) Alice Guionnet, Large Deviations of Random Matrices and Stochastic Calculus.


CHAPTER 2

Probability Measures on Matrix Groups

2.1. Gaussian ensembles

Reference for this section: Mehta [6].

In this section we describe some commonly used probability measures on matrix groups. We will use the following notation:

S_n(R) – real symmetric matrices of size n;
H_n – Hermitian matrices of size n.

Gaussian ensembles are a class of such probability measures specified by their invariance properties. These measures are significant in statistical physics and were first studied by E. Wigner.

Let G be a topological group and µ a probability measure on G. Let H be another group acting on G, i.e., there is a homomorphism φ : H → Aut(G). We say that µ is invariant under H if h_*µ = µ, i.e., µ(hE) = µ(E) for all h ∈ H and E ∈ B(G). Here

hE = {φ(h)g : g ∈ E}.

Definition 2.1.1. The Gaussian orthogonal ensemble (GOE) µ_1 is the probability measure on the space of real symmetric matrices such that

(i) it is invariant under the orthogonal group action M ↦ O^*MO;
(ii) the entries {M_ij, 1 ≤ i ≤ j ≤ n} are independent.

Definition 2.1.2. The Gaussian unitary ensemble (GUE) is a probability measure µ_2 on the space of Hermitian matrices such that

(i) it is invariant under the unitary group action M ↦ U^*MU;
(ii) the real and imaginary parts of the entries

{M_ij = A_ij + √−1 B_ij, 1 ≤ i ≤ j ≤ n}

are independent.

Such random matrices are called Wigner matrices.

Let J_N be the symplectic metric matrix of rank 2N. A matrix S is called symplectic if

J = SJS^T.

It is clear that symplectic matrices form a group, called the symplectic group, denoted Sp(N). The (symplectic) dual of a matrix M is M^R = −JM^T J. M is called self-dual if M^R = M, i.e., JM = M^T J.

Definition 2.1.3. The Gaussian symplectic ensemble (GSE) is a probability measure µ_4 on the space of self-dual Hermitian matrices such that

(i) it is invariant under the symplectic group action M ↦ S^R MS;
(ii) the components of the linearly independent entries of M are independent.

The meaning of the parameter s = 1, 2, 4 will be clarified later. As we shall see, the fact that in each case the individual entries have a Gaussian distribution (hence the name Gaussian ensemble) is forced by the invariance and independence hypotheses.

The choice of these measures is dictated by statistical physics:

GOE (s = 1): time reversal, even spin;
GUE (s = 2): no condition of time reversal;
GSE (s = 4): time reversal, odd spin.

Other values (s ≠ 1, 2, 4) are possible, but some may not have a direct physical interpretation.

In all cases, given the linear constraints, the matrices form a finite dimensional linear space, and we can choose a complete set of linearly independent components. For example, in the real symmetric case we can take {M_ij, 1 ≤ i ≤ j ≤ n}, and in the Hermitian case we can take

{M_ii, 1 ≤ i ≤ n; A_ij, B_ij, 1 ≤ i < j ≤ n}.

These variables can be taken as the (linear) coordinates for the matrix space under consideration. We denote the Lebesgue measure of this linear space by dM. This measure is unique up to a multiplicative constant. We have the following theorem, which shows that the invariance condition imposed on the measure µ_s forces it to be Gaussian.

Theorem 2.1.4. Let µ_s(dM) = f_s(M) dM. Then the density function has the form

f_s(M) = exp[−a Tr M^2 + b Tr M + c].

Here a, b, c are real numbers and a is positive.

Proof. See Mehta [6]. □

By shifting the origin, we can set b = 0. Thus

f_s(M) = C exp[−a Tr M^2].

We conclude from here that all linearly independent entries in a Gaussian ensemble are Gaussian.

More detailed work yields the following description:

s = 1: the GOE µ_1 is a probability measure on the space of real symmetric matrices such that the entries

{M_ii, 1 ≤ i ≤ n; √2 M_ij, 1 ≤ i < j ≤ n}

are independent Gaussian random variables of mean zero and the same variance.

s = 2: the GUE µ_2 is a probability measure on the space of Hermitian matrices such that the entries

{M_ii, 1 ≤ i ≤ n; √2 A_ij, √2 B_ij, 1 ≤ i < j ≤ n}

are independent Gaussian random variables of mean zero and the same variance.

s = 4: there is a similar description. The choice of the common variance σ^2 amounts to a choice of normalization. The usual choices are σ^2 = 1 and σ^2 = 1/N.
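As an illustration, here is a minimal numerical sketch of how matrices with the GOE/GUE entry pattern described above can be sampled. It assumes Python with NumPy (not part of these notes), and the variance normalization chosen below is only one of several conventions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# GOE-type sample: symmetrize a matrix of i.i.d. standard normals.
# The entries M_ij (i <= j) are independent Gaussians; the diagonal has
# twice the variance of the off-diagonal entries, consistent with a
# density proportional to exp(-a Tr M^2).
G = rng.standard_normal((n, n))
M_goe = (G + G.T) / 2

# GUE-type sample: symmetrize a complex Ginibre matrix.
Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
M_gue = (Z + Z.conj().T) / 2

# Invariance check: conjugating by a fixed orthogonal matrix does not change
# the distribution; here we only verify that the spectrum is unchanged.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.allclose(np.linalg.eigvalsh(M_goe), np.linalg.eigvalsh(Q.T @ M_goe @ Q)))
```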

2.2. Generalizations

Reference for this section: Anderson and Zeitouni [1].

Many results in random matrix theory have the property of universality, in the sense that they do not depend on the particular choice of distributions as long as they are suitably normalized. In particular, several important results (e.g., the semicircle law) about Gaussian ensembles remain valid for non-Gaussian distributions. Here we introduce the concept of generalized Wigner matrices. They are similar to Gaussian ensembles but with the Gaussian random variables replaced by general random variables satisfying certain moment conditions.

A collection of generalized Wigner matrices is a set

M^N = {M^N_ij, 1 ≤ i ≤ j ≤ n}, N = 1, 2, . . .

of independent random variables such that

(1) M^N is symmetric;
(2) E M^N_ij = 0;
(3) E|M^N_ij|^2 = 1/N;
(4) for each k, there is a constant C_k independent of N such that E|M^N_ij|^k ≤ C_k.

Weaker conditions are possible. We can also define Hermitian Wigner matrices in the same way.

Invariance under group actions is required by physics, but independence is introduced as a matter of convenience. We can generalize the setting in another direction by removing the condition of independence, but computational feasibility is the primary concern. We may consider for example

µ(dM) = f(M) dM,

where f(M) has the form

f(M) = exp[−Tr G(M)]

with a suitable polynomial or other analytic function G(z). The case G(z) = az^2 corresponds to the Gaussian ensembles. A typical choice for G is a polynomial in z^2 with positive coefficients.

Numerical and physical evidence shows that the statistical properties of the eigenvalues are similar for a large class of choices of G. Of course G has to depend on N to obtain the proper scaling. For example, in the Gaussian case,

G(z) = (N/2) z^2.

For this choice, the eigenvalues lie asymptotically in the interval [−√2, √2].


2.3. Haar measure

Another source of measures in random matrix theory is Haar measure on compact matrix groups.

A topological group is a topological space with a group structure such that (a, b) ↦ ab^{−1} is continuous. Let G be such a group and let L_a : G → G be the left shift defined by L_a(b) = ab. The right shift R_a is defined by R_a(b) = ba. A nontrivial measure µ on G is called a left Haar measure if (L_a)_*µ = µ, i.e., µ(aE) = µ(E) for all a ∈ G and E ∈ B(G).

Theorem 2.3.1. Every locally compact topological group has a left invariant Radon measure. It is unique up to a multiplicative constant.

A similar theorem holds for right Haar measures.

If G is compact, then a left Haar measure is also a right Haar measure. This can be seen as follows. Let µ be a left Haar measure. Define

ν(E) = ∫_G µ(Eg) µ(dg).

Then it is easy to verify that ν is both left and right invariant. By uniqueness the left and right Haar measures must be identical. We can normalize it to be a probability measure.

Examples of compact matrix groups are the orthogonal group O(N) and the unitary group U(N). The Haar measure on U(N) is called the circular unitary ensemble (CUE); the circular orthogonal ensemble (COE) is the induced measure on the quotient U(N)/O(N), and the circular symplectic ensemble (CSE) is the induced Haar measure on the quotient U(2N)/Sp(N).

Random matrix theory studies asymptotic statistical properties of these ensembles as N → ∞. For example, one can show that a suitably normalized fixed set of entries is asymptotically independent Gaussian.
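A standard way to sample from the Haar measure on U(N) numerically is to take the QR factorization of a matrix of i.i.d. complex Gaussians and correct the phases of the diagonal of R. The sketch below (Python/NumPy, illustrative rather than part of the notes) also checks the claim just made: a fixed entry, multiplied by √N, looks approximately like a standard complex Gaussian.

```python
import numpy as np

rng = np.random.default_rng(2)

def haar_unitary(n):
    # QR of a complex Ginibre matrix; rescaling the columns of Q by the phases
    # of diag(R) makes Q Haar distributed on U(n).
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

n, trials = 50, 2000
samples = np.array([np.sqrt(n) * haar_unitary(n)[0, 0] for _ in range(trials)])
# sqrt(n) * U_{11} should look like a standard complex Gaussian for large n
print("mean ~ 0:", np.round(samples.mean(), 3), "  E|.|^2 ~ 1:", np.round(np.mean(np.abs(samples) ** 2), 3))
```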

2.4. Distribution of eigenvalues

Reference for this section: Tracy and Widom, Introduction to Random Matrices, archived notes.

For the sake of discussion, we consider the GUE µ_2 on H. We have

E f(M) = (1/C_N) ∫_H f(M) e^{−Tr M^2/2} dM,

where C_N is a normalization constant (2^{N/2} C_N is a universal constant).

For the Gaussian ensembles and their generalizations, we mainly study eigenvalue distributions, which are the only invariants in many cases. It is natural to find the marginal distribution of the eigenvalues.

For a matrix M of size N, let

λ(M) = λ^N = (λ^N_1, λ^N_2, . . . , λ^N_N)

be its set of eigenvalues. They are not ordered. Thus when we speak of a function

f(λ^N) = f(λ^N_1, · · · , λ^N_N)

of the eigenvalues, we assume that f is symmetric, i.e., invariant under permutations of its variables.

On the matrix side, we consider class functions:

f(UMU^{−1}) = f(M).

This means that f is constant on the orbits of the unitary group action. It is easy to see that if f is a class function on H, then there is a unique symmetric function f on R^N such that

f(M) = f(λ(M)) = f(λ^N_1, · · · , λ^N_N).

We have

E f(M) = ∫_H f(M) µ_2(dM) = ∫_{R^N} f(λ) p(λ) dλ.

We need to calculate the density function p(λ_1, · · · , λ_N).

Theorem 2.4.1. We have

p_s(λ_1, · · · , λ_N) = C_{Ns} e^{−Σ_{i=1}^N λ_i^2} ∏_{i<j} |λ_i − λ_j|^s.

Proof. The case s = 2 is the simplest. We only need to prove the result locally. Consider a neighborhood of a fixed Hermitian matrix. Every Hermitian matrix M can be written as

M = UDU^*,

where D = diag{λ_1, . . . , λ_N} and U is unitary. We may assume that the eigenvalues are distinct. The set with equal eigenvalues has Lebesgue measure zero and is closed. If we arrange the eigenvalues in increasing order, then D is uniquely determined. The unitary matrix U is determined only up to a unitary diagonal matrix, because of the freedom in choosing the phases of the eigenvectors of D. If we fix these phases in some way (e.g., by requiring the first nonzero component of each eigenvector to be positive), then U is also uniquely determined. Therefore we have a one-to-one correspondence (D, U) ↦ M in a neighborhood of a Hermitian matrix with unequal eigenvalues.

We can choose the metric on H to be

ds^2 = Tr(dM)^2.

The measure m(dM) is the corresponding volume form. From M = UDU^* we have

dM = (dU)DU^* + U(dD)U^* + UD(dU^*).

Using (dU^*)U = −U^*(dU) we have

U^*(dM)U = U^*(dU)D − DU^*(dU) + dD = [U^*(dU), D] + dD.

The key observation is that the two parts are perpendicular:

Tr([U^*(dU), D](dD)) = 0.

This is because D(dD) = (dD)D (D is diagonal) and therefore

Tr(U^*(dU)D(dD)) = Tr(U^*(dU)(dD)D) = Tr(DU^*(dU)(dD)).

It follows that

ds^2 = Tr([U^*(dU), D]^2) + Tr(dD)^2.

We have

Tr(dD)^2 = Σ_{i=1}^N (dλ_i)^2.

If we write U^*(dU) = {dx_ij}, then it is clear that

[dx, D]_ij = (dx_ij)(λ_j − λ_i)

and

Tr([U^*(dU), D]^2) = 2 Σ_{1≤i<j≤n} |dx_ij|^2 |λ_j − λ_i|^2.

It follows that the volume measure is

m(dM) = C m(dx) dλ_1 · · · dλ_N ∏_{1≤i<j≤n} |λ_i − λ_j|^2. □

References: Hua [5] and Weyl [9].

The corresponding result for the circular ensemble is

q_{Ns}(θ_1, · · · , θ_N) = C_{Ns} ∏_{1≤i<j≤N} |e^{√−1 θ_i} − e^{√−1 θ_j}|^s.

The proof is similar.
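The factor ∏|λ_i − λ_j|^s is the source of level repulsion: nearby eigenvalues are unlikely. A quick Monte Carlo sketch (Python/NumPy, purely illustrative, using a 2 × 2 GUE-type matrix with an arbitrary normalization) shows the empirical density of the gap λ_2 − λ_1 vanishing near zero, as the factor |λ_1 − λ_2|^2 predicts.

```python
import numpy as np

rng = np.random.default_rng(3)
trials, gaps = 20000, []
for _ in range(trials):
    a = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    m = (a + a.conj().T) / 2            # 2x2 GUE-type Hermitian matrix
    lam = np.linalg.eigvalsh(m)
    gaps.append(lam[1] - lam[0])

# the factor |lambda_1 - lambda_2|^2 in the joint density suppresses small gaps:
# the empirical density of the gap should vanish near 0
hist, edges = np.histogram(gaps, bins=40, density=True)
print("density near gap = 0:", hist[0], "   density near the mode:", hist.max())
```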

2.5. Entropy consideration

See Mehta [6].

We have mentioned that invariance under group actions is required by physics, but independence is assumed as a matter of convenience. These two properties imply the Gaussian distribution for the components. The Gaussian components can also be obtained from the assumption that the measure f(M) m(dM) should maximize the entropy

S = −∫ f(M) ln f(M) m(dM)

subject to a constraint on the second moments. For the noncompact case, the Gaussian distribution of the components can be derived from this consideration. For the compact case, it is obvious that the Haar measure maximizes the entropy.


CHAPTER 3

Semicircle Law

3.1. Semicircle law

The semicircle law s is a distribution on the interval [−√2, √2] whose density function with respect to Lebesgue measure is

s(x) = √(2 − x^2)/π.

In free probability theory this distribution plays a role similar to that of the normal distribution in classical probability theory.

The empirical distribution of the eigenvalues is

L(M_N) = (1/N) Σ_{i=1}^N δ_{λ^N_i}.

The first natural question is: how are the eigenvalues distributed? Or equivalently, how many eigenvalues lie in a fixed interval [a, b]? We show that there is a (weak) law of large numbers here, i.e., L(M_N) converges to the semicircle law in distribution. This means that as N increases, the probability that L(M_N) lies in a "neighborhood" of the semicircle law tends to 1. The semicircle law is universal for Wigner matrices. We will discuss two cases:

(1) GUE (Mehta's approach);
(2) real symmetric Wigner matrices.

We first discuss the GUE case. We will show that for reasonable functions f,

E⟨f, L_N⟩ → ⟨f, ν⟩.

If we take f = I_{[a,b]}, then the left side is

E[(number of eigenvalues in [a, b])/N].

Thus we can say that s(x) = √(2 − x^2)/π is the density of the eigenvalues, in the sense that the average number of eigenvalues in [a, b] is approximately

N ∫_a^b s(x) dx = N ν([a, b]).

Note that a stronger form of the semicircle law takes the form

lim_{N→∞} P{|⟨f, L_N⟩ − ⟨f, ν⟩| ≥ ε} = 0.

We will prove this for the case of real symmetric Wigner matrices.

The corresponding result for the circular unitary ensemble is that the eigenvalues are distributed uniformly on the unit circle. We will prove this case as a warm-up and illustrate Mehta's trick, which will be useful in the GUE case in a more complicated form.

Recall the density function p_{N2}(λ_1, . . . , λ_N) defined by

E f(M_N) = ∫_{R^N} f(λ) p_{N2}(λ) dλ.

Here f is a class function on H_N. It also denotes the corresponding symmetric function on R^N. Now consider the marginal

p_{N2}(x) = ∫_{R^{N−1}} p_{N2}(x, λ_2, · · · , λ_N) dλ_2 · · · dλ_N.

Let f be a bounded function. Then

∫_R f(x) p_{N2}(x) dx = ∫_{R^N} f(λ_1) p_{N2}(λ_1, λ_2, · · · , λ_N) dλ_1 dλ_2 · · · dλ_N = E[(1/N) Σ_{i=1}^N f(λ^N_i)] = E⟨f, L_N⟩.

Let σ_N(x) = √N p_{N2}(√N x). Then

∫_R σ_N(x) f(x) dx = E⟨f, L_N⟩,

where now

L_N = (1/N) Σ_{i=1}^N δ_{λ^N_i/√N}.

We will show that

lim_{N→∞} ∫_R e^{tx} σ_N(x) dx = ∫_R e^{tx} s(x) dx.

3.2. Eigenvalue distribution for circular unitary ensemble

Reference for this section: Mehta [6].

We derive a useful formula for q_{N2}(θ_1, · · · , θ_N). Let θ = (θ_1, · · · , θ_N) for simplicity. Recall the Vandermonde determinant

det(x_i^{j−1}) = ∏_{1≤i<j≤N} (x_j − x_i).

We have

q_{N2}(θ) = C_N det(X(θ)^T) det(X(−θ)) = C_N det(X(θ)^T X(−θ)),

where

X(θ) = (e^{(i−1)√−1 θ_j}).

We have

X(θ)^T X(−θ) = 2π {K_N(θ_i, θ_j)},

where

K_N(θ_i, θ_j) = (1/2π) Σ_{k=1}^N e^{(k−1)√−1 θ_i} e^{−(k−1)√−1 θ_j} = (1/2π) Σ_{k=1}^N e^{(k−1)√−1(θ_i−θ_j)}.

Lemma 3.2.1. The following identity holds:

q_{N2}(θ_1, · · · , θ_N) = (1/N!) det{K_N(θ_i, θ_j)}_{1≤i,j≤N},

where

K_N(θ_i, θ_j) = (1/2π) Σ_{k=1}^N e^{(k−1)√−1(θ_i−θ_j)}.

Proof. We have proved the identity except for the value of the constant 1/N!. This will be evaluated later. □

K_N(θ_1, θ_2) is a reproducing kernel in the following sense:

∫_0^{2π} K_N(θ_1, θ) K_N(θ, θ_2) dθ = K_N(θ_1, θ_2).

This can be checked easily using the fact that

K_N(θ_i, θ_j) = Σ_{k=0}^{N−1} φ_k(θ_i) φ̄_k(θ_j),

where

φ_k(θ) = (1/√(2π)) e^{√−1 kθ}, k ∈ Z,

is an orthonormal system:

∫_0^{2π} φ_k(θ) φ̄_l(θ) dθ = δ_{kl}.

Proposition 3.2.2 (Mehta's trick). Suppose that f(x, y) has the properties

∫ f(x, x) dx = C

and

∫ f(x, z) f(z, y) dz = f(x, y).

Let

J_N(x_1, · · · , x_N) = {f(x_i, x_j)}_{1≤i,j≤N}.

Then

∫ det J_N(x_1, · · · , x_{N−1}, x_N) dx_N = (C + 1 − N) det J_{N−1}(x_1, · · · , x_{N−1}).

Proof. Let S_N be the set of permutations of {1, . . . , N}. For a permutation σ ∈ S_N denote

I_σ = ∫ f(x_1, x_{σ(1)}) · · · f(x_N, x_{σ(N)}) dx_N.

Then we have

∫ det J_N(x_1, . . . , x_N) dx_N = Σ_{σ∈S_N} sgn(σ) I_σ.

We decompose S_N into N disjoint sets according to σ(N). Let

T_i = {σ ∈ S_N : σ(N) = i}.

Then

S_N = ⋃_{i=1}^N T_i

and

Σ_{σ∈S_N} sgn(σ) I_σ = Σ_{i=1}^N Σ_{σ∈T_i} sgn(σ) I_σ.

Suppose that σ ∈ T_N, i.e., σ(N) = N. Then using the fact that

∫ f(x_N, x_N) dx_N = C,

we have

I_σ = C f(x_1, x_{τ(1)}) · · · f(x_{N−1}, x_{τ(N−1)}),

where τ is the restriction of σ to {1, . . . , N − 1}, i.e., τ(i) = σ(i) for i = 1, . . . , N − 1. It is clear that σ ↦ τ is a one-to-one correspondence between T_N and S_{N−1}. Also sgn(σ) = sgn(τ). Therefore,

Σ_{σ∈T_N} sgn(σ) I_σ = C Σ_{τ∈S_{N−1}} sgn(τ) f(x_1, x_{τ(1)}) · · · f(x_{N−1}, x_{τ(N−1)}) = C det J_{N−1}(x_1, · · · , x_{N−1}).

Now suppose that σ ∈ T_i with i ≠ N, i.e., σ(N) = i ≠ N. Let j = σ^{−1}(N). Then σ(j) = N, and in the term I_σ we need to integrate out

∫ f(x_j, x_N) f(x_N, x_i) dx_N = f(x_j, x_i)

and obtain

I_σ = f(x_1, x_{τ(1)}) · · · f(x_{N−1}, x_{τ(N−1)}),

where τ ∈ S_{N−1} is the same as σ except that it takes j = σ^{−1}(N) to i = σ(N). It is clear that σ ↦ τ is a one-to-one correspondence between T_i and S_{N−1}, and sgn(τ) = −sgn(σ). Therefore,

Σ_{σ∈T_i} sgn(σ) I_σ = −Σ_{τ∈S_{N−1}} sgn(τ) f(x_1, x_{τ(1)}) · · · f(x_{N−1}, x_{τ(N−1)}) = −det J_{N−1}(x_1, · · · , x_{N−1}).

This equality holds for i = 1, . . . , N − 1. Putting things together, we have

Σ_{σ∈S_N} sgn(σ) I_σ = (C − N + 1) det J_{N−1}(x_1, · · · , x_{N−1}). □

Applying this to the case f(x, y) = K_N(x, y), we see that

C = ∫_0^{2π} K_N(x, x) dx = (1/2π) ∫_0^{2π} N dx = N.

Hence,

∫_0^{2π} det Θ_N(θ_1, · · · , θ_m) dθ_m = (N − m + 1) det Θ_N(θ_1, · · · , θ_{m−1}),

where

Θ_N(θ_1, · · · , θ_m) = {K_N(θ_i, θ_j)}_{1≤i,j≤m}.

From this we have first of all

∫_{[0,2π]^N} det{K_N(θ_i, θ_j)}_{1≤i,j≤N} dθ_1 · · · dθ_N = N!.

This gives

q_{N2}(θ_1, · · · , θ_N) = (1/N!) det{K_N(θ_i, θ_j)}_{1≤i,j≤N}.

For θ ∈ [0, 2π], the one-point marginal is

q_{N2}(θ) = K_N(θ, θ)/N = 1/2π.

We have shown that the eigenvalues are uniformly distributed.

From this simple case we have already seen the relation between random matrix theory and the theory of orthogonal polynomials.
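As a numerical sanity check (Python/NumPy, illustrative), one can sample Haar-distributed unitary matrices and verify that the histogram of the eigenvalue angles is flat, with density close to 1/2π.

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials, angles = 40, 500, []
for _ in range(trials):
    z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    q *= np.diagonal(r) / np.abs(np.diagonal(r))    # Haar-distributed unitary
    angles.extend(np.angle(np.linalg.eigvals(q)))   # eigenvalue angles in (-pi, pi]

hist, _ = np.histogram(angles, bins=20, range=(-np.pi, np.pi), density=True)
print("empirical density of the eigenangles (should be close to 1/(2*pi) = 0.159):")
print(np.round(hist, 3))
```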

3.3. Hermite polynomials

Reference for this section:

(1) Polya and Szego [7];
(2) Anderson and Zeitouni [1].

The Hermite polynomials are defined by

H_n(x) = (−1)^n e^{x^2/2} (d/dx)^n e^{−x^2/2}.

They are orthogonal with respect to the weight function e^{−x^2/2} and

∫_R e^{−x^2/2} H_n(x)^2 dx = √(2π) n!.

Let

h_n(x) = (1/√(√(2π) n!)) H_n(x) e^{−x^2/4}.

Then {h_n} is an orthonormal system in L^2(R). We have the annihilation relation

H'_n(x) = n H_{n−1}(x).

This can be seen from the fact that H'_n(x) is a polynomial of degree n − 1 with leading coefficient n and is orthogonal (with respect to the weight e^{−x^2/2}) to H_k(x) for all k ≤ n − 2 (integration by parts). We also have the creation relation

H'_n(x) − x H_n(x) = −H_{n+1}(x).

This can be obtained directly by differentiating the definition of H_n(x). Equating the two expressions for H'_n(x), we have the recurrence relation

H_{n+1}(x) − x H_n(x) + n H_{n−1}(x) = 0.

Differentiating the creation relation, we have the differential equation

H''_n(x) − x H'_n(x) + n H_n(x) = 0.

In terms of the Hermite functions, we have the Schrodinger equation for the Hermite function

h''_n(x) + (n + 1/2 − x^2/4) h_n(x) = 0.

We will also need the Christoffel-Darboux identity:

Σ_{n=0}^{N−1} H_n(x) H_n(y)/n! = (1/(N − 1)!) (H_N(x) H_{N−1}(y) − H_N(y) H_{N−1}(x))/(x − y).

This can be proved by induction using the recurrence relation. In terms of the Hermite functions, we have

Σ_{n=0}^{N−1} h_n(x) h_n(y) = √N (h_N(x) h_{N−1}(y) − h_N(y) h_{N−1}(x))/(x − y).

Letting y → x, we have

(1/√N) Σ_{n=0}^{N−1} h_n(x)^2 = h'_N(x) h_{N−1}(x) − h_N(x) h'_{N−1}(x).

Using the creation relation to eliminate the derivatives, we have

Σ_{n=0}^{N−1} h_n(x)^2 = N h_N(x)^2 − √(N(N + 1)) h_{N−1}(x) h_{N+1}(x).


3.4. Semicircle law for GUE

Reference for this section: Mehta [6].

We extend the calculation for the circular unitary ensemble to the GUE. The first step is to derive a formula in terms of Hermite polynomials for p_{N2}(x). Here x = (x_1, · · · , x_N). Recall that

p_{N2}(x) = C_N exp[−(1/2) Σ_{i=1}^N |x_i|^2] ∏_{1≤i<j≤N} |x_i − x_j|^2.

Again we have the connection with the Vandermonde determinant:

∏_{1≤i<j≤N} |x_i − x_j|^2 = |det X|^2,

where

X = {x_j^{i−1}}.

Let H_n(x) be the monic Hermite polynomial of degree n. The ith row of X is

(x_1^{i−1}, x_2^{i−1}, · · · , x_N^{i−1}).

This row is a linear combination of

(H_{i−1}(x_1), H_{i−1}(x_2), · · · , H_{i−1}(x_N))

and the first, second, . . . , (i − 1)th rows. Hence, by the properties of determinants, det X does not change if we replace X_ij = x_j^{i−1} by H_{i−1}(x_j). Hence

p_{N2}(x) = (1/N!) |det{h_{i−1}(x_j)}|^2,

where the h_{i−1} are the Hermite functions (see the previous section). Let

K_N(x, y) = Σ_{i=1}^N h_{i−1}(x) h_{i−1}(y).

We obtain

p_{N2}(x) = (1/N!) det{K_N(x_i, x_j)}_{1≤i,j≤N}.

It is clear that

∫_R K_N(x, z) K_N(z, y) dz = K_N(x, y)

and

∫_R K_N(x, x) dx = N.

Therefore we are in exactly the same situation as in the circular unitary ensemble. Applying Mehta's lemma, we find for the one-point marginal

p_{N2}(x) = (1/N) K_N(x, x).

The rescaled density function is

σ_N(x) = √N p_{N2}(√N x) = K_N(√N x, √N x)/√N.

From the last section we have

K_N(x, x)/√N = (1/√N) Σ_{i=0}^{N−1} h_i(x)^2 = h'_N(x) h_{N−1}(x) − h_N(x) h'_{N−1}(x).

Differentiating and using the differential equation for the Hermite functions, we have

(d/dx)[K_N(x, x)/√N] = −h_N(x) h_{N−1}(x).

Now we have

∫_R e^{tx} σ_N(x) dx = ∫_R e^{tx} K_N(√N x, √N x)/√N dx = (1/N) ∫_R e^{tx/√N} K_N(x, x) dx.

Integrating by parts, we have

∫_R e^{tx} σ_N(x) dx = (1/t) ∫_R e^{tx/√N} h_N(x) h_{N−1}(x) dx.

In order to absorb the exponential factor, we return to the Hermite polynomials

h_n(x) = (1/√(√(2π) n!)) e^{−x^2/4} H_n(x).

Using this and completing the square in the exponent, we have

∫_R e^{tx} σ_N(x) dx = (√N/(t N! √(2π))) e^{t^2/2N} ∫_R e^{−(x − t/√N)^2/2} H_N(x) H_{N−1}(x) dx
= (√N/(t N! √(2π))) e^{t^2/2N} ∫_R e^{−z^2/2} H_N(z + t/√N) H_{N−1}(z + t/√N) dz.

We have made the change of variable

z = x − t/√N.

Using the annihilation relation

H'_n(x) = n H_{n−1}(x),

we have

H_N^{(m)}(x) = (N!/(N − m)!) H_{N−m}(x).

From this relation we have the Taylor expansion

H_N(z + y) = Σ_{m=0}^N (N choose m) H_{N−m}(z) y^m.

Using the orthogonality relations for the Hermite polynomials, the integral can now be computed explicitly. The result is

∫_R e^{tx} σ_N(x) dx = e^{t^2/2N} Σ_{m=0}^{N−1} (1/(m + 1)) (2m choose m) ((N − 1) · · · (N − m)/N^m) (t^{2m}/(2m)!).

Letting N → ∞, we obtain

lim_{N→∞} ∫_R e^{tx} σ_N(x) dx = Σ_{m=0}^∞ (1/(m + 1)) (2m choose m) t^{2m}/(2m)! = ∫_R e^{tx} s(x) dx.

The semicircle law also holds for generalized Wigner matrices; see Anderson and Zeitouni [1].

3.5. 2-point correlation function

Reference for this section: Mehta [6].

The n-point correlation function is defined to be

p_{Ns}(x_1, · · · , x_n) = ∫_{R^{N−n}} p_{Ns}(x_1, · · · , x_n, x_{n+1}, · · · , x_N) dx_{n+1} · · · dx_N.

The 1-point correlation function is just the level density function σ_N(x) (up to rescaling). The 2-point correlation function describes how two eigenvalues are related. For example, we have

∫_{R^2} f(x_1, x_2) p_{N2}(x_1, x_2) dx_1 dx_2 = E[(1/(N(N − 1))) Σ_{i≠j} f(λ^N_i, λ^N_j)].

From Mehta's lemma, we have the identity

p_{N2}(x_1, · · · , x_n) = ((N − n)!/N!) det{K_N(x_i, x_j)}_{1≤i,j≤n},

where

K_N(x, y) = Σ_{i=0}^{N−1} h_i(x) h_i(y).

We will calculate the average number of pairs of eigenvalues whose normalized distance falls into a fixed interval. We only consider the Gaussian unitary ensemble. The surprising and still mysterious connection with the analogous quantity for the zeros of the Riemann zeta function was discovered by Montgomery and Dyson (see Section 3.6).

From the definition of the 2-point correlation function, it is clear that

E[#{i ≠ j : γ_1 √N ≤ λ^N_i − λ^N_j ≤ γ_2 √N}/(N(N − 1))] = ∫_{γ_1 √N ≤ x−y ≤ γ_2 √N} p_{N2}(x, y) dx dy.

We have shown that

p_{N2}(x, y) = (1/(N(N − 1))) det[ K_N(x, x)  K_N(x, y) ; K_N(x, y)  K_N(y, y) ],

where

K_N(x, y) = Σ_{i=1}^N h_{i−1}(x) h_{i−1}(y).

Theorem 3.5.1. For the Gaussian unitary ensemble, we have

lim_{N→∞} N p_{N2}(√(2N) x, √(2N) y) = 1 − |sin(x − y)/(x − y)|^2.

Proof. Since

p_{N2}(x, y) = (K_N(x, x) K_N(y, y) − K_N(x, y)^2)/(N(N − 1)),

the proof amounts to computing the limits

lim_{N→∞} K_N(√N x, √N x)/√N and lim_{N→∞} K_N(√N x, √N y)/√N,

where

K_N(x, y) = Σ_{i=0}^{N−1} h_i(x) h_i(y).

This can be accomplished again using the Christoffel-Darboux formula

Σ_{n=0}^{N−1} h_n(x) h_n(y) = √N (h_N(x) h_{N−1}(y) − h_N(y) h_{N−1}(x))/(x − y). □

Hence, for 0 ≤ γ_1 ≤ γ_2,

lim_{N→∞} E[#{i ≠ j : γ_1 √N ≤ λ^N_i − λ^N_j ≤ γ_2 √N}/(N(N − 1))] = ∫_{γ_1}^{γ_2} [1 − |sin r/r|^2] dr.
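The limiting pair-correlation shape 1 − (sin r/r)^2 can be examined directly. The small sketch below (Python/NumPy, illustrative) confirms the quadratic vanishing at r = 0, which is level repulsion, and the approach to 1 at large separations.

```python
import numpy as np

r = np.array([0.05, 0.1, 0.5, 1.0, 3.0, 10.0, 30.0])
R = 1 - (np.sin(r) / r) ** 2        # limiting 2-point correlation shape
print(np.round(R, 4))
print(np.round(r ** 2 / 3, 4))      # small-r expansion: 1 - (sin r / r)^2 ~ r^2 / 3
```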


3.6. Zeros of the Riemann zeta-function

The Riemann zeta function is defined for Re s > 1 by

ζ(s) = Σ_{n=1}^∞ 1/n^s.

It is an analytic function on the half plane Re s > 1. It is well known that ζ(s) can be analytically continued to a meromorphic function on C and satisfies the functional equation

ξ(s) = ξ(1 − s),

where

ξ(s) = π^{−s/2} Γ(s/2) ζ(s).

From this functional equation and the known properties of the Gamma function Γ(s) we know that ζ(s) has a simple pole at s = 1 with residue 1. Furthermore, the zeros of ζ(s) outside the critical strip 0 < Re s < 1 are the simple zeros at −2, −4, . . . (trivial zeros). Denote by N(T) the number of zeros s = a + √−1 b with 0 < a < 1 and 0 < b < T. Then

N(T) = (T/2π) ln(T/2π) + O(T).

Thus the average spacing of the zeros up to height T is approximately 2π/ln(T/2π). The Riemann hypothesis states that all zeros in the critical strip lie on the line Re s = 1/2. Let D(α, β) be the number of pairs of zeros s = 1/2 + √−1 γ, s' = 1/2 + √−1 γ' with 0 ≤ γ, γ' ≤ T and

(2π/ln T) α ≤ γ − γ' ≤ (2π/ln T) β.

H. Montgomery conjectured that as T → ∞,

2π D(α, β)/(T ln T) → ∫_α^β [1 − |sin r/r|^2] dr (0 ≤ α < β).

There is an overwhelming amount of numerical evidence supporting this conjecture. Granting it, the 2-point correlation functions for the zeros of the Riemann zeta function and for the GUE are identical.

It is conjectured that the zeros of the Riemann zeta function are related to the spectrum of a self-adjoint operator, and the similarity between the gap distribution of the zeros on the one hand and that of the eigenvalues of the Gaussian unitary ensemble on the other certainly makes this conjecture more tantalizing.

3.7. Universality

Reference for this section: Tracy and Widom [8].

We assume a general potential function V. Let µ_V be the unique probability measure on R which minimizes the functional

I_V(µ) = (1/2) ∫_R V(x) µ(dx) + ∫_{R^2} ln(1/|x − y|) µ(dx)µ(dy)

(see Deift [2], Chapter 6). Then µ_V(dx) = f(x) dx is the limiting eigenvalue distribution.

(1) Eigenvalue gaps. If f(x) > 0, then

lim_{N→∞} P{no eigenvalues in (x, x + y/(N f(x)))} = u(y),

where u(y) is described by the sine kernel and is independent of V and x.

(2) Tracy-Widom law. If the density function f vanishes like const · √(a − x) at the extreme right point a of the support, then the largest eigenvalue λ_N → a in the sense that

lim_{N→∞} P{|λ_N − a| ≥ ε} = 0.

We also have

lim_{N→∞} P{C N^{2/3}(λ_N − a) ≤ s} = F_{TW}(s).

Here C is a universal constant.

3.8. General orthogonal polynomial ensembles

Reference for this section: Deift [2] and Mehta [6].

Consider a more general form of the N-point density:

p_N(λ) = (1/Z_N) ∆_N(λ)^2 ∏_{i=1}^N w(λ_i),

where

∆_N(λ) = ∏_{1≤i<j≤N} (λ_i − λ_j).

There is also a discrete version. Let

K_N(x, y) = Σ_{i=1}^N φ_i(x) φ_i(y)

be the reproducing kernel. We have

E[(1/N) Σ_{i=1}^N f(λ_i)] = ∫ p_N(x) f(x) dx

and

Var[Σ_{i=1}^N f(λ_i)] = N ∫ p_N(x) |f(x)|^2 dx − ∫∫ K_N(x, y)^2 f(x) f(y) dx dy.

We also have

P{no eigenvalues in [a, b]} = det(I − K_N)_{L^2[a,b]},

where K_N acts on L^2[a, b] by

K_N f(x) = ∫_a^b K_N(x, y) f(y) dy.

For proofs, see Deift's book. It is helpful here to read the classical Fredholm theory of integral equations. We are interested in the behavior of various quantities as N → ∞.

Other topics: random tilings.


CHAPTER 4

Large Deviations of Random Matrices

References:

(1) A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications (2nd edition), Springer (1998) [3].
(2) A. Guionnet, Large Deviations and Stochastic Calculus for Large Random Matrices, lecture notes [4].

4.1. Definitions

We often encounter the following situation. Let X be a topological space (usually a metric space) and {P_n} a sequence of probability measures on X. This sequence converges to a point mass at x_0 ∈ X in the sense that for any open set O containing x_0 we have

lim_{n→∞} P_n{O} = 1.

Thus if F is a closed set not containing x_0, then

lim_{n→∞} P_n{F} = 0.

Large deviation theory investigates the speed of convergence to zero of the small probabilities P_n{F}. Usually these probabilities converge to zero exponentially, in the sense that −ln P_n{F} grows at a polynomial rate. The following definitions give the usual form of large deviation results.
Large deviation theory investigates the speed of converging to zero of thesmall probabilities Pn {F}. Usually these probabilities converge to zero ex-ponentially in the sense that ln Pn {F} converges to zero at a polynomialrate. The following definitions give the usual form of large deviation results.

Definition 4.1.1. A nonnegative, lower semi-continuous real valued function I on X is called a rate function. It is called a good rate function if {x ∈ X : I(x) ≤ c} is compact for every 0 ≤ c < ∞.

Definition 4.1.2. Let {s_n} be a sequence of positive numbers increasing to infinity. A sequence of probability measures {P_n} is said to satisfy the large deviation principle with rate function I and speed s_n if for every closed subset F of X,

lim sup_{n→∞} (1/s_n) ln P_n{F} ≤ −inf_{x∈F} I(x),

and for every open subset G of X,

lim inf_{n→∞} (1/s_n) ln P_n{G} ≥ −inf_{x∈G} I(x).

If the first relation holds only for compact subsets, we say that the sequence satisfies the weak large deviation principle.

Typically the speed sequence is s_n = n, but this is not the case for large deviations of random matrices. Of course the form of the rate function depends on how we index the sequence of probability measures.

Roughly speaking, the large deviation principle says that the probability of a small neighborhood of a point x is asymptotically e^{−s_n I(x)}. Such large deviation results are often very useful in applications of probability theory to analysis and physics.

In this section we make some general remarks about large deviation principles. In the next section we illustrate the theory with a typical case: large deviations of sums of i.i.d. random variables. We then discuss the large deviations of Gaussian random matrices.

For simplicity we will assume that X is a metric space.

From the above definitions we see that verifying that a sequence of probability measures satisfies a large deviation principle usually consists of two steps, namely the upper bound and the lower bound. Both steps involve calculating the asymptotic behavior of the probabilities P_n{O} for small balls O. This statement can be made precise as follows.

Theorem 4.1.3. Suppose that for the probabilities P_n{B(x; r)} of small balls the following two limits coincide for every x:

lim inf_{r→0} lim inf_{n→∞} (1/s_n) ln P_n{B(x; r)} = lim sup_{r→0} lim sup_{n→∞} (1/s_n) ln P_n{B(x; r)} = −I(x).

Then the sequence {P_n} satisfies the weak large deviation principle with rate function I(x).

The proof of this result is not difficult. We need the fact that every open covering of a compact set can be reduced to a finite covering. We also need the following simple observation:

P_n{O_1 ∪ · · · ∪ O_l} ≤ l max_{1≤i≤l} P_n{O_i}.

To pass from the weak to the full large deviation principle, one needs to show that the probabilities outside large compact sets can be ignored. In practice the following result is often used.

Proposition 4.1.4. Suppose that {P_n} satisfies the weak large deviation principle with a rate function I and speed sequence {s_n}. Suppose further that {K_N} is a sequence of compact sets such that

lim sup_{N→∞} lim sup_{n→∞} (1/s_n) ln P_n{K_N^c} = −∞.

Then {P_n} also satisfies the (full) large deviation principle with the same rate function and speed sequence.

From Theorem 4.1.3 we see that to verify that a sequence of probability measures satisfies a large deviation principle we need to estimate the probabilities of small balls from both below and above, and to show that these two bounds are close in the asymptotic sense. It is usually easy to get upper bounds (by a simple Markov inequality argument, a test function argument, etc.). As a rule, obtaining a good lower bound is more difficult, and the methods used for it vary greatly from one problem to another. Below we will discuss upper bounds in more general terms.

Applications of large deviation results are often made through the following lemma of Varadhan. This result is similar to the well known Laplace method in analysis.

Proposition 4.1.5. Let {P_n} satisfy the large deviation principle with rate function I and speed sequence s_n. Let F : X → R be a bounded continuous function on X. Then we have

lim_{n→∞} (1/s_n) ln ∫_X e^{s_n F(x)} P_n(dx) = sup_{x∈X} {F(x) − I(x)}.

There is a converse of the above result, which shows that the rate function can be recovered from test functions.

Theorem 4.1.6. Let

L(F) = lim_{n→∞} (1/s_n) ln ∫_X e^{s_n F(x)} P_n(dx).

Then {P_n} satisfies the large deviation principle with speed sequence s_n and rate function

I(x) = sup_{F∈C_b(X)} {F(x) − L(F)}.

The problem with applying this theorem to calculate the rate function is that in most cases we can only compute the limit in the theorem for a subclass C of C_b(X). For example, C could be the space of linear functions if X is a vector space. Suppose that

I_C(x) = sup_{F∈C} {F(x) − L(F)}.

Then clearly I ≥ I_C, and we obtain a lower estimate for the rate function. In order to show that C is already sufficient, we need to show that I_C also gives the correct lower bound for the asymptotic probabilities P_n{G} for open sets G.

If the space of linear functions is sufficient, the rate function has the form of a Legendre transform:

I(x) = sup_{t∈R} {t · x − L(t)}.

We will see this type of rate function when we discuss the case of sums of i.i.d. random variables.


4.2. Large deviation of sums of i.i.d. random variables

Let {X_n} be a sequence of real valued i.i.d. random variables. Let

φ_X(t) = E e^{tX}.

We make the basic assumption that φ_X(t) is finite for all t. From this assumption it is easy to show (by the dominated convergence theorem) that φ_X is a smooth function. We assume that X is not degenerate, i.e., it is not a constant (almost surely). Let

S_n = X_1 + · · · + X_n,   S̄_n = S_n/n.

The law of large numbers shows that

P{S̄_n → EX} = 1.

Let P_n be the law of S̄_n. Then P_n converges to the point mass at EX.

Before reading the next lemma, it is helpful to take a look at Examples 4.2.3 and 4.2.4 below.

Lemma 4.2.1. Let φ(t) = E e^{tX} be finite for all t ∈ R and

I(x) = sup_{t∈R} {tx − ln φ(t)}.

(1) I is convex. (2) I is a good rate function. (3) I(x) = 0 if and only if x = EX. (4) I is strictly increasing for x ≥ EX and strictly decreasing for x ≤ EX.

Proof. (1) Since I is the supremum of a family of linear functions, it is convex.

(2) Since I is the supremum of a family of continuous functions, it is lower semi-continuous. Taking t = 0, we see that I(x) ≥ 0 for all x. Therefore it is a rate function. We will show that it is a good rate function in (4).

(3) By Jensen's inequality we have

φ(t) = E e^{tX} ≥ e^{t EX}.

It follows that

(4.2.1) tx − ln φ(t) ≤ t(x − EX).

This shows that I(EX) ≤ 0, hence I(EX) = 0. Conversely, suppose that I(x) = 0. Then

tx − ln φ(t) ≤ 0

for all t ∈ R. For t ↓ 0 we have

x ≤ (1/t) ln φ(t) → EX,

and for t ↑ 0 we have

x ≥ (1/t) ln φ(t) → EX.

It follows that x = EX.

(4) Suppose that y > x > EX and I(x) < ∞. From (4.2.1) we see that if x ≥ EX then

I(x) = sup_{t≥0} {tx − ln φ(t)}.

There is a sequence t_n ≥ 0 such that

t_n x − ln φ(t_n) → I(x).

It cannot be true that t_n → 0, because otherwise I(x) = 0 and x = EX. We can therefore assume that t_n ≥ t_0 for all n and some t_0 > 0. We have

I(y) ≥ t_n y − ln φ(t_n) ≥ t_0(y − x) + t_n x − ln φ(t_n) → t_0(y − x) + I(x).

This shows that I(y) > I(x). A similar argument applies to the case x < EX.

Now we prove that I is a good rate function. It is enough to verify that I(x) → ∞ as |x| → ∞. Let us consider the case x ≥ EX. Suppose that lim_{x→∞} I(x) = ∞ does not hold. Then there is a constant M such that

sup_{x≥EX} sup_{t≥0} {tx − ln φ(t)} ≤ M.

By taking t = 1 we see that this cannot be true. □

Theorem 4.2.2 (Cramér). {P_n} satisfies the large deviation principle with speed sequence s_n = n and rate function

I(x) = sup_{t∈R} {tx − ln φ_X(t)}.

Proof. (1) Upper bound. We use linear test functions. Let x ≥ EX and t ≥ 0. We have

P{S̄_n ≥ x} ≤ e^{−ntx} E e^{tS_n} = e^{−ntx} φ(t)^n.

It follows that

(1/n) ln P{S̄_n ≥ x} ≤ −(tx − ln φ(t)),

and hence

(4.2.2) lim sup_{n→∞} (1/n) ln P{S̄_n ≥ x} ≤ −I(x).

Now let F be a compact subset of [EX, ∞) and let ⋃_{i=1}^l (a_i, b_i) ⊃ F be a finite covering by open intervals. Then from

P_n{F} ≤ l max_{1≤i≤l} P_n{(a_i, b_i)},

we have immediately

lim sup_{n→∞} (1/n) ln P_n{F} ≤ −inf_{1≤i≤l} I(a_i).

Recall now that I is lower semi-continuous and the a_i can be chosen as close to F as we want. Therefore we have

lim sup_{n→∞} (1/n) ln P_n{F} ≤ −inf_{x∈F} I(x).

This can be easily extended to all closed sets by using (4.2.2) and the fact that I(x) → ∞ as x → ∞ (part (4) of Lemma 4.2.1). Thus we have proved the upper bound.

(2) Now we consider the lower bound. This requires a small technique. Suppose that we want to find the lower bound at x. The idea is to shift the mean from EX to x. There is nothing to prove if I(x) = ∞. Therefore we assume that I(x) < ∞.

Suppose first that both {X < x} and {X > x} have positive probability. Then

tx − ln φ(t) = −ln E e^{t(X−x)} → −∞

as |t| → ∞. Therefore the supremum is attained at some point s:

I(x) = sup_{t∈R} {tx − ln φ(t)} = sx − ln φ(s).

Since φ is smooth, we also have x = (d/dt) ln φ(t)|_{t=s}, i.e.,

E{X e^{sX}}/E e^{sX} = x.

We consider the probability measure Q_n defined by

dQ_n/dP = ∏_{i=1}^n e^{sX_i}/φ(s) = e^{sS_n}/φ(s)^n.

Then E_{Q_n} X_i = x. Now consider

P{S̄_n ∈ (x − ε, x + ε)} = φ(s)^n E_{Q_n}[e^{−sS_n}; S̄_n ∈ (x − ε, x + ε)]
≥ φ(s)^n e^{−ns(x+ε)} Q_n{S̄_n ∈ (x − ε, x + ε)}.

By the weak law of large numbers we have

lim_{n→∞} Q_n{S̄_n ∈ (x − ε, x + ε)} = 1.

It follows that

lim inf_{n→∞} (1/n) ln P{S̄_n ∈ (x − ε, x + ε)} ≥ ln φ(s) − s(x + ε) ≥ −I(x) − sε.

The left side is an increasing function of ε, hence we can remove the terms involving ε on the right side and obtain

(4.2.3) lim inf_{n→∞} (1/n) ln P{S̄_n ∈ (x − ε, x + ε)} ≥ −I(x).

Now we claim that the above inequality holds in general, without assuming that P{X > x} and P{X < x} are both positive. Suppose, for example, that P{X > x} = 0. Then X ≤ x with probability one, and we have

I(x) = −inf_{t∈R} ln E e^{t(X−x)} = −ln P{X = x}

and

P{S̄_n ∈ (x − ε, x + ε)} ≥ P{X = x}^n.

Hence the inequality (4.2.3) also holds in this case.

Now suppose that G is an open set in R. If x ∈ G, then there is an interval (x − ε, x + ε) containing x and contained in G. Hence we have, for any x ∈ G,

lim inf_{n→∞} (1/n) ln P{S̄_n ∈ G} ≥ lim inf_{n→∞} (1/n) ln P{S̄_n ∈ (x − ε, x + ε)} ≥ −I(x).

It follows that

lim inf_{n→∞} (1/n) ln P{S̄_n ∈ G} ≥ −inf_{x∈G} I(x). □

Example 4.2.3. If X has the standard normal distribution, then the rate function is

I(x) = x^2/2.

This rate function is not hard to see directly. Each S̄_n = S_n/n has the normal distribution N(0, 1/n), which has the density

√(n/2π) e^{−nx^2/2}.

The probability of a small neighborhood of x is of the exponential order e^{−nx^2/2} ∼ e^{−nI(x)}. Hence I(x) = x^2/2.

Example 4.2.4. If X has the symmetric Bernoulli distribution

P{X = 1} = P{X = −1} = 1/2,

then the rate function is given by

I(x) = ((1 + x)/2) ln(1 + x) + ((1 − x)/2) ln(1 − x), −1 ≤ x ≤ 1,

and I(x) = ∞ if |x| > 1. This can be obtained from the formula

I(x) = sup_{t∈R} {tx − ln φ(t)}, with φ(t) = cosh t.

It can also be obtained by calculating explicitly the distribution of S_n/n, which is related to the binomial distribution.

The above two examples show that if we can find an explicit good approximation of the distributions {P_n}, we can usually guess the rate function. Of course, showing that the guess is correct requires some technical work.

Remark 4.2.5. Cramér's Theorem 4.2.2 holds without the assumption that E e^{tX} is finite for all t, though in general I is not a good rate function; see Dembo and Zeitouni [3].
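The Bernoulli example can also be checked numerically. The sketch below (Python/NumPy, illustrative; the test point x = 0.3 is an arbitrary choice) computes the Legendre transform sup_t {tx − ln cosh t} on a grid, compares it with the closed form above, and evaluates −(1/n) ln P{S̄_n ≥ x} from the exact binomial tail for increasing n. The convergence toward I(x) is slow because of polynomial prefactors.

```python
import numpy as np
from math import lgamma, log

x = 0.3
# rate function: numerical Legendre transform of ln cosh t versus the closed form
t = np.linspace(-5, 5, 20001)
I_legendre = np.max(t * x - np.log(np.cosh(t)))
I_closed = 0.5 * (1 + x) * np.log(1 + x) + 0.5 * (1 - x) * np.log(1 - x)
print("I(0.3):", round(I_legendre, 5), round(I_closed, 5))

def log_tail(n, x):
    # log P{ S_n / n >= x } for a sum of n symmetric +/-1 variables,
    # computed from the exact binomial distribution in log space
    kmin = int(np.ceil(n * (1 + x) / 2))
    logs = np.array([lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) - n * log(2)
                     for k in range(kmin, n + 1)])
    m = logs.max()
    return m + log(np.exp(logs - m).sum())

for n in [50, 200, 1000, 5000]:
    print(n, round(-log_tail(n, x) / n, 5))
```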


4.3. Large deviation for the empirical distribution

Let {X_i} be a sequence of i.i.d. random variables as in the previous section. The empirical distribution is

L_N = (1/N) Σ_{i=1}^N δ_{X_i}.

It is a probability measure on R. Thus L_N is a P(R)-valued random variable. There is also a large deviation principle for the laws of L_N (Sanov's theorem); see Anderson and Zeitouni [1]. The large deviation principle for S̄_N can be obtained from that of L_N by the contraction principle.

4.4. Some other interesting large deviation results

(1) Occupation measure large deviations for Brownian motion. Let B = {B_t} be a one dimensional Brownian motion starting from 0. The occupation measure L_T of B is defined by

L_T(A) = (1/T) ∫_0^T I_A(B_t) dt.

It is a random variable taking values in the space P(R) of probability measures on R. The laws of L_T satisfy the large deviation principle with speed function T and rate function I defined as follows. Suppose that µ ∈ P(R) is absolutely continuous with respect to Lebesgue measure with density function f, and that √f ∈ H^1(R) (Sobolev space). Then

I(µ) = ∫_R |∇√f|^2 = (1/4) ∫_R |∇f|^2/f.

If µ does not satisfy the above conditions, we define I(µ) = ∞.

There are similar results for homogeneous Markov chains.

(2) Path space large deviations for Brownian motion. Again let B be a Brownian motion starting from 0. Define the scaled Brownian motion

B_T(s) = B_{Ts}, 0 ≤ s ≤ 1.

Then for each fixed T, we can regard B_T as a random variable taking values in the path space W(R) = C([0, 1]; R). As T ↓ 0, the laws of B_T satisfy the large deviation principle with speed function 1/T and rate function

I(γ) = (1/2) ∫_0^1 |γ̇(s)|^2 ds.

Again, when the right side is not defined, we set I(γ) = ∞.


4.5. Review of results concerning random matrices

We consider the distributions of eigenvalues of the Gaussian orthogonal ensemble (GOE), the Gaussian unitary ensemble (GUE), and the Gaussian symplectic ensemble (GSE). Chapter 2 contains a detailed discussion of these classes of random matrices. For discussing large deviation properties of the eigenvalue distributions, we need the explicit distributions derived there. The density function of the eigenvalues λ_1, · · · , λ_N is given by

p_s(λ_1, · · · , λ_N) = (1/Z_{Ns}) e^{−(Ns/4) Σ_{i=1}^N λ_i^2} ∏_{i<j} |λ_i − λ_j|^s.

Here Z_{Ns} is the normalization constant and the parameter s has the following meaning:

1) s = 1 (Gaussian orthogonal ensemble);
2) s = 2 (Gaussian unitary ensemble);
3) s = 4 (Gaussian symplectic ensemble).

For example, the case s = 2 (GUE) corresponds to the N × N Wigner Hermitian matrices

M = {M_ij} with M_ij = A_ij + √−1 B_ij,

where

{M_ii, 1 ≤ i ≤ N; √2 A_ij, √2 B_ij, 1 ≤ i < j ≤ N}

are i.i.d. Gaussian random variables with mean zero and variance 1/N.

We consider the empirical distribution of the eigenvalues

L_N = (1/N) Σ_{i=1}^N δ_{λ^N_i}.

L_N is a probability measure on R, i.e., L_N ∈ P(R). Let s(dx) = s(x) dx be the semicircle law, whose density function with respect to Lebesgue measure is

s(x) = (√(2 − x^2)/π) I_{|x|≤√2}.

The fact that the empirical distribution L_N converges to the semicircle law is a general phenomenon for random matrices and is not a special property of Gaussian ensembles. The result that L_N converges to the semicircle law can be stated in various ways. In Chapter 3 we showed that

E⟨f, L_N⟩ → ⟨f, s⟩

for functions of the form f(x) = e^{tx} for any real t. This amounts to saying that the expectation of the Laplace transform of L_N converges to the Laplace transform of the semicircle law. This result can be strengthened in various ways. In its strongest form, by constructing an appropriate probability space, one can show that L_N converges almost surely to the semicircle law in the space P(R) (equivalently, the laws of L_N converge to the unit mass at the semicircle law). This corresponds to the strong law of large numbers in the case of sums of i.i.d. random variables. But we do not need this strong form of convergence.

4.6. Equilibrium measure and the rate function

From now on we often drop the subscript s.

Recall that the density function for the eigenvalues has the following form:

p_N(x_1, · · · , x_N) = (1/Z_N) ∏_{1≤i<j≤N} |x_i − x_j|^s e^{−(Ns/4) Σ_{i=1}^N |x_i|^2}.

This is a case where we know the density function explicitly, but the underlying space P(R) is relatively complicated. Note that we use the topology of weak convergence on this space, i.e., µ_n → µ if and only if for every bounded continuous function f ∈ C_b(R),

lim_{n→∞} ∫_R f dµ_n = ∫_R f dµ.

We now rewrite the density function in terms of the empirical distribution of the eigenvalues

µ = (1/N) Σ_{i=1}^N δ_{x_i}.

The density function can be written as

(ln p_N(x_1, · · · , x_N))/N^2 = −(ln Z_N)/N^2 + (s/2) ∫_{R^2\D} ln|x − y| µ(dx)µ(dy) − (s/4) ∫_R |x|^2 µ(dx),

where D denotes the diagonal of R^2. The problem we have to deal with is the fact that the logarithmic function is unbounded near the diagonal. From this expression we can guess that for the large deviations of the eigenvalue empirical measures, the speed function should be s_N = N^2 and the rate function should be

I(µ) = (s/4) ∫_R |x|^2 µ(dx) + (s/2) ∫_{R^2} ln(1/|x − y|) µ(dx)µ(dy) − 3s/8.

Here we have used the fact that

(4.6.1) lim_{N→∞} (ln Z_N)/N^2 = −3s/8.

This limit can be proved by direct computation using the so-called Selberg integrals (see Mehta [6], p. 354). See also Remark 4.6.2.

It is often helpful to write the rate function in a more symmetric form:

I(µ) = (s/2) ∫_{R^2} h(x, y) µ(dx)µ(dy) − 3s/8,

where

h(x, y) = (x^2 + y^2)/4 + ln|x − y|^{−1}.

Note that from

ln|x − y| = ln(1 + (|x − y| − 1)) ≤ |x − y| − 1 ≤ |x| + |y| − 1 ≤ (x^2 + y^2)/8 + 3

we have

(4.6.2) h(x, y) ≥ (x^2 + y^2)/8 − 3.

Theorem 4.6.1. Let

I(µ) = (s/2) ∫_{R^2} h(x, y) µ(dx)µ(dy) − 3s/8,

where

h(x, y) = (x^2 + y^2)/4 + ln|x − y|^{−1}.

Then I is a good rate function on P(R) and I(µ) = 0 if and only if µ is the semicircle law:

dµ/dx = (√(2 − x^2)/π) I_{|x|≤√2}.

Proof. Let h_M(x, y) = min{h(x, y), M}. Then h_M is a bounded continuous function on R^2. Hence the function

µ ↦ ∫_{R^2} h_M(x, y) µ(dx)µ(dy)

is continuous on P(R). This shows that I is the increasing limit of a sequence of continuous functions, hence it must be lower semi-continuous. We will discuss the positivity of I later.

To show that I is a good rate function, we need to show that {I ≤ C} is compact for every finite positive C, i.e., that the set

(4.6.3) {µ ∈ P(R) : ∫_{R^2} (h(x, y) + 3) µ(dx)µ(dy) ≤ C + 3s/8 + 3}

is tight. From (4.6.2) we have, for all (x, y) ∈ R^2,

h(x, y) + 3 ≥ (x^2 + y^2)/8.

Hence,

µ([−l, l]^c) ≤ (2/l) √(C + 3s/8 + 3).

This shows that the set (4.6.3) is tight.

From the fact that h ≥ −3 we have I(µ) ≥ −3s for any µ ∈ P(R). Therefore there is a sequence {µ_n} such that

I(µ_n) → inf_{µ∈P(R)} I(µ).

Since {I(µ_n)} is bounded, from what we have shown above, the sequence {µ_n} is pre-compact. Therefore there is a subsequence converging to a probability measure µ_0. By the lower semi-continuity of I we must have

I(µ_0) = inf_{µ∈P(R)} I(µ),

i.e., the infimum is attained. The identification of µ_0 as the semicircle law is not easy. We will reproduce the proof in Deift [2], Chapter 6, at the end of this chapter. □

Remark 4.6.2. From the above theorem we see that (4.6.1) is equivalent to

\[
\int_{\mathbb{R}^2} h(x, y) \, s(x) s(y) \, dx \, dy = \frac{3}{4},
\]

where s(x) denotes the density of the semicircle law. This can be verified by direct computation.
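As a purely numerical sanity check on the theorem (it plays no role in the proofs), the following minimal sketch, assuming numpy is available, samples one large GUE matrix in the normalization of the density above with s = 2 and compares the eigenvalue histogram with the semicircle density √(4 − x²)/(2π):

import numpy as np

# One GUE sample: Hermitian matrix whose off-diagonal entries have variance 1/N,
# so that the empirical spectral measure is close to the semicircle law on [-2, 2].
N = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / (2 * np.sqrt(N))
eigs = np.linalg.eigvalsh(H)

# Compare the eigenvalue histogram with the semicircle density sqrt(4 - x^2)/(2 pi).
hist, edges = np.histogram(eigs, bins=60, range=(-2.2, 2.2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
semicircle = np.sqrt(np.clip(4.0 - centers**2, 0.0, None)) / (2.0 * np.pi)
print("max deviation from the semicircle density:", np.max(np.abs(hist - semicircle)))

The printed deviation is small and shrinks as N grows, reflecting the convergence of the empirical measures that the vanishing of I at the semicircle law encodes.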

4.7. Proof of the upper bound

Reference for this section: Guionnet [4]. It is more convenient to deal with the non-normalized distribution. Thus we start with the measure on R^N given by

\[
Q_N(d\lambda) = Z_N \, p_N(\lambda) \, d\lambda,
\]

where p_N(λ) = p_N(λ_1, ..., λ_N) is the density function of the distribution of the eigenvalues and dλ = dλ_1 ··· dλ_N is the Lebesgue measure on R^N. More explicitly, the measure Q_N(dλ) can be written as

\[
\exp \Bigl[ -\frac{N^2 s}{2} \int_{\mathbb{R}^2 \setminus D} h(x, y) \, L^{\lambda}_N(dx) L^{\lambda}_N(dy)
- \frac{Ns}{4} \int_{\mathbb{R}} |x|^2 \, L^{\lambda}_N(dx) \Bigr] \, d\lambda.
\]

Here

\[
L^{\lambda}_N = \frac{1}{N} \sum_{i=1}^N \delta_{\lambda_i}.
\]

We now prove the large deviation upper bound. Let

\[
h_R(x, y) = \min \{ h(x, y), R \}.
\]

Since with probability 1 the eigenvalues are distinct, the (L^λ_N × L^λ_N)-measure of the diagonal D ⊂ R² is

\[
(L^{\lambda}_N \times L^{\lambda}_N)(D) = \frac{1}{N}.
\]

Hence, for the integral in the exponent we have

\[
\int_{\mathbb{R}^2 \setminus D} h(x, y) \, L^{\lambda}_N(dx) L^{\lambda}_N(dy)
\ge \int_{\mathbb{R}^2 \setminus D} h_R(x, y) \, L^{\lambda}_N(dx) L^{\lambda}_N(dy)
\ge \int_{\mathbb{R}^2} h_R(x, y) \, L^{\lambda}_N(dx) L^{\lambda}_N(dy) - \frac{R}{N},
\]

because h_R ≤ R on the diagonal.


Now let F be a Borel subset of P(R). We have

\[
(4.7.1) \qquad Q_N \{ L_N \in F \}
\le C(N, R) \exp \Bigl[ -\frac{N^2 s}{2} \inf_{\mu \in F} \int_{\mathbb{R}^2} h_R(x, y) \, \mu(dx)\mu(dy) \Bigr],
\]

where C(N, R) = (4π/s)^{N/2} e^{NsR/2}. The exponential factor comes from the error term R/N in the previous display, and the first factor comes from the Gaussian integral

\[
\int_{\mathbb{R}^N} \exp \Bigl[ -\frac{Ns}{4} \int_{\mathbb{R}} |x|^2 \, L^{\lambda}_N(dx) \Bigr] d\lambda
= \int_{\mathbb{R}^N} \exp \Bigl[ -\frac{s}{4} \sum_{i=1}^N |\lambda_i|^2 \Bigr] d\lambda.
\]
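Spelling out the Gaussian integral (it is this elementary computation that makes C(N, R) harmless at the speed N²):

\[
\int_{\mathbb{R}} e^{-s\lambda^2/4} \, d\lambda = \sqrt{\frac{4\pi}{s}},
\qquad \text{so} \qquad
C(N, R) = \Bigl( \frac{4\pi}{s} \Bigr)^{N/2} e^{NsR/2}
\quad \text{and} \quad
\frac{\ln C(N, R)}{N^2} \to 0 \ \text{as } N \to \infty \ \text{for each fixed } R.
\]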

Therefore we obtain an upper bound for an arbitrary set F ⊂ P(R):

\[
\limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N \{ L_N \in F \}
\le -\frac{s}{2} \inf_{\mu \in F} \int_{\mathbb{R}^2} h_R(x, y) \, \mu(dx)\mu(dy).
\]

To complete the proof of the upper bound we need to show that if F is closed, then the truncation at R can be removed.

We first show that the laws of L_N are exponentially tight. This allows us to assume that the closed set F is compact in the proof of the upper bound. For this purpose we recall that a subset C of P(R) is relatively compact if and only if it is tight: for any positive ε there is a positive M such that

\[
\mu([-M, M]^c) \le \varepsilon \qquad \text{for all } \mu \in C.
\]

For each positive M let l_M be a positive function such that l_M(C) ↓ 0 as C ↑ ∞, and let

\[
K_M = \{ \mu \in \mathcal{P}(\mathbb{R}) : \mu([-C, C]^c) \le l_M(C) \ \text{for all positive } C \}.
\]

Each K_M is a compact subset of P(R). The exponential tightness we are referring to above is the content of the following lemma.

Lemma 4.7.1. Take l_M(C) = 8(M + 3)/C². Then

\[
\lim_{M \to \infty} \limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N (L_N \in K_M^c) = -\infty.
\]

Proof. Using the lower bound (4.6.2), we see that if µ ∈ K_M^c, that is, if

\[
\mu([-C, C]^c) > l_M(C) \qquad \text{for some positive } C,
\]

then, since h_R(x, y) ≥ min{R, (x² + y²)/8} − 3 and (x² + y²)/8 ≥ C²/8 whenever x ∉ [−C, C],

\[
\int_{\mathbb{R}^2} h_R(x, y) \, \mu(dx)\mu(dy)
\ge -3 + \min \Bigl\{ R, \frac{C^2}{8} \Bigr\} \, \mu([-C, C]^c)
\ge -3 + \min \Bigl\{ R, \frac{C^2}{8} \Bigr\} \, l_M(C).
\]

Choosing the truncation level R = C²/8 (equivalently C = √(8R); the level R is at our disposal) and recalling that

\[
l_M(\sqrt{8R}) = \frac{3 + M}{R},
\]

the right-hand side becomes −3 + R · (3 + M)/R = M.


Hence, from (4.7.1) we have immediately

\[
\limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N (L_N \in K_M^c) \le -\frac{Ms}{2},
\]

from which the exponential tightness follows by letting M → ∞. □
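For completeness, here is the standard reduction to compact sets that this exponential tightness provides. For a closed set F and any M,

\[
Q_N(L_N \in F) \le Q_N(L_N \in F \cap K_M) + Q_N(L_N \in K_M^c),
\]

so

\[
\limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N(L_N \in F)
\le \max \Bigl\{ \limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N(L_N \in F \cap K_M), \, -\frac{Ms}{2} \Bigr\},
\]

and F ∩ K_M is compact; letting M → ∞ removes the second term.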

We can now complete the proof of the upper bound. By the exponential tightness, we may assume that F is a compact set. Since h_R is a bounded continuous function, the function

\[
\mu \mapsto \int_{\mathbb{R}^2} h_R(x, y) \, \mu(dx)\mu(dy)
\]

is continuous on P(R). Thus, applying the bound just obtained to small open neighborhoods G of a fixed µ and shrinking them to {µ}, we have

\[
\lim_{G \downarrow \{\mu\}} \limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N (L_N \in G)
\le -\frac{s}{2} \int_{\mathbb{R}^2} h_R(x, y) \, \mu(dx)\mu(dy).
\]

Since this holds for all R, we can let R → ∞ (monotone convergence) to obtain

\[
\lim_{G \downarrow \{\mu\}} \limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N (L_N \in G)
\le -\frac{s}{2} \int_{\mathbb{R}^2} h(x, y) \, \mu(dx)\mu(dy).
\]

The rest of the proof is routine. Suppose that F is a compact set. For any positive ε and any µ ∈ F there is an open set G_µ containing µ such that

\[
\limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N (L_N \in G_\mu)
\le -\frac{s}{2} \int_{\mathbb{R}^2} h(x, y) \, \mu(dx)\mu(dy) + \varepsilon.
\]

Since F is compact, there are finitely many µ_1, ..., µ_l ∈ F such that ⋃_{i=1}^l G_{µ_i} ⊃ F. From

\[
Q_N (L_N \in F) \le l \max_{1 \le i \le l} Q_N (L_N \in G_{\mu_i})
\]

we have

\[
\limsup_{N \to \infty} \frac{1}{N^2} \ln Q_N (L_N \in F)
\le -\frac{s}{2} \min_{1 \le i \le l} \int_{\mathbb{R}^2} h(x, y) \, \mu_i(dx)\mu_i(dy) + \varepsilon
\le -\frac{s}{2} \inf_{\mu \in F} \int_{\mathbb{R}^2} h(x, y) \, \mu(dx)\mu(dy) + \varepsilon.
\]

Because each µ_i ∈ F and ε is arbitrary, we have obtained the desired upper bound.
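It is perhaps worth recording how this translates into the large deviation upper bound for the normalized eigenvalue law; writing P_N for Q_N/Z_N (a notation used only here) and combining the last display with (4.6.1),

\[
\limsup_{N \to \infty} \frac{1}{N^2} \ln P_N (L_N \in F)
\le -\frac{s}{2} \inf_{\mu \in F} \int_{\mathbb{R}^2} h(x, y) \, \mu(dx)\mu(dy) + \frac{3s}{8}
= - \inf_{\mu \in F} I(\mu).
\]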

4.8. Proof of the lower bound

Reference for this section: Guionnet [4]. We now prove the lower bound. By the general theory (Section 4.1), it is enough to show that for any open set G and any µ ∈ G we have

\[
\liminf_{N \to \infty} \frac{1}{N^2} \ln Q_N (L_N \in G)
\ge -\frac{s}{2} \int_{\mathbb{R}^2} h(x, y) \, \mu(dx)\mu(dy).
\]

There is nothing to prove if I(µ) = ∞, so we assume that I(µ) < ∞. Note that under this condition |x|² is integrable with respect to µ and ln|x − y| is integrable with respect to µ × µ.


We first construct a discrete approximation of the measure µ. Because of the singularity of ln|x − y| at x = y, the measure µ does not charge single points. Define a sequence of N points as follows:

\[
x_i = \inf \Bigl\{ x : \mu((-\infty, x]) \ge \frac{i}{N + 1} \Bigr\}, \qquad 1 \le i \le N.
\]

These points are finite. We let x_0 = −∞ and x_{N+1} = ∞. Thus R is divided into N + 1 non-overlapping intervals, each of which has measure 1/(N + 1). By the continuity of x ↦ µ((−∞, x]) we have

\[
\mu((x_{i-1}, x_i]) = \frac{1}{N + 1}, \qquad 1 \le i \le N.
\]

Let

\[
\mu_N = \frac{1}{N} \sum_{i=1}^N \delta_{x_i}.
\]

Then it is easy to see that for any bounded continuous function f on R,

\[
\lim_{N \to \infty} \int_{\mathbb{R}} f(x) \, \mu_N(dx) = \int_{\mathbb{R}} f(x) \, \mu(dx).
\]

This means that µ_N → µ weakly. Since G is open, µ_N ∈ G for sufficiently large N.
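As a concrete and purely illustrative picture of this quantile construction, the following sketch, assuming numpy (the grid, its resolution, and the choice of µ as the semicircle law are illustrative assumptions), builds the points x_i by inverting the distribution function on a grid and checks that µ_N reproduces the first two moments:

import numpy as np

# Quantile points x_i = inf{ x : mu((-inf, x]) >= i/(N+1) }, i = 1, ..., N,
# illustrated for mu = the semicircle law on [-2, 2].
N = 500
grid = np.linspace(-2.0, 2.0, 20001)
density = np.sqrt(np.clip(4.0 - grid**2, 0.0, None)) / (2.0 * np.pi)
cdf = np.cumsum(density) * (grid[1] - grid[0])
cdf /= cdf[-1]                            # remove the small quadrature error so that F(2) = 1
levels = np.arange(1, N + 1) / (N + 1)
x = grid[np.searchsorted(cdf, levels)]    # first grid point where the CDF reaches i/(N+1)

# mu_N = (1/N) sum of point masses at the x_i; its mean and second moment
# should be close to those of the semicircle law, namely 0 and 1.
print(x.mean(), np.mean(x**2))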

Let, as before,

\[
L^{\lambda}_N = \frac{1}{N} \sum_{i=1}^N \delta_{\lambda_i}.
\]

Consider the subset of R^N defined by

\[
(4.8.1) \qquad D(N, \varepsilon, x) = \{ \lambda : \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N \ \text{and} \ |x_i - \lambda_i| \le \varepsilon \ \text{for all } i \}.
\]

Since µ_N ∈ G and G is open, if ε is sufficiently small then L^λ_N ∈ G for every λ ∈ D(N, ε, x). To sum up: for all sufficiently large N and all sufficiently small ε, every L^λ_N with λ satisfying (4.8.1) lies in G. It follows that

\[
Q_N (L_N \in G) \ge \int_{D(N, \varepsilon, x)} \prod_{1 \le i < j \le N} |\lambda_i - \lambda_j|^s \,
\exp \Bigl[ -\frac{Ns}{4} \sum_{i=1}^N |\lambda_i|^2 \Bigr] d\lambda.
\]

The above inequality holds for all sufficiently large N and all sufficiently small ε. We now estimate this integral. First we shift the center of the domain of integration from x = (x_1, ..., x_N) to (0, ..., 0). Letting D(N, ε) = D(N, ε, 0), we can rewrite the integral as

\[
(4.8.2) \qquad \int_{D(N, \varepsilon)} \prod_{1 \le i < j \le N} |x_i - x_j + \lambda_i - \lambda_j|^s \,
\exp \Bigl[ -\frac{Ns}{4} \sum_{i=1}^N |x_i + \lambda_i|^2 \Bigr] d\lambda.
\]

Obviously the next thing to do is to separate the x_i's from the λ_i's and to relate the above integral to the integral

\[
\frac{1}{2} \int_{\mathbb{R}} |x|^2 \mu(dx) - \int_{\mathbb{R}^2} \ln |x - y| \, \mu(dx)\mu(dy).
\]


The N + 1 intervals (x_{i−1}, x_i] are non-overlapping, and at most two of them contain 0. Pairing each x_i with an adjacent interval on which |x| ≥ |x_i| (the interval to its left if x_i ≤ 0, the one to its right if x_i > 0), so that each interval is used at most once, we find that

\[
\frac{1}{N + 1} \sum_{i=1}^N (|x_i| + \varepsilon)^2 \le \int_{\mathbb{R}} (|x| + \varepsilon)^2 \mu(dx).
\]

Next, consider the integral of ln|x − y| over the square [x_1, x_N]². We divide it into the non-overlapping rectangles

\[
I_{ij} = [x_i, x_{i+1}] \times [x_j, x_{j+1}], \qquad 1 \le i, j \le N - 1.
\]

If i < j, then on I_{ij} we have ln|x − y| ≤ ln|x_i − x_{j+1}|; if j < i, then on I_{ij} we have ln|x − y| ≤ ln|x_{i+1} − x_j|; and on the diagonal squares I_{ii} we have ln|x − y| ≤ ln|x_i − x_{i+1}|. Since µ × µ gives each rectangle mass exactly 1/(N + 1)² (µ has no atoms), it follows that

\[
\int_{[x_1, x_N]^2} \ln |x - y| \, \mu(dx)\mu(dy)
\le \frac{2}{(N + 1)^2} \sum_{1 \le i < j - 1 \le N - 1} \ln |x_i - x_j|
+ \frac{1}{(N + 1)^2} \sum_{i=1}^{N - 1} \ln |x_i - x_{i+1}|.
\]
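The reindexing that produces the restricted sum over i < j − 1 is worth displaying once:

\[
\sum_{1 \le i < j \le N - 1} \ln |x_i - x_{j+1}|
= \sum_{\substack{1 \le i < j' \le N \\ j' \ge i + 2}} \ln |x_i - x_{j'}|
= \sum_{1 \le i < j - 1 \le N - 1} \ln |x_i - x_j|,
\]

and the rectangles with j < i contribute the same sum by symmetry, which accounts for the factor 2; the diagonal squares give the sum over adjacent pairs.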

Combining the two parts, we have

\[
\frac{1}{2} \int_{\mathbb{R}} (|x| + \varepsilon)^2 \mu(dx)
- \int_{[x_1, x_N]^2} \ln |x - y| \, \mu(dx)\mu(dy)
\ge \frac{1}{2(N + 1)} \sum_{i=1}^N (|x_i| + \varepsilon)^2
- \frac{2}{(N + 1)^2} \sum_{1 \le i < j - 1 \le N - 1} \ln |x_i - x_j|
- \frac{1}{(N + 1)^2} \sum_{i=1}^{N - 1} \ln |x_i - x_{i+1}|.
\]

This inequality relates the discrete sums to the continuous integrals of |x|² and ln|x − y|.

On the region D(N, ε) ⊂ R^N the sequence {λ_i} is increasing, hence x_i − x_j and λ_i − λ_j always have the same sign. Consequently

\[
|x_i - x_j + \lambda_i - \lambda_j| \ge |x_i - x_j|
\qquad \text{and} \qquad
|x_i - x_j + \lambda_i - \lambda_j| \ge |\lambda_i - \lambda_j|.
\]

For i < j − 1 we use the first bound, and for j = i + 1 we use

\[
|x_i - x_{i+1} + \lambda_i - \lambda_{i+1}| \ge |x_i - x_{i+1}|^{1/2} |\lambda_i - \lambda_{i+1}|^{1/2}.
\]

Using these bounds, together with |x_i + λ_i| ≤ |x_i| + ε, in the integral (4.8.2) we obtain

\[
(4.8.3) \qquad Q_N (L_N \in G) \ge C(N, \varepsilon) \exp \Bigl[ -\frac{(N + 1)^2 s}{2} \, I_{N, \varepsilon}(\mu) \Bigr],
\]


where

\[
I_{N, \varepsilon}(\mu) = \frac{1}{2} \int_{\mathbb{R}} (|x| + \varepsilon)^2 \mu(dx)
- \int_{[x_1, x_N]^2} \ln |x - y| \, \mu(dx)\mu(dy),
\]

and

\[
C(N, \varepsilon) = \int_{D(N, \varepsilon)} \prod_{i=1}^{N - 1} |\lambda_i - \lambda_{i+1}|^{s/2} \, d\lambda_1 \cdots d\lambda_N.
\]

With this estimate it is easy to complete the proof of the large deviation lower bound. We first show that the factor C(N, ε) does not matter. By scaling we see that

\[
C(N, \varepsilon) = \varepsilon^{(N - 1)s/2 + N} \, C(N, 1).
\]

Next, we restrict the integral defining C(N, 1) to the region

\[
\frac{2i - 1}{2N} \le \lambda_i \le \frac{2i}{2N}, \qquad i = 1, \dots, N.
\]

On this region, whose volume is clearly (1/2N)^N, the integrand has the lower bound

\[
\prod_{i=1}^{N - 1} |\lambda_i - \lambda_{i+1}|^{s/2} \ge \Bigl( \frac{1}{2N} \Bigr)^{(N - 1)s/2}.
\]

(For the matching upper bound note that |λ_i − λ_{i+1}| ≤ 2 on D(N, 1) and that D(N, 1) has volume at most 2^N.) It follows that

\[
\lim_{N \to \infty} \frac{\ln C(N, 1)}{N^2} = 0.
\]

On the other hand, by the integrability of |x|² we have

\[
\lim_{\varepsilon \to 0} \int_{\mathbb{R}} (|x| + \varepsilon)^2 \mu(dx) = \int_{\mathbb{R}} |x|^2 \mu(dx).
\]

By the integrability of ln|x − y| with respect to µ × µ and the fact that the (µ × µ)-measure of the complement of the square [x_1, x_N]² is at most

\[
\frac{4}{N^2} + \frac{4}{N} \to 0,
\]

we have

\[
\lim_{N \to \infty} \int_{[x_1, x_N]^2} \ln |x - y| \, \mu(dx)\mu(dy)
= \int_{\mathbb{R}^2} \ln |x - y| \, \mu(dx)\mu(dy).
\]

These two limits show that

\[
\lim_{\varepsilon \to 0} \lim_{N \to \infty} I_{N, \varepsilon}(\mu)
= \frac{1}{2} \int_{\mathbb{R}} |x|^2 \mu(dx) - \int_{\mathbb{R}^2} \ln |x - y| \, \mu(dx)\mu(dy).
\]

Finally, letting N → ∞ and then ε → 0 in (4.8.3), we have

\[
\liminf_{N \to \infty} \frac{1}{N^2} \ln Q_N (L_N \in G)
\ge -\frac{s}{2} \Bigl[ \frac{1}{2} \int_{\mathbb{R}} |x|^2 \mu(dx)
- \int_{\mathbb{R}^2} \ln |x - y| \, \mu(dx)\mu(dy) \Bigr]
= -\frac{s}{2} \int_{\mathbb{R}^2} h(x, y) \, \mu(dx)\mu(dy).
\]

This completes the proof of the lower bound for the large deviations of the law of L_N.


4.9. Logarithmic potential and the semicircle law

Reference for this section: Deift [2]. We discuss the functional

\[
I(\mu) = \frac{1}{2} \int_{\mathbb{R}} |x|^2 \mu(dx)
+ \int_{\mathbb{R}^2} \ln \frac{1}{|x - y|} \, \mu(dx)\mu(dy) - \frac{3}{4},
\qquad \mu \in \mathcal{P}(\mathbb{R}).
\]

Up to the constant, this is the logarithmic potential energy of the charge distribution µ in the presence of the external field x²/2. The main result is that I is nonnegative and attains zero only at the semicircle law.

If we admit the result

\[
\lim_{N \to \infty} \frac{\ln Z_N}{N^2} = -\frac{3s}{8},
\]

which can be verified using the Selberg integrals, together with the large deviation results proved in the previous sections, we can conclude that I(µ) ≥ 0 for all µ ∈ P(R). Furthermore, using the fact that the empirical distributions of the eigenvalues converge to the semicircle law s, we can conclude that I(s) = 0. We will not use these results; instead we give a complete, self-contained discussion of the functional I.

We have

\[
I(\mu) = \int_{\mathbb{R}^2} h(x, y) \, \mu(dx)\mu(dy) - \frac{3}{4},
\qquad
h(x, y) = \frac{x^2 + y^2}{4} + \ln \frac{1}{|x - y|}.
\]

As we have shown before,

\[
h(x, y) \ge \frac{x^2 + y^2}{8} - 3.
\]

From this it is easy to prove (as we have done) that I is lower semicontinuous and attains its minimum. Our task is to show that the minimum value is zero and that it is attained only at the semicircle law.

(to be continued)
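Though we will not rely on it, the following crude numerical sketch (assuming numpy; the sampling scheme and sample sizes are illustrative choices, not part of the text) estimates I(µ) from i.i.d. samples of µ. For the semicircle law the estimate should be near 0, while for instance the uniform distribution on [−1, 1] gives a strictly positive value, roughly 0.22:

import numpy as np

def I_functional(samples):
    # Estimate I(mu) = (1/2) E[x^2] + E[ln 1/|x - y|] - 3/4 from i.i.d. samples of mu.
    # The double integral is estimated by the U-statistic over distinct pairs, so the
    # (integrable) singularity on the diagonal is never hit.
    x = np.asarray(samples, dtype=float)
    m = len(x)
    quad = 0.5 * np.mean(x**2)
    diff = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(diff, 1.0)                      # i = j terms contribute ln 1 = 0
    log_energy = -np.sum(np.log(diff)) / (m * (m - 1))
    return quad + log_energy - 0.75

rng = np.random.default_rng(1)
m = 2000
# semicircle samples on [-2, 2] by rejection against the flat envelope of height 1/pi
u = rng.uniform(-2.0, 2.0, size=10 * m)
accept = rng.uniform(0.0, 1.0 / np.pi, size=10 * m) < np.sqrt(4.0 - u**2) / (2.0 * np.pi)
semicircle_samples = u[accept][:m]

print(I_functional(semicircle_samples))              # close to 0
print(I_functional(rng.uniform(-1.0, 1.0, m)))       # roughly 0.22, strictly positive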


Bibliography

1. Anderson and Zeitouni, Lectures on random matrices, lecture notes, 2004.
2. Percy Deift, Orthogonal Polynomials and Random Matrices: A Riemann-Hilbert Approach, Courant Lecture Notes in Mathematics, vol. 3, AMS, New York, 2000.
3. Amir Dembo and Ofer Zeitouni, Large Deviations Techniques and Applications, Springer, 1998.
4. Alice Guionnet, Large deviations of random matrices and stochastic calculus, lecture notes, Lyon, 2004.
5. L. K. Hua, Harmonic Analysis on Classical Domains, AMS, Providence, RI, 1963.
6. M. L. Mehta, Random Matrices, Academic Press, New York, 1991.
7. G. Polya and G. Szego, Orthogonal Polynomials, American Mathematical Society, Providence, RI, 1952?
8. C. Tracy and H. Widom, Introduction to random matrices, archived notes (1992?).
9. H. Weyl, The Classical Groups, Princeton University Press, Princeton, NJ, 1946.


Index

Christoffel-Darboux identity, 14
circular ensemble, 8
contraction principle, 30
correlation function, 17
Cramer's theorem, 27
empirical distribution, 30
ensemble
  circular, 8
exponentially tight, 35
free probability, 1
Gaussian ensembles, 3
Gaussian orthogonal ensemble (GOE), 3
Gaussian symplectic ensemble (GSE), 3
Gaussian unitary ensemble (GUE), 3
generalized Wigner matrices, 17
generalized Wigner matrix, 5
group
  topological, 6
Haar measure, 1
Hermite polynomial, 13
large deviation principle, 23
  weak, 24
law
  Tracy-Widom, 20
Legendre transform, 25
logarithmic potential, 40
Mehta's trick, 11
nuclear physics, 1
orthogonal polynomials, 1
polynomial
  Hermite, 13
potential theory, 1
quantum mechanics, 1
rate function, 23
  good, 23
reproducing kernel, 11
Riemann conjecture, 19
Schrodinger equation, 14
Selberg integrals, 40
self-dual, 3
semicircle law, 9, 31
statistical mechanics, 1
symplectic group, 3
time reversal, 4
topological group, 6
Tracy-Widom law, 20
universality, 5
weak convergence, 32
weak large deviation principle, 24
Wigner, 3
Wigner matrix, 3