Sub-Gaussian Estimators of the Mean of a Random Matrix with Entries Possessing Only Two Moments

Stas Minsker
University of Southern California

July 21, 2016

ICERM Workshop

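Simple question: how to estimate the mean?

Assume that $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma_0^2)$.

Problem: construct $\mathrm{CI}_{\mathrm{norm}}(\alpha)$ for $\mu$ with coverage probability $\geq 1 - 2\alpha$.

Solution: compute $\hat\mu_n := \frac{1}{n} \sum_{j=1}^n X_j$, take

$$\mathrm{CI}_{\mathrm{norm}}(\alpha) = \left[ \hat\mu_n - \sigma_0 \sqrt{2} \sqrt{\frac{\log(1/\alpha)}{n}},\ \hat\mu_n + \sigma_0 \sqrt{2} \sqrt{\frac{\log(1/\alpha)}{n}} \right].$$

Coverage is guaranteed since

$$\Pr\left( \left| \hat\mu_n - \mu \right| \geq \sigma_0 \sqrt{\frac{2 \log(1/\alpha)}{n}} \right) \leq 2\alpha.$$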
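
The Gaussian interval can be checked by simulation. The following is a minimal sketch, not part of the talk: the function names `gaussian_ci` and `coverage` are hypothetical, `log` is taken to be the natural logarithm, and only the Python standard library is used.

```python
import math
import random

def gaussian_ci(xs, sigma0, alpha):
    """Interval mean -/+ sigma0 * sqrt(2 * log(1/alpha) / n) (natural log assumed)."""
    n = len(xs)
    mean = sum(xs) / n
    half_width = sigma0 * math.sqrt(2.0 * math.log(1.0 / alpha) / n)
    return mean - half_width, mean + half_width

def coverage(mu, sigma0, n, alpha, trials, seed=0):
    """Fraction of simulated N(mu, sigma0^2) samples whose interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma0) for _ in range(n)]
        lo, hi = gaussian_ci(xs, sigma0, alpha)
        hits += lo <= mu <= hi
    return hits / trials
```

For Gaussian data the empirical coverage returned by `coverage` should sit at or above $1 - 2\alpha$, up to simulation noise.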

Example: how to estimate the mean?

P. J. Huber (1964): "...This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one?"

Going back to our question: what if $X_1, \dots, X_n$ are i.i.d. copies of $X \sim \Pi$ such that

$$\mathbb{E}X = \mu, \quad \mathrm{Var}(X) \leq \sigma_0^2?$$

Problem: construct CI for $\mu$ with coverage probability $\geq 1 - \alpha$ such that for any $\alpha$

$$\mathrm{length}(\mathrm{CI}(\alpha)) \leq (\text{Absolute constant}) \cdot \mathrm{length}(\mathrm{CI}_{\mathrm{norm}}(\alpha)).$$

No additional assumptions on $\Pi$ are imposed.

Remark: the guarantee for the sample mean $\hat\mu_n = \frac{1}{n} \sum_{j=1}^n X_j$ is unsatisfactory:

$$\Pr\left( \left| \hat\mu_n - \mu \right| \geq \sigma_0 \sqrt{\frac{1/\alpha}{n}} \right) \leq \alpha.$$

Does a solution exist?
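
To see why the sample-mean guarantee is called unsatisfactory, it helps to compare the two half-widths numerically. The sketch below is an illustration only: the function names are hypothetical, and the first bound is the one that Chebyshev's inequality gives under a variance assumption alone.

```python
import math

def chebyshev_half_width(sigma0, n, alpha):
    """Half-width sigma0 * sqrt((1/alpha) / n) available with two moments only."""
    return sigma0 * math.sqrt(1.0 / (alpha * n))

def gaussian_half_width(sigma0, n, alpha):
    """Half-width sigma0 * sqrt(2 * log(1/alpha) / n) of CI_norm(alpha)."""
    return sigma0 * math.sqrt(2.0 * math.log(1.0 / alpha) / n)

def width_ratio(alpha):
    """Ratio of the two half-widths; it does not depend on sigma0 or n."""
    return chebyshev_half_width(1.0, 1, alpha) / gaussian_half_width(1.0, 1, alpha)
```

The ratio grows like $\sqrt{(1/\alpha) / \log(1/\alpha)}$ as $\alpha \to 0$, so no absolute constant can bound it for all $\alpha$.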

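Example: how to estimate the mean?

Answer (somewhat unexpected?): Yes!

Construction: [A. Nemirovski, D. Yudin '83; N. Alon, Y. Matias, M. Szegedy '96; R. Oliveira, M. Lerasle '11]

Split the sample into $k = \lfloor \log(1/\alpha) \rfloor + 1$ groups $G_1, \dots, G_k$ of size $\simeq n/k$ each:

$$\underbrace{X_1, \dots, X_{|G_1|}}_{G_1} \ \dots\dots\ \underbrace{X_{n-|G_k|+1}, \dots, X_n}_{G_k}$$

$$\hat\mu_j := \frac{1}{|G_j|} \sum_{X_i \in G_j} X_i, \quad j = 1, \dots, k, \qquad \hat\mu^* = \hat\mu^*(\alpha) := \mathrm{median}(\hat\mu_1, \dots, \hat\mu_k).$$

Claim:

$$\Pr\left( |\hat\mu^* - \mu| \geq 7.7\,\sigma_0 \sqrt{\frac{\log(e/\alpha)}{n}} \right) \leq \alpha.$$

Then take

$$\mathrm{CI}(\alpha) = \left[ \hat\mu^* - 7.7\,\sigma_0 \sqrt{\frac{\log(e/\alpha)}{n}},\ \hat\mu^* + 7.7\,\sigma_0 \sqrt{\frac{\log(e/\alpha)}{n}} \right].$$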
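
The construction above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the speaker's code: the names `median_of_means` and `mom_ci` are hypothetical, `log` is read as the natural logarithm, and the groups are taken to be consecutive blocks whose sizes differ by at most one.

```python
import math
import statistics

def median_of_means(xs, alpha):
    """Median of k = floor(log(1/alpha)) + 1 means over consecutive groups."""
    n = len(xs)
    k = math.floor(math.log(1.0 / alpha)) + 1
    if k > n:
        raise ValueError("need at least k observations")
    # Group boundaries chosen so that group sizes differ by at most one.
    bounds = [(j * n) // k for j in range(k + 1)]
    means = [statistics.fmean(xs[bounds[j]:bounds[j + 1]]) for j in range(k)]
    return statistics.median(means)

def mom_ci(xs, sigma0, alpha):
    """Interval mu* -/+ 7.7 * sigma0 * sqrt(log(e/alpha) / n) from the claim."""
    center = median_of_means(xs, alpha)
    half_width = 7.7 * sigma0 * math.sqrt(math.log(math.e / alpha) / len(xs))
    return center - half_width, center + half_width
```

A single extreme observation can corrupt at most one group mean, which is why the median of the group means stays close to the bulk of the data.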

Idea of the proof:

[Figure: the group means $\hat\mu_1, \dots, \hat\mu_8$ plotted together with $\mu$.]

$|\hat\mu^* - \mu| \geq s \implies$ at least half of the events $\{|\hat\mu_j - \mu| \geq s\}$ occur.
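
One way to make this idea quantitative is sketched below. It assumes, for simplicity, that every group has exactly $n/k$ points; the constant $2e$ it produces comes from this particular argument and is not the 7.7 stated in the talk.

```latex
% Step 1: Chebyshev's inequality for one group mean (variance <= sigma_0^2 k / n),
% valid for any p in (0, 1]:
\Pr\Big( |\hat\mu_j - \mu| \ge \sigma_0 \sqrt{\tfrac{k}{np}} \Big) \le p .

% Step 2: the k groups are independent, so with m = \lceil k/2 \rceil,
\Pr\big( \text{at least } m \text{ of the } k \text{ events occur} \big)
  \le \binom{k}{m} p^{m} \le 2^{k} p^{k/2} = (4p)^{k/2} .

% Step 3: take p = 1/(4e^2), i.e. s = 2e\,\sigma_0 \sqrt{k/n}, so (4p)^{k/2} = e^{-k}.
% Since \log(1/\alpha) < k \le \log(e/\alpha),
\Pr\Big( |\hat\mu^* - \mu| \ge 2e\,\sigma_0 \sqrt{\tfrac{\log(e/\alpha)}{n}} \Big)
  \le e^{-k} \le \alpha .
```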

Improve the constant?

O. Catoni's estimator (2012), "Generalized truncation": let $\alpha > 0$,

$$-\log(1 - x + x^2/2) \leq \psi(x) \leq \log(1 + x + x^2/2),$$

and define $\hat\mu$ via

$$\sum_{j=1}^n \psi\big(\theta(X_j - \hat\mu)\big) = 0.$$

Truncation $\tau(x) = (|x| \wedge 1)\,\mathrm{sign}(x)$ satisfies a weaker inequality

$$-\log(1 - x + x^2) \leq \tau(x) \leq \log(1 + x + x^2).$$

[Figure: plot with both axes ranging over $[-1, 1]$.]

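Improve the constant?

$$\sum_{j=1}^n \psi\big(\theta(X_j - \hat\mu)\big) = 0.$$

Intuition: for small $\theta > 0$,

$$\sum_{j=1}^n \psi\big(\theta(X_j - \hat\mu)\big) \simeq \sum_{j=1}^n \theta(X_j - \hat\mu) = 0 \implies \hat\mu \simeq \frac{1}{n} \sum_{j=1}^n X_j.$$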

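Improve the constant?

$$\sum_{j=1}^n \psi\big(\theta(X_j - \hat\mu)\big) = 0.$$

The following holds: set $\theta_* = \sqrt{\frac{2 \log(1/\alpha)}{n}}\,\frac{1}{\sigma_0}$. Then

$$|\hat\mu - \mu| \leq \big(\sqrt{2} + o(1)\big)\,\sigma_0 \sqrt{\frac{\log(1/\alpha)}{n}}$$

with probability $\geq 1 - 2\alpha$.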
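
A minimal sketch of how the defining equation might be solved numerically. Everything named here is an assumption for illustration: `psi` is one admissible choice (the upper bound for $x \geq 0$ and the lower bound for $x < 0$), `catoni_mean` is a hypothetical name, and the root is located by plain bisection, which works because the score is monotone in the candidate value.

```python
import math

def psi(x):
    """One function satisfying the two-sided logarithmic bound (assumed choice)."""
    if x >= 0:
        return math.log(1.0 + x + 0.5 * x * x)
    return -math.log(1.0 - x + 0.5 * x * x)

def catoni_mean(xs, sigma0, alpha, n_iter=200):
    """Root in m of sum_j psi(theta * (x_j - m)) = 0, located by bisection."""
    n = len(xs)
    theta = math.sqrt(2.0 * math.log(1.0 / alpha) / n) / sigma0

    def score(m):
        return sum(psi(theta * (x - m)) for x in xs)

    # score is decreasing in m, >= 0 at min(xs) and <= 0 at max(xs).
    lo, hi = min(xs), max(xs)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On a sample of 99 zeros and one value of 1000, the sample mean is 10 while this estimate stays below 1, since the logarithmic growth of `psi` limits the influence of the outlier.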

Extensions to higher dimensions

A natural question: is it possible to extend the presented techniques to the multivariate mean?

Motivation: PCA

Genes mirror geography within Europe, J. Novembre et al, Nature 2008.

[Figure: Figure 1 of Novembre et al. (Nature, Vol 456, 6 November 2008), "Population structure within Europe". a, A statistical summary of genetic data from 1,387 Europeans based on principal component axis one (PC1) and axis two (PC2); the PC axes are rotated to emphasize the similarity to the geographic map of Europe. b, A magnification of the area around Switzerland from a, showing differentiation within Switzerland by language. c, Genetic similarity versus geographic distance: median genetic correlation between pairs of individuals as a function of geographic distance between their respective populations (km).]

Good explanation for non-experts: https://faculty.washington.edu/tathornt/SISG2015/lectures/assoc2015session05.pdf

Mathematical framework:

$$Y_1, \dots, Y_n \in \mathbb{R}^d, \ \text{i.i.d.}, \quad \mathbb{E}Y_j = 0, \quad \mathbb{E}Y_j Y_j^T = \Sigma.$$

Goal: construct $\hat\Sigma$, an estimator of $\Sigma$ such that $\big\|\hat\Sigma - \Sigma\big\|_{\mathrm{Op}}$ is small.

Sample covariance

$$\tilde\Sigma_n = \frac{1}{n} \sum_{j=1}^n Y_j Y_j^T$$

is very sensitive to outliers.
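
The sensitivity of the sample covariance can be illustrated with a tiny deterministic example. This is a sketch with hypothetical helper names, using NumPy; the observations are stored as the rows of `Y` and are assumed to be centered, as in the framework above.

```python
import numpy as np

def sample_covariance(Y):
    """(1/n) * sum_j Y_j Y_j^T for centered observations stored as rows of Y."""
    Y = np.asarray(Y, dtype=float)
    return Y.T @ Y / Y.shape[0]

def operator_norm_error(estimate, Sigma):
    """Spectral-norm distance between an estimate and the target matrix."""
    return np.linalg.norm(np.asarray(estimate) - np.asarray(Sigma), ord=2)

# Two observations whose sample covariance is exactly the identity ...
Y_clean = np.sqrt(2.0) * np.eye(2)
# ... and the same data with one extra, far-away observation appended.
Y_dirty = np.vstack([Y_clean, [10.0, 0.0]])

err_clean = operator_norm_error(sample_covariance(Y_clean), np.eye(2))
err_dirty = operator_norm_error(sample_covariance(Y_dirty), np.eye(2))
```

Here one added point moves the operator-norm error from 0 to 33, because its squared norm enters the average without any damping.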

Extensions to higher dimensions

Naive approach: apply the "median trick" (or Catoni's estimator) coordinatewise. Makes the bound dimension-dependent.

Better approach: replace the usual median by the geometric median.

$$x_* = \mathrm{med}(x_1, \dots, x_k) := \operatorname*{argmin}_{y \in \mathbb{R}^d} \sum_{j=1}^k \|y - x_j\|.$$

Still some issues:
1. does not work well for small sample sizes;
2. yields bounds in the wrong norm.

Alternatives: Tyler's M-estimator, Maronna's M-estimator; guarantees are limited to special classes of distributions.
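
The geometric median has no closed form, but it can be approximated iteratively. The sketch below uses Weiszfeld's algorithm, a standard method that the talk does not name; the function name, iteration count, and the small `eps` guard against division by zero are all assumptions made for illustration.

```python
import numpy as np

def geometric_median(points, n_iter=200, eps=1e-12):
    """Approximate argmin_y sum_j ||y - x_j|| by Weiszfeld's iteration."""
    X = np.asarray(points, dtype=float)
    y = X.mean(axis=0)                       # starting point
    for _ in range(n_iter):
        dist = np.linalg.norm(X - y, axis=1)
        w = 1.0 / np.maximum(dist, eps)      # guard against division by zero
        y_next = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_next - y) < 1e-12:
            return y_next
        y = y_next
    return y
```

With three points at the origin and one at (100, 100), the coordinatewise mean is (25, 25), while the iteration above converges to the origin.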

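Matrix functions

$f: \mathbb{R} \mapsto \mathbb{R}$, $A = A^T = U \Lambda U^T$, then

$$f(A) = U f(\Lambda) U^T, \qquad f(\Lambda) = f\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{pmatrix} = \begin{pmatrix} f(\lambda_1) & & \\ & \ddots & \\ & & f(\lambda_d) \end{pmatrix}.$$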
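
This definition translates directly into code. A minimal sketch, assuming NumPy and a symmetric input; `matrix_function` is a hypothetical name, and `f` is expected to act elementwise on an array of eigenvalues.

```python
import numpy as np

def matrix_function(f, A):
    """Apply a scalar function f to a symmetric matrix A through its eigenvalues."""
    A = np.asarray(A, dtype=float)
    eigvals, U = np.linalg.eigh(A)           # A = U diag(eigvals) U^T
    return (U * f(eigvals)) @ U.T            # U diag(f(eigvals)) U^T
```

For instance, `matrix_function(np.square, A)` agrees with `A @ A` for symmetric `A`, which is a quick sanity check of the construction.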

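Construction of the estimator

$X \in \mathbb{R}^{d \times d}$ - symmetric random matrix, $X_1, \dots, X_n \in \mathbb{R}^{d \times d}$ - i.i.d. copies of $X$, $\mathbb{E}\|X\|_F^2 < \infty$.

No additional assumptions.

$-\log(1 - x + x^2/2) \leq \psi(x) \leq \log(1 + x + x^2/2)$, $\theta > 0$, define

$$\hat\Sigma_n = \frac{1}{n\theta} \sum_{j=1}^n \psi(\theta X_j).$$

For example, if $X_j = Y_j Y_j^T$, we get

$$\hat\Sigma_n = \frac{1}{n\theta} \sum_{j=1}^n \psi\big(\theta Y_j Y_j^T\big).$$

Note that

$$\psi\big(\theta Y_j Y_j^T\big) = \psi\big(\theta \|Y_j\|_2^2\big)\, \frac{Y_j}{\|Y_j\|_2} \frac{Y_j^T}{\|Y_j\|_2}$$

is easy to compute.

Intuition: for small $\theta$, $\psi(\theta x) \simeq \theta x$, hence

$$\hat\Sigma_n \simeq \text{Sample mean} + o(\theta).$$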
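
For the covariance case, the rank-one identity means no eigendecomposition is needed. The sketch below is an illustration, not the author's implementation: `psi` is one admissible choice of the function (upper bound for $x \geq 0$, lower bound for $x < 0$), the function names are hypothetical, and the slower eigenvalue-based version is included only to cross-check the shortcut.

```python
import numpy as np

def psi(x):
    """One admissible psi (assumed choice), applied elementwise."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x * x)

def robust_covariance(Y, theta):
    """(1/(n*theta)) * sum_j psi(theta*||Y_j||^2) * Y_j Y_j^T / ||Y_j||^2; rows of Y are Y_j."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    sq_norms = np.einsum("ij,ij->i", Y, Y)              # ||Y_j||_2^2
    safe = np.where(sq_norms > 0, sq_norms, 1.0)        # zero rows contribute nothing
    weights = psi(theta * sq_norms) / safe
    return (Y * weights[:, None]).T @ Y / (n * theta)

def robust_covariance_via_eig(Y, theta):
    """Same estimator, applying psi to the eigenvalues of each theta * Y_j Y_j^T."""
    Y = np.asarray(Y, dtype=float)
    n, d = Y.shape
    total = np.zeros((d, d))
    for y in Y:
        lam, U = np.linalg.eigh(theta * np.outer(y, y))
        total += (U * psi(lam)) @ U.T
    return total / (n * theta)
```

For very small `theta` the output is numerically indistinguishable from the sample covariance, matching the intuition stated above.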

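$$\hat\Sigma_n = \frac{1}{n\theta} \sum_{j=1}^n \psi(\theta X_j)$$

Theorem (M., 2016)

$X_1, \dots, X_n$ - i.i.d. Assume that $\sigma^2 \geq \|\mathbb{E}X^2\|$. Let $\theta = \sqrt{\frac{2 \log(d/\alpha)}{n}}$, then

$$\big\|\hat\Sigma_n - \mathbb{E}X\big\| \leq \sigma \sqrt{\frac{2 \log(d/\alpha)}{n}}$$

with probability $\geq 1 - 2\alpha$.

For example, in covariance estimation $\sigma^2 = \big\|\mathbb{E}\,\|Y\|_2^2\, YY^T\big\|$.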

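Compare to:

Theorem (Matrix Bernstein inequality, Tropp '11)

$X, X_1, \dots, X_n \in \mathbb{R}^{d \times d}$ - i.i.d., $\sigma_0^2 = \big\|\mathbb{E}(X - \mathbb{E}X)^2\big\|$, $\|X\| \leq M$. Then for all $0 < \alpha < 1$,

$$\Big\|\frac{1}{n} \sum_{j=1}^n X_j - \mathbb{E}X\Big\| \leq \max\left( 2\sigma_0 \sqrt{\frac{\log(d/\alpha)}{n}},\ \frac{4}{3}\,\frac{M \log(d/\alpha)}{n} \right)$$

with probability $\geq 1 - 2\alpha$.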

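Further improvements: $X_j \mapsto X_j + S$,

$$\hat\Sigma(S) = S + \underbrace{\frac{1}{n\theta} \sum_{j=1}^n \psi\big(\theta(X_j - S)\big)}_{\simeq\, \mathbb{E}X - S}.$$

"Ideal choice" $S = \mathbb{E}X$ is unavailable $\implies$ use the initial estimator $\hat\Sigma_n$ in place of $S$.

Iterate...

$$\hat S_\infty = \hat S_\infty + \underbrace{\frac{1}{n\theta} \sum_{j=1}^n \psi\big(\theta(X_j - \hat S_\infty)\big)}_{=\,0}$$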
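
The re-centering scheme can be written as a plain fixed-point iteration. The sketch below is an assumption-laden illustration: function names are hypothetical, `psi` is one admissible choice, the iteration starts from the unshifted estimator $\hat\Sigma(0)$, and no convergence guarantee is claimed for this loop in general.

```python
import numpy as np

def psi(x):
    """One admissible psi (assumed choice), applied elementwise."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x * x)

def psi_matrix(A):
    """psi applied to a symmetric matrix through its eigenvalues."""
    lam, U = np.linalg.eigh(A)
    return (U * psi(lam)) @ U.T

def shifted_estimator(X, S, theta):
    """Sigma(S) = S + (1/(n*theta)) * sum_j psi(theta * (X_j - S))."""
    correction = sum(psi_matrix(theta * (Xj - S)) for Xj in X)
    return S + correction / (len(X) * theta)

def iterate_estimator(X, theta, n_iter=100, tol=1e-10):
    """Start from Sigma(0) and re-center until the update is negligible."""
    S = shifted_estimator(X, np.zeros_like(X[0], dtype=float), theta)
    for _ in range(n_iter):
        S_next = shifted_estimator(X, S, theta)
        if np.linalg.norm(S_next - S, ord=2) < tol:
            return S_next
        S = S_next
    return S
```

At a fixed point the correction term vanishes, which is exactly the equation displayed above with the underbrace equal to zero.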

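Theorem (M., 2016)

Assume that $\sigma_0^2 \geq \|\mathbb{E}(X - \mathbb{E}X)^2\|$. Let $\theta = \sqrt{\frac{2 \log(d/\alpha)}{n}}\,\frac{1}{\sigma_0}$, and

$$\frac{1}{n\theta} \sum_{j=1}^n \psi\big(\theta(X_j - \hat S_\infty)\big) = 0.$$

Assume that $n$ is large enough ($n \gtrsim d^3$). Then $\hat S_\infty$ exists and

$$\big\|\hat S_\infty - \mathbb{E}X\big\| \leq C \sigma_0 \sqrt{\frac{\log(d/\alpha)}{n}}$$

with probability $\geq 1 - \alpha$.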

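Numerical results

$Y_1, \dots, Y_n \in \mathbb{R}^{100}$,

$$\Sigma = \underbrace{\begin{pmatrix} 10 & & & & \\ & 5 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}}_{100}$$

$Y_{i,j} \sim$ symmetric Pareto-type distribution with 4 moments.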

Numerical results

Histograms over 500 replications: n = 100.

[Figure: histograms of the sample covariance error $\|S_n - \Sigma\| / \|\Sigma\|$ (sample covariance estimator) and of the robust estimator error $\|\hat\Sigma_n - \Sigma\| / \|\Sigma\|$ (robust covariance estimator). Horizontal axis: Error; vertical axis: Frequency.]

[Figure: histograms of $\|u_1(\hat\Sigma_n) u_1(\hat\Sigma_n)^T - u_1(\Sigma) u_1(\Sigma)^T\|$ (robust covariance estimator) and of $\|u_1(S_n) u_1(S_n)^T - u_1(\Sigma) u_1(\Sigma)^T\|$ (sample covariance estimator). Horizontal axis: Error; vertical axis: Frequency.]

Histograms over 500 replications: n = 1000.

[Figure: histograms of the robust estimator error $\|\hat\Sigma_n - \Sigma\| / \|\Sigma\|$ (robust covariance estimator) and of the sample covariance error $\|S_n - \Sigma\| / \|\Sigma\|$ (sample covariance estimator). Horizontal axis: Error; vertical axis: Frequency.]

[Figure: histograms of $\|u_1(\hat\Sigma_n) u_1(\hat\Sigma_n)^T - u_1(\Sigma) u_1(\Sigma)^T\|$ (robust covariance estimator) and of $\|u_1(S_n) u_1(S_n)^T - u_1(\Sigma) u_1(\Sigma)^T\|$ (sample covariance estimator). Horizontal axis: Error; vertical axis: Frequency.]

Matrix Completion

Observe some entries of the ratings matrix

$A_0$ =
            movie 1   movie 2   ...   movie n
  user 1       *         *      ...      *
   ...        ...       ...     ...     ...
  user k       *         *      ...      *

Question: can we predict the unobserved entries?

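Matrix Completion

$$\mathcal{X} = \big\{ e_j(d)\, e_k^T(d),\ 1 \leq j \leq d,\ 1 \leq k \leq d \big\}.$$

$X_1, \dots, X_n$ - independent sample from $\Pi := \mathrm{Unif}(\mathcal{X})$, and observations $Y_j$, $j = 1, \dots, n$ have the form

$$Y_j = \mathrm{tr}(X_j^T A_0) + \xi_j, \quad \text{("noisy matrix entry")}$$

where $\xi_j$, $j = 1, \dots, n$ is additive noise.

$\mathbb{E}(YX) = \frac{1}{d^2} A_0$, hence natural estimator of $A_0$ is

$$\hat A = \frac{d^2}{n} \sum_{j=1}^n Y_j X_j.$$

Incorporate low rank assumption:

$$\hat A_\tau = \operatorname*{argmin}_{A \in \mathbb{R}^{d \times d}} \left[ \frac{\|A - \hat A\|_F^2}{d^2} + \tau \|A\|_1 \right]$$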
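
Both steps admit a short numerical sketch. It rests on an assumption about notation: $\|A\|_1$ is read here as the nuclear norm (sum of singular values), in which case the penalized problem is solved in closed form by soft-thresholding the singular values of $\hat A$ at level $\tau d^2 / 2$. Function names and the index-based encoding of the sampled positions are hypothetical.

```python
import numpy as np

def naive_estimator(rows, cols, y, d):
    """A_hat = (d^2 / n) * sum_j Y_j X_j with X_j = e_{rows[j]} e_{cols[j]}^T."""
    A_hat = np.zeros((d, d))
    np.add.at(A_hat, (rows, cols), y)        # accumulates repeated (row, col) pairs
    return A_hat * d * d / len(y)

def soft_threshold_singular_values(A_hat, tau):
    """Minimizer of ||A - A_hat||_F^2 / d^2 + tau * ||A||_* (nuclear norm assumed)."""
    d = A_hat.shape[0]
    U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
    s_shrunk = np.maximum(s - 0.5 * tau * d * d, 0.0)
    return (U * s_shrunk) @ Vt
```

Singular values below the threshold are set to zero, which is how the penalty encourages a low-rank answer.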

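Matrix completion

What if noise $\xi_j$ is heavy-tailed (only $\mathrm{Var}(\xi_j) < \infty$)?

Replace $\hat A$ with a "robust" estimator

$$\hat R = d^2 \sum_{j=1}^n \psi\big(\theta Y_j H(X_j)\big)$$

and

$$\hat R_\tau = \operatorname*{argmin}_{A \in \mathbb{R}^{d \times d}} \left[ \frac{\|A - \hat R\|_F^2}{d^2} + \tau \|A\|_1 \right].$$

Here, $H(X) = \begin{pmatrix} 0 & X \\ X^T & 0 \end{pmatrix}$ is the so-called self-adjoint dilation.

Theorem (M., 2016)

Take

$$\tau = \mathrm{Const} \cdot \sqrt{\frac{t + \log 2d}{nd}},$$

then

$$\frac{1}{d^2} \big\|\hat R_\tau - H(A_0)\big\|_F^2 \leq \left( \frac{1 + \sqrt{2}}{2} \right)^2 \frac{d \cdot 2\,\mathrm{rank}(A_0)}{n} \sqrt{t + \log 2d}$$

with probability $\geq 1 - e^{-t}$.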
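
The self-adjoint dilation is what allows $\psi$, defined through eigenvalues of symmetric matrices, to be applied to the non-symmetric sampling matrices $X_j$. A minimal sketch with a hypothetical function name, assuming NumPy:

```python
import numpy as np

def dilation(X):
    """Self-adjoint dilation H(X) = [[0, X], [X^T, 0]]."""
    X = np.asarray(X, dtype=float)
    r, c = X.shape
    return np.block([[np.zeros((r, r)), X], [X.T, np.zeros((c, c))]])
```

The result is always symmetric, and its operator norm equals that of `X`, so nothing is lost by working with `dilation(X)` in place of `X`.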

Thank you for your attention!