Bayesian and other approaches for extremes in one, many or infinitely many dimensions

Anne Sabourin

CNRS-LTCI, Télécom ParisTech

December 17th, AppliBUGS, ENGREF, Paris

Outline

• Univariate extremes
    Maxima, excesses above threshold, point process
    Inference

• Multivariate extremes
    What are multivariate extremes
    Dirichlet mixture model

• Spatial extremes

• Towards large (finite) dimension
    Issues
    Anomaly detection
    Sparse support and estimation
    Estimation
    Results

Extremes and risks

Figure: Storm Xynthia, La Faute-sur-Mer, 1 March 2010.

Extremes and risks

Quantity of interest: X (water level, temperature, insurance claims, ...) → time series $X_t$, $t \ge 0$.

[Figure: sea level against year (1930 to 1980), with the threshold of interest h marked.]

• Given a high threshold h, find $p = P(X \ge h)$.

• Given p (e.g. $p = 10^{-4}$), find h such that $P(X > h) \le p$.

• Given a long duration T (e.g. $10^4$), find $P(\max_{t \le T} X_t \le h)$.

Beyond the range of data

For $h > \max(X_{obs})$, or $T \gg T_{obs}$, or $p \ll 1/N_{obs}$, the sample is too small:

[Figure: sea level against year, extended to 2200, with the threshold of interest above all observations.]

Empirical estimator:

\[ \hat P(X > h) = \frac{1}{N_{obs}} \sum_{i=1}^{N_{obs}} \mathbf{1}_{\{X_i > h\}} = 0 \ !! \]

Need an extrapolation model.
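As a small illustration of why the empirical estimator fails beyond the range of the data, the sketch below evaluates it for one threshold inside the sample range and one above the sample maximum. The data values and the function name `empirical_exceedance` are hypothetical, chosen only for this example.

```python
def empirical_exceedance(sample, h):
    """Fraction of observations strictly above the threshold h."""
    return sum(1 for x in sample if x > h) / len(sample)

# Hypothetical sea levels (metres), for illustration only.
levels = [3.6, 3.9, 4.1, 4.4, 4.0, 3.8, 4.6, 4.2]

p_inside = empirical_exceedance(levels, 4.0)  # threshold inside the data range
p_beyond = empirical_exceedance(levels, 5.0)  # threshold above max(levels)
```

Whatever the true tail looks like, the second estimate is exactly zero, which is the motivation for an extrapolation model.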

Three complementary approaches to understand extremes

1. Block maxima

2. Excesses above a high threshold

3. Point process above a high threshold

The three approaches are equivalent in theory


Extreme value analysis

Theory: under minimal assumptions, the distributions of maxima/excesses converge to a certain class.

Modelling: use those limits to model maxima/excesses above large thresholds.

X: random object (variable / vector / process), $X_i \overset{i.i.d.}{\sim} X$.

\[ \bigvee_{i=1}^{n} X_i \overset{d}{\approx} \text{Max-stable} \quad (n \text{ large}) \]

\[ \big[\, X \;\big|\; \|X\| \ge r \,\big] \overset{d}{\approx} \text{Generalized Pareto} \quad (r \text{ large}) \]

\[ \sum_{i=1}^{n} \delta_{\left(\frac{i}{n},\, \frac{X_i}{n}\right)} \overset{d}{\approx} \text{Poisson point process} \]

Dependence issues (see part 2)

• Stationary time series that are not independent in time → time declustering (separate clusters and keep the largest observation in each one)

• Non-stationary time series → difficult to identify

• Spatial dependence / dependence between features (temperature, precipitation, wind, ...):

    • Max-stable models → allow spatial extrapolation, with parametric assumptions on the dependence structure. Long-range independence is difficult to handle.

    • Multivariate extremes models (not necessarily spatial) → learn the dependence structure of extremes

Block Maxima

• Maximum of a "block" of size n:

\[ M_n = \max_{t=1,\dots,n} X_t \overset{\text{notation}}{=} \bigvee_{t=1}^{n} X_t . \]

e.g. monthly maximum of the concentration of an air pollutant.

• Dividing the dataset into m blocks → m maxima $(M_n[1], \dots, M_n[m])$, with $M_n[i] = \bigvee_{t \in \text{block } i} X_t$.

[Figure: one year of daily data X (days 1 to 365) split into 12 blocks, with block maxima M[1], M[2], M[3], ..., M[12]. Legend: data, block maxima, block limits.]

• n × m data points (m blocks of size n) → only m maxima!
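The block-maxima construction can be sketched in a few lines. This is a minimal illustration on a made-up series; the helper name `block_maxima` is an assumption of this example, and incomplete trailing blocks are simply dropped.

```python
def block_maxima(series, n):
    """Split `series` into consecutive blocks of size n and return each block's maximum.

    Observations left over after the last complete block are ignored.
    """
    m = len(series) // n  # number of complete blocks
    return [max(series[i * n:(i + 1) * n]) for i in range(m)]

# Hypothetical series of 12 observations, cut into m = 3 blocks of size n = 4.
series = [2, 7, 1, 4, 9, 3, 5, 5, 8, 0, 6, 2]
maxima = block_maxima(series, 4)
```

The 12 observations are reduced to 3 maxima, which illustrates how much data the block-maxima approach discards.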

Peaks-Over-Threshold

[Figure: series X with a threshold u; the excesses $Y_u[1], Y_u[2], Y_u[3], \dots, Y_u[12]$ above u.]

• Excess: $Y = X - u$, for $X > u$.

• Conditional survival function (with $\bar F = 1 - F$):

\[ \bar F_u(y) = P(X - u > y \mid X > u) = \frac{\bar F(u + y)}{\bar F(u)} \]
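A minimal sketch of the peaks-over-threshold quantities, using empirical frequencies in place of the unknown survival function. The series and the helper names `excesses` and `conditional_survival` are hypothetical.

```python
def excesses(series, u):
    """Excesses Y = X - u for the observations with X > u."""
    return [x - u for x in series if x > u]

def conditional_survival(series, u, y):
    """Empirical version of P(X - u > y | X > u) = Fbar(u + y) / Fbar(u)."""
    above_u = sum(1 for x in series if x > u)
    if above_u == 0:
        raise ValueError("no observation above the threshold u")
    above_u_plus_y = sum(1 for x in series if x > u + y)
    return above_u_plus_y / above_u

# Hypothetical data, threshold u = 3.
series = [1.0, 3.5, 2.0, 4.0, 6.0, 2.5, 5.0]
ys = excesses(series, 3.0)
surv = conditional_survival(series, 3.0, 1.5)
```

Four observations exceed u = 3 and two of them exceed u + y = 4.5, so the empirical conditional survival probability is 2/4.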

Point process (counting process) above threshold

\[ N_u\big([t_1, t_2] \times [u, \infty)\big) = \sum_{t=t_1}^{t_2} \mathbf{1}_{\{X_t > u\}} \]

[Figure: top panel, series X over t = 1, ..., 10 with threshold u and the indicators $\mathbf{1}(X_t > u)$; bottom panel, the cumulative count N(t).]

$N_u$ counts the points above u.

Bi-variate counting process

\[ N\big([t_1, t_2] \times [u_1, u_2]\big) = \sum_{t=1}^{n} \mathbf{1}_{\{(t, X_t) \,\in\, [t_1, t_2] \times [u_1, u_2]\}} \]

[Figure: series X over t = 1, ..., 10 with the rectangle $[t_1, t_2] \times [u_1, u_2]$.]

Number of points in the rectangle $[t_1, t_2] \times [u_1, u_2]$.

• N: random measure, integer-valued, finite on compacts. $N = N(A)$, $A \subset \mathbb{R}^2$.
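The counting measure above is straightforward to evaluate on data. The sketch below assumes observation times t = 1, ..., n and closed rectangles; the series and the function name `count_in_rectangle` are made up for illustration.

```python
def count_in_rectangle(series, t1, t2, u1, u2):
    """N([t1, t2] x [u1, u2]): number of points (t, X_t), t = 1..n, in the rectangle."""
    return sum(
        1
        for t, x in enumerate(series, start=1)
        if t1 <= t <= t2 and u1 <= x <= u2
    )

# Hypothetical series observed at times 1..10.
series = [0.2, 1.4, 0.9, 2.3, 1.1, 0.4, 1.8, 2.9, 0.7, 1.2]

n_rect = count_in_rectangle(series, 3, 8, 1.0, 2.5)             # bounded rectangle
n_above = count_in_rectangle(series, 1, 10, 1.0, float("inf"))  # all points at or above u = 1
```

Taking `u2` infinite recovers the count of points above a single threshold, as in the univariate counting process.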

Extreme value theorem

Theorem (Fisher and Tippett, 1928; Gnedenko, 1943)

$(X_t)_{t \ge 0}$ i.i.d. random variables, $M_n = \max_{t \le n} X_t$. If there exist sequences $(a_n)_n > 0$, $(b_n)_n \in \mathbb{R}$, and a non-degenerate r.v. Y, s.t.

\[ \frac{M_n - b_n}{a_n} \xrightarrow{d} Y, \]

then Y follows a Generalized Extreme Value (GEV) distribution, i.e.

\[ \forall x \in \mathbb{R}, \quad P(Y \le x) := G_{\mu,\sigma,\xi}(x) = \exp\left\{ -\left[ \left(1 + \xi \frac{x - \mu}{\sigma}\right)_+ \right]^{-1/\xi} \right\} \]

with $\xi \in \mathbb{R}$, $y_+ = \max(0, y)$, and $G_{\mu,\sigma,0}(x) = \exp\left(-e^{-\frac{x - \mu}{\sigma}}\right)$.
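For readers who want to evaluate the GEV distribution function numerically, here is a minimal standard-library sketch. The function name `gev_cdf` and the tolerance used to switch to the Gumbel case ξ = 0 are choices of this example, not part of the theorem.

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function G_{mu,sigma,xi}(x), with the Gumbel limit at xi = 0."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))  # Gumbel case
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) or above the upper one (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

g_gumbel = gev_cdf(0.0)                # exp(-1)
g_frechet_type = gev_cdf(2.0, xi=0.5)  # heavy-tailed case (xi > 0)
g_weibull_type = gev_cdf(3.0, xi=-0.5) # bounded upper tail: x is beyond the endpoint
```

The sign of ξ determines the tail: ξ > 0 gives a heavy upper tail, ξ < 0 a finite upper endpoint at μ − σ/ξ.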

Maxima ⇐⇒ excesses ⇐⇒ point processes

$[x_*, x^*] = \mathrm{supp}(G)$. Let $H(x) = -\log G(x)$, and $\bar F = 1 - F$.

Theorem

The following statements are equivalent:

• (Maxima) $F^n(a_n x + b_n) \xrightarrow[n \to \infty]{} G(x)$ $\quad (x_* < x < x^*)$

• (Conditional law of excesses) $\exists\, \sigma(u) > 0$, s.t.

\[ \frac{\bar F(u + \sigma(u) x)}{\bar F(u)} \xrightarrow[u \to \infty]{} H(x) \quad (x_* < x < x^*) \]

• (Point process)

\[ N_n(\,\cdot\,) = \sum_{i=1}^{n} \delta_{\left(\frac{i}{n},\, \frac{X_i - b_n}{a_n}\right)}(\,\cdot\,) \xrightarrow[n \to \infty]{d} N \]

where N is a Poisson point process on $(0, 1) \times (x_*, x^*)$, with intensity measure

\[ \lambda\big((t_1, t_2) \times (x, \infty)\big) = (t_2 - t_1)\, H(x). \]

Inference methods, existing R packages

• Maximum likelihood, probability weighted moments

• R packages: ismev, extRemes, evd, fExtremes, EVIM, Xtremes, HYFRAN, EXTREMES, ...

http://cran.r-project.org/

• Gilleland, Ribatet, Stephenson, 2013: A software review for extreme value analysis

• Introductory book: Coles, 2001, An Introduction to Statistical Modeling of Extreme Values.

Assumptions behind extreme value models

• For block maxima: for n large enough, $M_n \sim G_{\mu,\sigma,\xi}$.

• For peaks-over-threshold: for u large enough, $[X - u \mid X > u] \sim \mathrm{GPD}(u, \sigma, \xi)$.

• Poisson process: for u, n large enough, $N = \sum_{i=1}^{n} \delta_{\left(\frac{i}{n},\, X_i\right)} \sim \mathrm{PP}(\text{Lebesgue} \otimes H_{u,\sigma,\xi})$.

• Goal: estimate $\mu, \sigma, \xi$.

• Bayesian inference: put a prior on $(\mu, \sigma, \xi)$. This makes it possible to take expert knowledge / historical information into account.

    • Parent, Bernier, 2003, Bayesian POT modeling for historical data
    • Renard, 2011, A Bayesian hierarchical approach to regional frequency analysis
    • ...

example : POT model for univariate data

• GPD model above threshold: $\bar F_u(y \mid \xi, \sigma) := P(X \ge u + y \mid X \ge u) \simeq \bar H_{\xi,\sigma}(y)$

• Data: excesses $(y_1, \dots, y_{N_u})$ above u ⇒ $(\xi, \sigma)$?

• u moderate: enough data above, so $\hat{\bar F}(u) = N_u / n$.

• $\bar F(u + y) \simeq \hat{\bar F}(u)\, \bar F_u(y)$

Likelihood (product of the densities of the excesses):

\[ L(\mathbf{y}, \xi, \sigma) \propto \prod_{i=1}^{N_u} \left( -\frac{d}{dy} \bar F_u(y_i \mid \xi, \sigma) \right) \]

MLE estimators: $(\hat\xi, \hat\sigma) \in \operatorname{argmax}_{\sigma, \xi} L(\mathbf{y}, \xi, \sigma)$

Or Bayesian estimation → posterior sample.
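To make the POT likelihood concrete, the following sketch evaluates the GPD log-likelihood of a set of excesses and maximises it by brute force over a parameter grid. It assumes the usual GPD survival function $(1 + \xi y/\sigma)^{-1/\xi}$ (exponential when ξ = 0); the excesses, the grids and the function names are hypothetical, and a real analysis would use a numerical optimiser or one of the R packages listed earlier.

```python
import math

def gpd_loglik(ys, xi, sigma):
    """Log-likelihood of excesses ys under a GPD with shape xi and scale sigma."""
    if sigma <= 0.0:
        return -math.inf
    total = -len(ys) * math.log(sigma)
    for y in ys:
        if abs(xi) < 1e-12:
            total -= y / sigma  # exponential limit
        else:
            t = 1.0 + xi * y / sigma
            if t <= 0.0:
                return -math.inf  # y outside the support
            total -= (1.0 / xi + 1.0) * math.log(t)
    return total

def gpd_mle_grid(ys, xis, sigmas):
    """Crude maximum-likelihood estimate by exhaustive search over a grid."""
    candidates = ((xi, sigma) for xi in xis for sigma in sigmas)
    return max(candidates, key=lambda p: gpd_loglik(ys, p[0], p[1]))

# Hypothetical excesses above a threshold u.
ys = [0.5, 1.0, 3.0, 2.0, 0.2, 1.3]
xis = [i / 20 for i in range(-10, 21)]    # shape from -0.5 to 1.0
sigmas = [0.1 * i for i in range(1, 51)]  # scale from 0.1 to 5.0
xi_hat, sigma_hat = gpd_mle_grid(ys, xis, sigmas)
```

In a Bayesian treatment the same log-likelihood would be added to a log-prior on (ξ, σ) and passed to an MCMC sampler to obtain a posterior sample.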


Why multivariate extremes ?

• Spatial or multivariate data: what is the dependence between features / locations at extreme levels?

• Conditional probability of an excess: $X = (X_1, X_2)$;

\[ P(X_2 > y \mid X_1 > x)\,? \quad (x \text{ large}) \]

• Probability of a joint excess:

\[ P(X_1 > x, X_2 > x)\,? \quad (x \text{ large}) \]

Multivariate extremes

• Random vectors $Y = (Y_1, \dots, Y_d)$; $Y_j \ge 0$

• Margins: $Y_j \sim F_j$, $1 \le j \le d$ (continuous).

• Preliminary step: standardization $X_j = \frac{1}{1 - F_j(Y_j)}$, so that $P(X_j > v) = \frac{1}{v}$.

• Goal: $P(X \in A)$, A 'far from 0'?

[Figure: a point X in the plane $(X_1, X_2)$ with thresholds $u_1$, $u_2$; a set A inside the "extremal region" $A_0$.]

Fundamental assumption and consequences (de Haan, Resnick, 70's, 80's)

Intuitively: $P(X \in tA) \simeq \frac{1}{t} P(X \in A)$

Multivariate regular variation

\[ 0 \notin A : \quad t\, P\left(\frac{X}{t} \in A\right) \xrightarrow[t \to \infty]{} \mu(A), \qquad \mu : \text{exponent measure} \]

Necessarily: $\mu(tA) = t^{-1} \mu(A)$ (radial homogeneity) → angular measure on the sphere: $\Phi(B) = \mu\{tB,\ t \ge 1\}$

General model for extremes

\[ P\left(\|X\| \ge r \,;\ \frac{X}{\|X\|} \in B\right) \simeq r^{-1}\, \Phi(B) \]

$\Phi$ is finite: $H := \frac{1}{\Phi(S_d)} \Phi$ is a probability distribution, the "angular distribution".

Polar decomposition and angular measure

• Polar coordinates: $R = \sum_{j=1}^{d} X_j$ (L1 norm); $W = \frac{X}{R}$.

• $W \in$ simplex $S_d = \{ w : w_j \ge 0,\ \sum_j w_j = 1 \}$.

Model above large radial threshold $r_0$ (de Haan, Resnick, 77)

\[ P(R > r,\ W \in B \mid R \ge r_0) \simeq \frac{r_0}{r}\, H(B) \]

Angular measure H (+ margins) rules the joint distribution.

[Figure: a point X in the plane (1st component, 2nd component), with its radius R and its angle W on the simplex $S_d$; an angular region B and the corresponding region of the plane.]
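The polar decomposition is easy to compute on standardized data. The sketch below keeps only the points whose L1 radius exceeds a threshold and returns their radius and angle; the points, the threshold and the function name `polar_decomposition` are hypothetical.

```python
def polar_decomposition(points, r0):
    """Return (R, W) for the points with L1 radius R = sum_j X_j >= r0, where W = X / R."""
    extremes = []
    for x in points:
        r = sum(x)  # L1 norm of a nonnegative vector
        if r >= r0:
            extremes.append((r, tuple(xj / r for xj in x)))
    return extremes

# Hypothetical standardized observations in dimension d = 2.
points = [(1.0, 1.0), (8.0, 2.0), (0.5, 12.0), (3.0, 3.0)]
extremes = polar_decomposition(points, r0=5.0)
```

The angles W of the retained points lie on the simplex and form the sample from which the angular distribution H is estimated.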

Angular distribution

• H (+ margins) rules the joint distribution

[Figure: two panels in the plane $(X_1, X_2)$, each showing points x and their angles w, for two different angular distributions.]

• Non-parametric family: only one moment constraint on H,

Center of mass = center of the simplex

• Statistician's goal: estimate H (if possible, together with the margins)

Estimating the angular measure (assume margins known)

• Non-parametric estimation: empirical likelihood; Einmahl et al., 2001; Einmahl, Segers, 2009; Guillotte et al., 2011.

Issues: asymptotic variance, Bayesian inference with d > 2, censored data

• Restriction to a parametric family: Gumbel, logistic, pairwise Beta, ... (Coles & Tawn, 91; Cooley et al., 2010; Ballani & Schlather, 2011). Model uncertainty?

• Compromise: mixture of countably many parametric models → infinite-dimensional model

Dirichlet mixture model (Boldi, Davison, 2007; S., Naveau, 2013)

Dirichlet distribution (“multivariate Beta”)

\[ \forall w \in S_d, \quad \mathrm{diri}(w \mid \mu, \nu) = \frac{\Gamma(\nu)}{\prod_{i=1}^{d} \Gamma(\nu \mu_i)} \prod_{i=1}^{d} w_i^{\nu \mu_i - 1} . \]

• $\mu \in S_d$: location parameter (point on the simplex): 'center';

• $\nu > 0$: concentration parameter.

[Figure: Dirichlet density on the simplex with vertices $w_1$, $w_2$, $w_3$; example: $\mu = (0.15, 0.35, 0.5)$, $\nu = 9$.]
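A direct transcription of the Dirichlet density in the (μ, ν) parametrization, computed on the log scale for numerical stability. The function name `dirichlet_density` is an assumption of this sketch, and it requires all coordinates of w to be strictly positive.

```python
import math

def dirichlet_density(w, mu, nu):
    """Dirichlet density at w (interior of the simplex), with center mu and concentration nu."""
    log_dens = math.lgamma(nu)
    for w_i, mu_i in zip(w, mu):
        log_dens += (nu * mu_i - 1.0) * math.log(w_i) - math.lgamma(nu * mu_i)
    return math.exp(log_dens)

# Density at the center, for the example parameters mu = (0.15, 0.35, 0.5), nu = 9.
value_at_center = dirichlet_density((0.15, 0.35, 0.5), (0.15, 0.35, 0.5), 9.0)
```

With d = 2, μ = (1/2, 1/2) and ν = 2 the density is uniform on the simplex, which gives a quick sanity check of the normalising constant.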

Dirichlet mixture model (Boldi, Davison, 2007)

• $\boldsymbol\mu = \mu_{\cdot,1:k}$, $\boldsymbol\nu = \nu_{1:k}$, $\mathbf{p} = p_{1:k}$,

\[ h(w \mid \boldsymbol\mu, \boldsymbol\nu, \mathbf{p}) = \sum_{m=1}^{k} p_m\, \mathrm{diri}(w \mid \mu_{\cdot,m}, \nu_m) \]

• Moment constraint → on $(\boldsymbol\mu, \mathbf{p})$:

\[ \sum_{m=1}^{k} p_m\, \mu_{\cdot,m} = \left(\frac{1}{d}, \dots, \frac{1}{d}\right) . \]

Weakly dense family ($k \in \mathbb{N}$) in the space of admissible angular measures.
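The mixture density and its moment constraint can be checked numerically. In this sketch the two-component parameters are made up so that the constraint holds by symmetry; the function names are hypothetical.

```python
import math

def dirichlet_density(w, mu, nu):
    """Dirichlet density at w with center mu and concentration nu."""
    log_dens = math.lgamma(nu)
    for w_i, mu_i in zip(w, mu):
        log_dens += (nu * mu_i - 1.0) * math.log(w_i) - math.lgamma(nu * mu_i)
    return math.exp(log_dens)

def mixture_density(w, mus, nus, ps):
    """h(w | mu, nu, p) = sum_m p_m diri(w | mu_m, nu_m)."""
    return sum(p * dirichlet_density(w, mu, nu) for mu, nu, p in zip(mus, nus, ps))

def satisfies_moment_constraint(mus, ps, tol=1e-9):
    """Check that sum_m p_m mu_m equals (1/d, ..., 1/d) up to tol."""
    d = len(mus[0])
    return all(
        abs(sum(p * mu[j] for mu, p in zip(mus, ps)) - 1.0 / d) < tol
        for j in range(d)
    )

# Hypothetical mixture with k = 2 components in dimension d = 2.
mus = [(0.2, 0.8), (0.8, 0.2)]
nus = [5.0, 5.0]
ps = [0.5, 0.5]
is_admissible = satisfies_moment_constraint(mus, ps)
h_mid = mixture_density((0.5, 0.5), mus, nus, ps)
```

Changing the weights to, say, (0.7, 0.3) without moving the centers breaks the constraint, which is why the constraint couples μ and p.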

Bayesian inference

• Moment constraints (Boldi, Davison, 2007) → difficult to handle in a Bayesian setting in dimension d > 2

• Re-parametrization (S., Naveau, 13): work with an unconstrained parameter in a product space

• Weak posterior consistency

• MCMC with reversible jumps, manageable in moderate dimension ($d \simeq 5$).

• Inference with censored data (S., 2015, JMVA)

Examples of results: mixing properties of the MCMC algorithm

Simulated data, dimension 5 (showing the 2D predictive angular measure)

[Figure: two panels showing the predictive angular density of X2/(X2+X5) on [0, 1]. Left: original algorithm (Boldi, Davison, 07). Right: reparametrized version (S., Naveau, 2014). Dots: true density; thick line: posterior predictive; grey area: posterior quantiles.]

Predictive angular density

Simulated data, dimension 3 (showing true / predictive angular density level sets)

[Figure: level sets of the true and predictive angular densities on the simplex with vertices $w_1$, $w_2$, $w_3$.]

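Connection with point process / max-stable models

Similar to the 1-D case:

\[ t\, P\left(\frac{X}{t} \in \cdot\,\right) \xrightarrow[t \to \infty]{} \mu(\,\cdot\,) \qquad \text{(RV)} \]

\[ \iff \quad \left[ \left( \frac{\|X\|}{r},\ \frac{X}{\|X\|} \right) \,\Big|\, \|X\| > r \right] \xrightarrow[r \to \infty]{d} d\, \frac{\mathrm{d}r}{r^2}\, \mathrm{d}H \qquad \text{(POT CV)} \]

\[ \iff \quad N_n(\,\cdot\,) := \sum_{i=1}^{n} \delta_{\left(\frac{i}{n},\, \frac{X_i}{n}\right)} \xrightarrow[n \to \infty]{d} \mathrm{PP}\left( d\, \frac{\mathrm{d}r}{r^2}\, \mathrm{d}H(w) \right) \qquad \text{(PP CV)} \]

\[ \iff \quad \frac{\bigvee_{i=1}^{n} X_i}{n} \xrightarrow[n \to \infty]{d} G \qquad \text{(max CV)} \]

where

\[ G(x_1, \dots, x_d) = \exp\left\{ -d \int_{S_d} \bigvee_{j=1}^{d} \frac{w_j}{x_j}\, \mathrm{d}H(w) \right\} \]

is a max-stable distribution.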


Pointwise maxima of continuous processes

[Figure: sample paths X on [0, 10] and their pointwise maximum Z.]

Possible limits (in distribution) of pointwise maxima

• A continuous stochastic process Z on a domain D is max-stable if there exist continuous normalizing functions $\alpha_n > 0$ and $\beta_n$ s.t., for i.i.d. copies $Z_i$ of Z,

\[ Z \overset{d}{=} \bigvee_{i=1}^{n} \frac{Z_i - \beta_n}{\alpha_n} . \]

• N.B.: equality in distribution is determined by the finite-dimensional distributions $[Z_{s_1}, \dots, Z_{s_n}]$.

• If $(X_i)$ are i.i.d. continuous processes and $\exists\, a_n > 0,\ b_n$, s.t.

\[ \bigvee_{i=1}^{n} \frac{X_i - b_n}{a_n} \xrightarrow[n \to \infty]{d} Z \]

then Z is a max-stable process (de Haan, 84, A spectral representation for max-stable processes).

• Idea for statistical modeling: same as before, use a max-stable family (or its equivalent for peaks-over-thresholds) as a model for maxima/excesses.

Spectral representation

• Standardization to unit Fréchet margins (probability integral transform) → $P(Z(s) \le x) = e^{-1/x}$, $s \in D$. Then Z is a simple max-stable process.

Theorem (de Haan, 84; Penrose, 92)

Any non-degenerate, continuous, simple max-stable process $(Z(s))_{s \in D}$ defined on a compact set $D \subset \mathbb{R}^d$ satisfies

\[ Z(s) \overset{d}{=} \bigvee_{i \ge 1} \zeta_i f_i(s) \]

where $(\zeta_i, f_i)$, $i \ge 1$, are the points of a Poisson process on $(0, \infty) \times C$ with intensity $\zeta^{-2}\, \mathrm{d}\zeta\, \mathrm{d}\nu(f)$, for some locally finite measure $\nu$ on the space C of continuous, nonnegative functions on D such that $\int_C f(s)\, \mathrm{d}\nu(f) = 1$, $s \in D$.

Intuition: f = infinite-dimensional generalisation of an "angle".

• $f_i$: rainstorm profile

• $\zeta_i$: rainstorm intensity
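The spectral representation suggests a direct, approximate way to simulate a simple max-stable process on a grid: generate the intensities $\zeta_i$ in decreasing order as inverses of the arrival times of a unit-rate Poisson process, draw i.i.d. profiles $f_i$ with $E[f_i(s)] = 1$, and take the pointwise maximum over finitely many points. The sketch below is only an illustration under these assumptions: the log-Gaussian profile $f(s) = \exp(\varepsilon s - s^2/2)$, the truncation level `n_points` and the function name are choices of this example, not taken from the talk.

```python
import math
import random

def simulate_max_stable(grid, n_points=200, seed=0):
    """Approximate draw of Z(s) = max_i zeta_i * f_i(s) on `grid`, truncated to n_points terms."""
    rng = random.Random(seed)
    z = [0.0] * len(grid)
    gamma = 0.0
    for _ in range(n_points):
        gamma += rng.expovariate(1.0)  # arrival times of a unit-rate Poisson process
        zeta = 1.0 / gamma             # points of a Poisson process with intensity zeta^-2 dzeta
        eps = rng.gauss(0.0, 1.0)
        for idx, s in enumerate(grid):
            f = math.exp(eps * s - 0.5 * s * s)  # hypothetical profile with E[f(s)] = 1
            z[idx] = max(z[idx], zeta * f)
    return z

grid = [0.0, 0.5, 1.0]
z = simulate_max_stable(grid)
```

Because the $\zeta_i$ decrease to zero, late terms rarely change the maximum, but the truncation is still an approximation; exact simulation schemes exist in the literature and in dedicated packages.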

Models for spatial extremes (Ribatet, Dombry, Oesting, Spatial extremes and max-stable processes, to appear)

Model for spectral functions $f_i$ → model for Z

• $f_i$: random Gaussian density (mean = center of the storm, variance = inverse width): Smith model (Smith, 90)

• $f_i(s) = \max(0, W_i(s))$, $W_i$: stationary Gaussian process: Schlather model (Schlather, 2002)

• More flexible (and difficult to simulate until recently, Kabluchko et al., 2009): Brown-Resnick process, Extremal-t process (Opitz, 2013)

[Figure: simulated fields from the Smith, Schlather, Brown-Resnick and Extremal-t models.]

Inference for max-stable processes

• c.d.f. for a finite number of locations: $F_{s_1, \dots, s_d}(x_1, \dots, x_d) = e^{-V(x_1, \dots, x_d)}$ → the likelihood expression is a d-th derivative, with a huge number of terms! → use

    • composite likelihood: Padoan et al., 2010, Likelihood-based inference for max-stable processes; can be Bayesian (Cooley, Davison, Ribatet, 11)
    • concurrent extremes: conditioning on the underlying spectral function (hitting scenario), Dombry, Eyi-Minko, 2013

• Implementation of standard methods in the package SpatialExtremes (Ribatet, 15)

• Bayesian hierarchical model: Reich, Shaby, 2013.

• Peaks-over-threshold framework (Thibaud et al., 2013)


If no spatial structure but large number of features

• Example: collection of air pollutants, network data (features of the connection requests), blood toxins, ...

Issues in large (≥ 10) dimension for standard multivariate models:

• MCMC convergence would take ages.

• Implicit assumption in many models: "all variables must be concomitantly large" (or some pre-specified subsets, as in the logistic model). This is not reasonable in a spatial context (localized storms, affecting only some subsets of variables).

End of this talk: joint work with Nicolas Goix and Stephan Clemencon

Dimension reduction in multivariate extremes
    Exhibit sparsity?

Error ('uncertainty') assessment
    Finite-sample bounds on the error? (not Bayesian ...)

Anomaly detection in 'extreme' data
    'Extremes' = points located in the tail of the distribution.
    What does 'normal' mean among extremes?

What is Anomaly Detection (AD)?

• Training step 1: learn a profile characterizing 'normal' behavior, e.g. approximate support.

• Training step 2: build a decision function → 'normal' region around the profile.

• Step 3: with new data, anomalies = points outside the 'normal region'.

Applications: public health, network intrusions, finance, surveillance

Multivariate EVT for Anomaly detection

• If 'normal' data are heavy tailed, there may be extreme normal data.

How to distinguish between large anomalies and normal extremes?

• Yet: no multivariate AD algorithm has a specific treatment for multivariate extreme data

• Our goal (from an AD point of view): improve the performance of standard AD algorithms on extremal regions using MEVT → reduce the number of false positives

Recall: multivariate extremes

• Random vectors $Y = (Y_1, \dots, Y_d)$; $Y_j \ge 0$

• Margins: $Y_j \sim F_j$, $1 \le j \le d$ (continuous).

• Preliminary step: standardization $X_j = \frac{1}{1 - F_j(Y_j)}$, so that $P(X_j > x) = \frac{1}{x}$.

• Goal: $P(X \in A)$, A 'far from 0'?

[Figure: a point X in the plane $(X_1, X_2)$ with thresholds $u_1$, $u_2$; a set A inside the "extremal region" $A_0$.]

Fundamental assumption and consequences (de Haan, Resnick, 70's, 80's)

Intuitively: $P(X \in tA) \simeq \frac{1}{t} P(X \in A)$

Multivariate regular variation

\[ 0 \notin A : \quad t\, P\left(\frac{X}{t} \in A\right) \xrightarrow[t \to \infty]{} \mu(A), \qquad \mu : \text{exponent measure} \]

Necessarily: $\mu(tA) = t^{-1} \mu(A)$ (radial homogeneity) → angular measure on the sphere: $\Phi(B) = \mu\{tB,\ t \ge 1\}$

General model for extremes

\[ P\left(\|X\| \ge r \,;\ \frac{X}{\|X\|} \in B\right) \simeq r^{-1}\, \Phi(B) \]

Angular measure

• Φ rules the joint distribution of extremes

[Figure: two scatter plots in the plane $(X_1, X_2)$, one for each case below.]

• Asymptotic dependence: $(X_1, X_2)$ may be large together.

vs

• Asymptotic independence: only $X_1$ or $X_2$ may be large.

No assumption on $\Phi$: non-parametric framework.

Multivariate extremes in large dimension ?

• Flexible multivariate models for moderate dimension ($d \simeq 5$): Dirichlet mixtures (Boldi, Davison 07; S., Naveau 12), logistic family (Stephenson 09; Fougeres et al., 13), pairwise Beta (Cooley et al.), ...

• Theory for angular measure (dependence) estimation: asymptotic, d = 2, rates under second-order conditions (Einmahl, 01); empirical likelihood (Einmahl, Segers 09)

• High dimension ($d \gg 1$)?

    • Spatial → max-stable models (parametric)
    • Non-spatial → ?? (multiple air pollutants, assets, features for AD, ...)
    • Theory for integrated versions (tail dependence function): asymptotic normality (Einmahl et al., 12, 15) (parametric case), finite-sample bounds (Goix et al., 15)
      ↛ structure of extremes (which components may be large together)

It cannot rain everywhere at the same time

[Figures: two maps, one of daily precipitation and one of air pollutants.]

Towards high dimension

• Reasonable hope: only a moderate number of $X_j$'s may be simultaneously large → sparse angular measure

• Our goal from an MEVT point of view: estimate the (sparse) support of the angular measure (i.e. the dependence structure).

Which components may be large together, while the others are small?

• For MEVT modeling: recover the asymptotically dependent groups of components → use a simplified model.

• For AD: support = normal profile → anomalies = points 'far away' from the support.

Sparse angular support

[Figure: two panels. Full support: anything may happen. Sparse support: $X_1$ not large if $X_2$ or $X_3$ is large.]

Where is the mass?

Subcones of $\mathbb{R}^d_+$:

\[ C_\alpha = \{ x \ge 0 : \ x_i > 0 \ (i \in \alpha),\ x_j = 0 \ (j \notin \alpha),\ \|x\| \ge 1 \}, \qquad \alpha \subset \{1, \dots, d\}. \]

Support recovery + representation

• $\{\Omega_\alpha,\ \alpha \subset \{1, \dots, d\}\}$: partition of the unit sphere

• $\{C_\alpha,\ \alpha \subset \{1, \dots, d\}\}$: corresponding partition of $\{x : \|x\| \ge 1\}$

• $\mu$-mass of the subcone $C_\alpha$: $\mathcal{M}(\alpha)$ (unknown)

• Goal: learn the $(2^d - 1)$-dimensional representation (potentially sparse)

\[ \mathcal{M} = \big( \mathcal{M}(\alpha) \big)_{\alpha \subset \{1, \dots, d\},\ \alpha \ne \emptyset} \]

• $\mathcal{M}(\alpha) > 0 \iff$ features $j \in \alpha$ may be large together while the others are small.

Identifying non-empty edges

Issue: real data = non-asymptotic: $X_j > 0$.

Cannot just count data on each edge: only the largest-dimensional sphere has empirical mass!

Fix $\varepsilon > 0$. Assign data that are $\varepsilon$-close to an edge to that edge.

\[ \Omega_\alpha \to \Omega_\alpha^\varepsilon = \{ w : \ w_i > \varepsilon \ (i \in \alpha),\ w_j < \varepsilon \ (j \notin \alpha) \}. \]

\[ C_\alpha \to C_\alpha^\varepsilon = \{ t\, \Omega_\alpha^\varepsilon,\ t \ge 1 \}. \]

→ New partition of the input space, compatible with non-asymptotic data.

Empirical estimator: counts the standardized points in $C_\alpha^\varepsilon$, far from 0.

Algorithm

Data: $Y_i$, $i = 1, \dots, n$, $Y_i = (Y_{i,1}, \dots, Y_{i,d})$.

• Standardize: $\hat X_{i,j} = \frac{1}{1 - \hat F_j(Y_{i,j})}$, with $\hat F_j(Y_{i,j}) = \frac{\mathrm{rank}(Y_{i,j}) - 1}{n}$

• Natural estimator

\[ \Phi_n(\Omega_\alpha) = \mu_n(C_\alpha^\varepsilon) = \frac{n}{k}\, P_n\left( \hat X \in \frac{n}{k}\, C_\alpha^\varepsilon \right). \]

→ $\hat{\mathcal{M}} = \big( \Phi_n(\Omega_\alpha),\ \alpha \subset \{1, \dots, d\} \big)$
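The algorithm can be sketched in plain Python as follows. This is an illustrative reading of the slide, not the authors' implementation: it assumes the sup norm for ‖·‖, breaks ties in the ranks by observation index, and treats coordinates with $w_j \le \varepsilon$ as "small". The function names and the two toy datasets are hypothetical.

```python
from collections import Counter

def rank_standardize(Y):
    """Rank transform of each margin: X_ij = 1 / (1 - (rank(Y_ij) - 1) / n) = n / (n - rank + 1)."""
    n, d = len(Y), len(Y[0])
    X = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: Y[i][j])
        for r, i in enumerate(order):  # r = rank - 1
            X[i][j] = n / (n - r)
    return X

def face_masses(Y, k, eps):
    """Estimated mass of each subcone alpha: (n/k) * fraction of points in (n/k) * C_alpha^eps."""
    n = len(Y)
    radius = n / k
    counts = Counter()
    for x in rank_standardize(Y):
        norm = max(x)  # assumption: sup norm
        if norm >= radius:
            alpha = tuple(j for j, xj in enumerate(x) if xj / norm > eps)
            counts[alpha] += 1
    return {alpha: c / k for alpha, c in counts.items()}

# Toy datasets with n = 100 points in dimension d = 2 (features indexed 0 and 1).
n = 100
comonotone = [(i, i) for i in range(n)]            # both features large together
antimonotone = [(i, n - 1 - i) for i in range(n)]  # never large together
masses_dep = face_masses(comonotone, k=10, eps=0.2)
masses_indep = face_masses(antimonotone, k=10, eps=0.2)
```

In the first dataset all the extreme mass falls on the face {0, 1}; in the second it is split between the faces {0} and {1}, which is the sparsity pattern the estimator is designed to reveal.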

Sparsity in real datasets

Data: 50 wave directions from buoys in the North Sea (Shell Research, thanks to J. Wadsworth).

Finite sample error bound

VC-bound adapted to low probability regions (see Goix et al., 2015)

Theorem

If the margins $F_j$ are continuous and if the density of the angular measure is bounded by M > 0 on each subface, there is a constant C s.t. for any n, d, k, $\delta \le e^{-k}$, $\varepsilon \le 1/4$, with probability $\ge 1 - \delta$,

\[ \max_\alpha \left| \Phi_n(\Omega_\alpha) - \Phi(\Omega_\alpha) \right| \le C d \left( \sqrt{\frac{1}{k \varepsilon} \log \frac{d}{\delta}} + M d \varepsilon \right) + \mathrm{Bias}_{\frac{n}{k}, \varepsilon}(F, \mu). \]

Bias: using non-asymptotic data to learn about an asymptotic quantity

\[ \text{Regular variation} \iff \mathrm{Bias}_{t, \varepsilon} \xrightarrow[t \to \infty]{} 0 \]

• Existing literature (d = 2): $1/\sqrt{k}$.

• Here: $1/\sqrt{k \varepsilon} + M d \varepsilon$. Price to pay for biasing the estimator with $\varepsilon$.

OK if $\varepsilon k \to \infty$, $\varepsilon \to 0$.

Choice of $\varepsilon$: cross-validation or '$\varepsilon = 0.1$'.
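As a heuristic reading of the rate $1/\sqrt{k\varepsilon} + M d \varepsilon$, one can ask which $\varepsilon$ balances the two terms. The short calculation below ignores the logarithmic factor, the constant C and the bias term, so it is only an order-of-magnitude sketch and not a result stated in the talk.

```latex
% Minimise f(\varepsilon) = (k\varepsilon)^{-1/2} + M d\,\varepsilon over \varepsilon > 0.
\[
  f'(\varepsilon) = -\tfrac{1}{2}\, k^{-1/2}\, \varepsilon^{-3/2} + M d = 0
  \quad\Longrightarrow\quad
  \varepsilon^{*} = \bigl( 2 M d \sqrt{k} \bigr)^{-2/3},
\]
\[
  f(\varepsilon^{*})
  = (2 M d)^{1/3}\, k^{-1/3} + 2^{-2/3} (M d)^{1/3}\, k^{-1/3}
  = 3 \cdot 2^{-2/3}\, (M d)^{1/3}\, k^{-1/3}.
\]
```

Under these simplifications the balanced choice satisfies $\varepsilon^* \to 0$ and $\varepsilon^* k \to \infty$ as $k \to \infty$, and the resulting rate $k^{-1/3}$ is slower than the $1/\sqrt{k}$ rate quoted for d = 2.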

Tools for the proof

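1. VC inequality for small probability classes (Goix et al., 2015)

→ max deviations ≤ $\sqrt{p}$ × (usual bound)

2. Apply it to the VC-class of rectangles $\left\{ \frac{k}{n} R(x, z, \alpha),\ x, z \ge \epsilon \right\}$

→ $p \le \dfrac{dk}{\epsilon n}$

3. Approximate $C_\alpha^\epsilon$ with such rectangles → error ≤ $d\sqrt{\epsilon}$

4. Approximate $\mu(C_\alpha)$ with $\mu(C_\alpha^\epsilon)$ → error ≤ $d\epsilon$ (bounded angular density).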


Results : support recovery

• Asymmetric logistic, d = 10, dependence parameter α = 0.1
→ Non-asymptotic data (not exactly Generalized Pareto)

• K randomly chosen (asymptotically) non-empty faces.

• parameters : $k = \sqrt{n}$, ε = 0.1

• Additional (heuristic) step : eliminate faces supporting less than 1% of total mass.

# sub-cones K              10    15    20    30    35    40    45    50
Aver. # errors (n=5e4)   0.01  0.09  0.39  1.82  3.59  6.59  8.06 11.21
Aver. # errors (n=15e4)  0.06  0.02  0.14  0.98  1.85  3.14  5.23  7.87
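The heuristic pruning step and an error count can be sketched as below. Both function names are hypothetical, and the error definition used here (missed faces plus falsely recovered faces) is an assumption, since the slides do not spell out how errors are counted.

```python
def prune_faces(masses, min_share=0.01):
    """Drop faces supporting less than `min_share` of the total estimated mass."""
    total = sum(masses.values())
    return {a: m for a, m in masses.items() if m >= min_share * total}

def count_errors(recovered, true_faces):
    """Assumed error count: faces missed plus faces wrongly recovered."""
    return len(set(recovered) ^ set(true_faces))

# Toy estimated masses: the face (0, 1) carries only 0.5% of the mass.
masses = {(0,): 0.5, (1,): 0.495, (0, 1): 0.005}
kept = prune_faces(masses)
print(sorted(kept), count_errors(kept, [(0,), (2,)]))
```

In the toy example the light face `(0, 1)` is discarded, and comparing with a true support `{(0,), (2,)}` gives one missed face and one false face.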


Algorithm DAMEX (Detecting Anomalies with Multivariate Extremes)

Anomaly = new observation ‘violating the sparsity pattern’ :observed in empty or light subcone.

Scoring function : for $x$ such that $v \in C_\alpha^\epsilon$,

$$s_n(x) = \frac{1}{\|v\|}\, \Phi_n(\Omega_\alpha^\epsilon) \underset{x \text{ large}}{\simeq} P(X \in C_\alpha,\ \|X\| > x)$$
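A minimal sketch of this scoring rule, under stated assumptions: `damex_score` is a hypothetical name, `v` is taken to be the already standardized observation with non-negative coordinates, the sup norm is assumed, and `masses` is a dictionary mapping faces (tuples of 0-based indices) to their estimated mass. A face absent from the dictionary is treated as empty.

```python
import numpy as np

def damex_score(v, masses, eps=0.1):
    """Score = (estimated mass of the face containing v) / ||v||.

    Lower scores flag more abnormal observations: either the point is very
    far out, or it falls in an empty or light sub-cone.
    """
    v = np.asarray(v, dtype=float)
    r = v.max()                                   # sup norm (an assumption)
    alpha = tuple(int(j) for j in np.flatnonzero(v / r > eps))
    return masses.get(alpha, 0.0) / r

masses = {(0,): 0.6, (0, 1): 0.4}                 # toy estimated masses
print(damex_score([10.0, 0.2], masses))           # face (0,): heavy
print(damex_score([0.1, 20.0], masses))           # face (1,): empty
```

A point landing in a face with no estimated mass gets score 0, which matches the slide's notion of an observation 'violating the sparsity pattern'.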


Conclusion

• Adequate notion of ‘sparsity’ for MEVT : sparse angular measure

• Empirical estimation (→ algorithm) to learn this sparse asymptotic support from non-asymptotic, non-sparse data.

• Finite sample error bounds (tools from statistical learning theory)

• Applications :
  • Immediate application to anomaly detection (AD)
  • View towards multivariate extreme (or spatial ?) modeling : use sparsity information to build a simplified model; need to do clustering ? (ongoing work, Mael Chiapino)

• Question : can we detect sparsity in a Bayesian framework ? Ideas welcome. . .


Some references

• V. Chandola, A. Banerjee, V. Kumar. Anomaly detection : A survey, 2009.

• E. Chautru. Dimension reduction in multivariate extreme value analysis, 2015.

• J. H. J. Einmahl, J. Segers. Maximum empirical likelihood estimation of the spectral measure of an extreme-value distribution, 2009.

• J. H. J. Einmahl, A. Krajina, J. Segers. An M-estimator for tail dependence in arbitrary dimensions, 2012.

• J. H. J. Einmahl, A. Kiriliouk, A. Krajina, J. Segers. An M-estimator of spatial tail dependence, 2015.

• N. Goix, A. Sabourin, S. Clemencon. Learning the dependence structure of rare events : a non-asymptotic study, 2015.

• F. T. Liu, K. M. Ting, Z.-H. Zhou. Isolation forest, 2008.

• Y. Qi. Almost sure convergence of the stable tail empirical dependence function in multivariate extreme statistics, 1997.

• S. J. Roberts. Novelty detection using extreme value statistics, 1999.


Some references

• M. O. Boldi, A. Davison. A mixture model for multivariate extremes.

• J. H. J. Einmahl, A. Krajina, J. Segers. An M-estimator for tail dependence in arbitrary dimensions, 2012.

• N. Goix, A. Sabourin, S. Clemencon. Learning the dependence structure of rare events : a non-asymptotic study, to appear.

• S. Resnick. Extreme Values, Regular Variation, Point Processes, 1987.

• A. Sabourin, P. Naveau. Bayesian Dirichlet mixture model for multivariate extremes : A re-parametrization, 2014.

• A. Sabourin. Semi-parametric modeling of excesses above high multivariate thresholds with censored data, 2015.

• A. Sabourin, B. Renard. Combining regional estimation and historical floods : a multivariate semi-parametric peaks-over-threshold model with censored data, 2015.

• J. Tawn. Modelling multivariate extreme value distributions, 1990.

• D. Van Dyk, X. Meng. The art of data augmentation, 2001.
