Extreme Value TheoryFitting Models
Investigation of an Automated Approach to Threshold Selection for Generalized Pareto
Kate R. Saunders
Supervisors: Peter Taylor & David Karoly
University of Melbourne
April 8, 2015
Outline
1 Extreme Value Theory
2 Fitting Models
Problem
What are the climate processes that drive extreme rainfall? (El Niño Southern Oscillation, Interdecadal Pacific Oscillation)
How do these drivers differ at different timescales: sub-daily, daily, consecutive day totals?
Data
Extreme Value Theory
Block Maxima
Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables with distribution function $F$. Define $M_n = \max\{X_1, X_2, \ldots, X_n\}$.
($X_i$ might be daily rainfall observations and $M_{365}$ the annual maximum rainfall.)
\[
\begin{aligned}
\Pr(M_n \le x) &= \Pr(X_1 \le x, \ldots, X_n \le x) \\
&= \Pr(X_1 \le x) \times \cdots \times \Pr(X_n \le x) \\
&= F(x)^n.
\end{aligned}
\]
As $n \to \infty$, the distribution of the suitably normalised $M_n$ converges to a generalised extreme value distribution, provided a non-degenerate limit exists.
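The product identity above can be checked by simulation. The following is a minimal sketch, assuming standard exponential observations (an illustrative choice, not the rainfall data from the talk); `block_maxima` and `exp_cdf` are hypothetical helper names.

```python
import math
import random

def block_maxima(series, block_length):
    """Maximum of each complete block; a trailing partial block is dropped."""
    n_blocks = len(series) // block_length
    return [max(series[b * block_length:(b + 1) * block_length])
            for b in range(n_blocks)]

def exp_cdf(x, rate=1.0):
    """Distribution function F of an exponential variable (assumed here)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

rng = random.Random(1)          # fixed seed so the run is repeatable
n = 50                          # block length
series = [rng.expovariate(1.0) for _ in range(n * 4000)]
maxima = block_maxima(series, n)

x = 4.0
empirical = sum(m <= x for m in maxima) / len(maxima)   # simulated Pr(M_n <= x)
theoretical = exp_cdf(x) ** n                            # F(x)^n
print(empirical, theoretical)
```

The two printed values should agree up to sampling error, which shrinks as the number of blocks grows.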
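Generalized Extreme Value Theorem (Fisher-Tippett-Gnedenko)
If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that
\[
\Pr\left( \frac{M_n - b_n}{a_n} \le z \right) \to G(z) \quad \text{as } n \to \infty
\]
for a non-degenerate distribution function $G$, then $G$ is a member of the Generalized Extreme Value family
\[
G(z) = \exp\left\{ -\left[ 1 + \xi\left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi} \right\},
\]
defined on $\{z : 1 + \xi(z - \mu)/\sigma > 0\}$, where $-\infty < \mu < \infty$, $\sigma > 0$ and $-\infty < \xi < \infty$.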
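The family $G$ can be evaluated directly from its definition. This is a small sketch rather than a library routine: `gev_cdf` is a hypothetical name, and the case $\xi = 0$ is handled as the Gumbel limit $\exp\{-\exp(-(z-\mu)/\sigma)\}$, which the formula above only reaches as $\xi \to 0$.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """G(z) for the GEV family; xi == 0 is treated as the Gumbel limit."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Illustrative parameter values only.
print(gev_cdf(60.0, mu=40.0, sigma=10.0, xi=0.1))
```

Software packages differ in the sign convention for the shape parameter, so a fitted $\xi$ should be checked against this parameterisation before it is plugged in.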
Leveraging more data
Generalized Pareto Distribution
Let $X_1, X_2, \ldots, X_n$ be a sequence of i.i.d. random variables with marginal distribution function $F$.
\[
\Pr\{X > u + y \mid X > u\} = \frac{1 - F(u + y)}{1 - F(u)}, \quad y > 0.
\]
If $F$ satisfies the Generalized Extreme Value Theorem then, for a large enough threshold $u$, the distribution function of $(X - u)$ conditional on $X > u$ is approximately the GPD.
Generalized Pareto Distribution - Pickands (1975)
\[
H(y) = 1 - \left( 1 + \frac{\xi y}{\tilde{\sigma}} \right)^{-1/\xi},
\]
defined on $\{y : y > 0 \text{ and } 1 + \xi y/\tilde{\sigma} > 0\}$, where $\tilde{\sigma} = \sigma + \xi(u - \mu)$.
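The link between the GEV parameters and the threshold-dependent GPD scale can be verified numerically. The sketch below assumes the usual large-$n$ tail approximation $1 - F(z) \approx n^{-1}[1 + \xi(z - \mu)/\sigma]^{-1/\xi}$, under which the factor $n^{-1}$ cancels in the conditional probability; the function names and parameter values are illustrative only.

```python
import math

def gpd_cdf(y, sigma_u, xi):
    """H(y) for an excess y over the threshold; xi == 0 is the exponential limit."""
    if y <= 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma_u)
    t = 1.0 + xi * y / sigma_u
    if t <= 0.0:              # beyond the upper end point (only when xi < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def gev_tail_term(z, mu, sigma, xi):
    """[1 + xi (z - mu) / sigma]^(-1/xi), which equals -log G(z) for the GEV."""
    return (1.0 + xi * (z - mu) / sigma) ** (-1.0 / xi)

mu, sigma, xi = 10.0, 4.0, 0.2      # illustrative GEV parameters
u, y = 25.0, 7.5                    # threshold and excess
sigma_u = sigma + xi * (u - mu)     # threshold-dependent GPD scale

# Approximate Pr{X > u + y | X > u} from the GEV tail, then compare with 1 - H(y).
ratio = gev_tail_term(u + y, mu, sigma, xi) / gev_tail_term(u, mu, sigma, xi)
survival = 1.0 - gpd_cdf(y, sigma_u, xi)
print(ratio, survival)
```

Both numbers coincide, which is the algebraic reason the GPD shape equals the GEV shape while the scale shifts with the threshold.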
Dependence
Rainfall observations are dependent
Heavy rainfall yesterday affects the probability of heavy rain today
Heavy rainfall a year ago doesn't
Extreme Value Theory extends to stationary series with weak long-range dependence
However, for processes with short-range dependence, extremes occur in clusters
Clusters
Dependent Series
Let $\{X_i\}_{i \ge 1}$ be a stationary series and $\{X_i^*\}_{i \ge 1}$ be an independent series of variables with the same marginal distribution.
Define $M_n = \max\{X_1, \ldots, X_n\}$ and $M_n^* = \max\{X_1^*, \ldots, X_n^*\}$. Under suitable regularity conditions,
\[
\Pr\left\{ \frac{M_n^* - b_n}{a_n} \le z \right\} \to G(z),
\]
as $n \to \infty$ for normalizing sequences $\{a_n > 0\}$ and $\{b_n\}$, where $G$ is a non-degenerate distribution function, if and only if
\[
\Pr\left\{ \frac{M_n - b_n}{a_n} \le z \right\} \to G^{\theta}(z),
\]
for a constant $\theta$ such that $0 < \theta \le 1$.
Extremal Index
$\theta = (\text{limiting mean cluster size})^{-1} \in (0, 1]$
$\theta = 0.5 \Rightarrow$ 2 observations per cluster on average.
Fitting Models
Select a threshold
Decluster the data for independent observations
Declustering
Blocks
Partition the observation sequence into blocks of length b
Assume extreme observations within the same block belong to the same cluster.
Runs
Specify a run length K
Assume extreme observations with an inter-exceedance time of less than K belong to the same cluster.
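Both declustering rules above can be sketched in a few lines. The function names are hypothetical, an exceedance is taken to mean a value strictly greater than the threshold, and the short `rain` series is made up for illustration.

```python
def runs_decluster(series, threshold, run_length):
    """Group exceedance indices; a gap of fewer than run_length steps
    between consecutive exceedances keeps them in the same cluster."""
    clusters = []
    for i, value in enumerate(series):
        if value <= threshold:
            continue
        if clusters and i - clusters[-1][-1] < run_length:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters

def blocks_decluster(series, threshold, block_length):
    """Group exceedance indices by the fixed-length block they fall in."""
    clusters = {}
    for i, value in enumerate(series):
        if value > threshold:
            clusters.setdefault(i // block_length, []).append(i)
    return [clusters[b] for b in sorted(clusters)]

def cluster_maxima(series, clusters):
    """One representative (the largest value) per cluster."""
    return [max(series[i] for i in cluster) for cluster in clusters]

rain = [0, 5, 6, 0, 0, 7, 0, 0, 0, 0, 8]       # made-up observations
runs = runs_decluster(rain, threshold=4, run_length=3)
blocks = blocks_decluster(rain, threshold=4, block_length=6)
print(runs, cluster_maxima(rain, runs))
print(blocks, cluster_maxima(rain, blocks))
```

The two rules can disagree on the same data: here the exceedance at index 5 is its own cluster under the runs rule but joins the first cluster under the blocks rule.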
Intervals
The limiting process of exceedance times is compound Poisson for stationary series (Hsing et al. 1988).
Ferro and Segers (2003) showed the limiting distribution of inter-exceedance times is a mixture distribution with weight $\theta$,
\[
T_\theta(t) = (1 - \theta)\,\varepsilon_0 + \theta \cdot \theta \exp(-\theta t),
\]
where $\varepsilon_0$ is a degenerate distribution at zero and $T_\theta$ is the distribution of the times between exceedances of the threshold $u$.
By equating moments, a non-parametric estimator can be found for $\theta$.
The largest $\theta(N - 1)$ inter-exceedance times can be interpreted as between-cluster arrivals.
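One moment-based estimator of this kind is the intervals estimator of Ferro and Segers (2003). The sketch below follows the commonly quoted form of that estimator and should be checked against the paper before use; `intervals_estimator` is a hypothetical name and the exceedance times are invented.

```python
def intervals_estimator(exceedance_times):
    """Moment-based estimate of the extremal index from the times
    (in observation steps) at which the threshold is exceeded."""
    gaps = [b - a for a, b in zip(exceedance_times, exceedance_times[1:])]
    m = len(gaps)                                  # N - 1 inter-exceedance times
    if m == 0:
        raise ValueError("need at least two exceedances")
    if max(gaps) <= 2:
        numerator = 2.0 * sum(gaps) ** 2
        denominator = m * sum(g * g for g in gaps)
    else:
        # Bias-adjusted version used when some gap is longer than 2 steps.
        numerator = 2.0 * sum(g - 1 for g in gaps) ** 2
        denominator = m * sum((g - 1) * (g - 2) for g in gaps)
    return min(1.0, numerator / denominator)

clustered = [1, 2, 3, 50, 51, 52, 100, 101, 102]   # three tight clusters
regular = list(range(0, 200, 10))                   # evenly spread exceedances
theta_clustered = intervals_estimator(clustered)
theta_regular = intervals_estimator(regular)
print(theta_clustered, theta_regular)
```

With so few exceedances the estimate is noisy, but the clustered series gives a value well below 1 while the evenly spread series is capped at 1.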
Fitting Models
→ Select a threshold
Decluster the data for independent observations
Mean Residual Life Plots
For sufficiently high thresholds, the mean excess over the threshold should be a linear function of the threshold.
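The quantity plotted is the sample mean of the excesses at each candidate threshold. For a GPD with shape $\xi < 1$ valid above $u_0$, $E[X - u \mid X > u] = (\sigma_{u_0} + \xi(u - u_0))/(1 - \xi)$, so the slope is $\xi/(1 - \xi)$. The sketch below uses simulated exponential data (the case $\xi = 0$, an assumption made purely for illustration), for which the curve should be roughly flat.

```python
import random

def mean_excess(series, threshold):
    """Average amount by which exceedances overshoot the threshold."""
    excesses = [x - threshold for x in series if x > threshold]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)

def mean_residual_life(series, thresholds):
    """(threshold, mean excess) pairs, i.e. the points of the plot."""
    return [(u, mean_excess(series, u)) for u in thresholds]

rng = random.Random(7)                              # fixed seed for repeatability
sample = [rng.expovariate(1.0) for _ in range(20000)]
curve = mean_residual_life(sample, [0.5, 1.0, 1.5, 2.0])
for u, e in curve:
    print(u, round(e, 3))
```

In practice the plot also carries pointwise confidence bands, which widen at high thresholds where few exceedances remain.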
Parameter Stability Plots
Parameter estimates of the (modified) scale and shape parameters should be constant over the range of valid thresholds.
Alternative
Set the threshold according to a high quantile of non-zero observations, e.g. the 90th percentile.
Is this an appropriate threshold?
Is our model misspecified?
The approach suggested by Süveges and Davison (2010) is to test the threshold and run parameter pair $(u, K)$ for model misspecification.
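The quantile rule mentioned above is simple to state in code. This sketch assumes the nearest-rank definition of a percentile (other definitions interpolate between order statistics) and treats any strictly positive value as a wet observation; the name `nonzero_quantile_threshold` and the data are hypothetical.

```python
import math

def nonzero_quantile_threshold(series, prob=0.9):
    """Empirical quantile of the strictly positive observations (nearest rank)."""
    wet = sorted(x for x in series if x > 0)
    if not wet:
        raise ValueError("no non-zero observations")
    # round() guards against floating-point noise in prob * len(wet).
    rank = max(1, math.ceil(round(prob * len(wet), 9)))
    return wet[rank - 1]

daily = [0, 0, 3, 0, 1, 7, 0, 2, 10, 0, 4, 5, 0, 6, 8, 9]   # made-up totals
u = nonzero_quantile_threshold(daily, 0.9)
print(u)
```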
Log-Likelihood
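Limiting distribution of inter-exceedance times:
\[
T_\theta(t) = (1 - \theta)\,\varepsilon_0 + \theta^2 \exp(-\theta t),
\]
Log-likelihood (strictly positive inter-exceedance times):
\[
\sum_{i=1}^{N-1} \log\left( (1 - \theta)^{I(t_i = 0)} \left( \theta^2 \exp(-\theta t_i) \right)^{I(t_i > 0)} \right)
= \sum_{i=1}^{N-1} \left[ 2 I(t_i > 0) \log(\theta) - \theta t_i \right],
\]
where $t_i = N T_i / n$, $T_i$ is the $i$-th observed inter-exceedance time, $n$ is the total number of observations and $N$ is the number of exceedances.
However, as $n$ gets large our estimate $\hat{\theta}$ tends to 1, suggesting independence.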
Log-Likelihood
Adjustment of the inter-exceedance times using the run parameter $K$:
\[
c_i = \max\{t_i - K, 0\}
\]
Log-likelihood:
\[
\ell(\theta; c_i) = \sum_{i=1}^{N-1} \left[ I(c_i = 0) \log(1 - \theta) + 2 I(c_i > 0) \log(\theta) - \theta c_i \right]
\]
Approach used in Fukutome et al. (2014) and Süveges and Davison (2010).
Test combinations of threshold $u$ and run parameter $K$ for misspecification of the likelihood function. Select the $(u, K)$ pair that maximizes the number of independent clusters.
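Setting the derivative of this log-likelihood to zero gives the quadratic $S\theta^2 - (N_0 + 2N_c + S)\theta + 2N_c = 0$, where $N_0$ counts the zero $c_i$, $N_c$ counts the positive $c_i$ and $S = \sum c_i$; the maximiser is the smaller root. The sketch below implements that closed form. Function names and data are hypothetical, and expressing $K$ on the same normalised scale as $t_i$ is an assumption about how the adjustment is applied.

```python
import math

def adjusted_gaps(t, K):
    """c_i = max(t_i - K, 0) for normalised inter-exceedance times t_i."""
    return [max(ti - K, 0.0) for ti in t]

def kgaps_loglik(theta, c):
    """Log-likelihood of the adjusted gaps c_i for 0 < theta < 1."""
    total = 0.0
    for ci in c:
        if ci > 0:
            total += 2.0 * math.log(theta) - theta * ci
        else:
            total += math.log(1.0 - theta)
    return total

def kgaps_mle(c):
    """Smaller root of S*theta^2 - (N0 + 2*Nc + S)*theta + 2*Nc = 0."""
    n_zero = sum(1 for ci in c if ci == 0)
    n_pos = len(c) - n_zero
    s = sum(c)
    if n_pos == 0:
        raise ValueError("all adjusted gaps are zero; theta is not identified")
    a = n_zero + 2.0 * n_pos + s
    return (a - math.sqrt(a * a - 8.0 * n_pos * s)) / (2.0 * s)

raw_gaps = [1, 1, 2, 30, 1, 45, 1, 1, 60, 2, 25]   # made-up gaps, in time steps
n_obs = 2000                                        # total number of observations
N = len(raw_gaps) + 1                               # number of exceedances
scale = N / n_obs
t = [scale * g for g in raw_gaps]                   # t_i = N * T_i / n
c = adjusted_gaps(t, K=scale * 2)                   # run parameter of 2 time steps
theta_hat = kgaps_mle(c)
print(theta_hat, kgaps_loglik(theta_hat, c))
```

Repeating the fit over a grid of $(u, K)$ pairs gives the candidate models that the misspecification test then screens.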
Model Misspecification
If a parametric model is misspecified then there is no $\theta$ such that $g = f(\theta)$, where $g$ is the true model and $f$ is the misspecified parametric model.
For a well-specified model, the Fisher information matrix, $I(\theta) = -E\{\ell''(\theta; c_j)\}$, is equal to the variance of the score vector, $J(\theta) = \mathrm{Var}\{\ell'(\theta; c_j)\}$.
Test the hypothesis:
\[
D(\theta) = J(\theta) - I(\theta),
\]
where $H_0 : D(\theta) = 0$ and $H_1 : D(\theta) \ne 0$.
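Empirically:
\[
I_{N-1}(\theta) = \frac{-1}{N-1} \sum_{j=1}^{N-1} \ell''(\theta; c_j)
\]
\[
J_{N-1}(\theta) = \frac{1}{N-1} \sum_{j=1}^{N-1} \ell'(\theta; c_j)^2
\]
\[
D_{N-1}(\theta) = J_{N-1}(\theta) - I_{N-1}(\theta)
\]
\[
V_{N-1}(\theta) = \frac{1}{N-1} \sum_{j=1}^{N-1} \left[ \left( d_j(\theta; c_j) - D'_{N-1}(\theta)\, I_{N-1}(\theta)^{-1}\, \ell'(\theta; c_j) \right)^2 \right]
\]
where $d_j(\theta; c_j) = \ell'(\theta; c_j)^2 + \ell''(\theta; c_j)$ is the $j$-th summand of $D_{N-1}(\theta)$ and $V_{N-1}(\theta)$ is the sample variance of $D_{N-1}(\theta)$.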
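For the adjusted-gap log-likelihood, each observation contributes $\ell' = -1/(1-\theta)$, $\ell'' = -1/(1-\theta)^2$ when $c_j = 0$ and $\ell' = 2/\theta - c_j$, $\ell'' = -2/\theta^2$ when $c_j > 0$. The sketch below computes only $I_{N-1}$, $J_{N-1}$ and $D_{N-1}$ from those contributions; it does not compute $V_{N-1}$ or the test statistic. Names, data and the evaluation point $\theta = 0.5$ are illustrative assumptions.

```python
def score(theta, cj):
    """First derivative of one observation's log-likelihood contribution."""
    return 2.0 / theta - cj if cj > 0 else -1.0 / (1.0 - theta)

def hessian(theta, cj):
    """Second derivative of one observation's log-likelihood contribution."""
    return -2.0 / theta ** 2 if cj > 0 else -1.0 / (1.0 - theta) ** 2

def information_quantities(theta, c):
    """Return (I, J, D) as defined above, averaged over the N - 1 gaps."""
    m = len(c)
    info = -sum(hessian(theta, cj) for cj in c) / m
    outer = sum(score(theta, cj) ** 2 for cj in c) / m
    return info, outer, outer - info

c = [0, 0, 0, 0.17, 0, 0.26, 0, 0, 0.35, 0, 0.14]   # made-up adjusted gaps
I_hat, J_hat, D_hat = information_quantities(0.5, c)
print(I_hat, J_hat, D_hat)
```

In an actual test these quantities would be evaluated at the maximum likelihood estimate rather than at a fixed value of $\theta$.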
Model Misspecification
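Theorem (Information Matrix Test, White 1982): If the assumed model $\ell(\theta; c_i)$ contains the true model for some $\theta = \theta_0$, then as $n \to \infty$,
(i) $\sqrt{N-1}\, D_{N-1}(\hat{\theta}_{N-1}) \xrightarrow{w} N(0, V(\theta_0))$,
(ii) $V_{N-1}(\hat{\theta}_{N-1}) \xrightarrow{a.s.} V(\theta_0)$, and $V_{N-1}(\hat{\theta}_{N-1})$ is non-singular for sufficiently large $N$,
(iii) the Information Matrix Test statistic, $(N-1)\, D_{N-1}(\hat{\theta}_{N-1})'\, V_{N-1}(\hat{\theta}_{N-1})^{-1}\, D_{N-1}(\hat{\theta}_{N-1})$, is asymptotically $\chi^2_1$ distributed.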
Example: AR(2)
$Y_i = 0.95\,Y_{i-1} - 0.89\,Y_{i-2} + Z_i$, where $Z_i \sim \mathrm{GP}(1, 1/2)$ and $n = 8000$.
100 simulations
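A series of this form can be simulated by inverting the GPD distribution function for the innovations. The sketch assumes GP(1, 1/2) means scale 1 and shape 1/2, starts the recursion at zero and discards a burn-in period; the burn-in length, seed handling and function names are choices made here, not details from the talk.

```python
import random

def gp_sample(rng, sigma, xi):
    """Draw from the generalized Pareto distribution by inverse transform."""
    u = rng.random()                      # uniform on [0, 1)
    return sigma / xi * ((1.0 - u) ** (-xi) - 1.0)

def simulate_ar2(n, phi1=0.95, phi2=-0.89, sigma=1.0, xi=0.5,
                 burn_in=500, seed=0):
    """Simulate Y_i = phi1*Y_{i-1} + phi2*Y_{i-2} + Z_i with GP innovations."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    z = [0.0, 0.0]
    for _ in range(n + burn_in):
        z.append(gp_sample(rng, sigma, xi))
        y.append(phi1 * y[-1] + phi2 * y[-2] + z[-1])
    return y[-n:], z[-n:]                 # drop the burn-in values

y, z = simulate_ar2(8000, seed=0)
print(len(y), max(y))
```

Calling `simulate_ar2` with 100 different seeds would give a set of replicate series like the one described above.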
Adjusting inter-exceedance times
Common to assume stationarity by enforcing seasonal blocking.
Collapse inter-exceedance times across seasonal blocks using the memoryless property of the exponential for fitting.
Results: Gatton, South East Queensland
Results: Oenpelli, Northern Territory
Summary
Shown how to check if the threshold and run parameter selected violate the assumptions of the model
Given confidence to threshold selection in the absence of a hard and fast rule and in the presence of subjectivity
References
Ferro, C. and Segers, J. (2003). Inference for clusters of extreme values. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2), pp.545-556.
Fukutome, S., Liniger, M. and Süveges, M. (2014). Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theoretical and Applied Climatology.
Hsing, T., Hüsler, J. and Leadbetter, M. (1988). On the exceedance point process for a stationary sequence. Probability Theory and Related Fields, 78(1), pp.97-112.
Süveges, M. and Davison, A. (2010). Model misspecification in peaks over threshold analysis. The Annals of Applied Statistics, 4(1), pp.203-221.
White, H. (1982). Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50(1), p.1.
ANZAPW 2015: Barossa Valley, South Australia
This work has been supported by the ARC through the Laureate Fellowship FL130100039.
Questions?
Results: Kalamia, Far North Queensland
Results: Yamba, New South Wales