Quantitative Risk Management: Concepts, Techniques and Tools

Paul Embrechts, ETH Zurich; Rüdiger Frey, University of Leipzig; Alexander McNeil, ETH Zurich

19th International Summer School of the Swiss Association of Actuaries, 10-14 July 2006, University of Lausanne

http://www.pupress.princeton.edu/titles/8056.html
http://www.math.ethz.ch/mcneil/book/

[email protected] [email protected] [email protected]

© 2006 (Embrechts, Frey, McNeil)



QUANTITATIVE RISK MANAGEMENT
Concepts, Techniques, Tools

Alexander J. McNeil, Rüdiger Frey, Paul Embrechts
Princeton Series in Finance


Overview

I. Introduction to QRM and Multivariate Risk Models
II. Modelling Extreme Risks
III. Operational Risk
IV. Credit Risk Management
V. Dynamic Credit Models and Credit Derivatives


I: Introduction to QRM and Multivariate Risk Models

A. QRM: The Nature of the Challenge
B. Multivariate Risk Factor Models
C. Copulas


A. The Nature of the Challenge

1. Financial Risk in Perspective
2. QRM: the Nature of the Challenge
3. Loss Distributions
4. Risk Measures Based on Loss Distributions


A1. Financial Risk in Perspective

What is Risk?

- hazard, a chance of bad consequences, loss or exposure to mischance [OED]
- any event or action that may adversely affect an organization's ability to achieve its objectives and execute its strategies
- the quantifiable likelihood of loss or less-than-expected returns


Risk and Randomness

Risk relates to uncertainty and hence to the notion of randomness. Arguably randomness had eluded a clear, workable definition for centuries, until Kolmogorov offered an axiomatic definition of randomness and probability in 1933.

We assume that readers/participants are familiar with basic notation, terminology and results from elementary probability and statistics, the branch of mathematics dealing with stochastic models and their application to the real world.

The word stochastic is derived from the Greek Stochazesthai, the art of guessing, or Stochastikos, meaning skilled at aiming, stochos being a target.


Financial Risk

We are primarily concerned with the main categories of financial risk:

- market risk: the risk of a change in the value of a financial position due to changes in the value of the underlying components on which that position depends, such as stock and bond prices, exchange rates, commodity prices, etc.
- credit risk: the risk of not receiving promised repayments on outstanding investments such as loans and bonds, because of the default of the borrower.
- operational risk: the risk of losses resulting from inadequate or failed internal processes, people and systems, or external events.

Insurance Risk

The insurance industry also has a longstanding relationship with risk. The Institute and Faculty of Actuaries use the following definition of the actuarial profession:

Actuaries are respected professionals whose innovative approach to making business successful is matched by a responsibility to the public interest. Actuaries identify solutions to financial problems. They manage assets and liabilities by analysing past events, assessing the present risk involved and modelling what could happen in the future.

An additional risk category entering through insurance is underwriting risk: the risk inherent in insurance policies sold.

The Road to Basel

Risk management: one of the most important innovations of the 20th century. [Steinherr, 1998]

The late 20th century saw a revolution on financial markets. It was an era of innovation in academic theory, product development (derivatives) and information technology, and of spectacular market growth. Large derivatives losses and other financial incidents raised banks' consciousness of risk. Banks became subject to regulatory capital requirements, internationally coordinated by the Basel Committee at the Bank for International Settlements.

Some Dates

- 1950s. Foundations of modern risk analysis are laid by the work of Markowitz and others on portfolio theory.
- 1970s. Oil crises and the abolition of Bretton Woods turn energy prices and exchange rates into volatile risk factors.
- 1973. The CBOE (Chicago Board Options Exchange) starts operating. Fischer Black and Myron Scholes publish an article on the rational pricing of options. [Black and Scholes, 1973]
- 1980s. Deregulation; globalization; mergers on unprecedented scale; advances in IT.


Growth of Markets

Example 1. Average daily trading volume at the New York Stock Exchange:
1970: 3.5 million shares; 1990: 40 million shares.

Example 2. Global market in OTC derivatives (nominal value):

                          1995           1998
  FOREX contracts         $13 trillion   $18 trillion
  Interest rate contracts $26 trillion   $50 trillion
  All types               $47 trillion   $80 trillion

Source: BIS; see [Crouhy et al., 2001]. $1 trillion = $1 x 10^12.


Disasters of the 1990s

The period 1993-1996 saw some spectacular derivatives-based losses:

- Orange County (1.7 billion US$)
- Metallgesellschaft (1.3 billion US$)
- Barings (1 billion US$)

Although, to be fair, classical banking produced its own large losses, e.g. 50 billion CHF of bad loans written off by the Big Three in the early nineties.


The Regulatory Process

- 1988. First Basel Accord takes the first steps toward an international minimum capital standard. Approach fairly crude and insufficiently differentiated.
- 1993. The birth of VaR. Seminal G-30 report addressing for the first time off-balance-sheet products (derivatives) in a systematic way. At the same time JPMorgan introduces the Weatherstone 4.15 daily market risk report, leading to the emergence of RiskMetrics.
- 1996. Amendment to Basel I allowing internal VaR models for market risk in larger banks.
- 2001 onwards. Second Basel Accord, focussing on credit risk but also putting operational risk on the agenda. Banks may opt for a more advanced, so-called internal-ratings-based approach to credit.

Basel II: What is New?

Rationale for the New Accord: more flexibility and risk sensitivity.

Structure of the New Accord: a three-pillar framework:
- Pillar 1: minimal capital requirements (risk measurement)
- Pillar 2: supervisory review of capital adequacy
- Pillar 3: public disclosure


Basel II Continued

Two options for the measurement of credit risk:
- Standard approach
- Internal-ratings-based approach (IRB)

Pillar 1 sets out the minimum capital requirements (Cooke ratio):

\[ \frac{\text{total amount of capital}}{\text{risk-weighted assets}} \geq 8\%, \qquad \text{MRC (minimum regulatory capital)} \stackrel{\text{def}}{=} 8\% \text{ of risk-weighted assets}. \]

Explicit treatment of operational risk.

A2. QRM: the Nature of the Challenge

In our book we have tried to contribute to the establishment of the new discipline of QRM. This has two main strands:

- Fixing the Foundations: putting current practice onto a firmer mathematical footing where, for example, concepts like profit-and-loss distributions, risk factors, risk measures, capital allocation and risk aggregation are given formal definitions. In doing this we have been guided by the consideration of what topics should form the core of a course on QRM for a wide audience of students.
- Going Beyond Current Practice: gathering material on techniques and tools which go beyond current practice and address some of the deficiencies that have been raised repeatedly by critics.


Extremes Matter

From the point of view of the risk manager, inappropriate use of the normal distribution can lead to an understatement of risk, which must be balanced against the significant advantage of simplification. From the central bank's corner, the consequences are even more serious because we often need to concentrate on the left tail of the distribution in formulating lender-of-last-resort policies. Improving the characterization of the distribution of extreme values is of paramount importance. [Alan Greenspan, Joint Central Bank Research Conference, 1995]


Extremes Matter II: LTCM

With globalisation increasing, you'll see more crises. Our whole focus is on the extremes now - what's the worst that can happen to you in any situation - because we never want to go through that [LTCM] again. [John Meriwether, The Wall Street Journal, 21st August 2000]

Much space is devoted in our book to models for financial risk factors that go beyond the normal (or Gaussian) model and attempt to capture the related phenomena of heavy tails, volatility and extreme values.


The Interdependence and Concentration of Risks

The multivariate nature of risk presents an important challenge. Whether we look at market risk or credit risk, or overall enterprise-wide risk, we are generally interested in some form of aggregate risk that depends on high-dimensional vectors of underlying risk factors, such as individual asset values in market risk, or credit spreads and counterparty default indicators in credit risk.

A particular concern in our multivariate modelling is the phenomenon of dependence between extreme outcomes, when many risk factors move against us simultaneously.


Dependent Extreme Values: LTCM

Extreme, synchronized rises and falls in financial markets occur infrequently but they do occur. The problem with the models is that they did not assign a high enough chance of occurrence to the scenario in which many things go wrong at the same time - the "perfect storm" scenario. [Business Week, September 1998]

In a perfect storm scenario the risk manager discovers that the diversification he thought he had is illusory; practitioners describe this also as a concentration of risk.


Concentration Risk

Over the last number of years, regulators have encouraged financial entities to use portfolio theory to produce dynamic measures of risk. VaR, the product of portfolio theory, is used for short-run, day-to-day profit-and-loss exposures. Now is the time to encourage the BIS and other regulatory bodies to support studies on stress test and concentration methodologies. Planning for crises is more important than VaR analysis. And such new methodologies are the correct response to recent crises in the financial industry. [Scholes, 2000]


The Problem of Scale

A further challenge in QRM is the typical scale of our portfolios, which at their most general may represent the entire position in risky assets of a financial institution. Calibration of detailed multivariate models for all risk factors is an almost impossible task and hence any sensible strategy involves dimension reduction, that is to say the identification of key risk drivers and a concentration on modelling the main features of the overall risk landscape with a fairly broad-brush approach.

This applies both to market risk and credit risk models. In the latter, factor models for default dependence are at least as important as detailed models of individual default.


Interdisciplinarity

The quantitative risk manager of the future should have a combined skillset that includes concepts, techniques and tools from many fields:

- mathematical finance;
- statistics and financial econometrics;
- actuarial mathematics;
- non-quantitative skills, especially communication skills;
- humility: QRM is a small piece of a bigger picture!


A3. Loss Distributions

To model risk we use the language of probability theory. Risks are represented by random variables mapping unforeseen future states of the world into values representing profits and losses.

The risks which interest us are aggregate risks. In general we consider a portfolio which might be

- a collection of stocks and bonds;
- a book of derivatives;
- a collection of risky loans;
- a financial institution's overall position in risky assets.

Portfolio Values and Losses

Consider a portfolio and let $V_t$ denote its value at time $t$; we assume this random variable is observable at time $t$. Suppose we look at risk from the perspective of time $t$ and we consider the time period $[t, t+1]$. The value $V_{t+1}$ at the end of the time period is unknown to us.

The distribution of $(V_{t+1} - V_t)$ is known as the profit-and-loss or P&L distribution. We denote the loss by $L_{t+1} = -(V_{t+1} - V_t)$. By this convention, losses will be positive numbers and profits negative. We refer to the distribution of $L_{t+1}$ as the loss distribution.


Risk Factors

Generally the loss $L_{t+1}$ for the period $[t, t+1]$ will depend on changes in a number of fundamental risk factors in the time period, such as stock prices and index values, yields and exchange rates. Writing $X_{t+1}$ for the vector of changes in underlying risk factors, the loss will be given by a formula of the form

\[ L_{t+1} = l_{[t]}(X_{t+1}), \]

where $l_{[t]} : \mathbb{R}^d \to \mathbb{R}$ is a known function which we call the loss operator. The book contains examples showing how the loss operator is derived for different kinds of portfolio, a process known as mapping.
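As an illustration of mapping, the loss operator for a simple stock portfolio can be sketched in code. This is a hypothetical example with made-up prices and holdings, not taken from the slides; the risk-factor changes x are log-returns and the portfolio is revalued exactly:

```python
import numpy as np

# Hypothetical two-stock portfolio (illustrative numbers).
S_t = np.array([100.0, 50.0])      # current stock prices
shares = np.array([10.0, 20.0])    # holdings of each stock

def loss_operator(x, prices=S_t, holdings=shares):
    """Exact loss l_[t](x) = -(V_{t+1} - V_t) for log-return shocks x."""
    v_t = holdings @ prices
    v_next = holdings @ (prices * np.exp(x))
    return -(v_next - v_t)

# A 1% drop in both stocks (log-return -0.01) gives a positive loss,
# consistent with the sign convention above.
x = np.array([-0.01, -0.01])
print(round(loss_operator(x), 2))
```

A linearized version (replacing exp(x) by 1 + x) would give the delta approximation often used in practice.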

Loss Distribution

The loss distribution is the distribution of $L_{t+1} = l_{[t]}(X_{t+1})$. But which distribution exactly?

- The conditional distribution of $L_{t+1}$ given $\mathcal{F}_t = \sigma(\{X_s : s \le t\})$, the history up to and including time $t$?
- The unconditional distribution under the assumption that $(X_t)$ form a stationary time series?

The conditional problem forces us to model the dynamics of the risk factors and is most suitable for market risk. The unconditional approach is used for longer time intervals and is also typical in credit portfolio management.

A4. Risk Measures

Risk measures attempt to quantify the riskiness of a portfolio. The most popular risk measures like VaR describe the right tail of the loss distribution of $L_{t+1}$ (or the left tail of the P&L).

We put aside the question of whether to look at the conditional or unconditional loss distribution and assume that this has been decided. Denote the distribution function of the loss $L := L_{t+1}$ by $F_L$, so that $P(L \le x) = F_L(x)$.


VaR and Expected Shortfall

Let $0 < \alpha < 1$. Value at Risk is defined as

\[ \mathrm{VaR}_\alpha = q_\alpha(F_L) = F_L^{\leftarrow}(\alpha), \tag{1} \]

where we use the notation $q_\alpha(F_L)$ or $q_\alpha(L)$ for a quantile of the distribution of $L$ and $F_L^{\leftarrow}$ for the (generalized) inverse of $F_L$. Provided $E(|L|) < \infty$, expected shortfall is defined as

\[ \mathrm{ES}_\alpha = \frac{1}{1-\alpha} \int_\alpha^1 q_u(F_L)\, du. \tag{2} \]
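Definitions (1) and (2) can be approximated empirically. The sketch below (an illustration, not from the slides) estimates VaR and ES at level alpha from a simulated heavy-tailed loss sample, using the sample quantile for (1) and the average of tail losses for (2):

```python
import numpy as np

def var_es(losses, alpha=0.95):
    """Empirical VaR (alpha-quantile) and ES (mean loss beyond VaR)."""
    losses = np.asarray(losses)
    var = np.quantile(losses, alpha)   # empirical alpha-quantile of the losses
    tail = losses[losses >= var]       # losses at or beyond the VaR level
    return var, tail.mean()

rng = np.random.default_rng(0)
L = rng.standard_t(df=3, size=100_000)  # heavy-tailed loss sample (t with 3 df)
var, es = var_es(L, 0.95)
print(var < es)                          # ES dominates VaR by construction
```

For a continuous loss distribution this tail average converges to the conditional expectation form of ES given later.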


[Figure: VaR in visual terms. Probability density of the loss distribution with mean loss = -2.4, 95% VaR = 1.6 and 95% ES = 3.3; the 5% tail probability beyond the VaR is shaded.]


[Figure: Losses and profits. The profit-and-loss (P&L) density, the mirror image of the loss distribution, with mean profit = 2.4 and 95% VaR = 1.6; the 5% probability lies in the left tail.]

Expected Shortfall

For continuous loss distributions expected shortfall is the expected loss, given that the VaR is exceeded. For any $\alpha \in (0, 1)$ we have

\[ \mathrm{ES}_\alpha = \frac{E(L;\, L \ge q_\alpha(L))}{1-\alpha} = E(L \mid L \ge \mathrm{VaR}_\alpha), \]

where we have used the notation $E(X; A) := E(X I_A)$ for a generic integrable rv $X$ and a generic set $A \in \mathcal{F}$. For a discontinuous loss df we have the more complicated expression

\[ \mathrm{ES}_\alpha = \frac{1}{1-\alpha} \big( E(L;\, L \ge q_\alpha) + q_\alpha (1 - \alpha - P(L \ge q_\alpha)) \big). \]

[Acerbi and Tasche, 2002]

Coherent Measures of Risk

There are many possible measures of the risk in a portfolio, such as VaR, ES or stress losses. To decide which are reasonable risk measures a systematic approach is called for.

New approach of Artzner et al. (1999):
- Give a list of properties (axioms) that a reasonable risk measure should have; such risk measures are called coherent.
- Study coherence of standard risk measures (VaR, ES, stress losses etc.).
- On a more theoretical level: characterize all coherent risk measures.

Goal: look at practically relevant aspects of this approach.

Purposes of Risk Measurement

Risk measures are used for the following purposes:

- Determination of risk capital. A risk measure gives the amount of capital needed as a buffer against (unexpected) future losses to satisfy a regulator.
- Management tool. Risk measures are used in internal limit systems.
- Insurance premia can be viewed as a measure of the riskiness of insured claims.

Our interpretation: a risk measure gives the amount of capital that needs to be added to a position with loss $L$, so that the position becomes acceptable to an (internal/external) regulator.

The Axioms

A coherent risk measure is a real-valued function $\rho$ on some space of rvs (representing losses) that fulfils the following 4 axioms:

1. Monotonicity. For two rvs with $L_1 \le L_2$ we have $\rho(L_1) \le \rho(L_2)$.
2. Subadditivity. For any $L_1, L_2$ we have $\rho(L_1 + L_2) \le \rho(L_1) + \rho(L_2)$.

Subadditivity is the most debated property. It is necessary for the following reasons:
- It reflects the idea that risk can be reduced by diversification and that a merger creates no extra risk.
- It makes decentralized risk management possible.
- If a regulator uses a non-subadditive risk measure, a financial institution could reduce risk capital by splitting into subsidiaries.

The Axioms II

3. Positive homogeneity. For $\lambda \ge 0$ we have that $\rho(\lambda L) = \lambda \rho(L)$. If there is no diversification we should have equality in the subadditivity axiom.
4. Translation invariance. For any $a \in \mathbb{R}$ we have that $\rho(L + a) = \rho(L) + a$.

Remarks: VaR is in general not coherent; ES (as we have defined it) is coherent. Non-subadditivity of VaR is relevant in the presence of skewed loss distributions (credit-risk management, derivative books), or if traders optimize against VaR.
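The non-subadditivity of VaR can be checked numerically. The sketch below uses a classic textbook-style construction with two independent defaultable bonds; the default probability 0.04 and the loss size 100 are illustrative numbers, not from the slides:

```python
import numpy as np

# Two independent bonds, each losing 100 with probability 0.04, else 0.
rng = np.random.default_rng(1)
n = 1_000_000
L1 = 100.0 * (rng.random(n) < 0.04)
L2 = 100.0 * (rng.random(n) < 0.04)

var = lambda L, a=0.95: np.quantile(L, a)

# Individually: P(Li > 0) = 0.04 < 0.05, so the 95% VaR of each bond is 0.
# Combined: P(L1 + L2 > 0) = 1 - 0.96**2 = 0.0784 > 0.05, so the 95% VaR
# of the sum is 100 > 0 + 0: VaR is superadditive for this skewed portfolio.
print(var(L1), var(L2), var(L1 + L2))
```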

B. Multivariate Risk Factor Models

1. Motivation: Multivariate Risk Factor Data
2. Basics of Multivariate Modelling
3. The Multivariate Normal Distribution
4. Normal Mixture Distributions
5. Generalized Hyperbolic Distributions
6. Dimension Reduction and Factor Models


B1. Motivation: Multivariate Risk Factor Data

Assume we have data on risk factor changes $X_1, \ldots, X_n$. These might be daily (log) returns in the context of market risk or longer-interval returns in credit risk (e.g. monthly/yearly asset value returns). What are appropriate multivariate models?

- Distributional models. In the unconditional approach to risk modelling we require appropriate multivariate distributions, which are calibrated under the assumption that the data come from a stationary time series.
- Dynamic models. In the conditional approach we use multivariate time series models that allow us to make risk forecasts.

This module concerns the first issue. A motivating example shows the kind of data features that particularly interest us.

[Figure: Bivariate daily return data. Time series of BMW and Siemens daily log-returns and a BMW-Siemens scatterplot. BMW and Siemens: 2000 daily (log) returns, 1985-1993.]

[Figure: Three extreme days. The same BMW and Siemens return series and scatterplot with three extreme days highlighted: 19.10.1987, 16.10.1989, 19.08.1991.]

History

- New York, 19th October 1987
- Berlin Wall, 16th October 1989
- The Kremlin, 19th August 1991

B2. Basics of Multivariate Modelling

A d-dimensional random vector of risk-factor changes $X = (X_1, \ldots, X_d)'$ has joint df

\[ F(x) = F(x_1, \ldots, x_d) = P(X_1 \le x_1, \ldots, X_d \le x_d). \]

The marginal dfs $F_i$ of the individual risks are given by

\[ F_i(x_i) = P(X_i \le x_i) = F(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty). \]

In some cases we work instead with joint survival functions

\[ \bar{F}(x) = \bar{F}(x_1, \ldots, x_d) = P(X_1 > x_1, \ldots, X_d > x_d), \]

and marginal survival functions $\bar{F}_i(x_i) = P(X_i > x_i) = \bar{F}(-\infty, \ldots, -\infty, x_i, -\infty, \ldots, -\infty)$.

Basics II

Densities. Joint densities $f(x) = f(x_1, \ldots, x_d)$, when they exist, are related to joint dfs by

\[ F(x_1, \ldots, x_d) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f(u_1, \ldots, u_d)\, du_1 \cdots du_d. \]

Independence. The components of $X$ are said to be mutually independent if and only if

\[ F(x) = \prod_{i=1}^{d} F_i(x_i), \quad x \in \mathbb{R}^d, \]

or, when $X$ possesses a joint density, if and only if

\[ f(x) = \prod_{i=1}^{d} f_i(x_i), \quad x \in \mathbb{R}^d. \]

Moments

The mean vector of $X$ is $E(X) = (E(X_1), \ldots, E(X_d))'$ and the covariance matrix is

\[ \mathrm{cov}(X) = E\big( (X - E(X))(X - E(X))' \big) \]

(assuming finiteness of moments in both cases). Writing $\Sigma$ for $\mathrm{cov}(X)$, the $(i,j)$th element of this matrix is $\sigma_{ij} = \mathrm{cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j)$.

The correlation matrix of $X$ is the matrix $P$ with $(i,j)$th element

\[ \rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}}, \]

the ordinary pairwise linear correlation of $X_i$ and $X_j$. Writing $\Delta = \mathrm{diag}(\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{dd}})$ we have $P = \Delta^{-1} \Sigma \Delta^{-1}$.

Moments II

Mean vectors and covariance matrices are extremely easily manipulated under linear operations on the vector $X$. For any matrix $B \in \mathbb{R}^{k \times d}$ and vector $b \in \mathbb{R}^k$ we have

\[ E(BX + b) = B E(X) + b, \qquad \mathrm{cov}(BX + b) = B\, \mathrm{cov}(X)\, B'. \tag{3} \]

Covariance matrices must be positive semi-definite; writing $\Sigma$ for $\mathrm{cov}(X)$ we see that (3) implies that $\mathrm{var}(a'X) = a' \Sigma a \ge 0$ for any $a \in \mathbb{R}^d$.


Estimators of Covariance and Correlation

Assumptions. We have data $X_1, \ldots, X_n$ which are either iid or at least serially uncorrelated, from a distribution with mean vector $\mu$, finite covariance matrix $\Sigma$ and correlation matrix $P$.

Standard method-of-moments estimators of $\mu$ and $\Sigma$ are the sample mean vector $\bar{X}$ and the sample covariance matrix $S$ defined by

\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})'. \]

The sample correlation matrix $R$ has $(i,j)$th element given by $r_{ij} = s_{ij}/\sqrt{s_{ii} s_{jj}}$. Writing $D = \mathrm{diag}(\sqrt{s_{11}}, \ldots, \sqrt{s_{dd}})$ we have $R = D^{-1} S D^{-1}$.
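A minimal sketch of these estimators in code, on simulated Gaussian data (the covariance matrix used here is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 3
mu = np.zeros(d)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=n)

Xbar = X.mean(axis=0)                    # sample mean vector
S = (X - Xbar).T @ (X - Xbar) / n        # sample covariance (1/n version, as on the slide)
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv @ S @ D_inv                    # sample correlation: R = D^-1 S D^-1
print(np.allclose(np.diag(R), 1.0))
```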

Properties of the Estimators?

Further properties of the estimators $\bar{X}$, $S$ and $R$ depend on the true multivariate distribution of the observations. They are not necessarily the best estimators of $\mu$, $\Sigma$ and $P$ in all situations, a point that is often forgotten in financial risk management where they are routinely used.

If our data are iid multivariate normal $N_d(\mu, \Sigma)$ then $\bar{X}$ and $S$ are the maximum likelihood estimators (MLEs) of the mean vector $\mu$ and covariance matrix $\Sigma$. Their behaviour as estimators is well understood and statistical inference concerning the model parameters is relatively unproblematic.

However, certainly at short time intervals such as daily data, the multivariate normal is not a good description of financial risk factor returns and other estimators of $\mu$ and $\Sigma$ may be better.

B3. Multivariate Normal (Gaussian) Distribution

This distribution has joint density

\[ f(x) = (2\pi)^{-d/2} |\Sigma|^{-1/2} \exp\left( -\frac{(x-\mu)' \Sigma^{-1} (x-\mu)}{2} \right), \]

where $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ is a positive definite matrix. If $X$ has density $f$ then $E(X) = \mu$ and $\mathrm{cov}(X) = \Sigma$, so that $\mu$ and $\Sigma$ are the mean vector and covariance matrix respectively. A standard notation is $X \sim N_d(\mu, \Sigma)$.

Clearly, the components of $X$ are mutually independent if and only if $\Sigma$ is diagonal. For example, $X \sim N_d(0, I)$ if and only if $X_1, \ldots, X_d$ are iid $N(0, 1)$.

[Figure: Bivariate standard normals. Simulated points and density surfaces for two bivariate standard normal distributions; in left plots ρ = 0.9, in right plots ρ = 0.7.]

Properties of Multivariate Normal Distribution

- The marginal distributions are univariate normal.
- Linear combinations $a'X = a_1 X_1 + \cdots + a_d X_d$ are univariate normal with distribution $a'X \sim N(a'\mu, a'\Sigma a)$.
- Conditional distributions are multivariate normal.
- The sum of squares $(X - \mu)' \Sigma^{-1} (X - \mu) \sim \chi^2_d$ (chi-squared).

Simulation:
1. Perform a Cholesky decomposition $\Sigma = AA'$.
2. Simulate iid standard normal variates $Z = (Z_1, \ldots, Z_d)'$.
3. Set $X = \mu + AZ$.
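The three simulation steps can be sketched as follows (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

A = np.linalg.cholesky(Sigma)          # step 1: Sigma = A A'
Z = rng.standard_normal((2, 100_000))  # step 2: iid standard normal variates
X = mu[:, None] + A @ Z                # step 3: X = mu + A Z  ~  N_2(mu, Sigma)

print(np.round(np.cov(X), 1))          # sample covariance, close to Sigma
```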

Testing for Multivariate Normality

If data are multivariate normal then the margins must be univariate normal. This can be assessed graphically with QQ-plots or tested formally with tests like Jarque-Bera or Anderson-Darling. However, normality of the margins is not sufficient; we must test joint normality.

There are numerical tests of multivariate normality (see book) or one can perform a graphical test by calculating

\[ (X_i - \bar{X})'\, S^{-1}\, (X_i - \bar{X}), \quad i = 1, \ldots, n. \]

These should form (approximately) a sample from a $\chi^2_d$ distribution, and this can be assessed with a QQ-plot. (QQ-plots compare empirical quantiles with theoretical quantiles of a reference distribution.)
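A sketch of the squared-Mahalanobis-distance computation behind this graphical test; instead of drawing a QQ-plot it just compares the sample mean of the squared distances with the chi-squared mean E(chi2_d) = d (simulated data, illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 20_000, 4
X = rng.standard_normal((n, d))        # multivariate normal data

Xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
S_inv = np.linalg.inv(S)
diff = X - Xbar
# d2[i] = (Xi - Xbar)' S^-1 (Xi - Xbar), approximately chi-squared with d df
d2 = np.einsum('ij,jk,ik->i', diff, S_inv, diff)

print(round(d2.mean(), 1))  # prints 4.0, matching E(chi2_4) = 4
```

In practice one would sort d2 and plot it against chi-squared quantiles; a heavy-tailed sample would bend away from the diagonal in the upper tail.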

Deficiencies of Multivariate Normal for Risk Factors

- Tails of univariate margins are very thin and generate too few extreme values.
- Simultaneous large values in several margins are relatively infrequent; the model cannot capture the phenomenon of joint extreme moves in several risk factors.
- Very strong symmetry (known as elliptical symmetry); reality suggests more skewness may often be present.


B4. Normal Mixture Distributions

Let $Z \sim N_d(0, I_d)$ be a vector of iid standard normal variates and $W$ be an independent, positive, scalar random variable. Let $\mu \in \mathbb{R}^d$ and $A \in \mathbb{R}^{d \times d}$ be a vector and a matrix of constants respectively. The vector $X$ given by

\[ X = \mu + \sqrt{W} A Z \tag{4} \]

is said to have a multivariate normal variance mixture distribution.

Easy calculations give $E(X) = \mu$ and $\mathrm{cov}(X) = E(W)\Sigma$, where $\Sigma := AA'$. Note that $X \mid W = w \sim N_d(\mu, w\Sigma)$.

The rv $W$ can be thought of as a common shock impacting the variances of all components. The distribution of $X$ is a mixture of normals but does not itself have a normal distribution.

Examples of Normal Variance Mixtures

- Two-point mixture:
\[ W = \begin{cases} k_1 & \text{with probability } p, \\ k_2 & \text{with probability } 1-p, \end{cases} \qquad k_1 > 0,\; k_2 > 0,\; k_1 \ne k_2. \]
Could be used to model two regimes: ordinary and extreme.
- Multivariate t: $W$ has an inverse gamma distribution, $W \sim \mathrm{Ig}(\nu/2, \nu/2)$. This gives a multivariate t with $\nu$ degrees of freedom. Equivalently $\nu/W \sim \chi^2_\nu$.
- Symmetric generalised hyperbolic: $W$ has a GIG (generalised inverse Gaussian) distribution.

The Multivariate t Distribution

This has density

\[ f(x) = k_{\nu,\Sigma,d} \left( 1 + \frac{(x-\mu)'\Sigma^{-1}(x-\mu)}{\nu} \right)^{-\frac{1}{2}(\nu+d)}, \]

where $\mu \in \mathbb{R}^d$, $\Sigma \in \mathbb{R}^{d \times d}$ is a positive definite matrix, $\nu$ is the degrees of freedom and $k_{\nu,\Sigma,d}$ is a normalizing constant.

If $X$ has density $f$ then $E(X) = \mu$ and $\mathrm{cov}(X) = \frac{\nu}{\nu-2}\Sigma$, so that $\mu$ and $\Sigma$ are the mean vector and dispersion matrix respectively. For finite variances/correlations we require $\nu > 2$. Notation: $X \sim t_d(\nu, \mu, \Sigma)$.

If $\Sigma$ is diagonal the components of $X$ are uncorrelated, but they are not independent. The multivariate t distribution has heavy tails.

[Figure: Bivariate normal and t. Simulated points and density surfaces for a bivariate normal and a bivariate t distribution; ρ = 0.7, ν = 3, variances all equal 1.]

[Figure: Fitted normal and t3 distributions. Simulated data (2000 points) from models fitted by maximum likelihood to the BMW-Siemens data; the t3 fit reproduces the joint extremes far better than the normal fit.]

Multivariate Normal Mean-Variance Mixtures

Normal variance mixtures are elliptically symmetric distributions. We can introduce asymmetry and skewness by considering the following more general mixture construction:

\[ X = \mu + W\gamma + \sqrt{W} A Z, \tag{5} \]

where $\gamma \in \mathbb{R}^d$ is a vector of asymmetry parameters and all other terms are as in (4). If $\gamma = 0$ then we are back in the elliptical variance mixture family.

Main example: when $W$ has a GIG distribution we get the generalized hyperbolic family.


Moments of Mean-Variance Mixtures

Since $X \mid W \sim N_d(\mu + W\gamma, W\Sigma)$ it follows that

\[ E(X) = E(E(X \mid W)) = \mu + E(W)\gamma, \tag{6} \]
\[ \mathrm{cov}(X) = E(\mathrm{cov}(X \mid W)) + \mathrm{cov}(E(X \mid W)) = E(W)\Sigma + \mathrm{var}(W)\gamma\gamma', \tag{7} \]

provided $W$ has finite variance. We observe from (6) and (7) that the parameters $\mu$ and $\Sigma$ are not in general the mean vector and covariance matrix of $X$. Note that a finite covariance matrix requires $\mathrm{var}(W) < \infty$, whereas the variance mixtures only require $E(W) < \infty$.


Generalised Inverse Gaussian (GIG) Distribution

The random variable $X$ has a generalised inverse Gaussian (GIG) distribution, written $X \sim N^-(\lambda, \chi, \psi)$, if its density is

\[ f(x) = \frac{\chi^{-\lambda}(\sqrt{\chi\psi})^{\lambda}}{2 K_\lambda(\sqrt{\chi\psi})}\, x^{\lambda-1} \exp\left( -\frac{1}{2}\left( \chi x^{-1} + \psi x \right) \right), \quad x > 0, \]

where $K_\lambda$ denotes a modified Bessel function of the third kind with index $\lambda$ and the parameters satisfy $\chi > 0$, $\psi \ge 0$ if $\lambda < 0$; $\chi > 0$, $\psi > 0$ if $\lambda = 0$; and $\chi \ge 0$, $\psi > 0$ if $\lambda > 0$. For more on this Bessel function see [Abramowitz and Stegun, 1965].

The GIG density actually contains the gamma and inverse gamma densities as special limiting cases, corresponding to $\chi = 0$ and $\psi = 0$ respectively. Thus, when $\psi = 0$ and $\gamma = 0$ the mixture distribution in (5) is multivariate t.

Sampling from Normal Mixture Distributions

It is straightforward to simulate normal mixtures:

1. Generate $Z \sim N_d(0, I_d)$.
2. Generate $W$ independently.
3. Set $X = \mu + W\gamma + \sqrt{W} A Z$.

Example: t distribution (and skewed t). We require $W \sim \mathrm{Ig}(\nu/2, \nu/2)$. Alternatively generate $V \sim \chi^2_\nu$ and set $W = \nu/V$.

Example: generalized hyperbolic distribution. To sample from the GIG distribution we can use an algorithm in [Atkinson, 1982]; see also the work of [Eberlein et al., 1998].
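The t-distribution recipe above can be sketched in code (here with $\gamma = 0$, so this is the symmetric multivariate t; the choices of $\nu$ and $\Sigma$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
nu, n = 5, 200_000
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.0]])
A = np.linalg.cholesky(Sigma)                # Sigma = A A'

V = rng.chisquare(nu, size=n)
W = nu / V                                   # W ~ Ig(nu/2, nu/2)
Z = rng.standard_normal((n, 2))
X = mu + np.sqrt(W)[:, None] * (Z @ A.T)     # X = mu + sqrt(W) A Z

# Check: cov(X) = E(W) Sigma = nu/(nu-2) Sigma for nu > 2,
# so rescaling by (nu-2)/nu should recover Sigma approximately.
print(np.round(np.cov(X, rowvar=False) * (nu - 2) / nu, 1))
```

Adding a `+ W[:, None] * gamma` term in the last construction line would give the skewed t of the mean-variance mixture (5).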

B5. Generalized Hyperbolic Distributions

The generalized hyperbolic density is

\[ f(x) = c\, \frac{K_{\lambda - d/2}\left( \sqrt{(\chi + Q(x;\mu,\Sigma))(\psi + \gamma'\Sigma^{-1}\gamma)} \right) \exp\left( (x-\mu)'\Sigma^{-1}\gamma \right)}{\left( \sqrt{(\chi + Q(x;\mu,\Sigma))(\psi + \gamma'\Sigma^{-1}\gamma)} \right)^{\frac{d}{2} - \lambda}}, \]

where $Q(x; \mu, \Sigma) = (x-\mu)'\Sigma^{-1}(x-\mu)$ and the normalising constant is

\[ c = \frac{(\sqrt{\chi\psi})^{-\lambda}\, \psi^\lambda\, (\psi + \gamma'\Sigma^{-1}\gamma)^{\frac{d}{2} - \lambda}}{(2\pi)^{\frac{d}{2}}\, |\Sigma|^{\frac{1}{2}}\, K_\lambda(\sqrt{\chi\psi})}. \]


Generalized Hyperbolic Distribution II

Notation: $X \sim \mathrm{GH}_d(\lambda, \chi, \psi, \mu, \Sigma, \gamma)$.

Closure under linear operations. If $X \sim \mathrm{GH}_d(\lambda, \chi, \psi, \mu, \Sigma, \gamma)$ and we consider $Y = BX + b$ where $B \in \mathbb{R}^{k \times d}$ and $b \in \mathbb{R}^k$, then $Y \sim \mathrm{GH}_k(\lambda, \chi, \psi, B\mu + b, B\Sigma B', B\gamma)$.

This means of course that marginal distributions are univariate generalized hyperbolic. A version of the variance-covariance method may be based on this family.


Special Cases

- If $\lambda = 1$ we get a multivariate distribution whose univariate margins are one-dimensional hyperbolic distributions, a model widely used in univariate analyses of financial return data.
- If $\lambda = -1/2$ then the distribution is known as a normal inverse Gaussian (NIG) distribution. This model has also been used in univariate analyses of return data; its functional form is similar to the hyperbolic with a slightly heavier tail.
- If $\lambda > 0$ and $\chi = 0$ we get a limiting case of the distribution known variously as a generalised Laplace, Bessel function or variance gamma distribution.
- If $\lambda = -\nu/2$, $\chi = \nu$ and $\psi = 0$ we get an asymmetric or skewed t distribution.

Empirical Experience with GH Family

The normal mixture structure of this family makes it possible to fit it with the EM algorithm [McNeil et al., 2005]. Our experience shows that skewed t and NIG models are useful special cases; often the elliptical special cases ($\gamma = 0$) cannot be rejected.

GBP, Euro, Yen, CHF against USD: returns 00-04 (log-likelihoods and p-values of likelihood-ratio tests against the GH model):

                 GH        NIG       Hyp       t         VG        Gauss
  Daily          17306.44  17306.43  17305.61  17304.97  17302.5   17144.38
    p-value      -         0.85      0.20      0.09      0.00      0.00
    gamma = 0    17303.10  17303.06  17302.15  17301.85  17299.15  -
    p-value      0.15      0.24      0.13      0.10      0.01      -
  Weekly         2890.65   2889.90   2889.65   2890.65   2888.98   2872.36
    p-value      -         0.22      0.16      1.00      0.07      0.00
    gamma = 0    2887.52   2886.74   2886.48   2887.52   2885.86   -
    p-value      0.18      0.17      0.14      0.28      0.09      -


Elliptical distributions

A random vector (X1, . . . , Xd) is spherical if its distribution is invariant under rotations, i.e. for all U ∈ R^{d×d} with U U' = U'U = I_d we have U X =d X.

A random vector (X1, . . . , Xd) is called elliptical if it is an affine transform of a spherical random vector (Y1, . . . , Yk): X = AY + b, A ∈ R^{d×k}, b ∈ R^d.

A normal variance mixture in (4) with γ = 0 and Σ = I_d is spherical; any normal variance mixture is elliptical.

References

[Barndorff-Nielsen and Shephard, 1998] (generalized hyperbolic distributions)
[Barndorff-Nielsen, 1997] (NIG distribution)
[Eberlein and Keller, 1995] (hyperbolic distributions)
[Prause, 1999] (GH distributions, PhD thesis)
[Fang et al., 1990] (elliptical distributions)


B6. Dimension Reduction and Factor Models

Idea: Explain the variability in a d-dimensional vector X in terms of a smaller set of common factors.

Definition: X follows a p-factor model if

X = a + BF + ε, (8)

where
(i) F = (F1, . . . , Fp)' is a random vector of factors with p < d,
(ii) ε = (ε1, . . . , εd)' is a random vector of idiosyncratic error terms, which are uncorrelated and have mean zero,
(iii) B ∈ R^{d×p} is a matrix of constant factor loadings and a ∈ R^d a vector of constants,
(iv) cov(F, ε) = E((F − E(F))ε') = 0.


Remarks on Theory of Factor Models

Factor model (8) implies that the covariance matrix Σ = cov(X) satisfies Σ = BΩB' + Υ, where Ω = cov(F) and Υ = cov(ε) (a diagonal matrix). Factors can always be transformed so that they are orthogonal:

Σ = BB' + Υ. (9)

Conversely, if (9) holds for the covariance matrix Σ of a random vector X, then X follows the factor model (8) for some a, F and ε. If, moreover, X is Gaussian then F and ε may be taken to be independent Gaussian vectors, so that ε has independent components.

Factor Models in Practice

We have multivariate financial return data X1, . . . , Xn which are assumed to follow (8). Two situations are to be distinguished:

1. Appropriate factor data F1, . . . , Fn are also observed, for example returns on relevant indices. We have a multivariate regression problem; the parameters (a and B) can be estimated by multivariate least squares.

2. Factor data are not directly observed. We assume the data X1, . . . , Xn are identically distributed and calibrate the factor model by one of two strategies:
- statistical factor analysis: we first estimate B and Υ from (9) and use these to reconstruct F1, . . . , Fn;
- principal components: we fabricate F1, . . . , Fn by PCA and estimate B and a by regression.
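In situation 1 the multivariate least-squares problem decouples into ordinary least squares, equation by equation. A minimal sketch for a single observed factor, using simulated data with assumed (hypothetical) loadings a = 0.1 and b = 2.0:

```python
import random

def fit_one_factor(x, f):
    # least-squares estimates of a and b in x_t = a + b * f_t + eps_t
    n = len(x)
    mx, mf = sum(x) / n, sum(f) / n
    cov = sum((xi - mx) * (fi - mf) for xi, fi in zip(x, f)) / n
    var_f = sum((fi - mf) ** 2 for fi in f) / n
    b = cov / var_f
    a = mx - b * mf
    return a, b

# synthetic factor and return data with known loadings
rng = random.Random(0)
f = [rng.gauss(0, 1) for _ in range(20_000)]
x = [0.1 + 2.0 * fi + rng.gauss(0, 0.5) for fi in f]
a_hat, b_hat = fit_one_factor(x, f)
```

With several factors the same idea applies with a multivariate regression; the one-factor case keeps the algebra visible.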

C. Copulas1. Basic Copula Primer 2. Copula-Based Dependence Measures 3. Normal Mixture Copulas 4. Archimedean Copulas 5. Fitting Copulas to Data


C1. Basic Copula Primer

- Copulas help in the understanding of dependence at a deeper level.
- They show us potential pitfalls of approaches to dependence that focus only on correlation.
- They allow us to define useful alternative dependence measures.
- They express dependence on a quantile scale, which is natural in QRM.
- They facilitate a bottom-up approach to multivariate model building.
- They are easily simulated and thus lend themselves to Monte Carlo risk studies.

What is a Copula?

A copula is a multivariate distribution function with standard uniform margins. Equivalently, a copula is any function C : [0, 1]^d → [0, 1] satisfying the following properties:

1. C(u1, . . . , ud) is increasing in each component ui.
2. C(1, . . . , 1, ui, 1, . . . , 1) = ui for all i ∈ {1, . . . , d}, ui ∈ [0, 1].
3. For all (a1, . . . , ad), (b1, . . . , bd) ∈ [0, 1]^d with ai ≤ bi we have

Σ_{i1=1}^{2} · · · Σ_{id=1}^{2} (−1)^{i1+···+id} C(u_{1,i1}, . . . , u_{d,id}) ≥ 0,

where u_{j,1} = aj and u_{j,2} = bj for all j ∈ {1, . . . , d}.

Probability and Quantile Transforms

Lemma 1 (probability transform): Let X be a random variable with continuous distribution function F. Then F(X) ~ U(0, 1) (standard uniform), since

P(F(X) ≤ u) = P(X ≤ F←(u)) = F(F←(u)) = u, u ∈ (0, 1).

Lemma 2 (quantile transform): Let U be uniform and F the distribution function of any rv X. Then F←(U) =d X, so that P(F←(U) ≤ x) = F(x).

These facts are the key to all statistical simulation and essential in dealing with copulas.
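Lemma 2 is easy to check by simulation. A minimal sketch using the exponential quantile function F←(u) = −log(1 − u)/λ (the rate λ = 2 here is an arbitrary choice):

```python
import math
import random

def exp_quantile(u, lam=1.0):
    # F_inverse(u) = -log(1 - u) / lam for the Exp(lam) distribution
    return -math.log(1.0 - u) / lam

# feeding uniforms through the quantile function yields Exp(2) samples
random.seed(42)
sample = [exp_quantile(random.random(), lam=2.0) for _ in range(100_000)]
mean = sum(sample) / len(sample)  # should be close to 1 / lam = 0.5
```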


Sklar's Theorem

Let F be a joint distribution function with margins F1, . . . , Fd. There exists a copula C such that for all x1, . . . , xd in [−∞, ∞]

F(x1, . . . , xd) = C(F1(x1), . . . , Fd(xd)).

If the margins are continuous then C is unique; otherwise C is uniquely determined on Ran F1 × Ran F2 × · · · × Ran Fd. Conversely, if C is a copula and F1, . . . , Fd are univariate distribution functions, then F defined above is a multivariate df with margins F1, . . . , Fd.


Sklar's Theorem: Proof in Continuous Case

Henceforth, unless explicitly stated, vectors X will be assumed to have continuous marginal distributions. In this case:

F(x1, . . . , xd) = P(X1 ≤ x1, . . . , Xd ≤ xd)
= P(F1(X1) ≤ F1(x1), . . . , Fd(Xd) ≤ Fd(xd))
= C(F1(x1), . . . , Fd(xd)).

The unique copula C can be calculated from F, F1, . . . , Fd using

C(u1, . . . , ud) = F( F1←(u1), . . . , Fd←(ud) ).


Copulas and Dependence Structures

Sklar's theorem shows how a unique copula C describes in a sense the dependence structure of the multivariate df of a random vector X. This motivates a further definition.

Definition (copula of X): The copula of (X1, . . . , Xd) is the df C of (F1(X1), . . . , Fd(Xd)).

Invariance: C is invariant under strictly increasing transformations of the marginal distributions. If T1, . . . , Td are strictly increasing, then (T1(X1), . . . , Td(Xd)) has the same copula as (X1, . . . , Xd).


The Fréchet Bounds

For every copula C(u1, . . . , ud) we have the important bounds

max{ Σ_{i=1}^{d} ui + 1 − d, 0 } ≤ C(u) ≤ min{u1, . . . , ud}. (10)

The upper bound is the df of (U, . . . , U) and the copula of a random vector X where Xi = Ti(X1) a.s. for increasing functions T2, . . . , Td. It represents perfect positive dependence or comonotonicity.

The lower bound is only a copula when d = 2. It is the df of the vector (U, 1 − U) and the copula of (X1, X2) where X2 = T(X1) a.s. for T decreasing. It represents perfect negative dependence or countermonotonicity.

The copula representing independence is C(u1, . . . , ud) = Π_{i=1}^{d} ui.

Parametric Copulas

There are basically two possibilities:

Copulas implicit in well-known parametric distributions. Sklar's Theorem states that we can always find a copula in a parametric distribution function. Denoting the df by F and assuming the margins F1, . . . , Fd are continuous, the implied copula is

C(u1, . . . , ud) = F( F1←(u1), . . . , Fd←(ud) ).

Such a copula may not have a simple closed form.

Closed-form parametric copula families generated by some explicit construction that is known to yield copulas. The best example is the well-known Archimedean copula family. These generally have limited numbers of parameters and limited flexibility; the standard Archimedean copulas are dfs of exchangeable random vectors.

Examples of Implicit Copulas

Gaussian copula:

C_P^Ga(u) = Φ_P( Φ⁻¹(u1), . . . , Φ⁻¹(ud) ),

where Φ denotes the standard univariate normal df, Φ_P denotes the joint df of X ~ N_d(0, P) and P is a correlation matrix. Write C_ρ^Ga when d = 2. P = I_d gives independence and P = J_d gives comonotonicity.

t copula:

C_{ν,P}^t(u) = t_{ν,P}( t_ν⁻¹(u1), . . . , t_ν⁻¹(ud) ),

where t_ν is the df of a standard univariate t distribution, t_{ν,P} is the joint df of the vector X ~ t_d(ν, 0, P) and P is a correlation matrix. Write C_{ν,ρ}^t when d = 2. P = J_d gives comonotonicity, but P = I_d does not give independence.

Examples of Explicit (Archimedean) Copulas

Gumbel copula:

C_θ^Gu(u1, . . . , ud) = exp( −[ (−log u1)^θ + · · · + (−log ud)^θ ]^{1/θ} ).

θ ≥ 1: θ = 1 gives independence; θ → ∞ gives comonotonicity.

Clayton copula:

C_θ^Cl(u1, . . . , ud) = ( u1^{−θ} + · · · + ud^{−θ} − d + 1 )^{−1/θ}.

θ > 0: θ → 0 gives independence; θ → ∞ gives comonotonicity.

Simulating Copulas

Simulating the Gaussian copula C_P^Ga:
- Simulate X ~ N_d(0, P).
- Set U = (Φ(X1), . . . , Φ(Xd)) (probability transformation).

Simulating the t copula C_{ν,P}^t:
- Simulate X ~ t_d(ν, 0, P).
- Set U = (t_ν(X1), . . . , t_ν(Xd)) (probability transformation), where t_ν is the df of the univariate t distribution.

Simulation of Archimedean copulas is less obvious, but also turns out to be fairly simple in the majority of cases.
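The Gaussian recipe above can be sketched in a few lines. For d = 2 the Cholesky factor of P is written out explicitly (ρ = 0.7 is an arbitrary choice):

```python
import math
import random

def phi(x):
    # standard normal df via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rgauss_copula(rho, n, seed=1):
    # sample (U1, U2) from the bivariate Gaussian copula C^Ga_rho
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = z1
        x2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(x1, x2) = rho
        out.append((phi(x1), phi(x2)))                   # probability transform
    return out

u = rgauss_copula(0.7, 50_000)
mean_u1 = sum(p[0] for p in u) / len(u)  # margin should be standard uniform
```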

Simulating Copulas II

[Figure: scatterplots of simulated points from four bivariate copulas (Gaussian, Gumbel, Clayton, t4). Gauss: ρ = 0.7, Gumbel: θ = 2, Clayton: θ = 2.2, t: ρ = 0.71, ν = 4.]

Meta-Distributions and Their Simulation

By the converse of Sklar's Theorem we know that if C is a copula and F1, . . . , Fd are univariate dfs, then F(x) = C(F1(x1), . . . , Fd(xd)) is a multivariate df with margins F1, . . . , Fd. We refer to F as a meta-distribution with the dependence structure represented by C. For example, if C is a Gaussian copula we get a meta-Gaussian distribution and if C is a t copula we get a meta-t distribution.

If we can sample from the copula C, then it is easy to sample from F: we generate a vector (U1, . . . , Ud) with df C and then return (F1←(U1), . . . , Fd←(Ud)).
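A sketch of this two-step recipe: sample (U1, U2) from a Gaussian copula (ρ = 0.5, chosen arbitrarily), then push through exponential quantile functions to obtain a meta-Gaussian distribution with Exp(1) and Exp(3) margins:

```python
import math
import random

def phi(x):
    # standard normal df
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(3)
rho = 0.5
sample = []
for _ in range(50_000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    u1, u2 = phi(z1), phi(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    # quantile transforms set the margins while the copula stays Gaussian
    sample.append((-math.log(1.0 - u1) / 1.0, -math.log(1.0 - u2) / 3.0))

mean1 = sum(s[0] for s in sample) / len(sample)  # Exp(1) margin: mean 1
mean2 = sum(s[1] for s in sample) / len(sample)  # Exp(3) margin: mean 1/3
```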


Simulating Meta-Gaussian Distributions

[Figure: samples from four bivariate meta-distributions with normal margins and Gaussian, Gumbel, Clayton and t4 copulas. Linear correlation ρ(X1, X2) ≈ 0.7 in all cases.]

A Note on Symmetry of Copulas

Exchangeability: A copula is exchangeable if it is the df of an exchangeable random vector U satisfying (U1, . . . , Ud) =d (U_{Π(1)}, . . . , U_{Π(d)}) for any permutation (Π(1), . . . , Π(d)) of (1, . . . , d). The copula must satisfy C(u1, . . . , ud) = C(u_{Π(1)}, . . . , u_{Π(d)}). Examples: Clayton, Gumbel, or Gauss and t when P is an equicorrelation matrix.

Radial symmetry: A copula is radially symmetric if it is the df of a random vector U satisfying (U1, . . . , Ud) =d (1 − U1, . . . , 1 − Ud). Examples: Gauss and t.

Copula References

[Embrechts et al., 2002] (dependence in QRM)
[Joe, 1997] (on dependence in general)
[Nelsen, 1999] (standard reference on bivariate copulas)
[Daul et al., 2003] (summary article)
[Cherubini et al., 2004] (copulas in finance)


C2. Copula-Based Dependence Measures

Consider a pair of random variables (X1, X2) with continuous marginal distributions F1 and F2 and unique copula C. In this section we consider scalar measures of dependence for (X1, X2) which depend only on C, and not on the marginal distributions.

We consider coefficients of tail dependence, which provide a way of comparing the extremal dependence properties of copulas, i.e. the amount of dependence in the joint tails of the bivariate distribution.

We consider rank correlations, which turn out to be useful in the calibration of copulas to empirical data. (They are more useful than the standard linear correlation, which is not a copula-based measure.)


Tail Dependence or Extremal Dependence

When the limit exists, the coefficient of upper tail dependence is

λu(X1, X2) = lim_{q→1} P( X2 > F2←(q) | X1 > F1←(q) ).

Analogously, the coefficient of lower tail dependence is

λl(X1, X2) = lim_{q→0} P( X2 ≤ F2←(q) | X1 ≤ F1←(q) ).

These are functions of the copula given by

λu = lim_{q→1} (1 − 2q + C(q, q)) / (1 − q),
λl = lim_{q→0} C(q, q) / q.

Tail Dependence

Clearly λu ∈ [0, 1] and λl ∈ [0, 1]. For copulas of elliptically symmetric distributions λu = λl =: λ. This is true, more generally, for all copulas with radial symmetry.

Terminology:
- λu ∈ (0, 1]: upper tail dependence; λu = 0: asymptotic independence in upper tail.
- λl ∈ (0, 1]: lower tail dependence; λl = 0: asymptotic independence in lower tail.


Examples of Tail Dependence

The Gaussian copula is asymptotically independent for |ρ| < 1.

The t copula is tail dependent when ρ > −1:

λ = 2 t_{ν+1}( −√( (ν + 1)(1 − ρ) / (1 + ρ) ) ).

The Gumbel copula is upper tail dependent for θ > 1: λu = 2 − 2^{1/θ}.

The Clayton copula is lower tail dependent for θ > 0: λl = 2^{−1/θ}.

All formulas are derived in the book.
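The Gumbel and Clayton formulas can be checked directly against the limit expressions for λu and λl, since both copulas have closed forms (θ = 2 is an arbitrary choice):

```python
import math

def gumbel_C(u1, u2, theta):
    # bivariate Gumbel copula
    return math.exp(-(((-math.log(u1)) ** theta + (-math.log(u2)) ** theta) ** (1.0 / theta)))

def clayton_C(u1, u2, theta):
    # bivariate Clayton copula
    return (u1 ** -theta + u2 ** -theta - 1.0) ** (-1.0 / theta)

theta = 2.0
q_hi, q_lo = 1.0 - 1e-6, 1e-6

# lambda_u = lim_{q -> 1} (1 - 2q + C(q, q)) / (1 - q) for the Gumbel copula
lam_u_num = (1.0 - 2.0 * q_hi + gumbel_C(q_hi, q_hi, theta)) / (1.0 - q_hi)
lam_u_formula = 2.0 - 2.0 ** (1.0 / theta)

# lambda_l = lim_{q -> 0} C(q, q) / q for the Clayton copula
lam_l_num = clayton_C(q_lo, q_lo, theta) / q_lo
lam_l_formula = 2.0 ** (-1.0 / theta)
```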

Rank Correlation

Spearman's rho:

ρS(X1, X2) = ρ( F1(X1), F2(X2) ) = ρ(copula)

ρS(X1, X2) = 12 ∫0^1 ∫0^1 { C(u1, u2) − u1 u2 } du1 du2.

Kendall's tau: Take an independent copy of (X1, X2) denoted (X̃1, X̃2). Then

ρτ(X1, X2) = 2 P( (X1 − X̃1)(X2 − X̃2) > 0 ) − 1

ρτ(X1, X2) = 4 ∫0^1 ∫0^1 C(u1, u2) dC(u1, u2) − 1.

Properties of Rank Correlation

The following statements are true for Spearman's rho (ρS) or Kendall's tau (ρτ), but not for Pearson's linear correlation (ρ):

- ρS depends only on the copula of (X1, X2).
- ρS is invariant under strictly increasing transformations of the random variables.
- ρS(X1, X2) = 1 ⟺ X1, X2 comonotonic.
- ρS(X1, X2) = −1 ⟺ X1, X2 countermonotonic.


Sample Rank Correlations

Consider iid bivariate data {(X1,1, X1,2), . . . , (Xn,1, Xn,2)}. The standard estimator of ρτ(X1, X2) is

ρ̂τ = (n choose 2)^{-1} Σ_{1≤i<j≤n} sgn[ (Xi,1 − Xj,1)(Xi,2 − Xj,2) ].

C4. Archimedean Copulas: LT-Archimedean Generators

An Archimedean generator φ is called LT-Archimedean if its inverse φ^{-1} is completely monotonic, i.e. (−1)^k (d^k/dt^k) φ^{-1}(t) ≥ 0 for all k ∈ N and t > 0.
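The estimator ρ̂τ can be sketched directly from its definition; comonotonic and countermonotonic data provide an immediate sanity check:

```python
from itertools import combinations

def sgn(x):
    return (x > 0) - (x < 0)

def kendall_tau(pairs):
    # average of sgn((Xi1 - Xj1)(Xi2 - Xj2)) over all pairs i < j
    n = len(pairs)
    s = sum(sgn((a1 - b1) * (a2 - b2)) for (a1, a2), (b1, b2) in combinations(pairs, 2))
    return s / (n * (n - 1) / 2)

# comonotonic data should give tau = 1, countermonotonic data tau = -1
xs = [0.3, 1.2, 2.7, 3.1, 4.8]
como = [(x, x ** 3) for x in xs]   # increasing transform of the same variable
counter = [(x, -x) for x in xs]    # decreasing transform
tau_como = kendall_tau(como)
tau_counter = kendall_tau(counter)
```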

The class of such functions coincides with the class of Laplace transforms of dfs G on R+ satisfying G(0) = 0. Recall that the Laplace transform Ĝ of G is given by

Ĝ(t) = ∫0^∞ e^{−tx} dG(x), t ≥ 0.

For this reason we refer to generators with completely monotonic inverses as LT-Archimedean generators and the resulting copulas as LT-Archimedean copulas.

Construction of an LT-Archimedean Copula

Let V be a rv with df G and let U1, . . . , Ud be conditionally independent given V with conditional distribution function given by

P(Ui ≤ u | V = v) = exp( −v Ĝ^{-1}(u) ), u ∈ [0, 1].

The distribution of (U1, . . . , Ud) is an LT-Archimedean copula:

F(u1, . . . , ud) = P(U1 ≤ u1, . . . , Ud ≤ ud)
= ∫0^∞ P(U1 ≤ u1, . . . , Ud ≤ ud | V = v) dG(v)
= ∫0^∞ e^{−v (Ĝ^{-1}(u1) + · · · + Ĝ^{-1}(ud))} dG(v)
= Ĝ( Ĝ^{-1}(u1) + · · · + Ĝ^{-1}(ud) ). [Marshall and Olkin, 1988]

Sampling LT-Archimedean Copulas

For any LT-Archimedean copula generator φ there exists a distribution function G on R+ with G(0) = 0 such that Ĝ = φ^{-1}. The main practical problems are (i) to find the G and (ii) to sample from the corresponding distribution. If this can be achieved we then have the algorithm:

1. Generate a variate V with df G such that Ĝ = φ^{-1}.
2. Generate independent uniform variates X1, . . . , Xd.
3. Return (U1, . . . , Ud) = ( Ĝ(−ln(X1)/V ), . . . , Ĝ(−ln(Xd)/V ) ).


Special Cases

Clayton copula: We generate a gamma variate V ~ Ga(1/θ, 1) with θ > 0. The df of V has Laplace transform Ĝ(t) = (1 + t)^{−1/θ}.

Gumbel copula: We generate a positive stable variate V ~ St(1/θ, 1, γ, 0), where γ = (cos(π/(2θ)))^θ and θ > 1. This df has Laplace transform Ĝ(t) = exp(−t^{1/θ}), as desired.

For definitions of these distributions, consult the book.
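For the Clayton case the full sampling algorithm fits in a few lines using the gamma generator in the Python standard library (θ = 2 and the sample size are arbitrary choices). The output can be checked against the known Clayton value ρτ = θ/(θ + 2):

```python
import math
import random

def rclayton(theta, d, n, seed=11):
    # Marshall-Olkin algorithm: V ~ Ga(1/theta, 1) has Laplace transform
    # Ghat(t) = (1 + t)^(-1/theta), so U_i = Ghat(-ln(X_i) / V)
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        v = rng.gammavariate(1.0 / theta, 1.0)
        u = tuple(
            (1.0 - math.log(1.0 - rng.random()) / v) ** (-1.0 / theta)
            for _ in range(d)
        )
        out.append(u)
    return out

theta = 2.0
sample = rclayton(theta, d=2, n=2000)

def kendall_tau(pairs):
    # O(n^2) sample version of Kendall's tau
    n = len(pairs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

tau_hat = kendall_tau(sample)  # Clayton: rho_tau = theta / (theta + 2) = 0.5
```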


Simulating Gumbel Copula

[Figure: pairwise scatterplots of 1000 points from a 4-dimensional Gumbel copula with θ = 2.]

Partially Exchangeable Archimedean Copulas

By mixing generators it is possible to create copulas with group structure. A possible 3-dimensional construction is

C(u1, u2, u3) = φ2^{-1}( φ2 ∘ φ1^{-1}( φ1(u1) + φ1(u2) ) + φ2(u3) ).

1. φ1 and φ2 are LT-Archimedean generators.
2. The derivative of φ2 ∘ φ1^{-1} must be completely monotonic.
3. All bivariate margins are bivariate Archimedean copulas.
4. If (U1, U2, U3) have this df then the pair (U1, U2) are, roughly speaking, more dependent than the pairs (U1, U3) and (U2, U3).

C5. Fitting Copulas to Data

We have data vectors X1, . . . , Xn with identical distribution function F. We write Xt = (Xt,1, . . . , Xt,d)' for an individual data vector and X = (X1, . . . , Xd)' for a generic random vector with df F. We assume further that this df F has continuous margins F1, . . . , Fd and thus by Sklar's theorem a unique representation F(x) = C(F1(x1), . . . , Fd(xd)). This module is devoted to the problem of estimating the parameters θ of a parametric copula C_θ.

The main method we consider is maximum likelihood estimation, but we first outline a simpler method-of-moments procedure using sample rank correlation estimates; this method has the advantage that marginal distributions do not need to be estimated, so that inference about the copula is margin-free.

Method-of-Moments Using Rank Correlation

Recall the standard estimators of Kendall's rank correlation and Spearman's rank correlation. We will use the notation R^τ and R^S to denote matrices of pairwise estimates. These can be shown to be positive semi-definite (see book).

Calibrating the Gauss copula with Spearman's rho: Suppose we assume a meta-Gaussian model for X with copula C_P^Ga and we wish to estimate the correlation matrix P. It follows from Theorem 5.36 in the book that

ρS(Xi, Xj) = (6/π) arcsin( ρij / 2 ) ≈ ρij,

where the final approximation is very accurate. This suggests we estimate P by the matrix of pairwise Spearman's rank correlation coefficients R^S.

Calibrating the t Copula with Kendall's tau

Suppose we assume a meta-t model for X with copula C_{ν,P}^t and we wish to estimate the correlation matrix P. The theoretical relationship between Spearman's rho and P is not known in this case, but a relationship between Kendall's tau and P is known. It follows from Proposition 5.37 in the book that

ρτ(Xi, Xj) = (2/π) arcsin( ρij ),

so that a possible estimator of P is the matrix R* with components given by r*ij = sin( π r̂ij^τ / 2 ). This may not be positive definite, in which case R* can be transformed by the eigenvalue method given in Algorithm 5.55 to obtain a positive definite matrix that is close to R*. The remaining parameter ν of the copula could then be estimated by maximum likelihood.
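The inversion r*ij = sin(π r̂ij^τ / 2) is a one-liner; the tau values below are hypothetical estimates used only for illustration:

```python
import math

def tau_to_rho(tau):
    # invert rho_tau = (2 / pi) * arcsin(rho) to get rho = sin(pi * tau / 2)
    return math.sin(math.pi * tau / 2.0)

# hypothetical pairwise Kendall's tau estimates for three risk factors
tau_matrix = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
p_hat = [[tau_to_rho(t) for t in row] for row in tau_matrix]
```

A positive-definiteness check (and, if needed, the eigenvalue adjustment) would follow in a full implementation.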

Maximum Likelihood Method

To estimate the copula by ML we require a so-called pseudo-sample of observations from the copula. To construct such a sample we are required to estimate the marginal distributions. This can be done with

1. parametric models F̂1, . . . , F̂d,
2. a form of the empirical distribution function, such as F̂j(x) = (1/(n + 1)) Σ_{i=1}^n 1{Xi,j ≤ x}, j = 1, . . . , d,
3. an empirical df with an EVT tail model.

The second method, known as pseudo-maximum likelihood, means that we essentially work with the ranks of the original data, standardized to lie on the copula scale. For statistical properties see [Genest and Rivest, 1993].

Estimating the Copula

We form the pseudo-sample

Ûi = (Ûi,1, . . . , Ûi,d) = ( F̂1(Xi,1), . . . , F̂d(Xi,d) ), i = 1, . . . , n,

and fit the parametric copula C_θ by maximum likelihood. The copula density is c(u1, . . . , ud; θ) = ∂^d C(u1, . . . , ud; θ) / ∂u1 · · · ∂ud, where θ denotes the unknown parameters. The log-likelihood is

l(θ; Û1, . . . , Ûn) = Σ_{i=1}^n log c(Ûi,1, . . . , Ûi,d; θ).

Independence of vector observations is assumed for simplicity.


BMW-Siemens Example I

[Figure: scatterplot of the BMW-Siemens pseudo-sample from the copula after estimation of the margins.]

BMW-Siemens Example II

Copula    Estimate(s)   Std. error(s)   Log-likelihood
Gauss     0.70          0.0098          610.39
t         0.70, 4.89    0.0122, 0.73    649.25
Gumbel    1.90          0.0363          584.46
Clayton   1.42          0.0541          527.46

Goodness-of-fit: Akaike's criterion (AIC) suggests choosing the model that minimizes AIC = 2p − 2(log-likelihood), where p = number of parameters of the model. This is clearly the t model.

Remark: Formal methods for goodness-of-fit are also available.

Dow Jones Example

[Figure: pairwise scatterplots of the pseudo-sample from the copula of T, GE, IBM, MCD and MSFT returns after estimation of the margins.]

Dow Jones Example II

[Figure: profile log-likelihood for the t copula parameter ν. Daily returns on ATT, General Electric, IBM, McDonalds, Microsoft. The form of the likelihood in ν indicates non-Gaussian dependence.]

II: Modelling Extreme Risks

A. Extreme Value Theory
B. Threshold Models
C. More Advanced Topics in Extremes


A. Extreme Value Theory: Maxima

1. Limiting Behaviour of Maxima
2. The Fisher-Tippett Theorem
3. Maximum Domains of Attraction
4. Maxima of Dependent Data
5. The Block Maxima Method


A1. Limiting Behaviour of Maxima

Broadly speaking, there are two principal kinds of model for extreme values. The most traditional are models for block maxima, which are the subject of this module. These are models for the largest observations collected from large samples of identically distributed observations. The asymptotic theory of maxima is the main subject of classical EVT.

A more modern and powerful group of models are the models for threshold exceedances. These are models for all large observations that exceed some high level, and are generally considered to be the most useful for practical applications, due to their more efficient use of the (often limited) data on extreme outcomes.


Notation for Study of Maxima

Let X1, X2, . . . be iid random variables with distribution function (df) F. In risk management applications these could represent financial losses, operational losses or insurance losses. Let Mn = max(X1, . . . , Xn) be the maximum loss in a sample of n losses. Clearly

P(Mn ≤ x) = P(X1 ≤ x, . . . , Xn ≤ x) = F^n(x).

It can be shown that, almost surely, Mn → xF as n → ∞, where xF := sup{x ∈ R : F(x) < 1} is the right endpoint of F. But what about normalized maxima?


Limiting Behaviour of Sums or Averages

(See [Embrechts et al., 1997], Chapter 2.) We are familiar with the central limit theorem. Let X1, X2, . . . be iid with finite mean μ and finite variance σ². Let Sn = X1 + X2 + . . . + Xn. Then

P( (Sn − nμ) / (σ√n) ≤ x ) → Φ(x), n → ∞,

where Φ is the distribution function of the standard normal distribution,

Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du.

Note, more generally, the limiting distributions for appropriately normalized sample sums are the class of stable distributions; the Gaussian distribution is a special case.

Limiting Behaviour of Sample Maxima

(See [Embrechts et al., 1997], Chapter 3.) Let X1, X2, . . . be iid from F and let Mn = max(X1, . . . , Xn). Suppose we can find sequences of real numbers cn > 0 and dn such that (Mn − dn)/cn, the sequence of normalized maxima, converges in distribution, i.e.

P( (Mn − dn)/cn ≤ x ) = F^n(cn x + dn) → H(x), n → ∞,

for some non-degenerate df H(x). If this condition holds we say that F is in the maximum domain of attraction of H, abbreviated F ∈ MDA(H). Note that such an H is determined up to location and scale, i.e. will specify a unique type of distribution.

Types of Distributions

Definition (equality in type): Two random variables U and V are of the same type if U =d aV + b for some a > 0, b ∈ R. In terms of their dfs FU and FV this means FU(x) = FV((x − b)/a). Thus random variables of the same type have the same df, up to possible changes of scale and location.


A2. The Fisher-Tippett Theorem

The generalized extreme value (GEV) distribution has df

H_ξ(x) = exp( −(1 + ξx)^{−1/ξ} ), ξ ≠ 0,
H_ξ(x) = exp( −e^{−x} ), ξ = 0,

where 1 + ξx > 0 and ξ is the shape parameter. Note, this parametrization is continuous in ξ. We introduce location and scale parameters μ and σ > 0 and work with H_{ξ,μ,σ}(x) := H_ξ( (x − μ)/σ ). Clearly H_{ξ,μ,σ} is of type H_ξ.
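A minimal implementation of H_{ξ,μ,σ}, treating ξ = 0 as the Gumbel limit and handling the support restriction 1 + ξ(x − μ)/σ > 0:

```python
import math

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    # H_{xi,mu,sigma}(x) = H_xi((x - mu) / sigma); xi = 0 is the Gumbel case
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # below the lower endpoint (xi > 0) or above the upper endpoint (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

gumbel_at_zero = gev_cdf(0.0, 0.0)   # exp(-1)
near_zero_xi = gev_cdf(1.0, 1e-10)   # continuity in xi near 0
```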

GEV Distribution Functions and Densities

[Figure: GEV dfs H(x) and densities h(x). Solid line corresponds to ξ = 0 (Gumbel); dotted line is ξ = 0.5 (Fréchet); dashed line is ξ = −0.5 (Weibull); μ = 0 and σ = 1.]

Fisher-Tippett Theorem (1928)

Theorem: If F ∈ MDA(H) then H is of the type H_ξ for some ξ. If suitably normalized maxima converge in distribution to a non-degenerate limit, then the limit distribution must be an extreme value distribution.

Remark 1: Essentially all commonly encountered continuous distributions are in the maximum domain of attraction of an extreme value distribution.

Remark 2: We can always choose normalizing sequences cn and dn so that the limit law H_ξ appears in standard form (without relocation or rescaling).


A3. Maximum Domains of Attraction

When does F ∈ MDA(H_ξ) hold?

Fréchet case (ξ > 0): Gnedenko (1943) showed that for ξ > 0

F ∈ MDA(H_ξ) ⟺ 1 − F(x) = x^{−1/ξ} L(x),

for some slowly varying function L(x). A function L on (0, ∞) is slowly varying if

lim_{x→∞} L(tx)/L(x) = 1, t > 0.

Summary: The distributions in this class have heavy tails that decay like power functions. Not all moments are finite. Examples are the Pareto, log-gamma, F, t and Cauchy distributions.

Maximum Domains of Attraction II

Gumbel case (ξ = 0): This class contains distributions with tails that decay faster than power tails, for example exponentially. They are lighter-tailed distributions for which all moments are finite. Examples are the normal, lognormal, exponential and gamma distributions.

Weibull case (ξ < 0): For ξ < 0, F ∈ MDA(H_ξ) if and only if xF < ∞ and 1 − F(xF − x^{−1}) = x^{1/ξ} L(x), for some slowly varying function L(x). Examples are short-tailed distributions like the uniform and beta.


Examples

Recall: F ∈ MDA(H_ξ), ξ ∈ R, iff there are sequences cn and dn with

P( (Mn − dn)/cn ≤ x ) = F^n(cn x + dn) → H_ξ(x).

We have the following examples:

The exponential distribution, F(x) = 1 − e^{−λx}, λ > 0, x ≥ 0, is in MDA(H0) (Gumbel case). Take cn = 1/λ, dn = (log n)/λ.

The Pareto distribution, F(x) = 1 − ( κ/(κ + x) )^α, α, κ > 0, x ≥ 0, is in MDA(H_{1/α}) (Fréchet case). Take cn = κ n^{1/α}/α, dn = κ n^{1/α} − κ.
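The exponential example can be checked by simulation: block maxima of Exp(λ) variables, normalized with cn = 1/λ and dn = (log n)/λ, should be approximately Gumbel (block size and number of blocks here are arbitrary choices):

```python
import math
import random

rng = random.Random(5)
lam, block, m = 1.0, 500, 2000
norm_maxima = []
for _ in range(m):
    mx = max(rng.expovariate(lam) for _ in range(block))
    # (Mn - dn) / cn with dn = log(n) / lam and cn = 1 / lam
    norm_maxima.append((mx - math.log(block) / lam) * lam)

# empirical df at x = 0 versus the Gumbel limit H0(0) = exp(-1)
emp = sum(1 for v in norm_maxima if v <= 0.0) / m
gumbel = math.exp(-1.0)
```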

A4. Maxima of Dependent Data

Let (Xi)_{i∈Z} be a (strictly) stationary time series; let (X̃i)_{i∈Z} be an associated iid series with the same marginal distribution F. Let Mn = max(X1, . . . , Xn) and M̃n = max(X̃1, . . . , X̃n) be the respective maxima of n-blocks. For many processes (Xi)_{i∈Z} it may be shown that there exists a real number θ in (0, 1] such that

lim_{n→∞} P{ (M̃n − dn)/cn ≤ x } = H(x)

for a non-degenerate limit H(x) if and only if

lim_{n→∞} P{ (Mn − dn)/cn ≤ x } = H^θ(x).

Extremal Index

The value θ is known as the extremal index of the process (to be distinguished from the tail index of distributions in the Fréchet class). For processes with an extremal index, normalized block maxima converge in distribution provided that maxima of the associated iid process converge in distribution, that is, provided the underlying distribution F is in MDA(H_ξ) for some ξ. H^θ(x) is a distribution of the same type as H(x). It is also a GEV distribution with exactly the same shape parameter ξ. Only the location and scaling of the distribution change.


Interpretation of Extremal Index

Writing u = cn x + dn we observe that, for large enough n,

P(Mn ≤ u) ≈ P(M̃n ≤ u)^θ = F^{nθ}(u),

so that at high levels the probability distribution of the maximum of n observations from the time series with extremal index θ is like that of the maximum of nθ < n observations from the associated iid series. In a sense, nθ can be thought of as counting the number of roughly independent clusters of observations in n observations, and θ is often interpreted as the reciprocal of the mean cluster size.


Extremal Index: Examples

Not every strictly stationary process has an extremal index (EKM, page 418), but for the kinds of time series processes that interest us in financial modelling an extremal index generally exists. We distinguish between the cases when θ = 1 and the cases when θ < 1; for the former there is no tendency for extremes to cluster at high levels, and large sample maxima behave exactly like maxima from similarly-sized iid samples.

- Strict white noise processes (iid rvs) have extremal index θ = 1.
- ARMA processes with iid Gaussian innovations have θ = 1 (EKM, pages 216-218). However, if the innovation distribution is in MDA(H_ξ) for ξ > 0, then θ < 1 (EKM, page 415).
- ARCH and GARCH processes have θ < 1 (EKM, pages 476-480).

A5. The Block Maxima Method

Assume that we have a large enough block of n iid random variables so that the limit result is more or less exact, i.e. there exist cn > 0, dn ∈ R such that, for some ξ,

P( (Mn − dn)/cn ≤ x ) ≈ H_ξ(x).

Now set y = cn x + dn:

P(Mn ≤ y) ≈ H_ξ( (y − dn)/cn ) = H_{ξ,dn,cn}(y).

We wish to estimate ξ, dn and cn.

Implication: We collect data on block maxima and fit the three-parameter form of the GEV. For this we require a lot of raw data so that we can form sufficiently many, sufficiently large blocks.


ML Inference for Maxima

We have block maxima data y = (M_n^{(1)}, . . . , M_n^{(m)})' from m blocks of size n. We wish to estimate θ = (ξ, μ, σ)'. We construct a log-likelihood by assuming that we have independent observations from a GEV with density h_θ:

l(θ; y) = Σ_{i=1}^m log h_θ( M_n^{(i)} ) 1{ 1 + ξ(M_n^{(i)} − μ)/σ > 0 },

and maximize this w.r.t. θ to obtain the MLE θ̂ = (ξ̂, μ̂, σ̂)'.

Clearly, in defining blocks, bias and variance must be traded off. We reduce bias by increasing the block size n; we reduce variance by increasing the number of blocks m.

The S&P 500 Example

It is the early evening of Friday the 16th October 1987. In the equity markets it has been an unusually turbulent week, which has seen the S&P 500 index fall by 9.21%. On that Friday alone the index is down 5.25% on the previous day, the largest one-day fall since 1962. At our disposal are all daily closing values of the index since 1960. We analyse annual maxima of daily percentage falls in the index. These values M260^{(1)}, . . . , M260^{(28)} are assumed to be iid from H_{ξ,μ,σ}.

Remark: Although we have only justified this choice of limiting distribution for maxima of iid data, it turns out that the GEV is also the correct limit for maxima of stationary time series, under some technical conditions on the nature of the dependence. These conditions are fulfilled, for example, by GARCH processes.

S&P 500 Return Data

[Figure: daily percentage returns on the S&P 500 from 1960 to 16th October 1987.]


Assessing the Risk in S&P

We will address the following two questions:

- What is the probability that next year's maximum exceeds all previous levels?
- What is the 40-year return level R_{260,40}?

In the first question we assess the probability of observing a new record. In the second problem we define and estimate a rare stress or scenario loss.


Return Levels

R_{n,k}, the k n-block return level, is defined by P(Mn > R_{n,k}) = 1/k; i.e. it is that level which is exceeded in one out of every k n-blocks, on average. We use the approximation

R_{n,k} ≈ H^{-1}_{ξ,μ,σ}(1 − 1/k) = μ + (σ/ξ)( (−log(1 − 1/k))^{−ξ} − 1 ).

We wish to estimate this functional of the unknown parameters of our GEV model for maxima of n-blocks.
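The approximation is a one-line function. Plugging in the S&P 500 GEV estimates reported later in this section (ξ̂ = 0.3343843, σ̂ = 0.6715922, μ̂ = 1.974976) reproduces the quoted 40-year return level of about 6.83:

```python
import math

def return_level(k, xi, sigma, mu):
    # R_{n,k} = mu + (sigma / xi) * ((-log(1 - 1/k))^(-xi) - 1), xi != 0
    y = -math.log(1.0 - 1.0 / k)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)

r_40 = return_level(40, xi=0.3343843, sigma=0.6715922, mu=1.974976)
```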


S-Plus Maxima Analysis with EVIS

> out
$n.all: [1] 6985
$n: [1] 28
$data:
1960 2.268191   1961 2.083017   1962 6.675635   1963 2.806479
1964 1.253012   1965 1.757765   1966 2.460411   1967 1.558183
1968 1.899367   1969 1.903001   1970 2.768166   1971 1.522388
1972 1.319013   1973 3.051598   1974 3.671256   1975 2.362394
1976 1.797353   1977 1.625611   1978 2.009257   1979 2.957772
1980 3.006734   1981 2.886327   1982 3.996544   1983 2.697254
1984 1.820587   1985 1.455301   1986 4.816644   1987 5.253623
$par.ests:
xi 0.3343843   sigma 0.6715922   mu 1.974976
$par.ses:
xi 0.2081   sigma 0.130821   mu 0.1512828
$nllh.final: [1] 38.33949

c 2006 (Embrechts, Frey, McNeil)

142
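The fit above was done in S-Plus with EVIS. As an illustrative cross-check (not the authors' code), the same block-maxima ML fit can be sketched with Python's scipy, which parametrizes the GEV with shape c = −ξ. The parameter values below are arbitrary simulated ones, chosen only to demonstrate the fitting call and the sign convention:

```python
import numpy as np
from scipy.stats import genextreme

# Simulate "block maxima" directly from a GEV with xi = 0.3, mu = 2.0, sigma = 0.7.
# NOTE: scipy's genextreme uses shape c = -xi.
xi_true, mu_true, sigma_true = 0.3, 2.0, 0.7
rng = np.random.default_rng(42)
maxima = genextreme.rvs(c=-xi_true, loc=mu_true, scale=sigma_true,
                        size=2000, random_state=rng)

# Maximum-likelihood fit of the GEV to the block maxima
c_hat, mu_hat, sigma_hat = genextreme.fit(maxima)
xi_hat = -c_hat  # convert back to the xi convention of these slides

print(xi_hat, mu_hat, sigma_hat)
```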

S&P Example (continued)

Answers:
The probability of a new record is estimated by 1 − H_{ξ̂,μ̂,σ̂}(max(M_260^(1), ..., M_260^(28))) = 0.027.
R_{260,40} is estimated by H^{-1}_{ξ̂,μ̂,σ̂}(1 − 1/40) = 6.83.

It is important to construct confidence intervals for such statistics. We use asymptotic likelihood ratio ideas to construct asymmetric intervals, the so-called profile likelihood method.
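These point estimates can be reproduced by direct evaluation of the GEV df. A minimal sketch (assuming the fitted parameters and the largest annual maximum, 6.675635 from 1962, both taken from the EVIS output above):

```python
import math

def gev_cdf(x, xi, mu, sigma):
    """GEV df H_{xi,mu,sigma}(x) for xi != 0, evaluated on its support."""
    z = 1.0 + xi * (x - mu) / sigma
    return math.exp(-z ** (-1.0 / xi))

# Fitted parameters and the largest observed annual maximum (1962: 6.675635)
xi_hat, sigma_hat, mu_hat = 0.3343843, 0.6715922, 1.974976
p_record = 1.0 - gev_cdf(6.675635, xi_hat, mu_hat, sigma_hat)

print(round(p_record, 3))  # 0.027
```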

[Figure: Estimated 40-Year Return Level. S&P 500 negative returns, 1960-1987, with the estimated 40-year return level superimposed.]

References

On EVT in general: [Embrechts et al., 1997], [Reiss and Thomas, 1997].
On the Fisher-Tippett theorem: [Fisher and Tippett, 1928], [Gnedenko, 1943].
Application of the block maxima method to S&P data: [McNeil, 1998].

B. EVT: Modelling Threshold Exceedances

1. The Generalized Pareto Distribution
2. Modelling Excess Losses over Thresholds
3. Modelling Tails of Loss Distributions
4. The POT Model


B1. Generalized Pareto Distribution

The GPD is a two-parameter distribution with df

G_{ξ,β}(x) = 1 − (1 + ξx/β)^{−1/ξ},  ξ ≠ 0,
G_{ξ,β}(x) = 1 − exp(−x/β),          ξ = 0,

where β > 0, and the support is x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −β/ξ when ξ < 0. This subsumes:
ξ > 0: Pareto (reparametrized version)
ξ = 0: exponential
ξ < 0: Pareto type II.

Moments. For ξ > 0 the distribution is heavy-tailed; E(X^k) does not exist for k ≥ 1/ξ.
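As a quick illustration (a sketch added here, not from the slides), the GPD df can be coded directly and checked against scipy's genpareto, which happens to use the same shape convention as these slides (c = ξ, scale = β):

```python
import math
from scipy.stats import genpareto

def gpd_cdf(x, xi, beta):
    """GPD df G_{xi,beta}(x), evaluated on its support."""
    if xi == 0.0:
        return 1.0 - math.exp(-x / beta)
    return 1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi)

# Check against scipy for the three regimes: Pareto, exponential, Pareto type II
for xi, beta, x in [(0.5, 1.0, 2.0), (0.0, 1.0, 2.0), (-0.5, 1.0, 1.5)]:
    ours = gpd_cdf(x, xi, beta)
    ref = genpareto.cdf(x, c=xi, scale=beta)
    print(xi, ours, ref)
    assert abs(ours - ref) < 1e-12
```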

[Figure: GPD dfs G(x) and densities g(x). Solid line corresponds to ξ = 0 (exponential); dotted line is ξ = 0.5 (Pareto); dashed line is ξ = −0.5 (Pareto type II); β = 1 throughout.]

The Role of the GPD

The GPD is a natural limiting model for excess losses over high thresholds. To discuss this idea we need the concepts of the excess distribution over a threshold and the mean excess function.

Let u be a high threshold and define the excess distribution above the threshold u to have the df

F_u(x) = P(X − u ≤ x | X > u) = (F(x + u) − F(u)) / (1 − F(u)),

for 0 ≤ x < x_F − u, where x_F is the right endpoint of F.

The mean excess function of a rv X is e(u) = E(X − u | X > u). It is the mean of the excess distribution above the threshold u, expressed as a function of u.

Examples

1. Exponential. F(x) = 1 − e^{−λx}, λ > 0, x ≥ 0. Then

   F_u(x) = F(x), x ≥ 0,

   the lack-of-memory property, and e(u) = 1/λ for all u ≥ 0.

2. GPD. F(x) = G_{ξ,β}(x). Then

   F_u(x) = G_{ξ,β+ξu}(x), for 0 ≤ x < ∞ if ξ ≥ 0, and 0 ≤ x ≤ −β/ξ − u if ξ < 0.

   The excess distribution of a GPD remains GPD with the same shape, and

   e(u) = (β + ξu)/(1 − ξ), for 0 ≤ u < ∞ if 0 ≤ ξ < 1, and 0 ≤ u ≤ −β/ξ if ξ < 0.

   The mean excess function is linear in the threshold u.
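The mean excess function can also be estimated empirically as the average excess over each threshold. A minimal sketch (an illustrative helper on simulated data, not from the slides) showing the lack-of-memory property of the exponential, where e(u) stays close to 1/λ for every threshold:

```python
import numpy as np

def mean_excess(x, u):
    """Empirical mean excess e(u) = mean(x_i - u) over observations with x_i > u."""
    exc = x[x > u] - u
    return exc.mean()

# Exponential(lambda = 2) data: lack of memory means e(u) = 1/lambda = 0.5 for all u
rng = np.random.default_rng(1)
x = rng.exponential(scale=0.5, size=200_000)

for u in (0.25, 0.5, 1.0):
    print(u, mean_excess(x, u))  # each value close to 0.5
```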

Asymptotics of the Excess Distribution

Theorem [Balkema and de Haan, 1974; Pickands, 1975]. For ξ ∈ R, F ∈ MDA(H_ξ) if and only if we can find a positive measurable function β(u) such that

lim_{u → x_F} sup_{0 ≤ x < x_F − u} |F_u(x) − G_{ξ,β(u)}(x)| = 0.

B2. Modelling Excess Losses over Thresholds

Assume that, for some high threshold u, the excess df is exactly GPD: F_u(x) = G_{ξ,β}(x) for 0 ≤ x < x_F − u, with ξ ∈ R and β > 0. This is clearly an idealized model but facilitates a number of later calculations.

Given data X_1, ..., X_n, a random number N_u will exceed the threshold. We relabel these exceedances X_1, ..., X_{N_u} and calculate excess amounts Y_i = X_i − u, for i = 1, ..., N_u. To estimate ξ and β we fit the GPD to the excess data. We may use various fitting methods including maximum likelihood (ML) and probability-weighted moments (PWM).
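The fitting step can be sketched as follows (an illustrative Python/scipy version on simulated data; the slides use S-Plus/EVIS, and PWM is not shown here). The location parameter is fixed at zero because the excesses start at the threshold:

```python
import numpy as np
from scipy.stats import genpareto

# Simulate losses with a Pareto-type tail (xi = 0.5), pick a high threshold u,
# and fit the GPD to the excesses Y_i = X_i - u by maximum likelihood.
rng = np.random.default_rng(7)
x = genpareto.rvs(c=0.5, scale=1.0, size=50_000, random_state=rng)

u = np.quantile(x, 0.95)    # high threshold (95% empirical quantile)
excesses = x[x > u] - u     # Y_i = X_i - u, i = 1, ..., N_u

# ML fit with location fixed at 0, since excesses are nonnegative by construction
xi_hat, _, beta_hat = genpareto.fit(excesses, floc=0)
print(xi_hat, beta_hat)
```

Because the excess distribution of a GPD stays GPD with the same shape, the fitted shape should land near the true ξ = 0.5.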

Danish Fire Loss Data

The Danish data consist of 2167 losses exceeding one million Danish Krone from the years 1980 to 1990. The loss figure is a total loss for the event concerned and includes damage to buildings, damage to contents of buildings, as well as loss of profits. The data have been adjusted for inflation to reflect 1985 values.

[Figure: Large Insurance Claims. Time series of the Danish fire losses, 1980 to 1990.]

EVIS POT Analysis

> out
$n:
[1] 2167

$data:
  [1]  11.37482  26.21464  14.12208
  [4]  11.71303  12.46559  17.56955
  [7]  13.62079  21.96193 263.25037
...etc...
[106] 144.65759  28.63036  19.26568
[109]  17.73927

$threshold:
[1] 10

$p.less.thresh:
[1] 0.9497

$n.exceed:
[1] 109

$par.ests:
       xi     beta
0.4969857 6.975468

$par.ses:
       xi     beta
0.1362838  1.11349

$varcov:
            [,1]        [,2]
[1,]  0.01857326 -0.08194611
[2,] -0.08194611  1.23986096

$information:
[1] "observed"

$converged:
[1] T

$nllh.final:
[1] 374.893
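From these fitted values a high quantile of the loss distribution can be estimated with the standard POT tail estimator F̂(x) = 1 − (N_u/n)(1 + ξ̂(x − u)/β̂)^{−1/ξ̂} for x ≥ u. This anticipates the tail-modelling step of section B3; the sketch below is illustrative, not the original EVIS code:

```python
import math

# Fitted POT quantities for the Danish fire data (from the EVIS output above)
xi, beta = 0.4969857, 6.975468
u, n_exceed, n = 10.0, 109, 2167

def pot_quantile(p):
    """Quantile x_p solving Fhat(x_p) = p, where for x >= u
    Fhat(x) = 1 - (N_u/n) * (1 + xi*(x - u)/beta)**(-1/xi)."""
    return u + (beta / xi) * (((1.0 - p) / (n_exceed / n)) ** (-xi) - 1.0)

print(round(pot_quantile(0.99), 2))  # roughly 27.3 (million Danish Krone)
```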

[Figure: Estimate of the Excess Distribution. Fitted GPD excess df F_u(x − u) for the Danish data, plotted against x on a log scale from 10 to 100.]

Serially Dependent Data

In the application of ML we usually assume the underlying losses, and hence the excess losses, are iid. This is unproblematic for insurance or operational risk data, but not for financial time series.

If the data are serially dependent but show no tendency for extremes to cluster (suggesting an extremal index θ = 1), then high-level threshold exceedances occur as a Poisson process and the excess loss amounts over the threshold can essentially be modelled as iid data.

If extremal clustering is present (suggesting θ < 1, as in a GARCH process), then the iid assumption is unsatisfactory. The easiest approach is to neglect this problem. Although the likelihood is misspecified with respect to the serial dependence of the data, the point estimates should still be reasonable, although standard errors may be too small.

Excesses Over Higher Thresholds

From the model