

    Completion of Market Data

    Dr Richard Rossmanith

    Kellogg College

    University of Oxford

    A thesis submitted in partial fulfilment of the requirements for the MSc in

    Mathematical Finance

    March 2003


    Acknowledgements

I thank my employer, Arthur Andersen Financial and Commodity Risk Consulting Division, Frankfurt (now d-fine GmbH), for the opportunity to participate in the Mathematical Finance course at the University of Oxford and to prepare this MSc thesis. The company financed this course and gave me part of the necessary time off.

I am especially grateful to Dr Hans-Peter Deutsch, who had the idea for our group's participation in the course and who made it all possible. For fruitful discussions about interest rate dynamics and time series analysis, and mutual support concerning library and IT issues, I include my colleagues Dr Christian Hoffmann, Peter Tichatschke, Dr Frank Hofmann, Michael Giberman, Dr Andreas Werner and Jürgen Topper in my thank-you list.

I am indebted to our client Bayerische Landesbank, its team manager Dr Walter Prem, and its project manager Oliver Bopp for the permission to use their test and development market data base servers for my empirical evaluations, even on weekends. Special thanks also to Kai Radde and Jens Erler for our discussions on the mathematical methods implemented.

From the University of Oxford, I want to express my gratitude to Dr Jeff Dewynne, Dr Sam Howison, and Dr Paul Wilmott for launching the course and for their lectures. Particular thanks go to my academic supervisor Jeff Dewynne, also for his understanding that my academic work was squeezed between the time for my profession and the time for my family.


I thank all external practitioner lecturers for their lectures, but most notably Dr Jamil Baz, Dr Chris Hunter and Dr Riccardo Rebonato for the insights into interest rate dynamics I gained from theirs. Special thanks to Tiffany Fliss for organising the course in Oxford. I also thank her successors Anna Turner, Rosalind Sainty, and Riaz Ahmad for their efforts.

I am also grateful to Kellogg College generally and to its President Dr Geoffrey Thomas personally for making our time and stay there very enjoyable.

Outside of Oxford, in Scotland, I thank the School of Mathematics and Statistics at the University of St Andrews, and the Head of the School, Professor Dr Edmund Robertson, for their organisational, office, IT and library support during my term as Honorary Lecturer there. Additionally, I want to state that it was a great pleasure to work together with an expert such as Professor Dr Alan Cairns on our joint introductory lecture on Financial Mathematics. In Germany, from the Universität Augsburg and the Ludwig-Maximilians-Universität München, thanks to Dr Gregor Dorfleitner, Dr Thomas Klein, and Gregor Rossmanith for useful discussions and references.

Finally, I thank my family for their love and encouragement, and especially my wife Béatrice for her understanding of the time constraints involved, and my mother Gertraud, and my mother-in-law Christiane Sacreste for their support and backup, not only during weekends.


    Completion of Market Data

Dr Richard Rossmanith
MSc in Mathematical Finance

    Hilary Term 2003

    Abstract

Incomplete data is a very common problem financial institutions face when they collect financial time series in IT data bases from commercial data vendors for regulatory, accounting, and benchmarking purposes. The thesis at hand presents several data completion techniques, some of which are productively used in practice, and others which are less established. Optimal completion methods are then recommended, based on an empirical study for the completion of swap and forward rate curves in the currencies Deutsche Mark (respectively Euro), Pound Sterling, and US Dollar. The source code of all completion routines, programmed in a standard professional market data base environment, is listed in an appendix (it is available in electronic form upon request).


    Dedication

Pour Béatrice et Lukas.


    Contents

    CHAPTER 1 INTRODUCTION ........................................................................................10

    1.1 RELEVANCE OF MARKET DATA IN RISK CONTROL .......................................................10

    1.2 SEPARATION OF MARKET DATA FOR TRADING AND FOR RISK CONTROL .....................11

    1.3 BLOOMBERG, REUTERS, RISKMETRICS ET AL....................................................12

    1.4 MARKET DATA COMPLETION ........................................................................................12

    1.5 MARKET DATA VALIDATION .........................................................................................13

    1.6 ASSET CONTROL AND FORMULA ENGINE.....................................................................14

    CHAPTER 2 STRUCTURAL INTERPOLATION..........................................................15

2.1 OVERVIEW OF STRUCTURAL INTERPOLATION METHODS .........................................15

    2.2 LOG-LINEAR INTERPOLATION ON DISCOUNT FACTORS .................................................15

    2.3 INTERPOLATION ALGORITHM........................................................................................17

    2.4 CODE IMPLEMENTATION ...............................................................................................17

    2.5 PROBLEMS WITH STRUCTURAL INTERPOLATION...........................................................17

CHAPTER 3 PREVIOUS-DAY EXTRAPOLATION .....................................................18

3.1 ABSOLUTE VALUE EXTRAPOLATION .............................................................18

    3.2 VARIANT: RELATIVE VALUE EXTRAPOLATION .............................................................18

    3.3 VARIANT: ABSOLUTE VALUE INTERPOLATION .............................................................19

    3.4 CODE IMPLEMENTATION ...............................................................................................19

    CHAPTER 4 CONDITIONAL EXPECTATION.............................................................20

    4.1 JOINT GEOMETRICAL BROWNIAN MOTION ....................................................................20

    4.2 COVARIANCE ESTIMATION ............................................................................................21

    4.3 CONDITIONAL EXPECTATION FOR NORMAL RANDOM VARIABLES................................22

    4.4 ESTIMATION ALGORITHM..............................................................................................22

    4.5 CODE IMPLEMENTATION ...............................................................................................23

    CHAPTER 5 EXPECTATION MAXIMISATION ..........................................................24

    5.1 MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA THE EM ALGORITHM ..............24

    5.2 CONDITIONAL EXPECTATION AS PART OF EXPECTATION MAXIMISATION ....................24

    5.3 CONVERGENCE OF THE EM ALGORITHM IN A GARCH MODEL ...................................25

    5.4 COMPLETION ALGORITHM.............................................................................................26


    5.5 CODE IMPLEMENTATION ...............................................................................................27

CHAPTER 6 PRINCIPAL COMPONENT ANALYSIS .................................................28

6.1 SHIFT, TWIST, AND HUMP..............................................................28

    6.2 EXTRAPOLATING PRINCIPAL COMPONENTS ..................................................................29

    6.3 COMPLETION ALGORITHM.............................................................................................30

    6.4 CODE IMPLEMENTATION ...............................................................................................30

    6.5 COMPARISON WITH CONDITIONAL EXPECTATION.........................................................30

    CHAPTER 7 METHOD COMPARISON .........................................................................32

    7.1 EXAMPLE CURVES OF SEPTEMBER 2001 .......................................................................32

    7.2 EMPIRICAL AND SIMULATED DATA ...............................................................................33

    7.3 EXPERIMENTAL SET-UP .................................................................................................34

    7.4 COMPLETION STATISTICS FOR THE DEM YIELD CURVE ...............................................35

    7.5 COMPLETION STATISTICS FOR THE GBP YIELD CURVE ................................................37

    7.6 COMPLETION STATISTICS FOR THE USD YIELD CURVE ................................................39

    CHAPTER 8 CONCLUSION AND RECOMMENDATION..........................................40

    CHAPTER 9 OUTLOOK AND QUESTIONS..................................................................41

    BIBLIOGRAPHY......................................................................................................................42

    A APPENDIX: FORMULA ENGINE CODE .....................................................................44

    A.1 FORWARD.FE .................................................................................................................44

    A.2 BACKEXTRAPOLATE.FE .................................................................................................45

    A.3 EXTRAPOLATION.FE ......................................................................................................46

    A.4 COVARMATRIX.FE .........................................................................................................46

A.5 CONDITIONALDISTRIBUTION.FE ....................................................48

    A.6 PCAESTIMATION.FE .......................................................................................................49

    A.7 COMPLETE.FE ................................................................................................................50

    B APPENDIX: HISTOGRAMS ...........................................................................................55

    B.1 ERROR HISTOGRAMS OF THE DEM YIELD CURVE COMPLETION ...................................55

    B.2 ERROR HISTOGRAMS OF THE GBP YIELD CURVE COMPLETION ....................................58

    B.3 ERROR HISTOGRAMS OF THE USD YIELD CURVE COMPLETION....................................61


Von nix kommt nix. (Nothing comes from nothing.)
Bayerischer Volksmund (Bavarian saying)

Nothing will come of nothing.
William Shakespeare, King Lear


    Chapter 1 Introduction

    1.1 Relevance of market data in Risk Control

Modern Risk Control of a large bank's trading portfolio does not involve a single piece of software only, but a collection of IT systems, as illustrated by the following data flow chart:

[Data flow chart: the Front Office systems (Money Market; Bonds, Swaps, IR Derivatives; Stocks, Equity Derivatives; FX Options; etc.) and the trade-independent Market Data Base feed the Risk Engine, which produces the Value-at-Risk Report.]

The risk engine collects all trading positions from the Front Office (FO) systems. It then evaluates their prices based on the data provided by the trade-independent market data base, and it assesses the associated risks according to some internal statistical model. All this is finally summarised in a Value-at-Risk report for the bank's management.

In this framework, many banks have focused on the perfection of their internal mathematical model, a task both financially and scientifically rewarding. They frequently found, however, that the reliability of the model's results depended heavily on the quality of the input market data¹, generally a much neglected topic. Consequently, there has been a recent² perceptible shift in attention towards the implementation of comprehensive and redundant market data base systems which provide data completion, data validation, and data cleansing.

I have professionally consulted several large German banks on this topic, and together with the clients' management co-directed the subsequent system and process integration projects.

¹ Particularly sensitive risk models are the ones which use empirical distributions (historical simulation), but other methods (e.g. Monte Carlo simulation) are affected as well.
² This started around A.D. 2000, just after the (overrated) Euro introduction and millennium bug issues.


For this thesis, I have now taken the time to study the particular problem of data completion in detail. So in the following text, we will encounter several completion techniques, some of a heuristic nature, and others of a more profound statistical nature. The empirical sample data for our studies are swap rate curve time series in major currencies.

    1.2 Separation of market data for Trading and for Risk Control

Apart from regulatory compliance (trader independence), which is of course a legal matter, there are other intrinsic reasons why market data for the Front Office and market data for Risk Control should be regarded as distinct:

The first difference concerns the consumption of data. On the one hand, an equity (or swap, or FX, etc.) trader needs an up-to-date intra-day, high-performance partial view on the market, i.e. current levels of equity (or swap, or FX, or …) prices. Hence each Front Office system usually has a direct unfiltered connection to a single data provider. On the other hand, in Risk Control, all financial markets need to be considered.³ This drastically increases the data volume, and so the (technical) speed of the data feed cannot be as high. Therefore, one usually takes a one-snapshot-per-day view on the market here.

A second difference is how market data is processed. The data vendors (cf. Section 1.3 below) usually deliver data fields for parameters which for the trader specify the conditions under which a deal is struck.⁴ If such a data field, due to some technical error, contains a wrong (or missing) value, any sufficiently experienced trader will spot the mistake immediately. However, in a fully automated Risk Control process (as described in the preceding Section 1.1), a large data set is compressed into a single number for the Value-at-Risk report. By simply looking at this report, it is virtually impossible to decide if the input data set contains (significantly) erroneous values. Therefore, the input values must be filtered for errors, and missing values must be compensated for.

A third and significant difference is the interpretation of the data. For option valuation purposes, one usually assumes a risk-neutral measure to obtain no-arbitrage prices. A market snapshot including all underlying prices and associated parameters (discount factors, implied option volatilities, etc.) is generally sufficient for this task. In contrast, the real world measure applies in Risk Control.

Of course, one might argue in the latter case that the change of measures from risk-neutral to real-world (or "risk-averse") affects only the market parameter drifts, but not their volatilities and correlations. So in principle, implied option volatilities could be used instead of historical volatilities. But we would encounter difficulties with this approach: From a practical point of view, there are often not enough liquid traded options for each underlying. This is particularly true if one wants to calculate implied correlations. Conceptually, even if we could obtain sufficient implied volatility data (from traded options), we still would have to assess the risk associated with the (traded) implied volatility itself ("vega risk"), i.e. the (historical) volatility of the (implied) volatility.

³ All markets in which the bank participates, of course.
⁴ Usually the cash price for a standardised volume in the contract (e.g., price per share for stocks), but in some markets, other parameters are more customary (e.g., clean prices for bonds, or fixed leg rates for swaps, or implied Black 76 volatilities for swaptions).


So in order to model future profit/loss distributions for Risk Control, real world time series must be considered.⁵ This adds the historical dimension to market data, and to all financial instruments, so that interest rate curves become two-dimensional objects, implied volatility surfaces three-dimensional, etc.

    1.3 Bloomberg, Reuters, RiskMetrics et al.

Among the companies that independently collect market data and sell it to financial institutions are Bloomberg, Reuters, Datastream, Telerate/Bridge, Telekurs, ISMA, DRI, Olsen, and RiskMetrics. The first two are the leading providers and cover market data in general, whereas most of the others have particular strengths in certain areas. In particular, the provider RiskMetrics has specialised in empirical distribution parameters (historical volatilities and correlations).

In principle, a bank could purchase today's general market data from one provider, plus RiskMetrics volatilities and correlations, and assess its risk via, say, a Monte Carlo simulation. This is indeed what several small banks are doing.

Many larger banks, however, feel they have to produce their own market data, for several reasons: The first is vendor independence through redundancy, so they collect, say, Bloomberg and Reuters (and ISMA, etc.; possibly supplemented by bank-internal sources) data in a centralised data base. The second is data quality, so they apply the data filtering, cleansing and completion processes described in Section 1.2 to the data; this requires technology (automatic routines) and staff (manual interaction). The third incentive for internal centralised market data is method transparency, since now they can specify (and change, if so required) their own drift and covariance estimators (length of input time series, equally or exponentially weighted, etc.), rather than rely on the ready-made numbers provided by RiskMetrics.⁶

An argument which is also often heard in German banks is that they want to put more weight on risk factors which are specific to the German market, and which in their opinion are not sufficiently catered for by American- or British-based data vendors.

    1.4 Market data completion

As has been outlined before, it is imperative to obtain a complete market data set for the purpose of Risk Control. But because of market holidays, technical failures and networking problems, periods with low liquidity in certain markets,⁷ or other problems, the data set purchased from vendors as described in Section 1.3 will, in general, not be complete.

The first line of defence here is, of course, vendor redundancy (mentioned in Section 1.3 as well). But this will not work in every case. Apart from the notorious market holidays, consider the much more common case of an incomplete interest rate curve:

⁵ This is regardless of whether one uses a full historical simulation, or only a set of historically estimated statistical parameters, such as in the delta-normal model, or the variance-covariance method, or the Monte Carlo simulation.
⁶ This holds true for J.P. Morgan themselves [13]: "RiskMetrics is based on, but differs significantly from, the risk measurement methodology developed by J.P. Morgan for the measurement, management, and control of market risks in its trading, arbitrage, and own investment account activities."
⁷ Or even too much liquidity: in stressful times, often close to the end of a month or a year, brokers mainly work on the phone for their primary clients, the traders. They do not always have enough resources to update their Reuters (or other) pages in time, where they publish so-called indicative market values. These pages are independently downloaded and processed by their secondary clients, the controllers. Unfortunately, market data for controlling is most important at precisely these times: the end of a month or a year, or generally at times of big market moves.


It does not make sense to replace a single grid point of a curve delivered by, say, Olsen, with a value delivered by, say, Bloomberg, since it may distort the overall shape of the curve because of different general levels of the Bloomberg and Olsen curves (due to different contributing brokers, or different curve construction and interpolation methods). So the market data set will still be incomplete.

Therefore, methods are needed which will fit missing values into an incomplete market data set in a way that affects the delivered information as little as possible. Several such methods will be described in the following chapters:

    structural interpolation (Chapter 2),

    previous-day extrapolation (Chapter 3),

    conditional expectation (Chapter 4),

expectation maximisation (the EM algorithm of Dempster et al., Chapter 5),

    principal component analysis (PCA, Chapter 6).

Some of the above methods have been implemented in the market data projects mentioned in Section 1.1. Keep in mind, however, that a bank, as a competitive institution, cannot strive for precision only, but also has to control the cost of its IT projects. Therefore, if a simpler method under-performs a more sophisticated one only insignificantly, the former will still be preferred to the latter. The weights associated with these possibly conflicting goals of numerical excellence and cost control depend on the business priorities of each institution, and the actual choice can only be made individually.

    1.5 Market data validation

Apart from missing values, a second problem affects data quality, namely the so-called outliers. These are values which are incorrectly delivered by a data vendor for various reasons, including technical problems (network problems, incorrect data mappings, …), conceptual problems (wrong or unsuitable mathematical techniques, wrong reference data, sudden changes of raw data contributors, …), or human error (manual input).⁸

The redundancy principle mentioned in Section 1.3 is of some help here, although its merits are limited, due to one problematic feature of many errors appearing in practice, namely their common source: Often the faulty value stems directly from the contributing exchange or broker, and many data vendors will forward it to the client.

So we need a data validation and cleansing process. Such a process requires a certain amount of human judgement and manual interaction, but it can be usefully supported by coded and fully automated outlier filtering routines which inspect all delivered data and flag certain values as "suspect" values.

Such filtering routines usually compare a value as it is delivered with some estimated "expectation" of what it should be.⁹ They will mark the delivered value as "suspect" if it deviates from its "expected" value by more than a specified threshold.

⁸ The infamous "decimal point error" is in fact less frequent than is generally assumed. However, many errors seem indeed to be caused manually; e.g. Eurodollar futures seem to be often precisely 100 basis points off, as suggested by the following series: 96.625%, 96.750%, 96.875%, 96.125%, 97.000%.
⁹ The term "expectation" is used in a broad colloquial sense here, and not (yet) in the strict sense of mathematical statistics.


The "expected" value of a delivery can be obtained by first treating it as missing, and then applying any data completion method to estimate it. We see that the problem of data validation, or at least the part which allows automation, is in fact very similar to the problem of data completion as described in Section 1.4. The "suspect" threshold can either be a fixed value, or be made flexible, dependent on the delivered data as well. In the latter case, data validation requires the additional estimation of an "error bar", so it goes somewhat beyond data completion.¹⁰
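To make the automated part of this filtering idea concrete, here is a minimal Python sketch of such a threshold check. It is purely illustrative: the function name and interface are hypothetical and are not part of the thesis' Formula Engine code in Appendix A.

```python
def flag_suspect(delivered, expected, threshold):
    """Mark a delivered value as 'suspect' if it deviates from its
    expected (completion-estimated) value by more than the threshold.
    Illustrative helper only; names are hypothetical."""
    return abs(delivered - expected) > threshold

# Example: a rate delivered as 4.123% while the completion estimate is
# 3.198%, with a 50 bp tolerance, would be flagged as suspect.
print(flag_suspect(0.04123, 0.03198, 0.0050))  # True
```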

After a value has been classified as an outlier, it must be deleted, and replaced by a corrected value. This again leads to the problem of data completion.

We may conclude that the problems of data filtering, data validation, data correction, and data completion are closely related. For the sake of this thesis, it is sufficient if we restrict our attention to the problem of data completion (although there is certainly room for further analysis; see Chapter 9 for an outlook).

    1.6 Asset Control and Formula Engine

A number of commercial software products support financial market data base systems, such as Fame, Xenomorph, Odyssey, and Asset Control. The latter has a particularly broad user base in Europe.

It is technically based on the usual server/client architecture. The server connects static reference data for financial instruments (such as data vendor codes, instrument ID codes, currency ISO codes, instrument maturity information, etc.; stored in a standard relational data base such as Oracle or Sybase) with dynamic market data (time series for bid, ask, last trade, and other prices, stored in UNIX flat files for fast access).

Among the key features of Asset Control are its data completion and validation functions.

Conceptually, these are filtering routines as mentioned in Section 1.5. Technically, they are (built-in and/or customisable) Formula Engine programs which can directly be linked to any market data time series.¹¹ The Formula Engine is a macro language with a C-like syntax.

The completion methods examined in this thesis have all been coded in the Formula Engine. The source code is listed in Appendix A.

    The Asset Control software is documented in [1].

¹⁰ E.g., in the case of the conditional expectation method mentioned in Section 1.4, and described in detail in Chapter 4, one would additionally have to estimate the conditional variance.
¹¹ The validation functions are triggered whenever a new delivered value is appended to a time series. Each value is then marked with a so-called status flag. There are flags for "normal" values, "suspect" values, "estimated" (interpolated, completed) values, "manually validated" values, etc. The user can flexibly define additional flags.


    Chapter 2 Structural interpolation

Many financial calculations¹² require the interpolation of interest rates along the yield curve. So it is quite natural to start with this interpolation method for our problem of data completion. We also refer to this method as interpolation along the maturity axis, or as structural interpolation.

Notice that it does not take into account the historical dimension of interest rates (the rate changes from one trading day to the next), but only the market data as of today (the structural dimension).

    2.1 Overview of structural interpolation methods

We can choose from various interpolation techniques which all have their advantages and disadvantages:

Linear interpolation is popular because it is simple to implement. The question remains as to in which representation we want to interpolate the curve: linear on par rates, or on zero rates, or on discount factors? Using linear interpolation everywhere leads to inconsistencies and to magnified interpolation errors. Using linear interpolation for one fixed representation (e.g. zero rates) may lead to discontinuities in other representations (e.g. instantaneous forward rates).

Spline interpolation can be used to obtain continuous forward rates, but it displays an undesired non-local property: small value changes in one point of the curve can lead to large interpolated value changes, even in remote intervals of the curve [5]. This means that e.g. a small data error in the 1Y rate may lead to a magnified interpolation error in the [15Y, 20Y] rate interval. Because of this erratic behaviour, splines are seldom used for yield curves in practice.¹³

A more sophisticated interpolation method used by some trading IT systems (cf. [7]) is based on Hermite polynomials. It delivers continuous forward rates, but avoids the mentioned problem of global error propagation. This however comes at the price of more complexity, and less intuition.

The favoured method for many practical financial applications is log-linear interpolation on discount factors.¹⁴ It is almost as simple as linear interpolation on rates, and hence delivers sufficient interpolation quality at a reasonable (implementation) price.¹⁵ It is also financially intuitive: Without knowledge of the interval between two known values of the discount factor curve, we expect the value of money put in the bank to grow exponentially (the Zinseszins phenomenon, i.e. compound interest).
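As a small illustration of the favoured method, the following Python sketch (a hypothetical helper, not the Formula Engine code of Appendix A) interpolates a discount factor log-linearly between two known grid points; the numbers are taken from the DEM example of Section 2.2 below.

```python
import math

def loglinear_discount(T, T1, B1, T2, B2):
    """Log-linear interpolation on discount factors: log B(T) is linear
    between the grid points (T1, B1) and (T2, B2), with T1 <= T <= T2."""
    w = (T - T1) / (T2 - T1)
    return math.exp((1.0 - w) * math.log(B1) + w * math.log(B2))

# Interpolating half-way between the 1Y and 2Y DEM discount factors
# of the example in Section 2.2:
print(loglinear_discount(1.5, 1.0, 0.9683, 2.0, 0.9292))  # ~0.9486
```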

    2.2 Log-linear interpolation on discount factors

Let B(T) denote the discount factor (price of a zero coupon bond) for maturity time T > 0, observed in the market today (time t = 0).

¹² Such as "bootstrapping", i.e. the derivation of yields from traded instrument data, or the transformation of the yield curve from one representation (as par rates, zero rates, forward rates, discount factors) to another.
¹³ Nevertheless, most financial pricing libraries and trading systems offer this interpolation method.
¹⁴ In the case where continuity of forward rates is not a requirement; cf. Footnote 17.
¹⁵ According to [20], it is a popular method for productive use at Deutsche Bank.


Notice that B(T) has a term structure, so it will be a curve if it is given in continuous time T. However, in practice, market discount factors will be given only at discrete maturity dates T_1, …, T_N. So in order to obtain values between two grid points T_i and T_{i+1}, we must interpolate. In accordance with the favoured method of Section 2.1, namely log-linear interpolation on discount factors, we make the following

Assumption: The log discount factor curve log B(T) is piecewise linear (between neighbouring grid points on the maturity axis T).

Since the (instantaneous) forward rate curve is the (negative) derivative of the log discount factor curve (cf. [4], equation 15.2, or [16], equation 1.3),

    f(T) = −∂ log B(T) / ∂T,

this immediately yields the following

Implication: The instantaneous forward rate curve f(T) is piecewise constant (between neighbouring grid points on the maturity axis T).¹⁶,¹⁷

Example: Consider the DEM forward curve of 01/10/1996 on a term grid of T = 0, 1, 2, 3, 4, 5 years. Since the distance between two grid points is equal to 1, and the log discount curve is linear in between, the above formula for the forward rates simplifies to f(T) = log B(T) − log B(T+1) on the grid (notice that B(0) = 1 by the definition of discount factors). Between grid points, the forward curve is constant:

DEM 01/10/1996
term (in years)    0         1         2         3          4          5
discount factor    100.00%   96.83%    92.92%    88.05%     82.63%     77.12%
log discount       0.000%    -3.218%   -7.341%   -12.722%   -19.083%   -25.986%
forward rate       3.218%    4.123%    5.381%    6.362%     6.903%     7.267%

[Chart: the piecewise linear log discount curve and the piecewise constant forward rate curve of the example, plotted over the term axis from 0 to 5 years.]

¹⁶ More precisely, a left-continuous step function; notice that we take right-normed derivatives, since we are looking forward; otherwise we should talk about "backward rates".
¹⁷ In particular, forward rates are not necessarily continuous everywhere; cf. Footnote 14.
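The table values can be reproduced in a few lines of Python. This is only an illustrative recomputation; the thesis' own implementation is the Formula Engine routine forward.fe of Appendix A.1.

```python
import math

# DEM discount factors of 01/10/1996 on the term grid T = 0, 1, ..., 5 years
B = [1.0000, 0.9683, 0.9292, 0.8805, 0.8263, 0.7712]

log_B = [math.log(b) for b in B]
# piecewise constant forward rate on [T, T+1): f(T) = log B(T) - log B(T+1)
f = [log_B[T] - log_B[T + 1] for T in range(len(B) - 1)]

for T in range(len(f)):
    print(f"T={T}y   log discount {log_B[T]:+.3%}   forward {f[T]:.3%}")
# Deviations from the table of a fraction of a basis point are due to the
# discount factors being rounded to two decimals above; the table's last
# forward rate would additionally need the next (6Y) discount factor.
```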

  • 7/25/2019 Rossmanith - Completion of Market Data

    16/62

    17

    2.3 Interpolation algorithm

Suppose that in Example 2.2, the forward rates f(0), f(1), f(3), f(4), f(5) have been delivered by some data provider, but the data point f(2) is missing. Then B(3) is unknown, as well as B(4) and B(5). However, we do know the difference between log B(3) and log B(4), from the (non-missing, i.e. delivered) data point f(3). By the above assumption, we want to interpolate the log discount curve linearly, so we put a straight line between log B(2), log B(3), log B(4). Since the slope between log B(3) and log B(4) is already determined, we must have the same slope between log B(2) and log B(3). This determines f(2) := f(3). If f(3) is also missing, we use f(2) := f(4) instead (with a similar argument).

In other words, we interpolate from right to left, from the first delivered data point to the missing data point.¹⁸
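A minimal Python sketch of this right-to-left fill, as an illustrative analogue of backextrapolate.fe (Appendix A.2) rather than a transcription of it:

```python
def backextrapolate(values):
    """Fill missing forward rates (None) with the nearest delivered value
    to the right, working right to left along the maturity axis.
    Illustrative sketch only; a missing last grid point is not handled."""
    out = list(values)
    for i in range(len(out) - 2, -1, -1):
        if out[i] is None:
            out[i] = out[i + 1]   # f(T) := f(T + delta), smallest delta > 0
    return out

# Example of Section 2.3: f(2) is missing and is set to f(3) = 6.362%
print(backextrapolate([0.03218, 0.04123, None, 0.06362, 0.06903, 0.07267]))
```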

    2.4 Code implementation

The formula f(T) = log B(T) − log B(T+1) of Example 2.2 is implemented¹⁹ in the routine forward.fe (cf. Appendix A.1); the actual interpolation method is implemented in the routine backextrapolate.fe (cf. Appendix A.2).

    2.5 Problems with structural interpolation

Suppose that in Example 2.2, the data provider did not deliver the data point for the maturity interval [0, 1). Then the true value 3.218% is replaced by the interpolated value 4.123% from the neighbouring maturity interval, resulting in an interpolation error of 91 bp.²⁰ Compared with the previous day's (30/09/1996) true value of 3.266%, this creates a single-day jump scenario of 86 bp on the short end of the DEM curve, which grossly distorts the risk calculation.

    Notice that this is not at all a contrived problem. Particularly at the short and long ends of yieldcurves, data deliveries are sometimes incomplete.

Notice also that this problem does not stem from the fact that we chose the log-linear method in Section 2.1. It simply stems from the fact that we ignore the yield curve history, and only consider today's values. All methods listed in Section 2.1 suffer from this problem.

¹⁸ More precisely, by Footnote 16, there exists for every maturity T a small period ε > 0 such that f(T) = f(T+ε). Now if f(T) is unknown, we look for the smallest ε > 0 such that f(T+ε) is known, and we estimate f(T) := f(T+ε).
¹⁹ Together with a standard formula which first deduces discount factors from swap rates.
²⁰ 1 bp = 1 basis point = 0.01% = 10⁻⁴.


    Chapter 3 Previous-day extrapolation

In order to avoid the problem with yield curve interpolation described in Section 2.5, we need to take the history into account. The simplest solution is to compare today's possibly incomplete values with yesterday's values, which are assumed to be complete (or at least in turn completed). We discuss variants of this approach.

    3.1 Absolute value extrapolation

This data completion method is simple: If today's value is missing, use yesterday's value instead. In the example of Section 2.5, we estimate the short end of the 01/10/1996 DEM curve with the delivered value 3.266% for the previous day 30/09/1996. Since the true 01/10/1996 value is 3.218%, the estimation error reduces to only 5 bp.²¹

Of course we can also search for particular problem cases for this method: If the previous day's value is not actually delivered, but estimated in turn, the interpolation errors tend to increase.

    3.2 Variant: Relative value extrapolation

This is an attempt at simultaneously solving the problems of both structural interpolation (Section 2.5) and historical value extrapolation (Section 3.1). Instead of extrapolating absolute values, we extrapolate the relative daily changes from yesterday to today.

How this works is easiest seen by referring again to the example values in Section 2.5:

maturity interval           [0, 1)      [1, 2)
forward rate 30/09/1996     3.266%      4.211%
forward rate 01/10/1996     (3.218%)    4.123%
multiplicative change²²     (-1.470%)   -2.090%
extrapolated change         -2.090%
completed values            3.198%      4.123%
estimation error            2 bp        0 bp

The values in parentheses are supposed to be missing in the data delivery; the completed value 3.198% is the estimate, computed as follows:

²¹ The magnitude of the error tolerance depends of course on the type of application. For the value-at-risk calculations described in Section 1.1, single-digit errors are usually deemed acceptable. A higher level of accuracy is generally required by profit and loss calculations.
²² Meaning change = today / yesterday − 1. To be consistent with later methods (cf. 4.1), we assume geometrical Brownian motion (resulting in a log-normal distribution). If we assumed Brownian motion (normal distribution) instead, one would have to use additive changes (= today − yesterday). The error in the example would then be 4 bp, which is still acceptable. It is common to assume log-normally distributed forward rates, see e.g. [16], Section 1.5. For an empirical study of issues related to log-normal distributional assumptions for forward rates and for swap rates, see [17].


The delivered relative change of −2.090% for the maturity interval [1, 2) is extrapolated to the interval [0, 1), and applied to the previous day's delivered value 3.266%. The resulting estimated value 3.198% differs from the true value 3.218% only by an acceptably small error of 2 bp.
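The following Python sketch reproduces this worked example for both the absolute variant of Section 3.1 and the relative variant of Section 3.2. It is illustrative only; in particular, taking the multiplicative change from the nearest delivered neighbour (looking right first) is an assumption made here, whereas the thesis implements the relative variant by back-extrapolating the changes with backextrapolate.fe.

```python
def complete_previous_day(today, yesterday, relative=True):
    """Fill missing values (None) in today's curve from yesterday's curve,
    which is assumed complete: either copy yesterday's value (absolute
    variant) or apply a neighbouring grid point's multiplicative change
    to it (relative variant).  Illustrative sketch only."""
    out = list(today)
    for i, v in enumerate(out):
        if v is not None:
            continue
        if not relative:
            out[i] = yesterday[i]                # absolute value extrapolation
            continue
        # nearest grid point with a delivered value today (right first, then left)
        for j in list(range(i + 1, len(out))) + list(range(i - 1, -1, -1)):
            if today[j] is not None:
                change = today[j] / yesterday[j] - 1.0   # multiplicative change
                out[i] = yesterday[i] * (1.0 + change)
                break
    return out

# Worked example of Section 3.2: short end missing on 01/10/1996
yesterday = [0.03266, 0.04211]
today = [None, 0.04123]
print(complete_previous_day(today, yesterday))          # [~0.03198, 0.04123]
print(complete_previous_day(today, yesterday, False))   # [0.03266, 0.04123]
```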

    3.3 Variant: Absolute value interpolation

A variant interpolation method for the univariate case²³ has been published as an earlier thesis [21] for the current Diploma in Mathematical Finance program. Let us quickly present the essential idea:

It proposes to interpolate a missing value f(t) of some day t not only from the previous trading day t−1, but as a convex combination of several past values, and possibly even future values. So if the circumflex symbol ^ denotes estimation, then

    f̂(t) := Σ_{i=−d}^{d} w_i(t) · f(t+i)

according to some suitably chosen weights w_i(t) ≥ 0 (i = −d, −d+1, …, 0, 1, …, d) with historical depth d, such that Σ_{i=−d}^{d} w_i(t) = 1.

Since the simple previous value extrapolation of Section 3.1 is included in this method (just set w_{−1}(t) = 1 and w_i(t) = 0 for all i ≠ −1), it should give at least as good results. But notice that, except for that special case, this approach is not compatible with Markov processes, such as Brownian motion. Nevertheless it is claimed in Theorem 4 in Section 2.3.1 of [21] that the above estimation will be unbiased if the underlying process is a Brownian motion. In fact this is only true for the unconditional expectation, but not for the conditional expectation given all past values, which actually is more relevant in our context.

Apart from this criticism, it should be noted that the thesis [21] also proposes an algorithm for outlier detection, and subsequent outlier correction via replacement with the estimated value (cf. our remarks in Section 1.5), by constructing optimal (in a certain sense) weights w_i(t). The thesis also gives estimation formulas for the interpolation error.

We will not pursue this path any further in this thesis. Instead, we will look at the problem from a different angle, by studying multivariate data completion methods in the following chapters. Let us just point out here that we will encounter somewhat similar problems with non-Markov effects, and with the distinction between conditional and unconditional distributions. We will sort out these problems in Section 5.3 by switching to GARCH processes instead of Markov processes.

    3.4 Code implementation

See Appendix A.3 for the routine extrapolation.fe which implements the absolute value extrapolation method of Section 3.1.

For the relative value extrapolation method of Section 3.2, there is no separate routine. It is implemented by applying the routine backextrapolate.fe (see Appendix A.2) to the daily changes (rather than to the absolute values as in Section 2.4). This application is performed by a particular case (switch (method) case "RELATIVE") of the general wrapper function complete.fe (see Appendix A.7).

²³ E.g. for a stock index, or in our case of yield curves, for an individual forward rate or an individual swap rate.


    Chapter 4 Conditional expectation

In the preceding sections, we presented heuristic data completion methods for time series of yield curves: either along the structure (Chapter 2), or along the history (Section 3.1), or some mixture of both (Section 3.2).

We now approach the latter idea more sophisticatedly. We want to use the maximum available information from the delivered data set in both the structural and the historical dimension, i.e., we want to consider not only neighbouring, but rather all structure grid points, and we do not want to restrict the reference period to the previous day only, but extend it farther into the past. According to statistical theory, the best tool²⁴ to achieve these means is the conditional expectation operator.²⁵

    Before we apply this concept to our problem, let us fix some notation.

    4.1 Joint geometrical Brownian motion

In the risk control practice of many banks, it is customary to start with the RiskMetrics model of financial returns (cf. [13], Section 4.6, equation 4.54). We adapt this model to our situation. Let f(t, T) be the (instantaneous) forward rate (process) for maturity term T, observed at calendar date t. For a given maturity grid T_1, …, T_m, we write f_i(t) = f(t, T_i). We assume joint geometrical Brownian motion without drift for f_1, …, f_m,

    d(log f_i) = σ_i · dw_i,

where w_1, …, w_m are standard Wiener processes with

    dw_i(t) ~ N(0, dt),

    Cov[dw_i(u), dw_j(t)] = ρ_ij · dt  if u = t,  and  0  if u ≠ t,

for some σ_i > 0, ρ_ij ∈ [−1, 1], for all i, j = 1, …, m.

    Remarks:

- As usual, the covariance of the (log) forward rates then is σ_ij · dt := Cov[d log f_i(t), d log f_j(t)] = ρ_ij σ_i σ_j · dt; notice that in particular, σ_ii = σ_i².

- Let us clarify the somewhat sloppy formulation "without drift" employed above: In accordance with RiskMetrics, we more precisely assume that the drift of the arithmetical Brownian motion (log f_i) is zero. Then Itô's Lemma (cf. [4], Section 3.4.2, or [10], Section 10.6) implies df_i / f_i = (σ_i²/2) dt + σ_i dw_i, and we obtain a non-zero drift σ_i²/2 for the geometrical Brownian motion f_i itself.

- In view of option pricing applications, there is an additional drift aspect: in the Libor market model (cf. [2] and [12]), or more generally in the Heath-Jarrow-Morton framework (cf. [5] and [9]), the assumption of zero drift for one particular forward rate imposes the risk-neutral martingale measure for this particular rate on all other forward rates, which results in non-zero drifts for the other rates (see also [11]). Notice however that we work in the so-called real world measure, and not in any risk-neutral one. So let us emphasize that the assumption of zero drift everywhere is perfectly good for data completion purposes. For an empirical justification, see [13], Section 5.3.1.1.

- We allow correlation ρ_ij between grid points i, j ∈ {1, …, m}, but we do not allow autocorrelation.²⁶ This seems in contrast to the RiskMetrics model, which explicitly assumes autocorrelation in [13], Section 4.6. But RiskMetrics is not stubborn here, since it also provides evidence that returns are not autocorrelated ([13], Section 4.3.2.3), although not statistically independent, i.e. the underlying process is non-Markov. However, for the normal distribution, which is assumed in [13], equation 4.54 as well, it is well known that zero correlation is equivalent to independence.²⁷ Later on, for the maximum likelihood estimation of the conditional expectation (cf. [13], Section 8.2.2), they work under the assumption of statistical independence between time periods. We will do just that, and drop autocorrelation for the moment.²⁸

- Our assumption of constant distribution parameters σ_i, ρ_ij (i, j = 1, …, m) is a restriction of the RiskMetrics model.²⁹

²⁴ The term "best" is used in the sense of sufficiency. See [15] for details.
²⁵ For a short introduction to conditional expectation, see my homework essay [19] for the current M.Sc. course.
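For intuition, one trading day of this model can be simulated in a few lines of Python. The sketch below is illustrative only; the volatilities, correlations and the three-point grid are made-up example values, not estimates from this thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

# made-up example parameters for a 3-point maturity grid
sigma = np.array([0.010, 0.008, 0.007])          # daily vols of log f_i (sigma_i * sqrt(dt))
rho = np.array([[1.0, 0.9, 0.8],
                [0.9, 1.0, 0.9],
                [0.8, 0.9, 1.0]])
cov = np.outer(sigma, sigma) * rho               # covariance sigma_ij * dt of d(log f_i)

f_today = np.array([0.03218, 0.04123, 0.05381])  # forward rates (cf. Example 2.2)
dlogf = rng.multivariate_normal(np.zeros(3), cov)
f_tomorrow = f_today * np.exp(dlogf)             # one zero-drift joint geometrical BM step
print(f_tomorrow)
```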

    4.2 Covariance estimation

For any particular realisation³⁰ of the market, and for any grid point i = 1, …, m, we denote by f_i the realised path followed by the forward rate for the maturity term T_i. Given the series of realised values

    f_i(t), f_i(t−Δt), f_i(t−2Δt), …, f_i(t−dΔt)

of this path at historical calendar dates t, t−Δt, t−2Δt, …, t−dΔt (for some historical depth d, and equidistant time steps Δt), we first construct the series of its log-changes

    r_i(s) := Δ(log f_i)(s) := log f_i(s) − log f_i(s−Δt),   s = t, t−Δt, …, t−(d−1)Δt.

For i, j = 1, …, m, we then estimate the covariance parameter σ_ij of 4.1 by the exponentially weighted moving average (EWMA) of the products of the log-change series i and j,

    σ̂_ij(t) := (1/Δt) · (1−λ)/(1−λ^d) · Σ_{k=0}^{d−1} λ^k · r_i(t−kΔt) · r_j(t−kΔt)

for some decay factor λ ∈ (0, 1], where in the limit case λ = 1 of the equally weighted moving average, the singular term (1−λ)/(1−λ^d) is replaced by its limit value 1/d. Our formula corresponds to the formulae given by RiskMetrics in [13], Section 5.2.1, Table 5.1. We have, however, corrected the weight parameters (cf. ibid., equations 5.1 and 5.2, and notice that Σ_{k=0}^{d−1} λ^k = (1−λ^d)/(1−λ), cf. [6], §1 Satz 1).

²⁶ N.B. Autocorrelation is implicitly already excluded by our notation. Recall that all Wiener processes are Markov, and thus changes over time are independent.
²⁷ To be fair, one has to add that leptokurtotic distributions are indeed discussed in [13], Section 4.5, and the choice of the normal distribution is only made for the sake of simplicity and analytical tractability.
²⁸ We will extend our model in Section 5.3 further below, such that these seemingly inconsistent statements will be reconciled.
²⁹ We will relax this restriction later on, in Section 5.3. Cf. Footnote 28.
³⁰ In the language of random variables. For a short introduction to random variables, see my homework essay [18] for the current M.Sc. course.

Then σ̂_ij is an unbiased estimator for the covariance σ_ij. This necessarily implies bias for volatility and correlation, since both are non-linear functions of the covariance (due to Jensen's inequality for the expectation operator, cf. [8], Section 5.6, Exercise 15).

We still have to choose the decay factor λ. Under our assumption of constant process parameters σ_ij, the choice λ = 1 yields the best linear unbiased estimator (BLUE), and the maximum likelihood estimator (ML) at the same time (but notice that even then, the estimation σ̂_ij(t) will depend on calendar time t). However, our framework works also for mildly time varying distributional parameters σ_ij. If we relax our assumptions to this more general case, it may be more favourable to choose a value λ < 1, where recent data carry a higher weight than past ones. Empirical studies by RiskMetrics evince that for the case Δt = 1 (trading) day, the value

    λ = 0.94

yields optimal results (cf. [13], Section 5.3.2.2, and Table 5.9). Since we are indeed concerned with daily market data (cf. Section 1.2), we will henceforth use this value.³¹
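In code, the EWMA estimator of this section can be sketched as follows (a hypothetical NumPy helper, not the Formula Engine routine covarmatrix.fe of Appendix A.4; time is measured in units of Δt, so the 1/Δt prefactor is dropped).

```python
import numpy as np

def ewma_covariance(returns, lam=0.94):
    """EWMA covariance estimate as in Section 4.2.

    returns: array of shape (d, m); row k holds the log-changes
             r_i(t - k*dt), k = 0 (most recent), ..., d-1.
    lam:     decay factor lambda in (0, 1]; lam = 1 gives the equally
             weighted estimator with weight 1/d."""
    d, m = returns.shape
    if lam == 1.0:
        weights = np.full(d, 1.0 / d)
    else:
        weights = (1.0 - lam) / (1.0 - lam**d) * lam**np.arange(d)
    # sum_k weights[k] * r_i(t-k) * r_j(t-k)
    return np.einsum("k,ki,kj->ij", weights, returns, returns)

# Toy usage with two grid points and 250 days of simulated log-changes:
rng = np.random.default_rng(0)
r = rng.multivariate_normal([0, 0], [[1e-4, 5e-5], [5e-5, 1e-4]], size=250)
print(ewma_covariance(r))
```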

    4.3 Conditional expectation for normal random variables

Let a, b be positive integers, and let X: Ω → ℝᵃ, Y: Ω → ℝᵇ be normally distributed vector-valued³² centred³³ random variables with covariance matrices V ∈ ℝ^(a×a) and W ∈ ℝ^(b×b), respectively. Suppose that the joint distribution of (X, Y)ᵀ is normal with covariance matrix

    Σ = ( V    U )
        ( Uᵀ   W )   ∈ ℝ^(m×m),

where m = a + b, and U = Cov[X, Y] = E[X Yᵀ] = Cov[Y, X]ᵀ ∈ ℝ^(a×b). Let x ∈ ℝᵃ be a realisation of X. Then the conditional expectation of Y, given that X = x, is given by

    E[Y | X = x] = Uᵀ V⁻¹ x.

For a derivation of this result, see [19], Proposition 5.3.
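The formula E[Y | X = x] = Uᵀ V⁻¹ x translates directly into a few lines of NumPy; the sketch below only mirrors the role of conditionaldistribution.fe (Appendix A.5) for illustration, it is not that routine.

```python
import numpy as np

def conditional_mean(cov, x):
    """E[Y | X = x] = U^T V^{-1} x for centred, jointly normal (X, Y)
    with covariance matrix cov = [[V, U], [U^T, W]], where a = len(x).
    Illustrative sketch only."""
    a = len(x)
    V = cov[:a, :a]
    U = cov[:a, a:]
    return U.T @ np.linalg.solve(V, x)

# Tiny example: two observed components, one missing component
cov = np.array([[1.0, 0.3, 0.5],
                [0.3, 1.0, 0.2],
                [0.5, 0.2, 1.0]])
x = np.array([0.8, -0.4])
print(conditional_mean(cov, x))
```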

    4.4 Estimation algorithm

Fix the notation of 4.2, and set Δt = 1, where (calendar) time is measured in (trading) days. Suppose that we have a complete historically realised data matrix

    ( f_i(t−k) : i = 1, …, m;  k = 1, …, d ),

and estimate the covariance matrix as in 4.2 by setting

    Σ̂ := ( σ̂_ij(t−1) : i, j = 1, …, m ).

Now consider an incompletely delivered data vector (f_1(t), …, f_m(t)) for today's forward curve. By applying a permutation on {1, …, m} if necessary, we may assume w.l.o.g. that f_1(t), …, f_a(t) are known values, and that f_{a+1}(t), …, f_m(t) are missing. In the notation of Section 4.3, this can be written as X := (f_1(t), …, f_a(t)), x := (f_1(t), …, f_a(t)), Y := (f_{a+1}(t), …, f_m(t)), and the missing values y := (f_{a+1}(t), …, f_m(t)) can then be estimated by

    ŷ := E[Y | X = x].

³¹ See Section 5.3 for the theoretical framework motivated by these empirical results. See also Footnote 29.
³² As for notation, we use column vectors. Transposed vectors and matrices are denoted by the capital letter T in the exponent.
³³ With zero mean, i.e. null unconditional expectation vector.

    4.5 Code implementation

The main work, namely the calculation of the conditional expectation, is performed by the routine conditionaldistribution.fe, which is listed in Appendix A.5. (It actually calculates not only the conditional expectation, but the conditional covariance matrix as well; this is only for the sake of completeness of the code, and it is not used here.³⁴) It assumes that all parameters are already ordered such that delivered values come first, followed by the missing values. If this is not the case, the calling routine must apply a suitable permutation. Such a calling routine is included in one particular case (switch (method) case "CE") of the general wrapper function complete.fe (cf. Appendix A.7), which actually starts by estimating the (unconditional historical) distribution parameters with the help of the subroutine covarmatrix.fe (listed in Appendix A.4).

³⁴ But it may become useful for the study of the questions listed in the outlook Chapter 9.
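Putting Sections 4.2 to 4.4 together, a hypothetical Python analogue of the "CE" branch of complete.fe might look as follows. One interpretative choice is made here and should be flagged: the conditional expectation is applied to today's log-changes and then mapped back to rate levels, which is one natural reading of the algorithm; the authoritative implementation remains the Formula Engine code of Appendix A.

```python
import numpy as np

def complete_ce(history, today, lam=0.94):
    """Conditional-expectation completion (Chapter 4), illustrative only.

    history: array (d+1, m) of forward rates f_i(t-k), k = 1..d+1,
             most recent day first -- assumed complete.
    today:   length-m float array of today's rates, np.nan where missing.
    Returns today's curve with the missing entries estimated."""
    r = -np.diff(np.log(history), axis=0)          # log-changes, most recent first
    d = r.shape[0]
    w = (1 - lam) / (1 - lam**d) * lam**np.arange(d)
    cov = np.einsum("k,ki,kj->ij", w, r, r)        # EWMA covariance (Section 4.2)

    known = ~np.isnan(today)
    x = np.log(today[known]) - np.log(history[0, known])  # today's delivered log-changes
    V = cov[np.ix_(known, known)]
    U = cov[np.ix_(known, ~known)]
    y = U.T @ np.linalg.solve(V, x)                # E[Y | X = x] (Section 4.3)

    out = today.copy()
    out[~known] = history[0, ~known] * np.exp(y)   # back from log-changes to rate levels
    return out
```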


    Chapter 5 Expectation Maximisation

    5.1 Maximum likelihood from incomplete data via the EM algorithm

In [13], Section 8.2 (and not only there), the expectation maximisation algorithm is presented as the state-of-the-art data completion algorithm. So we must also include it in our analysis here. Unfortunately, the exposition in [13] is rather difficult to read; let us instead refer to the more general, but also much more clearly written original paper [3] of Dempster et al.:

"This paper presents a general approach to iterative computation of maximum-likelihood estimates when the observations can be viewed as incomplete data. […] Each iteration of the EM algorithm involves two steps which we call the expectation step (E-step) and the maximization step (M-step). […] We now present a simple characterization of the EM algorithm which can usually be applied when […]³⁵ holds. Suppose that Σ^(p) denotes the current value of Σ after p cycles of the algorithm. The next cycle can be described in two steps, as follows:

E-step: Estimate the [missing values] by finding y^(p) = E[y | x, Σ^(p)].

M-step: Determine Σ^(p+1) as the […] maximum likelihood³⁶ estimator of Σ, which depends on (x, y) only through y."

What now, one may ask, is the difference to the conditional expectation described in Chapter 4? Certainly not the E-step, which is just another application of Section 4.3. The tiny difference lies with the M-step, where the estimator for Σ is now given as

    Σ^(p+1) = ( σ̂_ij(t) : i, j = 1, …, m ).

Notice that here we use the covariance estimator at time t, rather than at time t−1 as in Section 4.4. This means that the completed (in the E-step) data vector for today's (originally incomplete) data delivery is included in the estimation (in the M-step) of the distributional parameters, which in turn govern the estimation of the completed data vector (in the next E-step). This can obviously only be achieved by an iterative approach.
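Schematically, the resulting EM loop can be sketched as below (illustrative Python under the same interpretative assumptions as the conditional expectation sketch at the end of Chapter 4; not the thesis' Formula Engine code).

```python
import numpy as np

def ewma_cov(rates, lam=0.94):
    """EWMA covariance of the log-changes of a (most-recent-first) rate history."""
    r = -np.diff(np.log(rates), axis=0)
    d = r.shape[0]
    w = (1 - lam) / (1 - lam**d) * lam**np.arange(d)
    return np.einsum("k,ki,kj->ij", w, r, r)

def complete_em(history, today, lam=0.94, tol=1e-12, max_iter=50):
    """Schematic EM completion (Section 5.1), illustrative only.
    history: complete (d+1, m) rate matrix, most recent day first;
    today:   length-m float vector with np.nan where values are missing."""
    known = ~np.isnan(today)
    filled = today.copy()
    cov = ewma_cov(history, lam)                 # initial estimate (time t-1), as in Chapter 4
    x = np.log(today[known]) - np.log(history[0, known])
    prev = None
    for _ in range(max_iter):
        # E-step: conditional expectation of the missing log-changes (Section 4.3)
        V = cov[np.ix_(known, known)]
        U = cov[np.ix_(known, ~known)]
        y = U.T @ np.linalg.solve(V, x)
        filled[~known] = history[0, ~known] * np.exp(y)
        if prev is not None and np.max(np.abs(filled[~known] - prev)) < tol:
            break
        prev = filled[~known].copy()
        # M-step: re-estimate the covariance including today's completed vector
        cov = ewma_cov(np.vstack([filled, history]), lam)
    return filled
```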

    5.2 Conditional expectation as part of expectation maximisation

Philosophically, one may argue about which makes more sense: the inclusion or the exclusion of missing (respectively, to-be-completed) values in the estimation of the distribution parameters which govern the completion. On a numerical level, however, in anticipation of our empirical study in Chapter 7, the results are virtually indistinguishable. It is in fact very difficult to construct a pathological example where the EM algorithm does not converge after a single cycle (and thus degenerate into a simple conditional expectation estimation as in Chapter 4).

³⁵ The condition mentioned here is that we work in the framework of a regular exponential family with parameter vector φ, complete data vector x, sufficient statistics t(x), and observed data y. This condition is indeed satisfied here: We work in the framework of a normal distribution family (instead of the more general exponential family) with covariance matrix Σ (instead of the more general parameter φ), complete data vector (x, y) (instead of x in the original text), missing data y = projection(x, y) (instead of the more general form t(x) in the original text), and observed data x (instead of y in the original text). To avoid confusion, I have adjusted the notation in the quotation accordingly.
³⁶ N.B. maximum likelihood for y = y^(p) to happen.

Finally, with an eye on implementation, it becomes obvious that conditional expectation is only one part of expectation maximisation, namely the first E-step; one saves the coding effort³⁷ for the M-step, the iterative loop, and the stop criterion.

This sounds as if EM would not make much sense. But remember that we have a very special case of data here, namely time series data. In Chapter 4, we exploited this calendar structure in order to construct an algorithm which is simpler, but equally useful (namely single-step conditional expectation). However, the general EM algorithm works also for chronologically unordered data sets, and so it is not just more complex, but in general also more powerful.

    5.3 Convergence of the EM algorithm in a GARCH model

After what has been said in the preceding Section 5.2 about the virtual equality of results of the EM algorithm and of simple conditional expectation, it seems unnecessary to discuss theoretical convergence properties of the EM algorithm. Nevertheless, some minor theoretical inconsistencies seem to have leaked into the preceding exposition. By this, I do not mean actual logical contradictions, but rather some at first sight unnatural choices in our process model. This section is devoted to clarifying these issues, and to showing that the choices made are, at a second glance, indeed practicable.

First let me point out what has been "impure" in our process model so far. In order to be able to prove theoretical convergence properties for the EM algorithm, maximum likelihood (ML) estimation of the process parameters σ_ij is required (cf. Section 5.1 and [3]). Under our process assumption of independent process innovations Δ(log f_i)(t), the ML estimator is given by λ = 1, as has been pointed out in Section 4.2. But we have chosen λ < 1; more precisely, the industry standard λ = 0.94 as introduced by RiskMetrics. So how does this fit together?

    Let us rewrite the covariance estimator of Section 4.2, in order to embark on the justification of our Ansatz. As usual, let us assume Δt = 1 (trading day), and consider for simplicity only the case i = j, i.e. only the (univariate) variance estimator for some yield curve grid point i ∈ {1, …, m}:

$$
\begin{aligned}
\hat\sigma_{ii}^{2}(t)
&= \frac{1-\lambda}{1-\lambda^{d}} \sum_{k=0}^{d-1} \lambda^{k}\, r_i(t-k)^{2} \\
&= \frac{1-\lambda}{1-\lambda^{d}}\, r_i(t)^{2}
   + \lambda\,\frac{1-\lambda}{1-\lambda^{d}} \sum_{k=0}^{d-2} \lambda^{k}\, r_i(t-1-k)^{2} \\
&= \frac{1-\lambda}{1-\lambda^{d}}\, r_i(t)^{2}
   + \lambda\,\hat\sigma_{ii}^{2}(t-1)
   - \frac{(1-\lambda)\,\lambda^{d}}{1-\lambda^{d}}\, r_i(t-d)^{2}.
\end{aligned}
$$
    If we compare this with Section 31.1.2 of [4], we find that the above equation describes a GARCH(p, q) process with volatility forecast

$$
\sigma_i^{2}(t+1) \;=\; a_0 \;+\; \sum_{h=1}^{p} b_h\, \sigma_i^{2}(t+1-h) \;+\; \sum_{k=1}^{q} a_k\, r_i(t+1-k)^{2},
$$

    37 And also, for productive use in a bank's internal IT environment, the extensive software testing effort.


    where σi²(t+1) := σ̂²ii(t), and

    p = 1, q = d+1, a0 = 0, a1 = (1−λ)/(1−λ^d), a(d+1) = −λ^d(1−λ)/(1−λ^d), b1 = λ.

    Notice that we have to shift the volatility forecast by 1 day in order to make our approach compatible with the GARCH model. This introduces the mildly time varying distributional parameters, as quoted from 4.2.
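    As a quick numerical sanity check of this identification, the following sketch (Python/NumPy, not part of the thesis implementation; the window length d and the synthetic log-changes are assumptions made purely for illustration) verifies that the finite-window EWMA estimator satisfies the GARCH-type recursion with exactly these coefficients:

```python
import numpy as np

# Sanity check (illustration only): the finite-window EWMA variance estimator
# satisfies the recursion derived above,
#   sigma2(t) = lam * sigma2(t-1) + a1 * r(t)**2 + a_d1 * r(t-d)**2,
# with a1 = (1-lam)/(1-lam**d) and a_d1 = -lam**d * (1-lam)/(1-lam**d).
rng = np.random.default_rng(0)
lam, d = 0.94, 250                       # decay factor (RiskMetrics) and window length (assumed)
r = rng.normal(scale=0.01, size=1000)    # synthetic daily log-changes

def ewma_var(t):
    """Finite-window EWMA variance estimate at day t (uses r[t-d+1], ..., r[t])."""
    k = np.arange(d)
    return np.sum((1 - lam) / (1 - lam**d) * lam**k * r[t - k]**2)

t = 600
a1 = (1 - lam) / (1 - lam**d)
lhs = ewma_var(t)
rhs = lam * ewma_var(t - 1) + a1 * r[t]**2 - lam**d * a1 * r[t - d]**2
print(np.isclose(lhs, rhs))              # True
```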

    Notice that a GARCH process cannot be a Markov process, since the process innovations are no longer independent. But also notice that we have nowhere actually used the assumption of independent process innovations, neither for the conditional expectation method, nor for the EM method. In both cases, two distinct steps have to be performed: firstly, the estimation of the process parameters with an estimation formula that may (λ < 1) or may not (λ = 1) implicitly assume some form of statistical dependence (as outlined above); secondly, calculation of the conditional expectation for one grid point of the yield curve, given the already determined process parameters, and depending on the other yield curve grid values for the same trading day. In other words, we are not conditioning upon past values in this second step. This may sound a little complicated, but it merely boils down to proper accounting, or in our case: a clear distinction between the maturity dimension of the yield curve (along the curve structure), and its historical dimension (along the time series).

    How do we now obtain a maximum likelihood estimator? In our special version of the GARCH model, the underlying process parameter is no longer the covariance σij, but the decay factor λ.

    Now if we are given the ML estimator for λ, then Theorem 5.1.1 of [22] shows that the ML estimator for σij is given by the formula in Section 4.2. However, since the (unconditional) distribution for the GARCH process is not known analytically (in the sense that we can write down an analytical Lebesgue density function, cf. [18]), the ML value of λ must be determined numerically. This is just what RiskMetrics has done for many financial time series, and it found that λ is always reasonably close to 0.94. So for all our practical purposes, we can rest assured that the formula in Section 4.2 with λ = 0.94 is a very close approximation of the ML estimator for the covariance σij in our GARCH model.

    This refines the theoretical foundation for our heuristic study. It should also illuminate the not very lucid exposition of the EWMA and the EM algorithm in [13].

    5.4 Completion algorithm

    Let us now continue with the practical aspects of our study, and present the adaptation of the EM algorithm to our situation.

    For this, we use a similar framework as described in Section 4.4. In particular, we start the EM algorithm with the initial parameter estimation (which has been left unspecified in Section 5.1)

    Σ(0) := ( σ̂ij(t−1) : i, j = 1, …, m ).

  • 7/25/2019 Rossmanith - Completion of Market Data

    26/62

    27

    We then alternately calculate y(p) and Σ(p) as described in Section 5.1, until either ‖y(p) − y(p−1)‖² < ε (numerical stop criterion), or p = 10 (maximum number of iterations).
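    A minimal sketch of this iteration in Python/NumPy follows; it is illustrative only and not the Formula Engine code of Appendix A.7. The function names, the zero-mean assumption for the log-changes, and the EWMA covariance routine are assumptions made for the sketch, while λ = 0.94, the threshold ε, and the cap of ten iterations follow Sections 5.4–5.5.

```python
import numpy as np

def ewma_cov(R, lam=0.94):
    """Exponentially weighted covariance of a log-change history R
    (rows = trading days, newest row last), with normalised weights."""
    d = R.shape[0]
    w = (1 - lam) / (1 - lam**d) * lam ** np.arange(d - 1, -1, -1)
    return (R * w[:, None]).T @ R

def cond_exp_fill(r, miss, cov):
    """E-step: replace the missing entries of today's log-change vector r by
    E[Y | X = x] = cov_YX cov_XX^{-1} x (zero-mean normal assumption, Chapter 4)."""
    obs = ~miss
    cov_xx = cov[np.ix_(obs, obs)]
    cov_yx = cov[np.ix_(miss, obs)]
    r_filled = r.copy()
    r_filled[miss] = cov_yx @ np.linalg.solve(cov_xx, r[obs])
    return r_filled

def em_complete(r_today, miss, history, lam=0.94, eps=1e-10, max_iter=10):
    """Alternate E-step (completion) and M-step (covariance re-estimate that
    includes the completed vector) until the stop criterion is met."""
    cov = ewma_cov(history, lam)           # initial estimate Sigma(0): yesterday's data only
    y_prev = None
    for _ in range(max_iter):
        r_filled = cond_exp_fill(r_today, miss, cov)            # E-step
        cov = ewma_cov(np.vstack([history, r_filled]), lam)     # M-step
        y = r_filled[miss]
        if y_prev is not None and np.sum((y - y_prev) ** 2) < eps:
            break
        y_prev = y
    return r_filled
```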

    5.5 Code implementation

    This is very similar to the coding of the conditional expectation (cf. 4.5). The difference is the additional iteration loop in the expectation maximisation case (switch (method) case "EM") of the general wrapper function complete.fe (Appendix A.7). The stop criterion is defined by the threshold ε = 10^−10 = 0.01 (bp)², which suffices for our purposes.


    Chapter 6 Principal Component Analysis

    6.1 Shift, twist, and hump

    Principal component analysis is a general technique. It is often adapted to the special task of pricing exotic interest rate options (cf. Chapter 3 of [16]). In most applications, it allows the financial analyst to think of the interest rate curve in more abstract, geometric terms such as curve level, slope, or curvature, rather than the concrete maturity buckets represented by the curve grid points. Let us look at this method in more detail.

    We start with the forward yield curve process f defined in 4.1. Its covariance matrix Σ := (σij : i, j = 1, …, m) is positive semi-definite. In particular, it is symmetric, so there exists an orthogonal matrix Q ∈ ℝ^{m×m} such that

    Q^T Σ Q =: D =: diag(λ1, …, λm)

    is a diagonal matrix. The numbers λ1, …, λm ≥ 0 on the diagonal of D are the (non-negative38) eigenvalues of Σ, and the columns q1, …, qm of Q are the associated eigenvectors, which we call the principal axes (of the forward yield curve).

    Consider the log-change history (ri(s) : s = t, t−1, …, t−d) of a realisation of f on some maturity grid point i ∈ {1, …, m}, as given in the discrete setting of 4.2, with integer-valued time step Δt = 1 (trading day), and integer-valued time parameters. We write r(s) := (r1(s), …, rm(s))^T in (column) vector notation with respect to the canonical basis39 (corresponding to the maturity grid points) of the state space ℝ^m.

    Instead of this canonical representation, we now want to study this vector-valued time series with respect to the principal axes. For this, we have to perform a rotation40 of the coordinate system from the canonical axes to the principal axes, described by the variable transformation

    ℝ^m → ℝ^m,  r ↦ z := Q^T r.

    We write z = (z1, …, zm)^T, and we call these new variables z1, …, zm the principal components (of the yield curve). Their covariance matrix is D, and therefore they are uncorrelated, with respective variances λ1, …, λm.

    By permuting the eigenvectors (principal axes) q1, …, qm if necessary, we may w.l.o.g. assume that the eigenvalues are ordered via λ1 ≥ … ≥ λm ≥ 0. It has been established by many empirical studies (e.g., see Figure 3.6 of [16], or Abbildung 33.1 of [4]) that under this assumption, the first principal component z1 can intuitively be interpreted as (changes in) the average level of the yield curve, the second principal component z2 as (changes in) its average slope, and the third principal component z3 as (changes in) its average curvature.41

    38 because Σ is positive semi-definite
    39 The canonical basis of ℝ^m consists of the orthonormal vectors e1 = (1,0,…,0), e2 = (0,1,0,…,0), …, em = (0,…,0,1).
    40 And/or a reflection, but by replacing q1 with −q1 if necessary, we can w.l.o.g. assume that det(Q) = 1. This is only mentioned for the sake of intuition and is quite unimportant for the following derivation.
    41 There is of course no a priori reason why the most important components are level, slope, and curvature (in that order). The fact that they usually turn out this way is incidental. If they did not, principal component analysis would still work. Indeed, it is designed to determine the largest contributions to the risk.


    They are colloquially referred to as shift, twist, and hump (of the yield curve), respectively. They almost comprehensively describe the dynamics of the yield curve, since their combined variance (λ1 + λ2 + λ3) makes up most of the total variance (λ1 + … + λm), typically around 98–99% (cf. [4], Section 33.2).
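    As an illustration of the decomposition just described (a sketch only: the covariance matrix below is a synthetic stand-in with an assumed correlation decay and volatility profile, not one estimated from the thesis data), the principal axes and the variance shares can be obtained with a standard symmetric eigensolver:

```python
import numpy as np

m = 10                                               # grid points 1Y ... 10Y
maturities = np.arange(1, m + 1)
# synthetic, positive semi-definite covariance of daily log-changes (assumption)
corr = np.exp(-0.05 * np.abs(maturities[:, None] - maturities[None, :]))
vol = 0.01 * np.linspace(1.2, 0.8, m)
Sigma = np.outer(vol, vol) * corr

eigval, Q = np.linalg.eigh(Sigma)                    # Sigma = Q diag(eigval) Q^T
order = np.argsort(eigval)[::-1]                     # sort eigenvalues descending
eigval, Q = eigval[order], Q[:, order]

explained = eigval / eigval.sum()
print(explained[:3].sum())   # share of total variance carried by the first three components
# principal components of a log-change vector r:  z = Q.T @ r
```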

    6.2 Extrapolating principal components

    We now want to adapt principal component analysis to Risk Control by constructing a data completion algorithm based on principal component analysis. The idea is to extrapolate principal components zi instead of original rates ri (i = 1, …, m). Since the principal components are uncorrelated by definition, each of them can be estimated independently of the others. Estimation here simply means replacement of the variable with its expectation, which is zero in our model.

    There is one difficulty with this approach: missing values in the rates ri do not correspond bijectively to missing values in the principal components zi, since the transformation z = Q^T r has smeared the holes of r all over z, colloquially speaking. So we must choose which of the zi we want to estimate, i.e. in our case, set to zero. We want to make our choice in such a manner that we leave the big principal components (shift, twist, hump, etc.) unchanged, and set only the principal components with the smallest influence on curve movements to zero, i.e. those zi corresponding to the smallest λi. This makes sense, since replacing a variable zi by its expectation E[zi] means forcing its variance λi to be zero, so picking the smallest eigenvalues λi keeps a maximum of the market move information provided by the data vendors.

    Let us examine this approach in detail: as in Section 4.4, let us partition r^T = (r1, …, ra | ra+1, …, rm) = (x | y) into the a known values and the b := m − a missing values. If we partition Q^T accordingly, we can write

$$
z \;=\; Q^{T} r \;=\;
\begin{pmatrix} R & H \\ K & S \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}.
$$

    Since x is known, this equation system gives us a conditions. We need b additional conditions for the estimator ỹ of y. For this, we set the last b entries of z to zero, i.e. we estimate z̃^T = (z̄, 0, …, 0), where z̄ = (z1, …, za)^T is the appropriate truncation of z. This gives

$$
\begin{pmatrix} \bar z \\ 0 \end{pmatrix}
=
\begin{pmatrix} R & H \\ K & S \end{pmatrix}
\begin{pmatrix} x \\ \tilde y \end{pmatrix},
$$

    and it follows from the lower part of this equation system that the estimation of y is given by

$$
\tilde y \;=\; -\,S^{-1} K\, x .
$$

    At this point, we do not know if S is invertible. However, since Q has full rank, it follows that the m × b matrix formed by the last b columns of Q^T (i.e. H stacked on top of S) has rank b. By permuting its rows if necessary, we may even assume that the last b rows are linearly independent, i.e. that the b × b block S is in fact invertible. Notice however that we have just permuted the rows of Q^T, i.e. the columns of Q, and so we can no longer uphold the assumption of Section 6.1 that the eigenvectors q1, …, qm are ordered in such a way that the eigenvalues satisfy λ1 ≥ … ≥ λm ≥ 0. We must relax this a little, and can only assume that among all permutations of q1, …, qm which result in an invertible submatrix S, we


    have picked one such that λa+1 + … + λm is minimal (since there are exactly m! permutations of {1, …, m}, in particular only finitely many, the minimum exists and is actually attained).

    6.3 Completion algorithm

    The preceding section already fully describes the algorithm.
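    A compact sketch of this algorithm in Python/NumPy is given below; it is illustrative only and not the Formula Engine routine pcaestimation.fe of Appendix A.6, and it assumes that the submatrix S is invertible without the row permutation discussed above.

```python
import numpy as np

def pca_complete(r, miss, Sigma):
    """Fill the missing entries of the vector r by setting the principal
    components with the smallest eigenvalues to zero and solving the lower
    block of the system, i.e. y_tilde = -S^{-1} K x (Section 6.2)."""
    eigval, Q = np.linalg.eigh(Sigma)
    Qt = Q[:, np.argsort(eigval)[::-1]].T   # rows of Q^T, largest eigenvalues first
    obs = ~miss
    a = int(obs.sum())                      # number of known values
    K = Qt[a:, obs]                         # last b rows, columns of the known entries
    S = Qt[a:, miss]                        # last b rows, columns of the missing entries
    r_filled = r.copy()
    r_filled[miss] = -np.linalg.solve(S, K @ r[obs])
    return r_filled
```

    In this sketch r would be the day's log-change vector and Sigma the covariance estimate of Section 4.2.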

    6.4 Code implementation

    The method is implemented in the routine pcaestimation.fe listed in Appendix A.6.

    There was, however, the following problem with the implementation. The calculation of eigenvalues and eigenvectors is performed by Formula Engine internal functions provided in the Asset Control software package (cf. [1]). During the empirical study (Chapter 7), it was observed that these functions do not work reliably in the present software release (Version 4.1). The software produced erratic results for near-zero matrices represented as the data type real matrix. The software vendor recommended the use of the data type lists of lists instead. This is indeed what I have done for all the other completion methods (without encountering any problems), but for the PCA method the data type real matrix is, unfortunately, essential. In principle, it is possible to patch the system with a C routine which adds an established numerical method for eigenvalue calculation (e.g. from [14]), but this involves both a disproportionate amount of interface coding and tampering with a client's software.

    The practical effect of these problems was that in the empirical study the PCA method generated huge outliers; e.g., a standard deviation of the order of 10^78 was produced in one typical example. Since these PCA estimates are so inaccurate, it is inappropriate to report them (cf. Chapter 7).

    I can, however, report an interesting theoretical result for this method in the following section:

    6.5 Comparison with conditional expectation

    Consider the estimation formulas

$$
\tilde y = -S^{-1} K x
\qquad \text{and} \qquad
\hat y = U^{T} V^{-1} x \quad (\,= \mathrm{E}[\,Y \mid X = x\,]\,)
$$

    of Sections 6.2–6.3 and 4.3–4.4, respectively, where

$$
r = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} V & U \\ U^{T} & W \end{pmatrix}, \qquad
Q^{T} = \begin{pmatrix} R & H \\ K & S \end{pmatrix}, \qquad
Q^{T} \Sigma\, Q = D = \operatorname{diag}(\lambda_1, \ldots, \lambda_m).
$$

    Formally, the two estimation formulas for y look very similar. We want to examine the circumstances under which they coincide. In other words: when is ỹ = ŷ?

    Let us put ourselves in the position of an interest rate option trader, and let us assume that the dynamics of the yield curve only depend on the first two or three, at most a < m, principal components.

    Then λa+1 = … = λm = 0, and therefore the relationship Q^T Σ = D Q^T can be rewritten in block matrix form as


$$
\begin{pmatrix} R & H \\ K & S \end{pmatrix}
\begin{pmatrix} V & U \\ U^{T} & W \end{pmatrix}
=
\begin{pmatrix} \bar D & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} R & H \\ K & S \end{pmatrix},
$$

    where D̄ = diag(λ1, …, λa) is the appropriately truncated submatrix of D. The lower left hand corner of this matrix equation yields KV + SU^T = 0·R + 0·K = 0. But then S^{−1}K + U^T V^{−1} = 0, and so ỹ = ŷ.

    Therefore, unsurprisingly, under the usual practical assumptions and approximations, data completion via principal component analysis yields the same results as taking direct conditional expectations. Principal component analysis is, however, more difficult to implement (cf. 6.4).
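    The equivalence can also be checked numerically. The following sketch (my own illustration with a randomly generated rank-deficient covariance, not taken from the thesis code) reproduces the statement ỹ = ŷ under the assumption λa+1 = … = λm = 0:

```python
import numpy as np

rng = np.random.default_rng(1)
m, b = 6, 2                            # curve size and number of missing values
a = m - b                              # number of known values = assumed rank of Sigma
B = rng.normal(size=(m, a))
Sigma = B @ B.T                        # rank a, positive semi-definite

miss = np.zeros(m, dtype=bool)
miss[-b:] = True                       # last b entries missing (as in the text)
obs = ~miss
x = rng.normal(size=a)                 # the delivered (known) values

# conditional expectation:  y_hat = U^T V^{-1} x,  with Sigma = [[V, U], [U^T, W]]
V = Sigma[np.ix_(obs, obs)]
U = Sigma[np.ix_(obs, miss)]
y_hat = U.T @ np.linalg.solve(V, x)

# PCA completion:  y_tilde = -S^{-1} K x  (last b principal components set to zero)
eigval, Q = np.linalg.eigh(Sigma)
Qt = Q[:, np.argsort(eigval)[::-1]].T
K, S = Qt[a:, obs], Qt[a:, miss]
y_tilde = -np.linalg.solve(S, K @ x)

print(np.allclose(y_hat, y_tilde))     # True, up to numerical precision
```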

  • 7/25/2019 Rossmanith - Completion of Market Data

    31/62

    32

    Chapter 7 Method comparison

    We now want to compare the prediction quality of the data completion methods described in the preceding chapters.

    7.1 Example curves of September 2001

    Let us pick two trading days, 10/09/2001 and 20/09/2001,42for the US dollar swap43curve withterms 1Y10Y. We simulate an incomplete data delivery by deleting the 8Y10Y rates. Thesegaps are then filled again by our data completion methods. The results are as follows:

                 1Y    2Y    3Y    4Y    5Y    6Y    7Y    8Y    9Y    10Y
    Bloomberg    USSA1 USSW2 USSW3 USSW4 USSW5 USSW6 USSW7 USSW8 USSW9 USSW10
    10/09/2001   3.460 4.027 4.503 4.831 5.067 5.241 5.372 5.475 5.556 5.635
    simulation   3.460 4.027 4.503 4.831 5.067 5.241 5.372 n/a   n/a   n/a
    20/09/2001   2.710 3.484 4.032 4.397 4.661 4.880 5.081 5.224 5.342 5.453
    simulation   2.710 3.484 4.032 4.397 4.661 4.880 5.081 n/a   n/a   n/a

    STRUCT: structural interpolation
    10/09/2001   3.460 4.027 4.503 4.831 5.067 5.241 5.372 5.372 5.372 5.372
    error (bp)                                             -10.3 -18.4 -26.3
    20/09/2001   2.710 3.484 4.032 4.397 4.661 4.880 5.081 5.081 5.081 5.081
    error (bp)                                             -14.3 -26.1 -37.2

    EXTRA: previous day absolute value extrapolation
    10/09/2001   3.460 4.027 4.503 4.831 5.067 5.241 5.372 5.430 5.504 5.579
    error (bp)                                              -4.5  -5.2  -5.6
    20/09/2001   2.710 3.484 4.032 4.397 4.661 4.880 5.081 5.222 5.339 5.450
    error (bp)                                              -0.2  -0.3  -0.3

    RELATIVE: previous day relative value extrapolation
    10/09/2001   3.460 4.027 4.503 4.831 5.067 5.241 5.372 5.469 5.543 5.619
    error (bp)                                              -0.6  -1.3  -1.6

    42 Notice how the trading days have been picked before and after September 11, 2001.
    43 In Chapter 2–Chapter 6, we have considered forward curves, but here we examine the swap curve. However, this does not make much difference for the sake of this illustrative example. Later on, we will compare the impact of all data completion methods on several yield curves in both forward and swap quotation. See also Footnote 45.



    RELATIVE: previous day relative value extrapolation (continued)
    20/09/2001   2.710 3.484 4.032 4.397 4.661 4.880 5.081 5.205 5.321 5.432
    error (bp)                                              -1.9  -2.1  -2.1

    CE: conditional expectation
    10/09/2001   3.460 4.027 4.503 4.831 5.067 5.241 5.372 5.470 5.547 5.623
    error (bp)                                              -0.5  -0.9  -1.2
    20/09/2001   2.710 3.484 4.032 4.397 4.661 4.880 5.081 5.223 5.347 5.462
    error (bp)                                              -0.1   0.5   0.9

    EM: expectation maximisation
    10/09/2001   3.460 4.027 4.503 4.831 5.067 5.241 5.372 5.470 5.547 5.623
    error (bp)                                              -0.5  -0.9  -1.2
    20/09/2001   2.710 3.484 4.032 4.397 4.661 4.880 5.081 5.223 5.347 5.462
    error (bp)                                              -0.1   0.5   0.9

    PCA: principal component analysis
    10/09/2001   3.460 4.027 4.503 4.831 5.067 5.241 5.372 5.470 5.547 5.622
    error (bp)                                              -0.5  -0.9  -1.3
    20/09/2001   2.710 3.484 4.032 4.397 4.661 4.880 5.081 5.214 5.344 5.463
    error (bp)                                              -1.0   0.2   1.0

    It is perhaps somewhat surprising that September 11 seems not to have had any impact on the prediction quality of any of the methods, but apart from that, the estimation comparison clearly confirms the conjectures of our theoretical discussions in Chapter 2–Chapter 6: structural interpolation (respectively extrapolation) produces unacceptably large errors. Previous-day extrapolation performs much better, but we cannot decide between the absolute and the relative alternative. Conditional expectation shows the best results. Expectation maximisation is identical to conditional expectation. The results of the PCA method are also quite good here (in spite of the implementation problems mentioned in 6.4).

    Notice that these have been only exemplary results for two trading days in a single currency, with missing values at the long end of the yield curve. In order to make a more profound statement about the quality of the methods, we must examine a larger data set in the following sections.

    7.2 Empirical and simulated data

    In addition to US$ (cf. Section 7.1), I have also downloaded from Bloomberg daily swap rate time series for Deutsche Mark (tickers DMSW1, DMSW2, …, DMSW10) and Pound Sterling (tickers


    BPSW1, BPSW2, …, BPSW10). The terms are 1Y, 2Y, …, 10Y, and the histories are about five years long, from 1996 to 2001.44

    Time series deliveries with missing values have then been simulated for each currency in the following manner:

    • Load the curve history into memory.
    • For every trading day in the curve history:
      o Loop through the (ten) grid points for 1Y, …, 10Y, and delete each value with probability 1/10 (i.e., on average, one value is deleted per curve).
      o Pass the incomplete curve to our six competing data completion methods.
      o Collect from each method the completed data vector and compare it with the original (true) values. Store the estimation error for each completed grid point (rounded to full basis points, with positive/negative sign).
    • Repeat the above ten times for each day, with different values missing each time.
    • Collect all estimation errors for all our six completion methods.
    • Finally, for each method, determine the
      o error histogram,
      o sample average deviation from the true value (AVG),
      o sample standard deviation of errors (STDEV),
      o maximal overestimation error (MAX),
      o maximal underestimation error (MIN).

    Notice how this algorithm does not measure errors as absolute (or squared) deviations from the true value, but as a simple difference (estimated value − true value) with positive or negative sign. Therefore, we can distinguish between over- and underestimation, and analyse the empirical error distribution.
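    A sketch of this error-collection loop is given below in Python/NumPy for illustration; the actual experiments were run as UNIX shell / Formula Engine scripts. The function complete stands for any one of the six completion methods, and the conversion to basis points assumes rates quoted in per cent.

```python
import numpy as np

def run_simulation(history, complete, n_repeats=10, p_miss=0.1, seed=0):
    """Randomly delete grid points, let `complete(delivered, miss, t)` fill
    them, and collect signed errors (estimate - true value) in basis points."""
    rng = np.random.default_rng(seed)
    errors = []
    n_days, m = history.shape
    for t in range(1, n_days):                    # every trading day with a predecessor
        for _ in range(n_repeats):                # ten deletion patterns per day
            miss = rng.random(m) < p_miss         # delete each value with probability 1/10
            if not miss.any():
                continue
            true_curve = history[t]
            delivered = true_curve.copy()
            delivered[miss] = np.nan
            estimate = complete(delivered, miss, t)
            err_bp = np.round((estimate[miss] - true_curve[miss]) * 100)   # per cent -> bp
            errors.extend(err_bp.tolist())
    errors = np.asarray(errors)
    return {"AVG": errors.mean(), "STDEV": errors.std(ddof=1),
            "MAX": errors.max(), "MIN": errors.min()}
```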

    7.3 Experimental set-up

    All curves have undergone the above procedure in both swap45 and forward rate notation (where the swap rates have then been converted to forward rates with 1Y tenors by the Formula Engine routine forward.fe listed in Appendix A.1). Additionally, the above experiment has been performed in a modified way where the completion has been performed on the forward rates, but the estimation error has been measured in par (swap) rate notation.
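    For illustration, a minimal par-to-forward bootstrap is sketched below. This is not the routine forward.fe of Appendix A.1; it assumes annual fixed payments and unit accrual factors, which need not match the conventions used in the thesis. It is applied here to the complete USD swap curve of 10/09/2001 from Section 7.1.

```python
import numpy as np

def par_to_forward(par):
    """Bootstrap 1Y-tenor forward rates from annual par swap rates
    (sketch only; assumes annual payments and unit accrual factors)."""
    par = np.asarray(par, dtype=float)
    n = len(par)
    df = np.empty(n)                       # discount factors for 1Y ... nY
    annuity = 0.0
    for i in range(n):
        df[i] = (1.0 - par[i] * annuity) / (1.0 + par[i])
        annuity += df[i]
    fwd = np.empty(n)
    fwd[0] = 1.0 / df[0] - 1.0             # first year: spot rate = forward rate
    fwd[1:] = df[:-1] / df[1:] - 1.0       # simple 1Y forward rates f(i-1, i)
    return fwd

# example: the (complete) USD swap curve of 10/09/2001 from Section 7.1, in per cent
swap = np.array([3.460, 4.027, 4.503, 4.831, 5.067,
                 5.241, 5.372, 5.475, 5.556, 5.635]) / 100
print(np.round(par_to_forward(swap) * 100, 3))
```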

    44 The DEM time series of course reflect EUR swap rates since 1998, but the tickers DMSWx have been continued by Bloomberg. They have been chosen for the experiment instead of their Euro equivalents because of their longer histories.
    45 As has been pointed out before in Footnote 43, our data completion methods have been defined on forward curves in Chapter 2–Chapter 6. It is however easy to see that they all carry over to par curves (swap curves) as well, with one exception, namely structural interpolation (Chapter 2). Bearing this in mind, it is nevertheless possible to numerically apply this method also to swap curves, and measure the resulting errors.


    So altogether six methods, three currencies, and three quotation variants have been examined:

    • Structural interpolation (STRUCT), previous-day extrapolation (EXTRA), relative value extrapolation (RELATIVE), conditional expectation (CE), expectation maximisation (EM), principal component analysis (PCA): see Chapter 2–Chapter 6.
    • Empirical study of DEM yield curve completion: see Section 7.4.
    • Empirical study of GBP yield curve completion: see Section 7.5.
    • Empirical study of USD yield curve completion: see Section 7.6.
    • Completion of forward rates and error measurement (experiment evaluation) in forward rates: see Sections 7.4.1, 7.5.1, 7.6.1.
    • Completion of forward rates and error measurement (experiment evaluation) in par rates: see Sections 7.4.2, 7.5.2, 7.6.2.
    • Completion of par rates and error measurement (experiment evaluation) in par rates: see Sections 7.4.3, 7.5.3, 7.6.3.

    The above experiments have been performed automatically by hybrid UNIX shell script and Formula Engine routines.46

    7.4 Completion statistics for the DEM yield curve

    7.4.1 Forward completion / forward evaluation

    The experiment has simulated 10293 deliveries of randomly missing data, which have been completed by our methods. If the estimation errors are measured with positive/negative signs (over-/underestimation), the following statistical quality parameters are obtained:

    (in bp) STRUCT EXTRA RELATIVE CE / EM

    AVG 21.2 0.0 0.1 0.0

    STDEV 24.8 9.6 14.8 9.0

    MAX 257.0 136.0 284.0 115.0

    MIN -224.0 -117.0 -220.0 -117.0

    Conditional expectation is the best method, although it does produce outliers. The EM algorithm yields identical results.47 Surprisingly, simple previous-day extrapolation is not much worse.

    46 This is a common technique for programming the Asset Control data base. From a mathematical point of view, not much insight can be gained from the script source codes, since they are mostly concerned with book-keeping: loading all subroutines, formatting the curve data, running the loops, passing the data to the mathematical routines, collecting the returned data, and storing the errors in log files. Therefore I have decided not to include the (lengthy) code listing in Appendix A.
    47 Separate experiments have been performed for CE and for EM, but the results are always identical. So we will not list the results in separate columns. This equally remains true for the experiments described in the following Sections 7.4.2ff. See also our theoretical discussion in Section 5.2.


    7.4.3 Par completion / par evaluation

    In practice however, there is one additional problem which we have ignored so far. The rates have been originally delivered as swap rates. If the delivered curve is incomplete, we cannot convert it to forward rates, complete the curve, and then convert it back. We must complete the curve directly in swap rate notation. This is what we will do now, by applying our methods directly on swap curves (without adjustment). We will find out whether the results are worse or still acceptable (10366 simulated missing data points):

    (in bp) STRUCT EXTRA RELATIVE CE / EM

    AVG 12.5 0.0 0.0 0.0

    STDEV 12.2 4.3 1.7 1.5

    MAX 114.0 21.0 21.0 36.0

    MIN -37.0 -22.0 -21.0 -36.0

    The results are very similar to completion on the forward rates, evaluated on par rates in Section 7.4.2, which is the natural benchmark method for comparison. But recall that, contrary to Section 7.4.2, we do not have an averaging effect here which might reduce our statistical error. So we can also compare the results to Section 7.4.1, where both completion and measurement have been on forward rates; we find that they are much better. See also the histograms in Appendix B.1.3.

    Therefore, it seems to make sense to apply the successful completion methods previous-day extrapolation (including relative value extrapolation) and conditional expectation (including EM) directly on the delivered swap rates, and not to change the curve representation from par to forward notation.

    We will now examine if these statements carry over to yield curves in other currencies.

    7.5 Completion statistics for the GBP yield curve

    7.5.1 Forward completion / forward evaluation

    The statistical experiment for the GBP forward curve completion has been performed on 9587 arbitrarily missing data points. The results give a similar picture as for DEM in 7.4.1:

    (in bp) STRUCT EXTRA RELATIVE CE / EM

    AVG -5.9 0.2 0.3 0.2

    STDEV 28.7 14.3 22.4 14.3

    MAX 266.0 217.0 484.0 288.0

    MIN -351.0 -359.0 -348.0 -342.0

    One striking result is the big outlier of 484 bp produced by the relative value extrapolation. The histograms in Appendix B.2.1 do not add much to the general picture here.


    7.5.2 Forward completion / par evaluation

    The forward completion with evaluation in par notation has been performed on 9622 missing values, with the following result:

    (in bp) STRUCT EXTRA RELATIVE CE / EM

    AVG -1.7 0.1 0.1 0.1

    STDEV 21.7 3.1 4.8 3.1

    MAX 145.0 24.0 102.0 55.0

    MIN -143.0 -33.0 -51.0 -29.0

    The interpretation is similar to the DEM case in Section 7.4.2. This is also supported by the histograms in Appendix B.2.2.

    7.5.3 Par completion / par evaluation

    The experimental run for the par completion / par evaluation case of the GBP swap curve has simulated 9777 missing values. The results are very close to the DEM case of 7.4.3. (The same holds true for the GBP histograms in Appendix B.2.3.)

    (in bp) STRUCT EXTRA RELATIVE CE / EM

    AVG -2.4 0.2 0.0 0.0

    STDEV 12.2 5.2 2.5 2.1

    MAX 84.0 45.0 42.0 47.0

    MIN -103.0 -37.0 -34.0 -24.0


    7.6 Completion statistics for the USD yield curve

    The experiments for USD confirm the general findings of the preceding Sections 7.4–7.5 for DEM and GBP. The results are therefore listed in the following without further comment. The histograms for the USD curves are displayed in Appendix B.3.

    7.6.1 Forward completion / forward evaluation

    Number of simulated missing data points: 9736.

    (in bp) STRUCT EXTRA RELATIVE CE / EM

    AVG 10.9 0.0 0.2 0.0

    STDEV 26.2 15.5 22.9 11.7

    MAX 384.0 180.0 391.0 189.0

    MIN -160.0 -386.0 -473.0 -208.0

    7.6.2 Forward completion / par evaluation

    Number of simulated missing data points: 9603.

    (in bp) STRUCT EXTRA RELATIVE CE / EM

    AVG 8.2 0.1 0.1 0.0

    STDEV 20.2 3.2 5.2 2.6

    MAX 267.0 42.0 123.0 75.0

    MIN -47.0 -39.0 -82.0 -24.0

    7.6.3 Par completion / par evaluation

    Number of simulated missing data points: 9777.

    (in bp) STRUCT EXTRA RELATIVE CE / EM

    AVG 7.3 0.2 0.0 0.0

    STDEV 11.5 5.9 2.6 2.2

    MAX 120.0 26.0 45.0 22.0

    MIN -37.0 -45.0 -43.0 -42.0


    Chapter 8 Conclusion and recommendation

    Let us summarize our findings:

    Financial institutions collect financial time series data in IT data bases for regulatory (and other) purposes (Sections 1.1–1.3). Incomplete data is a very common problem here (Section 1.4). It is closely related to the problem of incorrect data (Section 1.5).

    Several data completion techniques have been described and discussed from a theoretical viewpoint in Chapter 2–Chapter 6.

    We then have empirically simulated randomly incomplete data deliveries of swap and forward curves in the currencies Deutsche Mark (respectively Euro), US Dollar, and Pound Sterling (see Appendix A for the coding of the experiment). After a statistical evaluation of the results (Chapter 7 and Appendix B), the data completion methods in question can be ranked as follows:

    1. Conditional expectation is clearly the best completion method.

    2. Previous-day extrapolation comes second, and it is simpler to implement than conditional expectation. Based on our (statistical) results, we cannot decide whether extrapolation of absolute values or extrapolation of neighbouring changes in the curve performs better. However, if missing values are not randomly distributed, but are missing several days in a row for the same grid point of the curve ("systematic errors", if you will, cf. the following Chapter 9), then relative value extrapolation will outperform absolute value extrapolation.

    3. The expectation maximisation algorithm in our simulation delivers exactly the same results as conditional expectation,48 but it is more complex. It is therefore unnecessarily complicated to implement in practice.

    4. Interpolation along the term structure without regard for the historical rate evolution is unsuitable for historical time series of yield curves.

    5. Under common assumptions of mathematical finance, we can show that completion of principal components of curves is equivalent to conditional expectation (Section 6.5).

    Under the non-mathematical considerations of simplicity, implementation cost, daily performance, and transparent reporting in practice, I recommend the use of the relative value extrapolation method (Section 3.2). However, if an institution wants to invest a little more effort, all our results, both theoretical and empirical, suggest the implementation of the conditional expectation method (Chapter 4).

    An additional important recommendation is valid for any data completion method: always complete market data directly in its natural quotation, i.e., in the form in which it is delivered, and not on some transformed data set. In our case, although some theoretical considerations (discussed in Section 7.4.3) might suggest the use of forward notation, it is nevertheless advantageous to complete the swap curves received via Bloomberg tickers in their original swap rate notation.

    48 Empi