Splines and Finance 1
Penalized Splines andFinancial Market Data
David RuppertOperations Research and Industrial Engineeering
Cornell University
[email protected]://legacy.orie.cornell.edu/~davidr
Splines and Finance 2
Main Themes
• Calibration of financial models is a statistical problem
• Researchers in mathematical finance are experts in probabilitytheory but often are less knowledgeable about statisticalmodeling and data analysis
• Unfortunately, statisticians have, with some notable exceptions,not recognized finance as an important area of application
• Transformation and weighting in regression can improve thecalibration of financial models
• Splines are an effective tool for data analysis and statisticalmodeling
Splines and Finance 3
Overview
• Recent example where a statistician could have helped
• Example of curve fitting – dynamics of interest rates
• Penalized splines
• Two examples:
– Return to interest rate dynamics
– Term structure – estimating the forward rate curve
Splines and Finance 4
Example: Estimation of Default Probabilities
Data:
• ratings: 1 = Aaa (best), . . . , 16 = B3 (worse)
• default frequency: estimate of default probability
– many zero values at best ratings
From recent book on credit risk
Splines and Finance 5
• nonlinear model:
Pr(default|rating) = expβ0 + β1rating• linear/transformation model (in recent textbook):
logPr(default|rating) = β0 + β1rating
– Problem: cannot take logs of default frequencies that are 0
– (Sub-optimal) solution in textbook: throw out theseobservations
Splines and Finance 6
• Transform-both-sides (TBS) model – see Carroll andRuppert (1984, 1988):
Pr(default|rating)α = exp[αβ0 + β1rating]– α chosen by residual plots (or maximum likelihood)
– α = 1/2 works well
– α = 0 ⇒ log transformation
∗ if we x 7→ xα by x 7→ (xα − 1)/α
Splines and Finance 7
0 2 4 6 8 10 12 14 1610
−5
10−4
10−3
10−2
10−1
100
101
102
rating
defa
ult p
roba
bilit
y
dataTEXTBOOKnonlinearTBS
TBS fit compared to others
Data = proportion defaulting
Values at bottom are at log(proportion) = −∞
Splines and Finance 8
0 2 4 6 8 10 12 140
0.1
0.2
0.3
0.4
0.5
0.6
0.7
fitted values
abso
lute
res
idua
l
Nonlinear regression residuals
Splines and Finance 9
−0.6 −0.4 −0.2 0 0.2 0.4 0.60.02
0.05
0.10
0.25
0.50
0.75
0.90
0.95
0.98
Residual
Pro
babi
lity
Normal Probability Plot
Nonlinear regression residuals
Splines and Finance 10
0 0.5 1 1.5 2 2.5 3 3.5 40
0.05
0.1
0.15
0.2
0.25
fitted values
abso
lute
res
idua
l
TBS residuals
Splines and Finance 11
−0.2 −0.15 −0.1 −0.05 0 0.05 0.1 0.15 0.20.02
0.05
0.10
0.25
0.50
0.75
0.90
0.95
0.98
Residual
Pro
babi
lity
Normal Probability Plot
TBS residuals
Splines and Finance 12
method P rdefault|Aaa % of TEXTBOOK estimate
TEXTBOOK 0.005% 100%
nonlinear 0.002% 40%
TBS 0.0008% 16%
Splines and Finance 13
Comments:
• Suppose sample sizes were large so that all categories had atleast one default
– log transformation would have been applied to all 16 sampleproportions
– but this might have caused outliers and unstable estimates
• Perhaps a logistic regression fit should be compared with theTBS fit.
Splines and Finance 14
0 1 2 3 4 5 6 7 8−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
Y
log(
Y) tangent line at Y=5
tangent line at Y=1
Geometry of transformations – variance stabilization
Splines and Finance 15
0 1 2 3 4 5 6 7 8−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
Y
log(
Y) tangent line at Y=5
tangent line at Y=1
Geometry of transformations – symmetrization
Splines and Finance 16
1965 1970 1975 1980 1985 1990 1995 2000 20050
2
4
6
8
10
12
14
16
18
year
1−ye
ar T
reas
ury
cons
tant
mat
urity
rat
e
1-Year Treasury Constant Maturity Rate, daily data
Source: Board of Governors of the Federal Reserve System
http://research.stlouisfed.org/fred2/
Splines and Finance 17
1965 1970 1975 1980 1985 1990 1995 2000 2005−1.5
−1
−0.5
0
0.5
1
1.5
year
Dai
ly c
hang
e in
rat
e
∆Rt versus year
Splines and Finance 18
0 2 4 6 8 10 12 14 16 18−1.5
−1
−0.5
0
0.5
1
1.5
rate
Cha
nge
in r
ate
∆Rt versus rate
Splines and Finance 19
Estimating Volatility
Parametric model:
Var(∆Rt) = β0Rβ1t−1
E.g.,
• β1 = 0 (Vasicek, 1977)
• β1 = 1/2 (Cox, Ingersoll, Ross, 1985)
• β1 = 1 (Courtadon, 1982)
• β1 a free parameter (Chan, Karolyi, Longstaff, and Sanders,1992)
Splines and Finance 20
Nonparametric model:
Var(∆Rt) = σ2(Rt−1)
where σ(·) is a smooth function
• will be modeled as a spline
• In these models: no dependence on t
Splines and Finance 21
0 2 4 6 8 10 12 14 16 180
0.2
0.4
0.6
0.8
1
1.2
1.4
rate
Squ
ared
cha
nge
in r
ate
dataspline fitparametric fit
Splines and Finance 22
0 1 2 3 4 5 6 7 8 9 100
0.001
0.002
0.003
0.004
0.005
0.006
0.007
0.008
0.009
0.01
rate
Squ
ared
cha
nge
in r
ate
dataspline fitparametric fit
Splines and Finance 23
Penalized Splines for Semiparametric Modeling
Underlying philosophy
1. minimalist statistics
• keep it as simple as possible
2. build on classical parametric statistics
3. modular methodology
Splines and Finance 24
Reference
Semiparametric Regression by Ruppert, Wand, and Carroll(2003)
• Lots of examples.
• But most from biostatistics and epidemiology
Splines and Finance 25
Semiparametric regression
Partial linear or partial spline model:
Yi = WTi βW + m(Xi) + εi.
Here m(·) is a smooth function. We will model it as a spline with atruncated polynomial basis:
m(x) = XTi βX + BT(x)b.
XTi = ( Xi · · · Xp
i )
BT(x) = (x− κ1)p+ · · · (x− κK)p
+
The intercept is part of WTi βW .
Splines and Finance 26
Example
m(x) = β1x + b1(x− κ1)+ + · · ·+ bK(x− κK)+
• slope jumps by bk at κk
0 0.5 1 1.5 2 2.5 3
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2 plus fn.derivative
Splines and Finance 27
Fitting interest-rate data with plus functions
0 2 4 6 8 10 12 14 16 180
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
rate
Squ
ared
cha
nge
in r
ate
Splines and Finance 28
Generalization
m(x) = β1x + · · ·+ βpxp + b1(x− κ1)
p+ + · · ·+ bK(x− κK)p
+
• pth derivative jumps by p! bk at κk
• first p− 1 derivatives are continuous
0 0.5 1 1.5 2 2.5 30
0.5
1
1.5
2
2.5
3
3.5
4 plus fn.derivative2nd derivative
Splines and Finance 29
Ordinary Least Squares
0 2 4 6 8 10 12 14 16 180
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
rate
Squ
ared
cha
nge
in r
ate
data2 knots5 knots10 knots
Splines and Finance 30
Penalized least-squares
Minimizen∑
i=1
ω2i
Yi − (WT
i βW + XTi βX + BT(Xi)b)
2
+ λ bTDb.
E.g.,D = I.
ωi = 1/σ(Yi|Wi, Xi)
Splines and Finance 31
Penalized Least Squares – Non-adaptive
0 2 4 6 8 10 12 14 16 180
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
rate
Squ
ared
cha
nge
in r
ate
data2 knots5 knots10 knots
Splines and Finance 32
Ridge Regression
From previous slide:n∑
i=1
ω2i
Y − (WT
i βW + XTi βX + BT(Xi)b)
2
+ λ bTDb.
Let X have row (WTi XT
i BT(Xi) ). Then
βW
βX
b
=
XTΩX + λ blockdiag(0,0, D)−1 XTΩY ,
whereΩ = diag(ω2
1 , . . . , ω2n)
Splines and Finance 33
Penalized LSE is also
• a BLUP in a mixed model
– (βW , βX) is the fixed effect vector
– b is the random effect vector
– λ is a ratio of variance components
• empirical Bayes estimator.
Splines and Finance 34
Selecting λ
1. cross-validation (CV)
2. generalized cross-validation (GCV)
3. ratio of variance components estimated by ML or REML inmixed model framework
4. as in 3., but estimated in a fully Bayesian framework
5. EBBS = empirical bias bandwidth selection
– useful if m′(x) is of primary interest
Splines and Finance 35
Selecting the Knots Locations
1. I use sample-quantiles of X so there are (approximately) anequal number of observations between any pair of consecutiveknots
2. Some prefer equal-spaced knots
1. and 2. give similar results, except in extreme cases.
Splines and Finance 36
Selecting the Number of Knots
0 0.2 0.4 0.6 0.8 1−1
−0.5
0
0.5
1
1.5(a) SpaHet, j = 3, typical data set
y
Truefull−search
5 20 40 80 12095
100
105
110
115
K
rela
tive
MA
SE
(b) MASE comparisons
fixed nknotsmyopicfull−search
1 2 3 4 5 60
50
100
150
number of knots (coded)
freq
uenc
y
0 0.0125 0.0250
0.0125
0.025
ASE − K=5
AS
E −
K=
40
n = 200
Splines and Finance 37
0 0.2 0.4 0.6 0.8 1−0.5
0
0.5(a) SpaHetLS, j = 3, n = 2,000
y
Truefull−search
5 20 40 80 12095
100
105
110
115
K
rela
tive
MA
SE
(b) MASE comparisons
fixed nknotsmyopicfull−search
1 2 3 4 5 60
50
100
150
200
250
number of knots (coded)
freq
uenc
y
0 0.5 1 1.5
x 10−3
0
0.5
1
1.5x 10
−3
ASE − K=5
AS
E −
K=
40
n = 2, 000
Splines and Finance 38
0 5 10 15 20 250
1
2
x 10−4
dffit
(λ)
MS
E
MSE
Bias
Variance
Optimal
n = 10, 000, 20 knots, quadratic spline
Splines and Finance 39
Additive Models
Model:Yi = m1(X1) + · · ·+ mp(Xp) + εi
Basis functions:
XTi,j = ( Xi,j · · · Xp
i,j ) andBTj (x) = (x− κ1,j)
p+ · · · (x− κKj ,j)
p+
Let X have row
(WTi XT
i,1 . . . XTi,p BT
1 (Xi,1) . . . BTp(Xi,p) )
Estimation: MinimizenX
i=1
ω2i
(Y −
WT
i W +
pXj=1
XTi,jX,j +BT
j (Xi,j)bj
!)2
+
pXj=1
λj bTjDjbj .
Splines and Finance 40
Adaptive Penalties
• the penalty λ(·) is allowed to vary with spatial position
• see Ruppert and Carroll (2000), Australian and New ZealandJournal of Statistics
– λ(·) is itself a spline
Minimize:nX
i=1
ω2i
(Y −
WT
i W +
pXj=1
XTi,jX,j +BT
j (Xi,j)bj
!)2
+
pXj=1
bTjDjbj .
whereDj = diag ( λ(κ1,j) · · · λ(κKj ,j) )
Splines and Finance 41
Partial Spline Model
∆Rt = m1(Rt) + m2(t) + σ(Rt, t)εi
Splines and Finance 42
0 5 10 15 20−0.1
−0.08
−0.06
−0.04
−0.02
0
0.02
0.04
rate
effe
ct o
f rat
e on
∆(r
ate)
adaptive, no wtnon−adaptive, no wtadaptive, wtnon−adaptive, wt
1970 1980 1990 2000−0.02
−0.015
−0.01
−0.005
0
0.005
0.01
0.015
0.02
0.025
0.03
year
year
effe
ct o
n ex
pect
ed ∆
(rat
e)
adaptive, no wtnon−adaptive, no wtadaptive, wtnon−adaptive, wt
Additive fit to ∆Rt
Splines and Finance 43
0 2 4 6 8 10 12 14 160
2
4
6
8
rate
log 10
(pen
alty
)
1960 1965 1970 1975 1980 1985 1990 1995 2000 2005−4
−2
0
2
4
year
log 10
(pen
alty
)
Penalties for adaptive, weighted fit to ∆Rt
Splines and Finance 44
Partial Spline Model
∆Rt = β1Rt + m2(t) + σ(t, Rt)εi
Output:
• β1
• m2(t) + β1Rt
Corresponds to model with drift:
aθ(t)−Rt
where
a = −β1 and θ(t) = −m2(t)β1
Splines and Finance 45
1965 1970 1975 1980 1985 1990 1995 2000 2005−5
0
5
10
15
20
year
θ(ye
ar)
Splines and Finance 46
0 2 4 6 8 10 12 14 16 18−0.06
−0.05
−0.04
−0.03
−0.02
−0.01
0
0.01
0.02
rate
effe
ct o
f rat
e on
∆(r
ate)
semiparnonpar
Splines and Finance 47
Multiplicative Models for Volatility
Var(Yt) = σ20 s2
1(X1) · · · s2p(Xp)
Example:
Var(∆Rt) = σ20σ2
1(Rt−1)σ22(t)
Splines and Finance 48
Backfitting algorithm:
Assume the Yt has mean zero, e.g., are residuals.
1. fit a model s21(X1) for Y 2
t as a function of X1
• “de-volatilize”: replace Yt by yt/s1(X1)
2. fit a model s22(X2) for Y 2
t as a function of X2
• “de-volatilize”: replace Yt by Yt/sx(X2)...
3. fit a model s2p(Xp) for Y 2
t as a function of Xp
• “de-volatilize”: divide Yt by s1(X1) · · · sp(Xp)
4. either STOP or go back to 1.
Weighting is built into the algorithm.
Splines and Finance 49
0 5 10 15 2005
10
1960 1970 1980 1990 2000 2010012
0 5 10 15 2005
10
1960 1970 1980 1990 2000 2010012
0 5 10 15 2005
10
1960 1970 1980 1990 2000 2010012
0 5 10 15 2005
10
1960 1970 1980 1990 2000 2010024
0 5 10 15 2005
10
1960 1970 1980 1990 2000 2010024
0 5 10 15 2005
10
1960 1970 1980 1990 2000 2010024
0 5 10 15 2005
10
1960 1970 1980 1990 2000 2010024
0 5 10 15 2005
10
1960 1970 1980 1990 2000 2010024
∆(Rt)2 regressed on Rt−1 and t
• each row is one iteration
• effect of Rt−1 on left
• effect of t on right
• # of knots = min(5*iteration number, 20)
Splines and Finance 50
0 2 4 6 8 10 12 14 16 18−1
−0.5
0
0.5
1
rate
(∆ R
) / v
olat
itlity
1960 1965 1970 1975 1980 1985 1990 1995 2000 2005 2010−1
−0.5
0
0.5
1
year
(∆ R
) / v
olat
itlity
Plots of de-volatilized changes versus explanatory variables
Splines and Finance 51
−0.6 −0.4 −0.2 0 0.2 0.4 0.6
0.0010.003
0.01 0.02
0.05 0.10
0.25
0.50
0.75
0.90 0.95
0.98 0.99
0.9970.999
Data
Pro
babi
lity
Normal Probability Plot
Normal plot of de-volatilized changes
Splines and Finance 52
Estimating the Term Structure of CorporateDebt with a Semiparametric Model
Joint work with:
• Bob Jarrow (Cornell)
• Yan Yu (University of Cincinnati)
Splines and Finance 53
Bond prices and the forward rate
• t = time to maturity
• P (t) = price of zero-coupon bond at current time (t = 0)
• D(t) = discount function
• y(t) = yield to maturity
• f(t) = forward rate
P (t)PAR
= D(t) = exp−F (t) = exp −ty(t) = exp−
∫ t
0
f(s)ds
.
Splines and Finance 54
Estimation of the forward rate
Suppose the ith bond pays Ci(ti,j) and time ti,j
• i = 1, . . . , n
• j = 1, . . . , zi
Let f(s, δδδ) = δδδ′B(s) be a spline model for the forward rate.
Model for price of ith bond:
Pi(δδδ) =zi∑
j=1
Ci(ti,j) exp−δδδ′BI(ti,j)
where
BI(t) :=∫ t
0
B(s)ds =(t · · · tp+1
p+1
(t−κ1)p+1+
p+1 · · · (t−κK)p+1+
p+1
)′.
Splines and Finance 55
Estimate δ by minimizing
Qn,λ(δδδ) =1n
n∑
i=1
Pi −
zi∑
j=1
Ci(ti,j) exp−δδδ′BI(ti,j)2
+ λδδδ′Gδδδ.
Splines and Finance 56
Selection of λ
• Estimation of λ by GCV did not work well
• GCV targets MSE of the estimated regression function
• But the forward rate is the derivative of the (log of) theregression function
• Derivatives require a different amount of smoothing
Splines and Finance 57
Corporate Bonds
• Problem: often there are not enough bonds to fit a fullynonparametric model
• Jarrow, Ruppert, and Yu solve this by using a semiparametricmodel
Splines and Finance 58
Algorithm
Step 1: Nonparametric spline fit of a forward rate to US Treasurybonds.
• δδδ is estimated by minimizing Qn,λ(δδδ)
• λ is chosen by GCV, RSA, or EBBS
• fTr(t) = δδδ′B(t), where δδδ are the estimated spline coefficients
Step 2: Parametric estimation to obtain the forward rate curve fora corporation’s bonds.
• credit spread is parametric with parameter α
• for example, if the credit spread is a constant, then
fC(t) = fTr(t) + α = δδδ′B(t),
• fix δδδ at value from Step 1 and estimate α by OLS
Splines and Finance 59
0 5 10 15 20 25 300
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
−lo
g(pr
ice)
maturity
−Log-prices (as fraction of PAR)
Splines and Finance 60
0 5 10 15 20 25 30
0.045
0.05
0.055
0.06
0.065
0.07
forw
ard
rate
time to maturity
EBBSGCV θ=3GCV θ=1Empirical
Estimates of forward rate
θ was used by Fisher, Nychka and Zervos (1995) to induce moresmoothing –
GCV (λ) =n−1Pn
i=1
nPi − bPi(δδδ)
o2
1− n−1θ trA(λ)2 ,
where A(λ) is the “hat” or “smoother” matrix: P = A(λ)P
Splines and Finance 61
0 10 20 30−6
−4
−2
0
2
4
6x 10
−3
maturity
resi
dual
− lo
g tr
ansf
orm
0 5 10 15 20−0.5
0
0.5
1
Lag
Sam
ple
Aut
ocor
rela
tion
Sample Autocorrelation Function (ACF)
−4 −2 0 2 4 6
x 10−3
0.0030.01 0.02 0.05 0.10
0.25
0.50
0.75
0.90 0.95 0.98 0.99
0.997
Data
Pro
babi
lity
Normal Probability Plot
2.5 3 3.5 4 4.5 50
1
2
3
4
5
6x 10
−3
log(fitted value)
abso
lute
res
idua
l
Residual analysis
Splines and Finance 62
0
5
10
15
20
0
5
10
15
20
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
Years to Maturity
AT&T
Apr 94 − Dec 95
forw
ard
rate
s
Estimates of Treasury and AT&T forward rates
Splines and Finance 63
Summary
• Statisticians and financial engineers would each benefit frommore collaboration
• Calibration of financial models is an interesting and challengingproblem in statistics and data analysis
– transformation and weighting can be important
• Penalized splines are an attractive method for semiparametricmodeling