NUS-RMI FE5218: Credit Risk
Lecture 5
Dr. Keshab Shrestha ([email protected])
2014
Contents

1. Poisson Intensity Model
2. Artificial Neural Network
3. Maximum Likelihood Estimation Using R
1. Poisson Intensity Model
The Poisson intensity model is based on a doubly stochastic process commonly known
as the Cox process. It is a Poisson process whose intensity is a function of other
stochastic variables known as covariates. We use the first jump of the Poisson process
to represent the default time. Here we will discuss the intensity model suggested by
Duffie et al. (2007).[1]

The intensity is assumed to be a function of the covariates as follows:[2]
λ(x_t; µ) = e^(µ0 + µ1 x1t + ··· + µk xkt) = e^(µ′x_t)    (1)
where µ0, µ1, …, µk are the parameters and x1t, …, xkt are the covariates at time
t (with x_t = (1, x1t, …, xkt)′). The covariates may include firm-specific and
macroeconomic variables. Due to the properties of the Poisson process, the probability
of surviving a small time interval ∆t (from time t to t + ∆t) is given by

1 − P(x_t; µ) = e^(−λ(x_t; µ)∆t)    (2)
Thus, the probability of default in the same interval is given by

P(x_t; µ) = 1 − e^(−λ(x_t; µ)∆t) ≈ λ(x_t; µ)∆t    (3)
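As a quick numerical illustration of equations (1)–(3) in R (the parameter values and covariates below are hypothetical, chosen only for this example):

```r
# Hypothetical parameters mu and covariates x_t (made up for illustration)
mu <- c(-4.0, 0.5, -1.2)     # mu0, mu1, mu2
xt <- c(1, 0.8, 1.5)         # constant 1 plus covariates x1t, x2t
dt <- 1 / 12                 # one-month interval

lambda <- exp(sum(mu * xt))  # intensity, equation (1)
pd <- 1 - exp(-lambda * dt)  # one-period default probability, equation (3)
c(lambda = lambda, exact = pd, approx = lambda * dt)
```

For a small λ∆t, the exact one-period default probability and the approximation λ∆t are nearly identical, as equation (3) indicates.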
The survival probability over a longer time period can be viewed as surviving
many small time intervals. For example, suppose that we have data for T periods,
t = 1, …, T, where each period has length ∆t. Then the probability of
survival over the whole sample period (assuming conditional independence) is given
by[3]
e^(−∫₀^{T∆t} λ(x_s; µ) ds) ≈ e^(−∑_{i=1}^{T} λ(x_{(i−1)∆t}; µ)∆t) = ∏_{i=1}^{T} [1 − P(x_{(i−1)∆t}; µ)]    (4)
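The second equality in equation (4) can be checked numerically: the product of one-period survival probabilities equals the exponential of the summed intensities. The intensity path below is simulated, purely for illustration:

```r
set.seed(1)
dt <- 1 / 12
nper <- 24
lambda <- exp(rnorm(nper, mean = -4, sd = 0.5))  # hypothetical intensity path

surv_sum <- exp(-sum(lambda * dt))   # exponential of minus the summed intensities
pd <- 1 - exp(-lambda * dt)          # one-period default probabilities
surv_prod <- prod(1 - pd)            # product of one-period survival probabilities
all.equal(surv_sum, surv_prod)       # TRUE: the two discrete forms coincide
```

Only the integral in equation (4) is approximated; the summed-intensity and product forms are identical by construction.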
[1] Duffie, D., L. Saita and K. Wang (2007), “Multi-period corporate default prediction with stochastic covariates,” Journal of Financial Economics 83, pp. 635–665.
[2] A positive µi means that the higher the value of xi, the higher the probability of default.
[3] See equation (3) of the Lecture 4 notes, which has a similar formula in a different context, where the probability of default is a logistic function. Here the probabilities of survival and default depend on the intensity. Also note that t = 1 (the first period) refers to time 0 to ∆t, and the tth period runs from time (t − 1)∆t to t∆t.
Similarly, the probability of default in period (t + 1) (from t∆t to (t + 1)∆t) is given by

P(x_{t∆t}; µ) ∏_{i=1}^{t} [1 − P(x_{(i−1)∆t}; µ)]    (5)
Figure 1: Time Line. The ith period runs from (i − 1)∆t to i∆t; the default probability over the tth period is P(x_{(t−1)∆t}; µ) and over the (t + 1)th period is P(x_{t∆t}; µ).
So far we have referred to only one firm. If we have a sample of firms, we identify
each firm with a subscript j. For example, the probability of default for the jth firm
in period t is P(x_{(t−1)∆t,j}; µ). The likelihood for the whole sample of n firms
over T periods is then given by

L = ∏_{j=1}^{n} ∏_{i=1}^{T} [P(x_{(i−1)∆t,j}; µ)]^{y_{ij}} [1 − P(x_{(i−1)∆t,j}; µ)]^{1−y_{ij}}    (6)

where y_{ij} = 1 if firm j defaults in period i and y_{ij} = 0 otherwise.
Strictly speaking, the above expression is not correct, because it includes the
periods after a firm has defaulted, for which the required information is not
available. Therefore, once a firm defaults in some period, it is no longer included
in the likelihood function. We can then maximize the log-likelihood function, in
which the product terms become summations.

Duffie et al. also consider a second type of exit (other than default), which we do not discuss here.
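A minimal sketch of this censored log-likelihood in R. The data layout (a covariate array `x`, a default-indicator matrix `y`) and all numbers are hypothetical; the point is only the structure: each firm contributes terms up to and including its default period, and nothing afterwards.

```r
# Log-likelihood of equation (6) with default-censoring
# x: T x n x (k+1) array of covariates (first slice is the constant 1)
# y: T x n matrix, y[i, j] = 1 if firm j defaults in period i
loglik_intensity <- function(mu, x, y, dt) {
  nper <- dim(x)[1]; nfirm <- dim(x)[2]
  ll <- 0
  for (j in 1:nfirm) {
    for (i in 1:nper) {
      lambda <- exp(sum(mu * x[i, j, ]))  # intensity, equation (1)
      p <- 1 - exp(-lambda * dt)          # one-period default probability
      ll <- ll + y[i, j] * log(p) + (1 - y[i, j]) * log(1 - p)
      if (y[i, j] == 1) break             # drop the firm after its default period
    }
  }
  ll
}

# Tiny synthetic panel: 10 periods, 5 firms, one covariate plus a constant
set.seed(2)
x <- array(c(rep(1, 50), rnorm(50)), dim = c(10, 5, 2))
y <- matrix(0, 10, 5); y[8, 2] <- 1       # firm 2 defaults in period 8
fit <- optim(c(-3, 0), function(m) -loglik_intensity(m, x, y, dt = 1 / 12))
fit$par                                   # maximum-likelihood estimate of mu0, mu1
```

The loop form is deliberately literal; a production implementation would vectorize it, but the censoring logic (the `break`) is what distinguishes this likelihood from equation (6) as first written.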
2. Artificial Neural Network
An artificial neural network (ANN) is a mathematical modeling tool used to mimic the
way the human brain is thought to process information. An ANN structure can range
from very simple to highly complex and computationally intensive. Recently, the
use of ANNs has become popular due to the increase in computing power and the
decreasing computational cost of the current generation of computers. Since this
trend is expected to continue, we expect to see increasing use of artificial neural
networks.
ANNs were originally designed for pattern recognition and classification. However,
they can also be used for prediction. Therefore, it is natural that attempts have
been made to use ANNs for forecasting bankruptcy (see, for example, Odom and
Sharda (1990), Wilson and Sharda (1994), and Lacher et al. (1995)).[4]
An ANN is typically composed of several layers of many computing elements
called nodes. Each node receives input signals from external inputs or from other
nodes and processes them through a transfer function, producing a transformed
signal as the node's output. The output signal from a node is then used as input
to other nodes or as the final result. ANNs are characterized by their network
architecture, which consists of a number of layers, each consisting of a number of
nodes. Finally, the network architecture specifies how the nodes are connected to
one another and to the input and output nodes.
ANN architectures take a large number of forms. Here we will discuss a simple
one that is used for bankruptcy prediction.
A popular form of ANN is called the multi-layer perceptron (MLP), in which all
nodes and layers are arranged in a feed-forward manner, resulting in a feed-forward
architecture. The input layer constitutes the first, or lowest, layer of the MLP
and is the layer connected to the external information. In other words, the MLP
receives the external information, or input, through this input layer. The last, or
highest, layer is called the output layer, where the ANN produces the output visible
outside the network. In between these two layers there may exist one or more layers
known as hidden layers.
There are almost unlimited variations of network architectures representing an MLP,
depending on the number of hidden layers and the interconnections of the nodes.
Here we will discuss one specific architecture that is used for bankruptcy prediction:
a three-layer MLP network with one hidden layer. Since bankruptcy classification is
a two-group classification problem, a three-layer architecture is considered
sufficient. The three-layer MLP architecture, with a single node in its one hidden
layer, is shown in Figure 2 below.
[4] Please see GCR for references. The discussion here is based on Zhang et al. (1999).
The lowest layer, the input layer, consists of the k different inputs, which represent
the explanatory variables, or the firm characteristics in the case of bankruptcy
models. At the hidden layer, the input values, or the activation values of the input
nodes, are linearly combined as follows:

α0 + α1X1 + ··· + αkXk    (7)
In linear regression, the coefficient α0 is known as the intercept. In neural network
terminology, such constants are called bias parameters. The linear combination is
then transformed by a transfer function into the hidden layer's activation value.
In Figure 2, the transfer function for the hidden layer is taken to be the logistic
function. Therefore, the activation value of the hidden layer, H1, is given by

H1 = 1 / [1 + e^(−(α0 + α1X1 + ··· + αkXk))] = [1 + e^(−(α0 + α1X1 + ··· + αkXk))]^(−1)    (8)
The output of the hidden layer is then used as the input to the single node at
the output layer (or to another hidden layer, if one exists). Again, since we are
dealing with two-group classification, a single output node is all we need. At this
node, a linear combination of the input, B0 + B1H1, is transformed using another
activation function, which is also taken (in Figure 2) to be the logistic function.
Therefore, the activation value of the output node, Y, is given by

Y = 1 / [1 + e^(−(B0 + B1H1))] = [1 + e^(−(B0 + B1H1))]^(−1)    (9)
The activation value, or output, of the output layer becomes the output of the
network. It is important to note that, due to the logistic activation function used
at the output node, the activation value will be between 0 and 1. However, we are
using the network for classification. Therefore, we need to convert the value of Y,
which lies between 0 and 1, into either 0 (non-bankrupt group) or 1 (bankrupt
group). One common way to do this is the following classification rule:
y = 0 if Y ≥ 0.5 and y = 1 otherwise;   or   y = 1 if Y ≥ 0.5 and y = 0 otherwise    (10)
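The forward pass through this network, equations (7)–(10), is short enough to write out directly in R. The parameter values below are arbitrary placeholders, not trained values, and the classification uses the second variant of rule (10):

```r
logistic <- function(z) 1 / (1 + exp(-z))        # the transfer function

# Placeholder parameters for k = 5 inputs (illustrative, not estimated)
alpha <- c(0.1, -0.3, 0.2, 0.5, -0.1, 0.4)       # alpha0, alpha1, ..., alpha5
B <- c(-0.2, 1.5)                                # B0, B1

mlp_classify <- function(x, alpha, B) {
  H1 <- logistic(alpha[1] + sum(alpha[-1] * x))  # hidden activation, equation (8)
  Y <- logistic(B[1] + B[2] * H1)                # output activation, equation (9)
  as.numeric(Y >= 0.5)                           # classification rule, equation (10)
}

mlp_classify(c(0.2, -0.1, 0.4, 0.3, 0.0), alpha, B)  # classifies this input as 1
```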
We have just completed the description of a simple three-layer MLP architecture
with the following two sets of unknown parameters:

α0, α1, …, αk and B0, B1
Now the question is how we decide what values these parameters should take.
This is done by a process called “training the network,” which involves choosing the
values of these parameters so that some measure of error is minimized. One popular
error measure is the mean squared error (MSE), defined as
MSE = (1/N) ∑_{i=1}^{N} (ai − yi)²    (11)
where ai represents the ith target value and yi represents the network output for
the ith training observation. Finally, N represents the number of training
observations, i.e., the size of the training sample.
For example, if we use the same set of five ratios for 266 firms, of which 134
firms are in the bankrupt group, we have one set of five ratios for each firm. When
the set of ratios for firm i is used as input to the ANN, the output of the network,
yi, would be either 0 (representing non-bankrupt) or 1 (representing bankrupt).
The actual bankruptcy status of the firm is represented by ai. Therefore, the
sample of 266 firms constitutes a training sample that the network uses to find the
parameter values that minimize the MSE.
From the discussion above, it is clear that training the network is an
unconstrained nonlinear minimization problem. One of the most popular algorithms
for training the network is the well-known backpropagation, a variation of the
gradient-based steepest descent method. There are other methods of training the
network (see Zhang et al. (1999)).
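To make the minimization concrete, the example below trains the three-layer network by minimizing the MSE of equation (11) with base R's general-purpose optimizer optim() (a quasi-Newton method, used here in place of backpropagation). The training data are made up for illustration:

```r
logistic <- function(z) 1 / (1 + exp(-z))

# Network output for an input matrix X (one row per firm);
# theta = (alpha0, alpha1, ..., alphak, B0, B1)
net_output <- function(theta, X) {
  k <- ncol(X)
  H1 <- logistic(theta[1] + X %*% theta[2:(k + 1)])  # hidden node, equation (8)
  logistic(theta[k + 2] + theta[k + 3] * H1)         # output node, equation (9)
}

mse <- function(theta, X, a) mean((a - net_output(theta, X))^2)  # equation (11)

# Made-up training sample: 40 firms, 2 ratios, targets a_i in {0, 1}
set.seed(3)
X <- matrix(rnorm(80), ncol = 2)
a <- as.numeric(X[, 1] + X[, 2] > 0)  # hypothetical bankruptcy labels
theta0 <- rep(0.1, 5)
fit <- optim(theta0, mse, X = X, a = a, method = "BFGS")
c(before = mse(theta0, X, a), after = fit$value)  # training reduces the MSE
```

Backpropagation exploits the layered structure to compute the gradient analytically; a black-box optimizer like this ignores that structure but illustrates the same objective.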
R has a package called neuralnet that can be used to estimate the neural network
parameters.
Figure 2: Three-Layer MLP Architecture
3. Maximum Likelihood Estimation Using R
# Note that you have to change the working directory (using "File\Change dir...")
# to the folder where "altman_new.txt" is located
ddat <- read.table("altman_new.txt", header = TRUE)
default.logit <- ddat$default  # to be used by the logistic model
# logistic regression: glm() is part of the "stats" package
glm.out <- glm(default.logit ~ X1 + X2 + X3 + X4 + X5, family = binomial(logit), data = ddat)
summary(glm.out)
#____________________________________________________________________________
# use maximum likelihood to estimate the same logistic model
library(maxLik)
N <- nrow(ddat)
d <- as.matrix(ddat[, 1:5])  # convert the data.frame to a matrix
d1 <- cbind(1, d)            # column-bind: add a column of 1s as the first
                             # column, representing the intercept
loglik1 <- function(param) { # the logistic log-likelihood
  beta <- param
  loglik1 <- 0.0
  lambda <- d1 %*% beta      # linear predictor beta'x_i for each firm
  for (i in 1:N) {
    loglik1 <- loglik1 - log(1 + exp(-lambda[i])) - lambda[i] +
      default.logit[i] * lambda[i]
  }
  loglik1
}
beta <- matrix(0.1, 6, 1)  # starting values
loglik1(beta)              # check that the log-likelihood evaluates
O.withAltmanVar <- maxLik(loglik1, start = beta)
summary(O.withAltmanVar)
# first derivative of the log-likelihood for one firm i: (y_i - p_i) times x_i,
# where p_i is the fitted logistic probability; see equation (18)
gradlik <- function(param) {
  beta <- param
  BetaXt <- d1 %*% beta
  dd <- default.logit - (1 / (1 + exp(-BetaXt)))  # (y_i - p_i)
  gradlik <- colSums(d1 * matrix(rep(dd, 6), nrow = N, byrow = FALSE))
  gradlik
}
O <- maxLik(loglik1, gradlik, hess = NULL, start = beta)
summary(O)