NUS-RMI FE5218: Credit Risk
Lecture 5
Dr. Keshab Shrestha ([email protected])
2014
Contents

1. Poisson Intensity Model
2. Artificial Neural Network
3. Maximum Likelihood Estimation Using R
1. Poisson Intensity Model
The Poisson intensity model is based on a doubly stochastic process commonly known
as the Cox process. It is a Poisson process whose intensity is a function of other
stochastic variables known as covariates. We use the first jump of the Poisson process
to represent the default time. Here we will discuss the intensity model suggested by
Duffie et al. (2007).[1]

The intensity is assumed to be a function of the covariates as follows:[2]
λ(x_t; µ) = e^(µ0 + µ1 x1t + ··· + µk xkt) = e^(µ′x_t)    (1)
where µ0, µ1, …, µk are the parameters and x1t, …, xkt are the covariates at time
t (with x_t = (1, x1t, …, xkt)′). The covariates may include firm-specific and
macroeconomic variables. Due to the properties of the Poisson process, the probability
of surviving a small time interval ∆t (from time t to t + ∆t) is given by

1 − P(x_t; µ) = e^(−λ(x_t; µ)∆t)    (2)
Thus, the probability of default in the same interval is given by

P(x_t; µ) = 1 − e^(−λ(x_t; µ)∆t) ≈ λ(x_t; µ)∆t    (3)
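As a quick numerical illustration of equations (1)–(3) in R (the parameter values and covariates below are hypothetical, chosen only for this example):

```r
# Hypothetical parameters mu and covariates x_t (made up for illustration)
mu <- c(-4.0, 0.5, -1.2)     # mu0, mu1, mu2
xt <- c(1, 0.8, 1.5)         # constant 1 plus covariates x1t, x2t
dt <- 1 / 12                 # one-month interval

lambda <- exp(sum(mu * xt))  # intensity, equation (1)
pd <- 1 - exp(-lambda * dt)  # one-period default probability, equation (3)
c(lambda = lambda, exact = pd, approx = lambda * dt)
```

For a small λ∆t, the exact one-period default probability and the approximation λ∆t are nearly identical, as equation (3) indicates.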
The survival probability over a longer time period can be viewed as surviving
many small time intervals. For example, suppose that we have data for T periods,
t = 1, …, T, where each period has length ∆t. Then the probability of
survival over the whole sample period (assuming conditional independence) is given
by[3]
e^(−∫₀^{T∆t} λ(x_s; µ) ds) ≈ e^(−∑_{i=1}^{T} λ(x_{(i−1)∆t}; µ)∆t) = ∏_{i=1}^{T} [1 − P(x_{(i−1)∆t}; µ)]    (4)
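The second equality in equation (4) can be checked numerically: the product of one-period survival probabilities equals the exponential of the summed intensities. The intensity path below is simulated, purely for illustration:

```r
set.seed(1)
dt <- 1 / 12
nper <- 24
lambda <- exp(rnorm(nper, mean = -4, sd = 0.5))  # hypothetical intensity path

surv_sum <- exp(-sum(lambda * dt))   # exponential of minus the summed intensities
pd <- 1 - exp(-lambda * dt)          # one-period default probabilities
surv_prod <- prod(1 - pd)            # product of one-period survival probabilities
all.equal(surv_sum, surv_prod)       # TRUE: the two discrete forms coincide
```

Only the integral in equation (4) is approximated; the summed-intensity and product forms are identical by construction.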
[1] Duffie, D., L. Saita and K. Wang (2007), “Multi-period corporate default prediction with stochastic covariates,” Journal of Financial Economics 83, pp. 635–665.
[2] A positive µi means that the higher the value of xi, the higher the probability of default.
[3] See equation (3) of the Lecture 4 notes, which has a similar formula in a different context, where the probability of default is a logistic function. Here the probabilities of survival and default depend on the intensity. Also note that t = 1 (the first period) refers to time 0 to ∆t, and the tth period runs from time (t − 1)∆t to t∆t.
Similarly, the probability of default in period (t + 1) (from t∆t to (t + 1)∆t) is given by

P(x_{t∆t}; µ) ∏_{i=1}^{t} [1 − P(x_{(i−1)∆t}; µ)]    (5)
Figure 1: Time Line. The ith period runs from (i − 1)∆t to i∆t; the default probability over the tth period is P(x_{(t−1)∆t}; µ) and over the (t + 1)th period is P(x_{t∆t}; µ).
So far we have referred to only one firm. If we have a sample of firms, we identify
each firm with a subscript j. For example, the probability of default for the jth firm
in period t is P(x_{(t−1)∆t,j}; µ). The likelihood for the whole sample of n firms
over T periods is then given by

L = ∏_{j=1}^{n} ∏_{i=1}^{T} [P(x_{(i−1)∆t,j}; µ)]^{y_{ij}} [1 − P(x_{(i−1)∆t,j}; µ)]^{1−y_{ij}}    (6)

where y_{ij} = 1 if firm j defaults in period i and y_{ij} = 0 otherwise.
Strictly speaking, the above expression is not correct, because it includes the
periods after a firm has defaulted, for which the required information is not
available. Therefore, once a firm defaults in some period, it is no longer included
in the likelihood function. We can then maximize the log-likelihood function, in
which the product terms become summations.

Duffie et al. also consider a second type of exit (other than default), which we do not discuss here.
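A minimal sketch of this censored log-likelihood in R. The data layout (a covariate array `x`, a default-indicator matrix `y`) and all numbers are hypothetical; the point is only the structure: each firm contributes terms up to and including its default period, and nothing afterwards.

```r
# Log-likelihood of equation (6) with default-censoring
# x: T x n x (k+1) array of covariates (first slice is the constant 1)
# y: T x n matrix, y[i, j] = 1 if firm j defaults in period i
loglik_intensity <- function(mu, x, y, dt) {
  nper <- dim(x)[1]; nfirm <- dim(x)[2]
  ll <- 0
  for (j in 1:nfirm) {
    for (i in 1:nper) {
      lambda <- exp(sum(mu * x[i, j, ]))  # intensity, equation (1)
      p <- 1 - exp(-lambda * dt)          # one-period default probability
      ll <- ll + y[i, j] * log(p) + (1 - y[i, j]) * log(1 - p)
      if (y[i, j] == 1) break             # drop the firm after its default period
    }
  }
  ll
}

# Tiny synthetic panel: 10 periods, 5 firms, one covariate plus a constant
set.seed(2)
x <- array(c(rep(1, 50), rnorm(50)), dim = c(10, 5, 2))
y <- matrix(0, 10, 5); y[8, 2] <- 1       # firm 2 defaults in period 8
fit <- optim(c(-3, 0), function(m) -loglik_intensity(m, x, y, dt = 1 / 12))
fit$par                                   # maximum-likelihood estimate of mu0, mu1
```

The loop form is deliberately literal; a production implementation would vectorize it, but the censoring logic (the `break`) is what distinguishes this likelihood from equation (6) as first written.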
2. Artificial Neural Network
An artificial neural network (ANN) is a mathematical modeling tool used to mimic the
way the human brain is thought to process information. An ANN structure can range
from very simple to highly complex and computationally intensive. Recently, the
use of ANNs has become popular due to the increase in computing power and the
decreasing computational cost of the current generation of computers. Since this
trend is expected to continue, we expect to see increasing use of artificial neural
networks.
ANNs were originally designed for pattern recognition and classification. However,
they can also be used for prediction. Therefore, it is natural that attempts have
been made to use ANNs for forecasting bankruptcy (see, for example, Odom and
Sharda (1990), Wilson and Sharda (1994), and Lacher et al. (1995)).[4]
An ANN is typically composed of several layers of many computing elements
called nodes. Each node receives input signals from external inputs or from other
nodes and processes them through a transfer function, producing a transformed
signal as the node's output. The output signal from a node is then used as input
to other nodes or as the final result. ANNs are characterized by their network
architecture, which consists of a number of layers, each consisting of a number of
nodes. Finally, the network architecture specifies how the nodes are connected to
one another and to the input and output nodes.
ANN architectures take a large number of forms. Here we will discuss a simple
one that is used for bankruptcy prediction.
A popular form of ANN is called the multi-layer perceptron (MLP), in which all
nodes and layers are arranged in a feed-forward manner, resulting in a feed-forward
architecture. The input layer constitutes the first, or lowest, layer of the MLP
and is the layer connected to the external information. In other words, the MLP
receives the external information, or input, through this input layer. The last, or
highest, layer is called the output layer, where the ANN produces the output visible
outside the network. In between these two layers there may exist one or more layers
known as hidden layers.
There are almost unlimited variations of network architectures representing an MLP,
depending on the number of hidden layers and the interconnections of the nodes.
Here we will discuss one specific architecture that is used for bankruptcy prediction:
a three-layer MLP network with one hidden layer. Since bankruptcy classification is
a two-group classification problem, a three-layer architecture is considered
sufficient. The three-layer MLP architecture, with a single node in its one hidden
layer, is shown in Figure 2 below.
[4] Please see GCR for references. The discussion here is based on Zhang et al. (1999).
The lowest layer, the input layer, consists of the k different inputs, which represent
the explanatory variables, or the firm characteristics in the case of bankruptcy
models. At the hidden layer, the input values, or the activation values of the input
nodes, are linearly combined as follows:

α0 + α1X1 + ··· + αkXk    (7)
In linear regression, the coefficient α0 is known as the intercept. In neural network
terminology, such constants are called bias parameters. The linear combination is
then transformed by a transfer function into the hidden layer's activation value.
In Figure 2, the transfer function for the hidden layer is taken to be the logistic
function. Therefore, the activation value of the hidden layer, H1, is given by

H1 = 1 / [1 + e^(−(α0 + α1X1 + ··· + αkXk))] = [1 + e^(−(α0 + α1X1 + ··· + αkXk))]^(−1)    (8)
The output of the hidden layer is then used as the input to the single node at
the output layer (or to another hidden layer, if one exists). Again, since we are
dealing with two-group classification, a single output node is all we need. At this
node, a linear combination of the input, B0 + B1H1, is transformed using another
activation function, which is also taken (in Figure 2) to be the logistic function.
Therefore, the activation value of the output node, Y, is given by

Y = 1 / [1 + e^(−(B0 + B1H1))] = [1 + e^(−(B0 + B1H1))]^(−1)    (9)
The activation value, or output, of the output layer becomes the output of the
network. It is important to note that, due to the logistic activation function used
at the output node, the activation value will be between 0 and 1. However, we are
using the network for classification. Therefore, we need to convert the value of Y,
which lies between 0 and 1, into either 0 (non-bankrupt group) or 1 (bankrupt
group). One common way to do this is the following classification rule:
y = 0 if Y ≥ 0.5 and y = 1 otherwise;   or   y = 1 if Y ≥ 0.5 and y = 0 otherwise    (10)
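The forward pass through this network, equations (7)–(10), is short enough to write out directly in R. The parameter values below are arbitrary placeholders, not trained values, and the classification uses the second variant of rule (10):

```r
logistic <- function(z) 1 / (1 + exp(-z))        # the transfer function

# Placeholder parameters for k = 5 inputs (illustrative, not estimated)
alpha <- c(0.1, -0.3, 0.2, 0.5, -0.1, 0.4)       # alpha0, alpha1, ..., alpha5
B <- c(-0.2, 1.5)                                # B0, B1

mlp_classify <- function(x, alpha, B) {
  H1 <- logistic(alpha[1] + sum(alpha[-1] * x))  # hidden activation, equation (8)
  Y <- logistic(B[1] + B[2] * H1)                # output activation, equation (9)
  as.numeric(Y >= 0.5)                           # classification rule, equation (10)
}

mlp_classify(c(0.2, -0.1, 0.4, 0.3, 0.0), alpha, B)  # classifies this input as 1
```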
We have just completed the description of a simple three-layer MLP architecture
with the following two sets of unknown parameters:

α0, α1, …, αk and B0, B1
Now the question is how we decide what values these parameters should take.
This is done by a process called “training the network,” which involves choosing the
values of these parameters so that some measure of error is minimized. One popular
error measure is the mean squared error (MSE), defined as
MSE = (1/N) ∑_{i=1}^{N} (ai − yi)²    (11)
where ai represents the ith target value and yi represents the network output for
the ith training observation. Finally, N represents the number of training
observations, i.e., the size of the training sample.
For example, if we use the same set of five ratios for 266 firms, of which 134
firms are in the bankrupt group, we have one set of five ratios for each firm. When
the set of ratios for firm i is used as input to the ANN, the output of the network,
yi, would be either 0 (representing non-bankrupt) or 1 (representing bankrupt).
The actual bankruptcy status of the firm is represented by ai. Therefore, the
sample of 266 firms constitutes a training sample that the network uses to find the
parameter values that minimize the MSE.
From the discussion above, it is clear that training the network is an
unconstrained nonlinear minimization problem. One of the most popular algorithms
for training the network is the well-known backpropagation, a variation of the
gradient-based steepest descent method. There are other methods of training the
network (see Zhang et al. (1999)).
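To make the minimization concrete, the example below trains the three-layer network by minimizing the MSE of equation (11) with base R's general-purpose optimizer optim() (a quasi-Newton method, used here in place of backpropagation). The training data are made up for illustration:

```r
logistic <- function(z) 1 / (1 + exp(-z))

# Network output for an input matrix X (one row per firm);
# theta = (alpha0, alpha1, ..., alphak, B0, B1)
net_output <- function(theta, X) {
  k <- ncol(X)
  H1 <- logistic(theta[1] + X %*% theta[2:(k + 1)])  # hidden node, equation (8)
  logistic(theta[k + 2] + theta[k + 3] * H1)         # output node, equation (9)
}

mse <- function(theta, X, a) mean((a - net_output(theta, X))^2)  # equation (11)

# Made-up training sample: 40 firms, 2 ratios, targets a_i in {0, 1}
set.seed(3)
X <- matrix(rnorm(80), ncol = 2)
a <- as.numeric(X[, 1] + X[, 2] > 0)  # hypothetical bankruptcy labels
theta0 <- rep(0.1, 5)
fit <- optim(theta0, mse, X = X, a = a, method = "BFGS")
c(before = mse(theta0, X, a), after = fit$value)  # training reduces the MSE
```

Backpropagation exploits the layered structure to compute the gradient analytically; a black-box optimizer like this ignores that structure but illustrates the same objective.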
R has a package called neuralnet that can be used to estimate the neural network
parameters.
Figure 2: Three-Layer MLP Architecture
3. Maximum Likelihood Estimation Using R
# Note that you have to change the working directory (using "File\Change dir...")
# to the folder where "altman_new.txt" is located
ddat <- read.table("altman_new.txt", header = TRUE)
default.logit <- ddat$default  # to be used by the logistic model
# logistic regression: glm() is part of the "stats" package
glm.out <- glm(default.logit ~ X1 + X2 + X3 + X4 + X5, family = binomial(logit), data = ddat)
summary(glm.out)
#____________________________________________________________________________
# use maximum likelihood to estimate the same logistic model
library(maxLik)
N <- nrow(ddat)
d <- as.matrix(ddat[, 1:5])  # convert the data.frame to a matrix
d1 <- cbind(1, d)            # column-bind: add a column of 1s as the first
                             # column, representing the intercept
loglik1 <- function(param) { # the logistic log-likelihood
  beta <- param
  loglik1 <- 0.0
  lambda <- d1 %*% beta      # linear predictor beta'x_i for each firm
  for (i in 1:N) {
    loglik1 <- loglik1 - log(1 + exp(-lambda[i])) - lambda[i] +
      default.logit[i] * lambda[i]
  }
  loglik1
}
beta <- matrix(0.1, 6, 1)  # starting values
loglik1(beta)              # check that the log-likelihood evaluates
O.withAltmanVar <- maxLik(loglik1, start = beta)
summary(O.withAltmanVar)
# first derivative of the log-likelihood for one firm i: (y_i - p_i) times x_i,
# where p_i is the fitted logistic probability; see equation (18)
gradlik <- function(param) {
  beta <- param
  BetaXt <- d1 %*% beta
  dd <- default.logit - (1 / (1 + exp(-BetaXt)))  # (y_i - p_i)
  gradlik <- colSums(d1 * matrix(rep(dd, 6), nrow = N, byrow = FALSE))
  gradlik
}
O <- maxLik(loglik1, gradlik, hess = NULL, start = beta)
summary(O)