Current Methods for Evaluating

Prediction Performance of Biomarkers and Tests

Instructor: Margaret Pepe, University of Washington and Fred Hutchinson Cancer

Research Center

Abstract. This course will describe and critique methods for evaluating the performance

of markers to predict risk of a current or future clinical outcome. Examples from cancer,

cardiovascular disease and kidney injury research will be presented. Course content

will include:

a. Four criteria for evaluating a risk model: calibration, benefit for decision making,

accurate classification, risk stratification;

b. Comparing models;

c. Comparing baseline and expanded models;

e. Hypothesis testing and confidence interval construction;

f. Relationships and differences with assessing discrimination;

g. Software.

Evaluating Prediction Performance of Biomarkers and Tests

Margaret Sullivan Pepe

Fred Hutchinson Cancer Research Center

and

University of Washington

email: [email protected]

pertinent website: labs.fhcrc.org/pepe/dabs/index.html

2

Examples

1. Acute kidney injury after cardiac surgery [1]

• TRIBE-AKI consortium study

• n = 1139 adults, nC = 407 (36%) events

• Y = preoperative BNP

• X = demographics, co-morbidities, surgery characteristics

• D = AKI event

Patel et al. (submitted) Circulation

3

Examples

2. Critical illness assessment in out-of-hospital emergency care [2]

• King County EMS 2002–2006

• n =145,000 without trauma or cardiac arrest

• nC = 8,000 (5%) sepsis events

• Y = whole blood lactate

• X = demographics, clinical measurements (blood pressure, oximetry, respiratory rate, ...), location (nursing home, ...)

• D = sepsis

Seymour C. JAMA 2010

4

Examples

3. Invasive breast cancer within 5 years after treatment for DCIS [3].

• Early Detection Research Network (EDRN) ongoing

multicenter study.

• One site: Western Washington population-based study.

• nested case-control design (nC =445, nN =890).

• Y = DCIS tissue biomarkers TBD.

• X = age, race, mammographic density, treatment

• D = invasive breast cancer within 5 years

5

Examples

4. 10-year risk of cardiovascular health event [4].

• Framingham study

• n =3264

• nC =183

• Y =HDL

• X = demographics, smoking, diabetes, blood pressure, total

cholesterol

• D = CVD event

Pencina Stat in Med 2008

6

Examples

5. Simulated Data available on DABS website [5].

• n =10,000

• nC =1,017

• Y =continuous

• X = one-dimensional continuous

• For practice and illustration

Pepe AJE 2011

7

Outline

Unit 1: Evaluating a risk model — the basic concepts

Unit 2: Comparing two risk models — including the role of risk

reclassification metrics

Unit 3: Estimation and inference from data

8

Unit 1

Evaluating a Risk Model –

Basic Concepts

1.1

Notation

• D = 1 bad outcome (case, event)

D = 0 good outcome (control, non-event)

• X = predictors

• r(X ) = Prob(D = 1|X ) = risk(X )

1.2

What’s the point of developing a risk model?

• To help make medical decisions.

• Offer new interventions particularly to those who might benefit

(cases: bad outcome in absence of intervention).

• Offer no new intervention but more peace of mind to those who

will not benefit from intervention (controls).

• Opposite scenario is analogous: where reduction from

standard intervention is the goal.

1.3

Fundamental Evaluations

1. Calibration — is the risk calculator correctly calculating risk?

2. Risk distributions and consequent benefit — is the risk calculator helpful to cases

and controls in making appropriate medical decisions?

[Figure: risk densities for cases and controls; * marks the threshold for intervention on the risk axis.]

(Unit 1 will expand on these basic ideas.)

1.4

What is risk(X)?

• Not individual level probability.

• risk(X ) = P[D = 1|X = x ] is a frequency of events among the

group of subjects with X = x .

• Individual level risks are not observable [5].

• Individual level risks are not well defined.

• (more literature on this?)

1.5

Calibration

• Is the risk calculator r(X ) valid?

• Among people with r(X) = r, is the fraction of events ≈ r? [6]

• More lenient than asking if r(X ) = P(D = 1|X ). We are asking

if P(D = 1|r(X ) = r) = r .

• Validity of the risk calculator is crucial.

• Otherwise, we are engaged in evaluating a score for

discrimination/classification, not with the higher level task of

evaluating risk prediction performance.

1.6

Calibration

• Visual assessment with the predictiveness curve [7] or calibration plot [6].

[Figure: predictiveness curve; risk (0 to 1) plotted against risk percentile (0 to 100), with the event rate (0.1017) marked.]

• Flexibility in choosing quantile categories on the abscissa.

• The Hosmer-Lemeshow statistic [8] is sometimes a useful adjunct, but it depends on sample sizes.

1.7
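The grouped comparison behind a calibration plot can be sketched as follows. This is a minimal illustration, not the course software: the function name `calibration_table`, the equal-size grouping rule and the toy data are all assumptions made for the example.

```python
def calibration_table(risks, outcomes, n_groups=10):
    """Sort subjects by predicted risk, split into n_groups groups of
    (nearly) equal size, and return (mean predicted risk, observed
    event rate, group size) for each group."""
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than subjects
        mean_risk = sum(r for r, _ in chunk) / len(chunk)
        event_rate = sum(d for _, d in chunk) / len(chunk)
        rows.append((mean_risk, event_rate, len(chunk)))
    return rows


# Hypothetical toy data: 10 subjects at risk 0.1 (1 event),
# 10 subjects at risk 0.5 (5 events).
toy_risks = [0.1] * 10 + [0.5] * 10
toy_outcomes = [1] + [0] * 9 + [1] * 5 + [0] * 5

for mean_risk, event_rate, size in calibration_table(toy_risks, toy_outcomes, n_groups=2):
    print(f"mean risk {mean_risk:.2f}  event rate {event_rate:.2f}  n={size}")
```

A well calibrated model gives a mean predicted risk close to the observed event rate in every group; how close is "close" depends on group size, which is the sample-size caveat noted above.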

Calibration

• Achievable if the number of predictors is small.

• Recalibration [9, 10]:

(i) a training set is used to develop a score X(W), where W are the predictors;

(ii) a test set is used to estimate r(X) ≡ P(D = 1|X).

• If P(D = 1|X) ≠ P(D = 1|W) we lose optimality but gain ‘validity.’

• Assume henceforth that risk calculators are valid, i.e., well calibrated, in the sense that P(D = 1|r(X) = r) ≈ r.

• some measures of performance only make sense when

models are valid, others make sense regardless.

1.8

Prediction Performance

• Case (C) and Control (N) risk distributions are fundamental components of all

performance measures.

[Figure: risk distributions for cases and controls, with threshold = t marked on the risk axis.]

• Risk categories corresponding to treatment recommendations can be overlaid to assess benefit for decision making.

• Suppose a scenario of no treatment (default) versus a single treatment.

• HRC = % of cases in the High Risk category: potential benefit of using r(X).

• HRN = % of controls in the High Risk category: potential cost of using r(X).

1.9

Existing Risk Categories

• Prob[Breast Cancer] ≥ 0.8% per year for tamoxifen treatment in white women 50–59 years old [11].

• Prob[CHD event in 10 years] ≥ 20% for cholesterol lowering medication (ATPIII guidelines) [12].

When Risk Categories Don’t Exist

• Displaying distributions of risk allows one to overlay subjective

risk categories.

• “What if?” exercises about future interventions.

1.10

Given a risk threshold t that determines recommendation for treatment:

Key summary statistics

• HRC (t) = P(r(X ) > t|D = 1) ideally HRC (t) = 1

• HRN(t) = P(r(X ) > t|D = 0) ideally HRN(t) = 0

• NB(t) = net benefit of using the model and t for subjects in the population

= B × P(D = 1)HRC (t) − Cost × P(D = 0)HRN(t)

where B = expected benefit of treatment to a case and Cost = expected cost

of treatment to a control.

Notes: t > P[D = 1] is only relevant when the default is “no treatment” [13].

TPR= HRC (t)

FPR= HRN(t)

1.11

Result 1

Cost/B = t/(1 − t)

Proof

When should a patient choose treatment?

◦ when he/she expects to benefit

◦ E(benefit|D = 1,X )P(D = 1|X ) − E(cost |D = 0,X )P(D = 0|X ) > 0

◦ B × P(D = 1|X ) − Cost × P(D = 0|X ) > 0

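◦ P(D = 1|X)/P(D = 0|X) > Cost/B

◦ i.e., risk(X)/(1 − risk(X)) > Cost/B, so treatment is chosen when risk(X) > t, where t/(1 − t) = Cost/B

• a rational choice of risk threshold implies Cost/B

• a specified Cost/B implies a rational choice of threshold.

Reference: [14, 29]

1.12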

Example

A 20% risk threshold for treatment that ‘feels right’ is equivalent to a perceived cost-benefit ratio = t/(1 − t) = 0.2/0.8 = 0.25

Example

Gail [11] evaluated risk models for breast cancer in terms of decisions about tamoxifen (T) prevention therapy. Values relate to Prob(bad outcome), where a bad outcome includes hip fracture, stroke, endometrial cancer and pulmonary embolism in addition to breast cancer. In 50–59 year old white women:

Cost/B = 0.0077 → t = 0.0077 per year.
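Result 1 and the two examples above can be checked with a short sketch. The function names are illustrative and not part of the course software; the inverse map, t = (Cost/B)/(1 + Cost/B), is simply Result 1 solved for t.

```python
def cost_benefit_ratio(t):
    """Cost/B implied by a risk threshold t (Result 1): t / (1 - t)."""
    if not 0 <= t < 1:
        raise ValueError("threshold must satisfy 0 <= t < 1")
    return t / (1 - t)


def risk_threshold(ratio):
    """Risk threshold implied by a cost-benefit ratio: ratio / (1 + ratio)."""
    if ratio < 0:
        raise ValueError("cost-benefit ratio must be non-negative")
    return ratio / (1 + ratio)


# A 20% threshold corresponds to a cost-benefit ratio of about 0.25.
print(round(cost_benefit_ratio(0.20), 4))

# For very small ratios the threshold is nearly equal to the ratio itself,
# which is why Cost/B = 0.0077 and t = 0.0077 are interchangeable above.
print(round(risk_threshold(0.0077), 4))
```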

1.13

Net Benefit (t) in the population

• NB(t) = B × P(D = 1)HRC(t) − Cost × P(D = 0)HRN(t)

= B × {P(D = 1)HRC(t) − [t/(1 − t)] P(D = 0)HRN(t)}

= P(D = 1)HRC(t) − [t/(1 − t)] P(D = 0)HRN(t)

where B is the unit of measurement ≡ benefit of intervention for a would-be case in the absence of intervention.

• 1/NB(t) = number of tests needed to yield a NB of 1

◦ ignoring costs of testing

◦ NB = 1 is equivalent to 1 case and no controls classified as high risk.

References: [13, 14]

1.14

Given a risk threshold t that determines recommendation for

treatment:

Key summary statistics

• HRC (t) = P(r(X ) > t|D = 1) ideally 1

• HRN(t) = P(r(X ) > t|D = 0) ideally 0

• NB(t) = net benefit

= P(D = 1)HRC(t) − [t/(1 − t)] P(D = 0)HRN(t)

1.15

Example

• t = 20%; predictor X; data in [5]

• HRC(t) = 65.2%

• HRN(t) = 8.9%

• NB(t) = 0.0465 × benefit of statins to subjects who would get a CVD event without them.

[Figure: risk distributions for cases and controls, with the 20% threshold marked.]
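The three key summary statistics can be computed directly from a list of fitted risks and observed outcomes. The sketch below is illustrative only: the function name `threshold_summary` and the eight-subject toy dataset are assumptions, and net benefit is expressed in units of B as in the formula above.

```python
def threshold_summary(risks, outcomes, t):
    """Return the event rate, HRC(t), HRN(t) and NB(t) (in units of B)."""
    cases = [r for r, d in zip(risks, outcomes) if d == 1]
    controls = [r for r, d in zip(risks, outcomes) if d == 0]
    rho = len(cases) / len(risks)                      # P(D = 1)
    hr_c = sum(r > t for r in cases) / len(cases)      # P(risk > t | D = 1)
    hr_n = sum(r > t for r in controls) / len(controls)  # P(risk > t | D = 0)
    nb = rho * hr_c - (t / (1 - t)) * (1 - rho) * hr_n
    return {"rho": rho, "HRC": hr_c, "HRN": hr_n, "NB": nb}


# Hypothetical toy data (not the course dataset).
toy_risks = [0.05, 0.10, 0.30, 0.60, 0.15, 0.25, 0.02, 0.40]
toy_outcomes = [0, 0, 1, 1, 0, 0, 0, 1]

print(threshold_summary(toy_risks, toy_outcomes, t=0.20))
```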

1.16

Understanding Net Benefit

NB(t) = P(D = 1)HRC(t) − [t/(1 − t)] P(D = 0)HRN(t)

• ρ ≡ P(D = 1)

• maximum value of NB = P(D = 1)

‘The best we can do is treat all cases and no controls’

• relative utility [13] = NB(t)/ρ = % of maximum benefit

• relative utility = HRC(t) − [(1 − ρ)/ρ] [t/(1 − t)] HRN(t)

= true positive rate discounted appropriately for the false positive rate.

1.17

In our data

• RU = 45.6%

• The maximum possible benefit is to treat all 1017 cases and no

controls per 10,000. We can achieve 45.6% of this benefit

using the prediction model.

• With this model 65.2% of cases are above the treatment risk

threshold. Discounting for controls also classified as high risk,

we achieve the equivalent of 45.6% of cases classified for

treatment.
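As a numerical check, the rounded slide values (ρ = 0.1017, HRC = 65.2%, HRN = 8.9%, t = 20%) can be plugged into the formulas above. The function names are illustrative. Because the inputs are rounded, the results agree with the reported NB and RU only to about three decimal places.

```python
def net_benefit(rho, hr_c, hr_n, t):
    """NB(t) in units of B: rho*HRC(t) - [t/(1-t)]*(1-rho)*HRN(t)."""
    return rho * hr_c - (t / (1 - t)) * (1 - rho) * hr_n


def relative_utility(rho, hr_c, hr_n, t):
    """RU(t) = NB(t)/rho, the fraction of the maximum possible benefit."""
    return net_benefit(rho, hr_c, hr_n, t) / rho


nb = net_benefit(0.1017, 0.652, 0.089, 0.20)
ru = relative_utility(0.1017, 0.652, 0.089, 0.20)
print(f"NB = {nb:.4f}, RU = {ru:.3f}")
```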

1.18

More than Two Risk Categories

[Figure: risk distributions for cases and controls, with thresholds at 5% and 20%.]

• Default = medium category intervention since P[D = 1] = 0.102

              low     med      high
              < 5%    5-20%    > 20%
Cases %       11.2    23.6     65.2
Controls %    65.7    25.3     8.9

• To calculate net benefit here, we also need to specify Bhigh/Costlow for a case.

• e.g., Bhigh/Costlow = 10: NB = 0.0485 × Bhigh

NBmax = 0.1017 + 0.0047 = 0.1064

RU = 45.6%

1.19

FYRP

• Use C to denote Cost here

NB = ρ{−C(low)PC(low) + B(high)PC(high)} + (1 − ρ){B(low)PN(low) − C(high)PN(high)}

Arguments in Result 1 ⇒ Blow/Clow = tlow/(1 − tlow) and Chigh/Bhigh = thigh/(1 − thigh)

Let λ = Bhigh/Clow

NB = ρ{−(Bhigh/λ)PC(low) + Bhigh PC(high)} + (1 − ρ){[tlow/(1 − tlow)](Bhigh/λ)PN(low) − Bhigh [thigh/(1 − thigh)]PN(high)}

= Bhigh{−(ρ/λ)PC(low) + ρPC(high) + [tlow/(1 − tlow)][(1 − ρ)/λ]PN(low) − [thigh/(1 − thigh)](1 − ρ)PN(high)}

If λ = 10 ⇒ NB = 0.0485.
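The three-category net benefit can be evaluated numerically from the case and control percentages given for the 5% and 20% thresholds. The sketch below is an illustration with assumed function names. The maximum-benefit formula (all cases classified high, all controls classified low) is my reading of NBmax = 0.1017 + 0.0047, not a formula stated in the slides. With the rounded inputs the result is about 0.048, slightly below the reported 0.0485.

```python
def net_benefit_three_categories(rho, pc_low, pc_high, pn_low, pn_high,
                                 t_low, t_high, lam):
    """NB in units of B_high, with the medium category as default.
    lam = B_high / C_low."""
    odds_low = t_low / (1 - t_low)
    odds_high = t_high / (1 - t_high)
    return (-(rho / lam) * pc_low
            + rho * pc_high
            + odds_low * ((1 - rho) / lam) * pn_low
            - odds_high * (1 - rho) * pn_high)


def max_net_benefit(rho, t_low, lam):
    """Assumed maximum: every case in 'high', every control in 'low'."""
    return rho + (t_low / (1 - t_low)) * (1 - rho) / lam


nb3 = net_benefit_three_categories(rho=0.1017, pc_low=0.112, pc_high=0.652,
                                   pn_low=0.657, pn_high=0.089,
                                   t_low=0.05, t_high=0.20, lam=10)
nb3_max = max_net_benefit(rho=0.1017, t_low=0.05, lam=10)
print(f"NB = {nb3:.4f}, NBmax = {nb3_max:.4f}, RU = {nb3 / nb3_max:.3f}")
```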

1.20

In the absence of risk categories or thresholds:

• Plots: HRC (t), HRN(t), NB(t) or RU(t)

• Summary statistics

– But first, about the predictiveness curve . . . .

1.21

The Predictiveness Curve

• Plots risk quantiles in the whole population [5, 15].

• Shows the population risk distribution, cases and controls together.

• Shows what % of subjects in the population will be classified as high risk by the model; useful for enrollment in clinical trials, for example.

• Total Gain = area between the curve and the horizontal (useless model) curve [16]

[Figure: predictiveness curve; risk (0 to 1) plotted against risk percentile (0 to 100), with the event rate (0.1017) marked.]

1.22

The Predictiveness Curve

• There are other ways of showing the population risk distribution [17].

[Figure: two panels showing the pdf and the cdf of risk, each plotted against risk (0 to 1).]

1.23

The Predictiveness Curve

• Is equivalent to case and control risk distributions [18].

• Is derived from case and control distributions.

P(risk(X) > t) = ρP(risk(X) > t|D = 1) + (1 − ρ)P(risk(X) > t|D = 0)

• Can be used to calculate case and control distributions.

P(risk(X) = r|D = 1) = P(D = 1|risk(X) = r)P(risk(X) = r)/P(D = 1)

= rP(risk(X) = r)/ρ
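The second relationship can be illustrated for a discrete risk distribution. The sketch assumes a well calibrated model, so that P(D = 1|risk = r) = r. The control formula, P(risk = r|D = 0) = (1 − r)P(risk = r)/(1 − ρ), follows from the same Bayes argument. The function name and the two-point distribution are hypothetical.

```python
def case_control_distributions(pop_dist):
    """pop_dist maps each risk value r to P(risk = r) in the population.
    Returns (rho, case distribution, control distribution)."""
    rho = sum(r * p for r, p in pop_dist.items())  # overall event rate
    cases = {r: r * p / rho for r, p in pop_dist.items()}
    controls = {r: (1 - r) * p / (1 - rho) for r, p in pop_dist.items()}
    return rho, cases, controls


# Hypothetical population: half at risk 0.1, half at risk 0.3.
rho_toy, case_dist, control_dist = case_control_distributions({0.1: 0.5, 0.3: 0.5})
print(rho_toy, case_dist, control_dist)
```

Mixing the two returned distributions with weights ρ and 1 − ρ recovers the population distribution, which is the first relationship on the slide.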

1.24

[Figure: two panels sharing a risk percentile axis (0 to 100); the upper panel shows risk, the lower panel shows positive fraction (TPF and FPF).]

• The integrated plot [7] (2008)

• I have come to think that the lower panel of this plot is most important.

• Should be supplemented with the net benefit (decision) or relative utility curve.

1.25

Decision Curve

Decision Curve [14]: NB(t) = ρHRC(t) − (1 − ρ){t/(1 − t)}HRN(t)

[Figure: decision curve; net benefit (%), from 0 to 10.17, plotted against risk threshold (0 to 1), with ρ = 0.1017 marked on the threshold axis.]

1.26

• Considering B fixed, larger risk threshold (t) corresponds to

larger Cost.

• Larger Cost implies smaller NB (decreasing NB(t)).

• Relative utility curve = NB(t)/maximum possible NB(t)

= NB(t)/ρ on (0,1) scale

• Recall default = no treatment here.

• Relative utility is defined more generally in [13].

1.27

Data Displays without Risk Categories

[Figure: the upper panel shows HRC(t) and HRN(t) versus risk threshold; the lower panel shows RU(t) versus risk threshold; ρ = 0.1017 is marked on the threshold axis.]

1.28

Summary Measures without Risk Categories

[Figure: risk distributions for cases and controls.]

• MRD = Mean Risk Difference ≡ mean(risk(X )|case)-mean(risk(X )|control)

• AARD = Above Average Risk Difference ≡ P(risk(X ) > ρ|case) − P(risk(X ) > ρ|control)

1.29

Mean Risk Difference (MRD)

Also known as

• IDI = Integrated Discrimination Improvement relative to no model [4, 19]

= ∫ {P(risk(X) > r|D = 1) − P(risk(X) > r|D = 0)} dr

• PEV = Proportion of Explained Variation

= var(E(D|X))/var(D)

= var(risk(X))/var(D)

• R² = PEV (there are other, more complex R² measures [18])

• Yates’ Slope

For our data

• mean(risk|case) = 0.391

• mean(risk|control) = 0.069

• MRD= .322

1.30

Above Average Risk Difference (AARD)

AARD = P(risk(X ) > ρ|D = 1) − P(risk(X ) > ρ|D = 0) = 0.797 − 0.198 = .599

Also known as

• HRC (ρ) − HRN(ρ) = TPR(ρ) − FPR(ρ) = Youden’s index (ρ)

• RU(ρ) = NB(ρ)/ρ

Proof: NB(t) = ρHRC(t) − (1 − ρ)[t/(1 − t)]HRN(t). At t = ρ the factor (1 − ρ)t/(1 − t) equals ρ, so NB(ρ)/ρ = HRC(ρ) − HRN(ρ).

• Standardized Total Gain ≡ TG/2ρ(1 − ρ). Not an intuitive result! [20]

• 0.5 × continuous NRI [21] comparing risk(X) with no model

• 0.5 × categorized NRI comparing risk(X) with no model using risk categories > ρ and < ρ.
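Both summary measures are simple functions of the case and control risk distributions. The sketch below uses the sample event rate as ρ; the function names and the toy data are assumptions made for illustration.

```python
from statistics import mean


def mean_risk_difference(risks, outcomes):
    """MRD = mean(risk | case) - mean(risk | control)."""
    cases = [r for r, d in zip(risks, outcomes) if d == 1]
    controls = [r for r, d in zip(risks, outcomes) if d == 0]
    return mean(cases) - mean(controls)


def above_average_risk_difference(risks, outcomes):
    """AARD = P(risk > rho | case) - P(risk > rho | control),
    with rho taken as the sample event rate."""
    rho = mean(outcomes)
    cases = [r for r, d in zip(risks, outcomes) if d == 1]
    controls = [r for r, d in zip(risks, outcomes) if d == 0]
    return mean(r > rho for r in cases) - mean(r > rho for r in controls)


# Hypothetical toy data (not the course dataset).
toy_risks = [0.05, 0.10, 0.30, 0.60, 0.15, 0.25, 0.02, 0.40]
toy_outcomes = [0, 0, 1, 1, 0, 0, 0, 1]

print(round(mean_risk_difference(toy_risks, toy_outcomes), 3))
print(round(above_average_risk_difference(toy_risks, toy_outcomes), 3))
```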

1.31

Summary Measures without Risk Categories

[Figure: risk distributions for cases and controls.]

MRD = 0.322

AARD = 0.599

1.32

ROC Type Statistics as Summary Measures

• May be useful when no clinically relevant risk thresholds exist.

• ROC(f0) = P(risk(X ) > t|D = 1) where t: f0 = P(risk(X ) > t|D = 0)

• ROC−1(t0) = P(risk(X ) > t|D = 0) where t: t0 = P(risk(X ) > t|D = 1)

• PCF = L(v0) = P(risk(X ) > t|D = 1) where t: v0 = P(risk(X ) > t)

• PNF = L−1(w0) = P(risk(X ) > t) where t: w0 = P(risk(X ) > t|D = 1)

= ρw0 + (1 − ρ)ROC−1(w0)

• L is related to the Lorenz curve [22].

• Report the risk threshold corresponding to the criterion as well.

1.33

[Figure: risk distributions for cases and controls with a 16.7% threshold marked; ROC plot marking TPR = 0.70 at FPR = 0.11.]

ROC−1(0.7) = 0.115

L−1(0.7) = 0.175

1.34

[Figure: risk distributions for cases and controls with an 18.5% threshold marked; ROC plot marking TPR = 0.67 at FPR = 0.10.]

ROC(0.1) = 0.672

L(0.15) = 0.656

1.35

ROC Plot is Inadequate

[Figure: ROC plot of TPR versus FPR.]

• Because it does not display the absolute risk scale [7].

• Preferable to display two risk distributions rather than just the

nonparametric distance between them.

1.36

Area Under the ROC Curve = 0.884

• Mann-Whitney U-statistic (Wilcoxon) for comparing risk

distributions for cases versus controls.

• Average TPR over all possible FPR

• P(risk(XC ) > risk(XN))

• Not a clinically relevant measure of prediction performance

• Not worse than MRD in terms of clinical relevance (is it?).

• ROC(f0) or other point measures may be more clinically

relevant than AUC.

• AUC gives too much weight to low risk ranges and ignores the meaning of risk [23].

• Average TPR over a relevant range of FPR is preferable to total AUC: partial AUC(f)/f = 0.510 at f = 0.10.
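The interpretation of AUC as P(risk(XC) > risk(XN)) can be computed directly by comparing every case with every control. In the sketch below, counting ties as one half is the usual Mann-Whitney convention and is an assumption on my part, since the slide does not address ties. The function name and data are illustrative.

```python
def auc_pairwise(case_risks, control_risks):
    """Fraction of case-control pairs in which the case has the higher
    risk; ties count one half (Mann-Whitney convention)."""
    wins = 0.0
    for rc in case_risks:
        for rn in control_risks:
            if rc > rn:
                wins += 1.0
            elif rc == rn:
                wins += 0.5
    return wins / (len(case_risks) * len(control_risks))


# Hypothetical toy data: 3 cases and 4 controls.
toy_cases = [0.3, 0.6, 0.1]
toy_controls = [0.05, 0.2, 0.4, 0.1]
print(round(auc_pairwise(toy_cases, toy_controls), 3))
```

The double loop is quadratic in sample size; for large datasets a rank-based computation gives the same value more efficiently.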

1.37

Summary of Unit #1

• The statistical meaning of risk(X ): an observable frequency of events.

• Calibration≡ “is the risk calculator valid?” is crucial.

• Medical decisions based on risk:

– correspondence between risk categories and cost(FP)/benefit(TP)

– case and control frequencies in risk categories are the essential

components of net benefit gained by using risk model (excluding cost of

testing here)

• Without risk categories

– display case and control risk distributions, and relative utility (or

decision curves) that show benefit of using the model under various

decision rules

– statistics: ROC or Lorenz points

– more statistics: MRD, AARD, AUC ; are they useful for comparing

models?

1.38

Unit 2

Comparing Risk Models

2.1

In a Nutshell

• Choose your favorite measure(s) of prediction performance.

• Compare that measure(s) for the two risk models.

2.2

Example

• r(X ,Y ) versus r(X ) for data in Pepe AJE 2011.

• Both models are very well calibrated in the weak sense [18] that P(D = 1|r) ≈ r.

[Figure: predictiveness curves for Risk(X) and Risk(X,Y); risk (0 to 1) plotted against risk percentile (0 to 100), with the event rate (0.1017) marked.]

2.3

Figure 2. Plots showing high risk classification for cases (C) and

controls (N) under the baseline model (X ) and the expanded model

(X ,Y ). A comparison of net benefit is also shown through the relative

utility plot.

[Figure 2: the upper panel shows HRC(t) and HRN(t) versus risk threshold for models X and XY; the lower panel shows RU(t) versus risk threshold for models X and XY; ρ = 0.1017 is marked on the threshold axis.]

2.4

My Favorite Summary Measures

t = 20%, ρ = .1017

                              risk(X)   risk(X,Y)   Δ
Cases > t          HRC(t)     65.2%     73.5%       8.4%
Controls > t       HRN(t)     8.9%      8.4%        −0.5%
% of max benefit   RU(t)      45.5%     55.0%       9.5%

• Hypothesis testing and confidence intervals in Unit 3.

2.5

Not My Favorite Summary Measures

             risk(X)   risk(X,Y)   Difference
AUC          0.884     0.920       0.036
MRD/IDI      0.322     0.416       0.094
AARD         0.599     0.673       0.074
ROC(0.10)    0.672     0.758       0.087
L−1(0.70)    0.174     0.134       −0.040

• Hypothesis testing and confidence intervals in Unit 3.

2.6

Risk Reclassification Tables

• Showing numbers of subjects and event rates (%)

Each cell shows the number of subjects (first line) and the event rate in % (second line).

                       risk(X, Y)
r(X)        ≤ 5%     5−20%    > 20%    Total
≤ 5%        5558     437      25       6020
            1.30     8.71     16.00    1.89
5−20%       1036     1095     386      2517
            2.03     9.59     29.53    9.54
> 20%       40       329      1094     1463
            0.00     10.03    57.59    45.32
Total       6634     1861     1505     10,000
            1.40     9.46     49.70    10.17

• Cook and Ridker [23, 24]

• Motivated by clinically relevant risk categories in cardiovascular health research.

2.7

Event and non-Event Risk Reclassification Tables [4]

Events

                       risk(X, Y)
r(X)        < 5%     5−20%      ≥ 20%      Total
≤ 5%        72       38         4          114 (11.2%)
5−20%       21       105        114        240 (23.6%)
≥ 20%       0        33         630        663 (65.2%)
Total       93       176        748        1017
            (9.1%)   (17.3%)    (73.5%)

2.8

Event and non-Event Risk Reclassification Tables

Non-Events

                       risk(X, Y)
r(X)        < 5%      5−20%      ≥ 20%     Total
≤ 5%        5486      399        21        5906 (65.8%)
5−20%       1015      990        272       2277 (25.3%)
≥ 20%       40        296        464       800 (8.9%)
Total       6541      1685       757       8983
            (72.8%)   (18.8%)    (8.4%)

2.9

• Useful to see how r(X,Y) performs within strata defined by r(X)

Population        Event rate ρ(X)   Cases HRC(0.20)   Controls HRN(0.20)   % of max benefit RU(0.20)
low risk r(X)     1.89%             0.035             0.004                −1.7%
med risk r(X)     9.54%             0.475             0.119                19.3%
high risk r(X)    45.32%            0.950             0.580                25.4%

∗ Assume default is ‘no treatment’ for low and medium risk strata.

RU(t) = HRC(t) − [(1 − ρ(X))/ρ(X)] [t/(1 − t)] HRN(t)

∗ Assume default is ‘treatment’ for high risk stratum.

RU(t) = (1 − HRN(t)) − [ρ(X)/(1 − ρ(X))] [(1 − t)/t] (1 − HRC(t))
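The stratum-specific relative utilities can be reproduced from the rounded table entries with the two formulas above. The function names are illustrative. Because the inputs are the rounded slide values, the results match the table only to about three decimal places.

```python
def ru_default_no_treatment(rho, hr_c, hr_n, t):
    """RU(t) when the default in the stratum is 'no treatment'."""
    return hr_c - ((1 - rho) / rho) * (t / (1 - t)) * hr_n


def ru_default_treatment(rho, hr_c, hr_n, t):
    """RU(t) when the default in the stratum is 'treatment'."""
    return (1 - hr_n) - (rho / (1 - rho)) * ((1 - t) / t) * (1 - hr_c)


t = 0.20
ru_low = ru_default_no_treatment(0.0189, 0.035, 0.004, t)
ru_med = ru_default_no_treatment(0.0954, 0.475, 0.119, t)
ru_high = ru_default_treatment(0.4532, 0.950, 0.580, t)
print(f"low {ru_low:.3f}  med {ru_med:.3f}  high {ru_high:.3f}")
```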

2.10

Population        Event rate ρ(X)   Cases HRC(0.20)   Controls HRN(0.20)   % of max benefit RU∗(0.20)
low risk r(X)
med risk r(X)     9.54%             0.475             0.119                19.3%
high risk r(X)

• Calculations based on middle rows of event and non-event reclassification

tables.

2.11

Population        Event rate ρ(X)   Cases HRC(0.20)   Controls HRN(0.20)   % of max benefit RU∗(0.20)
low risk r(X)
med risk r(X)
high risk r(X)    45.32%            0.950             0.580                25.4%

∗ Assume default is ‘treatment’ for high risk stratum.

RU(t) = (1 − HRN(t)) − [ρ(X)/(1 − ρ(X))] [(1 − t)/t] (1 − HRC(t))

2.12

Population        Event rate ρ(X)   Cases HRC(0.20)   Controls HRN(0.20)   % of max benefit RU(0.20) = NB(0.20)/ρ
low risk r(X)     1.89%             0.035             0.004                −1.7%
med risk r(X)
high risk r(X)

2.13

Risk Reclassification Tables

• Valuable for evaluating r(X ,Y ) within subpopulations defined

by r(X )

• These are separate single model evaluations in each

subpopulation.

2.14

Risk Reclassification Tables

• Not helpful for overall comparison of use of r(X ) versus use of

r(X ,Y ) in the entire population.

• Margins show risk distributions for each model. Performance

for each model determined by the corresponding marginal

distributions of risk.

• Comparison of performance determined by comparison of

marginal distributions.

• Reclassification statistics do not calculate performance for

each model and compare the two values.

2.15

Event and non-Event Risk Reclassification Tables

Events

                       risk(X, Y)
r(X)        < 5%     5−20%      ≥ 20%      Total
≤ 5%        72       38         4          114 (11.2%)
5−20%       21       105        114        240 (23.6%)
≥ 20%       0        33         630        663 (65.2%)
Total       93       176        748        1017
            (9.1%)   (17.3%)    (73.5%)

2.16

Event and non-Event Risk Reclassification Tables

Non-Events

                       risk(X, Y)
r(X)        < 5%      5−20%      ≥ 20%     Total
≤ 5%        5486      399        21        5906 (65.8%)
5−20%       1015      990        272       2277 (25.3%)
≥ 20%       40        296        464       800 (8.9%)
Total       6541      1685       757       8983
            (72.8%)   (18.8%)    (8.4%)

2.17

Some Problems with Interior Cells of Risk Reclassification Tables

(1) Much reclassification does not imply improved performance.

Example of RR tables with the same margins but different % reclassification. Rows are r(X) categories; columns are risk(X, Y) categories.

First table (% reclassification = 35%)

r(X)      < 5    5−20   > 20%   Total
< 5       10     10     0       20
5−20      5      20     10      35
> 20      5      5      35      45
Total     20     35     45      100

Second table (% reclassification = 0%)

r(X)      < 5    5−20   > 20%   Total
< 5       20     0      0       20
5−20      0      35     0       35
> 20      0      0      45      45
Total     20     35     45      100

• The comparison of performances of models r(X ) versus r(X , Y ) is found by

comparing vertical versus horizontal margins.

2.18

Some Problems with Interior Cells of Risk Reclassification Tables

(2) The NRI statistics [4, 21] do not capture ‘improved risk reclassification’ when # categories > 2.

Definitions

NRI(event) = P[rcat(X,Y) > rcat(X)|D = 1] − P[rcat(X,Y) < rcat(X)|D = 1]

NRI(non-event) = P[rcat(X,Y) < rcat(X)|D = 0] − P[rcat(X,Y) > rcat(X)|D = 0]

NRI = NRI(event) + NRI(non-event)

2.19

Example where NRI > 0 but no performance improvement

Events

                 r(X, Y)
r(X)      low    med    high   Total
low       10     10     0      20
med       5      20     10     35
high      5      5      35     45
Total     20     35     45     100

Non-Events

                 r(X, Y)
r(X)      low    med    high   Total
low       500    100    0      600
med       100    200    0      300
high      0      0      100    100
Total     600    300    100    1000

NRI(event) + NRI(non-event) = (20% − 15%) + 0% = 5%

but

% cases and % controls in each risk category are the same for the r(X,Y) and r(X) models.

2.20

Example where NRI = 0 but there is performance change (improvement)

Events

                 r(X, Y)
r(X)      low    med    high   Total
low       10     10     0      20
med       15     20     10     45
high      0      5      30     35
Total     25     35     40     100

Non-Events

                 r(X, Y)
r(X)      low    med    high   Total
low       500    100    0      600
med       100    200    0      300
high      0      0      100    100
Total     600    300    100    1000

• NRI(event) = 10% + 10% + 0% − 5% − 0% − 15% = 0

• % cases at high risk increased from 35% to 40%
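The two event tables above can be checked with a short sketch that computes the categorical NRI from a reclassification table and compares the table margins. Rows index r(X) categories and columns index r(X,Y) categories. The function names are illustrative.

```python
def categorical_nri(table, is_event):
    """Net proportion moving in the 'right' direction: up for events,
    down for non-events. Rows = r(X) category, columns = r(X,Y) category."""
    k = len(table)
    total = sum(sum(row) for row in table)
    up = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    down = sum(table[i][j] for i in range(k) for j in range(k) if j < i)
    net_up = (up - down) / total
    return net_up if is_event else -net_up


def margins(table):
    """Row totals (old model) and column totals (new model)."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows, cols


events_a = [[10, 10, 0], [5, 20, 10], [5, 5, 35]]    # NRI > 0, same margins
events_b = [[10, 10, 0], [15, 20, 10], [0, 5, 30]]   # NRI = 0, margins differ
non_events = [[500, 100, 0], [100, 200, 0], [0, 0, 100]]

print(categorical_nri(events_a, True), margins(events_a))
print(categorical_nri(events_b, True), margins(events_b))
print(categorical_nri(non_events, False), margins(non_events))
```

In the first table the NRI is positive although row and column totals are identical; in the second the NRI is zero although the share of cases classified as high risk rises from 35 to 40 of 100.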

2.21

NRI with 2 Risk Categories

Events

                  r(X, Y)
r(X)      < 20            ≥ 20            Total
< 20      236 (23.2%)     118 (11.6%)     354
≥ 20      33 (3.2%)       630 (61.9%)     663
Total     269             748             1017

NRI(event) = (118 − 33)/1017 = 8.4%

ΔHRC = 748/1017 − 663/1017 = 73.5% − 65.2% = 8.4%

• NRI(event) = ΔHRC

NRI(non-event) = −ΔHRN

NRI = ΔHRC − ΔHRN

• uses the table margins

2.22

Continuous NRI

Definitions

• Cont-NRI(event) ≡ P[risk(X,Y) > risk(X)|D = 1] − P[risk(X,Y) < risk(X)|D = 1]

= 2P[risk(X,Y) > risk(X)|D = 1] − 1 (when there are no ties)

• Cont-NRI(non-event) ≡ 2P[risk(X,Y) < risk(X)|D = 0] − 1

• Cont-NRI = Cont-NRI(event) + Cont-NRI(non-event)

Concerns

• It is possible for Cont-NRI(event) > 0 even when the distributions of risk(X) and risk(X,Y) are the same for cases, and vice versa. Similar concerns apply to Cont-NRI(non-event).

• Other continuous summary indices such as ΔAUC, ΔMRD, IDI, and AARD do not have this undesirable property because they use the table margins.

2.23

Cook-Ridker Analysis Strategy [24]

• Is problematic.

(i) % reclassification

∗ Not a key summary in general (unless there is very little reclassification).

(ii) % ‘correct’ reclassification

∗ An off-diagonal cell is defined as correctly reclassified if the % events in the cell is closer to the r(X, Y) category label than to the r(X) label.

∗ This will be 100% in large samples if Y is a risk factor.

(iii) Reclassification calibration statistical tests

∗ Not traditional use of the term calibration

∗ Compare event rate in off diagonal cell with average (r(X )) and average

(r(X , Y )).

∗ H0 regarding r(X ) model always rejected in large samples if Y is a risk

factor. RCC statistic is equal to Pearson chi-squared test when X = φ.

∗ H0 regarding r(X , Y ) model rejected at nominal level in large samples.

RCC statistic is equal to Hosmer-Lemeshow test when X = φ.

2.24

FYRP

Calibration Reminders

• r(X ) and r(X ,Y ) are well calibrated is a basic assumption

P(D = 1|r(X )) = r(X ) and P(D = 1|r(X ,Y )) ≈ r(X ,Y )

• Additional requirement for risk reclassification tables is that

within categories of r(X ), r(X ,Y ) is well calibrated∗

P(D = 1|r(X,Y), rcat(X)) ≈ P(D = 1|r(X,Y)) ≈ r(X,Y)

∗ This follows directly if r(X ,Y ) = P(D = 1|X ,Y ).

2.25

FYRP

Implications for Cook and Ridker Analysis Strategy

• P(D = 1|r(X ,Y ) ∈ A, r(X ) ∈ B) =

= E{P(D = 1|r(X ,Y ))|r(X ,Y ) ∈ A, r(X ) ∈ B}

= E{r(X,Y)|r(X,Y) ∈ A, r(X) ∈ B}

• Event rate in off diagonal cell = average (r(X ,Y )|cell)

– in interval A, not in interval B

– H0 regarding r(X ,Y ) model holds

– H0 regarding r(X ) model does not hold

2.26

Summary of Unit 2

To compare two risk models

• Choose your favorite measure(s) of prediction performance,

report for each model and compare

• risk reclassification analyses that do not focus on marginal

performance of each model (NRI , reclassification calibration,

% reclassification) can be misleading.

Stratifying on risk categories from a baseline model

• Evaluating r(X ,Y ) within baseline risk strata can be helpful to

identify subpopulations where information on Y may be most

useful.

• Risk reclassification tables have a role for this purpose.

2.27

Unit 3

Inference

3.1

• Not just for Pepe’s preferred statistics.

• Reviewers may demand certain analyses such as ΔAUC ,

ΔMRD, NRI , etc.

3.2

Analysis Strategy for a Single Model

(i) Calibration assessment via predictiveness curve or calibration

curve.

(ii) Plots of distributions of risk in cases and in controls and

relative utility curves (or decision curves).

(iii) Single threshold statistics: HRC (t), HRN(t), RU(t),

HRC (t) − HRN(t).

– analogues for several thresholds or risk categories.

(iv) Continuous summary statistics: AUC , pAUC , MRD, AARD

(v) Given specified criterion for % cases classified as high risk (tc)

or % controls classified as high risk (tn), or % population

classified as high risk (tp): ROC(tn), ROC−1(tc), L(tp), L−1(tc).

3.3

Analysis Strategy for Comparison of Two Models

(i) Plots of case and control risk distributions and of relative utility

aligned to enable comparisons between models.

(ii) Single threshold comparison statistics along with event and non-event reclassification tables: ΔHRC(t), ΔHRN(t), ΔRU(t), ΔHRC(t) − ΔHRN(t) = NRI(t), % reclassification, risk recalibration statistics.

– Analogues with multiple risk categories.

(iii) Continuous comparison statistics: ΔAUC , ΔpAUC , ΔMRD

(a.k.a. IDI), ΔAARD(a.k.a. ΔTG), NRI (event, non-event,

overall).

(iv) ΔROC(tn), ΔROC−1(tc ), ΔL(tp), ΔL−1(tc )

3.4

Table 1. Estimates of performance with 95% confidence intervals. (Stata programs incrisk, incroc)

Measure                r(X)                     r(X,Y)                   Comparison
HRC(0.20)              65.2% (62.0, 68.1)       73.5% (70.9, 76.0)       8.4% (6.1, 10.8)
HRN(0.20)              8.9% (8.2, 9.7)          8.4% (7.7, 9.1)          −0.5% (−1.0, 0.0)
RU(0.20)               45.5% (41.9, 48.8)       54.9% (51.9, 58.0)       9.4% (7.0, 12.2)
HRC(.2) − HRN(.2)      56.3% (53.2, 59.1)       65.1% (62.5, 67.6)       8.8% (6.5, 11.5)
Cat-NRI(0.2)           -                        -                        0.088 (0.065, 0.115)
% reclassification     -                        -                        7.8% (6.8, 8.7)
AUC                    0.884 (0.873, 0.895)     0.920 (0.91, 0.929)      0.036 (0.030, 0.043)
MRD/IDI                0.322 (0.298, 0.350)     0.416 (0.391, 0.441)     0.094 (0.081, 0.109)
AARD                   0.599 (0.574, 0.628)     0.671 (0.650, 0.696)     0.073 (0.050, 0.098)
Cont-NRI(event)        -                        -                        0.388 (0.339, 0.430)
Cont-NRI(non-event)    -                        -                        0.411 (0.369, 0.447)
Cont-NRI               -                        -                        0.799 (0.724, 0.865)
ROC(0.10)              0.672 (0.643, 0.703)     0.758 (0.732, 0.786)     0.087 (0.059, 0.111)
ROC−1(0.70)            0.115 (0.098, 0.133)     0.070 (0.057, 0.081)     −0.044 (−0.058, −0.034)
L(0.15)                0.658 (0.634, 0.687)     0.735 (0.711, 0.761)     0.077 (0.055, 0.097)
L−1(0.70)              0.174 (0.158, 0.191)     0.135 (0.121, 0.145)     −0.040 (−0.053, −0.030)

3.5

Estimation

Simple and Most Common Approach

• Cohort data = {Di ,Xi ,Yi ; i = 1, ... n}

• Fit models (e.g., with logistic regression); r̂(X ) and r̂(X ,Y )

• Empirical estimates of performance measures substituting

r̂(Xi ) and r̂(Xi ,Yi ) for r(Xi ) and r(Xi ,Yi ).

Issues

• Optimistic bias due to developing models and assessing

performance with the same data.

• Variability in r̂ must be acknowledged in assessing variability of

estimated performance measures.

3.6

Variability in r̂

• Is a substantial component of var(M̂(r̂)), where M̂ denotes the

estimate of a performance measure.

• Often ignored in variance expressions [4, 21, 22, 24].

• But not always ignored in variance expressions [20].

• Addressed in bootstrap resampling for variance and

confidence interval calculations by refitting the risk models in

each resampled dataset.
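A minimal sketch of the two bootstrap schemes, under loud assumptions: the risk model is a toy stand-in (the case proportion within each level of a categorical X) rather than logistic regression, the performance measure is MRD for a single model, and all names (`fit_risk`, `bootstrap_widths`) and the simulated data are hypothetical.

```python
import random


def fit_risk(x, d):
    """Toy risk model: r_hat(x) = proportion of cases among subjects with X = x."""
    n_cell, k_cell = {}, {}
    for xi, di in zip(x, d):
        n_cell[xi] = n_cell.get(xi, 0) + 1
        k_cell[xi] = k_cell.get(xi, 0) + di
    return {v: k_cell[v] / n_cell[v] for v in n_cell}


def mrd(risk, d):
    """Mean risk in cases minus mean risk in controls."""
    cases = [r for r, di in zip(risk, d) if di == 1]
    controls = [r for r, di in zip(risk, d) if di == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def percentile_width(values, level=0.95):
    """Width of the percentile interval of a bootstrap distribution."""
    v = sorted(values)
    lo = v[int(round((1 - level) / 2 * (len(v) - 1)))]
    hi = v[int(round((1 + level) / 2 * (len(v) - 1)))]
    return hi - lo


def bootstrap_widths(x, d, n_boot=500, seed=0):
    rng = random.Random(seed)
    n = len(d)
    model = fit_risk(x, d)  # fitted once, on the original data
    fixed, refit = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        db = [d[i] for i in idx]
        if sum(db) in (0, n):  # skip resamples without both cases and controls
            continue
        fixed.append(mrd([model[v] for v in xb], db))    # ignores variability in r_hat
        model_b = fit_risk(xb, db)                       # refit in the resampled dataset
        refit.append(mrd([model_b[v] for v in xb], db))  # acknowledges variability in r_hat
    return percentile_width(fixed), percentile_width(refit)


# Hypothetical cohort: X has four levels, risk increases with X
rng = random.Random(1)
x = [rng.randrange(4) for _ in range(200)]
d = [1 if rng.random() < 0.05 + 0.15 * xi else 0 for xi in x]
w_fixed, w_refit = bootstrap_widths(x, d)
```

Comparing `w_fixed` with `w_refit` gives the kind of contrast reported in Table 2; how large the difference is depends on the model and the measure.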

3.7


Table 2. Confidence interval widths with and without refitting the model in each
bootstrap sample. Confidence limits are the 2.5th and 97.5th percentiles of the bootstrap
distribution. Threshold t = 20%.

                                CI width            CI width
Measure             Estimate    without refitting   with refitting   % Increase
ΔAUC                0.036       0.012               0.013            8.3%
Cont-NRI            0.799       0.115               0.138            20.0%
ΔMRD/IDI            0.094       0.020               0.028            40.0%
ΔAARD               0.073       0.045               0.048            6.6%
Cat-NRI(0.20)       0.088       0.049               0.050            2.0%
Reclassification %  7.800       1.060               1.910            80.2%
ΔNB(0.20) %         0.960       0.550               0.520            −5.5%

3.8

Optimistic Bias (Overfitting Bias)

The Problem

• The fitting procedures find r̂(X ) to maximize an estimated

measure of prediction performance (or one that is related to

measures of prediction performance).

• It is well documented that performance estimates obtained this way

are better than true performance.

• A major problem when there are many candidate predictors.

3.9


Optimistic Bias (Overfitting Bias)

Partial Solutions

• Cross validation: include r̂_{−i}(X_i) in the estimate of

performance, where −i denotes that all but the i-th observation were used to

fit the model.

– But the fitted model r̂(X) may still be problematic, e.g., poorly

calibrated.

• Training-test dataset split: fit r̂(X) on the training set and evaluate

on the test set.

– But the fitted model r̂(X) may still be problematic.

– Probably not well calibrated on the test set if dim(X) is large.
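The leave-one-out idea can be illustrated with a deliberately small example. The sketch assumes a toy risk model (case proportion within each level of a categorical X, each level observed at least twice), which is not the logistic model used in the course; the function names and data are hypothetical.

```python
def apparent_and_loo_risks(x, d):
    """Return apparent risks r_hat(X_i) and leave-one-out risks r_hat_{-i}(X_i).

    Toy model: risk = proportion of cases in the subject's level of X.
    Each level of X must contain at least two subjects.
    """
    n_cell, k_cell = {}, {}
    for xi, di in zip(x, d):
        n_cell[xi] = n_cell.get(xi, 0) + 1
        k_cell[xi] = k_cell.get(xi, 0) + di
    apparent = [k_cell[xi] / n_cell[xi] for xi in x]
    # Remove subject i from its own cell before computing its risk
    loo = [(k_cell[xi] - di) / (n_cell[xi] - 1) for xi, di in zip(x, d)]
    return apparent, loo


def mrd(risk, d):
    """Mean risk in cases minus mean risk in controls."""
    cases = [r for r, di in zip(risk, d) if di == 1]
    controls = [r for r, di in zip(risk, d) if di == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


x = ["a", "a", "a", "a", "b", "b", "b", "b"]
d = [0, 0, 0, 1, 1, 1, 0, 1]
apparent, loo = apparent_and_loo_risks(x, d)
mrd_apparent = mrd(apparent, d)  # performance assessed on the data used for fitting
mrd_cv = mrd(loo, d)             # cross-validated performance
```

With only eight subjects the apparent MRD is visibly larger than the cross-validated one, which is the optimistic bias the slide describes.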

3.10

Optimistic Bias (Overfitting Bias)

Solutions

• Use shrinkage procedures in fitting r̂(X ) to the training set and

evaluate on the test set.

– Requires artistry/expertise.

• Derive a preliminary r̂*(X) from the training set, then fit a model

for r̂(X) ≡ P[D = 1|S(X)], where S(X) = r̂*(X). Evaluate

performance with the test set. This is recalibration [9, 10].

– Since S is one-dimensional, overfitting is not a big problem

(see below).

– If P[D = 1|S(X)] ≠ P[D = 1|X], prediction performance is

not optimal.
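One common way to model P[D = 1|S(X)] is a logistic regression on the logit of the preliminary risk. The sketch below assumes that form (the slides do not specify it) and fits it by Newton-Raphson; `recalibrate` and the ten data points are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate(r_star, d, iters=50):
    """Fit logit P(D=1|S) = a + b * logit(S), with S = r_star, by Newton-Raphson."""
    s = [logit(p) for p in r_star]
    a, b = 0.0, 1.0  # start from "no recalibration needed"
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for si, di in zip(s, d):
            p = expit(a + b * si)
            w = p * (1.0 - p)
            g0 += di - p            # score for the intercept
            g1 += (di - p) * si     # score for the slope
            h00 += w
            h01 += w * si
            h11 += w * si * si
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # Newton step: information^{-1} times score
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical preliminary risks and outcomes in the recalibration data
r_star = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.2, 0.8]
d = [0, 0, 1, 0, 1, 0, 1, 1, 1, 0]
a, b = recalibrate(r_star, d)
recalibrated = [expit(a + b * logit(p)) for p in r_star]
```

Because only two parameters are estimated, there is little room for overfitting at this step; a fitted slope `b` below 1 indicates that the preliminary risks were too extreme.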

3.11


• For low dimensional predictors overfitting is not a big problem.

• Evidenced by comparing cross-validated (CV) performance with performance
not cross-validated for our data.

                  r(X)              r(X,Y)            Δ
Measure           CV      not CV    CV      not CV    CV       not CV
HR_C(0.20)        0.650   0.652     0.735   0.735     0.086    0.084
HR_N(0.20)        0.089   0.089     0.084   0.084     −0.005   −0.005
RU(0.20) %        45.3    45.5      54.9    54.9      9.69     9.41
Cat-NRI(0.20)     —       —         —       —         0.091    0.088
AUC               0.883   0.884     0.920   0.920     0.036    0.036
MRD               0.322   0.322     0.416   0.416     0.094    0.094
AARD              0.599   0.599     0.673   0.671     0.074    0.073

3.12

Hypothesis Testing

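• Testing the null hypothesis of no performance improvement is

redundant if Y has already been shown to be a risk factor [25].

$H_0^1: r(X,Y) = r(X)$ with probability 1

$\Leftrightarrow H_0^2: \mathrm{ROC}_{X,Y}(f) = \mathrm{ROC}_X(f) \;\; \forall f$

$\Leftrightarrow H_0^3: \mathrm{AUC}_{X,Y} = \mathrm{AUC}_X$

$\Leftrightarrow H_0^4: \mathrm{MRD}_{X,Y} = \mathrm{MRD}_X$

$\Leftrightarrow H_0^5: \mathrm{HR}^{\text{case}}_{X,Y}(t) = \mathrm{HR}^{\text{case}}_X(t)$ and $\mathrm{HR}^{\text{control}}_{X,Y}(t) = \mathrm{HR}^{\text{control}}_X(t) \;\; \forall t$

etc.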

• Presented yesterday.

3.13


Hypothesis Testing

• Properties of tests of H0 using ΔAUC, ΔMRD, NRI are very poor [25, 26, 27].

– Do not have 5% rejection rate under H0.

Table 3: Data simulated under the null hypothesis H0: no improvement.
Shown are rejection rates (%) compared with the nominal level of 5%. Sample
size: 1000 subjects, 10% prevalence.

               Without adjustment   Bootstrap including
Statistic      for var(β̂)           refitting β̂
ΔAUC           0.1%                 2.0%
ΔMRD           3.3%                 4.1%
Cat-NRI(ρ)     11.9%                0.0%
Cont-NRI       6.3%                 2.5%

3.14

Hypothesis Testing

Redundancy and poor statistical properties of tests based on

performance measures

⇒ use standard tests of $H_0^1: r(X,Y) = r(X)$ to address the null

hypothesis of no performance improvement.

• If $H_0^1$ is rejected, estimate performance measures with

confidence intervals.

3.15


Example

• Test of βY = 0 in logit P(D = 1|X,Y) = β0 + βX X + βY Y

rejected with p < 0.001 using the Wald test.

Measure       Estimate   Confidence interval
ΔHR_C(0.2)    8.4%       (6.1, 10.5)
ΔHR_N(0.2)    −0.5%      (−1.0, 0.0)
ΔRU(0.2)      9.4%       (7.0, 12.2)
ΔAUC          0.036      (0.030, 0.043)
ΔMRD          0.094      (0.081, 0.109)
ΔAARD         0.073      (0.050, 0.098)
Cont-NRI      0.799      (0.724, 0.865)
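A Wald test of βY = 0 can be sketched as follows. This is an illustrative stand-in for what a statistics package would report, using only the Python standard library: the helper names (`solve`, `fit_logistic`, `wald_test`), the simulated data and the true coefficients are all hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logistic(rows, d, iters=25):
    """Newton-Raphson logistic regression; returns (beta, information matrix)."""
    Z = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(Z[0])

    def score_info(beta):
        g = [0.0] * k
        H = [[0.0] * k for _ in range(k)]
        for z, di in zip(Z, d):
            eta = sum(bj * zj for bj, zj in zip(beta, z))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                g[a] += (di - p) * z[a]
                for c in range(k):
                    H[a][c] += p * (1.0 - p) * z[a] * z[c]
        return g, H

    beta = [0.0] * k
    for _ in range(iters):
        g, H = score_info(beta)
        beta = [bj + sj for bj, sj in zip(beta, solve(H, g))]
    _, H = score_info(beta)
    return beta, H


def wald_test(rows, d, j):
    """Wald z statistic and two-sided p-value for coefficient j (0 = intercept)."""
    beta, H = fit_logistic(rows, d)
    unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
    var_j = solve(H, unit)[j]  # j-th diagonal element of the inverse information
    z = beta[j] / math.sqrt(var_j)
    return beta[j], z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical cohort in which Y is a genuine risk factor
rng = random.Random(2)
rows, d = [], []
for _ in range(500):
    x, y = rng.gauss(0, 1), rng.gauss(0, 1)
    p = 1.0 / (1.0 + math.exp(-(-2.0 + x + y)))
    rows.append([x, y])
    d.append(1 if rng.random() < p else 0)

beta_y, z_y, p_value = wald_test(rows, d, j=2)
```

Only after this standard test rejects would one go on to estimate the performance measures and their confidence intervals, as in the table above.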

3.16

Hypothesis Testing

• Consider H0: performance improvement ≤ minimal level, so that rejecting H0

establishes a meaningful gain.

e.g., requiring ΔNB ≥ 0.01 means that by measuring Y

on 100 subjects we gain the equivalent of correctly identifying

1 case without falsely identifying any controls (i.e., 1 unit of benefit).

• May be useful in sample size calculations

P[lower confidence limit > minimal|anticipated performance] ≥ 80%

⇒ sample size calculated with simulations.
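A simulation-based power calculation of this kind might look like the sketch below. It makes several simplifying assumptions that are not in the slides: risk classifications are simulated directly from anticipated HR_C and HR_N values (so variability from fitting r̂ is ignored), the two models' classifications are generated independently within a subject, net benefit is taken as ρ·HR_C(t) − (1 − ρ)·(t/(1 − t))·HR_N(t), a normal-approximation confidence limit is used, and the prevalence of 0.10 and the minimal level of 0.002 are hypothetical.

```python
import math
import random


def delta_nb_power(n, rho, hr_c, hr_n, t, minimal, n_sim=100, seed=0):
    """Estimate P[lower 95% confidence limit of delta-NB > minimal] by simulation.

    hr_c, hr_n: (baseline, expanded) anticipated proportions of cases and of
    controls classified as high risk at threshold t.
    """
    rng = random.Random(seed)
    odds = t / (1.0 - t)
    hits = 0
    for _ in range(n_sim):
        w = []
        for _ in range(n):
            if rng.random() < rho:  # subject is a case
                p1, p2, weight = hr_c[0], hr_c[1], 1.0
            else:                   # subject is a control
                p1, p2, weight = hr_n[0], hr_n[1], -odds
            h1 = rng.random() < p1  # high risk under the baseline model
            h2 = rng.random() < p2  # high risk under the expanded model
            w.append(weight * (h2 - h1))
        mean = sum(w) / n           # estimated delta-NB is a sample mean
        sd = math.sqrt(sum((wi - mean) ** 2 for wi in w) / (n - 1))
        if mean - 1.96 * sd / math.sqrt(n) > minimal:
            hits += 1
    return hits / n_sim


# Anticipated performance taken from Table 1; prevalence and minimal level are assumptions
power = {
    n: delta_nb_power(n, rho=0.10, hr_c=(0.652, 0.735), hr_n=(0.089, 0.084),
                      t=0.20, minimal=0.002)
    for n in (500, 5000)
}
```

In practice one would scan a grid of sample sizes and pick the smallest n whose estimated probability reaches 80%, using the actual estimation and bootstrap procedure in place of these shortcuts.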

3.17


Nested Case-Control Designs

• EDRN DCIS study

• (Xi, Di), i = 1, ..., n, for a cohort;

(Yi) for nC cases and nN controls chosen randomly from the

cohort.

• When ascertaining Y is costly, this can be cost efficient.

3.18

Nested Case-Control Designs

Estimation of performance improvement measures requires

estimating

1. ρ = P(D = 1)

2. r(X )

3. case and control distributions of r(X ):

P(r(X ) > t|D = 1) and P(r(X ) > t|D = 0)

4. r(X,Y)

5. the case and control distributions of r(X ,Y ) :

P(r(X ,Y ) > t|D = 1) and P(r(X ,Y ) > t|D = 0)

1., 2., and 3. are estimable from cohort data. 5. is estimable from

case-control data when r̂(X ,Y ) is available.

3.19


Nested Case-Control Designs

Estimation of r(X ,Y )

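• requires cohort and case-control data

$\operatorname{logit} P[D = 1 \mid X, Y] = \operatorname{logit} \hat\rho_{\text{cohort}} - \operatorname{logit} \hat\rho_{\text{case-control}} + \operatorname{logit} P[D = 1 \mid X, Y, \text{case-control}]$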
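The intercept adjustment above amounts to a shift on the logit scale. A minimal sketch, with the hypothetical function name `cohort_risk` and made-up prevalences:

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def cohort_risk(risk_cc, rho_cohort, rho_cc):
    """Convert a risk fitted in the case-control sample to the cohort scale.

    risk_cc:    P[D=1 | X, Y, case-control], e.g. from logistic regression
    rho_cohort: prevalence estimated from the cohort
    rho_cc:     proportion of cases in the case-control sample
    """
    return expit(logit(rho_cohort) - logit(rho_cc) + logit(risk_cc))


# A subject at "average" risk in a 1:1 case-control sample drawn from a cohort
# with 10% prevalence has a cohort-scale risk equal to that prevalence
example = cohort_risk(0.5, rho_cohort=0.10, rho_cc=0.50)
```

The shift leaves the ordering of subjects unchanged; it only restores calibration to the cohort prevalence.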

3.20

Nested Matched Case-Control Designs

• Select controls to match cases on g(X)

• Efficient for testing H0: r(X,Y) = r(X)

• Estimating r(X,Y) requires adjustment for matching:

$\operatorname{logit} r(X,Y) = \{\operatorname{logit} r(X) - \operatorname{logit} r_{cc}(X)\} + \operatorname{logit} r_{cc}(X,Y)$

where cc denotes the case-control sample.

• Estimating P(r(X,Y) > t|D) requires re-weighting the

distributions in case-control matching strata:

$P(r(X,Y) > t \mid D) = \sum_{g(X)} \hat{P}(r(X,Y) > t \mid D, g(X)) \, P(g(X) \mid D)$

• Work in progress. . . .
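The stratum re-weighting formula can be sketched as follows for one outcome group (cases or controls). The function name `reweighted_high_risk`, the stratum labels and the numbers are hypothetical; P(g(X)|D) is taken from the cohort, where X and D are available for everyone.

```python
def reweighted_high_risk(cc_risks, cc_strata, cohort_strata, t):
    """Estimate P(r(X,Y) > t | D) for one outcome group.

    cc_risks, cc_strata: risks r(X,Y) and matching strata g(X) for the sampled
                         case-control subjects in this outcome group
    cohort_strata:       matching strata for all cohort subjects in this group
    """
    total = len(cohort_strata)
    estimate = 0.0
    for g in set(cohort_strata):
        in_g = [r for r, s in zip(cc_risks, cc_strata) if s == g]
        if not in_g:
            raise ValueError(f"no sampled subjects in stratum {g!r}")
        p_high = sum(r > t for r in in_g) / len(in_g)  # P-hat(r > t | D, g)
        p_g = cohort_strata.count(g) / total           # P(g | D) from the cohort
        estimate += p_high * p_g
    return estimate


# Controls: matching over-sampled the "hi" stratum relative to the cohort
cc_risks = [0.1, 0.3, 0.5, 0.7]
cc_strata = ["lo", "lo", "hi", "hi"]
cohort_strata = ["lo"] * 8 + ["hi"] * 2
hr_n = reweighted_high_risk(cc_risks, cc_strata, cohort_strata, t=0.2)
naive = sum(r > 0.2 for r in cc_risks) / len(cc_risks)
```

Here the unweighted proportion `naive` overstates the fraction of controls classified as high risk because matching has enriched the sample for high-risk strata.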

3.21


Summary of Unit 3

(0) Fit complex models on a training set. Recalibrate on test set.

(1) Acknowledge variability in r̂ when calculating confidence

intervals or p-values regarding performance measures.

(2) Testing H0: no improvement, best achieved with standard

methods for testing if Y is a risk factor.

(3) Calculate CI for Δ performance measures and test hypotheses

concerning meaningful gain in performance.

(4) Methods can accommodate nested matched case-control

designs.

(5) Stata software on DABS website (incrisk, incroc and

predcurve programs).

3.22

Important Issues not Addressed in this Course

1. Event time outcomes with censoring

• Define D ≡ [event within T years].

• Concepts should remain the same.

• Estimation is complicated by censoring.

• See work by Y. Zheng, T. Cai and others.

3.23


Important Issues not Addressed in this Course

2. What if benefit and/or cost of treatment depend on (X ,Y )?

• We assumed B(X ,Y ) = B and C(X ,Y ) = C in calculating net

benefit.

• Otherwise risk thresholds depend on (X ,Y ).

• Analyses are more complicated but concepts of net benefit and

% cases and % controls classified as high risk are the same.

• A RCT will generally be needed to determine (X ,Y ) specific

decision rules and overall benefit of using them.

• Evaluating treatment selection markers is an active

field [28].

3.24

Important Issues not Addressed in this Course

3. Decision rules incorporating sampling variability in r̂

• e.g., treat if lower confidence limit of r̂ > t

• All methods apply to ll(r̂), the lower confidence limit of r̂, e.g., AUC, MRD, HR_C(t), HR_N(t),

RU(t), etc.

3.25


Software

DABS website

• Diagnostics And Biomarkers Statistical Center

• Google or Bing “DABS center” to locate the website

• datasets and software

• contributions are welcome

Stata programs

• incrisk

• incroc

• predcurve

3.26


Stata Code

Preliminary Help files for three commands: predcurve, incrisk, incroc.

To obtain code and current help files within Stata, type:

net from http://labs.fhcrc.org/pepe/stata

and install the risk prediction and pcvsuite packages.


Stata help file: predcurve

The body of this help file is not legible in the source. Its section headings are: Title; Syntax; Description; Options (Risk estimation, Case-control adjustment, Graph options, Bootstrap options, New variable options); Saved results; Examples; References; Authors.

Authors

Gary Longton, Fred Hutchinson Cancer Research Center, Seattle, WA. glongton@fhcrc.org

Margaret Pepe, Fred Hutchinson Cancer Research Center and University of Washington, Seattle, WA. mspepe@u.washington.edu


Stata help file: incrisk

The title line, syntax diagram and option summary table are not legible in the source. The option summary is grouped under the headings Models, Optional statistics, GLM alternatives, Risk classification, Cross-validation, Summary statistics, Sampling variability, and New variable, and notes that the two model options, each taking a varlist, are required.


Description

incrisk compares predicted risks from base and enhanced models for a disease outcome with respect to various performance improvement measures, including Net Reclassification Improvement (NRI), Integrated Discrimination Improvement (IDI), and changes in each of the Mean Risk Difference (MRD), the Lorenz measure, Above Average Risk Difference (AARD), standardized Total Gain (TG), and Area Under the ROC Curve (AUC).

If categorical risk classification is specified, additional measures include the categorical version of the Net Reclassification Improvement (NRI), Reclassification percent (RP), changes in the percent of cases and controls classified as high risk (HRc and HRn), Net Benefit (NB), and, optionally, the Cook reclassification calibration statistic.

Base model variables are specified with the first model option. The enhanced model additionally includes the variables specified with the second model option. The dependent variable is the 0/1 event or disease indicator variable.

All estimators are calculated empirically from model-based predicted values. The model-based predicted values used for classification and for calculation of the improvement measures are obtained by k-fold cross-validation unless specified otherwise with the nocv option.

Bootstrap standard errors and confidence intervals (CIs) for the requested statistics are calculated. Percentile CIs are displayed. Normal-based and bias-corrected CIs are available (see estat bootstrap and [R] bootstrap postestimation). Many of the standard e-class bootstrap results left behind by bootstrap are available after running incrisk, in addition to the r-class results listed below.

Options

Models

Two options, each taking a varlist, are required (the option names are not legible in the source). The first specifies the variables to be included in the base model. The second specifies additional variables to be included in the enhanced model.

Optional statistics

cook includes the Cook reclassification calibration statistic in the output. Entries for the mean model-based risk from the base and enhanced models are additionally included in the reclassification table. Valid only when either of the rcat() or categorical options is also specified.

GLM alternatives

glm(type) specifies alternate GLM model types for estimation of disease risk.

logit, the default, uses logistic regression models (GLM binomial family and logit link) and the resulting linear predictor for calculation of improvement measures and risk classification.

poisson uses Poisson regression models (GLM Poisson family and log link) and the resulting linear predictor.

logbin uses a GLM model with binomial family and log link instead of a logit link. Convergence problems are often encountered with this option.


Risk classification

rcat(numlist) specifies between 1 and 3 thresholds for risk category classification into 2-4 categories. Values must be between 0 and 1. If categorical is specified and rcat() is not, two categories with a single threshold at either the sample disease prevalence or, if specified, the population prevalence, rho(), are used.

categorical specifies that categorical risk classification tables and statistics are to be included when rcat() is not specified. Two categories are defined by a threshold equal to the disease prevalence.

rho(#) specifies the population disease prevalence to be used for default categorical risk classification, Total Gain statistic calculation, and model-based risk estimation. The value must be between 0 and 1. Required when case-control bootstrap sampling is specified with ccsamp. The sample disease prevalence is used by default.

Cross-validation

nsplit(#) specifies the number of random partitions of the data that are to be used to obtain predicted values via k-fold cross-validation. The default is nsplit(10).

nocv specifies that cross-validation is not used. Instead, within-sample predicted values are obtained from a single model fit to the full sample. nocv overrides any specification for nsplit(#).

Summary statistics

noauc omits calculation and report of the ROC area under the curve (AUC).

The Lorenz option (name not legible in the source) specifies the proportion p for calculation of the Lorenz measure, the proportion of cases with risk above the threshold exceeded by the proportion p of subjects in the population. The argument must be between 0 and 1. The default is 0.15.

The inverse Lorenz option (name not legible in the source) specifies the proportion p for calculation of the inverse Lorenz measure, the proportion of the population with risk above the threshold exceeded by the proportion p of cases. The argument must be between 0 and 1. The default is 0.70.

Sampling variability

nsamp(#) specifies the number of bootstrap samples to be drawn for estimation of standard errors and CIs. The default is nsamp(1000).

nobstrap omits bootstrap sampling and estimation of standard errors and CIs. If nsamp() is specified, nobstrap will override it.

ccsamp specifies that bootstrap samples be drawn separately from cases and controls rather than sampling from the combined sample; cohort sampling is the default.

cluster(varlist) specifies variables identifying bootstrap resampling clusters. See the cluster() option in [R] bootstrap.

resfile(filename) creates a Stata file (a .dta file) with the bootstrap results for the included statistics. bstat can be run on this file to view the bootstrap results again.

replace specifies that if the specified file already exists, then the existing file should be overwritten.


level(#) specifies the confidence level for CIs as a percentage. The default is level(95) or as set by set level.

New variable

genxb generates new variables xb1 [and xb2] to hold the cross-validation generated model predicted values from models specified in mod1(varlist) [and mod2(varlist)].

replace_xb requests that existing variables xb1 [and xb2] be overwritten by genxb.

Saved results

Examples

load simulated data

. use http://labs.fhcrc.org/pepe/dabs/risk_reclass_b

incrisk with continuous measures only, and with default settings otherwise

. incrisk d, x(x) y(y)

include categorical measures with two categories defined by the sample disease prevalence

. incrisk d, x(x) y(y) categorical

include categorical measures with 4 categories defined by specified thresholds

. incrisk d, x(x) y(y) rcat(.15 .25 .5) cook nsamp(100)

list returned results

. return list

save bootstrap results to a file and display the saved results

. incrisk d, x(x) y(y) rcat(.15 .25 .5) cook nocv resfile(result1) replace

. bstat using result1

References

Pepe MS. Slides for a short course, Current Methods for Evaluating Prediction Performance of Biomarkers and Tests, can be obtained at the FHCRC DABS site.

Authors

Gary Longton
Fred Hutchinson Cancer Research Center
Seattle, WA
glongton@fhcrc.org

Margaret Pepe
Fred Hutchinson Cancer Research Center and University of Washington
Seattle, WA
mspepe@u.washington.edu

Also See

Online: predcurve, incroc (if installed)


� � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � �� � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � ! � " # � � �

help incroc (version [illegible])

Title

incroc -- Incremental value of a marker relative to a list of existing predictors. Evaluation is with respect to receiver operating characteristic (ROC) summary statistics. ROC statistics are based on percentile value (PV) calculations.

Syntax

incroc disease_var [if] [in], mod1(varlist) [mod2(varlist) options]

options                description

Models
* mod1(varlist)        model 1 predictor variables
  mod2(varlist)        model 2 predictor variables

Cross-validation
  nsplit(k)            specify number of random partitions for k-fold cross-validation; default is nsplit(10)
  nocv                 omit cross-validation

Summary statistics
  auc                  area under ROC curve (AUC)
  pauc(f)              partial AUC (pAUC) for false positive rate range FPR < f
  roc(f)               ROC at specified FPR = f
  rocinv(t)            ROC^(-1) at specified true positive rate TPR = t

Standardization method
  pvcmeth(method)      specify PV calculation method; empirical, the default, or normal
  tiecorr              correction for ties

Covariate adjustment
  adjcov(varlist)      specify covariates to adjust for
  adjmodel(model)      specify model adjustment; stratified, the default, or linear

Sampling variability
  nsamp(#)             specify number of bootstrap samples; default is nsamp(1000)
  nobstrap             omit bootstrap sampling
  noccsamp             specify cohort rather than case-control bootstrap sampling
  nostsamp             draw samples without respect to covariate strata
  cluster(varlist)     specify variables identifying bootstrap resampling clusters
  resfile(filename)    save bootstrap results in filename
  replace              overwrite specified bootstrap results file if it already exists
  level(#)             specify confidence level; default is level(95)

New variable
  genxb                generate new variable(s), xb#, to hold predicted values for model #
  replace              overwrite existing xb# variables

* mod1(varlist) is required.

Description


� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �   � ¡ ¢ � � � � � � � � � �   � £ ¢ � � � �� � � � � ¤ � � � � � � � � � ¥   � £ ¦ §   � � � ¨ � � � � � � © ª « � ¬ � � � � � �   � � � ­ ª ® � � � � ¤ � � � � ¯ � � � � � �� � � � � � � � � � � � � ° � ± ² ³ ´ µ ¶ · ¸ ¹ ¦ º � � � ° � ± » ³ ´ µ ¶ · ¸ ¹ ¦ º ª ¥ � � � � � � � ¼ � � � � � � � � � � � � � � �¤ � � � � ¯ � � � � � � ¯ � � � � � � � � � � � � � � � � � � � � � � � � ¤ � � � � ¯ ¼ � � � � � � � � � � � � � � � � � � � � � � � � ¤ � � � �� � � � � � ½ � � � � � � � � ¤ � � � � � � � � � � � � � � � � � � � � � � � ¼ � � � � � � � � � ¤ � � � � ¯ � � � � � � � � � � � � � � � � ¯ ¼� � � � � � � � � � � � � � � � � � � � ½ � � ¤ � � � � ¯ � � ª ¾ ¸ ¹ ¿ µ ¹ ¿ À ´ µ ¶ � � � � � Á  à � � � � � � � � � � � � � � � �¤ � � � � ¯ � � ª� � � � � � � � � ¤ � � ¼ � � � � � � � � � � � � � � � � ¯ � � � � � � � � � � � � � � ° � ± ² ³ ´ µ ¶ · ¸ ¹ ¦ º � � � � � � � � � � � � � � �� � Ä � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ª

¥ � � � � � � � � � � � � � � � � ¤ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ¯ � � � � � � ¯ ¼ ½ Å � � � �� � � � � Å ¤ � � � � � � � � � ª� � � � � � � � � � � � � � � � � � � � � � � � � � � � � ¯ ¼ � � � � �   Æ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ¤ � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ¯ � � � � � � � � � � � � � � � � � � §   � � � � � � Ç � � � � � � � È � � � �� � �   � � � ­ ª� � � � � � � � � � ¤ � � � � � � � � É � � � � � � � � � � ¯ � � � � � � ¤ � � � � � � � � ¯ ¼ � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � § Ê � � � � � � �   � � � � Ë Á Á Ì Í Ê � � � � � � �   � � � � � � � � � � � ­ ªÎ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ¤ � � � § � Ï � ­ � � � � � � � � Ä � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ª   � � � � � � � � � � � � � � � � Å ¯ � � � � � � � �¯ � � � Å � � � � � � � � � � Ï � � � � � � � � � � ¼ � � § � � � � � � � � ¯ � � � � � � � � � � � Ð Ñ Ò Ó � � Ô Õ Ô � Ö × ­ ªØ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ¯ � � � � � � � � � ¯ � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � ¯ � � � � � � � � � � � � � � � � ª® � � ¼ � � � � � � � � � � � � � � Å � � � � � ¯ � � � � � � � � � � � � � � � � � � � ¯ � � � � � ¯ ¼ Ó Õ Ô Ö Ô � � � � ¤ � � � � ¯ � � � � � � �� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � Å � � � � � � � � � � � � � � � � � � ¯ � � � � ª

Ù × Ô � � � Õ

® � � � � �° � ± ² ³ ´ µ ¶ · ¸ ¹ ¦ º � � � � � � � � � � � � ¤ � � � � ¯ � � � � � ¯ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � ª ° � ± ² ³ º � � � � Ä � � � � � ª° � ± » ³ ´ µ ¶ · ¸ ¹ ¦ º � � � � � � � � � � � � ¤ � � � � ¯ � � � � � ¯ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� � � � � � � � � � � � � � � � ª

� � � � � Å ¤ � � � � � � � � �� Õ × Ú � Ô ³ Û º � � � � � � � � � � � � � � � ¯ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ¯ � � � � � � �� ¯ � � � � � � � � � � � � � ¤ � � � � � ¤ � � ½ Å � � � � � � � � � Å ¤ � � � � � � � � � ª ¥ � � � � � � � � � � � � Õ × Ú � Ô ³ ² Ü º ª¨ � � � � � ¼ � � � � Õ × Ú � Ô ³ Ý º � � � Ä � � ¤ � � � � � � � � � � � � � ¼ � � � � � � Þ ª� � � Þ � � � � � � � � � � � � � � � � � � Å ¤ � � � � � � � � � � � � � � � � � � ª Ï � � � � � � � � � � � � � � � � � � � � � � � � � � � �¤ � � � � � � � � � ¯ � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ª � � � Þ � ¤ � � � � � � � � ¼� � � � � � � � � � � � � � � � � Õ × Ú � Ô ³ Û º ª

¨ � � � � � ¼ � � � � � � � � � �Ö ß � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ª ¥ � � � � � � � � � � � � � � � � � � � � � � � � � ¼� � � � � � � � � � � � � � � � � � � � � � ª à

× Ö ß � ³ ¢ º � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �   � ¡ � � � � � � � � � ¢ ª ¥ � �� � � � � � � � � � � � ¯ � ¯ � � � � � � Á � � � à ª � � � � � � � � � � � � � � � � � � � � � � � � � � � � �   Æ� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � ¼ � � � � � � � � � �� � � � � � � � � � � � � � � � � � � � � �   Æ � � � � � � � � � � � � � � � � � � � � � � � ª


roc(f) compares predictors with respect to the ROC at the specified FPR = f. The argument must be between 0 and 1.

rocinv(t) compares predictors with respect to the inverse ROC, ROC^(-1)(t), at the specified TPR = t. The argument must be between 0 and 1.

Standardization method

pvcmeth(method) specifies how the PVs are to be calculated. method can be one of the following:

  empirical, the default, uses the empirical distribution of the test measure among controls (D=0) as the reference distribution for the calculation of case PVs. The PV for the case measure y_i is the proportion of control measures Y_D̄ < y_i.

  normal models the test measure among controls with a normal distribution. The PV for the case measure y_i is the standard normal cumulative distribution function of (y_i − mean)/sd, where the mean and the standard deviation are calculated by using the control sample.

tiecorr indicates that a correction for ties between case and control values is included in the empirical PV calculation. The correction is important only in calculating summary indices such as the AUC. The tie-corrected PV for a case with the model predicted value y_i is the proportion of control values Y_D̄ < y_i plus one half the proportion of control values Y_D̄ = y_i, where Y_D̄ denotes controls.

By default, the PV calculation includes only the first term, i.e., the proportion of control values Y_D̄ < y_i. This option applies only to the empirical PV calculation method.

Covariate adjustment

adjcov(varlist) specifies the variables to be included in the adjustment.

adjmodel(model) specifies how the covariate adjustment is to be done. model can be one of the following:

  stratified PVs are calculated separately for each stratum defined by varlist in adjcov(). This is the default if adjmodel() is not specified and adjcov() is. Each case-containing stratum must include at least two controls. Strata that do not include cases are excluded from calculations.

  linear fits a linear regression of the predictor distribution on the adjustment covariates among controls. Standardized residuals based on this fitted linear model are used in place of the predicted values for cases and controls.

Sampling variability

nsamp(#) specifies the number of bootstrap samples to be drawn for estimation of standard errors and CIs. The default is nsamp(1000).

nobstrap omits bootstrap sampling and estimation of standard errors and CIs. If nsamp() is specified, nobstrap will override it.

noccsamp specifies that bootstrap samples be drawn from the combined sample rather than sampling separately from cases and controls; case-control sampling is the default.

nostsamp draws bootstrap samples without respect to covariate strata. By default, samples are drawn from within covariate strata when stratified covariate adjustment is requested via the adjcov() and adjmodel() options.

cluster(varlist) specifies variables identifying bootstrap resampling clusters. See the cluster() option in [R] bootstrap.
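The empirical percentile value (PV) calculation described above, with and without the tie correction, can be sketched in Python as follows. This is an illustrative sketch only, not the incroc implementation; the function name `percentile_value` and its arguments are hypothetical.

```python
def percentile_value(y_case, controls, tiecorr=False):
    """Empirical PV of a case measure relative to the control sample.

    Hypothetical helper: the proportion of control values strictly below
    y_case, plus one half the proportion tied with it when tiecorr is True.
    """
    n = len(controls)
    below = sum(1 for y in controls if y < y_case)
    ties = sum(1 for y in controls if y == y_case)
    pv = below / n
    if tiecorr:
        pv += 0.5 * ties / n
    return pv


controls = [0.1, 0.2, 0.2, 0.4]
pv_plain = percentile_value(0.2, controls)                # 1 of 4 controls lies below 0.2
pv_tied = percentile_value(0.2, controls, tiecorr=True)   # adds half of the 2/4 tied controls
```

Averaging the tie-corrected PVs over all cases gives the empirical (Mann-Whitney) AUC, which is why the correction matters mainly for summary indices.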
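The difference between case-control and cohort bootstrap sampling can be illustrated with a short Python sketch. It is not the incroc code: the function names, the `ccsamp` flag, the seed, and the small data set are all assumptions made for illustration, and covariate strata and clusters are ignored.

```python
import random


def empirical_auc(cases, controls):
    """Mann-Whitney AUC with ties counted one half."""
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_se_auc(cases, controls, nsamp=200, ccsamp=True, seed=1):
    """Bootstrap standard error of the AUC (illustrative sketch).

    ccsamp=True resamples cases and controls separately; otherwise
    subjects are resampled from the combined (cohort) sample.
    """
    rng = random.Random(seed)
    combined = [(y, 1) for y in cases] + [(y, 0) for y in controls]
    stats = []
    while len(stats) < nsamp:
        if ccsamp:
            bc = rng.choices(cases, k=len(cases))
            bn = rng.choices(controls, k=len(controls))
        else:
            draw = rng.choices(combined, k=len(combined))
            bc = [y for y, d in draw if d == 1]
            bn = [y for y, d in draw if d == 0]
            if not bc or not bn:  # skip resamples with no cases or no controls
                continue
        stats.append(empirical_auc(bc, bn))
    mean = sum(stats) / nsamp
    return (sum((s - mean) ** 2 for s in stats) / (nsamp - 1)) ** 0.5


cases = [0.9, 0.8, 0.6, 0.55, 0.3]
controls = [0.7, 0.5, 0.4, 0.35, 0.2, 0.1]
auc = empirical_auc(cases, controls)
se_cc = bootstrap_se_auc(cases, controls, ccsamp=True)
se_cohort = bootstrap_se_auc(cases, controls, ccsamp=False)
```

Case-control resampling keeps the numbers of cases and controls fixed in every replicate, which mirrors a design in which those numbers were fixed by the investigator.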


resfile(filename) creates a Stata file (a .dta file) with the bootstrap results for the included statistics. bstat can be run on this file to view the bootstrap results again.

replace specifies that if the specified file already exists, then the existing file should be overwritten.

level(#) specifies the confidence level for CIs as a percentage. The default is level(95) or as set by set level.

New variable

genxb generates new variables xb1 [and xb2], to hold the cross-validation generated model predicted values from models specified in mod1(varlist) [and mod2(varlist)].

replace requests that existing variables xb1 [and xb2] be overwritten by genxb.

Saved results

incroc saves the following in r(), where stat is one or more of auc, pauc, roc, or rocinv, corresponding to the requested summary statistics:

Scalars
  r(stat1)           statistic estimate for first model predictor
  r(stat2)           statistic estimate for second model predictor
  r(statdelta)       estimated difference, stat2 − stat1
  r(se_stat1)        bootstrap standard-error estimate for first predictor statistic
  r(se_stat2)        bootstrap standard-error estimate for second predictor statistic
  r(se_statdelta)    bootstrap standard-error estimate for the difference, stat2 − stat1

Examples

Use Norton neonatal audiology dataset

. use http://labs.fhcrc.org/pepe/book/data/nnhs2, clear

Single model

. incroc [illegible] if ear==1, mod1(currage gender)

Clustered bootstrap sampling; observations for both ears

. incroc [illegible], mod1(currage gender) cluster(id) auc pauc([illegible]) roc([illegible])

Nested models

. incroc [illegible] if ear==1 & site[illegible], mod1(currage gender) mod2(currage gender y1) adjcov(site) adjmodel(strat) auc pauc([illegible]) roc([illegible])

Predictors obtained without cross-validation; save and plot predicted values

. incroc [illegible] if ear==1 & site[illegible], mod1(currage gender) mod2(currage gender y1) nocv adjcov(site) adjmodel(strat) genxb

. roccurve d xb1 xb2

References

Dodd L and Pepe MS. 2003. Partial AUC estimation and regression. Biometrics 59:614–623.

Huang Y, Pepe MS. Biomarker evaluation using the controls as a reference population. Biostatistics (in press).

Janes H, Pepe MS. 2008. Adjusting for covariates in studies of diagnostic, screening, or prognostic markers: an old concept in a new setting. American Journal of Epidemiology 168:89–97.


Janes H, Pepe MS. Adjusting for covariate effects on classification accuracy using the covariate-adjusted ROC curve. Biometrika (in press).

Pepe MS, Longton G. 2005. Standardizing markers to evaluate and compare their performances. Epidemiology 16(5):598–603.

Pepe MS. 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.

Authors

Gary Longton
Fred Hutchinson Cancer Research Center
Seattle, WA
glongton@fhcrc.org

Margaret Pepe
Fred Hutchinson Cancer Research Center and University of Washington
Seattle, WA
mspepe@u.washington.edu

Holly Janes
Fred Hutchinson Cancer Research Center
Seattle, WA
hjanes@fhcrc.org

Also See

Online: roccurve, comproc, rocreg (if installed)


References

1. Patel UD, Garg AX, Krumholz HM, Shlipak MG, Coca SG, Sint K, Thiessen-Philbrook H, Koyner JL, Swaminathan M, Passik CS, Parikh CR for the TRIBE-AKI Consortium. Pre-operative serum brain natriuretic peptide and risk of acute kidney injury after cardiac surgery. Circulation (submitted).

2. Seymour CW, Kahn JM, Cooke CR, Watkins TR, Heckbert SR, Rea TD. Prediction of critical illness during out-of-hospital emergency care. Journal of the American Medical Association 2010 304:747–754.

3. Kerlikowske K, Molinaro AM, Gauthier ML, Berman HK, Waldman F, Bennington J, Sanchez H, Jimenez C, Stewart K, Chew K, Ljung BM, Tlsty TD. Biomarker expression and risk of subsequent tumors after initial ductal carcinoma in situ diagnosis. Journal of the National Cancer Institute 2010 102:627–637.

4. Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statistics in Medicine 2008 27:157–172.

5. Pepe MS. Problems with risk reclassification methods for evaluating prediction models. American Journal of Epidemiology 2011 173:1327–1335.

6. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010 21:128–138.

7. Pepe MS, Feng Z, Huang Y, Longton G, Prentice R, Thompson IM, et al. Integrating the predictiveness of a marker with its performance as a classifier. American Journal of Epidemiology 2008 167:362–368.

8. Hosmer DW, Lemeshow S. Goodness of Fit Tests for the Multiple Logistic Regression Model. Communications in Statistics, Theory and Methods 1980 A9:1043–1069.

9. Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Habbema JD. Validation and updating of predictive logistic regression models: a study on sample size and shrinkage. Statistics in Medicine 2004 23:2567–2586.

10. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer 2009.

11. Gail MH. Value of adding single-nucleotide polymorphism genotypes to a breast cancer risk model. Journal of the National Cancer Institute 2009 101:959–963.

12. Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults. Executive Summary of The Third Report of The National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, And Treatment of High Blood Cholesterol In Adults (Adult Treatment Panel III). Journal of the American Medical Association 2001 285:2486–2497.

13. Baker SG. Putting risk prediction in perspective: Relative utility curves. Journal of the National Cancer Institute 2009 101:1538–1542.

14. Vickers AJ, Elkin EB. Decision Curve Analysis: A Novel Method for Evaluating Prediction Models. Medical Decision Making 2006 26:565–574.

15. Huang Y, Pepe MS, Feng Z. Evaluating the predictiveness of a continuous marker. Biometrics 2007 63:1181–1188.

16. Bura E, Gastwirth JL. The Binary Regression Quantile Plot: Assessing the Importance of Predictors in Binary Regression Visually. Biometrical Journal 2001 43:5–21.

17. Stern RH. Nocebo Responses to Antihypertensive Medications. Journal of Clinical Hypertension 2008 10:723–725.

18. Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics 2005 6:227–239.

19. Pepe MS, Feng Z, Gu JW. Comments on ’Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond’ by M. J. Pencina et al. Statistics in Medicine 2008 27:173–181.

20. Gu W, Pepe M. Measures to summarize and compare the predictive capacity of markers. International Journal of Biostatistics 2009 5:Article 27.

21. Pencina MJ, D’Agostino RB Sr, Steyerberg EW. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Statistics in Medicine 2011 30:11–21.

22. Pfeiffer RM, Gail MH. Two criteria for evaluating risk prediction models. Biometrics 2011 67:1057–1065.

23. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007 115:928–935.

24. Cook NR, Ridker PM. Advances in measuring the effect of individual predictors of cardiovascular risk: the role of reclassification measures. Annals of Internal Medicine 2009 150:795–802.

25. Pepe MS, Kerr KF, Longton GM, Wang Z. Testing for improvement in prediction model performance. UW Biostatistics Working Paper Series 2011, Working Paper 379. http://www.bepress.com/uwbiostat/paper379

26. Kerr KF, McClelland RL, Brown ER, Lumley T. Evaluating the incremental value of new biomarkers with integrated discrimination improvement. American Journal of Epidemiology 2011 174:364–374.

27. Vickers AJ, Cronin AM, Begg CB. One statistical test is sufficient for assessing new predictive markers. BMC Medical Research Methodology 2011 11:13.

28. Janes H, Pepe MS, Bossuyt PM, Barlow WE. Measuring the performance of markers for guiding treatment decisions. Annals of Internal Medicine 2011 154:253–259.

29. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. New England Journal of Medicine 1980 302:1109–1117.


Glossary

AARD = above average risk difference = % cases above ρ − % controls above ρ = standardized total gain statistic, where total gain is the area between the predictiveness curve and the horizontal useless test curve = Youden’s index at ρ = 0.5 × continuous NRI comparing r(X) to the null model without predictors = 0.5 × cat-NRI comparing to the null model and using two risk categories defined by threshold ρ.
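As a hedged illustration of the first form of the AARD definition, the following Python sketch computes % cases above ρ minus % controls above ρ, taking ρ to be the sample prevalence. The function name `aard` and the toy data are assumptions for illustration only.

```python
def aard(risks, outcomes):
    """Above average risk difference: HRC(rho) - HRN(rho)."""
    rho = sum(outcomes) / len(outcomes)  # sample prevalence, P(D = 1)
    cases = [r for r, d in zip(risks, outcomes) if d == 1]
    controls = [r for r, d in zip(risks, outcomes) if d == 0]
    hrc = sum(r > rho for r in cases) / len(cases)        # proportion of cases above rho
    hrn = sum(r > rho for r in controls) / len(controls)  # proportion of controls above rho
    return hrc - hrn


risks = [0.9, 0.7, 0.2, 0.6, 0.3, 0.1, 0.1, 0.3]
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
value = aard(risks, outcomes)  # rho = 3/8; 2 of 3 cases and 1 of 5 controls exceed it
```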

AUC = area under the ROC curve

B = expected benefit of treatment to a subject who would be a case in the absence of treatment

(C, N) = subscripts for cases (C) and controls (N)

Cat-NRI = cat-NRI(event) + cat-NRI(non-event), which is a number in (−2, 2)

Cat-NRI(event) = proportion of cases above the diagonal of a risk reclassification table − proportion of cases below the diagonal of the table.

Cat-NRI(non-event) = proportion of controls below the diagonal − proportion of controls above the diagonal of the non-event risk reclassification table
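The categorical NRI components can be sketched in Python as below. The sketch assumes that "above the diagonal" means moving to a higher risk category under the expanded model, and that a risk equal to a threshold falls in the higher category; the function names and toy data are hypothetical.

```python
from bisect import bisect_right


def risk_category(risk, thresholds):
    """Index of the risk category; thresholds must be sorted ascending."""
    return bisect_right(thresholds, risk)


def cat_nri(risk_old, risk_new, outcomes, thresholds):
    """Return (cat-NRI(event), cat-NRI(non-event), cat-NRI)."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for r0, r1, d in zip(risk_old, risk_new, outcomes):
        c0 = risk_category(r0, thresholds)
        c1 = risk_category(r1, thresholds)
        count[d] += 1
        up[d] += c1 > c0
        down[d] += c1 < c0
    nri_event = (up[1] - down[1]) / count[1]      # cases: net proportion moved up
    nri_nonevent = (down[0] - up[0]) / count[0]   # controls: net proportion moved down
    return nri_event, nri_nonevent, nri_event + nri_nonevent


risk_old = [0.10, 0.10, 0.30, 0.30, 0.30, 0.10, 0.10]
risk_new = [0.30, 0.30, 0.30, 0.10, 0.10, 0.30, 0.10]
outcomes = [1, 1, 1, 0, 0, 0, 0]
ev, ne, total = cat_nri(risk_old, risk_new, outcomes, thresholds=[0.2])
```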

Cont-NRI = cont-NRI(event) + cont-NRI(non-event)

Cont-NRI(event) = Prob(risk(X, Y) > risk(X) | case) − Prob(risk(X, Y) < risk(X) | case) = net proportion of cases whose risks are increased by including Y.

Cont-NRI(non-event) = Prob(risk(X, Y) < risk(X) | control) − Prob(risk(X, Y) > risk(X) | control) = net proportion of controls whose risks are decreased by including Y
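The continuous NRI can be computed directly from paired risk estimates, as in the following Python sketch. The function name `cont_nri` and the toy data are assumptions; subjects whose risk is unchanged contribute to neither term.

```python
def cont_nri(risk_x, risk_xy, outcomes):
    """Return (cont-NRI(event), cont-NRI(non-event), cont-NRI)."""
    def net_up(d):
        # Net proportion of subjects with outcome d whose risk increases
        pairs = [(rx, rxy) for rx, rxy, di in zip(risk_x, risk_xy, outcomes) if di == d]
        up = sum(rxy > rx for rx, rxy in pairs)
        down = sum(rxy < rx for rx, rxy in pairs)
        return (up - down) / len(pairs)

    event = net_up(1)       # cases: increases minus decreases
    nonevent = -net_up(0)   # controls: decreases minus increases
    return event, nonevent, event + nonevent


risk_x = [0.2, 0.4, 0.5, 0.3, 0.2, 0.6, 0.1]
risk_xy = [0.3, 0.6, 0.4, 0.2, 0.1, 0.7, 0.1]
outcomes = [1, 1, 1, 0, 0, 0, 0]
ev, ne, total = cont_nri(risk_x, risk_xy, outcomes)
```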

Cost = expected cost of treatment to a control (including toxicities, expense, etc.)

HRC = % cases whose risks are in the high risk category

HRC(t) = P(r(X) > t | D = 1), where t is a risk threshold

HRN = % controls whose risks are in the high risk category

HRN(t) = P(r(X) > t | D = 0)


IDI = integrated discrimination improvement = MRD for the larger model − MRD for the smaller model

L^(−1)(fc) = proportion of the population with risk above the threshold exceeded by fc of cases = (1 − ρ) × ROC^(−1)(fc) + ρ × fc

L(fp) = proportion of cases with risk above the threshold exceeded by fp subjects in the population = ROC(fn), where fn is defined by fp = ρ × ROC(fn) + (1 − ρ) × fn

MRD = mean risk (cases) − mean risk (controls) = IDI relative to a baseline model without any predictors = Yates slope = proportion of variation explained = R²
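The relationship between MRD and IDI can be checked numerically with a small Python sketch. The function names and toy data are hypothetical; the null model assigns every subject the prevalence, so its MRD is zero and the IDI of the larger model equals its MRD.

```python
from statistics import mean


def mrd(risks, outcomes):
    """Mean risk difference: mean risk in cases minus mean risk in controls."""
    cases = [r for r, d in zip(risks, outcomes) if d == 1]
    controls = [r for r, d in zip(risks, outcomes) if d == 0]
    return mean(cases) - mean(controls)


def idi(risk_small, risk_large, outcomes):
    """Integrated discrimination improvement: MRD(larger) - MRD(smaller)."""
    return mrd(risk_large, outcomes) - mrd(risk_small, outcomes)


outcomes = [1, 1, 0, 0]
risk_null = [0.5, 0.5, 0.5, 0.5]    # model without predictors: risk = prevalence
risk_model = [0.8, 0.6, 0.4, 0.2]
mrd_model = mrd(risk_model, outcomes)
idi_vs_null = idi(risk_null, risk_model, outcomes)
```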

n = total number of subjects in the cohort

nC = number of cases (events)

nN = number of controls (non-events)

NB = net benefit = B × P(D = 1) × HRC(t) − Cost × P(D = 0) × HRN(t) = ρ × HRC(t) − (1 − ρ) × (t/(1 − t)) × HRN(t), where t is the risk threshold, if treating a case has benefit B ≡ 1
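A minimal Python sketch of the net benefit formula, in units where B = 1 so that Cost/B = t/(1 − t), is given below. The function name `net_benefit` and the toy data are assumptions, and ρ is taken to be the sample prevalence.

```python
def net_benefit(risks, outcomes, t):
    """NB(t) = rho*HRC(t) - (1 - rho)*(t/(1 - t))*HRN(t), with B = 1."""
    n = len(outcomes)
    n_case = sum(outcomes)
    rho = n_case / n
    # Proportions of cases and of controls classified as high risk at threshold t
    hrc = sum(r > t for r, d in zip(risks, outcomes) if d == 1) / n_case
    hrn = sum(r > t for r, d in zip(risks, outcomes) if d == 0) / (n - n_case)
    return rho * hrc - (1 - rho) * (t / (1 - t)) * hrn


risks = [0.9, 0.7, 0.2, 0.6, 0.3, 0.1, 0.1, 0.3]
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
nb = net_benefit(risks, outcomes, t=0.2)
```

With these data ρ = 0.375, HRC(0.2) = 2/3 and HRN(0.2) = 3/5, so NB = 0.25 − 0.09375.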

partial-AUC(f0)/f0 = {area under the ROC curve over the false positive range (0, f0)}/f0 = average TPR over FPR ∈ (0, f0)
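One way to compute an empirical partial AUC is sketched below in Python. It relies on the identity that the integral of the empirical ROC over (0, f0) equals the average over cases of max(f0 − u_i, 0), where u_i is the proportion of controls at or above the case value; this "at or above" convention, the function name, and the data are assumptions for illustration.

```python
def partial_auc(cases, controls, f0):
    """Empirical partial AUC over FPR in (0, f0) (illustrative sketch)."""
    n = len(controls)
    total = 0.0
    for y in cases:
        u = sum(c >= y for c in controls) / n  # FPR at the threshold set by this case
        total += max(f0 - u, 0.0)
    return total / len(cases)


cases = [0.9, 0.8, 0.6, 0.55, 0.3]
controls = [0.7, 0.5, 0.4, 0.35, 0.2, 0.1]
f0 = 0.2
pauc = partial_auc(cases, controls, f0)
avg_tpr = pauc / f0  # average TPR over FPR in (0, f0)
```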

Predictiveness curve: a plot of the quantiles of risk(X ) in the population

ρ = P(D = 1)

RU(t) = relative utility = NB(t)/ρ = proportion of maximum possible utility (ρ is the maximum possible NB when no treatment is the default)

reclassification % = total % of subjects for whom the risk category based on r(X) is not equal to the risk category based on r(X, Y)

risk(X) = P(D = 1 | X) = frequency of events among subjects with covariate value X = r(X)

risk(X , Y ) = P(D = 1|X , Y ) = r(X , Y )

ROC^(−1)(fc) = proportion of controls with risks above the threshold exceeded by a proportion fc of cases

ROC(fn) = proportion of cases with risks above the threshold exceeded by a proportion fn of controls

Total Gain = area between the predictiveness curve and the horizontal line at ρ.

It can be standardized by dividing by 2ρ(1 − ρ). See AARD.
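Because the predictiveness curve is the quantile function of risk(X), the area between it and the horizontal line at ρ equals the mean absolute deviation of risk from ρ. The Python sketch below uses that fact; the function name and toy data are hypothetical, and ρ is taken to be the sample prevalence.

```python
def standardized_total_gain(risks, outcomes):
    """Total gain E|risk - rho| divided by 2*rho*(1 - rho)."""
    rho = sum(outcomes) / len(outcomes)
    total_gain = sum(abs(r - rho) for r in risks) / len(risks)
    return total_gain / (2 * rho * (1 - rho))


risks = [0.9, 0.7, 0.2, 0.6, 0.3, 0.1, 0.1, 0.3]
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
stg = standardized_total_gain(risks, outcomes)
```

The equality with AARD holds when the risks are well calibrated; these toy risks are not, so the two quantities differ here.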