Page 1

Diagnostic Testing & Predictive Models

John Kwagyan, PhD

Howard University College of Medicine

Design, Biostatistics & Population Studies

GHUCCTS

Page 2

"Physicians must be content to end not in certainties, but rather in statistical probabilities. The physician thus has a right to feel certain, within statistical constraints, but never cocksure. Absolute certainty remains for some theologians - and like-minded physicians."

Am J Cardiol 1975;36:592-62

Page 3

Objective

To understand the use of diagnostic measures and screening tools

Page 4

Outline

• Examples

• Why/What Diagnostic Testing

• Measures of Diagnostic Accuracy

• ROC Curves

• Adaptation of Diagnostic/Screening Tools

• Predictive Models

Page 5

EXAMPLES

Page 6

4P’s Plus Screening Instrument - Substance Abuse in Pregnant Women

What is a positive assessment?? (J. Perinatology, 2005)

Page 7

Index to Predict Relapse in Asthma

Factor                 Score 0          Score 1
Pulse                  <120             >=120
Respiration            <30              >=30
Pulsus Paradoxus       <18              >=18
Peak Flow Rate         >120             <=120
Dyspnea                Absent or mild   Moderate or severe
Accessory muscle use   Absent or mild   Moderate or severe
Wheezing               Absent or mild   Moderate or severe

Fischl et al., NEJM 1981

Positive Test => Score of 4 or more

Page 8

Study Design: A total of 228 pregnant women underwent screening. Reliability, sensitivity, specificity, and positive and negative predictive validity were assessed.

Result: Overall reliability for the five-item measure was 0.62. Seventy-four (32.5%) of the women had a positive screen. Sensitivity and specificity were very good, at 87% and 76%, respectively. Positive predictive validity was low (36%); negative predictive validity was quite high (97%).

Conclusion:  The 4P's Plus reliably and effectively screens pregnant women for risk of substance use, including those women typically missed by other perinatal screening methodologies.

Validation of the 4P's Plus© screen for substance use in pregnancy. J. Perinatology (2007)

Page 9

Maternal Biochemical Serum Screening for Down Syndrome in Pregnancy With HIV Infection

To estimate the influence of HIV infection and antiretroviral therapy on maternal serum marker levels and the false-positive rate with biochemical maternal serum screening for Down syndrome.

Obstetrics & Gynecology

Page 10

Inability to Predict Relapse in Acute Asthma - NEJM 1984;310(9)

• Fischl & Co. developed an index to predict relapse in patients with acute asthma.

• Based on data from ER patients in Miami, FL

• Reported 95% sensitivity and 97% specificity

• Dramatic drop in accuracy when externally validated on patients in Richmond, VA?????

Page 11

Other Examples

• SSAGA for alcohol dependence

• Genetic Screening for hereditary disease

• Etc

Page 12

Why Diagnostic Testing

• Accurate screening/diagnosis of a health condition is often a first step towards its prevention or control

• Need for fast, inexpensive and RELIABLE tools

Page 13

Purpose of Diagnostic Testing

• A (binary) diagnostic test is designed to determine whether a target condition is present or absent in a subject from the intended-use population.

• The target condition can refer to
- a particular disease
- a disease stage
- a health status or condition
that should prompt clinical action, such as the initiation, modification, or termination of treatment, counseling, etc.

Page 14

Test Scale

• Binary - presence or absence of disease (the underlying measure is often continuous)

• Continuous (quantitative)
- biomarkers for cancer (e.g., PSA) measured as serum concentration
- creatinine for kidney malfunction
- blood sugar for diabetes
- cholesterol for dyslipidemia

• Ordinal
- clinical symptoms: moderate, severe, highly severe
- index score: 0, 1, 2, 3, 4, 5

Page 15

Other Test Scales

• Likert-type rating - highly disagree, disagree, neutral, agree, highly agree

• Nominal - genotype groups - ApoE genotypes: E2/E2, E2/E3, E2/E4, E3/E3, E3/E4, E4/E4

Page 16

What is Diagnostic Testing?

• Evaluation of a (new) test to determine whether a target condition is present, by comparison with a benchmark!

• Evaluation of the ability of a test to classify subjects as diseased or disease-free

• For non-binary scales, a classification rule is set by a threshold
- PSA > 4.0 ng/ml
- Blood glucose > 126 mg/dL
- BP > 140/90 mmHg (used to be 160/95 mmHg; contemplating 130/85 mmHg???)

Page 17

………… …………

That we are in the midst of crisis is now well understood. Our nation is at war,…………. Our economy is badly weakened, ……….. Homes have been lost; jobs shed; businesses shuttered. Our health care is too costly; our schools fail too many;..

These are the indicators of crisis, subject to data and statistics.

………………………………

Pres. Barack Obama (Inaugural speech)

Page 18

Benchmarks for Comparison

1. comparison to a reference (Gold) standard

- considered to be the best available method for establishing the presence or absence of the target condition

- can be a single method, or a combination of methods, including clinical follow-up.

2. comparison to a non-reference standard

- method other than a reference (Gold) standard.

Note!!!: The choice of comparative method will determine which performance measures may be reported

Page 19

Some Conventional Tests

• Bacterial cultures
- strep throat, urinary tract infection, meningitis, etc.

• Imaging technology
- X-ray for bone fracture
- CT scans for brain injury
- MRI for brain injury

• Biochemical markers
- serum creatinine for kidney dysfunction
- serum bilirubin for liver dysfunction
- blood glucose for diabetes
- blood test for HIV

Page 20

Other Conventional Tests

• Expert judgment
- presence or absence of a heart murmur

• Response to Questionnaire !!!!
- substance abuse

• Expert Interview or Observation
- schizophrenia, bipolar disorder, major depression

• Radiologists' scoring of mammograms
- no cancer, benign, possible malignancy, malignancy

Page 21

Measures of Accuracy

Validation

Page 22

Validation

• It is the evaluation of the accuracy of the test

• Can only be established by comparing with the Gold Standard

• Validity is measured by sensitivity and specificity

The extent to which a test measures what it is supposed to measure!!!

Page 23

Measures for Accuracy

• Sensitivity => True Positive Rate (TPR)
• Specificity => True Negative Rate (TNR)

• False Negative Rate (FNR) = 1 - sensitivity
• False Positive Rate (FPR) = 1 - specificity

• Predictive values
- positive predictive value
- negative predictive value

• Diagnostic Likelihood Ratios
LR+ = TPR/FPR = SenS/(1-SpeC)
LR- = FNR/TNR = (1-SenS)/SpeC

• ROC Curves

Page 24

Sensitivity & Specificity

• Sensitivity is the ability of a test to correctly classify an individual as ‘diseased’.

Estimated as the proportion of subjects with the target condition in whom the test is positive

• Specificity is the ability of a test to correctly classify an individual as ‘disease-free’.

Estimated as the proportion of subjects without the target condition in whom the test is negative

Best illustrated using a 2 x 2 table!!!!!

Page 25

Diagnostic Testing

True Disease Status

Test Result   Diseased   Disease-free
Positive      No Error   Error I
Negative      Error II   No Error

N1 = total diseased; N2 = total disease-free

Page 26

Sensitivity & Specificity

True Disease Status

Test Result   Diseased         Disease-free
Positive      True-positive    False-positive
Negative      False-negative   True-negative

• Sensitivity ~ ability of a test to detect the disease when present => True-positive fraction
• Specificity ~ ability to indicate disease-free when absent => True-negative fraction

Page 27

Consequence of Diagnostic Errors

• False negative errors, i.e., missing disease that is present
- can result in people foregoing needed treatment for the disease
- the consequence can be as serious as death

• False positive errors, i.e., falsely indicating disease
- disease-free subjects are subjected to unnecessary work-up procedures or even treatment
- negative impacts include personal inconvenience and/or unnecessary stress, anxiety, etc.

Page 28

Estimating Sensitivity & Specificity

Estimates:

SenS = TP/T_D
SpeC = TN/T_Df
False Negative Rate = FN/T_D = 1 - SenS
False Positive Rate = FP/T_Df = 1 - SpeC

True Disease Status

Test       Diseased   Disease-free   Total
Positive   TP         FP             T+
Negative   FN         TN             T-
Total      T_D        T_Df           T_N

Page 29

Example: Coronary Artery Surgery Study

SenS = 815/1023 = 0.80, CI = [0.77, 0.82]    FNF = 208/1023 = 0.20, CI = [0.18, 0.23]
SpeC = 327/442 = 0.74, CI = [0.70, 0.78]     FPF = 115/442 = 0.26, CI = [0.22, 0.30]

Target condition: CAD. Gold Standard: Arteriography. Test: EST (exercise stress test)

EST        Diseased   Disease-free   Total
Positive   815        115            930
Negative   208        327            535
Total      1023       442            1465
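These estimates are easy to reproduce. Below is a minimal Python sketch (standard library only) that recomputes the fractions, the diagnostic likelihood ratios from the measures list above, and 95% confidence intervals from the four cell counts. The Wald (normal-approximation) interval is my assumption, since the slide does not state the interval method, though it reproduces the values shown.

```python
from math import sqrt

def wald_ci(k, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for the proportion k/n."""
    p = k / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

# Cell counts from the Coronary Artery Surgery Study table above
TP, FP = 815, 115    # exercise stress test vs. arteriography
FN, TN = 208, 327

n_d, n_df = TP + FN, FP + TN        # total diseased, total disease-free
sens, spec = TP / n_d, TN / n_df

lo, hi = wald_ci(TP, n_d)
print(f"SenS = {sens:.2f}, CI = [{lo:.2f}, {hi:.2f}]")
lo, hi = wald_ci(TN, n_df)
print(f"SpeC = {spec:.2f}, CI = [{lo:.2f}, {hi:.2f}]")
print(f"FNF = {1 - sens:.2f}, FPF = {1 - spec:.2f}")
print(f"LR+ = {sens / (1 - spec):.2f}, LR- = {(1 - sens) / spec:.2f}")
```

For these data LR+ comes out at about 3.1: a positive stress test is roughly three times as likely in a patient with CAD as in one without.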

Page 30

Absolute certainty remains for some theologians - and like-minded physicians!!!

Page 31

If Ever There Is a Perfect Test! (Ideal Case)

True Disease Status

Test       Diseased     Disease-free   Total
Positive   TP           FP = 0         T+
Negative   FN = 0       TN             T-
Total      T_D (=TP)    T_Df (=TN)     T_N

SenS = TP/T_D = 100%
SpeC = TN/T_Df = 100%
FNF = FN/T_D = 1 - SenS = 0%
FPF = FP/T_Df = 1 - SpeC = 0%

Page 32

Uninformative (Useless) Tests

• Test is uninformative if the test result is unrelated to disease status

• The probability distributions of the measure are the same in the diseased and disease-free populations

• For uninformative tests: SenS = 1 - SpeC, i.e., TPF = FPF

Ex: Exercise stress test to determine diabetes, HIV, etc.

• Test is informative if: SenS + SpeC > 1

Page 33

Clinical Application

Detection of Primary Angle Closure Glaucoma
Gold Standard = Gonioscopy

Test SenS (%) SpeC(%)

Intraocular Pressure 47 92

Torch Light Test 80 70

van Herick Test 61.9 89.3

Indian J Ophthalmology 2008;56:45-50

Which test should we use to screen for PACG??

Page 34

Sensitivity vs. Specificity: Rule Out & Rule In

• Tests with a high degree of sensitivity have a low FNR
- they ensure that not many true cases of the disease are missed

• A screening test, used to "rule out" a diagnosis, should have a high degree of sensitivity

• Tests with a high degree of specificity have a low FPR
- they ensure that not many patients are misdiagnosed

• A confirmatory test, used to "rule in" a diagnosis, should have a high degree of specificity

Page 35

Clinical Application

Detection of Primary Angle Closure Glaucoma
Gold Standard = Gonioscopy

Test SenS (%) SpeC(%)

Intraocular Pressure 47 92

Torch Light Test 80 70

van Herick Test 61.9 89.3

Which test should we use to screen for PACG?? How about combining the tests!!!!

Page 36

Combining Tests!

• 2 ways of performing a combination: in parallel, in series

• 2 rules for combination: "the OR rule", "the AND rule"

Page 37

Combining 2 Tests

• “OR rule”: Test is positive if either test is positive; negative if both are negative

SenS (Combo Test) = SenS1 + SenS2 - SenS1*SenS2
SpeC (Combo Test) = SpeC1*SpeC2

• “AND rule”: Test is positive only if both A and B are positive; negative if either is negative

SenS (Combo Test) = SenS1*SenS2
SpeC (Combo Test) = SpeC1 + SpeC2 - SpeC1*SpeC2
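A minimal sketch of these two rules in Python. Note that both formulas assume the two tests err independently given disease status; the torch light and van Herick values below are taken from the glaucoma slides, and the next page reports the same combinations (small rounding differences are expected).

```python
def or_rule(se1, sp1, se2, sp2):
    """Positive if either test is positive: sensitivity rises, specificity falls."""
    return se1 + se2 - se1 * se2, sp1 * sp2

def and_rule(se1, sp1, se2, sp2):
    """Positive only if both tests are positive: specificity rises, sensitivity falls."""
    return se1 * se2, sp1 + sp2 - sp1 * sp2

torch = (0.80, 0.70)         # (SenS, SpeC) of the torch light test
van_herick = (0.619, 0.893)  # (SenS, SpeC) of the van Herick test

print("OR rule:  SenS = %.3f, SpeC = %.3f" % or_rule(*torch, *van_herick))
print("AND rule: SenS = %.3f, SpeC = %.3f" % and_rule(*torch, *van_herick))
```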

Page 38

Clinical Application: Combining Tests

Detection of Primary Angle Closure Glaucoma
Gold Standard = Gonioscopy

Test                       SenS (%)   SpeC (%)
Torch Light Test           80         70
van Herick Test            62         89.3
Combined Test (OR rule)    92.4       62.3
Combined Test (AND rule)   49.6       97

Page 39

Combining Tests

• "the OR rule" increases SenS ~ useful for screening tests to “rule out” diagnosis

• “the AND rule” increases SpeC ~ useful for confirmatory tests to “rule in” diagnosis

Page 40

Predictive Values: Daring Clinical Questions

How likely is the disease, given the test result??

- what is the likelihood of disease when the test is positive?
- what is the likelihood of non-disease when the test is negative?

Answers to these questions are known as the predictive values

Page 41

Predictive Values

• +PV – fraction of test positives who are diseased

PPV = probability ( diseased | positive test)

• - PV – fraction of test negatives who are non-diseased

NPV = probability (disease-free | negative test)

Page 42

Predictive Values

Positive Predictive Value = TP/T+

Negative Predictive Value = TN/T-

True Disease Status

Test       Diseased   Disease-free   Total
Positive   TP         FP             T+
Negative   FN         TN             T-
Total      T_D        T_Df           T_N

Page 43

Predictive Values: Example

PPV = 815/930 = 87%
NPV = 327/535 = 61%

True Disease Status

Test       Diseased   Disease-free   Total
Positive   815        115            930
Negative   208        327            535
Total      1023       442            1465

Page 44

Predictive Values

• A perfect test will predict perfectly, i.e., PPV = 1, NPV = 1

• Predictive values depend on the prevalence of the disease
- PPV decreases with decreasing prevalence (a low PPV may simply be a result of low prevalence)
- NPV decreases with increasing prevalence

• Useless test if: PPV = prevalence and NPV = 1 - prevalence

• Predictive values are not used to quantify the inherent accuracy of the test
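The prevalence dependence follows from Bayes' theorem: PPV = SenS*p / (SenS*p + (1-SpeC)*(1-p)), where p is the prevalence, with the analogous expression for NPV. A minimal sketch, reusing the CASS sensitivity and specificity from the earlier example:

```python
def ppv(sens, spec, prev):
    """Bayes' theorem: P(diseased | positive test)."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    """Bayes' theorem: P(disease-free | negative test)."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

sens, spec = 815 / 1023, 327 / 442   # exercise stress test, CASS example

# At the study prevalence (1023/1465) this matches the directly estimated
# PPV (87%) and NPV (61%) up to rounding; at 5% prevalence the same test's
# PPV collapses to about 14% while its NPV rises to about 99%.
for prev in (1023 / 1465, 0.05):
    print(f"prev = {prev:.2f}: PPV = {ppv(sens, spec, prev):.2f}, "
          f"NPV = {npv(sens, spec, prev):.2f}")
```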

Page 45

Attributes of Measures

                                  Classification Probabilities        Predictive Values
Perfect Test                      SenS = 1, SpeC = 1                  PPV = 1, NPV = 1
Useless Test                      SenS = 1 - SpeC                     PPV = ρ, NPV = 1 - ρ
Context                           Accuracy                            Clinical prediction
Question addressed                To what degree does the test        How likely is the disease
                                  reflect the true disease state?     given the test result?
Affected by disease prevalence?   No                                  Yes

Page 46

Study Design: A total of 228 pregnant women underwent screening.

Result: Seventy-four (32.5%) had a positive screen = prevalence!!!!!

Sensitivity = 87% => missed 13% of the diseased
Specificity = 76% => incorrectly classified 24% of the disease-free as diseased

Positive predictive validity = 36% => fraction of those testing +ve who have the condition
Negative predictive validity = 97% => fraction of those testing -ve who are condition-free

Conclusion:  The 4P's Plus reliably and effectively screens pregnant women for risk of substance use, including those women typically missed by other perinatal screening methodologies.

Validation of the 4P's Plus© screen for substance use

Page 47

ROC Curves: Non-Binary Scales

Page 48

ROC Curves

• For evaluating tests that yield results on a non-binary scale, with classification set by a threshold
- BP > 140/90 mmHg for hypertension (used to be 160/95 mmHg; contemplating 130/85 mmHg)

• BP values fluctuate in any individual, healthy or diseased, so there will be some overlap of values between the diseased and disease-free populations

• The choice of a threshold depends on the trade-off that is acceptable between failing to detect disease and falsely identifying disease

• The ROC curve is a device that describes the range of trade-offs that can be achieved

Page 49

ROC Curves

• Plot of SenS against 1-SpeC for all possible thresholds

• It is a visual representation of the global performance of the test

• ROC plot shows the trade-off of sensitivity and specificity

• Test is useless (uninformative) if, for every threshold c, SenS = 1 - SpeC

Page 50

Construction of ROC Curves

Cutoff   SenS   SpeC   1-SpeC   Comments
0        100    0      100      Ideal case
110      98     20     80
120      95     40     60
130      92     60     40
140      78     80     20
150      55     90     10
160      40     92     8
500      0      100    0

Calculate SenS and (1-SpeC) for all possible cutoff points
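The points in this table trace the ROC curve, and the area under it (AUC) summarizes global performance: 0.5 corresponds to the uninformative diagonal (SenS = 1 - SpeC) and 1.0 to a perfect test. A minimal sketch computing the AUC from the table's (1-SpeC, SenS) pairs with the trapezoidal rule:

```python
# (1 - SpeC, SenS) pairs from the table above (in percent), sorted by FPR
points = [(0, 0), (8, 40), (10, 55), (20, 78),
          (40, 92), (60, 95), (80, 98), (100, 100)]

def auc_trapezoid(pts):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area / (100 * 100)    # rescale from percent to the unit square

print(f"AUC = {auc_trapezoid(points):.2f}")   # about 0.84 for these cutoffs
```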

Page 51

Page 52

Page 53

Attributes of ROC curves

• Provides complete description of potential performance of a test

• Facilitates comparing and combining information across studies of the same test

• Guides the choice of threshold in application

• Provides mechanism for relevant comparison between different tests

Page 54

Reporting of Estimates: Variability

• Point estimates of sensitivity, specificity, predictive values, are not sufficient.

• Confidence intervals reflect the uncertainty of the estimates, and should be reported.

• Focus is on confidence intervals to characterize the performance of the test and not on hypothesis testing

Page 55

Sources of Bias

Diagnostic tests are subject to an array of biases:

• Verification bias - the study selectively includes diseased subjects for verification

• Imperfect reference standard - error in the reference standard!

• Spectrum bias - subjects may not be fully representative of the population, i.e., important subgroups are missing

Page 56

Measures of Agreement

Reliability

Page 57

Reliability vs. Validity

• Sometimes the goal is to estimate the validity (accuracy) of ratings in the absence of a "gold standard."

• Other times one merely wants to know the consistency of ratings made by different raters.

• In some cases, the issue of accuracy may even have no meaning; for example, ratings may concern opinions, attitudes, or values.

Page 58

Measures of Agreement

• Reliability Coefficients
- positive percent agreement
- negative percent agreement

• Overall agreement
- overall % agreement

• Kappa - how much agreement is due to chance???

Page 59

Other Measures of Agreement

• B-Statistics

• McNemar Test

• Latent class models

• Bayesian methods

Page 60

Agreement

            Test 2
Test 1      Positive       Negative
Positive    AGREEMENT      DISAGREEMENT
Negative    DISAGREEMENT   AGREEMENT

Page 61

Raw Agreement

                Test 1
TEST 2     Positive   Negative   Total
Positive   40         1          41
Negative   1          512        513
Total      41         531        572

Positive % agreement = 40/41 = 97.6%
Negative % agreement = 512/531 = 96.4%
Overall % agreement = (40+512)/572 = 96.5%
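Kappa (introduced two slides back) corrects this raw agreement for the agreement expected by chance. A minimal sketch using the four cell counts printed above; note that the slide's marginal totals (531, 572) are slightly inconsistent with its cells, so these results differ a little from the percentages shown.

```python
def agreement_and_kappa(a, b, c, d):
    """a = both positive, d = both negative, b and c = disagreements."""
    n = a + b + c + d
    p_obs = (a + d) / n                    # overall raw agreement
    p_pos = (a + b) / n * (a + c) / n      # chance both tests say positive
    p_neg = (c + d) / n * (b + d) / n      # chance both tests say negative
    p_exp = p_pos + p_neg                  # total agreement expected by chance
    return p_obs, (p_obs - p_exp) / (1 - p_exp)

p_obs, kappa = agreement_and_kappa(40, 1, 1, 512)
print(f"overall agreement = {p_obs:.3f}, kappa = {kappa:.3f}")
```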

Page 62

Limitation of Agreement Measures

• Reliability is not proof of validity
- two tests can report the same readings, but both be wrong

• It does not tell how the disagreements occurred
- whether the positive and negative results are evenly distributed

• It does not tell the extent to which the agreement occurs by chance

Page 63

Generalization

Cultural Adaptation

Page 64

Generalization

• Ultimate question(s)!!!!!

- Does the test perform well for new, unseen patients?

- Does the test perform well in other populations?

Page 65

Inability to Predict Relapse in Acute Asthma - NEJM 1984;310(9)

• Fischl & Co. developed an index to predict relapse in patients with acute asthma.

• Based on data from ER patients in Miami, FL

• Reported 95% sensitivity and 97% specificity

• Dramatic drop in accuracy when externally validated on patients in Richmond, VA?????

Page 66

Inability to Predict Relapse in Acute Asthma

• Index to predict relapse in patients with acute asthma

• Based on data from 205 ER patients in Miami, FL

95% sensitivity

97% specificity

• Based on data from 114 ER patients seen in Richmond, VA

18.1% sensitivity??? 82.4% specificity

Centor RM et al. NEJM 1984;310(9):577-580.

Page 67

(Figure: FL and VA panels comparing the index's performance in the two samples.)

Centor RM et al. NEJM 1984;310(9):577-580.

Page 68

Generalization-Validation

• Internal validation - restricted to a single data set

- data splitting (or cross-validation)

• Temporal validation

- evaluation on a second data set from the same population.

• External validation

- evaluation on data from other populations, perhaps by different investigators.
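A minimal sketch of the first option, internal validation by data splitting (Python, standard library; the marker data are simulated for illustration). The threshold is chosen on a training half and sensitivity/specificity are then re-estimated on the held-out half, which guards against the optimism seen in the asthma-index example.

```python
import random

random.seed(1)

# Simulated continuous marker: diseased subjects (label 1) tend to score higher
data = [(random.gauss(1.0, 1.0), 1) for _ in range(200)] + \
       [(random.gauss(0.0, 1.0), 0) for _ in range(200)]
random.shuffle(data)
train, test = data[:200], data[200:]

def sens_spec(sample, c):
    """Sensitivity and specificity of the rule 'positive if marker > c'."""
    n_d = sum(1 for _, y in sample if y == 1)
    tp = sum(1 for x, y in sample if y == 1 and x > c)
    tn = sum(1 for x, y in sample if y == 0 and x <= c)
    return tp / n_d, tn / (len(sample) - n_d)

# Choose the cutoff maximizing Youden's index (SenS + SpeC - 1) on training data
best_c = max((x for x, _ in train),
             key=lambda c: sum(sens_spec(train, c)) - 1)

print("train:", sens_spec(train, best_c))   # optimistic (threshold tuned here)
print("test :", sens_spec(test, best_c))    # honest held-out estimate
```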

Page 69

Predictive Models

Page 70

Predictive Models

• A predictive model is a model for making predictions about future events

• It is usually built from a number of predictors and a response (or outcome) variable

Page 71

When Does a Diagnostic Test Work??

Does the diagnostic test add anything to what is already known?

• Example: A diagnostic test for macular degeneration would need to show that it is better than just using a person’s age.

Page 72

Covariate Modeling: Age and the Home Macular Perimeter (HMP)

Problem: If you sample from subjects with macular degeneration (MD) and those without, there is likely to be an age difference that could confound the assessment of HMP. This is a bias!!!!

• Question: Is HMP just a surrogate for age?

• Solution: Build a predictive model (a logistic model) using age and HMP (visual field functional defects) to predict the risk of MD, and see if HMP adds anything; see the sketch below.
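A minimal sketch of that comparison, assuming the statsmodels, numpy, and scipy libraries and simulated data (this is not the authors' actual analysis, and 'age' and 'hmp' are illustrative variable names): fit nested logistic models with and without HMP and compare them with a likelihood-ratio test.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 500
age = rng.normal(70, 8, n)     # years
hmp = rng.normal(0, 1, n)      # hypothetical HMP defect score
# Simulate MD status depending on both age and HMP
p = 1 / (1 + np.exp(-(-14 + 0.18 * age + 0.8 * hmp)))
y = (rng.random(n) < p).astype(float)

X_age  = sm.add_constant(np.column_stack([age]))        # reduced model: age only
X_full = sm.add_constant(np.column_stack([age, hmp]))   # full model: age + HMP

m_age  = sm.Logit(y, X_age).fit(disp=0)
m_full = sm.Logit(y, X_full).fit(disp=0)

# Likelihood-ratio test, 1 df: does HMP add anything beyond age?
lr = 2 * (m_full.llf - m_age.llf)
print(f"LR = {lr:.1f}, p = {chi2.sf(lr, df=1):.3g}")
```

If HMP were merely a surrogate for age, the likelihood-ratio statistic would be small and the p-value large.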

Page 73

Summary: Medical Applications

• Screening - triage => prioritization

• Diagnosis
- triage
- management and decision making
- test selection

• Prognosis => Prediction
- management and decision making
- informing patients and their families
- risk adjustment
- eligibility in clinical trials

Page 74

Thank You

????????

Page 75

References

1. Weinstein et al. Clinical Evaluation of Diagnostic Tests. AJR 2005.
2. Pepe MS (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press.
