
Revising FDA's "Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests"

FDA/Industry Statistics Workshop
September 28-29, 2006

Kristen Meier, Ph.D.
Mathematical Statistician, Division of Biostatistics
Office of Surveillance and Biometrics
Center for Devices and Radiological Health, FDA

Outline

• Background of guidance development
• Overview of comments
• STARD Initiative and definitions
• Choice of comparative benchmark and implications
• Agreement measures – pitfalls
• Bias
• Estimating performance without a "perfect" [reference] standard – latest research
• Reporting recommendations

Background

• Motivated by CDC concerns with IVDs for sexually transmitted diseases
• Joint meeting of four FDA device panels (2/11/98): Hematology/Pathology, Clinical Chemistry/Toxicology, Microbiology and Immunology
• Provide recommendations on "appropriate data collection, analysis, and resolution of discrepant results, using sound scientific and statistical analysis to support indications for use of in vitro diagnostic devices when the new device is compared to another device, a recognized reference method or 'gold standard,' or other procedures not commonly used, and/or clinical criteria for diagnosis"

Statistical Guidance Developed

"Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests: Draft Guidance for Industry and FDA Reviewers"

• issued Mar. 12, 2003 with a 90-day comment period
• http://www.fda.gov/cdrh/osb/guidance/1428.html
• for all diagnostic products, not just in vitro diagnostics
• only addresses diagnostic devices with 2 possible outcomes (positive/negative)
• does not address design and monitoring of clinical studies for diagnostic devices

Dichotomous Diagnostic Test Performance

Study population cross-classified against TRUTH:

                Truth+         Truth−
New Test +     TP (true +)    FP (false +)
New Test −     FN (false −)   TN (true −)

Estimate:
sensitivity (sens) = Pr(Test+ | Truth+), estimated by 100% × TP/(TP+FN)
specificity (spec) = Pr(Test− | Truth−), estimated by 100% × TN/(FP+TN)

"Perfect" test: sens = spec = 100% (FP = FN = 0)
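The estimators just defined are one-liners; a minimal sketch in Python (the counts in the usage line are made up for illustration):

```python
def sens_spec(tp, fp, fn, tn):
    """Estimate sensitivity and specificity (in %) from a 2x2 table vs. truth."""
    sens = 100.0 * tp / (tp + fn)  # Pr(Test+ | Truth+)
    spec = 100.0 * tn / (fp + tn)  # Pr(Test- | Truth-)
    return sens, spec

# A "perfect" test has FP = FN = 0, so both measures are 100%:
print(sens_spec(tp=50, fp=0, fn=0, tn=150))  # -> (100.0, 100.0)
```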

Example Data: 220 Subjects

                 TRUTH                        Imperfect Standard
                 +      −                      +      −
New Test +      44      1      New Test +    40      5
New Test −       7    168      New Test −     4    171
total           51    169      total         44    176

         Unbiased Estimates      Biased* Estimates
Sens     86.3% (44/51)           90.9% (40/44)
Spec     99.4% (168/169)         97.2% (171/176)

* Misclassification bias (see Begg 1987)
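The biased-vs-unbiased contrast on this slide can be reproduced directly from the cell counts:

```python
def pct(num, den):
    """Percentage, rounded to one decimal as on the slide."""
    return round(100.0 * num / den, 1)

# Versus TRUTH (unbiased estimates)
sens_true = pct(44, 44 + 7)     # 86.3
spec_true = pct(168, 1 + 168)   # 99.4

# Versus the imperfect standard (biased by misclassification)
sens_imp = pct(40, 40 + 4)      # 90.9
spec_imp = pct(171, 5 + 171)    # 97.2

print(sens_true, spec_true, sens_imp, spec_imp)  # -> 86.3 99.4 90.9 97.2
```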

Recalculation of Performance Using "Discrepant Resolution"

STAGE 1 – retest discordants using a "resolver" test
STAGE 2 – revise the 2x2* based on the resolver result

            Imperfect Standard                 Resolver/imperfect std.
             +            −                     "+"     "−"
New Test +  40            5 (5+, 0−)   New Test +  45       0
New Test −   4 (1+, 3−) 171            New Test −   1     174
total       44          176            total       46     174

"sens"   90.9% (40/44)      97.8% (45/46)
"spec"   97.2% (171/176)    100%  (174/174)

* assumes concordant = "correct"
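The inflation produced by discrepant resolution can be traced arithmetically: only the discordant cells are retested, so resolver calls can only move subjects toward agreement, never away from it. Using the counts on this slide:

```python
# Imperfect-standard table: 40, 5 (discordant), 4 (discordant), 171.
# Resolver calls the 5 New+/Std- subjects (5+, 0-) and the
# 4 New-/Std+ subjects (1+, 3-). Concordant cells are assumed
# "correct" and never retested -- the core flaw of the method.
tp = 40 + 5    # 45: every resolved "+" among New+ now counts as agreement
fn = 1         # the single resolved "+" among New- subjects
fp = 0
tn = 171 + 3   # 174

sens_like = round(100.0 * tp / (tp + fn), 1)  # 97.8
spec_like = round(100.0 * tn / (fp + tn), 1)  # 100.0
print(sens_like, spec_like)
```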

Topics for Guidance

Realization:
• Problems are much larger than "discrepant resolution"
• 2x2 is an oversimplification, but still useful to start

Provide guidance:
• What constitutes "truth"?
• What to do if we don't know truth?
• What name do we give performance measures when we don't have truth?
• Describing study design: how were subjects, specimens, measurements, labs collected/chosen?

Comments on Guidance

FDA received comments from 11 individuals/organizations:
• provide guidance on what constitutes "perfect standard"
  – remove "perfect/imperfect standard" concept and include and define "reference/non-reference standard" concept (STARD)
• reference and use STARD concepts
• provide approach for indeterminate, inconclusive, equivocal, etc. results
  – minimal recommendations
• discuss methods for estimating sens and spec when a perfect [reference] standard is not used
  – cite new literature
• include more discussion on bias, including verification bias
  – some discussion added, add more references
• add glossary

STARD Initiative

STAndards for Reporting of Diagnostic Accuracy Initiative

• effort by international working group to improve quality of reporting of studies of diagnostic accuracy
• checklist of 25 items to include when reporting results
• provides definitions for terminology
• http://www.consort-statement.org/stardstatement.htm

STARD Definitions Adopted

Purpose of a qualitative diagnostic test is to determine whether a target condition is present or absent in a subject from the intended use population

• Target condition (condition of interest) – can refer to a particular disease, a disease stage, health status, or any other identifiable condition within a patient, such as staging a disease already known to be present, or a health condition that should prompt clinical action, such as the initiation, modification, or termination of treatment
• Intended use population (target population) – those subjects/patients for whom the test is intended to be used

Reference Standard (STARD)

Move away from the notion of a fixed, theoretical "Truth"

• "considered to be the best available method for establishing the presence or absence of the target condition…it can be a single test or method, or a combination of methods and techniques, including clinical follow-up"
• dichotomous – divides the intended use population into condition present or absent
• does not consider the outcome of the new test under evaluation

Reference Standard (FDA)

What constitutes the "best available method"/reference method?
• opinion and practice within the medical, laboratory, and regulatory community
• several possible methods could be considered
• maybe no consensus reference standard exists
• maybe a reference standard exists, but for a non-negligible % of the intended use population, the reference standard is known to be in error

FDA ADVICE:
• consult with FDA on the choice of reference standard before beginning your study
• performance measures must be interpreted in context: report the reference standard along with performance measures

Benchmarks for Assessing Diagnostic Performance

NEW: FDA recognizes 2 major categories of benchmarks
• reference standard (STARD)
• non-reference standard (a method or predicate other than a reference standard; 510(k) regulations)

OLD: "perfect standard," "imperfect standard," and "gold standard" – concepts and terms deleted

Choice of comparative method determines which performance measures can be reported

Comparison with Benchmark

• If a reference standard is available: use it
• If a reference standard is available, but impractical: use it to the extent possible
• If a reference standard is not available or unacceptable for your situation: consider constructing one
• If a reference standard is not available and cannot be constructed: use a non-reference standard and report agreement

Naming Performance Measures: Depends on Benchmarks

Terminology is important – it helps ensure correct interpretation

Reference standard (STARD)
• a lot of literature on studies of diagnostic accuracy (Pepe 2003, Zhou et al. 2002)
• report sensitivity, specificity (and corresponding CIs), predictive values of positive and negative results

Non-reference standard (due to 510(k) regulations)
• report positive percent agreement and negative percent agreement
• NEW: include corresponding CIs (consider score CIs)
• interpret with care – many pitfalls!
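A score (Wilson) confidence interval is a standard choice for a proportion such as a percent-agreement estimate; a minimal sketch (the counts 40/44 are illustrative, not prescribed by the guidance):

```python
from math import sqrt

def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a proportion x/n (z=1.96 gives ~95%)."""
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

lo, hi = wilson_ci(40, 44)  # e.g., an agreement proportion of 40/44
print(round(lo, 3), round(hi, 3))  # -> 0.788 0.964
```

Unlike the Wald interval, the score interval behaves sensibly when the observed proportion is near 0 or 1, which is common for agreement measures.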

Agreement

Study Population:

            Non-Reference Standard
             +       −
New Test +   a       b
New Test −   c       d

Positive percent agreement (new/non-ref. std.) = 100% × a/(a+c)
Negative percent agreement (new/non-ref. std.) = 100% × d/(b+d)
[overall percent agreement = 100% × (a+d)/(a+b+c+d)]

Even a "perfect" new test can have PPA ≠ 100% and NPA ≠ 100%
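The agreement measures on this slide, in code (cell labels a–d as in the table; the example counts are the imperfect-standard column of the "Example Data: 220 Subjects" slide):

```python
def agreement(a, b, c, d):
    """PPA, NPA, and overall agreement (in %) vs. a non-reference standard.

    Note the asymmetry: PPA and NPA use the non-reference standard's
    column totals as denominators, so swapping the roles of the two
    tests changes the answers.
    """
    ppa = 100.0 * a / (a + c)
    npa = 100.0 * d / (b + d)
    overall = 100.0 * (a + d) / (a + b + c + d)
    return ppa, npa, overall

print(agreement(a=40, b=5, c=4, d=171))
```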

Pitfalls of Agreement

• agreement as defined here is not symmetric: the calculation differs depending on which marginal total you use for the denominator
• overall percent agreement is symmetric, but can be misleading (very different 2x2 data can give the same overall agreement)
• agreement ≠ "correct"
• overall agreement, PPA, and NPA can change (possibly a lot) depending on the prevalence (relative frequency of the target condition in the intended use population)

Overall Agreement Misleading

            Non-Ref Standard             Non-Ref Standard
             +       −                    +       −
New Test +  40       1      New Test +   40      19
New Test −  19     512      New Test −    1     512
total       59     513      total        41     531

overall agreement = 96.5% ((40+512)/572) for both tables

PPA = 67.8% (40/59)         PPA = 97.6% (40/41)
NPA = 99.8% (512/513)       NPA = 96.4% (512/531)
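The two tables above can be checked in a few lines: they give an identical overall agreement while PPA and NPA differ sharply.

```python
def measures(a, b, c, d):
    """(overall, PPA, NPA) in %, rounded to one decimal."""
    total = a + b + c + d
    return (round(100.0 * (a + d) / total, 1),  # overall agreement
            round(100.0 * a / (a + c), 1),      # positive percent agreement
            round(100.0 * d / (b + d), 1))      # negative percent agreement

left = measures(40, 1, 19, 512)    # (96.5, 67.8, 99.8)
right = measures(40, 19, 1, 512)   # (96.5, 97.6, 96.4)
assert left[0] == right[0]         # same overall agreement, different PPA/NPA
print(left, right)
```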

Agreement ≠ Correct

Original data:
            Non-Reference Standard
             +       −
New Test +  40       5
New Test −   4     171

Stratify the data above by Reference Standard outcome:

            Reference Std +               Reference Std −
            Non-Ref Std                   Non-Ref Std
             +       −                     +       −
New Test +  39       5      New Test +     1       0
New Test −   1       6      New Test −     3     165

The tests agree and are both wrong for 6 + 1 = 7 subjects

Bias

Unknown and non-quantified uncertainty

• Often the existence, size (magnitude), and direction of bias cannot be determined
• Increasing the overall number of subjects reduces statistical uncertainty (confidence interval widths) but may do nothing to reduce bias

Some Types of Bias

• error in the reference standard
• using the test under evaluation to establish the diagnosis
• spectrum bias – not choosing the "right" subjects
• verification bias – only a non-representative subset of subjects is evaluated by the reference standard, with no statistical adjustments made to the estimates
• many other types of bias

See Begg (1987), Pepe (2003), Zhou et al. (2002)
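When verification depends only on the new test's result (a missing-at-random assumption), verification bias can be corrected by re-weighting the verified subjects with the test-result frequencies from all subjects, the Begg and Greenes approach. A sketch under that assumption; every count below is hypothetical, purely for illustration:

```python
# Hypothetical study: 1000 subjects tested; all 200 Test+ are verified
# by the reference standard, but only 100 of the 800 Test- are.
n_pos, n_neg = 200, 800        # test-result totals (all subjects)
d_pos_vpos, v_pos = 180, 200   # diseased among verified Test+
d_pos_vneg, v_neg = 10, 100    # diseased among verified Test-

p_t = n_pos / (n_pos + n_neg)   # Pr(Test+), from all subjects
p_d_tp = d_pos_vpos / v_pos     # Pr(D+ | Test+), from verified subset
p_d_tn = d_pos_vneg / v_neg     # Pr(D+ | Test-), from verified subset

# Corrected sensitivity/specificity via Bayes' rule:
sens = p_d_tp * p_t / (p_d_tp * p_t + p_d_tn * (1 - p_t))
spec = ((1 - p_d_tn) * (1 - p_t)
        / ((1 - p_d_tn) * (1 - p_t) + (1 - p_d_tp) * p_t))

# The naive estimate from verified subjects only is biased upward:
naive_sens = d_pos_vpos / (d_pos_vpos + d_pos_vneg)  # 180/190
print(round(sens, 3), round(spec, 3), round(naive_sens, 3))
```

Here the corrected sensitivity is far below the naive one, because diseased Test− subjects are under-represented among the verified.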

Estimating Sens and Spec Without a Reference Standard

• Model-based approaches: latent class models and Bayesian models. See Pepe (2003) and Zhou et al. (2002)
• Albert and Dodd (2004)
  – an incorrect model leads to biased sens and spec estimates
  – different models can fit the data equally well, yet produce very different estimates of sens and spec
• FDA concerns & recommendations:
  – difficult to verify that the model and assumptions are correct
  – try a range of models and assumptions and report the range of results

Reference Standard Outcomes on a Subset

• Albert and Dodd (2006, under review)
  – use info from verified and non-verified subjects
  – choosing between competing models is easier
  – explore subset choice (random, test dependent)
• Albert (2006, under review)
  – estimation via imputation
  – study design implications (Albert, 2006)
• Kondratovich (2003; 2002-Mar-8 FDA Microbiology Devices Panel Meeting)
  – estimation via imputation

Practices to Avoid

• using the terms "sensitivity" and "specificity" if a reference standard is not used
• discarding equivocal results in data presentations and calculations
• using data altered or updated by discrepant resolution
• using the new test as part of the comparative benchmark

External Validity

A study has high external validity if the study results are sufficiently reflective of the "real world" performance of the device in the intended use population

External Validity

FDA recommends:
• include appropriate subjects and/or specimens
• use the final version of the device according to the final instructions for use
• use several of these devices in your study
• include multiple users with relevant training and a range of expertise
• cover a range of expected use and operating conditions

Reporting Recommendations

• CRITICAL – reports need sufficient detail to allow assessment of potential bias and external validity
• just as important as (if not more important than) computing CIs correctly
• see the guidance for specific recommendations

References

Albert, P. S. (2006). Imputation approaches for estimating diagnostic accuracy for multiple tests from partially verified designs. Technical Report 042, Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute (http://linus.nci.nih.gov/~brb/TechReport.htm).

Albert, P. S., & Dodd, L. E. (2004). A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics, 60, 427–435.

Albert, P. S., & Dodd, L. E. (2006). On estimating diagnostic accuracy with multiple raters and partial gold standard evaluation. Technical Report 041, Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute (http://linus.nci.nih.gov/~brb/TechReport.htm).

Begg, C. B. (1987). Biases in the assessment of diagnostic tests. Statistics in Medicine, 6, 411–423.

Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. P., Irwig, L. M., Lijmer, J. G., Moher, D., Rennie, D., & de Vet, H. C. W. (2003). Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD initiative. Clinical Chemistry, 49(1), 1–6. (Also appears in Annals of Internal Medicine (2003) 138(1), W1–12 and in British Medical Journal (2003) 329(7379), 41–44.)

References (continued)

Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., Gatsonis, C. A., Glasziou, P. P., Irwig, L. M., Moher, D., Rennie, D., de Vet, H. C. W., & Lijmer, J. G. (2003). The STARD statement for reporting studies of diagnostic accuracy: Explanation and elaboration. Clinical Chemistry, 49(1), 7–18. (Also appears in Annals of Internal Medicine (2003) 138(1), W1–12 and in British Medical Journal (2003) 329(7379), 41–44.)

Kondratovich, M. (2003). Verification bias in the evaluation of diagnostic devices. Proceedings of the 2003 Joint Statistical Meetings, Biopharmaceutical Section, San Francisco, CA.

Lang, T. A., & Secic, M. (1997). How to Report Statistics in Medicine. Philadelphia: American College of Physicians.

Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press.

Zhou, X. H., Obuchowski, N. A., & McClish, D. K. (2002). Statistical Methods in Diagnostic Medicine. New York: John Wiley & Sons.
