Confidential: For Review Only - BMJ · 2016-06-22 · Confidential: For Review Only Evaluation of Criteria for Obtaining Second Opinions to Improve Breast Histopathology Interpretation


Evaluation of Criteria for Obtaining Second Opinions to Improve Breast Histopathology Interpretation

Journal: BMJ

Manuscript ID BMJ.2015.030777.R1

Article Type: Research

BMJ Journal: BMJ

Date Submitted by the Author: 25-Mar-2016

Complete List of Authors:
Elmore, Joann; University of Washington School of Medicine, Medicine
Tosteson, Anna; Geisel School of Medicine at Dartmouth, Medicine and Dartmouth Institute for Health Policy and Clinical Practice
Pepe, Margaret; Fred Hutchinson Cancer Research Center, Program in Biostatistics and Biomathematics
Longton, Gary; Fred Hutchinson Cancer Research Center, Program in Biostatistics and Biomathematics
Nelson, Heidi; Providence Health and Services Oregon, Providence Cancer Center; Oregon Health & Science University, Medical Informatics and Clinical Epidemiology and Medicine
Geller, Berta; University of Vermont, Family Medicine and Radiology Departments
Carney, Patricia; Oregon Health & Science University, Family Medicine
Onega, Tracy; Geisel School of Medicine at Dartmouth, Community & Family Medicine
Allison, Kimberly; Stanford University School of Medicine, Pathology
Jackson, Sara; University of Washington, Medicine
Weaver, Donald; University of Vermont, Pathology and UVM Cancer Center

Keywords: breast pathology, second opinion, breast cancer, diagnostic variability, ductal carcinoma in situ, accuracy

https://mc.manuscriptcentral.com/bmj

BMJ



Copyright 2015 American Medical Association. All rights reserved.

Diagnostic Concordance Among Pathologists Interpreting Breast Biopsy Specimens

Joann G. Elmore, MD, MPH; Gary M. Longton, MS; Patricia A. Carney, PhD; Berta M. Geller, EdD; Tracy Onega, PhD; Anna N. A. Tosteson, ScD; Heidi D. Nelson, MD, MPH; Margaret S. Pepe, PhD; Kimberly H. Allison, MD; Stuart J. Schnitt, MD; Frances P. O’Malley, MB; Donald L. Weaver, MD

IMPORTANCE A breast pathology diagnosis provides the basis for clinical treatment and management decisions; however, its accuracy is inadequately understood.

OBJECTIVES To quantify the magnitude of diagnostic disagreement among pathologists compared with a consensus panel reference diagnosis and to evaluate associated patient and pathologist characteristics.

DESIGN, SETTING, AND PARTICIPANTS Study of pathologists who interpret breast biopsies in clinical practices in 8 US states.

EXPOSURES Participants independently interpreted slides between November 2011 and May 2014 from test sets of 60 breast biopsies (240 total cases, 1 slide per case), including 23 cases of invasive breast cancer, 73 ductal carcinoma in situ (DCIS), 72 with atypical hyperplasia (atypia), and 72 benign cases without atypia. Participants were blinded to the interpretations of other study pathologists and consensus panel members. Among the 3 consensus panel members, unanimous agreement of their independent diagnoses was 75%, and concordance with the consensus-derived reference diagnoses was 90.3%.

MAIN OUTCOMES AND MEASURES The proportions of diagnoses overinterpreted and underinterpreted relative to the consensus-derived reference diagnoses were assessed.

RESULTS Sixty-five percent of invited, responding pathologists were eligible and consented to participate. Of these, 91% (N = 115) completed the study, providing 6900 individual case diagnoses. Compared with the consensus-derived reference diagnosis, the overall concordance rate of diagnostic interpretations of participating pathologists was 75.3% (95% CI, 73.4%-77.0%; 5194 of 6900 interpretations).

Pathologist Interpretation vs Consensus-Derived Reference Diagnosis, % (95% CI)

Consensus Reference Diagnosis | No. of Interpretations | Overall Concordance Rate | Overinterpretation Rate | Underinterpretation Rate
Benign without atypia | 2070 | 87 (85-89) | 13 (11-15) | NA
Atypia | 2070 | 48 (44-52) | 17 (15-21) | 35 (31-39)
DCIS | 2097 | 84 (82-86) | 3 (2-4) | 13 (12-15)
Invasive carcinoma | 663 | 96 (94-97) | NA | 4 (3-6)

NA, not applicable.

Disagreement with the reference diagnosis was statistically significantly higher among biopsies from women with higher (n = 122) vs lower (n = 118) breast density on prior mammograms (overall concordance rate, 73% [95% CI, 71%-75%] for higher vs 77% [95% CI, 75%-80%] for lower, P < .001), and among pathologists who interpreted lower weekly case volumes (P < .001) or worked in smaller practices (P = .034) or nonacademic settings (P = .007).

CONCLUSIONS AND RELEVANCE In this study of pathologists, in which diagnostic interpretation was based on a single breast biopsy slide, overall agreement between the individual pathologists’ interpretations and the expert consensus–derived reference diagnoses was 75.3%, with the highest level of concordance for invasive carcinoma and lower levels of concordance for DCIS and atypia. Further research is needed to understand the relationship of these findings with patient management.

JAMA. 2015;313(11):1122-1132. doi:10.1001/jama.2015.1405

Editorial page 1109


Supplemental content at jama.com

Author Affiliations: Author affiliations are listed at the end of this article.

Corresponding Author: Joann G. Elmore, MD, MPH, Department of Medicine, University of Washington, 325 Ninth Ave, PO Box 359780, Seattle, WA 98104 ([email protected]).

Research

Original Investigation

1122 (Reprinted) jama.com


Downloaded From: http://jama.jamanetwork.com/ by a University of Washington Libraries User on 08/24/2015

Page 1 of 54




Approximately 1.6 million women in the United States have breast biopsies each year.1,2 The accuracy of pathologists’ diagnoses is an important and inadequately studied area. Although nearly one-quarter of biopsies demonstrate invasive breast cancer,3 the majority are categorized by pathologists according to a diagnostic spectrum ranging from benign to preinvasive disease. Breast lesions with atypia or ductal carcinoma in situ (DCIS) are associated with significantly higher risks of subsequent invasive carcinoma, and women with these findings may require additional surveillance, prevention, or treatment to reduce their risks.4 The incidence of atypical ductal hyperplasia (atypia) and DCIS breast lesions has increased over the past 3 decades as a result of widespread mammography screening.5,6 Misclassification of breast lesions may contribute to either overtreatment or undertreatment of lesions identified during breast screening.

The pathological diagnosis of a breast biopsy is usually considered the gold standard for patient management and research outcomes. However, a continuum of histologic features exists from benign to atypical to malignant on which diagnostic boundaries are imposed. Although criteria for these diagnostic categories are established,7,8 whether they are uniformly applied is unclear. Nonetheless, patients and their clinicians need a specific diagnostic classification of biopsy specimens to understand whether increased risk for breast cancer exists and how best to manage identified lesions. Although studies from the 1990s demonstrated challenges encountered by pathologists in agreeing on the diagnoses of atypia and DCIS,9-12 the extent to which these challenges persist is unclear. These issues are particularly important in the 21st century because millions of breast biopsies are performed annually.

For these reasons, we investigated the magnitude of overinterpretation and underinterpretation of breast biopsies among a national sample of practicing US pathologists in the Breast Pathology (B-Path) study. We also evaluated whether patient and pathologist characteristics were associated with a higher prevalence of inaccurate interpretations.

Methods

Human Research Participants Protection

The institutional review boards at Dartmouth College, Fred Hutchinson Cancer Research Center, Providence Health and Services Oregon, University of Vermont, and University of Washington approved all study activities. Informed consent was obtained electronically from pathologists. Informed consent was not required of the women whose biopsy specimens were included.

Test Set Development

Study methods and test set development have been described.13-15 Briefly, 240 breast biopsy specimens (excisional or core needle) were randomly identified from a cohort of 19 498 cases obtained from pathology registries in New Hampshire and Vermont that are affiliated with the Breast Cancer Surveillance Consortium.16 Random, stratified sampling was used to select cases based on the original pathologists’ diagnoses. Data on women’s age, breast density, and biopsy type were available for each case. One or 2 new slides from candidate cases were prepared in a single laboratory for consistency. A single slide for each case best representing the reference diagnosis in the opinion of the panel members was selected during the consensus review meetings.13

We oversampled cases with atypia and DCIS to gain statistical precision in estimates of interpretive concordance for these diagnoses. We also oversampled cases from women aged 40 to 49 years and women with mammographically dense breast tissue because age and breast density are important risk factors for both benign breast disease and breast cancer.17 We hypothesized that discordance would be higher for these biopsy cases and that discordance would be higher when pathologists reported cases as “borderline” between 2 diagnostic categories.

A panel of 3 experienced pathologists, internationally recognized for research and continuing medical education on diagnostic breast pathology, independently reviewed all 240 cases and recorded their rating of case difficulty and diagnoses using a Breast Pathology Assessment Tool and Hierarchy for Diagnosis form, which was designed and rigorously tested for this study (eFigure 1 in the Supplement).15 Panel members were blinded to previous interpretations of each specimen and to each other’s interpretations. Cases without unanimous independent agreement were resolved with consensus discussion. Four full-day in-person meetings were held following the panel members’ independent reviews to establish a consensus reference diagnosis for each case using a modified Delphi approach,18 to create case teaching points, and to discuss study design.

The 14 assessment terms were grouped into 4 diagnostic categories (eTable 1 in the Supplement). The categories and corresponding target distribution for the final sample of 240 cases were benign without atypia (30%, including 10% nonproliferative and 20% proliferative without atypia), atypia (30%), DCIS (30%), and invasive carcinoma (10%). The nonproliferative and proliferative without atypia cases were merged into 1 category (benign without atypia) because clinical management usually does not differ between the 2 categories. When pathologists noted multiple diagnoses on a case, the most severe diagnostic category was assigned.
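The "most severe category wins" rule can be sketched as an ordered enumeration plus a maximum. This is a minimal illustration under stated assumptions, not the study's software: the term strings in the hypothetical `TERM_TO_CATEGORY` dictionary are stand-ins, because the actual 14-term mapping appears only in eTable 1 in the Supplement.

```python
from enum import IntEnum


class Category(IntEnum):
    """The 4 diagnostic categories, ordered from least to most severe."""
    BENIGN_WITHOUT_ATYPIA = 1
    ATYPIA = 2
    DCIS = 3
    INVASIVE_CARCINOMA = 4


# Hypothetical subset of assessment terms; the study's full 14-term
# mapping is given in eTable 1 in the Supplement and is not reproduced here.
TERM_TO_CATEGORY = {
    "nonproliferative": Category.BENIGN_WITHOUT_ATYPIA,
    "proliferative without atypia": Category.BENIGN_WITHOUT_ATYPIA,
    "atypical ductal hyperplasia": Category.ATYPIA,
    "ductal carcinoma in situ": Category.DCIS,
    "invasive carcinoma": Category.INVASIVE_CARCINOMA,
}


def case_category(terms):
    """Return the most severe category among all terms recorded for one case."""
    if not terms:
        raise ValueError("at least one diagnostic term is required")
    return max(TERM_TO_CATEGORY[term] for term in terms)


print(case_category(["proliferative without atypia", "atypical ductal hyperplasia"]).name)  # ATYPIA
```

Encoding the categories as ordered integers also reduces the later over- and underinterpretation comparisons to a simple greater-than or less-than test.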

The 3 reference pathologists agreed unanimously on the diagnosis for 75% (180 of 240) of the cases after the initial independent evaluation. Compared with the final consensus-derived reference diagnoses, overall concordance of the initial independent diagnoses of the expert panel members was 90.3% (650 of 720 interpretations; Figure 1). Concordance and rates of overinterpretation and underinterpretation of initial diagnoses by the panel members compared with consensus-derived reference diagnoses are presented in Table 1.
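The panel concordance quoted above, and the point estimates in Table 1, can be recomputed from the Figure 1 cross-tabulation. The sketch below pools the 3 panelists' interpretations, so it yields single rates rather than the per-pathologist ranges shown in Table 1; the helper names are illustrative only.

```python
# Figure 1 counts. Rows: consensus-derived reference diagnosis.
# Columns: panel members' preconsensus diagnoses. Both axes are ordered
# benign without atypia, atypia, DCIS, invasive carcinoma.
LABELS = ["Benign without atypia", "Atypia", "DCIS", "Invasive carcinoma"]
FIGURE_1 = [
    [197, 15, 3, 1],
    [18, 173, 25, 0],
    [2, 2, 213, 2],
    [0, 0, 2, 67],
]


def row_rates(matrix):
    """For each reference category, return (n, over %, under %, concordance %).

    Cells right of the diagonal are calls more severe than the reference
    (overinterpretation); cells left of it are less severe (underinterpretation).
    """
    rates = []
    for i, row in enumerate(matrix):
        n = sum(row)
        over = sum(row[i + 1:])
        under = sum(row[:i])
        rates.append((n, 100 * over / n, 100 * under / n, 100 * row[i] / n))
    return rates


def overall_concordance(matrix):
    """Return (concordant, total, percent concordant) for a square matrix."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree, total, 100 * agree / total


for label, (n, over, under, conc) in zip(LABELS, row_rates(FIGURE_1)):
    print(f"{label:<22} n={n:>3}  over={over:4.1f}%  under={under:4.1f}%  concordant={conc:5.1f}%")
print(overall_concordance(FIGURE_1))  # 650 of 720, about 90.3%
```

Rounded to whole percentages, the pooled rates match the Table 1 point estimates; for example, 173 of 216 panel interpretations of atypia reference cases (80%) were concordant.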

The 240 cases were randomly assigned to 1 of 4 test sets, each including 60 cases, with randomization stratified on the woman’s age, breast density, reference diagnosis, and the experts’ difficulty rating of the case.
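The text does not describe the randomization procedure in code form. As one possible approach, the sketch below stratifies cases on the 4 factors named above, shuffles within each stratum, and deals the concatenated list round-robin into 4 sets. This keeps each stratum spread as evenly as possible and gives equal set sizes whenever the case count is a multiple of 4. The case attributes, the `assign_test_sets` function, and the synthetic demo data are all hypothetical.

```python
import random
from collections import defaultdict


def assign_test_sets(cases, n_sets=4, seed=None):
    """Stratified random assignment of cases to equally sized test sets.

    Each case is a dict with hypothetical keys 'id', 'age_group', 'density',
    'diagnosis', and 'difficulty'. Cases are grouped by stratum, shuffled
    within each stratum, concatenated, and dealt round-robin, so each
    stratum's cases are split as evenly as possible across the sets.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        key = (case["age_group"], case["density"], case["diagnosis"], case["difficulty"])
        strata[key].append(case)

    ordered = []
    for key in sorted(strata):          # fixed stratum order for reproducibility
        members = list(strata[key])
        rng.shuffle(members)            # random order within the stratum
        ordered.extend(members)

    sets = defaultdict(list)
    for i, case in enumerate(ordered):
        sets[i % n_sets].append(case["id"])
    return dict(sets)


# Synthetic demo: 240 cases with randomly drawn (hypothetical) attributes.
demo_rng = random.Random(0)
demo_cases = [
    {
        "id": i,
        "age_group": demo_rng.choice(["40-49", "50+"]),
        "density": demo_rng.choice(["low", "high"]),
        "diagnosis": demo_rng.choice(["benign", "atypia", "DCIS", "invasive"]),
        "difficulty": demo_rng.choice(["low", "high"]),
    }
    for i in range(240)
]
test_sets = assign_test_sets(demo_cases, seed=1)
print({k: len(v) for k, v in sorted(test_sets.items())})  # {0: 60, 1: 60, 2: 60, 3: 60}
```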

Diagnostic Concordance in Interpreting Breast Biopsies Original Investigation Research

jama.com (Reprinted) JAMA March 17, 2015 Volume 313, Number 11 1123





Pathologist Identification, Recruitment, and Baseline Characteristics

We used publicly available information from 8 US states (Alaska, Maine, Minnesota, New Hampshire, New Mexico, Oregon, Vermont, and Washington) to invite pathologists to participate in this study (Figure 2). Pathologists interpreting breast specimens for at least 1 year with plans to continue for at least 1 additional year were eligible. Residents and fellows were ineligible.

Selected pathologists were sent an email invitation and, if needed, contacted with 2 follow-up emails, mailed invitations, and telephone follow-up. Participants completed a web-based questionnaire that assessed their demographic and clinical practice characteristics, and attitudes about breast pathology interpretation (eFigure 2 in the Supplement). The questionnaire was developed and pilot tested using cognitive interviewing techniques.19 To compare clinical and demographic characteristics between participants and nonparticipants, information was obtained on the entire population of invited pathologists from Direct Medical Data.20

Test Set Implementation

Participants interpreted the same slides as the reference panel members. Participants were randomized with stratification on clinical expertise to ensure equal distribution among the 4 test sets. Clinical expertise was defined as breast pathology fellowship completion, self-assessed perception that peers considered them a breast pathology expert, or both. Participants independently reviewed the 60-case test set in random order. No standardized diagnostic definitions were provided. Participants were asked to interpret the cases as they would in their own clinical practice and complete the diagnostic assessment form online for each case (eFigure 1 in the Supplement).

Participants were provided 1 hematoxylin and eosin–stained slide per case and told the woman’s age and type of biopsy. They were not limited by interpretation time. As compensation for their effort, pathologists were offered free category 1 continuing medical education (CME) credits for the slide reviews and an educational program that compared their interpretations with both the consensus-derived reference diagnosis and the other participants’ diagnoses. At the completion of the CME, participants were asked questions regarding how the test cases compared with cases they typically see in their practice.

Table 1. Rates of Overinterpretation, Underinterpretation, and Concordance for the Reference Pathologists’ Independent Preconsensus Interpretations vs the Consensus-Derived Reference Diagnosis(a)

Values are rate, % (range)(b), vs the consensus diagnosis.

Consensus Reference Diagnosis | Total, No. | Overinterpretation | Underinterpretation | Overall Concordance
Benign without atypia | 72 | 9 (3-13) | NA | 91 (87-97)
Atypia | 72 | 12 (7-17) | 8 (1-15) | 80 (75-87)
DCIS | 73 | 1 (0-1) | 2 (0-4) | 97 (95-100)
Invasive carcinoma | 23 | NA | 3 (0-4) | 97 (96-100)

Abbreviations: DCIS, ductal carcinoma in situ; NA, not applicable.
(a) Three reference pathologists, 240 breast biopsy cases.
(b) Range values shown are the minimum and maximum of pathologist-level rates for the 3 consensus panel reference pathologists.

Figure 1. Comparison of the 3 Reference Panel Members’ Independent Preconsensus Diagnoses vs the Consensus-Derived Reference Diagnosis for 240 Breast Biopsy Cases(a)

Rows: consensus-derived reference diagnosis. Columns: reference panel members’ individual diagnoses (preconsensus).

Consensus Reference Diagnosis | Benign without atypia | Atypia | DCIS | Invasive carcinoma | Total
Benign without atypia | 197 | 15 | 3 | 1(b) | 216
Atypia | 18 | 173 | 25 | 0 | 216
DCIS | 2 | 2 | 213 | 2(c) | 219
Invasive carcinoma | 0 | 0 | 2(d) | 67 | 69
Total | 217 | 190 | 243 | 70 | 720

DCIS indicates ductal carcinoma in situ.
(a) Concordance noted for 650 of 720 diagnoses, or 90.3%.
(b) The differential diagnosis was radial scar vs focal invasion (with a consensus-derived reference diagnosis of radial scar).
(c) The differential diagnosis was focal microinvasion vs DCIS (with a consensus-derived reference diagnosis of DCIS).
(d) The differential diagnosis was DCIS vs focal microinvasion (with a consensus-derived reference diagnosis of microinvasion).

Statistical Analysis

Primary outcome measures included rates of overinterpretation, underinterpretation, and overall concordance. Overinterpretation was defined as cases classified by the participants at a higher diagnostic category relative to the consensus-derived reference diagnosis; underinterpretation was defined as cases classified lower than the consensus-derived reference diagnosis; concordant cases were those in which the diagnostic category of participants and reference panel were in agreement. Confidence intervals accounted for both within- and between-participant variability by employing variance estimates of the form

$$\frac{1}{n_p}\left[\operatorname{var}(\mathrm{rate}_p) + \frac{\operatorname{avg}(\mathrm{rate}_p)\,\bigl(1-\operatorname{avg}(\mathrm{rate}_p)\bigr)}{n_c}\right],$$

where $\operatorname{avg}(\mathrm{rate}_p)$ is the average rate among pathologists, $\operatorname{var}(\mathrm{rate}_p)$ is the sample variance of rates among pathologists, $n_c$ is the number of cases interpreted by each pathologist, and $n_p$ is the number of pathologists. We also investigated variability across participants and cases by examining distributions of participant and case-specific rates.
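The variance expression adds the between-pathologist variance of the per-pathologist rates to a within-pathologist binomial term evaluated at the average rate, then divides by the number of pathologists. A minimal Python sketch follows. The symmetric normal-approximation interval (estimate ± 1.96 SE) is an assumption, because the text does not say how the interval was formed from this variance; the function name and input rates are invented for illustration.

```python
import math
from statistics import mean, variance


def rate_with_ci(rates, n_cases, z=1.96):
    """Average per-pathologist rate and an approximate 95% CI.

    rates:   one proportion per pathologist (e.g., that pathologist's concordance rate)
    n_cases: number of cases interpreted by each pathologist (n_c)
    Variance of the average: {var(rate_p) + avg(rate_p)(1 - avg(rate_p)) / n_c} / n_p
    """
    n_p = len(rates)
    avg = mean(rates)
    between = variance(rates)              # sample variance, n_p - 1 denominator
    within = avg * (1 - avg) / n_cases     # binomial variance of one pathologist's rate at the average rate
    se = math.sqrt((between + within) / n_p)
    return avg, avg - z * se, avg + z * se


# Hypothetical concordance rates for 4 pathologists who each read 60 cases.
estimate, lower, upper = rate_with_ci([0.70, 0.75, 0.80, 0.75], n_cases=60)
print(f"{estimate:.3f} ({lower:.3f}-{upper:.3f})")
```

Because the within-pathologist term is divided by $n_c$ but the between-pathologist term is not, adding more pathologists narrows the interval more reliably than adding more cases per pathologist when rates vary widely across pathologists.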

We investigated the extent to which experience of the participant and specific patient characteristics (age, breast density, and biopsy type) were associated with concordance. Logistic regression models of participant misclassification that simultaneously incorporated several pathologist characteristics (academic affiliation, breast-specific caseload, clinical expertise, and practice size) were fit, and coefficients were tested with a bootstrap technique that resampled participant data.
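The bootstrap described above resampled participants rather than individual interpretations, so each pathologist's 60 correlated readings stay together in every resample. The study applied this to logistic regression coefficients; the simplified sketch below shows only the resampling step, applied to a plain difference in pooled concordance between 2 groups of pathologists. The function name, group sizes, and concordance probabilities are hypothetical and the data are synthetic.

```python
import random


def cluster_bootstrap_diff(group_a, group_b, n_boot=2000, seed=0):
    """Percentile bootstrap for a difference in pooled concordance between two
    groups of pathologists, resampling whole pathologists with replacement.

    group_a, group_b: lists of pathologists, each a list of 0/1 concordance flags.
    Returns (observed difference, 2.5th percentile, 97.5th percentile).
    """
    rng = random.Random(seed)

    def pooled(group):
        flags = [flag for pathologist in group for flag in pathologist]
        return sum(flags) / len(flags)

    observed = pooled(group_a) - pooled(group_b)
    diffs = sorted(
        pooled([rng.choice(group_a) for _ in group_a])
        - pooled([rng.choice(group_b) for _ in group_b])
        for _ in range(n_boot)
    )
    return observed, diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Synthetic data: 28 academic-affiliated and 87 other pathologists, 60 cases each,
# with made-up underlying concordance probabilities.
demo = random.Random(42)
academic = [[int(demo.random() < 0.80) for _ in range(60)] for _ in range(28)]
other = [[int(demo.random() < 0.74) for _ in range(60)] for _ in range(87)]
print(cluster_bootstrap_diff(academic, other, n_boot=500))
```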

Sensitivity analyses were performed to determine if the results were altered by use of a different diagnostic mapping scheme or by use of an alternate reference standard diagnosis instead of the expert-derived standard. First, we reanalyzed the data using an alternative diagnostic mapping strategy shown in eTable 1 in the Supplement. Second, we identified cases for which the 3 reference panel members’ independent assessments did not unanimously agree and for which the consensus-derived reference diagnosis was different from the most frequent diagnosis recorded by the participants (17 of 240 cases). We reanalyzed the data by substituting the most frequent participant diagnosis as the reference diagnosis for the 17 cases, or by excluding the 17 cases. Testing was 2-sided using a P value of less than .05 for significance. Stata statistical software (StataCorp), version 13, was used.
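The second sensitivity analysis changes the reference standard only for cases that meet 2 conditions: the panel was not unanimous, and the consensus diagnosis differs from the participants' most frequent (modal) diagnosis. A minimal sketch of both variants (substitution and exclusion) follows. The data structures and function names are hypothetical, and ties for the modal diagnosis are broken by first occurrence, which the paper does not specify.

```python
from collections import Counter


def flag_cases(consensus, panel_unanimous, participant_calls):
    """Return {case_id: modal participant diagnosis} for cases where the panel
    was not unanimous and the consensus differs from the modal diagnosis."""
    flagged = {}
    for case_id, reference in consensus.items():
        # most_common breaks ties by first occurrence (Python 3.7+)
        modal, _ = Counter(participant_calls[case_id]).most_common(1)[0]
        if not panel_unanimous[case_id] and modal != reference:
            flagged[case_id] = modal
    return flagged


def substituted_reference(consensus, flagged):
    """Variant 1: replace the reference with the modal participant diagnosis."""
    return {**consensus, **flagged}


def excluded_reference(consensus, flagged):
    """Variant 2: drop the flagged cases entirely."""
    return {k: v for k, v in consensus.items() if k not in flagged}


# Hypothetical 3-case example.
consensus = {"c1": "atypia", "c2": "DCIS", "c3": "benign"}
unanimous = {"c1": False, "c2": True, "c3": False}
calls = {
    "c1": ["benign", "benign", "atypia"],   # non-unanimous panel, mode differs: flagged
    "c2": ["atypia", "atypia", "DCIS"],     # unanimous panel: kept as is
    "c3": ["benign", "atypia", "benign"],   # mode agrees with consensus: kept
}
flagged = flag_cases(consensus, unanimous, calls)
print(flagged)  # {'c1': 'benign'}
print(substituted_reference(consensus, flagged))
print(excluded_reference(consensus, flagged))
```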

Results

Test Set Cases

Nearly half of the 240 cases were from women aged 40 to 49 years (49%); the remainder were from women aged 50 to 59 years (28%), 60 to 69 years (12%), and 70 years or older (11%). Breast density categories assessed on previous mammography included almost entirely fat (5.4%), scattered fibroglandular densities (43.8%), heterogeneously dense (40.4%), and extremely dense (10.4%) categories. Cases were from both core needle (57.5%) and excisional (42.5%) biopsies. Among the final sample of 240 cases, 72 (30%) were benign without atypia, 72 (30%) were atypia, 73 (30%) were DCIS, and 23 (10%) were invasive carcinoma.

Figure 2. Pathologist Recruitment and Randomization into Test Sets

691 Pathologists assessed for eligibility
    146 Excluded (ineligible)(a)
545 Invited to participate
    156 Excluded (no response)
389 Responded to invitation
    137 Excluded (not interested, eligibility unknown)
252 Pathologists randomized
    126 Randomized to alternate study of whole slide digital imaging
    126 Randomized to interpret traditional glass slides (test sets A-D)
        31 Randomized to test set A (60 cases); 29 pathologists completed test set A; 1740 interpretations included in primary analysis
        31 Randomized to test set B (60 cases); 27 pathologists completed test set B
        32 Randomized to test set C (60 cases); 30 pathologists completed test set C
        32 Randomized to test set D (60 cases); 29 pathologists completed test set D

(a) Reasons for ineligibility not known.







Pathologist Participation and Characteristics

Rates of pathologist recruitment, which began November 2011, are shown in Figure 2. Among 691 pathologists invited, 146 were ineligible (21.1%). We were unable to contact or verify eligibility for 156 pathologists (22.6%), despite multiple email, postal mail, and telephone contact attempts. Among the remaining 389 pathologists, 137 (35%) declined and 252 (65%) agreed to participate. There were no statistically significant differences in mean age, sex, level of direct medical care, or proportion working in a population of 250 000 or more between the participants and those who declined or those we were unable to contact. Among the 252 participants, 126 participants were randomized to the current study and 91% (115 of 126 participants) completed independent interpretation of all 60 cases and full participation in the study by May 2014. The remaining 126 participants were offered participation in a related future study.
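The recruitment percentages follow from the Figure 2 counts by subtraction and division. The ineligible and unreachable percentages use the 691 pathologists assessed as the denominator, whereas the declined and agreed percentages use the 389 who responded. A short arithmetic check (variable names are illustrative):

```python
assessed = 691
ineligible = 146
no_response = 156
declined = 137
randomized_glass = 126   # randomized to the glass-slide arm (this study)
completed = 115
cases_per_set = 60

invited = assessed - ineligible        # 545
responded = invited - no_response      # 389
agreed = responded - declined          # 252


def pct(numerator, denominator):
    """Percentage rounded to 1 decimal place."""
    return round(100 * numerator / denominator, 1)


print("ineligible:", pct(ineligible, assessed))          # 21.1
print("unreachable:", pct(no_response, assessed))        # 22.6
print("declined:", pct(declined, responded))             # 35.2
print("agreed:", pct(agreed, responded))                 # 64.8
print("completed:", pct(completed, randomized_glass))    # 91.3
print("interpretations:", completed * cases_per_set)     # 6900
```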

Participants’ characteristics and clinical experience are shown in Table 2. Although most (93.1%) reported confidence interpreting breast pathology, 50.5% reported that breast pathology is challenging and 44.3% reported that breast pathology makes them more nervous than other types of pathology. The mean CME credits awarded for self-reported time spent on this activity was 16 (95% CI, 15-17); 43 participants were awarded the maximum 20 hours.

Table 2. Characteristics of Participating Pathologists (N = 115)

Characteristic | No. (%)

Demographics
Age at survey, y
    33-39 | 16 (13.9)
    40-49 | 41 (35.7)
    50-59 | 42 (36.5)
    ≥60 | 16 (13.9)
Sex
    Men | 69 (60.0)
    Women | 46 (40.0)
State of clinical practice
    Alaska | 4 (3.5)
    Maine | 11 (9.6)
    Minnesota | 19 (16.5)
    New Hampshire | 4 (3.5)
    New Mexico | 4 (3.5)
    Oregon | 15 (13.0)
    Vermont | 9 (7.8)
    Washington | 49 (42.6)

Clinical Practice and Breast Pathology Expertise
Laboratory group practice size
    <10 pathologists | 68 (59.1)
    ≥10 pathologists | 47 (40.9)
Fellowship training in breast pathology
    No | 109 (94.8)
    Yes | 6 (5.2)
Affiliated with an academic medical center
    No | 87 (75.7)
    Yes, adjunct/affiliated clinical faculty | 17 (14.8)
    Yes, primary appointment | 11 (9.6)
Considered an expert in breast pathology by colleagues
    No | 90 (78.3)
    Yes | 25 (21.7)
Years interpreting breast pathology cases (not including residency/fellowship training)
    0-4 | 22 (19.1)
    5-9 | 23 (20.0)
    10-19 | 34 (29.6)
    ≥20 | 36 (31.3)
Percentage of breast specimen interpretation in caseload
    0-9 | 59 (51.3)
    10-24 | 45 (39.1)
    25-49 | 8 (7.0)
    50-74 | 2 (1.7)
    ≥75 | 1 (0.9)
No. of breast cases interpreted per week
    0-4 | 31 (27.0)
    5-9 | 44 (38.3)
    10-19 | 31 (27.0)
    20-29 | 4 (3.5)
    30-39 | 3 (2.6)
    40-49 | 1 (0.9)
    ≥50 | 1 (0.9)

Impressions About Breast Pathology
Confidence in assessments of breast cases
    1 (Very confident) | 14 (12.2)
    2 | 66 (57.4)
    3 | 27 (23.5)
    4 | 8 (7.0)
    5 | 0
    6 (Not confident at all) | 0
Challenge of interpreting breast cases
    1 (Very easy) | 1 (0.9)
    2 | 13 (11.3)
    3 | 43 (37.4)
    4 | 44 (38.3)
    5 | 14 (12.2)
    6 (Very challenging) | 0
More nervous interpreting breast pathology than other types of pathology
    1 (Strongly disagree) | 13 (11.3)
    2 | 35 (30.4)
    3 | 16 (13.9)
    4 | 28 (24.3)
    5 | 20 (17.4)
    6 (Strongly agree) | 3 (2.6)
Enjoys interpreting breast pathology
    1 (Strongly disagree) | 0
    2 | 9 (7.8)
    3 | 13 (11.3)
    4 | 27 (23.5)
    5 | 46 (40.0)
    6 (Strongly agree) | 20 (17.4)

Pathologists’ Diagnoses Compared With Consensus-Derived Reference Diagnoses

The 115 participants each interpreted 60 cases, providing 6900 total individual interpretations for comparison with the consensus-derived reference diagnoses (Figure 3). Participants agreed with the consensus-derived reference diagnosis for 75.3% of the interpretations (95% CI, 73.4%-77.0%). Participants (n = 94) who completed the CME activity reported that the test cases were similar to the entire spectrum of breast pathology seen in their own practice (23% reported that they always saw cases like the study test cases, 51% often saw cases like these, 22% sometimes saw cases like these, no participants marked never, and 3% did not respond to this question).
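The pooled 75.3% figure, together with the split of the remaining interpretations into overinterpretations and underinterpretations, can be read directly off the Figure 3 cross-tabulation: the diagonal holds concordant calls, cells above it are calls more severe than the reference, and cells below it are less severe. This sketch reproduces the point estimate only; the reported 95% CI also reflects between-pathologist variability, which the aggregated table does not carry. The function name is illustrative.

```python
# Figure 3 counts. Rows: consensus-derived reference diagnosis; columns:
# participants' interpretations. Both ordered benign without atypia, atypia,
# DCIS, invasive carcinoma.
FIGURE_3 = [
    [1803, 200, 46, 21],
    [719, 990, 353, 8],
    [133, 146, 1764, 54],
    [3, 0, 23, 637],
]


def split_agreement(matrix):
    """Return (concordant, overinterpreted, underinterpreted, total) counts."""
    k = len(matrix)
    concordant = sum(matrix[i][i] for i in range(k))
    over = sum(matrix[i][j] for i in range(k) for j in range(i + 1, k))   # above diagonal
    under = sum(matrix[i][j] for i in range(k) for j in range(i))         # below diagonal
    return concordant, over, under, concordant + over + under


concordant, over, under, total = split_agreement(FIGURE_3)
print(f"{concordant}/{total} = {100 * concordant / total:.1f}% concordant")  # 5194/6900 = 75.3% concordant
print(f"over: {over}, under: {under}")                                      # over: 682, under: 1024
```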

In general, overinterpretation and underinterpretation of breast biopsy cases were not limited to a few cases or a few practicing pathologists but were widely distributed among pathologists (N = 115) and cases (N = 240) (eFigure 3A and 3B in the Supplement, respectively). The overall concordance rate for the invasive breast cancer cases was high, at 96% (95% CI, 94%-97%; Table 3), although 1 of the invasive test cases contained predominantly DCIS with a focus of microinvasion. This focus was initially missed by 2 reference panelists but was confirmed to be invasive during a consensus meeting.

The participants agreed with the consensus-derived reference diagnosis on less than half of the atypia cases, with a concordance rate of 48% (95% CI, 44%-52%; Figure 3, Figure 4, Figure 5; and eFigure 4 in the Supplement). Although overinterpretation of DCIS as invasive carcinoma occurred in only 3% (95% CI, 2%-4%), overinterpretation of atypia was noted in 17% (95% CI, 15%-21%) and overinterpretation of benign without atypia was noted in 13% (95% CI, 11%-15%). Underinterpretation of invasive breast cancer was noted in 4% (95% CI, 3%-6%), whereas underinterpretation of DCIS was noted in 13% (95% CI, 12%-15%) and underinterpretation of atypia was noted in 35% (95% CI, 31%-39%).
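Each category-specific rate quoted above is a row of the Figure 3 cross-tabulation divided by its row total. The pooled sketch below recomputes the point estimates only, not the confidence intervals; the `category_rates` helper is illustrative.

```python
LABELS = ["Benign without atypia", "Atypia", "DCIS", "Invasive carcinoma"]
# Figure 3 counts; rows = consensus reference diagnosis, columns = participants' calls.
FIGURE_3 = [
    [1803, 200, 46, 21],
    [719, 990, 353, 8],
    [133, 146, 1764, 54],
    [3, 0, 23, 637],
]


def category_rates(matrix):
    """Per reference category: (label, over %, under %, concordance %)."""
    result = []
    for i, (label, row) in enumerate(zip(LABELS, matrix)):
        n = sum(row)
        result.append((
            label,
            100 * sum(row[i + 1:]) / n,   # called more severe than the reference
            100 * sum(row[:i]) / n,       # called less severe than the reference
            100 * row[i] / n,             # concordant
        ))
    return result


for label, over, under, conc in category_rates(FIGURE_3):
    print(f"{label:<22} over {over:4.1f}%  under {under:4.1f}%  concordant {conc:4.1f}%")
```

Rounded to whole percentages, these reproduce the Table 3 point estimates, for example 990 of 2070 atypia interpretations (48%) concordant.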

Diagnostic agreement did not change substantially when we used an alternate diagnostic mapping schema or an alternative participant-based method of defining the reference diagnosis (eTable 1 in the Supplement).

Patient and Pathologist Characteristics Associated With Overinterpretation and Underinterpretation

The association of breast density with overall pathologists’ concordance (as well as both overinterpretation and underinterpretation rates) was statistically significant, as shown in Table 3 when comparing mammographic density grouped into 2 categories (low density vs high density). The overall concordance estimates also decreased consistently with increasing breast density across all 4 Breast Imaging-Reporting and Data System (BI-RADS) density categories: BI-RADS A, 81% (95% CI, 75%-86%); BI-RADS B, 77% (95% CI, 75%-79%); BI-RADS C, 74% (95% CI, 72%-76%); and BI-RADS D, 70% (95% CI, 64%-74%); P < .001, trend test. Overinterpretation rates were also significantly higher for breast biopsies from women in their 40s (vs ≥50 years), although underinterpretation rates were lower for women in their 40s (vs ≥50 years) (Table 3). The magnitude of the overall density association did not change when covariates for patient age and diagnosis (eg, benign, atypia, DCIS, and invasive) were included in a multivariable model.

Pathologists from outside of academic settings, those who interpret lower weekly volumes of breast cases, and those from small-sized practices were statistically significantly less likely to agree with the consensus-derived reference diagnosis. Each of these pathologist variables remained statistically significant in a multivariable logistic model that accounted for the simultaneous contribution of all 3 (eTable 3 in the Supplement). Although the differences noted for pathologist characteristics and patient age and breast density are statistically significant, the absolute effects are small.

Discordance was higher when the pathologists indicated a case was difficult or borderline, when they desired a second opinion, or when they reported low confidence in their assessment (Table 3 and eFigure 5 in the Supplement).

DiscussionIn this study of US pathologists in which diagnostic interpre-tation was based on a single breast biopsy slide for each case,we found an overall diagnostic concordance rate of 75.3%, witha high level of agreement between the pathologists’ and theconsensus-derived reference diagnosis for invasive breast can-

Figure 3. Comparison of 115 Participating Pathologists’ Interpretations vs the Consensus-Derived ReferenceDiagnosis for 6900 Total Case Interpretationsa

Participating Pathologists’ Interpretation

Cons

ensu

s Ref

eren

ceDi

agno

sisb

Benignwithout atypia Atypia DCIS

Invasivecarcinoma Total

Benign without atypia 1803 200 46 21 2070

Atypia 719 990 353 8 2070

DCIS 133 146 1764 54 2097

Invasive carcinoma 3 0 23 637 663

Total 2658 1336 2186 720 6900

DCIS indicates ductal carcinomain situ.a Concordance noted in 5194 of

6900 case interpretations or75.3%.

b Reference diagnosis was obtainedfrom consensus of 3 experiencedbreast pathologists.





Page 6 of 54


BMJ

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960


nly


cer, and a substantially lower level of agreement for DCIS andatypia. Disagreement with the consensus-derived reference di-agnosis was statistically significantly more frequent when

breast biopsies were interpreted by pathologists with lowerweekly case volume, from nonacademic practices, or smallerpractices; and from women with dense breast tissue on mam-

Table 3. Patient, Pathologist, and Case Characteristics and Rates of Overinterpretation, Underinterpretation, and Concordance for the Participating Pathologists’ Interpretations vs the Consensus-Derived Reference Diagnosis

| Characteristic | No. of cases or participants | No. of interpretations | Overinterpretation, % (95% CI) | P value | Underinterpretation, % (95% CI) | P value | Concordance, % (95% CI) | P value |
|---|---|---|---|---|---|---|---|---|
| Test case patient characteristics (N = 240 test cases) | | | | | | | | |
| Consensus reference diagnosis[a] | | | | | | | | |
| Benign without atypia | 72 | 2070 | 13 (11-15) | <.001 | | <.001 | 87 (85-89) | <.001 |
| Atypia | 72 | 2070 | 17 (15-21) | | 35 (31-39) | | 48 (44-52) | |
| DCIS | 73 | 2097 | 3 (2-4) | | 13 (12-15) | | 84 (82-86) | |
| Invasive breast cancer | 23 | 663 | | | 4 (3-6) | | 96 (94-97) | |
| Age at time of biopsy, y | | | | | | | | |
| 40-49 | 118 | 3391 | 11 (9-13) | .009 | 14 (12-16) | <.001 | 76 (73-78) | .45 |
| ≥50 | 122 | 3509 | 9 (8-11) | | 16 (14-18) | | 75 (73-77) | |
| Breast density | | | | | | | | |
| Low | 118 | 3391 | 8 (7-10) | <.001 | 14 (12-16) | .03 | 77 (75-80) | <.001[b] |
| High | 122 | 3509 | 11 (10-13) | | 16 (14-18) | | 73 (71-75) | |
| Pathologist characteristics (N = 115 participants) | | | | | | | | |
| Academic affiliation | | | | | | | | |
| None | 87 | 5220 | 11 (9-12) | .06 | 15 (14-17) | .19 | 74 (72-76) | .007[c] |
| Adjunct affiliation | 17 | 1020 | 8 (5-12) | | 14 (10-19) | | 78 (74-82) | |
| Primary academic | 11 | 660 | 7 (5-11) | | 12 (8-16) | | 81 (76-85) | |
| Estimated No. of breast cases interpreted/week | | | | | | | | |
| <5 | 31 | 1860 | 11 (8-14) | .17 | 17 (15-21) | .006 | 72 (68-75) | .001[d] |
| 5-9 | 44 | 2640 | 10 (8-13) | | 15 (12-18) | | 75 (72-78) | |
| 10-19 | 31 | 1860 | 9 (6-11) | | 13 (11-16) | | 78 (75-81) | |
| ≥20 | 9 | 540 | 9 (5-15) | | 12 (7-18) | | 80 (70-87) | |
| Practice size[e] | | | | | | | | |
| 1-9 pathologists | 68 | 4080 | 10 (8-12) | .81 | 16 (14-19) | .029 | 74 (71-76) | .034 |
| ≥10 pathologists | 47 | 2820 | 9 (8-12) | | 13 (11-15) | | 78 (75-80) | |
| Expertise in breast pathology[f] | | | | | | | | |
| Nonexpert | 88 | 5280 | 10 (9-12) | .41 | 16 (14-17) | .14 | 74 (72-76) | .055 |
| Expert | 27 | 1620 | 9 (7-12) | | 12 (9-16) | | 79 (75-82) | |
| Case characteristics (N = 6900 interpretations) | | | | | | | | |
| Difficulty rating | | | | | | | | |
| Low difficulty (1-3) | | 4829 | 6 (5-7) | <.001 | 13 (11-15) | <.001 | 81 (79-83) | <.001 |
| High difficulty (4-6) | | 2071 | 19 (17-22) | | 19 (16-22) | | 62 (59-64) | |
| Second opinion desired | | | | | | | | |
| No | | 4449 | 6 (5-7) | <.001 | 12 (11-14) | <.001 | 82 (80-84) | <.001 |
| Yes | | 2451 | 17 (15-20) | | 20 (17-23) | | 63 (60-66) | |
| Confidence in assessment | | | | | | | | |
| High (4-6) | | 5640 | 8 (7-9) | <.001 | 13 (12-15) | <.001 | 79 (77-80) | <.001 |
| Low (1-3) | | 1260 | 19 (15-24) | | 21 (17-26) | | 60 (55-65) | |

a Values obtained using mapping scheme 1 described in eTable 1 in the Supplement.

b A test for trend based on a logistic regression model, which includes a single 4-category ordinal variable for Breast Imaging-Reporting and Data System density, yields a P value of less than .001.

c P value comparing none vs any academic affiliation (adjunct or primary).

d A test for trend based on a logistic regression model, which included a single 4-category ordinal variable for number of cases interpreted per week.

e Fewer than 10 pathologists vs 10 or more other pathologists in the same laboratory who also interpret breast tissue.

f Clinical expertise defined as self-reported completion of a fellowship in breast pathology or their peers considering them an expert in breast pathology.
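The overinterpretation and underinterpretation rates in Table 3 depend on treating the four diagnostic categories as ordered. Under that ordering, an interpretation more severe than the reference (eg, DCIS for an atypia reference case) counts as overinterpretation. The sketch below is a minimal illustration of that tally. It assumes the ordering benign without atypia < atypia < DCIS < invasive carcinoma. The category labels and example pairs are hypothetical, and the sketch does not reproduce the study's mapping schemes, confidence intervals, or P values.

```python
from collections import Counter

# Assumed ordinal ranking of the four diagnostic categories (least to most severe).
ORDER = {"benign": 0, "atypia": 1, "dcis": 2, "invasive": 3}


def classify(interpretation: str, reference: str) -> str:
    """Label one interpretation relative to the reference diagnosis."""
    diff = ORDER[interpretation] - ORDER[reference]
    if diff > 0:
        return "over"   # more severe category than the reference
    if diff < 0:
        return "under"  # less severe category than the reference
    return "concordant"


def rates(pairs):
    """Percentage of over-, under-, and concordant interpretations in (interpretation, reference) pairs."""
    counts = Counter(classify(i, r) for i, r in pairs)
    n = len(pairs)
    return {label: 100 * counts[label] / n for label in ("over", "under", "concordant")}


# Hypothetical interpretations of atypia reference cases (illustrative only).
example = [("atypia", "atypia"), ("dcis", "atypia"), ("benign", "atypia"), ("atypia", "atypia")]
print(rates(example))  # {'over': 25.0, 'under': 25.0, 'concordant': 50.0}
```

Under this ordering, a benign without atypia reference case can only be overinterpreted and an invasive carcinoma reference case can only be underinterpreted. This matches the empty cells in those rows of the table.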







mography (vs low density), although the absolute differences in rates according to these factors were generally small.

Most of the 1.6 million breast biopsies performed each year in the United States have benign diagnoses. Our results suggest that overinterpretation of benign without atypia breast biopsies (13% among the 2070 interpretations for 72 benign without atypia cases in this study) may be occurring more often than underinterpretation of invasive breast cancer (4% among 663 interpretations for 23 cases in this study). In addition, although the prevalence of atypia is small (4%-10% of breast biopsies),3,21 the large number of breast biopsies each year translates into approximately 64 000 to 160 000 women diagnosed with atypia annually. Our results show that atypia is a diagnostic classification with considerable variation among practicing pathologists, with an overall concordance rate of 48% compared with the consensus-derived reference diagnosis. Moreover, among the reference panel members, agreement of their independent preconsensus diagnoses with the final consensus-derived reference diagnosis of atypia was 80%, suggesting that these cases may have the highest possibility of disagreement among pathologists.
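The 64 000 to 160 000 estimate follows directly from the figures in the preceding paragraph. Below is a quick check of that arithmetic. It assumes that the 4%-10% prevalence applies to all 1.6 million annual biopsies and that each biopsy corresponds to a different woman.

```python
# Figures taken from the text: annual US breast biopsies and the reported atypia prevalence range.
ANNUAL_BIOPSIES = 1_600_000
PREVALENCE_LOW, PREVALENCE_HIGH = 0.04, 0.10


def women_with_atypia(biopsies: int, prevalence: float) -> int:
    """Expected number of atypia diagnoses per year at a given prevalence (rounded to a whole number)."""
    return round(biopsies * prevalence)


low = women_with_atypia(ANNUAL_BIOPSIES, PREVALENCE_LOW)
high = women_with_atypia(ANNUAL_BIOPSIES, PREVALENCE_HIGH)
print(f"{low:,} to {high:,} women per year")  # 64,000 to 160,000 women per year
```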

Figure 4. Participating Pathologists’ Interpretations of Each of the 240 Breast Biopsy Test Cases

[Figure: for each test case, the percentage of participating pathologists’ interpretations (0%-100%) falling in each interpretation category (benign without atypia, atypia, DCIS, invasive carcinoma). A, Benign without atypia: cases 1-72, 72 cases, 2070 total interpretations. B, Atypia: cases 73-144, 72 cases, 2070 total interpretations. C, DCIS: cases 145-217, 73 cases, 2097 total interpretations. D, Invasive carcinoma: cases 218-240, 23 cases, 663 total interpretations.]

DCIS indicates ductal carcinoma in situ.







The variability of pathology interpretations is relevant to concerns about overdiagnosis of atypia and DCIS.5,22,23 When a biopsy is overinterpreted (eg, interpreted as DCIS by a pathologist when the consensus-derived reference diagnosis is atypia), a woman may undergo unnecessary surgery, radiation, or hormonal therapy.9,10,24-26 In addition, overinterpretation of atypia in a biopsy with otherwise benign findings can result in unnecessary heightened surveillance, clinical intervention, costs, and anxiety.27-30 It has recently been suggested that women with a diagnosis of atypia on a breast biopsy consider annual screening magnetic resonance imaging examinations and chemoprevention.31 Given our findings, clinicians and patients may want to obtain a formal second opinion for breast atypia prior to initiating more intensive surveillance or risk reduction using chemoprevention or surgery.

The rates of overinterpretation and underinterpretation we observed for assessments of atypia and DCIS highlight important issues in breast pathology. However, diagnostic variability is not confined to this specialty, as reports of observer variability have been noted in other areas of clinical medicine.32,33 For example, extensive variability among radiologists has been noted in the interpretation of mammograms.34 In addition, results of this study document disagreements even among experienced and expert pathologists.

A unique aspect of our study is that we identified patient and pathologist characteristics associated with greater discordance to explore possible approaches to reducing discordance. In this study, biopsies from women with dense breast tissue on mammography compared with biopsies from women with less-dense breast tissue were more likely to have discordant pathology diagnoses (concordance rate, 73% for dense vs 77% for less-dense). Mammographic density is primarily attributable to increased fibrous tissue in the breast, and it is unlikely that this would contribute to diagnostic discordance. However, microenvironmental factors in dense breast tissue may also be associated with epithelial hyperplasia that may increase diagnostic discordance. Recently there have been efforts to educate women about the association of breast density with screening mammography accuracy and efforts to identify better methods to screen women with dense breast tissue.35 Although our findings related to breast density and the accuracy of pathologists’ interpretations were statistically significant, the absolute differences were small and their clinical significance should be further investigated.

Figure 5. Slide Example for Each Diagnostic Category

[Figure: hematoxylin-eosin stained slide images, each shown with a bar chart of the number of participating pathologists’ interpretations in each category (benign without atypia, atypia, DCIS, invasive carcinoma). Cases shown: benign without atypia (case 62), atypia (case 107), DCIS (case 163)a, and invasive carcinoma (case 222).]

DCIS indicates ductal carcinoma in situ. Blue indicates concordant interpretations. Each slide is a hematoxylin-eosin stain (original magnification ×100).

a Sclerosing adenosis was present elsewhere in this slide.







Pathologists with higher clinical volumes of breast pathology and those working within larger group practices had less discordance. Experience and informal learning obtained within larger group practices may contribute to improving and maintaining interpretive performance. We also noted that, to some extent, pathologists could perceive when their interpretation might deviate from the reference diagnosis. For example, participants’ diagnoses were more likely to disagree with the reference diagnosis when they indicated the diagnosis was unclear, or when they were less confident in their interpretation. In clinical practice, these factors may prompt a second consultative opinion, an option not allowed in our study environment. Understanding how second opinions may improve diagnostic accuracy is an area requiring further investigation.

Although diagnostic disagreement among breast pathologists has been noted in the past, most previous studies were published in the 1990s; had small numbers of test cases; employed cases that were not randomly selected; and included a smaller number of participants who were specialists in breast pathology.9-11,24,36-39 In contrast, our study used standardized data on 240 randomly selected cases using a stratified sampling scheme that oversampled cases of atypia and DCIS to improve confidence in agreement estimates. We also enrolled 115 practicing pathologists from diverse geographic locations and clinical settings in 8 US states, providing 6900 individual case assessments. The high participation rate and commitment of the practicing pathologists participating in our study, with most investing 15 hours to 17 hours, is likely related to a desire to improve their diagnostic skills in a challenging clinical area.

Our study findings should be interpreted considering several important limitations. First, it is unclear how the use of test sets, weighted with more cases of difficult and problematic lesions, may have influenced interpretive performance. However, it is not feasible to add such a high number of test cases into a practicing pathologist’s daily routine in a blinded fashion. Second, we used only a single slide per case to enhance participation. In clinical practice, pathologists typically review multiple slides per case and can request additional levels or ancillary immunohistochemical stains prior to arriving at a final diagnosis. Third, although no perfect gold standard exists for defining accuracy in pathology diagnosis, we used a carefully defined reference diagnosis based on a consensus of experienced breast pathologists. Among the 3 consensus panel members, unanimous agreement of their independent diagnoses was noted for 75% of cases. Moreover, there is no evidence that the classifications of the consensus panel members are more accurate with respect to predicting clinical outcomes than the classifications of the participating pathologists. However, we noted little change in results after considering alternative methods of defining the reference diagnosis. Fourth, diagnoses rendered in this study setting may not reflect those rendered in actual clinical practice due to subtle variations in the application of criteria or to different emphasis placed on the influence of clinical management. No attempt was made to standardize diagnostic criteria among participants through either written instructions or training slide sets. Fifth, no specific instructions were provided to participants regarding whether their diagnoses should be made purely on morphologic features, or whether biopsy type or clinical management should be considered. We have previously described some of the possible reasons for observer variability in the interpretation of research breast biopsies.15

Conclusions

In this study of pathologists, in which diagnostic interpretation was based on a single breast biopsy slide, overall agreement between the individual pathologists’ interpretations and the expert consensus-derived reference diagnoses was 75.3%, with the highest level of concordance for invasive carcinoma and lower levels of concordance for DCIS and atypia. Further research is needed to understand the relationship of these findings with patient management.

ARTICLE INFORMATION

Author Affiliations: Department of Medicine, University of Washington School of Medicine, Seattle (Elmore); Program in Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center, Seattle, Washington (Longton, Pepe); Department of Family Medicine, Oregon Health and Science University, Portland (Carney); Department of Family Medicine, University of Vermont, Vineyard Haven, Massachusetts (Geller); Department of Community and Family Medicine, The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, Norris Cotton Cancer Center, Lebanon, New Hampshire (Onega, Tosteson); Department of Medicine, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire (Tosteson); Providence Cancer Center, Providence Health and Services Oregon, Portland (Nelson); Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland (Nelson); Department of Clinical Epidemiology and Medicine, Oregon Health and Science University, Portland (Nelson); Department of Pathology, Stanford University School of Medicine, Stanford, California (Allison); Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts (Schnitt); Harvard Medical School, Boston, Massachusetts (Schnitt); Department of Laboratory Medicine and the Keenan Research Centre of the Li Ka Shing Knowledge Institute, Toronto, Ontario, Canada (O’Malley); St Michael’s Hospital and the University of Toronto, Ontario, Canada (O’Malley); Department of Pathology and University of Vermont Cancer Center, University of Vermont, Burlington (Weaver).

Author Contributions: Drs Elmore and Pepe and Mr Longton had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Elmore, Onega, Tosteson, Nelson, Pepe, Allison, Weaver.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Elmore, Carney, Geller, Nelson, Pepe, Schnitt, Weaver.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Longton, Onega, Pepe.
Obtained funding: Elmore, Carney, Geller, Onega, Tosteson, Nelson, Weaver.
Administrative, technical, or material support: Onega, Tosteson, O’Malley, Weaver.
Study supervision: Elmore, Allison, Weaver.

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Elmore reports serving as a medical editor for the nonprofit Informed Medical Decisions Foundation. Dr Allison reports personal fees from Genentech. No other authors had potential conflicts of interest to report.

Funding/Support: This work was supported by the National Cancer Institute (R01CA140560, R01CA172343, and K05CA104699) and by the National Cancer Institute-funded Breast Cancer Surveillance Consortium (U01CA70013 and HHSN261201100031C). The collection of cancer and vital status data used in this study was supported in part by several state public health departments and cancer registries throughout the United States. For a full description of sources, visit http://www.breastscreening.cancer.gov/work/acknowledgement.html. The American Medical Association (AMA) is the source for the raw physician data; statistics, tables, or tabulations were prepared by the authors using AMA Physician Masterfile data. The AMA Physician Masterfile data on pathologist age, sex, level of direct medical care, and proportion working in a population of 250 000 or more were used in comparing characteristics of participants and nonparticipants.

Role of the Funders/Sponsors: The funding organization had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the views of the National Cancer Institute or the National Institutes of Health.

Additional Contributions: We thank Ventana Medical Systems, a member of the Roche Group, for use of iScan Coreo Au digital scanning equipment, and HD View SL for the source code used to build our digital viewer.

REFERENCES

1. Silverstein M. Where’s the outrage? J Am Coll Surg. 2009;208(1):78-79.

2. Silverstein MJ, Recht A, Lagios MD, et al. Special report: Consensus conference III: image-detected breast cancer: state-of-the-art diagnosis and treatment [published correction appears in J Am Coll Surg. 2009 Dec;209(6):802]. J Am Coll Surg. 2009;209(4):504-520.

3. Weaver DL, Rosenberg RD, Barlow WE, et al. Pathologic findings from the Breast Cancer Surveillance Consortium: population-based outcomes in women undergoing biopsy after screening mammography. Cancer. 2006;106(4):732-742.

4. Harris JR, Lippman ME, Morrow M, Osborne CK. Diseases of the Breast. 5th ed. Philadelphia, PA: Wolters Kluwer Health; 2014.

5. Bleyer A, Welch HG. Effect of 3 decades of screening mammography on breast-cancer incidence. N Engl J Med. 2012;367(21):1998-2005.

6. Hall FM. Identification, biopsy, and treatment of poorly understood premalignant, in situ, and indolent low-grade cancers: are we becoming victims of our own success? Radiology. 2010;254(3):655-659.

7. O'Malley FP, Pinder SE, Mulligan AM. Breast Pathology. Philadelphia, PA: Elsevier/Saunders; 2011.

8. Schnitt SJ, Collins LC. Biopsy Interpretation of the Breast. Philadelphia, PA: Wolters Kluwer Health/Lippincott Williams & Wilkins; 2009.

9. Rosai J. Borderline epithelial lesions of the breast. Am J Surg Pathol. 1991;15(3):209-221.

10. Schnitt SJ, Connolly JL, Tavassoli FA, et al. Interobserver reproducibility in the diagnosis of ductal proliferative breast lesions using standardized criteria. Am J Surg Pathol. 1992;16(12):1133-1143.

11. Wells WA, Carney PA, Eliassen MS, Tosteson AN, Greenberg ER. Statewide study of diagnostic agreement in breast pathology. J Natl Cancer Inst. 1998;90(2):142-145.

12. Della Mea V, Puglisi F, Bonzanini M, et al. Fine-needle aspiration cytology of the breast: a preliminary report on telepathology through Internet multimedia electronic mail. Mod Pathol. 1997;10(6):636-641.

13. Oster NV, Carney PA, Allison KH, et al. Development of a diagnostic test set to assess agreement in breast pathology: practical application of the Guidelines for Reporting Reliability and Agreement Studies (GRRAS). BMC Womens Health. 2013;13(1):3.

14. Feng S, Weaver DL, Carney PA, et al. A framework for evaluating diagnostic discordance in pathology discovered during research studies. Arch Pathol Lab Med. 2014;138(7):955-961.

15. Allison KH, Reisch LM, Carney PA, et al. Understanding diagnostic variability in breast pathology: lessons learned from an expert consensus review panel. Histopathology. 2014;65(2):240-251.

16. National Cancer Institute. Breast Cancer Surveillance Consortium. http://breastscreening.cancer.gov/. Accessed June 1, 2011.

17. Ginsburg OM, Martin LJ, Boyd NF. Mammographic density, lobular involution, and risk of breast cancer. Br J Cancer. 2008;99(9):1369-1374.

18. Helmer O. The systematic use of expert judgment in operations research. http://www.rand.org/pubs/papers/P2795.html. Accessed March 27, 2012.

19. Willis GB. Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks, CA: Sage Publications; 2005.

20. American Medical Association. Physicians. http://www.dmddata.com/data_lists_physicians.asp. Accessed January 27, 2015.

21. Rubin E, Visscher DW, Alexander RW, Urist MM, Maddox WA. Proliferative disease and atypia in biopsies performed for nonpalpable lesions detected mammographically. Cancer. 1988;61(10):2077-2082.

22. Zahl PH, Jørgensen KJ, Gøtzsche PC. Overestimated lead times in cancer screening has led to substantial underestimation of overdiagnosis. Br J Cancer. 2013;109(7):2014-2019.

23. Gøtzsche PC, Jørgensen KJ. Screening for breast cancer with mammography. Cochrane Database Syst Rev. 2013;6(6):CD001877.

24. Collins LC, Connolly JL, Page DL, et al. Diagnostic agreement in the evaluation of image-guided breast core needle biopsies: results from a randomized clinical trial. Am J Surg Pathol. 2004;28(1):126-131.

25. Haas JS, Cook EF, Puopolo AL, Burstin HR, Brennan TA. Differences in the quality of care for women with an abnormal mammogram or breast complaint. J Gen Intern Med. 2000;15(5):321-328.

26. Saul S. Prone to error: earliest steps to find cancer. New York Times. July 19, 2010. http://www.nytimes.com/2010/07/20/health/20cancer.html?pagewanted=all&_r=0. Accessed February 16, 2015.

27. Rakovitch E, Mihai A, Pignol JP, et al. Is expert breast pathology assessment necessary for the management of ductal carcinoma in situ? Breast Cancer Res Treat. 2004;87(3):265-272.

28. Dupont WD, Page DL. Risk factors for breast cancer in women with proliferative breast disease. N Engl J Med. 1985;312(3):146-151.

29. Dupont WD, Parl FF, Hartmann WH, et al. Breast cancer risk associated with proliferative breast disease and atypical hyperplasia. Cancer. 1993;71(4):1258-1265.

30. London SJ, Connolly JL, Schnitt SJ, Colditz GA. A prospective study of benign breast disease and the risk of breast cancer. JAMA. 1992;267(7):941-944.

31. Hartmann LC, Degnim AC, Santen RJ, Dupont WD, Ghosh K. Atypical hyperplasia of the breast: risk assessment and management options. N Engl J Med. 2015;372(1):78-89.

32. Feinstein AR. A bibliography of publications on observer variability. J Chronic Dis. 1985;38(8):619-632.

33. Elmore JG, Feinstein AR. A bibliography of publications on observer variability (final installment). J Clin Epidemiol. 1992;45(6):567-580.

34. Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists’ interpretations of mammograms. N Engl J Med. 1994;331(22):1493-1499.

35. Lee CI, Bassett LW, Lehman CD. Breast density legislation and opportunities for patient-centered outcomes research. Radiology. 2012;264(3):632-636.

36. Carney PA, Eliassen MS, Wells WA, Swartz WG. Can we improve breast pathology reporting practices? a community-based breast pathology quality improvement program in New Hampshire. J Community Health. 1998;23(2):85-98.

37. Trocchi P, Ursin G, Kuss O, et al. Mammographic density and inter-observer variability of pathologic evaluation of core biopsies among women with mammographic abnormalities. BMC Cancer. 2012;12:554.

38. Shaw EC, Hanby AM, Wheeler K, et al. Observer agreement comparing the use of virtual slides with glass slides in the pathology review component of the POSH breast cancer cohort study. J Clin Pathol. 2012;65(5):403-408.

39. Stang A, Trocchi P, Ruschke K, et al. Factors influencing the agreement on histopathological assessments of breast biopsies among pathologists. Histopathology. 2011;59(5):939-949.






Evaluation of Criteria for Obtaining Second Opinions to Improve Breast Histopathology Interpretation: A Simulation Study

Joann G. Elmore, MD,1 Anna N. A. Tosteson, ScD,2,3 Margaret S. Pepe, PhD,4 Gary Longton, MS,5 Heidi D. Nelson, MD,6 Berta Geller, EdD,7 Patricia A. Carney, PhD,8 Tracy Onega, PhD,9 Kimberly H. Allison, MD,10 Sara L. Jackson, MD,11 Donald L. Weaver, MD,12

1 Professor, Department of Medicine, University of Washington School of Medicine, 325 Ninth Ave, Seattle, WA 98104, Box 359780

2 Professor, The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth, Norris Cotton Cancer Center, One Medical Center Drive, HB 7505, Lebanon, NH, 03756

3 Professor, Department of Medicine, Geisel School of Medicine at Dartmouth, One Medical Center Drive, HB 7505, Lebanon, NH, 03756

4 Professor, Department of Biostatistics, University of Washington School of Public Health; Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, M2-B500, P.O. Box 19024, Seattle, WA 98109

5 Senior Statistical Analyst, Program in Biostatistics and Biomathematics, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, M2-B500, P.O. Box 19024, Seattle, WA 98109

6 Professor, Providence Cancer Center, Providence Health and Services Oregon, and Departments of Medical Informatics and Clinical Epidemiology and Medicine, Oregon Health & Science University, 3181 SW Sam Jackson Park Road, Mail Code L475, Portland, OR 97239

7 Research Professor, Department of Family Medicine, University of Vermont, One South Prospect Street, UHC, Burlington, VT 05401

8 Professor, Department of Family Medicine, Oregon Health & Science University, 3181 SW Sam Jackson Park Rd, Mail Code FM, Portland, OR, 97239

9 Associate Professor, Community & Family Medicine, Geisel School of Medicine at Dartmouth, One Medical Center Drive, HB 7937, Lebanon, NH, 03756

10 Associate Professor, Department of Pathology, Stanford University School of Medicine, 300 Pasteur Drive, Lane 235, Stanford, CA 94305

11 Clinical Assistant Professor, Department of Medicine, University of Washington School of Medicine, 325 Ninth Ave, Seattle, WA 98104, Box 359780

12 Professor, Department of Pathology and UVM Cancer Center, University of Vermont, Given Courtyard Building, 89 Beaumont Ave, Burlington, VT, 05405


Word Count: 3,602
Structured Abstract: 299
Print Abstract: 306

Correspondence to: Joann G. Elmore, MD, MPH
University of Washington
Mailbox 359780
325 Ninth Avenue, Seattle, WA 98104
Telephone: (206) 744-3632
Fax: (206) 744-9917
E-mail: [email protected]


STRUCTURED ABSTRACT

Objective: Evaluate the potential impact of second opinions on improving breast histopathology diagnostic interpretation accuracy.

Design: Simulation study of different strategies and criteria for acquiring independent second opinions.

Setting and Participants: Interpretations from 115 pathologists of 240 breast biopsy specimens, one slide per case, were compared with expert consensus-derived reference diagnoses.

Main Outcome Measures: Misclassification rates for individual pathologists and for 12 simulated second opinion strategies. Simulations compared independent interpretations for pairs of pathologists with resolution of disagreements by an independent third pathologist. Twelve strategies were evaluated in which second opinion acquisition depended on initial diagnoses, assessment of case difficulty or borderline characteristics, pathologists’ clinical volumes, or whether a second opinion was required by policy or desired by pathologists. The 240 cases included benign without atypia (10% nonproliferative, 20% proliferative without atypia), atypia (30%), ductal carcinoma in situ (DCIS, 30%), and invasive cancer (10%). Overall misclassification rates and agreement statistics depend on the composition of the test set, which included a higher prevalence of difficult cases than in typical practice.
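The paired-reading design above can be illustrated with a small Monte Carlo sketch. The sketch makes three assumptions. Each case is represented by the pool of participating pathologists’ interpretations. The first and second opinions are two distinct interpretations drawn at random from that pool. When the two disagree, a third distinct interpretation is drawn and taken as final. These sampling and resolution rules are assumptions for illustration, not the study’s published simulation code, and the function names and toy data are hypothetical.

```python
import random


def final_diagnosis(pool, rng):
    """Simulate one case: two independent reads, plus a third read if they disagree."""
    # Draw three distinct pathologists' interpretations of the same case.
    first, second, third = rng.sample(pool, 3)
    if first == second:
        return first  # concordant pair: no arbitration needed
    # Assumed resolution rule: the independent third read is final. When it matches
    # one of the first two reads, this is equivalent to a 2-of-3 majority.
    return third


def paired_misclassification_rate(cases, trials=1000, seed=0):
    """Monte Carlo share of simulated final diagnoses that differ from the reference.

    `cases` maps a case ID to (reference diagnosis, list of interpretations),
    with at least three interpretations per case.
    """
    rng = random.Random(seed)
    wrong = total = 0
    for reference, pool in cases.values():
        for _ in range(trials):
            wrong += final_diagnosis(pool, rng) != reference
            total += 1
    return wrong / total


def single_read_misclassification_rate(cases):
    """Exact misclassification rate when each interpretation stands alone."""
    wrong = sum(dx != ref for ref, pool in cases.values() for dx in pool)
    total = sum(len(pool) for _, pool in cases.values())
    return wrong / total


# Hypothetical toy data: one atypia reference case read by 10 pathologists.
cases = {"case_A": ("atypia", ["atypia"] * 5 + ["dcis"] * 3 + ["benign"] * 2)}
print(single_read_misclassification_rate(cases))  # 0.5
print(paired_misclassification_rate(cases))
```

For the toy case, the exact expected misclassification rate under these assumptions is 37.5/90, or about 0.42, compared with 0.50 for a single read. The printed Monte Carlo estimate will be close to that value and varies slightly with the seed and number of trials.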

Results: Misclassification rates decreased (p<0.001) with all criteria for second opinion acquisition except when second opinions were obtained only for invasive cancer cases. The misclassification rate decreased by 6.6 percentage points when all cases received second opinions (p<0.001). Obtaining both first and second opinions from high-volume pathologists resulted in the lowest misclassification rate in this test set (14.3%, 95% CI: 10.9% to 18.0%). Obtaining second opinions only for cases with initial interpretations of atypia, DCIS, or invasive cancer decreased the overinterpretation of benign cases without atypia from 12.9% to 6.0%. Atypia cases had the highest misclassification rate after single interpretation (52.2%), remaining >34% in all second opinion scenarios.

Conclusion: Criteria-based second opinions may significantly improve diagnostic agreement for pathologists’ breast biopsy interpretations; however, diagnostic variability will not be completely eliminated, especially for atypia.


PRINT ABSTRACT

Study Question: Evaluating the impact of second opinions on improving accuracy of breast histopathology interpretation.

Methods: Interpretations from 115 pathologists, one slide per case, were used to establish baseline accuracy of single observers. These were compared to accuracy based on independent interpretations by simulating pairs of pathologists, with resolution by an independent third pathologist when needed. Twelve strategies were evaluated with acquisition of second opinions dependent on initial diagnoses, assessment of case difficulty or borderline characteristics, pathologists’ clinical volumes, or whether a second opinion was required by policy or desired by pathologists. The diagnoses (initial and post-second opinion) were compared to expert consensus-derived reference diagnoses to calculate misclassification rates, and between-pathologist agreement statistics were calculated. The 240 cases included benign without atypia (10% nonproliferative, 20% proliferative without atypia), atypia (30%), ductal carcinoma in situ (DCIS, 30%), and invasive cancer (10%).
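The strategies differ mainly in which cases trigger a second opinion. The sketch below expresses several of the described triggers as simple predicates over a single interpretation. Field and rule names are hypothetical. The 1-6 difficulty scale, with 4-6 treated as high difficulty, mirrors the rating categories in Table 3 of the companion concordance study. Volume-based and policy-based strategies concern who provides the opinion or institutional rules, so they are not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Interpretation:
    initial_dx: str             # "benign", "atypia", "dcis", or "invasive"
    difficulty: int             # pathologist's rating, 1 (low) to 6 (high)
    wants_second_opinion: bool  # pathologist would request a second opinion


# Hypothetical trigger rules; each returns True when a second opinion would be sought.
TRIGGERS: Dict[str, Callable[[Interpretation], bool]] = {
    "all_cases": lambda i: True,
    "invasive_only": lambda i: i.initial_dx == "invasive",
    "atypia_or_worse": lambda i: i.initial_dx in {"atypia", "dcis", "invasive"},
    "high_difficulty": lambda i: i.difficulty >= 4,
    "desired_by_pathologist": lambda i: i.wants_second_opinion,
}


def triggered(interp: Interpretation) -> List[str]:
    """Names of the strategies under which this interpretation would receive a second opinion."""
    return [name for name, rule in TRIGGERS.items() if rule(interp)]


example = Interpretation(initial_dx="atypia", difficulty=5, wants_second_opinion=False)
print(triggered(example))  # ['all_cases', 'atypia_or_worse', 'high_difficulty']
```

In a simulation, interpretations not flagged by the chosen rule would keep their initial diagnosis. Flagged interpretations would go through paired review.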

Study Answer and Limitations: Misclassification rates decreased (p<0.001) with all second opinion strategies except when second opinions were obtained only for initial invasive cancer diagnoses. The misclassification rate decreased by 6.6 percentage points when all cases received second opinions (p<0.001). The lowest misclassification rate in this test set resulted when high-volume pathologists provided both first and second opinions (14.3%, 95% CI: 10.9% to 18.0%). Obtaining second opinions only for cases with initial interpretations of atypia, DCIS, or invasive cancer decreased the overinterpretation of benign cases without atypia from 12.9% to 6.0%. These statistics depend on test set composition, which included a higher prevalence of difficult cases than in typical practice. Atypia cases had the highest misclassification rate after single interpretation (52.2%), remaining >34% for all second opinion scenarios.

What This Study Adds: Second opinions may significantly improve accuracy of breast histopathology interpretations but will not completely eliminate diagnostic variability, especially for atypia.

Funding, Competing Interests, Data Sharing: Funded by the National Cancer Institute. Please contact the authors for data sharing.


What is already known on this subject

Previous studies have documented extensive variability in the interpretation of breast biopsy tissue by pathologists,1 with resulting concern about patient harm.2

While significant changes in diagnosis have been reported in >10% of breast biopsy cases upon secondary review,3-8 no studies have systematically compared different criteria for obtaining second opinions as an approach to reducing errors.

What this study adds

Second opinions based on defined criteria may significantly improve diagnostic accuracy of breast histopathology interpretation.

Accuracy improves regardless of pathologists’ confidence in their diagnosis or their experience.

Second opinions improve but do not completely eliminate diagnostic variability in the challenging cases of atypia and ductal carcinoma in situ (DCIS).


INTRODUCTION

Attention to diagnostic errors in the medical literature and mass media has led many to

consider obtaining second opinions as a method to prevent errors and improve quality.9-

11 For example, obtaining second opinions, such as double reading of screening

mammograms, has been associated with improved cancer detection rates.12, 13

Obtaining a second opinion is a strategy commonly suggested to improve diagnostic

accuracy in breast pathology.1, 2, 14 Interpretation of breast pathology is notoriously

difficult, and rates of disagreement between pathologists are high, especially in cases of

atypia (e.g. atypical ductal hyperplasia; ADH) and ductal carcinoma in situ (DCIS).1, 15-17

A survey of U.S. laboratories noted that 6.6% of all histopathology cases were reviewed

before sign out, suggesting second opinions are frequently obtained in clinical practice,

especially in challenging areas such as breast pathology.18 Guidelines have also been

published for obtaining second opinions in pathology to prevent medical errors,19 and

approximately two-thirds of U.S. pathology laboratories have second opinion policies, with most

requiring a second review of new invasive cancer diagnoses.18 However, criteria for

when and how to obtain second opinions in breast pathology vary considerably.18, 20

In our previous study of 252 pathologists, 81% reported requesting second opinions in

the absence of institutional policy for at least some of their breast cases, and 96% felt

that second opinions improved their diagnostic accuracy.20 Other studies also suggest

possible improvements in patient outcomes. For example, in one study, second review

of 405 node-negative breast cancer cases resulted in significant modifications in

treatments,4 potentially decreasing unnecessary interventions and subsequent costs.14

Despite this strong endorsement by practicing pathologists, and multiple studies noting


that >10% of breast cases have significant changes in diagnoses after review,3-8 the

best guidelines for when to obtain second opinions in pathology are unknown. We know

of no study comparing the impact of different trigger criteria for obtaining a second

opinion on accuracy. Many potential strategies exist, ranging from review of every

biopsy case by multiple pathologists to obtaining second opinions only for specific case

categories (e.g., only cases interpreted initially as invasive breast cancer), or only from

pathologists with particular characteristics (e.g., high-volume versus low-volume pathologists).

The purpose of this study is to compare the effect of different criteria for triggering

procurement of second opinions on the accuracy of breast pathology interpretation. Our

study was uniquely designed to assess improvements in accuracy in a controlled test

situation using data from 6,900 individual interpretations by 115 pathologists. We

evaluated twelve strategies with different criteria for obtaining second opinions and

compared how each approach may affect over- and under-interpretation rates relative to

reference diagnoses. Our study provides important insight into methods to improve

clinical practice.

METHODS

Test Set Cases and Consensus Reference Diagnoses

This study uses data from the Breast Pathology Study (B-Path), a national study of the

accuracy of pathologic interpretation of breast tissue.1, 21 The 240 biopsy cases were

divided into four test sets of 60 cases.1, 21 Breast biopsy specimens were selected from

two state registries (NH, VT) which are part of the National Cancer Institute-sponsored

Breast Cancer Surveillance Consortium.22 Case selection was stratified by age (49%


age 40-49 years, 51% age ≥50 years), breast density (51% with heterogeneously or

extremely dense breast tissue based on mammography), and biopsy type (58% core

needle, 42% excisional). Three experienced breast pathologists interpreted each case

independently before arriving at a consensus reference diagnosis for each case using a

modified Delphi approach.23 Their diagnoses were categorized using the Breast

Pathology Assessment Tool and Hierarchy for Diagnosis (BPATH-Dx).1, 15 This tool

groups fourteen distinct diagnostic assessments into four main BPATH-Dx

categories: 1) benign without atypia (including non-proliferative and proliferative without

atypia); 2) atypia (e.g., atypical ductal hyperplasia); 3) DCIS; and 4) invasive carcinoma.

We oversampled cases of atypia and DCIS to improve the statistical precision of

accuracy estimates. Of the final 240 cases, 72 were benign without atypia (24 non-

proliferative and 48 proliferative without atypia), 72 were atypia, 73 were DCIS, and 23

were invasive breast cancer based on the reference consensus diagnoses.21 The cases

within each diagnostic category were randomly assigned to the four test sets using

stratification to achieve balance on patient age, breast density, biopsy type, and the

reference panelists’ difficulty rating.

Participating Pathologists

Pathologists from eight U.S. states (Alaska, Maine, Minnesota, New Hampshire,

New Mexico, Oregon, Vermont, and Washington) were invited to participate. Details

of their identification and recruitment have been described elsewhere.1, 24 Pathologists

were eligible if they had interpreted breast biopsies in the past year, planned to continue

for the next year, and were not residents or fellows in training. A web-based survey


queried participants about demographics, clinical practices, and interpretive

experience.20, 24

Pathologists were randomized to independently interpret one of the four test sets of 60

cases (each case was represented by a single glass slide) and they recorded their

interpretations using the online BPATH-Dx tool.1, 15 Participants also indicated whether

the case was borderline between two diagnoses and whether, in their usual clinical

practice, they would obtain a second opinion because it was required by laboratory policy,

because they personally desired one, or both. Participants also rated each case for

perceived diagnostic difficulty on a six-point Likert scale; cases rated 4, 5, or 6 were

classified as difficult.

Patient Involvement

No patients were involved in setting the research question or the outcome measures,

nor were they involved in developing plans for recruitment, design, or implementation of

the study. No patients were asked to advise on interpretation or writing up of results.

There are no plans to disseminate the results of the research to the relevant patient

community.

Protection of Human Research Subjects

The Institutional Review Boards of Dartmouth College (IRB approval #21926), Fred

Hutchinson Cancer Research Center (#6958), Providence Health & Services of Oregon

(#10-055A), University of Vermont (#M09-281), and University of Washington (#39631)

approved all study procedures. All participating pathologists signed an informed consent


form. Informed consent was not required of the women whose biopsy specimens were

included.

Definitions of Single Interpretation, Interpretations with Second Opinions, and Criteria for Obtaining Second Opinions

Single (initial) interpretation results, based on the categorical interpretation by each

participating pathologist of each case, have been reported previously.1 Interpretations

that incorporated second opinions were constructed by considering each possible pair of

pathologists who interpreted the same case and, when the pair disagreed, resolving the

disagreement with a third, independent interpretation. Resolution assigned the case to the

BPATH-Dx diagnostic category chosen by two of the three pathologists or, if all three

disagreed, to the middle diagnosis in the BPATH-Dx hierarchy (Figure 1).
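The resolution rule can be written in a few lines. The sketch below is illustrative only: it assumes the four main BPATH-Dx categories are coded as ordered integers (1 = benign without atypia through 4 = invasive carcinoma), and the function names are hypothetical. It also checks, over every possible triple, that taking the median of three ordered categories gives the same answer as a two-step rule of accepting the first two readers when they agree and otherwise consulting the third.

```python
from itertools import product

# Assumed ordinal coding of the four main BPATH-Dx categories (least to most severe).
BENIGN, ATYPIA, DCIS, INVASIVE = 1, 2, 3, 4
CATEGORIES = (BENIGN, ATYPIA, DCIS, INVASIVE)


def resolve_triple(first: int, second: int, third: int) -> int:
    """Majority category if two readers agree, else the middle category.

    For three ordinal values the median satisfies both rules at once.
    """
    return sorted((first, second, third))[1]


def resolve_sequential(first: int, second: int, third: int) -> int:
    """Accept the first two readers if they agree; otherwise consult a third."""
    if first == second:
        return first  # no third reading needed
    if third in (first, second):
        return third  # third reader sides with one of the first two
    # All three disagree: keep the category that is neither lowest nor highest.
    values = (first, second, third)
    return sum(values) - min(values) - max(values)


# The two formulations agree on all 4**3 = 64 possible triples.
assert all(resolve_triple(*t) == resolve_sequential(*t)
           for t in product(CATEGORIES, repeat=3))

print(resolve_triple(ATYPIA, DCIS, ATYPIA))    # 2: two of three chose atypia
print(resolve_triple(BENIGN, DCIS, INVASIVE))  # 3: all disagree, middle is DCIS
```

Using the median avoids special-casing ties: with an ordinal scale and exactly three readers, "majority if it exists, otherwise middle" and "median" are the same function.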

We evaluated twelve criteria for obtaining a second opinion, beginning with the strategy

where all cases received second opinions. We then evaluated eight selective

strategies in which a second opinion was obtained only when a criterion was met. The

criteria were based on the initial pathologist's diagnosis (i.e., second opinions were obtained

only for cases initially diagnosed as atypia/DCIS/invasive, DCIS/invasive, or invasive only),

on the initial pathologist's assessment of the case (i.e., only for cases marked borderline or

only for cases considered difficult), or on whether a second opinion would be required by

policy or desired by the pathologist (i.e., only when required, only when desired, or when

either required or desired).
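Each selective strategy amounts to a yes/no trigger evaluated on the initial reading, with the pair-plus-tiebreaker resolution applied only when the trigger fires. The sketch below encodes the all-cases strategy and the eight selective strategies that way; the `Reading` fields, the strategy labels, and the integer coding of BPATH-Dx categories are assumptions made for illustration, not the study's actual data layout.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumed ordinal coding of the main BPATH-Dx categories.
BENIGN, ATYPIA, DCIS, INVASIVE = 1, 2, 3, 4


@dataclass(frozen=True)
class Reading:
    """One pathologist's interpretation of one case (hypothetical record layout)."""
    diagnosis: int
    borderline: bool = False  # marked as borderline between two diagnoses
    difficult: bool = False   # difficulty rated 4-6 on the six-point scale
    required: bool = False    # second opinion required by laboratory policy
    desired: bool = False     # second opinion personally desired


Trigger = Callable[[Reading], bool]

# The "all cases" strategy plus the eight selective strategies described above.
STRATEGIES: Dict[str, Trigger] = {
    "all cases": lambda r: True,
    "atypia/DCIS/invasive": lambda r: r.diagnosis >= ATYPIA,
    "DCIS/invasive": lambda r: r.diagnosis >= DCIS,
    "invasive only": lambda r: r.diagnosis == INVASIVE,
    "borderline": lambda r: r.borderline,
    "difficult": lambda r: r.difficult,
    "required": lambda r: r.required,
    "desired": lambda r: r.desired,
    "required or desired": lambda r: r.required or r.desired,
}


def final_diagnosis(trigger: Trigger, first: Reading,
                    second: Reading, third: Reading) -> int:
    """Final category for one ordered triple of readers under one strategy."""
    if not trigger(first):
        return first.diagnosis  # criterion not met: single interpretation stands
    # Criterion met: majority of the three readers, or the middle category.
    return sorted((first.diagnosis, second.diagnosis, third.diagnosis))[1]


first = Reading(ATYPIA, borderline=True)
second = Reading(BENIGN)
third = Reading(BENIGN)
print(final_diagnosis(STRATEGIES["invasive only"], first, second, third))  # 2
print(final_diagnosis(STRATEGIES["borderline"], first, second, third))     # 1
```

Because the trigger looks only at the first reading, the same triple of readers can yield different final diagnoses under different strategies, as the two calls above show.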

Finally, we assessed how the clinical volume of the interpreting pathologist affected

diagnoses in three strategies with designs shaped by previous findings from the B-Path


study.1 These strategies included combinations of low- and high-volume pathologists

providing first, second, and third opinions, as needed, to resolve discordant diagnoses.

We defined low-to-average volume as <10 breast cases/week and high-volume as ≥10

cases/week.
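The volume-based strategies restrict which pathologists may occupy each reading position. A minimal sketch follows, assuming each pathologist's self-reported weekly volume is available. The panel is invented, and the pattern ("low", "high", "high") is one hypothetical encoding of a low-volume first reader with high-volume second and third readers; the exact set of three patterns used in the study is not reproduced here.

```python
from itertools import permutations
from typing import Dict, Iterator, Tuple

HIGH_VOLUME_CASES_PER_WEEK = 10  # >=10 breast cases/week counts as high volume


def volume_group(cases_per_week: float) -> str:
    """Classify a pathologist as 'low' (low-to-average) or 'high' volume."""
    return "high" if cases_per_week >= HIGH_VOLUME_CASES_PER_WEEK else "low"


def triples_matching(weekly_volume: Dict[str, float],
                     pattern: Tuple[str, str, str]) -> Iterator[Tuple[str, str, str]]:
    """Yield ordered triples of distinct pathologists whose first, second, and
    third readers fall in the volume groups given by `pattern`."""
    for triple in permutations(weekly_volume, 3):
        if tuple(volume_group(weekly_volume[p]) for p in triple) == pattern:
            yield triple


# Hypothetical panel of four pathologists and their weekly breast case volumes.
panel = {"P1": 4, "P2": 12, "P3": 15, "P4": 8}
low_then_high = list(triples_matching(panel, ("low", "high", "high")))
print(len(low_then_high))  # 4
```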

Statistical Analyses

We assessed rates of over-interpretation, under-interpretation, and overall

misclassification compared to the expert consensus reference diagnosis. Over-

interpretation was defined as cases classified by participants at a hierarchically more

severe diagnostic BPATH-Dx category relative to the reference diagnosis; under-

interpretation was defined as cases classified lower than the reference diagnosis

category; and misclassification was defined as cases either over-interpreted or under-

interpreted compared to the reference diagnosis category.
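Because the BPATH-Dx categories are ordered, all three outcome rates follow from comparing each final category with the reference category. The sketch below assumes the same integer coding (larger number = more severe) and weights every record equally, which is a simplification. It also makes explicit that overall misclassification is the sum of the over- and under-interpretation rates (e.g., 9.9% + 14.8% = 24.7% for single interpretations).

```python
from typing import Dict, Sequence


def interpretation_rates(final: Sequence[int],
                         reference: Sequence[int]) -> Dict[str, float]:
    """Over-, under-, and overall misclassification rates versus the reference.

    Categories are ordinal codes (higher = more severe); each record is
    weighted equally.
    """
    if len(final) != len(reference) or not final:
        raise ValueError("need equally long, non-empty sequences")
    n = len(final)
    over = sum(f > r for f, r in zip(final, reference)) / n
    under = sum(f < r for f, r in zip(final, reference)) / n
    return {"over": over, "under": under, "misclassified": over + under}


print(interpretation_rates([1, 2, 3, 4, 2], [1, 3, 3, 3, 2]))
# {'over': 0.2, 'under': 0.2, 'misclassified': 0.4}
```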

To simulate interpretations that involved obtaining second opinions, we combined the

independent interpretations of study pathologists. For each case, we created an ordered

data record of interpretations for every ordered triple of distinct pathologists who

interpreted the case and used the majority or median interpretation as the final assessment (Figure 1). This is

analytically equivalent to using the assessment of the first two pathologists if they agree

and using the third pathologist for resolution if they disagree. The advantage of creating

data records in this manner, resulting in 5,145,480 triple reader data records, is that the

correct relative weighting is provided to interpretations of cases where the first two

pathologist interpretations agree versus where they do not agree, while allowing us to

include data for all potential third readers rather than picking one third reader at random


from those available when a third reading was required. The 5,145,480 data records

resulted from 29 pathologists interpreting the 60 cases in test set A, 27 for test set B, 30

for test set C and 29 for test set D, yielding a total of

60×(29×28×27+27×26×25+30×29×28+29×28×27)=5,145,480 triple interpretations.
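The record count can be checked directly: for a test set read by k pathologists there are k(k-1)(k-2) ordered triples of distinct readers per case. The short Python check below (it assumes Python 3.8 or later for `math.perm`) reproduces the total and confirms the participant count of 115.

```python
from itertools import permutations
from math import perm

CASES_PER_SET = 60
# Number of participating pathologists who interpreted each test set.
READERS_PER_SET = {"A": 29, "B": 27, "C": 30, "D": 29}

# Ordered triples of distinct readers per case: k * (k - 1) * (k - 2) = perm(k, 3).
total_triples = sum(CASES_PER_SET * perm(k, 3) for k in READERS_PER_SET.values())

# Cross-check the closed form by brute-force enumeration for test set A.
assert sum(1 for _ in permutations(range(READERS_PER_SET["A"]), 3)) == perm(29, 3)

print(sum(READERS_PER_SET.values()))  # 115 pathologists
print(total_triples)                  # 5145480
```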

Figure 1 shows how the triple records were used in conjunction with different criteria for

procuring second opinions to arrive at final assessments. The final assessments were

compared with reference diagnoses to calculate rates of over-interpretation, under-

interpretation, and overall misclassification.

Confidence intervals for the over-interpretation, under-interpretation, and overall

misclassification rates were based on percentiles of the bootstrap distribution of each

rate, obtained by resampling pathologists 1,000 times. Second opinion interpretations

that included the same pathologist for second or third interpretations were discarded

from the bootstrapped estimates. P-values for the Wald test of a difference in rates

between the single pathologist and second opinion strategies were based on the

bootstrap standard error of the difference in rates. Kappa statistics and rates of

agreement between single interpretations were calculated from a simple cross

tabulation of all pairwise interpretations of the same cases. The computational burden of

analogous calculations for the 5,145,480 assessments involving second opinions was

prohibitive, so we instead paired the triple readings for each case with a random

permutation of the triple readings of the same case and, after excluding pairs where the

same reader was included in both triples, calculated agreement statistics. This procedure

was replicated 1,000 times, and average agreement and kappa statistics are reported.
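The pathologist-level percentile bootstrap can be sketched as follows for a single-interpretation rate. This is a simplified, hypothetical illustration: the data are invented, percentiles are taken as simple order statistics, and the Wald tests are not shown. For second opinion strategies, each replicate would additionally rebuild the ordered triples and drop those in which the same resampled pathologist appears more than once.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (final category, reference category)


def misclassification_rate(pairs: List[Pair]) -> float:
    return sum(d != r for d, r in pairs) / len(pairs)


def percentile_bootstrap_ci(by_pathologist: Dict[str, List[Pair]],
                            n_boot: int = 1000, alpha: float = 0.05,
                            seed: int = 0) -> Tuple[float, float]:
    """Percentile CI for a pooled misclassification rate, resampling pathologists.

    Whole pathologists (with all of their interpretations) are drawn with
    replacement, so within-pathologist correlation is kept in each replicate.
    """
    rng = random.Random(seed)
    ids = list(by_pathologist)
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(ids, k=len(ids))
        pooled = [pair for pid in sample for pair in by_pathologist[pid]]
        stats.append(misclassification_rate(pooled))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical data: pathologist i misclassifies i of their 10 cases (pooled rate 0.45).
data = {f"P{i}": [(2, 2)] * (10 - i) + [(3, 2)] * i for i in range(10)}
print(percentile_bootstrap_ci(data))
```

Resampling pathologists rather than individual interpretations reflects that the study's unit of replication is the reader; resampling interpretations would ignore the correlation among one pathologist's readings.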


All analyses were conducted using Stata statistical software, version 13.

Role of Funding Source

This work was supported by the National Cancer Institute. The content is solely the

responsibility of the authors and does not necessarily represent the views of the

National Cancer Institute or the National Institutes of Health.

RESULTS

Thirty-five percent of pathologists interpreted ≥10 breast cases weekly and were defined

as high-volume participants; 24% were affiliated with academic medical centers; and

22% reported that their colleagues considered them breast pathology experts (Table 1).

High-volume breast pathologists were more likely to report that they spent a greater

proportion of their clinical time interpreting breast cases and that their peers considered

them experts in breast pathology.

Among all 6,900 initial test case interpretations, the pathologists reported that

they desired second opinions for 35% (2,451/6,900). Figure 2 shows these results by the

pathologists' diagnosis of the case. The highest rate of desired second opinions (66%)

was for cases interpreted as atypia. Among interpretations for which a second opinion was

desired, participants noted that 71% (1,731/2,451) would not have been required by the

laboratory policies in their own clinical practices.

Rates of agreement with the reference diagnosis after single interpretations and under

different criteria for obtaining a second opinion based on characteristics of the initial

interpretation are shown in Table 2. For each strategy, the percentage of cases


requiring a second pathologist and requiring a third pathologist for resolution of

differences between the first two pathologists is also shown in Table 2. The highest

misclassification rate within diagnostic categories after single interpretation was for

cases of atypia (52.2%), followed by DCIS (15.9%), benign without atypia (12.9%), and

invasive breast cancer (3.9%).1

The overall misclassification rate for a single interpretation (24.7%; 95% CI: 23.6 to

25.8) was used to compare performance of the different second opinion strategies.

Among the strategies described in Table 2, the lowest overall misclassification rate

resulted when second opinions were obtained for all cases. In this strategy, the rates

decreased from 9.9% to 6.0% (95% CI: 4.7 to 7.5) for over-interpretation, from 14.8% to

12.1% (95% CI: 10.0 to 14.3) for under-interpretation, and from 24.7% to 18.1% (95%

CI: 16.1 to 20.0) for overall misclassification. The percentage of cases requiring a third

opinion for resolution of the diagnosis ranged from 3.7% for invasive carcinoma to

55.9% for atypia. The fraction of assessments where all three readers disagreed was

small: 5.1% overall; 2.0% for cases in the benign without atypia reference

category; 12.0% for cases with atypia; 3.1% for DCIS; and 0.1% for invasive carcinoma

cases. The between-pathologist agreement rate for single interpretations of the same

case was 70.4%, while the corresponding agreement rate for interpretations that

included second opinions was higher, at 79.3%. Corresponding kappa statistics were

0.579 and 0.706, respectively.
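For illustration, the agreement rate and kappa for single interpretations can be computed from a pooled cross-tabulation of all reader pairs on the same case, as described in the Methods. The sketch below is a reconstruction under stated assumptions: it uses an unweighted kappa on the symmetric pairwise table with invented example data, and it does not implement the random-permutation pairing used for interpretations that included second opinions.

```python
from collections import Counter
from itertools import permutations
from typing import List, Sequence, Tuple


def pairwise_agreement_and_kappa(case_readings: List[Sequence[int]]) -> Tuple[float, float]:
    """Agreement rate and kappa from the cross-tabulation of all reader pairs.

    case_readings holds, for each case, the categories assigned by the
    pathologists who read it. Every ordered pair of distinct readers of the
    same case contributes one cell count, so the table is symmetric.
    """
    table = Counter()
    for readings in case_readings:
        for a, b in permutations(readings, 2):
            table[(a, b)] += 1
    n = sum(table.values())
    rows, cols = Counter(), Counter()
    for (a, b), count in table.items():
        rows[a] += count
        cols[b] += count
    p_observed = sum(count for (a, b), count in table.items() if a == b) / n
    p_expected = sum(rows[c] * cols[c] for c in rows) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical example: three cases, each read by three pathologists.
cases = [[1, 1, 2], [3, 3, 3], [2, 4, 2]]
agreement, kappa = pairwise_agreement_and_kappa(cases)
print(round(agreement, 3), round(kappa, 3))  # 0.556 0.379
```

Kappa discounts the agreement expected by chance from the marginal category frequencies, which is why it is lower than the raw agreement rate in both the example and the reported results.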

Overall misclassification rates relative to the expert consensus reference diagnosis after

implementing each of the remaining strategies ranged from 19.2% to 23.9%. The only

strategy that did not yield a statistically significant improvement was obtaining a second


opinion exclusively for cases with initial interpretations of invasive breast cancer; the

overall misclassification rate was reduced from 24.7% on initial interpretation to 23.9%

(95% CI: 22.1% to 25.7%, p=0.25). In that scenario, only 10.4% of interpretations

required a second opinion, and very few (1.0-2.6%) were from reference diagnostic

categories other than invasive carcinoma.

As expected, the cases that were not classified as borderline, difficult, or needing a

second opinion had lower initial misclassification rates (Figure 3); however, the

misclassification rates for these cases were also reduced when a second opinion was

obtained, although the improvement was larger for cases that were classified as

borderline, difficult, or needing a second opinion.

The second opinion strategies in Table 3 are based on the initial pathologists’ weekly

breast pathology case volume. The overall misclassification rates after single

interpretations by pathologists with low weekly volume were 26.4% (95% CI: 25.1% to

27.8%) versus 21.5% (95% CI: 19.5% to 23.4%) for single interpretations by

pathologists with high weekly volume. These second opinion strategies all

demonstrated statistically significant reductions in misclassification rates (p<0.0001)

compared with single interpretations, with improvement also noted when the initial

pathologist had high weekly case volume. The lowest overall misclassification rate was

noted when both first and second opinions were obtained from high-volume pathologists

(14.3%, 95% CI: 10.9% to 18.0%). The greatest reduction in overall misclassification

rate was noted when the first pathologist was from the low-volume group and the

second and (if needed) third pathologists were from the high-volume group.


DISCUSSION

Main Findings

To our knowledge, this is the first study to examine the effects of twelve different strategies

for obtaining second opinions on breast biopsies. The results support the common

belief among clinicians that second opinions should ideally be sought from pathologists with

greater clinical experience, especially in diagnostic categories where the primary

reviewing pathologist is uncertain. All strategies demonstrated statistically significant

improvements in accuracy except when second opinions were only obtained for cases

with initial interpretations of invasive breast cancer. Improvements varied according to

diagnostic attributes of the cases and the pathologists’ clinical experience. Importantly,

none of the strategies completely eliminated diagnostic variability, especially for cases

of breast atypia, suggesting that approaches beyond obtaining a second opinion should

be investigated for these challenging cases.

The majority of pathology laboratories have policies requiring second opinions for cases

of invasive breast carcinoma, yet invasive cancer diagnoses already have high

diagnostic agreement among pathologists. The addition of a second opinion strategy

only for invasive breast cancer cases provided no statistically significant improvement;

however, this finding does not mean that there is no clinical value in ensuring the highest

level of accuracy for invasive carcinoma, given the risks and benefits of treatment.

Larger improvements were observed when second opinion strategies included cases

with initial diagnoses of atypia and DCIS, two diagnostic categories less frequently

included in laboratory policies mandating second opinions. However, even after applying


an array of strategies for obtaining second opinions, the misclassification rates for

atypia and DCIS remained high. In actual clinical practice, obtaining second opinions in

such diagnostically complex areas may, over time, promote intra-practice consensus by

highlighting diagnostic areas requiring education or expert consultation.

Importantly, a small proportional reduction in the over-interpretation of breast biopsies

may have a large absolute effect at a population level. Obtaining second opinions for all

cases with an initial diagnosis of atypia, DCIS, or invasive breast cancer substantially

reduced over-interpretation of benign cases without atypia; we observed a reduction in

over-interpretation of these cases from 12.9% with single interpretation to 6.0% with

second opinion.

Strengths and Limitations of the Study

This study has potential limitations. The pathologists’ interpretations were independent

and involved only a single slide per case, whereas in clinical practice there may be many

slides per case and a second pathologist may be informed of the initial interpretation. It

would be infeasible to design a study of second opinion strategies

where full clinical case material for 60 breast biopsies was inserted in a hidden fashion

into the day-to-day practice of more than 100 pathologists from very diverse clinical

practices. Knowledge of the initial pathologist’s interpretation may influence additional

opinions, and this should be studied. Interestingly, about half the pathologists reported

they typically blind the second reviewer to their initial diagnosis when seeking a second

opinion.20


The study cases were weighted to include more atypia, DCIS, and proliferative lesions,

such as usual hyperplasia, than typically observed in clinical practice, resulting in higher

overall misclassification rates than would be expected in routine practice. While the overall

misclassification rate is useful for comparing different criteria for obtaining second

opinions, the results within individual diagnostic categories are more relevant to clinical

practice given the weighting of the test cases.25 In addition, future studies should

consider differentiating DCIS grade and microinvasion.

It has been suggested that the true “gold standard” in assessment of the accuracy of a

pathology diagnosis is the clinical course of the disease.26 However, the natural history

is altered by diagnostic excision, clinical treatment, and heightened surveillance

following breast biopsy; thus, we defined our reference standard as the consensus

diagnosis of three experienced breast pathologists, a standard acceptable to most

women undergoing biopsy and their clinicians. The expert consensus panel reference

standard was selected after comparison to a reference standard that included

participant majority opinion.1, 27 The consensus reference standard does not necessarily

represent biologic truth but includes a peer review of applied diagnostic criteria.

Strengths of this study include the large number of participating pathologists (N=115),

each interpreting 60 cases from the full range of diagnostic categories, providing a total

of 5,145,480 group-level interpretations. We also assessed the impact of twelve

different criteria and standards for obtaining second opinions, including obtaining a

second opinion on all cases and on predefined subsets based on the initial

interpretation or on the pathologists' level of experience. In typical clinical practice,

providers often identify challenging cases and then solicit second opinions from the


most knowledgeable pathologist (i.e., a local expert). We simulated this by having high-volume

pathologists provide second opinions in some strategies. For many study cases,

participants indicated a desire for a second opinion prior to finalizing their diagnosis;

thus, pathologists are likely already obtaining second opinions for cases they encounter

in clinical practice, highlighting the relevance of our data.

Comparison with Other Studies

We suspect that patient outcomes will be improved by second opinions in clinical

practice, but this was not evaluated. Previous studies have noted consistent rates of

discrepant diagnoses uncovered by second opinion within surgical pathology in

general,28 and within breast pathology specifically, with second reviews reported to

identify clinically significant discrepancies in >10% of breast biopsy cases.3-8

Clinical and Policy Implications

Providing second opinions for all breast biopsy specimens or requiring that all

interpreting pathologists be experienced, high-volume clinicians may be infeasible

given the estimated millions of breast biopsies performed each year.29, 30 Our analysis, therefore,

presents strategies that may be more realistic for clinical practice. Possible barriers to

the adoption of second opinion strategies in clinical practice include workload

constraints,31 uncertainty regarding the impact of second opinions on clinical outcomes,

lack of readily available colleagues with expertise in breast pathology, limited

reimbursement by third-party payers, and concerns about treatment delay. Conversely,

the availability of digital whole-slide imaging may speed second opinions in the future

via telepathology.


While adopting a routine second opinion strategy for some or all breast biopsies may

seem daunting, the significant disagreement among practicing pathologists on breast

biopsy cases is concerning.1 The financial burden of this variability in pathology

diagnoses may be substantial,32 arising from unnecessary or incorrect treatment, lost

income, morbidity, and death.

Conclusion

In summary, breast biopsies are challenging to interpret1 and many pathologists are

seeking second opinions in clinical practice via informal routes.20 It might be time for

clinical support systems and payment structures to catch up with and better support

clinicians in their current practice. This study observed reductions in both over- and

under-interpretation of breast pathology when second opinion strategies were added

and we noted that pathologists desire second opinions in a substantial proportion of

breast biopsy cases. Improvement was observed regardless of whether a second

opinion was or was not desired by the initial pathologist and was most notable when the

initial interpretation was atypia or DCIS. The feasibility and cost of implementing specific

second opinion strategies in clinical practice need further consideration.


Acknowledgements:

The collection of cancer and vital status data used in this study was supported in part by

several state public health departments and cancer registries throughout the U.S. For a

full description of the Breast Cancer Surveillance Consortium (BCSC) sources see:

http://www.breastscreening.cancer.gov/work/acknowledgement.html.

Contributions:

All authors contributed to the overall conception and design of the study. JE wrote the

first draft of this manuscript. GL extracted the data. GL and MP performed the statistical

analyses. All authors contributed to the interpretation of results and drafting of the

manuscript. All authors read and approved the final manuscript. JE is the guarantor.

Data Sharing:

Details of how to obtain additional data from the study (e.g., statistical code, datasets)

are available from the corresponding author at [email protected].

Declaration of Interests:

This work was supported by the National Cancer Institute (R01 CA140560, R01

CA172343 and K05 CA104699) and by the National Cancer Institute-funded Breast

Cancer Surveillance Consortium (HHSN261201100031C). The content is solely the

responsibility of the authors and does not necessarily represent the views of the

National Cancer Institute or the National Institutes of Health.


All authors have completed the Unified Competing Interest form

at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding

author) and declare that (1) all authors have support from the National Cancer Institute

for the submitted work; (2) no authors have relationships with any company that might

have an interest in the submitted work in the previous 3 years; (3) their spouses,

partners, or children have no financial relationships that may be relevant to the

submitted work; and (4) no authors have non-financial interests that may be relevant to

the submitted work.


Transparency Declaration

The lead author, Joann Elmore, affirms that this manuscript is an honest, accurate, and

transparent account of the study being reported; that no important aspects of the study

have been omitted; and that any discrepancies from the study as planned (and, if

relevant, registered) have been explained.

Copyright

The Corresponding Author has the right to grant on behalf of all authors and does grant

on behalf of all authors, a worldwide license to the Publishers and its licensees in

perpetuity, in all forms, formats and media (whether known now or created in the

future), to i) publish, reproduce, distribute, display and store the Contribution, ii)

translate the Contribution into other languages, create adaptations, reprints, include

within collections and create summaries, extracts and/or, abstracts of the Contribution,

iii) create any other derivative work(s) based on the Contribution, iv) to exploit all

subsidiary rights in the Contribution, v) the inclusion of electronic links from the

Contribution to third party material where-ever it may be located; and, vi) license any

third party to do any or all of the above.


References

1. Elmore JG, Longton G, Carney PA, et al. Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 2015;313(11):1122-32.

2. Davidson NE, Rimm DL. Expertise vs evidence in assessment of breast biopsies: an atypical science. JAMA 2015;313(11):1109-10.

3. Khazai L, Middleton LP, Goktepe N, et al. Breast pathology second review identifies clinically significant discrepancies in over 10% of patients. J Surg Oncol 2015;111(2):192-7.

4. Kennecke HF, Speers CH, Ennis CA, et al. Impact of routine pathology review on treatment for node-negative breast cancer. J Clin Oncol 2012;30(18):2227-31.

5. Newman EA, Guest AB, Helvie MA, et al. Changes in surgical management resulting from case review at a breast cancer multidisciplinary tumor board. Cancer 2006;107(10):2346-51.

6. Marco V, Muntal T, García-Hernandez F, et al. Changes in breast cancer reports after pathology second opinion. Breast J 2014;20(3):295-301.

7. Romanoff AM, Cohen A, Schmidt H, et al. Breast pathology review: does it make a difference? Ann Surg Oncol 2014;21(11):3504-8.

8. Staradub VL, Messenger KA, Hao N, et al. Changes in breast cancer therapy because of pathology second opinions. Ann Surg Oncol 2002;9(10):982-7.

9. Frable W. Surgical pathology: second reviews, institutional reviews, audits, and correlations: what's out there? Error or diagnostic variation? Arch Pathol Lab Med 2006;130:620-4.

10. National Academies of Sciences, Engineering, and Medicine. To Err is Human: Building a Safer Health System. Washington, DC, 2000.

11. National Academies of Sciences, Engineering, and Medicine. Improving Diagnosis in Health Care. Washington, DC, 2015.

12. Hofvind S, Geller BM, Rosenberg RD, et al. Screening-detected breast cancers: discordant independent double reading in a population-based screening program. Radiology 2009;253(3):652-60.

13. Dinnes J, Moss S, Melia J, et al. Effectiveness and cost-effectiveness of double reading of mammograms in breast cancer screening: findings of a systematic review. Breast 2001;10(6):455-63.

14. Bleiweiss IJ, Raptis G. Look again: the importance of second opinions in breast pathology. J Clin Oncol 2012;30(18):2175-6.

15. Allison KH, Reisch LM, Carney PA, et al. Understanding diagnostic variability in breast pathology: lessons learned from an expert consensus review panel. Histopathology 2014;65(2):240-51.

16. Rosai J. Borderline epithelial lesions of the breast. Am J Surg Pathol 1991;15(3):209-21.

17. Schnitt SJ, Connolly JL, Tavassoli FA, et al. Interobserver reproducibility in the diagnosis of ductal proliferative breast lesions using standardized criteria. Am J Surg Pathol 1992;16(12):1133-43.

18. Nakhleh R, Bekeris L, Souers R, et al. Surgical pathology case reviews before sign-out: a College of American Pathologists Q-Probes study of 45 laboratories. Arch Pathol Lab Med 2010;134(5):740-3.

19. Tomaszewski JE, Bear HD, Connally JA, et al. Consensus conference on second opinions in diagnostic anatomic pathology. Who, what, and when. Am J Clin Pathol 2000;114(3):329-35.

20. Geller BM, Nelson HD, Carney PA, et al. Second opinion in breast pathology: policy, practice and perception. J Clin Pathol 2014;67(11):955-60.

21. Oster NV, Carney PA, Allison KH, et al. Development of a diagnostic test set to assess agreement in breast pathology: practical application of the Guidelines for Reporting Reliability and Agreement Studies (GRRAS). BMC Women's Health 2013;13(1):3.

22. Breast Cancer Surveillance Consortium. Available at: http://breastscreening.cancer.gov (accessed 1 June 2011).


23. Helmer O. The systematic use of expert judgment in operations research. Santa Monica, CA: The RAND Corporation, 1964.

24. Onega T, Weaver D, Geller B, et al. Digitized whole slides for breast pathology interpretation: current practices and perceptions. J Digit Imaging 2014;27(5):642-8.

25. Elmore JG, Nelson HD, Pepe MS, et al. Variability in pathologists' interpretations of individual breast biopsy slides: a population perspective. Ann Intern Med [Epub ahead of print 22 March 2016].

26. Manion E, Cohen MB, Weydert J. Mandatory second opinion in surgical pathology referral material: clinical consequences of major disagreements. Am J Surg Pathol 2008;32(5):732-7.

27. Elmore JG, Pepe MS, Weaver DL. Discordant interpretations of breast biopsy specimens by pathologists: reply. JAMA 2015;314(1):83-4.

28. Kronz JD, Westra WH, Epstein JI. Mandatory second opinion surgical pathology at a large referral hospital. Cancer 1999;86(11):2426-35.

29. Silverstein M, Recht A, Lagios MD, et al. Special report: consensus conference III. Image-detected breast cancer: state-of-the-art diagnosis and treatment. J Am Coll Surg 2009;209(4):504-20.

30. Silverstein M. Where’s the outrage? J Am Coll Surg 2009;208(1):78-9.

31. Tsung JS. Institutional pathology consultation. Am J Surg Pathol 2004;28(3):399-402.

32. Middleton LP, Feeley TW, Albright HW, et al. Second-opinion pathologic review is a patient safety mechanism that helps reduce error and decrease waste. J Oncol Pract 2014;10(4):275-80.


Table 1. Characteristics of participating pathologists (N=115) by reported weekly volume of breast cases.

Characteristic | Total, N (%) | Low-volume (<10 breast cases/week), N (%) | High-volume (≥10 breast cases/week), N (%) | P value¹
Total | 115 | 75 | 40 |

Demographics
Age at survey (years)
  33-39 | 16 (14) | 11 (15) | 5 (13) | 0.98
  40-49 | 41 (36) | 27 (36) | 14 (35) |
  50-59 | 42 (37) | 25 (33) | 17 (43) |
  60+ | 16 (14) | 12 (16) | 4 (10) |
Gender
  Female | 46 (40) | 28 (37) | 18 (45) | 0.42
  Male | 69 (60) | 47 (63) | 22 (55) |

Breast pathology experience
Fellowship training in breast pathology
  No | 109 (95) | 71 (95) | 38 (95) | 0.94
  Yes | 6 (5) | 4 (5) | 2 (5) |
Affiliation with academic medical center
  No | 87 (76) | 59 (79) | 28 (70) | 0.50
  Yes, adjunct/affiliated | 17 (15) | 9 (12) | 8 (20) |
  Yes, primary appointment | 11 (10) | 7 (9) | 4 (10) |
Breast pathology experience (years)
  0-4 | 22 (19) | 17 (23) | 5 (13) | 0.45
  5-9 | 23 (20) | 14 (19) | 9 (23) |
  10-19 | 34 (30) | 21 (28) | 13 (33) |
  20+ | 36 (31) | 23 (31) | 13 (33) |
Breast specimens as a proportion of total clinical caseload (% of total clinical work interpreting breast)
  0-9 | 59 (51) | 51 (68) | 8 (20) | <0.001
  10-24 | 45 (39) | 23 (31) | 22 (55) |
  25-49 | 8 (7) | 1 (1) | 7 (18) |
  ≥50 | 3 (3) | 0 (0) | 3 (8) |
Do your colleagues consider you an expert in breast pathology?
  No | 90 (78) | 67 (89) | 23 (58) | <0.001
  Yes | 25 (22) | 8 (11) | 17 (43) |

NOTE: Percentages may not sum to 100 due to rounding.

¹ P values are based on a Wilcoxon rank sum test for differences in age, breast pathology experience, and breast specimen composition of total caseload between low- and high-volume caseload groups. Otherwise, P values correspond to a Pearson chi-square test for a difference between caseload groups.
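Table 1 reports Pearson chi-square P values for its categorical characteristics. As a rough consistency check, the minimal pure-Python sketch below recomputes that test from the low- and high-volume counts. The function names are our own, no continuity correction is assumed, and the closed-form tail probability covers integer degrees of freedom only; it is an illustration, not the authors' analysis code.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df=1 tail plus correction terms sqrt(2x/pi) e^(-x/2) x^(j-1) / (1*3*...*(2j-1)).
    total = math.erfc(math.sqrt(half))
    if df > 1:
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
        total += term
        for k in range(3, df - 1, 2):
            term *= x / k
            total += term
    return total


def pearson_chi2(table: Sequence[Sequence[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence without continuity correction.

    Rows are categories of a characteristic; columns are caseload groups.
    Returns (statistic, degrees of freedom, P value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Counts (low-volume, high-volume) copied from Table 1.
gender = [[28, 18], [47, 22]]              # female, male
affiliation = [[59, 28], [9, 8], [7, 4]]   # none, adjunct/affiliated, primary
expert = [[67, 23], [8, 17]]               # not considered expert, considered expert

for name, counts in [("gender", gender), ("affiliation", affiliation), ("expert", expert)]:
    stat, df, p = pearson_chi2(counts)
    print(f"{name}: chi2={stat:.2f}, df={df}, P={p:.3g}")
```

Rounded to two decimals, this sketch gives P values in line with Table 1 (0.42 for gender, 0.50 for academic affiliation, and below 0.001 for perceived expertise), which suggests the uncorrected Pearson statistic; the ordinal rows use the Wilcoxon test instead and are not covered here.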


Table 2. Over-Interpretation, Under-Interpretation and Misclassification Rates of Single Interpretation Compared to a Reference Consensus Standard. Nine Second Opinion Strategies Are Shown Based on Characteristics of the Case as Assessed by the Initial Pathologist.

Rates (%) by reference consensus diagnosis
Strategy / measure | Benign | Atypia | DCIS | Invasive | Overall (95% CI) | P value¹

SINGLE INTERPRETATION AND SECOND OPINION APPLIED TO ALL CASES
Single interpretation
  % requiring 2nd opinion | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
  Over-interpretation % | 12.9 | 17.4 | 2.6 | - | 9.9 (9.0, 10.8) |
  Under-interpretation % | - | 34.7 | 13.3 | 3.9 | 14.8 (13.8, 15.9) |
  Misclassification % | 12.9 | 52.2 | 15.9 | 3.9 | 24.7 (23.6, 25.8) | n/a
Second opinion with resolution applied to all cases
  % requiring 2nd opinion | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
  % requiring 3rd opinion | 19.7 | 55.9 | 21.6 | 3.7 | 29.6 |
  Over-interpretation % | 8.4 | 11.1 | 0.6 | - | 6.0 (4.7, 7.5) |
  Under-interpretation % | - | 29.9 | 9.3 | 3.5 | 12.1 (10.0, 14.3) |
  Misclassification % | 8.4 | 40.9 | 9.9 | 3.5 | 18.1 (16.1, 20.0) | P<0.0001

CRITERION FOR OBTAINING SECOND OPINION BASED ON INITIAL DIAGNOSIS
Second opinion only for initial interpretations considered atypia, DCIS or invasive
  % requiring 2nd opinion | 12.9 | 65.3 | 93.7 | 99.6 | 61.5 |
  % requiring 3rd opinion | 10.4 | 36.5 | 17.8 | 3.3 | 19.8 |
Second opinion only for initial interpretations considered DCIS or invasive
  % requiring 2nd opinion | 3.2 | 17.4 | 86.7 | 99.6 | 42.1 |
  % requiring 3rd opinion | 2.9 | 13.1 | 12.2 | 3.3 | 8.8 |
Second opinion only for initial interpretations considered invasive
  % requiring 2nd opinion | 1.0 | 0.4 | 2.6 | 96.1 | 10.4 |
  % requiring 3rd opinion | 0.8 | 0.3 | 2.4 | 1.9 | 1.2 |
  Misclassification % | 12.5 | 51.9 | 13.6 | 4.7 | 23.9 (22.1, 25.7) | P=0.25

SECOND OPINION ONLY OBTAINED FOR CASES CONSIDERED BORDERLINE OR DIFFICULT
Second opinion obtained only for initial interpretations considered borderline
  % requiring 2nd opinion | 19.0 | 45.3 | 21.4 | 3.5 | 26.1 |
  % requiring 3rd opinion | 7.5 | 25.6 | 9.3 | 0.8 | 12.8 |
Second opinion obtained only for initial interpretations considered difficult
  % requiring 2nd opinion | 23.2 | 48.2 | 24.8 | 11.1 | 30.0 |
  % requiring 3rd opinion | 9.0 | 27.3 | 9.8 | 1.5 | 14.0 |

SECOND OPINION ONLY OBTAINED FOR CASES WHEN DESIRED OR REQUIRED BY POLICY OR BOTH
Second opinion only for cases when desired by pathologist
  % requiring 2nd opinion | 26.7 | 55.9 | 30.5 | 15.5 | 35.5 |
  % requiring 3rd opinion | 10.1 | 31.3 | 11.2 | 1.5 | 16.0 |
Second opinion only when required by policy
  % requiring 2nd opinion | 33.8 | 40.6 | 55.3 | 59.9 | 44.9 |
  % requiring 3rd opinion | 7.6 | 23.0 | 10.0 | 2.1 | 12.4 |
Second opinion only when desired or required by policy
  % requiring 2nd opinion | 54.0 | 80.4 | 75.5 | 69.8 | 70.0 |
  % requiring 3rd opinion | 14.9 | 45.3 | 18.0 | 3.0 | 23.8 |
  Misclassification % | 8.1 | 44.3 | 10.3 | 3.7 | 19.2 (17.3, 21.0) | P<0.0001

¹ P values are based on a Wald test for the difference in overall misclassification rates between the second opinion strategy and single pathologist interpretation. The test statistic uses the bootstrap standard error of the difference in rates.
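The P values in Tables 2 and 3 come from a Wald test whose standard error is estimated by bootstrap. A minimal sketch of that idea follows. It assumes paired 0/1 misclassification indicators for the same interpretations under single reading and under a second opinion strategy, and it assumes individual interpretations are the resampling unit; the authors' resampling unit (for example, pathologists or cases) is not stated here, so the function `wald_test_bootstrap` and its parameters are hypothetical.

```python
import math
import random
from statistics import fmean, stdev
from typing import Sequence


def wald_test_bootstrap(single: Sequence[int], strategy: Sequence[int],
                        n_boot: int = 2000, seed: int = 0) -> tuple[float, float, float]:
    """Wald test for a difference in misclassification rates (hypothetical helper).

    single[i] and strategy[i] are 0/1 misclassification indicators for the same
    unit i. Returns (difference in rates, bootstrap SE, two-sided P value).
    """
    if len(single) != len(strategy) or len(single) == 0:
        raise ValueError("inputs must be non-empty and paired")
    n = len(single)
    diff = fmean(strategy) - fmean(single)
    rng = random.Random(seed)  # fixed seed so results are reproducible
    boot_diffs = []
    for _ in range(n_boot):
        # Resample units with replacement, keeping each single/strategy pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_diffs.append(fmean([strategy[i] for i in idx]) - fmean([single[i] for i in idx]))
    se = stdev(boot_diffs)
    if se == 0:
        raise ValueError("bootstrap SE is zero; the Wald statistic is undefined")
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value from the standard normal
    return diff, se, p


# Toy data only (not study data): 100 interpretations, 25 misclassified by a
# single reader; the strategy corrects 7 of them and introduces no new errors.
single = [1] * 25 + [0] * 75
strategy = [0] * 7 + [1] * 18 + [0] * 75
diff, se, p = wald_test_bootstrap(single, strategy)
print(f"difference={diff:.3f}, SE={se:.4f}, P={p:.4f}")
```

Resampling pairs rather than the two arms separately keeps the correlation between a reading and its second-opinion result, which is why a paired bootstrap is a natural choice for this comparison.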


Table 3. Over-Interpretation, Under-Interpretation and Misclassification Rates of Three Second Opinion Strategies Based on Case Volume of Interpreting Pathologist Compared to a Reference Standard

Rates (%) by reference consensus diagnosis, for single interpretation and for the second opinion strategy
Measure | Benign | Atypia | DCIS | Invasive | Overall (95% CI) | P value³

Strategy 1. Low-volume pathologist: second opinion from low-volume pathologist, third opinion from high-volume pathologist
  Over-interpretation %, single interpretation | 14.4 | 17.9 | 2.8 | - | 10.5 (9.4, 11.7) |
  Over-interpretation %, second opinion strategy | 8.4 | 11.2 | 0.7 | - | 6.1 (4.7, 6.7) |
  Under-interpretation %, single interpretation | - | 36.9 | 14.5 | 4.4 | 15.9 (14.5, 17.4) |
  Under-interpretation %, second opinion strategy | - | 30.0 | 9.4 | 3.6 | 12.2 (10.1, 14.3) |
  Misclassification %, single interpretation | 14.4 | 54.8 | 17.2 | 4.4 | 26.4 (25.1, 27.8) |
  Misclassification %, second opinion strategy | 8.4 | 41.2 | 10.1 | 3.6 | 18.3 (16.2, 20.1) | P<0.0001

Strategy 2. Low-volume pathologist: second opinion from high-volume pathologist, third opinion from high-volume pathologist
  Over-interpretation %, single interpretation | 14.4 | 17.9 | 2.8 | - | 10.5 (9.4, 11.7) |
  Over-interpretation %, second opinion strategy | 7.3 | 11.0 | 0.4 | - | 5.6 (4.1, 7.6) |
  Under-interpretation %, single interpretation | - | 36.9 | 14.5 | 4.4 | 15.9 (14.5, 17.4) |
  Under-interpretation %, second opinion strategy | - | 26.8 | 7.9 | 3.1 | 10.7 (8.5, 13.0) |
  Misclassification %, single interpretation | 14.4 | 54.8 | 17.2 | 4.4 | 26.4 (25.1, 27.8) |
  Misclassification %, second opinion strategy | 7.3 | 37.8 | 8.2 | 3.1 | 16.3 (13.9, 18.7) | P<0.0001

Strategy 3. High-volume pathologist: second opinion from high-volume pathologist, third opinion from high-volume pathologist
  Over-interpretation %, single interpretation | 10.1 | 16.5 | 2.2 | - | 8.7 (7.1, 10.1) |
  Over-interpretation %, second opinion strategy | 6.1 | 10.4 | 0.2 | - | 5.0 (3.0, 8.2) |
  Under-interpretation %, single interpretation | - | 30.7 | 11.1 | 3.0 | 12.9 (11.3, 14.3) |
  Under-interpretation %, second opinion strategy | - | 23.7 | 6.4 | 2.5 | 9.3 (6.1, 12.8) |
  Misclassification %, single interpretation | 10.1 | 47.2 | 13.3 | 3.0 | 21.5 (19.5, 23.4) |
  Misclassification %, second opinion strategy | 6.1 | 34.1 | 6.6 | 2.5 | 14.3 (10.9, 18.0) | P<0.0001

Footnotes:

1. A high-volume pathologist is defined as a pathologist who reports interpreting an average of 10 or more breast cases per week; a low-volume pathologist reports 9 or fewer breast cases per week. The study sample included 75 low-volume and 40 high-volume pathologists.

2. Comparison of the overall misclassification across all reference diagnoses for single interpretation vs the specified second opinion strategy.

3. P values are based on a Wald test for the difference in overall misclassification rates between the second opinion strategy and single pathologist interpretation. The test statistic uses the bootstrap standard error of the difference in rates.
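Tables 2 and 3 score each interpretation against the reference consensus diagnosis. The sketch below illustrates that bookkeeping under an assumed ordinal scale benign < atypia < DCIS < invasive: over-interpretation means a more severe category than the reference, under-interpretation a less severe one, and misclassification is their sum. Under this scale a benign reference cannot be under-interpreted and an invasive reference cannot be over-interpreted, which is consistent with the "-" cells. The names `Dx`, `error_rates` and `error_rates_by_reference` are illustrative only.

```python
from enum import IntEnum
from typing import Iterable


class Dx(IntEnum):
    """Diagnostic categories in increasing order of severity (assumed ordering)."""
    BENIGN = 0
    ATYPIA = 1
    DCIS = 2
    INVASIVE = 3


def error_rates(pairs: Iterable[tuple[Dx, Dx]]) -> dict[str, float]:
    """Percent over-, under- and misclassified among (reference, interpretation) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("no interpretations supplied")
    over = sum(1 for ref, dx in pairs if dx > ref)    # more severe than the reference
    under = sum(1 for ref, dx in pairs if dx < ref)   # less severe than the reference
    return {
        "over": 100 * over / n,
        "under": 100 * under / n,
        "misclassification": 100 * (over + under) / n,
    }


def error_rates_by_reference(pairs: Iterable[tuple[Dx, Dx]]) -> dict[Dx, dict[str, float]]:
    """The same rates computed separately within each reference category present."""
    pairs = list(pairs)
    out = {}
    for ref in Dx:
        subset = [(r, d) for r, d in pairs if r == ref]
        if subset:
            out[ref] = error_rates(subset)
    return out


# Toy example (not study data): (reference diagnosis, pathologist's interpretation).
pairs = [(Dx.BENIGN, Dx.ATYPIA), (Dx.ATYPIA, Dx.BENIGN),
         (Dx.ATYPIA, Dx.ATYPIA), (Dx.DCIS, Dx.DCIS)]
print(error_rates(pairs))
print(error_rates_by_reference(pairs))
```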


Figure Legends

Figure 1. Determination of final biopsy interpretation when considering different policy strategies for obtaining a second opinion.

Figure 2. Percent of individual case assessments in which a second opinion was desired and/or would be required by policy in the pathologist's clinical practice, shown by the pathologist's diagnosis of the test case (N=115 pathologists, N=6,900 individual case assessments).

Figure 3. Percent of cases misclassified based on whether the initial pathologist indicated the case was borderline or difficult, or would have obtained a second opinion on the case (either desired or because of a policy at their laboratory). Results are shown for single interpretations and after a second opinion strategy is applied to these cases.

Figure 3.A. Indicated the case was borderline between two diagnoses (26% of 6,900 single interpretations) vs not borderline (74% of 6,900 interpretations).

Figure 3.B. Indicated the case was difficult (30% of 6,900 interpretations) vs not difficult (70% of 6,900 interpretations).

Figure 3.C. Second opinion required by policy or desired (70% of 6,900 interpretations) vs neither required by policy nor desired (30% of 6,900 interpretations).


Figure 1. Determination of final biopsy interpretation when considering different criteria for obtaining a second opinion.*

Flowchart: Start with the first pathologist's interpretation and ask whether the criterion for obtaining a second opinion is met.† If no, the final interpretation is the first pathologist's interpretation. If yes, a second pathologist interprets the case, and we ask whether the second interpretation agrees with the first. If yes, the final interpretation is the common interpretation of the first and second pathologists. If no, a third pathologist interprets the case, and the final interpretation is the majority interpretation if 2 of 3 pathologists agree, or the middle interpretation if all 3 disagree.

* Up to 3 pathologists may be needed to obtain a final interpretation. Data comprise 5,145,480 observations, each involving 3 independent pathologist interpretations of a single slide from a breast biopsy case, and are derived from 115 pathologists interpreting 60 cases each in 4 test sets.

† E.g., based on the first pathologist's interpretation or level of experience, or case characteristics.
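The decision rule in Figure 1 can be sketched in a few lines. This is a minimal illustration, assuming the ordinal scale benign < atypia < DCIS < invasive so that the "middle interpretation" is well defined when all three readings differ. The criterion is passed in as a boolean because Figure 1 allows it to depend on the first interpretation, the pathologist's experience, or case characteristics; the second and third readings are supplied up front but only used when the rule calls for them. The names `Dx` and `final_interpretation` are hypothetical.

```python
from collections import Counter
from enum import IntEnum


class Dx(IntEnum):
    """Diagnostic categories in increasing order of severity (assumed ordering)."""
    BENIGN = 0
    ATYPIA = 1
    DCIS = 2
    INVASIVE = 3


def final_interpretation(first: Dx, second: Dx, third: Dx, criterion_met: bool) -> Dx:
    """Resolve up to three independent readings following the Figure 1 flow."""
    if not criterion_met:
        return first                       # no second opinion: first reading stands
    if second == first:
        return first                       # first and second pathologists agree
    trio = [first, second, third]          # disagreement: third opinion is used
    label, count = Counter(trio).most_common(1)[0]
    if count >= 2:
        return label                       # majority interpretation (2 of 3 agree)
    return sorted(trio)[1]                 # all 3 disagree: middle category


# Example: criterion "second opinion only for initial interpretations considered
# DCIS or invasive", with three mutually discordant readings.
first, second, third = Dx.INVASIVE, Dx.BENIGN, Dx.DCIS
print(final_interpretation(first, second, third, criterion_met=first >= Dx.DCIS).name)
```

Because the three readings in a discordant trio are distinct values from four ordered categories, sorting them and taking the middle element always yields a single category.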


Figure 2


Figure 3A


Figure 3B


Figure 3C
