Clinical Trials in Full-Field Digital Mammography
John Lewin, MD
Since the development of the first prototype over a decade ago, full-field digital mammography (FFDM) has been touted as a technically superior way to detect breast cancer compared with screen-film mammography (SFM). Proving technical superiority, however, is easy compared with proving clinical superiority. Whereas the former requires only measurements on phantom images, the latter requires clinical trials to measure the interaction of the technology with real patients and the technologists and radiologists taking care of them. Starting in 1997, four clinical screening trials comparing FFDM with SFM have been conducted. The results of the trials are mixed, but with the publication of the largest and most recent trial, conducted by the American College of Radiology Imaging Network, there are finally statistically significant results showing FFDM to be clinically superior to SFM for specific groups of women.
Semin Breast Dis 9:87-91 © 2006 Elsevier Inc. All rights reserved.
KEYWORDS digital mammography, breast cancer, breast imaging, clinical trials
The first full-field digital mammography (FFDM) prototype device to be ready for clinical use was made by Fischer Imaging Corporation and tested on asymptomatic employee volunteers at their factory in Denver in August 1995. Although only seven employees were imaged, a cancer was detected in one volunteer from the images. This volunteer was less than 4 months out from her routine screen-film mammogram (SFM). It seemed that the promise of FFDM to detect cancers not visible on SFM, as predicted by a National Cancer Institute (NCI) consensus conference in 1990,1 would soon be realized.
The rationale for proposing FFDM to be better than SFM for detecting breast cancers rested on the technical superiority of FFDM in the areas of contrast resolution and dynamic range.2,3 Contrast resolution is a measure of the ability of an imaging system to distinguish different shades of gray. For mammography, it tells how well the system can depict different tissue densities. Dynamic range refers to the ability to maintain contrast resolution over a wide range of exposures. For mammography, this translates to being able to distinguish fine differences in density (and thereby demonstrate tissue detail) in both the dense and fatty parts of the breast. With SFM, the recording and display of the image are linked on the film. This inherently limits the dynamic range of the system. In the center of this range, SFM responds linearly to differences in exposure, maximizing contrast resolution. At both the low and high ends of the exposure range, corresponding respectively to the very lightest and darkest parts of the image, the contrast resolution drops, going to zero where the film is essentially clear (at the low end of exposure) or black (at the high end). FFDM decouples image recording and display. The detector records the image and a computer workstation displays it. Because the display task is separate, the detector can be made to respond linearly to a wide range of exposures, thus maximizing contrast resolution over the entire range of dense and fatty breast tissue. As a result, FFDM is able to “see through” dense breast tissue better than film mammography.
The task of breast cancer detection is a complex one, however, involving not only the mammography unit, but also the human beings who create the mammogram (technologists) and interpret it (radiologists). How well the technology interacts with these people is critical to its clinical performance. There is no way to accurately simulate the appearance of the wide range of normal and cancerous breast tissue present in the screening population. How, then, do we demonstrate whether FFDM “sees through” to the cancers as well as it does through normal dense breast tissue? Only with clinical trials can the clinical performance of FFDM be determined. In this article, the data from the four major clinical trials comparing the abilities of FFDM and SFM to detect breast cancers are presented.
Diversified Radiology of Colorado, Denver, CO.
Address reprint requests to John Lewin, MD, Diversified Radiology of Colorado, PC, 938 Bannock Street, Suite 300, Denver, CO 80204. E-mail: [email protected]
1092-4450/06/$-see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1053/j.sembd.2007.01.002
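The contrast-resolution and dynamic-range argument above can be illustrated with a toy model. The sketch below assumes a logistic (S-shaped) curve as a stand-in for a screen-film characteristic curve and an ideal detector whose signal is strictly proportional to exposure; every parameter value is invented for illustration and does not describe the film or detectors used in any of the trials. It compares how much response each system produces for the same small exposure difference at low, middle, and high exposure.

```python
import math

# Toy parameters: assumptions chosen for illustration, not measured values.
D_MIN, D_MAX = 0.2, 4.0           # optical density of nearly clear and nearly black film
LOG_E_MID, STEEPNESS = 1.5, 3.0   # centre and steepness of the S-curve (log10 exposure units)
DELTA = 0.01                      # small log-exposure step standing in for a subtle density difference


def film_density(log_e: float) -> float:
    """Logistic stand-in for a screen-film characteristic (density vs. log exposure) curve."""
    return D_MIN + (D_MAX - D_MIN) / (1.0 + math.exp(-STEEPNESS * (log_e - LOG_E_MID)))


def film_contrast(log_e: float) -> float:
    """Density difference recorded on film for a fixed small step in log exposure."""
    return film_density(log_e + DELTA) - film_density(log_e)


def digital_contrast(log_e: float) -> float:
    """Fractional signal difference for an ideal linear detector (signal proportional to exposure).

    The result equals 10**DELTA - 1 at every exposure level; how it is shown
    is left to the separate display step.
    """
    s1 = 10.0 ** log_e
    s2 = 10.0 ** (log_e + DELTA)
    return (s2 - s1) / s1


for label, log_e in [
    ("low exposure (dense tissue)", 0.0),
    ("mid-range", 1.5),
    ("high exposure (fatty tissue)", 3.0),
]:
    print(f"{label:30s} film dD = {film_contrast(log_e):.5f}  "
          f"digital dS/S = {digital_contrast(log_e):.5f}")
```

With these assumed values, the film's density step in the middle of the curve is more than twenty times larger than at either end, whereas the linear detector's fractional step is identical at all three exposures. The two columns are in different units; the point is only their shape across the exposure range.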
88 J. Lewin
The Colorado/Massachusetts Screening Trial
Study Design
The first screening trial comparing FFDM with SFM was conducted at the University of Colorado and University of Massachusetts medical schools between 1997 and 2000.4,5 The goal of the trial was to determine whether FFDM was better than, equivalent to, or worse than SFM in detecting breast cancer. To do this, the prototype FFDM system being tested was placed in the same room as a standard film system. Each subject would undergo both SFM and FFDM at the same visit, performed by the same technologist. The film and digital mammograms would then be interpreted independently by different radiologists, each blinded to the results of the other mammogram. The radiologists would switch modalities daily so that each interpreted approximately equal numbers of film and digital mammograms.
An accurate two-tailed comparison required that the two systems be treated equivalently within the trial. Failure to do so would introduce biases. For example, choosing subjects based on an abnormal clinical (film) mammogram would introduce an obvious selection bias. To avoid this, all asymptomatic women of screening age (40 years old and above) were eligible. Unfortunately, the enrollment criteria meant that the incidence of breast cancer in the trial would be essentially the same as that seen in clinical screening mammography, about 4 cancers per 1000 women screened.6-8 This low cancer incidence necessitated a large accrual to have enough cancers in the cohort to measure a difference in detection rate with statistical significance.
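The accrual problem described above is simple arithmetic. The sketch below assumes the round figure of 4 cancers per 1000 screening examinations quoted in the text; the target cancer counts are arbitrary examples, not the trial's actual design targets.

```python
import math

INCIDENCE_PER_1000 = 4.0  # approximate screening cancer rate quoted in the text


def expected_cancers(n_exams: int, incidence_per_1000: float = INCIDENCE_PER_1000) -> float:
    """Expected number of cancers among n_exams screening examinations."""
    return n_exams * incidence_per_1000 / 1000.0


def exams_needed(target_cancers: int, incidence_per_1000: float = INCIDENCE_PER_1000) -> int:
    """Smallest number of examinations expected to yield target_cancers cancers."""
    return math.ceil(target_cancers * 1000.0 / incidence_per_1000)


for target in (25, 50, 100):  # arbitrary example targets
    print(f"{target:4d} cancers -> about {exams_needed(target):,} examinations")
```

In a paired design, only the cancers seen on one modality but not the other can separate the two systems, so the number of informative cases is smaller still than these totals.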
Another potential bias in such a trial is verification bias. Verification bias occurs when a positive test from one modality is more likely to be proven true than a positive test from the other modality. For a screening mammography trial, avoiding verification bias requires that findings on both film and digital be treated equivalently, with appropriate imaging workup and biopsy when indicated. This design required that the experimental device, in this case the FFDM prototype, be allowed to affect clinical care in exactly the same manner as the standard clinical device, the clinical film mammography system against which it was being tested. This is not the usual design for trials of diagnostic devices, but it was implemented in this trial.
The prototype FFDM system used in this trial was made by General Electric Medical Systems (Milwaukee, WI). The system used an 18- × 23-cm detector incorporating an amorphous silicon thin-film transistor bonded to a cesium iodide crystal scintillator. Other than the detector and associated electronics, the system was identical to one of GE’s SFM units, the DMR. For this reason, the GE DMR was chosen as the film unit for the trial. The digital and film units had almost identical dimensions, allowing the same patient positioning to be used on both. The only noticeable difference was the thickness of the digital detector on the prototype, compared with the film cassette used on the DMR.
The workstation supplied with the prototype consisted of a Unix-based computer with two 21-inch CRT monitors with 1800- × 2300-pixel resolution. Although state of the art for the time, these monitors were much less bright and had lower resolution than the 2000- × 2500-pixel monitors used clinically today.9 Additionally, the workstation software was fairly crude, as it was written not by software engineers but by the physicists who had developed the detector and acquisition system.
Results
A total of 6768 paired examinations were conducted on 4521 women (women could re-enroll at the time of their next annual screening) over a 30-month period at the 2 institutions. A total of 2048 findings called on at least 1 modality led to 183 biopsies. Forty-two of the biopsies were positive for cancer. Nine of these cancers were detected only on the FFDM interpretation, 15 were detected only on the SFM interpretation, and 18 were detected on both interpretations. Eight additional cancers were detected clinically within a year of a negative mammogram in the study. These interval cancers were included in the sensitivity calculations. Thus, the sensitivity of film was 33/50 = 66%, whereas that of digital was only 27/50 = 54%. Using McNemar’s test, however, the difference in the proportion of cancers detected by the 2 methods was not statistically significant.
The recall rates, by contrast, differed significantly: 15.0% of SFM studies were recalled for additional views versus only 11.9% on FFDM. Both of these rates were unusually high compared with clinical practice.6-8,10-13 The positive predictive values of the two modalities were identical, reflecting that the trend toward higher sensitivity (more true positives) for film mammography came at the expense of more recalls (more false positives). Receiver operating characteristic (ROC) curves for the two modalities, constructed by having the readers give a likelihood-of-malignancy score for each finding, were not significantly different. The area under the ROC curve, a measure of overall accuracy, was greater for SFM than for FFDM, but the difference was not statistically significant.
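The sensitivity figures and the paired comparison can be recomputed from the counts given above: 9 digital-only, 15 film-only, 18 concordant, and 8 interval cancers. The article does not say which form of McNemar’s test was used; this sketch uses the exact (binomial) version on the discordant pairs, so its p-value is an illustration rather than the published value.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    Under the null hypothesis each discordant cancer is equally likely to be
    found by either modality, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Counts reported for the Colorado/Massachusetts trial
digital_only, film_only, both, interval = 9, 15, 18, 8
total = digital_only + film_only + both + interval  # 50 cancers in the denominator

sens_film = (film_only + both) / total        # 33/50
sens_digital = (digital_only + both) / total  # 27/50
p_value = mcnemar_exact_p(digital_only, film_only)

print(f"film {sens_film:.0%}, digital {sens_digital:.0%}, exact McNemar p = {p_value:.2f}")
```

Only the 24 discordant cancers enter the test. The concordant and interval cancers count equally for or against both modalities, which is why a 15-versus-9 split of discordant cancers falls well short of significance at this sample size.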
A secondary analysis was performed to attempt to determine the reason behind discordant readings between FFDM and SFM for each case in which a finding was called on only 1 modality. For all mammographic findings, the most common reasons were slight differences in positioning, causing overlapping normal tissue to simulate a possible mass on 1 modality but not the other, and small differences of opinion between the readers. For the 24 cancers called on only 1 modality, no single reason was notably more common than another. The only trend was that factors involving the interpretation (as opposed to the appearance of the cancer) played a role in about one-third of the cancers identified on SFM and missed on FFDM. These factors included differences of opinion, radiologist error, and workstation issues.
The suboptimal workstation, especially the software and the dim monitors, was considered a major limitation of the study. Also cited was the lack of an automated exposure setting on the digital prototype. The relative inexperience of the radiologists with FFDM, as compared with their years of SFM experience, was also felt to have contributed to the lower-than-expected performance of FFDM.
The Oslo I Study
Study Design
The Oslo I and Oslo II studies14-16 were conducted sequentially as part of the Norwegian National Screening Program from January 2000 through December 2001. The Oslo I trial was a paired trial similar in design to the Colorado/Massachusetts trial. Each subject received both SFM and FFDM screening mammograms at the same visit. The trial was conducted using the commercial system that resulted from the prototype used in the earlier trial (Senographe 2000D; GE Medical Systems, Milwaukee, WI). This unit was approved by the U.S. Food and Drug Administration (FDA) for clinical use in the United States in January 2000. Although the basic design of the commercial unit was the same as that of the prototype, it had some significant improvements, including the development of an automatic exposure system. The biggest improvements were made to the workstation, however, including the use of much brighter, higher resolution 2000- × 2500-pixel monitors. The workstation software was improved both from an ergonomic standpoint and in terms of image processing. The largest processing improvement was the implementation of an algorithm to brighten the areas of the mammogram near the periphery of the breast so that the entire breast could be viewed without adjusting the contrast and brightness of the digital image. Film mammograms were taken using a Siemens Mammomat rather than the GE DMR.
As opposed to the single reading of each mammogram in the Colorado/Massachusetts study, a practice typical of clinical mammography in the U.S., each mammogram in the Oslo trials was double read, with discordant reads being resolved in a consensus conference between the two readers. At this conference, the readers would decide whether or not to recall a patient. Old films were not available for the initial reads but were available for the consensus conference. This mode of practice is used in the Norwegian National Screening Program.
Results
A total of 3683 subjects were enrolled. There were 1054 positive readings, which resulted in 296 recalls after consensus conferences. From these recalls, 31 cancers were diagnosed. Unlike in the Colorado/Massachusetts trial, the recall rate for FFDM was actually higher than that for film (16.6% versus 12.0% before consensus; 4.6% versus 3.5% after consensus). The differences were not tested for statistical significance, however. As in the Colorado/Massachusetts trial, SFM detected more cancers than digital, but not at a statistically significant level. Three cancers were detected only on FFDM versus 8 detected only on SFM. Twenty cancers were detected on both modalities. The cancer detection rate was 7.6/1000 for SFM versus 6.2/1000 for FFDM. Relative sensitivities were 90% and 74%, respectively. Of note is that the cancer detection rate for each modality in the study was significantly higher than that of the Norwegian screening program as a whole. The cancer detection rate for clinical screening in Norway is only 4.0/1000. The cause for the difference is unknown. One can speculate that high-risk patients were more likely to enroll in the trial.
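The Oslo I detection rates and relative sensitivities follow directly from the counts above. In this sketch, relative sensitivity uses as its denominator the 31 cancers found by either modality, which reproduces the 90% and 74% figures; no interval cancers are included because the text reports none for this trial.

```python
def rate_per_1000(cancers: int, screened: int) -> float:
    """Cancer detection rate per 1000 women screened."""
    return 1000.0 * cancers / screened


# Counts reported for Oslo I (paired design: every woman had both examinations)
screened = 3683
film_only, digital_only, both = 8, 3, 20
found_by_either = film_only + digital_only + both  # 31

film_detected = film_only + both          # 28
digital_detected = digital_only + both    # 23

print(f"SFM:  {rate_per_1000(film_detected, screened):.1f}/1000, "
      f"relative sensitivity {film_detected / found_by_either:.0%}")
print(f"FFDM: {rate_per_1000(digital_detected, screened):.1f}/1000, "
      f"relative sensitivity {digital_detected / found_by_either:.0%}")
```

Because both rates share the same denominator of 3683 women, the comparison between modalities again comes down to the discordant cancers, here 8 versus 3.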
The Oslo II Trial
Study Design
Of the four screening trials conducted to test digital mammography, only the Oslo II trial16 had a nonpaired randomized design. In this trial, subjects from the Norwegian National Screening Program were randomized to receive either FFDM or SFM at the time of their annual or biennial screening mammogram. Because each subject underwent only one examination, the results of that examination, whether film or digital, were used to guide her clinical care. Thus, there is no verification bias with this design.
Results
A total of 25,263 subjects were enrolled. Of these, 17,911 underwent SFM and 6997 underwent FFDM screening studies. The cancer detection rate for FFDM was 5.9/1000 subjects (41 cancers total), whereas that for film was 4.1/1000 subjects (73 cancers total). The difference between cancer detection rates was not statistically significant. FFDM did have a significantly higher recall rate after consensus conference: 3.8% versus 2.7% for SFM. Before consensus conference, the recall rates were 24.3% and 17.4%, respectively; these values were not tested for statistical significance. Positive predictive values were not significantly different, at 21.6% for FFDM and 22.1% for SFM.
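Because Oslo II was not paired, its detection rates come from two separate groups of women, and a paired test such as McNemar’s no longer applies. A pooled two-proportion z-test is one standard way to compare independent rates; the sketch below applies it to the counts in the text. The published analysis may have used a different test, so treat the resulting p-value as illustrative.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p) via the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Counts reported for Oslo II (randomized, one examination per woman)
digital_cancers, digital_n = 41, 6997
film_cancers, film_n = 73, 17911

z, p = two_proportion_z(digital_cancers, digital_n, film_cancers, film_n)
print(f"FFDM {1000 * digital_cancers / digital_n:.1f}/1000 vs "
      f"SFM {1000 * film_cancers / film_n:.1f}/1000: z = {z:.2f}, p = {p:.3f}")
```

With these counts the z-test gives a p-value of roughly 0.06, consistent with the statement that the difference in detection rates was not statistically significant.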
ACRIN-DMIST
Study Design
Following publication of the interim results of the Colorado/Massachusetts trial, plans began for a larger trial to attempt to detect a difference between digital and film mammography. It was recognized that the Colorado/Massachusetts trial, although quite large compared with other radiology trials, lacked statistical power due to the low cancer incidence in a screening mammography population. It was also clear that FFDM technology advanced rapidly even in the 3 years it took to run the Colorado/Massachusetts trial. Multiple manufacturers now had working prototypes ready to test. These companies did not have the resources to conduct a large screening trial to test them. The American College of Radiology had just formed ACRIN, the American College of Radiology Imaging Network, to set up and oversee multi-institutional clinical trials in diagnostic imaging. ACRIN was looking for good trials to develop, and FFDM seemed an important technology to evaluate. The new trial was termed the Digital Mammographic Imaging Screening Trial (DMIST).
As with the Colorado/Massachusetts trial, the area under the ROC curve was chosen as the primary measure of performance, with sensitivity and recall rate as secondary outcomes. Power calculations indicated that about 50,000 subjects would be needed for a paired trial to be able to show a statistically significant difference in the area under the ROC curve.
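The ROC-based primary endpoint can be made concrete with a short sketch. DMIST readers scored each examination on an ordinal likelihood-of-malignancy scale; the function below computes the nonparametric (Mann-Whitney) estimate of the area under the empirical ROC curve from such scores, counting ties as one half. The ratings in the example are invented, and DMIST’s own analysis may have used a different (for example, fitted) ROC method.

```python
from typing import Sequence


def empirical_auc(cancer_scores: Sequence[int], normal_scores: Sequence[int]) -> float:
    """Nonparametric AUC: the probability that a randomly chosen cancer case is rated
    higher than a randomly chosen non-cancer case, with ties counted as one half.

    This equals the trapezoidal area under the empirical ROC curve traced out by
    using each rating level in turn as the threshold for a positive call.
    """
    if not cancer_scores or not normal_scores:
        raise ValueError("need at least one cancer and one non-cancer case")
    credit = 0.0
    for c in cancer_scores:
        for n in normal_scores:
            if c > n:
                credit += 1.0
            elif c == n:
                credit += 0.5
    return credit / (len(cancer_scores) * len(normal_scores))


# Hypothetical ratings on a 1-7 scale (illustration only, not trial data)
cancers = [7, 6, 6, 5, 3]
normals = [1, 1, 2, 2, 3, 4, 5, 1]
print(f"AUC = {empirical_auc(cancers, normals):.3f}")
```

An AUC of 0.5 means the ratings separate cancers from non-cancers no better than chance, and 1.0 means every cancer was rated above every non-cancer. The DMIST AUCs of 0.78 for digital and 0.74 for film sit between these extremes.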
A 25-million-dollar grant was obtained from the NCI to fund the trial. Four different manufacturers’ devices were chosen. Along with the GE system used in the Oslo studies, systems from Fischer Imaging, Hologic, and Fuji were chosen. At the time, the GE system was the only system approved by the FDA for clinical use in the United States. The Fischer system was a slot-scanning system very similar to the prototype tested on employees in 1995. It used a one-dimensional array of charge-coupled device (CCD) detector chips to scan across the breast. The Hologic system used a two-dimensional array of CCD chips coupled to fiber optic bundles. About one-third of the way through the trial, this system was replaced at all Hologic sites by an entirely different system, also made by Hologic. The newer system used an amorphous selenium flat-panel detector. The Fuji system was a computed radiography (CR) system, similar in concept to the systems used in many hospitals for general radiography. In CR mammography, plates containing phosphor screens are exposed in a standard film mammography unit and then carried to a reader that extracts the information and converts it to digital form. The Fuji system had been used clinically in Japan and Europe for a decade.
Twenty sites were initially chosen for the trial. Because patient accrual at these sites was less than projected, the number of sites was gradually increased to 33, with 2 sites accruing on 2 machines and thus acting as separate sites, for a total of 35. Seventeen of these sites were GE, 7 were Fischer, 4 were Hologic, 6 were Fuji, and 2 sites switched between Hologic and GE machines during the trial (one went from Hologic to GE and the other vice versa). Accrual started in October 2001 and lasted 25.5 months. Accrual was limited to asymptomatic women 40 and over presenting for screening mammography.17
Results
A total of 49,528 subjects were enrolled. Available for analysis after exclusions were 42,760. A total of 237 cancers were detected by mammography in the trial; 63 of these were detected only by FFDM, 52 only by SFM, and 122 by both. The design of the trial counted any cancer diagnosed within 455 days of the study mammogram, but not detected by it, as a miss. With this definition, there were 98 missed cancers. Most of these were found on the subsequent year’s screening mammogram and thus were found more than 365 days after study entry. To be consistent with other published studies, only the cancers diagnosed within 365 days were included for calculating sensitivities and specificities. The sensitivities for the detection of breast cancer of FFDM and SFM were calculated to be 70% and 66%, respectively. The difference between them was not statistically significant. Specificity was equal for both FFDM and SFM at 92%.18
The primary endpoint of the study was the area under the ROC curve, calculated from a seven-point rating scale assigned to each mammogram by the reader to indicate the likelihood of malignancy for that examination. The area under the curve (AUC) is a measure of the accuracy of the test and is affected by both sensitivity and specificity. The AUC was 0.78 for digital and 0.74 for film. This difference was not statistically significant.
Analyses were then performed for the subgroups of women under 50, pre- or peri-menopausal women, and women with dense breasts. This last group was defined as subjects whose breast density was subjectively rated in the top two categories of the American College of Radiology’s Breast Imaging Reporting and Data System (BI-RADS): “heterogeneously dense” or “extremely dense.” Because younger women are more likely than older women to be premenopausal and to have denser breasts, these groups overlap substantially. As in the entire cohort, FFDM performed better than SFM in each subgroup; unlike in the entire cohort, however, the difference in each subgroup was statistically significant. Table 1 gives the number of cancers detected by each modality and the AUC for the total cohort and each subgroup in the study.
Table 1 Results of ACRIN DMIST9
Cohort | No. Cancers Detected Only on Digital | No. Cancers Detected Only on Film | No. Cancers Detected on Both Modalities | No. Cancers Detected on Neither Modality | Area Under ROC Curve, Digital | Area Under ROC Curve, Film
All subjects | 63 | 52 | 122 | 98 | 0.78 ± 0.02 | 0.74 ± 0.02
<50 years old | 22 | 6 | 26 | 18 | 0.84 ± 0.03 | 0.69 ± 0.05
Pre- or peri-menopausal | 33 | 11 | 32 | 24 | 0.82 ± 0.03 | 0.67 ± 0.05
Heterogeneously or extremely dense breasts | 40 | 19 | 54 | 52 | 0.78 ± 0.03 | 0.68 ± 0.03
Conclusions
Scientific evaluation of FFDM for breast cancer screening is rendered difficult by both the low natural prevalence of disease and the variability from human factors in the acquisition and, especially, the interpretation of the test. These factors necessitate large numbers of subjects in clinical trials to obtain statistically significant results. The four major screening
trials of FFDM, summarized in Table 2, showed differing trends in breast cancer detection and accuracy, but none was able to show a statistically significant difference compared with SFM for the cohort studied. Only the extremely large (and relatively expensive) ACRIN DMIST trial, by analyzing subgroups, showed a statistically significant increase in detection and accuracy for FFDM. Additional analyses of the ACRIN DMIST data are pending, however, and other benefits, or limitations, of FFDM may emerge. In any case, digital mammography, used clinically for only 6 years, is in its infancy compared with film mammography, which has had four decades of use and improvement. It can be expected that technologic advances and clinical refinements will be rapid for FFDM, improving what currently appears to be a small and difficult-to-detect performance advantage over SFM.
Summary
There have been four clinical trials comparing FFDM to SFM for screening. These trials have produced mixed results. The ACRIN DMIST trial, however, has shown an advantage for FFDM in cancer detection in the subgroups of younger women and those with dense breasts.
Table 2 Comparison of Overall Results for Four Major Screening Trials
Trial | Cancer Detection/Sensitivity | Recall Rate/Specificity | Area Under the ROC Curve
Colorado/Massachusetts | Trend for film | Digital superior | Trend for film
Oslo I | Trend for film | Film superior† | N/A‡
Oslo II | Trend for digital | Film superior | N/A‡
DMIST–Total Cohort | Trend for digital | No difference | Trend for digital
DMIST–Subgroups* | Digital superior | No difference | Digital superior
*Subgroups are: <50 years old; pre-/peri-menopausal; heterogeneously or extremely dense breasts.
†Large trend not tested for significance in published paper.
‡ROC analysis not performed in the Oslo I and II studies.
References
1. Shtern F: Digital mammography and related technologies: a perspective from the National Cancer Institute. Radiology 183:629-630, 1992
2. Feig SA, Yaffe MJ: Current status of digital mammography. Semin Ultrasound CT MR 17:424-443, 1996
3. Suryanarayanan S, Karellas A, Vedantham S, et al: Flat-panel digital mammography system: contrast-detail comparison between screen-film radiographs and hard-copy images. Radiology 225:801-807, 2002
4. Lewin JM, Hendrick RE, D’Orsi CJ, et al: Comparison of full-field digital mammography to screen-film mammography for cancer detection: results of 4945 paired examinations. Radiology 218:873-880, 2001
5. Lewin JM, D’Orsi CJ, Hendrick RE, et al: Clinical comparison of full-field digital mammography to screen-film mammography for breast cancer detection. AJR Am J Roentgenol 179:671-677, 2002
6. Poplack SP, Tosteson AN, Grove MR, et al: Mammography in 53,803 women from the New Hampshire Mammography Network. Radiology 217:832-840, 2000
7. Rosenberg RD, Hunt WC, Williamson MR, et al: Effects of age, breast density, ethnicity, and estrogen replacement therapy on screening mammographic sensitivity and cancer stage at diagnosis: review of 183,134 screening mammograms in Albuquerque, New Mexico. Radiology 209:511-518, 1998
8. Mandelson MT, Oestreicher N, Porter PL, et al: Breast density as a predictor of mammographic detection: comparison of interval- and screen-detected cancers. J Natl Cancer Inst 92:1081-1087, 2000
9. Kim HH, Pisano ED, Cole EB, et al: Comparison of calcification specificity in digital mammography using soft-copy display versus screen-film mammography. AJR Am J Roentgenol 187:47-50, 2006
10. Bassett LW, Hendrick RE, Bassford TL, et al: Quality Determinants of Mammography. Clinical Practice Guideline No. 13. AHCPR Publication No. 95-0632. Rockville, MD: Agency for Health Care Policy and Research, Public Health Service, U.S. Dept of Health and Human Services, Oct 1994
11. Linver MV, Paster S, Rosenberg RD, et al: Improvement in mammography interpretation skills in a community radiology practice after dedicated teaching courses: 2-year medical audit of 38,633 cases. Radiology 184:39-43, 1992
12. Robertson CL: A private breast imaging practice: medical audit of 25,788 screening and 1077 diagnostic examinations. Radiology 187:75-79, 1993
13. Sickles EA: Quality assurance: how to audit your own mammography practice. Radiol Clin North Am 30:265-275, 1992
14. Skaane P, Young K, Skjennald A: Population-based mammography screening: comparison of screen-film and full-field digital mammography with soft-copy reading: Oslo I Study. Radiology 229:877-884, 2003
15. Skaane P, Skjennald A, Young K, et al: Follow-up and final results of the Oslo I Study comparing screen-film mammography and full-field digital mammography with soft-copy reading. Acta Radiol 46:679-689, 2005
16. Skaane P, Skjennald A: Screen-film mammography versus full-field digital mammography with soft-copy reading: randomized trial in a population-based screening program: the Oslo II Study. Radiology 232:197-204, 2004
17. Pisano ED, Gatsonis CA, Yaffe MJ, et al: American College of Radiology Imaging Network digital mammographic imaging screening trial: objectives and methodology. Radiology 236:404-412, 2005
18. Pisano ED, Gatsonis C, Hendrick E, et al: Digital Mammographic Imaging Screening Trial (DMIST) Investigators Group. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 353:1773-1783, 2005