Clinical Trials in Full-Field Digital Mammography


John Lewin, MD

Since the development of the first prototype over a decade ago, full-field digital mammography (FFDM) has been touted as a technically superior way to detect breast cancer compared with screen-film mammography (SFM). Proving technical superiority, however, is easy compared with proving clinical superiority. Whereas the former requires only measurements on phantom images, the latter requires clinical trials to measure the interaction of the technology with real patients and the technologists and radiologists taking care of them. Starting in 1997, four clinical screening trials comparing FFDM with SFM have been conducted. The results of the trials are mixed, but with the publication of the largest and most recent trial, conducted by the American College of Radiology Imaging Network, there are finally statistically significant results showing FFDM to be clinically superior to SFM for specific groups of women.

Semin Breast Dis 9:87-91 © 2006 Elsevier Inc. All rights reserved.

KEYWORDS digital mammography, breast cancer, breast imaging, clinical trials


The first full-field digital mammography (FFDM) prototype device to be ready for clinical use was made by Fischer Imaging Corporation and tested on asymptomatic employee volunteers at their factory in Denver in August 1995. Although only seven employees were imaged, a cancer was detected in one volunteer from the images. This volunteer was less than 4 months out from her routine screen-film mammogram (SFM). It seemed that the promise of FFDM to detect cancers not visible on SFM, as predicted by a National Cancer Institute (NCI) consensus conference in 1990 [1], would soon be realized.

The rationale for proposing FFDM to be better than SFM for detecting breast cancers rested on the technical superiority of FFDM in the areas of contrast resolution and dynamic range [2,3]. Contrast resolution is a measure of the ability of an imaging system to distinguish different shades of gray. For mammography, it tells how well the system can depict different tissue densities. Dynamic range refers to the ability to maintain contrast resolution over a wide range of exposures. For mammography, this translates to being able to distinguish fine differences in density (and thereby demonstrate tissue detail) in both the dense and fatty parts of the breast. With SFM, the recording and display of the image are linked on the film. This inherently limits the dynamic range of the system. In the center of this range, SFM responds linearly to differences in exposure, maximizing contrast resolution. At both the high and low parts of the range of exposures, corresponding respectively to the very darkest and lightest parts of the image, the contrast resolution drops, going to zero where the film is essentially clear (at the low end of exposure) or black (at the high end). FFDM decouples image recording and display. The detector records the image and a computer workstation displays it. Because the display task is separate, the detector can be made to respond linearly to a wide range of exposures, thus maximizing contrast resolution over the entire range of dense and fatty breast tissue. Thus, FFDM is able to “see through” dense breast tissue better than film mammography.

The task of breast cancer detection is a complex one, however, involving not only the mammography unit, but also the human beings who create the mammogram (technologists) and interpret it (radiologists). How well the technology interacts with the human is critical to its clinical performance. There is no way to accurately simulate the appearance of the wide ranges of normal and cancerous breast tissue present in the screening population. How do we then demonstrate whether FFDM “sees through” the cancers as well as the normal dense breast tissue? Only with clinical trials can the clinical performance of FFDM be determined. In this article, the data from the four major clinical trials comparing the abilities of FFDM versus SFM to detect breast cancers are presented.

Diversified Radiology of Colorado, Denver, CO.
Address reprint requests to John Lewin, MD, Diversified Radiology of Colorado, PC, 938 Bannock Street, Suite 300, Denver, CO 80204. E-mail: [email protected]
1092-4450/06/$-see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1053/j.sembd.2007.01.002

The Colorado/Massachusetts Screening Trial

Study Design

The first screening trial comparing FFDM with SFM was conducted at the University of Colorado and University of Massachusetts medical schools between 1997 and 2000 [4,5]. The goal of the trial was to determine whether FFDM was better than, equivalent to, or worse than SFM in detecting breast cancer. To do this, the prototype FFDM system being tested was placed in the same room as a standard film system. Each subject would undergo both SFM and FFDM at the same visit, performed by the same technologist. The film and digital mammograms would then be interpreted independently by different radiologists, each blinded to the results of the other mammogram. The radiologists would switch modalities daily so that each interpreted approximately equal numbers of film and digital mammograms.

An accurate two-tailed comparison required that the two systems be treated equivalently within the trial. Failure to do so would introduce biases. For example, choosing subjects based on an abnormal clinical (film) mammogram would introduce an obvious selection bias. To avoid this, all asymptomatic women of screening age (40 years old and above) were eligible. Unfortunately, the enrollment criteria meant that the incidence of breast cancer in the trial would be essentially the same as that seen in clinical screening mammography, about 4 cancers per 1000 women screened [6-8]. This low cancer incidence necessitated a large accrual to have enough cancers in the cohort to measure a difference in detection rate with statistical significance.
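As a rough illustration of why the low incidence drives trial size, the following Python sketch computes the number of cancers expected at the incidence quoted above (about 4 per 1000) and the accrual needed to reach a given number of cancers. The function names and the target of 200 cancers are illustrative assumptions, not figures taken from any trial protocol.

```python
import math

INCIDENCE_PER_1000 = 4.0  # approximate screening incidence quoted in the text


def expected_cancers(n_screened, incidence_per_1000=INCIDENCE_PER_1000):
    """Expected number of cancers among n_screened screening examinations."""
    return n_screened * incidence_per_1000 / 1000.0


def screens_needed(target_cancers, incidence_per_1000=INCIDENCE_PER_1000):
    """Smallest number of examinations expected to yield target_cancers."""
    return math.ceil(target_cancers * 1000.0 / incidence_per_1000)


if __name__ == "__main__":
    # 10,000 screening examinations at 4 per 1000
    print(expected_cancers(10_000))
    # hypothetical target of 200 cancers in the cohort
    print(screens_needed(200))
```

Under these assumptions, even a cohort of several thousand women yields only a few dozen cancers, which is the practical reason screening trials of this kind must be so large.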

Another potential bias in such a trial is verification bias. Verification bias occurs when a positive test from one modality is more likely to be proven true than a positive test from the other modality. For a screening mammography trial, avoiding verification bias requires that findings on both film and digital are treated equivalently, with appropriate imaging workup and biopsy when indicated. This design required that the experimental device, in this case the FFDM prototype, be allowed to affect clinical care in exactly the same manner as the standard clinical device, the clinical film mammography system against which it was being tested. This is not the usual design for trials of diagnostic devices, but was implemented in this trial.

The prototype FFDM system used in this trial was made by General Electric Medical Systems (Milwaukee, WI). The system used an 18- × 23-cm detector incorporating an amorphous silicon thin-film transistor bonded to a cesium iodide crystal scintillator. Other than the detector and associated electronics, the system was identical to one of GE’s SFM units, the DMR. For this reason, the GE DMR was chosen as the film unit for the trial. The digital and film units had almost identical dimensions, allowing the same patient positioning to be used on both. The only noticeable difference was the thickness of the digital detector on the prototype, compared with the film cassette used on the DMR.

The workstation supplied with the prototype consisted of a Unix-based computer with two 21-inch CRT monitors with 1800- × 2300-pixel resolution. Although state of the art for the time, these monitors were much less bright and had lower resolution than the 2000- × 2500-pixel monitors used clinically today [9]. Additionally, the workstation software was fairly crude, as it was not written by software engineers, but by the physicists who had developed the detector and acquisition system.

Results

A total of 6768 paired examinations were conducted on 4521 women (women could re-enroll at the time of their next annual screening) over a 30-month period at the 2 institutions. A total of 2048 findings called on at least 1 modality led to 183 biopsies. Forty-two of the biopsies were positive for cancer. Nine of these cancers were detected only on the FFDM interpretation, 15 were detected only on the SFM interpretation, and 18 were detected on both interpretations. Eight additional cancers were detected clinically within a year of a negative mammogram in the study. These interval cancers were included in the sensitivity calculations. Thus, the sensitivity of film was 33/50 = 66%, whereas that of digital was only 27/50 = 54%. Using McNemar’s test, however, the difference in the proportion of cancers detected by the 2 methods was not statistically significant.

There was a significant difference in recall rate, however, with 15.0% of SFM studies being recalled for additional views versus only 11.9% on FFDM. Both of these rates were unusually high compared with clinical practice [6-8,10-13]. The positive predictive values of the two modalities were identical, reflecting that the trend toward higher sensitivity/true positives for film mammography came at the expense of more recalls/false positives. Receiver operating characteristic (ROC) curves for the two modalities, constructed by having the readers give a likelihood of malignancy score for each finding, were not significantly different. The area under the ROC curve, a measure of overall accuracy, was greater for SFM than for FFDM, but the difference was not statistically significant.
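The paired comparison of cancer detection can be illustrated with a short sketch of McNemar’s test, which uses only the discordant pairs reported above (9 cancers seen only on FFDM, 15 seen only on SFM). The sketch assumes the two-sided exact (binomial) form of the test; the published analysis may have used a different variant, and the function name is hypothetical.

```python
from math import comb


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant case is equally likely to
    favor either modality, so the smaller count follows Binomial(n, 0.5).
    """
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)


if __name__ == "__main__":
    # Discordant cancers in the Colorado/Massachusetts trial
    p_value = mcnemar_exact(only_a=9, only_b=15)
    print(f"two-sided exact p = {p_value:.2f}")
```

With these counts the p-value comes out at roughly 0.3, in line with the statement that the difference in detection was not statistically significant.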

A secondary analysis was performed to attempt to determine the reason behind discordant readings between FFDM and SFM for each case in which a finding was called on only 1 modality. For all mammographic findings, the most common reasons were slight differences in positioning, causing overlapping normal tissue to simulate a possible mass on 1 modality but not the other, and small differences of opinion between the readers. For the 24 cancers called on only 1 modality, no single reason was notably more common than another. The only trend was that factors involving the interpretation (as opposed to the appearance of the cancer) played a role in about one-third of the cancers identified on SFM and missed on FFDM. These factors included differences of opinion, radiologist error, and workstation issues.

The suboptimal workstation, especially the software and the dim monitors, was considered a major limitation of the study. Also cited was the lack of an automated exposure setting on the digital prototype. The relative inexperience of the radiologists with FFDM, as compared with their years of SFM experience, was also felt to have contributed to the less than expected performance of FFDM.

The Oslo I Study

Study Design

The Oslo I and Oslo II studies [14-16] were conducted sequentially as part of the Norwegian National Screening Program from January 2000 through December 2001. The Oslo I trial was a paired trial similar in design to the Colorado/Massachusetts trial. Each subject received both SFM and FFDM screening mammograms at the same visit. The trial was conducted using the commercial system which resulted from the prototype used in the earlier trial (Senographe 2000D; GE Medical Systems, Milwaukee, WI). This unit was approved by the U.S. Food and Drug Administration (FDA) for clinical use in the United States in January 2000. Although the basic design of the commercial unit was the same as that of the prototype, it had some significant improvements, including the development of an automatic exposure system. The biggest improvements were made to the workstation, however, including the use of much brighter, higher resolution 2000- × 2500-pixel monitors. The workstation software was improved both from an ergonomic standpoint and in terms of image processing. The largest processing improvement was the implementation of an algorithm to brighten up the areas of the mammogram near the periphery of the breast so that the entire breast could be viewed without adjusting the contrast and brightness of the digital image. Film mammograms were taken using a Siemens Mammomat rather than the GE DMR.

As opposed to the single reading of each mammogram in the Colorado/Massachusetts study, a practice typical of clinical mammography in the U.S., each mammogram in the Oslo trials was double read, with discordant reads being resolved in a consensus conference between the two readers. At this conference, the readers would decide whether or not to recall a patient. Old films were not available for the initial reads but were available for the consensus conference. This mode of practice is used in the Norwegian National Screening Program.

Results

A total of 3683 subjects were enrolled. There were 1054 positive readings, which resulted in 296 recalls after consensus conferences. From these recalls, 31 cancers were diagnosed. Unlike the Colorado/Massachusetts trial, the recall rate for FFDM was actually higher than that for film (16.6% versus 12.0% before consensus; 4.6% versus 3.5% after consensus). The differences were not tested for statistical significance, however. Like the Colorado/Massachusetts trial, SFM detected more cancers than digital, but not at a statistically significant level. Three cancers were detected only on FFDM versus 8 detected only on SFM. Twenty cancers were detected on both modalities. The cancer detection rate was 7.6/1000 for SFM versus 6.2/1000 for FFDM. Relative sensitivities were 90% and 74%, respectively. Of note is that the cancer detection rate for each modality in the study was significantly higher than that of the Norwegian screening program as a whole. The cancer detection rate for clinical screening in Norway is only 4.0/1000. The cause for the difference is unknown. One can speculate that high-risk patients were more likely to enroll in the trial.
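The detection rates and relative sensitivities quoted for Oslo I follow directly from the reported counts. The sketch below reproduces that arithmetic; the variable and function names are illustrative, and "relative sensitivity" is assumed here to mean the share of all 31 diagnosed cancers that each modality detected.

```python
ENROLLED = 3683
ONLY_FFDM, ONLY_SFM, BOTH = 3, 8, 20  # cancers by detecting modality


def per_1000(cancers, screened):
    """Cancer detection rate per 1000 women screened."""
    return 1000.0 * cancers / screened


sfm_detected = ONLY_SFM + BOTH            # cancers seen on film
ffdm_detected = ONLY_FFDM + BOTH          # cancers seen on digital
all_cancers = ONLY_FFDM + ONLY_SFM + BOTH  # every cancer found in the trial

sfm_rate = per_1000(sfm_detected, ENROLLED)
ffdm_rate = per_1000(ffdm_detected, ENROLLED)
sfm_rel_sens = 100.0 * sfm_detected / all_cancers
ffdm_rel_sens = 100.0 * ffdm_detected / all_cancers

if __name__ == "__main__":
    print(f"SFM:  {sfm_rate:.1f}/1000, relative sensitivity {sfm_rel_sens:.0f}%")
    print(f"FFDM: {ffdm_rate:.1f}/1000, relative sensitivity {ffdm_rel_sens:.0f}%")
```

Because the denominator counts only cancers found by at least one modality in the trial, these relative sensitivities overstate true sensitivity, which would also have to account for interval cancers.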

The Oslo II Trial

Study Design

Of the four screening trials conducted to test digital mammography, only the Oslo II trial [16] had a nonpaired randomized design. In this trial, subjects from the Norwegian National Screening Program were randomized to receive either FFDM or SFM at the time of their annual or bi-annual screening mammogram. Because each subject underwent only one examination, the results of that examination, whether it was film or digital, were used to guide their clinical care. Thus, there is no verification bias with this design.

Results

A total of 25,263 subjects were enrolled. Of these, 17,911 underwent SFM and 6997 underwent FFDM screening studies. The cancer detection rate for FFDM was 5.9/1000 subjects (41 cancers total), whereas that for film was 4.1/1000 subjects (73 cancers total). The difference between cancer detection rates was not statistically significant. FFDM did have a significantly higher recall rate after consensus conference of 3.8% versus 2.7% for SFM. The recall rates were 24.3% and 17.4%, respectively, before consensus conference. These values were not tested for statistical significance. Positive predictive values were not significantly different at 21.6% for FFDM and 22.1% for SFM.
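Because Oslo II was unpaired, the two detection rates are compared as independent proportions. The sketch below applies a pooled two-proportion z-test to the counts given above (41 of 6997 for FFDM, 73 of 17,911 for SFM). This is an illustrative, unadjusted test chosen for simplicity; it is not necessarily the method used in the published analysis, and the function name is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / std_err
    # Two-sided tail area of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    z, p_value = two_proportion_z(41, 6997, 73, 17911)
    print(f"FFDM rate: {1000 * 41 / 6997:.1f}/1000")
    print(f"SFM rate:  {1000 * 73 / 17911:.1f}/1000")
    print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

On these counts the p-value lands just above the conventional 0.05 threshold, which fits the description of a trend toward digital that did not reach statistical significance.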

ACRIN-DMIST

Study Design

Following publication of the interim results of the Colorado/Massachusetts trial, plans began for a larger trial to attempt to detect a difference between digital and film mammography. It was recognized that the Colorado/Massachusetts trial, although quite large compared with other radiology trials, lacked statistical power due to the low cancer incidence in a screening mammography population. It was also clear that FFDM technology had advanced rapidly even in the 3 years it took to run the Colorado/Massachusetts trial. Multiple manufacturers now had working prototypes ready to test. These companies did not have the resources to conduct a large screening trial to test them. The American College of Radiology had just formed ACRIN, the American College of Radiology Imaging Network, to set up and oversee multi-institutional clinical trials in diagnostic imaging. ACRIN was looking for good trials to develop and FFDM seemed an important technology to evaluate. The new trial was termed the Digital Mammographic Imaging Screening Trial (DMIST).


As with the Colorado/Massachusetts trial, the area under the ROC curve was chosen as the primary measure of performance, with sensitivity and recall rate as secondary outcomes. Power calculations indicated that about 50,000 subjects would be needed for a paired trial to be able to show a statistically significant difference in the area under the ROC curve.
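To make the primary endpoint concrete, the sketch below computes an empirical area under the ROC curve from ordinal likelihood-of-malignancy ratings, such as the seven-point scale used in DMIST. It uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen cancer case is rated higher than a randomly chosen non-cancer case, with ties counted as one half. The ratings are invented for illustration, and the trial's actual ROC methodology may differ.

```python
def empirical_auc(cancer_scores, normal_scores):
    """Empirical AUC: P(cancer score > normal score) + 0.5 * P(tie)."""
    if not cancer_scores or not normal_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for c in cancer_scores:
        for n in normal_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_scores) * len(normal_scores))


if __name__ == "__main__":
    # Hypothetical ratings on a 1-7 scale (7 = most suspicious)
    cancers = [7, 6, 4, 2]
    normals = [1, 1, 2, 3, 5]
    print(f"AUC = {empirical_auc(cancers, normals):.3f}")
```

An AUC of 0.5 corresponds to chance performance and 1.0 to perfect separation, so the measure summarizes sensitivity and specificity across every possible rating threshold rather than at a single recall cutoff.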

A 25 million-dollar grant was obtained from the NCI to fund the trial. Four different manufacturers’ devices were chosen. Along with the GE system used in the Oslo studies, systems from Fischer Imaging, Hologic, and Fuji were chosen. At the time, the GE system was the only system approved by the FDA for clinical use in the United States. The Fischer system was a slot scanning system very similar to the prototype tested on employees in 1995. It used a one-dimensional array of charge-coupled device (CCD) detector chips to scan across the breast. The Hologic system used a two-dimensional array of CCD chips coupled to fiber optic bundles. About one-third of the way through the trial, this system was replaced at all Hologic sites by an entirely different system, also made by Hologic. The newer system used an amorphous selenium flat-panel detector. The Fuji system was a computed radiography (CR) system, similar in concept to the systems used in many hospitals for general radiography. In CR mammography, plates containing phosphor screens are exposed in a standard film mammography unit and then carried to a reader which extracts the information and converts it to digital form. The Fuji system had been used clinically in Japan and Europe for a decade.

Twenty sites were initially chosen for the trial. Because patient accrual at these sites was less than projected, the number of sites was gradually increased to 33, with 2 sites accruing on 2 machines and thus acting as separate sites, for a total of 35. Seventeen of these sites were GE, 7 were Fischer, 4 were Hologic, 6 were Fuji, and 2 sites switched between Hologic and GE machines during the trial (one went from Hologic to GE and the other vice versa). Accrual started in October 2001 and lasted 25.5 months. Accrual was limited to asymptomatic women 40 and over presenting for screening mammography [17].

Results

A total of 49,528 subjects were enrolled. Available for analysis after exclusions were 42,760. A total of 237 cancers were detected by mammography in the trial; 63 of these were detected only by FFDM, 52 were detected only by SFM, and 122 were detected by both. The design of the trial counted any cancer that was not detected by either modality but was diagnosed within 455 days of the study mammogram as a miss. With this definition, there were 98 missed cancers. Most of these were found on the subsequent year’s screening mammogram and thus were found more than 365 days after study entry. To be consistent with other published studies, only the cancers diagnosed within 365 days were included for calculating sensitivities and specificities. The sensitivities of FFDM and SFM for the detection of breast cancer were calculated to be 70% and 66%, respectively. This difference was not statistically significant. Specificity was equal for both FFDM and SFM at 92% [18].

The primary endpoint of the study was the area under a ROC curve, calculated from a seven-point rating scale assigned to each mammogram by the reader to indicate the likelihood of malignancy for that examination. The area under the curve (AUC) is a measure of the accuracy of the test, and is affected by both sensitivity and specificity. The AUC was 0.78 for digital and 0.74 for film. This difference was not statistically significant.

Analyses were then performed for the subgroups of women under 50, pre- or peri-menopausal women, and women with dense breasts. This last group was defined as subjects whose breast density was subjectively rated in the top two categories in the American College of Radiology’s Breast Imaging Reporting and Data System (BIRADS): “heterogeneously dense” or “extremely dense.” Since younger women are more likely to be premenopausal and to have denser breasts than older women, these groups have large overlap. As in the entire cohort, FFDM performed better than SFM in each subgroup. In the subgroups, however, the difference was statistically significant. Table 1 gives the number of cancers detected by each modality and the AUC for the total cohort and each subgroup in the study.

Table 1 Results of ACRIN DMIST [9]

Cohort                                      Digital only  Film only  Both  Neither  AUC, digital  AUC, film
All subjects                                63            52         122   98       0.78 ± 0.02   0.74 ± 0.02
<50 years old                               22            6          26    18       0.84 ± 0.03   0.69 ± 0.05
Pre- or peri-menopausal                     33            11         32    24       0.82 ± 0.03   0.67 ± 0.05
Heterogeneously or extremely dense breasts  40            19         54    52       0.78 ± 0.03   0.68 ± 0.03

Columns 2 to 5 give the number of cancers detected only on digital, only on film, on both modalities, and on neither modality. AUC = area under the ROC curve.

Conclusions

Scientific evaluation of FFDM for breast cancer screening is rendered difficult by both the low natural prevalence of disease and the variability from human factors in the acquisition and, especially, the interpretation of the test. These factors necessitate large numbers of subjects in clinical trials to obtain statistically significant results. The four major screening trials of FFDM, summarized in Table 2, showed differing trends in breast cancer detection and accuracy, but none were able to show a statistically significant difference compared with SFM for the cohort studied. Only the extremely large (and relatively expensive) ACRIN DMIST trial, by analyzing subgroups, showed a statistically significant increase in detection and accuracy for FFDM. Additional analyses on the ACRIN DMIST data are pending, however, and other benefits, or limitations, of FFDM may emerge. In any case, digital mammography, used clinically for only 6 years, is in its infancy compared with film mammography, which has had four decades of use and improvement. It can be expected that technologic advances and clinical refinements will be rapid for FFDM, improving what currently appears to be a small and difficult to detect performance advantage over SFM.

Summary

There have been four clinical trials comparing FFDM to SFM for screening. These trials have produced mixed results. The ACRIN DMIST trial, however, has shown an advantage for FFDM in cancer detection in the subgroups of younger women and those with dense breasts.

Table 2 Comparison of Overall Results for Four Major Screening Trials

Trial                   Cancer Detection/Sensitivity  Recall Rate/Specificity  Area Under the ROC Curve
Colorado/Massachusetts  Trend for film                Digital superior         Trend for film
Oslo I                  Trend for film                Film superior†           N/A‡
Oslo II                 Trend for digital             Film superior            N/A‡
DMIST–Total Cohort      Trend for digital             No difference            Trend for digital
DMIST–Subgroups*        Digital superior              No difference            Digital superior

*Subgroups are: <50 years old; pre-/peri-menopausal; heterogeneously or extremely dense breasts.
†Large trend not tested for significance in published paper.
‡ROC analysis not performed in the Oslo I and II studies.

References

1. Shtern F: Digital mammography and related technologies: a perspective from the National Cancer Institute. Radiology 183:629-630, 1992

2. Feig SA, Yaffe MJ: Current status of digital mammography. Semin Ultrasound CT MR 17:424-443, 1996

3. Suryanarayanan S, Karellas A, Vedantham S, et al: Flat-panel digital mammography system: contrast-detail comparison between screen-film radiographs and hard-copy images. Radiology 225:801-807, 2002

4. Lewin JM, Hendrick RE, D’Orsi CJ, et al: Comparison of full-field digital mammography to screen-film mammography for cancer detection: results of 4945 paired examinations. Radiology 218:873-880, 2001

5. Lewin JM, D’Orsi CJ, Hendrick RE, et al: Clinical comparison of full-field digital mammography to screen-film mammography for breast cancer detection. AJR Am J Roentgenol 179:671-677, 2002

6. Poplack SP, Tosteson AN, Grove MR, et al: Mammography in 53,803 women from the New Hampshire Mammography Network. Radiology 217:832-840, 2000

7. Rosenberg RD, Hunt WC, Williamson MR, et al: Effects of age, breast density, ethnicity, and estrogen replacement therapy on screening mammographic sensitivity and cancer stage at diagnosis: review of 183,134 screening mammograms in Albuquerque, New Mexico. Radiology 209:511-518, 1998

8. Mandelson MT, Oestricher N, Porter PL, et al: Breast density as a predictor of mammographic detection: comparison of interval- and screen-detected cancers. J Natl Cancer Inst 92:1081-1087, 2000

9. Kim HH, Pisano ED, Cole EB, et al: Comparison of calcification specificity in digital mammography using soft-copy display versus screen-film mammography. AJR Am J Roentgenol 187:47-50, 2006

10. Bassett LW, Hendrick RE, Bassford TL, et al: Quality Determinants of Mammography. Clinical Practice Guideline No. 13. AHCPR Publication No. 95-0632. Rockville, MD: Agency for Health Care Policy and Research, Public Health Service, U.S. Dept of Health and Human Services, Oct 1994

11. Linver MV, Paster S, Rosenberg RD, et al: Improvement in mammography interpretation skills in a community radiology practice after dedicated teaching courses: 2-year medical audit of 38,633 cases. Radiology 184:39-43, 1992

12. Robertson CL: A private breast imaging practice: medical audit of 25,788 screening and 1077 diagnostic examinations. Radiology 187:75-79, 1993

13. Sickles EA: Quality assurance: how to audit your own mammography practice. Radiol Clin North Am 30:265-275, 1992

14. Skaane P, Young K, Skjennald A: Population-based mammography screening: comparison of screen-film and full-field digital mammography with soft-copy reading: Oslo I Study. Radiology 229:877-884, 2003

15. Skaane P, Skjennald A, Young K, et al: Follow-up and final results of the Oslo I Study comparing screen-film mammography and full-field digital mammography with soft-copy reading. Acta Radiol 46:679-689, 2005

16. Skaane P, Skjennald A: Screen-film mammography versus full-field digital mammography with soft-copy reading: randomized trial in a population-based screening program: the Oslo II Study. Radiology 232:197-204, 2004

17. Pisano ED, Gatsonis CA, Yaffe MJ, et al: American College of Radiology Imaging Network digital mammographic imaging screening trial: objectives and methodology. Radiology 236:404-412, 2005

18. Pisano ED, Gatsonis C, Hendrick E, et al: Digital Mammographic Imaging Screening Trial (DMIST) Investigators Group. Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 353:1773-1783, 2005
