www.elsevier.com/locate/humpath

Human Pathology (2010) 41, 223–231

Original contribution

Development of consensus guidelines for the histologic recognition of microscopic esophagitis in patients with gastroesophageal reflux disease: the Esohisto project☆,☆☆

Roberto Fiocca MD a,⁎, Luca Mastracci MD a, Robert Riddell MD b, Kaiyo Takubo MD c, Michael Vieth MD d, Lisa Yerian MD e, Prateek Sharma MD f, Paula Fernström MSc g, Magnus Ruth MD g

a Department of Anatomic Pathology, University of Genoa, Genoa, Italy
b Department of Pathology, Mount Sinai Hospital, Toronto, Canada
c Department of Clinical Pathology, Tokyo Metropolitan Institute of Gerontology, Tokyo, Japan
d Institute of Pathology, Klinikum Bayreuth, Bayreuth, Germany
e Department of Anatomic Pathology, Cleveland Clinic, Cleveland, OH
f Division of Gastroenterology and Hepatology, University of Kansas School of Medicine, Kansas City, KS, USA
g AstraZeneca R&D, Mölndal, Sweden

Received 26 March 2009; revised 18 July 2009; accepted 23 July 2009

Keywords: Consensus development; Esophageal histology; Esophagitis; Gastroesophageal reflux disease

Summary No gold standard test exists for gastroesophageal reflux disease (GERD). Diagnostic difficulties are greatest when reflux symptoms occur without visible esophageal mucosal damage at conventional endoscopy. However, two thirds of such patients do have microscopic esophageal lesions. This study aimed to develop and standardize criteria for recognizing these microscopic esophageal lesions in GERD. Draft histologic criteria were developed and tested by an international group of 5 independent gastrointestinal pathologists using 167 biopsy specimens from GERD patients and healthy controls (phase I). The draft criteria were refined and reassessed using 250 photographs of biopsy specimens (phase II). Histologic lesions evaluated were basal cell hyperplasia, papillary elongation, intraepithelial eosinophil, neutrophil, and mononuclear cell number, necrosis/erosion, healed erosion, and dilated intercellular spaces. Interobserver agreement and κ values increased significantly from phase I to phase II. When tested in annotated photographs (phase II), mean pairwise agreements were 74%, 89%, 93%, 97%, 81%, 97%, 94%, and 74%, respectively. Mean pairwise κ estimates (±SD) were 0.49 (0.16), 0.81 (0.05), 0.87 (0.05), 0.84 (0.09), 0.60 (0.09), 0.90 (0.04), 0.73 (0.14), and 0.61 (0.08), respectively. Estimated intraclass correlation coefficients for basal cell layer thickness and papillary length increased from 0.38 and 0.56 to 0.69 and 0.95, respectively, when the revised criteria were used. The draft criteria achieved promising levels of agreement when assessed independently by 5 pathologists. Further steps include evaluation of lesions without indicating the area to be assessed and exploration of the correlation of microscopic esophagitis with symptoms and esophageal acid exposure.
© 2010 Elsevier Inc. All rights reserved.

☆ AstraZeneca R&D, Mölndal, Sweden, provided economic support for travel arrangements, logistics, biopsy specimens/photos, and statistical analyses.
☆☆ Conflicts of interest: none declared by R. Fiocca, L. Mastracci, R. Riddell, K. Takubo, M. Vieth, L. Yerian, and P. Sharma. Employees of AstraZeneca: P. Fernström, M. Ruth.
⁎ Corresponding author. Division of Anatomical Pathology, University of Genoa, Via De Toni 14, 16132 Genoa, Italy. E-mail address: [email protected] (R. Fiocca).

0046-8177/$ – see front matter © 2010 Elsevier Inc. All rights reserved.
doi:10.1016/j.humpath.2009.07.016

1. Introduction

Gastroesophageal reflux disease (GERD) encompasses a wide variety of clinical manifestations, ranging from symptoms without visible lesions at conventional endoscopy (nonerosive reflux disease [NERD], including microscopic esophagitis or normal esophageal epithelium) to reflux esophagitis (ie, macroscopic esophagitis [Los Angeles grades A-D]) and complications such as esophageal ulcers, strictures, hemorrhage, Barrett's esophagus, and esophageal adenocarcinoma [1-3]. The typical reflux syndrome is defined by the presence of troublesome heartburn and/or regurgitation, although patients may also have other symptoms such as epigastric pain or sleep disturbance [2].

There is presently no gold standard diagnostic test for GERD. The diagnostic difficulties are most apparent in the large proportion of patients with reflux symptoms who do not have visible distal esophageal mucosal damage at conventional endoscopy [4-6]; up to 50% of these patients also have normal acid exposure on prolonged pH-metry [7,8]. Diagnostic difficulty also arises in patients with atypical symptoms that may be attributed to reflux.

In the 1970s, Ismail-Beigi et al [9] identified abnormal histologic features in the unbroken esophageal mucosa in GERD and concluded that basal cell hyperplasia and papillary elongation close to the epithelial surface were histologic consequences of gastroesophageal reflux. A major advantage of assessing these features is that esophageal biopsy specimens are easily obtained during endoscopy in GERD patients. However, their use as markers of esophageal damage in GERD has been hampered because the histologic changes have been thought to be patchily distributed and because the diagnostic sensitivity and specificity of esophageal biopsy examination have generally been considered unsatisfactory [4]. As a consequence, the general belief was that histologic examination could not be recommended for the diagnosis of NERD [10].

A recently published systematic review suggests that at least two thirds of GERD patients who do not have visible lesions at conventional endoscopy have microscopic esophageal lesions [11]. These include dilation of intercellular spaces (DIS), papillary elongation, basal cell hyperplasia, and infiltration of inflammatory cells. All of these features were more prevalent in GERD patients with abnormal esophageal acid exposure, and all histologic changes responded to acid-suppressive therapy [12-15]. The author [11] concluded that variations in the reported frequencies of histologic change among studies were likely to have been caused by methodological factors, most importantly the use of different methods for identifying patients and controls, for histologic sampling, and for scoring of the biopsy specimens.

To gain acceptance for clinical use, histologic markers of esophageal damage in patients with GERD must be appropriately validated. Although interobserver agreement on histologic assessment in GERD has been studied before, most such studies were conducted at a single site [16,17]. In the present study, an independent international group of pathologists (the Esohisto group) sought to standardize the use of histologic markers in the assessment of microscopic esophageal lesions. This is part of a broader ongoing initiative (the Esohisto project) aimed at developing internationally acceptable, consensus-driven histologic criteria for the recognition of microscopic esophagitis in patients with GERD and at testing their interobserver variability.

2. Materials and methods

2.1. Working group

An independent international group of pathologists was initially convened at a workshop in October 2004 to determine whether the standardization of histologic markers of GERD could be improved. The participating pathologists were widely published experts in the field, representing different geographical areas (Europe, Japan, and North America).

A lack of standardization in histologic markers and their scoring was identified. Subsequently, a meeting was held in July 2005 to agree on the histologic markers to be used and the scoring methods to be assessed. In October 2005, a pathology session was held to agree on a case report form design and a definition document including the draft histologic criteria for the first study, and to conduct a small-scale interobserver variability check on histologic slides using the agreed criteria.

2.2. Phase I

After the October 2005 meeting, a study (phase I) using the agreed criteria was conducted: the histologic slides were physically relayed among 5 pathologists from different institutions in Europe (n = 2), North America (n = 2), and Japan (n = 1), who evaluated them independently. The study in which the patients and healthy subjects participated was conducted in accordance with the principles of the Declaration of Helsinki and was approved by an independent ethics review board.


The study was a substudy of a prospective, randomized, double-blind, crossover trial of esomeprazole 40 mg twice daily versus placebo (data on file). The trial included men and women aged 19 to 60 years with a history of heartburn (assessed using the Reflux Disease Questionnaire) occurring on average weekly during a minimum 3-month period before screening (reflux esophagitis according to the Los Angeles classification [n = 15]; NERD [n = 17]). The trial also included a control group of healthy subjects (n = 10) who had no significant clinical disease; no history of heartburn, regurgitation, dysphagia, chest pain, or any other major abdominal complaint in the last year; and no history or signs of pathologic changes of the esophageal mucosa visible at conventional endoscopy.

Four biopsy specimens, one from each quadrant, were taken 2 cm above the normally located squamocolumnar junction by endoscopic sampling (Cook forceps ADFS-2.2-160; Cook Medical Inc., Bloomington, IN) from each individual and placed in 4 separate containers. Wherever possible, the biopsy specimens were taken from the top of mucosal folds, avoiding erosive changes if present. Biopsy specimens were fixed, embedded in paraffin, and cut into sections 4 to 6 μm thick. Sections for morphological analysis were stained with hematoxylin-eosin. All biopsy specimens were coded to enable blinded evaluation, as described previously.

The 5 pathologists independently evaluated 167 biopsy specimen slides (only 3 biopsy specimens were obtained from 1 patient). The biopsy specimens were assessed in a completely blinded fashion: they were recoded with new codes for this study, and all pathologists assessed the same specimens. None of the participating pathologists had evaluated the biopsy specimens previously.
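The slide count is consistent with the trial population described above if one assumes (this is not stated explicitly) that all 42 participants, namely 15 with reflux esophagitis, 17 with NERD, and 10 healthy controls, contributed the planned 4 quadrant biopsy specimens, minus the single missing specimen:

$$(15 + 17 + 10) \times 4 - 1 = 42 \times 4 - 1 = 167$$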

The draft features for assessment of the biopsy specimen slides included total epithelial thickness, basal cell layer thickness, and papillary length in micrometers (Fig. 1); number of intraepithelial eosinophils, neutrophils, and mononuclear cells per ×40 high-power field; presence of necrosis/erosion; and severity of DIS (Fig. 2). The upper limit of the basal cell layer was defined as the level above which the nuclei were separated by a distance greater than the nuclear diameter. Papillary length was measured to the upper limit of the vessel wall and was assessed in a field where the base of the papillae could be clearly recognized. Two DIS patterns were identified: “bubble” and “laddered” patterns. Bubbles (Fig. 2B and 2C) were defined as irregular round dilatations, whereas ladders (Fig. 2D) were defined as a diffuse widening of the intercellular space. For inflammatory cells, the raw numbers of neutrophils, eosinophils, and mononuclear cells were counted in the most affected high-power field (40×). All types of mononuclear cells were considered. The assessment methods and scoring systems used for these lesions during this phase of the study are summarized in Table 1.

Fig. 1 Biopsy sample slide stained with hematoxylin-eosin (original magnification 10×). Microscopic esophageal changes in reflux disease: increasing grades of basal cell hyperplasia and papillary elongation. (A) Normal (basal cell layer is 2 cell layers thick [6.5% of total epithelial thickness]). (B) Mild (basal cell layer is 5-6 cell layers thick [16%]). (C) Marked (basal cell layer >10 cell layers thick [31%]).

Fig. 2 Biopsy sample slide stained with hematoxylin-eosin (original magnification 40×). Microscopic esophageal changes in reflux disease: increasing grades of DIS (A, absent; C, small; B and D, large). (B) Arrows indicate large DIS “bubbles.” (D) Arrows indicate large DIS “ladders.” (C) Arrows indicate small DIS “bubbles.” Intraepithelial eosinophilic granulocytes are present in C and D.

The histologic slides were also scored for quality (1, 2, or 3) based on a subjective evaluation of sectioning, staining quality, and tissue orientation.

2.3. Phase II

After phase I was completed, the consensus at the review meeting in June 2006 was that the variability seen could be caused either by a “true” disagreement (ie, the observers did not interpret or score the feature in the same way) or by a difference in the selection of the area to be assessed (ie, “topographical” factors). For phase II, basal cell layer thickness and papillary length were described by the absolute measure (micrometers) only, and the subjective assessments of basal cell layer thickness and papillary length as percentages of the overall epithelial thickness were abandoned.

The criteria used in phase I were refined and reassessed using photographic images of biopsies from the Long-Term Usage of Acid Suppression Versus Antireflux Surgery (LOTUS) study, for which clinical details have already been published [18]. The use of the LOTUS study slides for the current study fell within the scope of the original ethics approval, and all of the images used were rendered anonymous. In brief, the LOTUS study is a randomized study comparing the efficacy of laparoscopic antireflux surgery with that of esomeprazole therapy; it includes 554 GERD patients (mean age, 45.1 years; 71.8% male), 52.5% of whom had reflux esophagitis visible at conventional endoscopy at baseline. A sample of 250 slides from the study was selected by one of the authors (LM) to represent the full spectrum of histologic lesion severity. The selector was an expert in esophageal pathology but did not participate in the subsequent assessments. For the assessment of basal cell layer thickness, papillary length, and erosions, photographs were taken in well-oriented areas with a 10× objective; photographs for the assessment of DIS and inflammatory cells were taken with a 40× objective. For the assessment of basal cell layer thickness and papillary length, the area to be assessed was already marked on each photograph (annotated photographs). All the photographs were printed in high definition on high-gloss, photo-quality paper (one picture per page), and each pathologist received their own copy of each photograph.

Table 1 Phase I histologic criteria used for the assessment of biopsy specimens

Criterion | Assessment method | Scoring
--- | --- | ---
Basal cell layer thickness | Measured in μm (using a micrometer) and assessed by sight as a percentage of the total epithelial thickness | 0 (<15%); 1 (15%-30%); 2 (>30%)
Papillary length | Measured in μm and assessed as a percentage of the total epithelial thickness | 0 (<50%); 1 (50%-75%); 2 (>75%)
Intraepithelial eosinophils and neutrophils | Counted in the most affected high-power field (×40) | 0 (absent); 1 (1-2 cells); 2 (>2 cells)
Intraepithelial mononuclear cells | Counted in the most affected high-power field (×40) | 0 (<10 cells); 1 (10-30 cells); 2 (>30 cells)
DIS “bubbles” | Identified as irregular round dilations | 0 (absent); 1 (small); 2 (large/very large)
DIS “ladders” | Identified as diffuse widening of intercellular spaces | 0 (absent); 1 (small); 2 (large/very large)
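The numeric phase I scoring rules in Table 1 amount to mapping a measurement or a cell count onto 3 ordinal levels. The Python sketch below is a hypothetical illustration, not software used in the study: the dictionary and function names are invented, treating both printed range limits (for example, exactly 15% or exactly 30%) as score 1 is an assumption, and DIS is omitted because it was graded by sight rather than from a number.

```python
# Hypothetical encoding of the phase I (Table 1) scoring cutoffs.
# Each entry is (lowest value scored 1, highest value scored 1): values below
# the first bound score 0, values above the second bound score 2.
# Treating both bounds as inclusive is an assumption, not stated in the study.
TABLE1_CUTOFFS = {
    "basal_cell_layer_pct": (15, 30),  # % of total epithelial thickness
    "papillary_length_pct": (50, 75),  # % of total epithelial thickness
    "eosinophils_per_hpf": (1, 2),     # cells in the most affected x40 field
    "neutrophils_per_hpf": (1, 2),
    "mononuclear_per_hpf": (10, 30),
}


def percent_of_total(part_um: float, total_um: float) -> float:
    """Express a layer thickness as a percentage of total epithelial thickness."""
    if total_um <= 0:
        raise ValueError("total epithelial thickness must be positive")
    return 100.0 * part_um / total_um


def table1_score(criterion: str, value: float) -> int:
    """Return the 0/1/2 phase I score for one criterion."""
    low, high = TABLE1_CUTOFFS[criterion]
    if value < low:
        return 0
    if value <= high:
        return 1
    return 2


if __name__ == "__main__":
    # Basal cell layer percentages quoted in the Fig. 1 caption.
    for pct in (6.5, 16, 31):
        print(pct, table1_score("basal_cell_layer_pct", pct))
```

Applied to the percentages in the Fig. 1 caption (6.5%, 16%, and 31%), the sketch returns scores 0, 1, and 2, consistent with the normal, mild, and marked labels of panels A to C.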

The working group reassessed the importance of the individual draft criteria to be applied to the histologic samples and refined them as follows.

• Assessment of the absence or presence of necrosis/erosion and healed erosion was added (Fig. 3).

• DIS bubbles and ladders were collapsed into a single DIS criterion (“DIS bubbles/ladders”).

The photographs were also scored for quality (1, 2, or 3) based on a subjective evaluation of sectioning, staining quality, and tissue orientation.

Fig. 3 Biopsy sample slide stained with hematoxylin-eosin (original magnification 10×). Microscopic esophageal changes in reflux disease: erosion (A) is defined by the presence of either necrosis or a fibrinogranulocytic pseudomembrane, and healed erosion (B) shows granulation tissue (subepithelial fibrosis and dilated capillaries) covered by thinned regenerative epithelium.

2.4. Statistical analysis

For categorical variables, κ values (the proportion of agreement beyond that expected by chance) were calculated to assess interobserver variation. κ was determined as $\kappa = (P_o - P_e)/(1 - P_e)$, where $P_o$ is the observed agreement and $P_e$ is the expected rate of chance agreement, calculated by summing, over all categories, the products of the raters' marginal proportions. A κ value of 0.21 to 0.40 is considered fair, 0.41 to 0.60 moderate, 0.61 to 0.80 high, and 0.81 to 1.00 very high [19].
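The κ formula and the interpretation bands above can be expressed in a few lines of Python. This is a minimal sketch for 2 raters assigning categorical labels to the same items; the function names are illustrative rather than taken from any package, and labeling values below 0.21 as “poor” follows the usage in the Discussion rather than the bands listed here.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for 2 raters: (Po - Pe) / (1 - Pe)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Po: observed proportion of items on which the 2 raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Pe: chance agreement, summing over categories the product of the
    # 2 raters' marginal proportions.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return math.nan  # both raters used one identical category: kappa undefined
    return (p_o - p_e) / (1 - p_e)


def interpret_kappa(k: float) -> str:
    """Verbal bands quoted in the text; values < 0.21 are called "poor"."""
    if math.isnan(k):
        return "undefined"
    if k < 0.21:
        return "poor"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "high"
    return "very high"


if __name__ == "__main__":
    rater_1 = [0, 0, 1, 1, 2, 2]
    rater_2 = [0, 0, 1, 2, 2, 2]
    k = cohen_kappa(rater_1, rater_2)
    print(round(k, 2), interpret_kappa(k))  # 0.75 high
```

In the example, the raters agree on 5 of 6 items ($P_o = 5/6$) and the marginal distributions give $P_e = 1/3$, so κ = 0.75, which falls in the “high” band.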

For continuous variables, interobserver agreement in the grading of the draft histologic criteria for microscopic esophagitis was evaluated by the intraclass correlation coefficient (ICC). Because the procedure used for estimating the ICC relies on the theory of models with normally distributed residuals, a logistic transformation was applied before calculation of the ICC. The ICC was estimated by fitting a linear random effects model with random effects for slide and pathologist and a random error term. Each of the effects was modeled as a normally distributed random variable. In mathematical terms, the model has the form $y \sim N(\mu, \sigma_i^2 + \sigma_s^2 + \sigma_r^2)$, where $N$ indicates that $y$ follows a normal distribution, $\sigma_i$ denotes the SD between investigators, $\sigma_s$ the SD between slides, and $\sigma_r$ the SD of the random error term. The ICC was then defined as $\rho = \sigma_s^2/(\sigma_s^2 + \sigma_i^2 + \sigma_r^2)$. In this equation, $\sigma_s^2$ describes variation due to differences between slides, that is, true differences; $\sigma_i^2$ describes variation due to differences between assessors; and $\sigma_r^2$ describes variation due to random differences between measurements. An ICC value close to 1 indicates a high level of agreement within a slide; conversely, a value close to 0 indicates virtually no agreement within a slide.

Table 3 Phase I: estimated ICCs based on evaluation of 167 histologic slides by a working group of 5 pathologists

Variable | ICC a | 95% CI
--- | --- | ---
Basal cell layer thickness (μm) | 0.38 | 0.10-0.73
Papillary length (μm) | 0.56 | 0.28-0.84

Abbreviation: CI, confidence interval.
a Intraclass correlation coefficient for logistically transformed values estimated from a model including random effects of investigator and slides.
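The variance-component definition of ρ can be illustrated with a short NumPy sketch. It swaps in the classic two-way random-effects ANOVA (method-of-moments) estimator, the Shrout and Fleiss ICC(2,1), for the fitted mixed model described above; for a complete, balanced slides × pathologists table it estimates the same ratio of variance components, but it is not the authors' estimation code. The function name is hypothetical, the input is assumed to be already transformed (the logistic transformation is not reproduced), and negative variance estimates are truncated at 0.

```python
import numpy as np


def icc_two_way_random(y) -> tuple[float, float, float, float]:
    """Estimate rho = var_slide / (var_slide + var_rater + var_error).

    y: 2-D array, rows = slides, columns = pathologists, no missing values.
    Returns (icc, slide variance, pathologist variance, error variance).
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape  # n slides rated by k pathologists
    if n < 2 or k < 2:
        raise ValueError("need at least 2 slides and 2 pathologists")
    grand = y.mean()
    slide_means = y.mean(axis=1, keepdims=True)
    rater_means = y.mean(axis=0, keepdims=True)

    # Mean squares of a two-way ANOVA without replication.
    ms_slides = k * np.sum((slide_means - grand) ** 2) / (n - 1)
    ms_raters = n * np.sum((rater_means - grand) ** 2) / (k - 1)
    resid = y - slide_means - rater_means + grand
    ms_error = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    # Method-of-moments variance components, truncated at 0.
    var_slide = max(float((ms_slides - ms_error) / k), 0.0)
    var_rater = max(float((ms_raters - ms_error) / n), 0.0)
    var_error = float(ms_error)
    total = var_slide + var_rater + var_error
    icc = var_slide / total if total > 0 else float("nan")
    return icc, var_slide, var_rater, var_error


if __name__ == "__main__":
    # Invented, already-transformed values: 4 slides x 3 pathologists.
    y = [[1.0, 1.1, 0.9], [2.0, 2.2, 1.9], [3.1, 3.0, 2.8], [4.0, 4.2, 3.9]]
    print(round(icc_two_way_random(y)[0], 3))
```

The 2 limiting cases match the interpretation in the text: if every pathologist gives each slide the same value, ρ is 1; if the slides do not differ and only the pathologists do, ρ is 0.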

Continuous variables were also converted into discrete variables by categorizing them according to the severity cutoff scores. This allowed percentage agreement and κ values to be calculated for continuous variables, which was useful because it allowed comparisons with variables that were discrete by nature.
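The categorization step and the mean pairwise statistics reported in Tables 2 and 4 (10 pairs for 5 pathologists) can be sketched as follows. This is an illustrative Python sketch, not the study's analysis code: the micrometer cutoffs and measurements in the example are invented because the text does not list the cutoffs used, and the convention that a value equal to a cutoff moves into the higher category is an assumption.

```python
from bisect import bisect_right
from collections import Counter
from itertools import combinations
from statistics import mean, stdev


def categorize(values, cutoffs):
    """Map continuous values to ordinal categories 0..len(cutoffs).

    cutoffs must be sorted ascending; a value equal to a cutoff moves up a category.
    """
    return [bisect_right(cutoffs, v) for v in values]


def kappa_and_agreement(a, b):
    """Return (Cohen's kappa, uncorrected proportion of agreement) for 2 raters."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    kappa = float("nan") if p_e == 1.0 else (p_o - p_e) / (1 - p_e)
    return kappa, p_o


def pairwise_summary(ratings_by_rater):
    """Mean, SD, and range of kappa plus mean % agreement over all rater pairs.

    Needs at least 3 raters so that there are >= 2 pairs for the SD.
    """
    kappas, agreements = [], []
    for a, b in combinations(ratings_by_rater, 2):
        k, p_o = kappa_and_agreement(a, b)
        kappas.append(k)
        agreements.append(p_o)
    return {
        "pairs": len(kappas),
        "kappa_mean": mean(kappas),
        "kappa_sd": stdev(kappas),
        "kappa_range": (min(kappas), max(kappas)),
        "agreement_pct": 100 * mean(agreements),
    }


if __name__ == "__main__":
    cutoffs_um = [40.0, 80.0]  # hypothetical severity cutoffs in micrometers
    measurements_um = [  # invented: one row per pathologist, one column per slide
        [35, 50, 90, 120, 30],
        [38, 45, 85, 110, 42],
        [30, 55, 78, 130, 25],
        [36, 60, 95, 100, 35],
        [33, 41, 88, 125, 39],
    ]
    ratings = [categorize(row, cutoffs_um) for row in measurements_um]
    print(pairwise_summary(ratings))
```

With 5 pathologists, `combinations` yields the 10 pairs noted beneath Tables 2 and 4; the reported mean ± SD and range are then summaries over those 10 pairwise κ values.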

3. Results

3.1. Phase I

Mean pairwise κ estimates for basal cell layer thickness, papillary length, DIS, and inflammatory cell infiltration are shown in Table 2. κ values were also calculated based only on those slides that all investigators had classified as being of acceptable quality (86.9% of slides). This did not significantly alter the rate of interobserver agreement (data not shown). Table 3 shows the estimated ICC values for the continuous variables basal cell layer thickness and papillary length in micrometers for phase I.

Table 2 Phase I: mean pairwise κ estimates and agreements based on an evaluation of 167 histologic slides by a working group of 5 pathologists

Variable | Pairwise κ, mean ± SD | Pairwise κ, range | Uncorrected pairwise agreement (%)
--- | --- | --- | ---
Basal cell layer thickness (%) a | 0.38 ± 0.25 | 0.03-0.72 | 59
Basal cell layer thickness (μm) b | 0.34 ± 0.25 | 0.04-0.64 | 55
Basal cell hyperplasia score c | 0.39 ± 0.25 | 0.05-0.70 | 57
Papillary length (%) a | 0.63 ± 0.22 | 0.39-0.91 | 88
Papillary length (μm) b | 0.59 ± 0.24 | 0.32-0.89 | 87
Papillary elongation score c | 0.39 ± 0.14 | 0.24-0.56 | 62
DIS bubbles | 0.28 ± 0.12 | 0.07-0.45 | 50
DIS ladders | 0.34 ± 0.17 | 0.09-0.58 | 54
DIS, overall | 0.32 ± 0.17 | 0.06-0.56 | 52
Intraepithelial eosinophils | 0.74 ± 0.14 | 0.54-0.89 | 86
Intraepithelial neutrophils | 0.56 ± 0.36 | 0.09-0.96 | 89
Intraepithelial mononuclear cells | 0.38 ± 0.15 | 0.16-0.59 | 64

NOTE. Number of pairs = 10.
a Percentage of total thickness was evaluated by sight.
b Total epithelial thickness, basal cell layer thickness, and papillary length were measured by means of a micrometer.
c A semiquantitative score of severity was used.

A qualitative assessment of the biopsy specimens showed that histologic damage was distributed relatively evenly across the 4 esophageal mucosal quadrants. There was slightly more damage at the 12-o'clock and 6-o'clock positions than at the 3-o'clock and 9-o'clock positions, but these differences were not statistically significant.

3.2. Definitions and methodologies agreed on the basis of phase I results

The consensus view was that changes to the definitions of the histologic criteria were required. DIS bubbles and ladders were collapsed into a single DIS criterion because they tended to occur concurrently in DIS-positive cases, making it difficult to distinguish between the two, and assessing them separately did not increase reproducibility. Necrosis/erosion and healed erosion (yes/no) were added, because erosion represents the most severe expression of microscopic esophagitis and its presence also makes the assessment of other criteria difficult and thus potentially meaningless. Because the 3 methods for reporting basal cell layer thickness and papillary length in phase I (ie, absolute measure in micrometers, percentage by sight, and semiquantitative score) did not yield significantly different results, it was decided to use only micrometer measures in phase II.

Table 4 Phase II: mean pairwise κ estimates and agreements based on an evaluation of 250 histologic photographs by a working group of 5 pathologists

Variable | Pairwise κ, mean ± SD | Pairwise κ, range | Uncorrected pairwise agreement (%)
--- | --- | --- | ---
Basal cell layer thickness (μm) | 0.49 ± 0.16 | 0.26-0.72 | 74
Papillary length (μm) | 0.81 ± 0.05 | 0.74-0.88 | 89
DIS bubbles/ladders | 0.61 ± 0.08 | 0.49-0.75 | 74
Intraepithelial eosinophils | 0.87 ± 0.05 | 0.80-0.94 | 93
Intraepithelial neutrophils | 0.84 ± 0.09 | 0.74-1.00 | 97
Intraepithelial mononuclear cells | 0.60 ± 0.09 | 0.43-0.68 | 81
Necrosis/erosions | 0.90 ± 0.04 | 0.83-0.96 | 97
Healed erosions | 0.73 ± 0.14 | 0.55-1.00 | 94

NOTE. Number of pairs = 10.

Table 5 Phase II: estimated ICCs based on an evaluation of 250 histologic photographs by a working group of 5 pathologists

Variable | ICC a | 95% CI
--- | --- | ---
Basal cell layer thickness (μm) | 0.69 | 0.39-0.90
Papillary length (μm) | 0.95 | 0.87-0.99

a Intraclass correlation coefficient for logistically transformed values estimated from a model including random effects of investigator and slides.

3.3. Phase II: evaluation of annotated histologic photographs

Mean pairwise κ estimates using the refined draft criteria in the assessment of the annotated photographic images are presented in Table 4. When the refined draft criteria were applied to the photographs, interobserver agreement was moderate for basal cell layer thickness and mononuclear cells, high for DIS and healed erosions, and very high for all the other variables assessed. κ values were also calculated based only on those photographs that all investigators had classified as being of acceptable quality (77.8% of photographs). This did not significantly alter the rate of interobserver agreement (data not shown). Table 5 shows the estimated ICC values for basal cell layer thickness and papillary length in micrometers for phase II.

4. Discussion

Our initial validation of draft criteria for the recognition of microscopic esophagitis was conducted using histologic slides of biopsy specimens (phase I). After refinement, the criteria were reassessed using photographic images of biopsy specimens (phase II). In phase II, the area to be assessed was indicated on each image to test the hypothesis that the variability seen in phase I was caused by differences in the selection of the area to be assessed. The quality of the slides and photographs was graded based on a subjective evaluation of sectioning, staining quality, and tissue orientation, and most slides and photographs were judged by the pathologists to be of acceptable quality. Our study did not support previous findings that the worst histologic damage is at the 3-o'clock position [20,21]. Rather, lesions were fairly evenly distributed, with a slight but nonsignificant increase at the 6-o'clock and 12-o'clock positions.

The 12 initial draft criteria for the recognition of microscopic esophagitis showed levels of agreement that ranged from 50% to 89%. Corresponding mean κ values ranged from fair to high (0.28-0.74), with only 2 of the 12 criteria assessed (papillary length as a percentage and intraepithelial eosinophils) reaching high κ values. It was agreed that the variability seen could be caused either by a “true” disagreement (ie, the observers did not interpret or score the feature in the same way) or by a difference in the selection of the area to be assessed (ie, “topographical” factors). When the draft histologic criteria were refined after phase I, the examination of the annotated photographic images led to marked increases in interobserver agreement, which ranged from 74% to 97%, with mean κ values that were high or very high for 6 of the 8 criteria assessed. Overall, κ values ranged from moderate to very high (0.49-0.90). Particularly large improvements in interobserver agreement were detected for the assessment of basal cell layer thickness in micrometers and for DIS, which was collapsed from the separate assessment of DIS bubbles and ladders into a single criterion. DIS was assessed by light microscopy without measuring intercellular space widths, because such measurements are laborious, require electron microscopy, and are therefore unsuited to clinical practice. The lesions of necrosis/erosion and healed erosion were added as assessment criteria in phase II, and both showed excellent levels of agreement (97% and 94%, respectively) with very high and high mean κ values (0.90 and 0.73, respectively).

The marked improvements in interobserver agreement in phase II showed that the amendments to the draft criteria aided the assessment of microscopic esophagitis and also suggest that the variability observed in phase I was due in large part to differences in the selection of the area to be assessed. Thus, in addition to improving the interassessor reproducibility of the histologic criteria, the current study identified an important avenue for their future improvement: the development of specific guidance on how to select the area of assessment for each lesion, so as to eliminate topographical variation as a major source of inconsistency among pathologists. Both outcomes represent a significant contribution to the goal of standardizing histologic markers for the assessment of microscopic esophageal lesions.

The 2 histologic variables for which κ values were only moderate in phase II were basal cell layer thickness and intraepithelial infiltration of mononuclear cells, although percentage agreement was 74% and 81%, respectively. However, κ calculations take into account the proportion of agreement expected by chance and, as such, have limitations [22]. This is particularly the case when the prevalence of the histologic variable to be assessed is low, as with infiltration of mononuclear cells, or when the expected agreement is high, as would be the case with micrometer measurement of a parameter such as basal cell layer thickness. In such cases, κ values may be low even though the levels of agreement are high and individual ratings are accurate.
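The limitation described above, in which a low-prevalence finding yields high raw agreement but a modest κ, is easy to reproduce. In the hypothetical Python example below, which uses invented ratings rather than study data, 2 pathologists each call a rare finding present in 3 of 100 specimens but agree on only 1 of those; raw agreement is 96%, yet κ is only about 0.31, which falls in the “fair” band.

```python
from collections import Counter


def agreement_and_kappa(a, b):
    """Return (observed agreement Po, Cohen's kappa) for 2 raters."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)  # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)


# Invented ratings for a rare finding (1 = present, 0 = absent) in 100 specimens.
# Each rater marks 3 specimens as positive; only specimen 0 is positive for both.
rater_a = [1, 1, 1] + [0] * 97
rater_b = [1, 0, 0, 1, 1] + [0] * 95

p_o, kappa = agreement_and_kappa(rater_a, rater_b)
print(f"agreement = {p_o:.0%}, kappa = {kappa:.2f}")  # agreement = 96%, kappa = 0.31
```

Because both raters answer “absent” for almost every specimen, chance agreement is already $P_e = 0.9418$, so the 4 disagreements on the rare positive category dominate κ.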

The observed agreements in the present study are promising, especially in comparison with numerous accepted histologic definitions routinely used in daily pathology practice. For example, in a study on the histologic detection of Helicobacter pylori gastritis, κ values for atrophy were poor to fair, ranging from 0.08 to 0.29 [23]. In a study using the Houston analog scale for the grading of histopathologic parameters of gastric atrophy, the κ value for grade of atrophy was poor (0.18) [24]. Furthermore, a study using 3 different scoring systems for chronic hepatitis showed only fair agreement for piecemeal necrosis (κ values of 0.39 and 0.40) [25]. The present study shows that interobserver agreement may be improved by systematically identifying and eliminating sources of disagreement, as has previously been shown for common training [26,27] and for the establishment of consensus guidelines [26,28,29].

Microscopic esophagitis is a common type of esophageal epithelial injury in patients with GERD, and it is important to ensure consistency in its assessment and classification. The first step in this process is to agree on definitions and on guidelines for evaluating these lesions, including how to choose the area of the biopsy specimen best suited for evaluation. The next step would be to examine how the histologic features of microscopic esophagitis correlate with the clinical signs and symptoms of NERD. This would provide meaningful histologic measures of efficacy for future clinical trials of GERD treatments and enable the incorporation of endoscopic biopsy assessment into routine clinical practice.

A major strength of the present study, compared with previous efforts to assess interobserver agreement on the histologic diagnosis of GERD, was its multisite, international basis. Furthermore, the biopsy specimens were obtained from individuals with well-characterized GERD phenotypes. The study also had several limitations: the slides tested were of high quality and therefore probably not representative of routine clinical practice; the slides and photographs were evaluated by highly specialized experts, which limits the generalizability of the results; the area to be assessed was marked on the photographs (in phase II only); and a training effect from phase I to phase II is likely.

In conclusion, the refined draft consensus criteria for the histologic recognition of microscopic esophagitis achieved promising levels of agreement when assessed independently by a working group of 5 pathologists using histopathologic photographs. However, the area of assessment needs to be selected with care, and criteria on how to choose it are needed. Once a reliable set of criteria and guidelines for performing the evaluation has been developed and validated, it can be applied to a range of data sets to test correlations with clinical variables and to determine to what extent pathologists can contribute to the assessment of GERD. Ideally, histologic examination may also become a tool to compare different therapies for GERD; to better understand the heterogeneous group of patients with NERD, especially those with negative pH-metry; to provide useful surrogate markers of GERD-related esophageal injury; and to aid in the diagnosis of the disease in clinical practice.

Acknowledgment

Dr Anja Becher and Dr Michael Bland from Oxford PharmaGenesis, Oxford, UK, provided writing assistance funded by AstraZeneca R&D, Mölndal, Sweden.

References

[1] Locke III GR, Talley NJ, Fett SL, et al. Prevalence and clinical spectrum of gastroesophageal reflux: a population-based study in Olmsted County, Minnesota. Gastroenterology 1997;112:1448-56.

[2] Vakil N, Veldhuyzen van Zanten S, Kahrilas P, et al. The Montreal definition and classification of gastro-esophageal reflux disease (GERD)—a global evidence-based consensus. Am J Gastroenterol 2006;101:1900-20.

[3] Stanghellini V, Cogliandro R, Cogliandro L, et al. Unsolved problems in the management of patients with gastro-oesophageal reflux disease. Dig Liver Dis 2003;35:843-8.

[4] Voutilainen M, Sipponen P, Mecklin JP, et al. Gastroesophageal reflux disease: prevalence, clinical, endoscopic and histopathological findings in 1,128 consecutive patients referred for endoscopy due to dyspeptic and reflux symptoms. Digestion 2000;61:6-13.

[5] Zagari RM, Pozzato P, Nicolini G, et al. Prevalence of asymptomatic endoscopic lesions of the upper gastrointestinal tract. Preliminary results of The Loiano-Monghidoro population study. Gastroenterology 2002;122:A208.

[6] Ronkainen J, Aro P, Storskrubb T, et al. High prevalence of gastroesophageal reflux symptoms and esophagitis with or without symptoms in the general adult Swedish population: a Kalixanda study report. Scand J Gastroenterol 2005;40:275-85.

[7] Wiener GJ, Morgan TM, Copper JB, et al. Ambulatory 24-hour esophageal pH monitoring. Reproducibility and variability of pH parameters. Dig Dis Sci 1988;33:1127-33.

[8] Pandolfino JE, Richter JE, Ours T, et al. Ambulatory esophageal pH monitoring using a wireless system. Am J Gastroenterol 2003;98:740-9.

[9] Ismail-Beigi F, Horton PF, Pope II CE. Histological consequences of gastroesophageal reflux in man. Gastroenterology 1970;58:163-74.

[10] Dent J, Brun J, Fendrick AM, et al. An evidence-based appraisal of reflux disease management—the Genval Workshop Report. Gut 1999;44(Suppl 2):S1-16.

[11] Dent J. Microscopic esophageal mucosal injury in nonerosive reflux disease. Clin Gastroenterol Hepatol 2007;5:4-16 e1.

[12] Bove M, Vieth M, Dombrowski F, et al. Acid challenge to the human oesophageal mucosa: effects on epithelial architecture in health and disease. Dig Dis Sci 2005;50:1488-96.

[13] Calabrese C, Bortolotti M, Fabbri A, et al. Reversibility of GERD ultrastructural alterations and relief of symptoms after omeprazole treatment. Am J Gastroenterol 2005;100:537-42.

[14] Stolte M, Vieth M, Schmitz JM, et al. Effects of long-term treatment with proton pump inhibitors in gastro-oesophageal reflux disease on the histological findings in the lower oesophagus. Scand J Gastroenterol 2000;35:1125-30.

[15] Vieth M, Kulig M, Leodolter A, et al. Histological effects of esomeprazole therapy on the squamous epithelium of the distal oesophagus. Aliment Pharmacol Ther 2006;23:313-9.

[16] Vieth M, Fiocca R, Haringsma J, et al. Radial distribution of dilated intercellular spaces of the esophageal squamous epithelium in patients with reflux disease exhibiting discrete endoscopic lesions. Dig Dis 2004;22:208-12.

[17] Vieth M, Peitz U, Labenz J, et al. What parameters are relevant for the histological diagnosis of gastroesophageal reflux disease without Barrett's mucosa. Dig Dis 2004;22:196-201.

[18] Lundell L, Attwood S, Ell C, et al. Comparing laparoscopic anti-reflux surgery to esomeprazole in the management of patients with chronic gastro-oesophageal reflux disease: a 3-year interim analysis of the LOTUS trial. Gut 2008;57:1207-13.

[19] Haas M. The reliability of reliability. J Manipulative Physiol Ther 1991;14:199-208.

[20] Edebo A, Vieth M, Tam W, et al. Radial and axial distribution histological markers of epithelial damage in reflux disease. Is there an effect of therapy and correlation to the localisation of mucosal breaks. Gut 2004;53:A117.

[21] Fletcher J, Wirz A, Henry E, et al. Studies of acid exposure immediately above the gastro-oesophageal squamocolumnar junction: evidence of short segment reflux. Gut 2004;53:168-73.

[22] Cicchetti DV, Feinstein AR. High agreement but low kappa: II. Resolving the paradoxes. J Clin Epidemiol 1990;43:551-8.

[23] el-Zimaity HM, Graham DY, al-Assi MT, et al. Interobserver variation in the histopathological assessment of Helicobacter pylori gastritis. Hum Pathol 1996;27:35-41.

[24] Offerhaus GJ, Price AB, Haot J, et al. Observer agreement on the grading of gastric atrophy. Histopathology 1999;34:320-5.

[25] Gronbaek K, Christensen PB, Hamilton-Dutoit S, et al. Interobserver variation in interpretation of serial liver biopsies from patients with chronic hepatitis C. J Viral Hepat 2002;9:443-9.

[26] Guarner J, Herrera-Goepfert R, Mohar A, et al. Interobserver variability in application of the revised Sydney classification for gastritis. Hum Pathol 1999;30:1431-4.

[27] Zentilin P, Savarino V, Mastracci L, et al. Reassessment of the diagnostic value of histology in patients with GORD, using multiple biopsy sites and an appropriate control group. Am J Gastroenterol 2005;100:2299-306.

[28] Montgomery E, Bronner MP, Goldblum JR, et al. Reproducibility of the diagnosis of dysplasia in Barrett esophagus: a reaffirmation. Hum Pathol 2001;32:368-78.

[29] Rugge M, Correa P, Dixon MF, et al. Gastric mucosal atrophy: interobserver consistency using new criteria for classification and grading. Aliment Pharmacol Ther 2002;16:1249-59.