Response and remission criteria in major depression – A validation of current practice

Journal of Psychiatric Research 44 (2010) 1063–1068

Contents lists available at ScienceDirect

Journal of Psychiatric Research

journal homepage: www.elsevier.com/locate/psychires


Michael Riedel a, Hans-Jürgen Möller a, Michael Obermeier a, Rebecca Schennach-Wolff a, Michael Bauer b, Mazda Adli c, Klaus Kronmüller d, Thomas Nickel e, Peter Brieger f, Gerd Laux g, Wolfram Bender h, Isabella Heuser i, Joachim Zeiler j, Wolfgang Gaebel k, Florian Seemüller a,*

a Department of Psychiatry and Psychotherapy, Ludwig-Maximilians-University Munich, Nussbaumstrasse 7, 80336 Munich, Germany
b Department of Psychiatry and Psychotherapy, Carl Gustav Carus University Dresden, Technical University Dresden, Fetscherstr. 74, 01307 Dresden, Germany
c Department of Psychiatry and Psychotherapy, Campus Charité Mitte (CCM), Charitéplatz 1, 10117 Berlin, Germany
d Department of Psychiatry and Psychotherapy, University of Heidelberg, Voßstr. 2, 69115 Heidelberg, Germany
e Max Planck Institute of Psychiatry, Kraepelinstr. 2-7, 80804 Munich, Germany
f Department of Psychiatry and Psychotherapy, Martin-Luther University Halle-Wittenberg, Julius-Kühn-Str. 7, 06097 Halle, Germany
g Department of Psychiatry and Psychotherapy, Inn-Salzach-Klinikum, Gabersee 7, 83512 Wasserburg, Germany
h Department of Psychiatry and Psychotherapy, Isar-Amper-Klinikum Munich East, Vockestr. 72, 85540 Haar, Germany
i Department of Psychiatry and Psychotherapy, Campus Charité Benjamin Franklin (CFB), Eschenallee 3, 14050 Berlin, Germany
j Department of Psychiatry and Psychotherapy, Auguste-Viktoria-Krankenhaus, Rubensstr. 125, 12157 Berlin, Germany
k Department of Psychiatry and Psychotherapy, University of Düsseldorf, Bergische Landstr. 2, 40629 Düsseldorf, Germany

Article info

Article history:
Received 1 December 2009
Received in revised form 4 March 2010
Accepted 16 March 2010

Keywords:
Remission
Response
Major depression
Bipolar
Cut-off
Inpatients
HAMD-17
HAMD-21
MADRS
CGI
BDI

The study was performed within the framework of the German Research Network on Depression, which was funded by the German Federal Ministry for Education and Research BMBF (01GI0219). The BMBF had no further role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.

* Corresponding author. Tel.: +49 89 5160 5846; fax: +49 89 5160 5774.
E-mail address: [email protected] (F. Seemüller).

0022-3956/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jpsychires.2010.03.006

Abstract

Remission and response have been suggested as the most relevant outcome criteria for the treatment of depression. However, there is still marked uncertainty as to which cut-offs should be used on current depression rating scales. The goal of the present study was to compare the validity of different HAMD, MADRS and BDI cut-offs for response and remission.

This naturalistic prospective study was performed in 12 psychiatric hospitals in Germany. All evaluable patients (n = 846) were hospitalized and had to meet DSM-IV criteria for major depressive disorder. Ratings on the HAMD-21, MADRS and BDI were obtained every two weeks. A CGI-S score of 1 and a CGI-I score of 2 or better were used as the primary comparative measures of remission and response, respectively.

A HAMD-21 cut-off ≤7 (AUC: 0.92), HAMD-17 cut-off ≤6 (AUC: 0.90), MADRS cut-off ≤7 (AUC: 0.94) and BDI cut-off ≤12 (AUC: 0.83) maximized the combination of sensitivity and specificity for defining remission.

A minimum decrease of 47% in the HAMD-21 (AUC: 0.90), ≥57% in the HAMD-17 (AUC: 0.89), ≥46% in the MADRS (AUC: 0.91) and a decrease of 47% from the BDI baseline score (AUC: 0.78) corresponded best to the CGI response criterion.

Our data largely confirm currently used remission and response criteria in naturalistically treated patients.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Leading international drug authorities such as the FDA and the EMEA require significant drug–placebo differences on the primary endpoint in at least two randomized placebo-controlled trials before a new drug can be marketed. In addition, the use of standardized rating scales is strongly recommended and regulated by the FDA within the Good Clinical Practice (GCP) guidelines. The most widely used instrument, and thus the gold standard for the assessment of depressive symptoms, is the Hamilton Depression Rating Scale (Hamilton, 1967). It was originally developed for inpatients with major depression, who tend to present with melancholic features, but has also been used extensively in outpatient studies. Unfortunately, many different versions of the HAMD now exist (e.g., HAMD-24, -27 and -29); the two most widely used are the 21-item and the 17-item versions, as originally recommended by Hamilton (Hamilton, 1967).

Its main rival is the MADRS (Montgomery and Asberg, 1979), which may be even more sensitive in detecting symptom change. Among self-ratings, the BDI has achieved wide acceptance (Schwab et al., 1967).

The most commonly used analytical method is to compare baseline and endpoint mean scores, a procedure that conveys neither the clinical significance of a change nor the course of the illness. One established alternative is to define categories of clinical significance; in depression research, the most widely used are response and remission. Unfortunately, many different, divergent and contradictory definitions are still in circulation, and they often do not differentiate between rating scale versions, which hinders comparisons across studies. For the HAMD, for example, the definition of remission varies between thresholds of <6 and ≤10 for the HAMD-21 and the HAMD-17 (Nierenberg and DeCecco, 2001). For response, definitions usually require a percentage reduction from the baseline score ranging from ≥25% to ≥50%, but sometimes they also include a numerical threshold such as a HAMD score <10 (Riso et al., 1997). This creates the bizarre situation in which one investigator’s remission is another investigator’s response. Compounding the problem, major depression itself is a very heterogeneous illness with many subtypes and a highly variable course. As a result, the findings of different depression trials are hardly comparable. Establishing and evaluating distinct study endpoints would eliminate one factor contributing to the enormous outcome variance of antidepressant treatment trials. The task force of the MacArthur Foundation Research Network on the Psychobiology of Depression tried to establish consistent criteria for remission, response, recovery and relapse in major depression (Frank et al., 1991; Prien et al., 1991). Even for the most frequently used terms, “response, remission, recovery and relapse”, no acceptable operationally defined criteria that could be used in research were found (Prien et al., 1991).

The CGI (Clinical Global Impression) comprises two basic scales covering disease severity (CGI-Severity, CGI-S) and treatment-induced improvement (CGI-Improvement, CGI-I). In contrast to psychopathological rating scales, each consists of a single item rating overall illness severity or improvement on a seven-point Likert scale. The CGI-S requires the clinician to rate the overall severity of the patient’s illness at the time of assessment, relative to the clinician’s past experience with patients having the same diagnosis, on a scale from 1 (“normal, not ill”) to 7 (“extremely ill”). The CGI-I captures overall improvement relative to baseline, ranging from 1 (“very much improved”) to 7 (“very much worse”). The CGI may thus capture illness severity and therapeutic improvement from a global, one-dimensional perspective that differs from that of differentiated psychopathological rating scales. Furthermore, the CGI has been shown to be a reliable and valid measure of disease severity and to be sensitive to change (Guy, 1976). Its main advantage is that outcome constructs such as remission and response translate directly into, e.g., a CGI-S score of 1 and a CGI-I score ≤2 (at least “much improved”), whereas the HAMD, MADRS and BDI have no predefined thresholds for response or remission (Bandelow et al., 2006).
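As a minimal sketch, the two CGI-based outcome definitions (remission as CGI-S = 1; response as CGI-I ≤ 2) can be written as simple predicates. The function names are hypothetical and are not taken from the study's analysis code.

```python
def cgi_remission(cgi_s: int) -> bool:
    """Remission: CGI-S of 1 ("normal, not ill")."""
    if not 1 <= cgi_s <= 7:
        raise ValueError("CGI-S must lie between 1 and 7")
    return cgi_s == 1


def cgi_response(cgi_i: int) -> bool:
    """Response: CGI-I of 1 or 2 (at least "much improved")."""
    if not 1 <= cgi_i <= 7:
        raise ValueError("CGI-I must lie between 1 and 7")
    return cgi_i <= 2


if __name__ == "__main__":
    print(cgi_remission(1), cgi_remission(2))  # True False
    print(cgi_response(2), cgi_response(3))    # True False
```

Note that "at least 2" on the CGI-I means a score of 2 or lower, because lower CGI-I values indicate greater improvement.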

In line with a recent suggestion by Berk and colleagues, we therefore chose to use the CGI as a validation criterion, analyzing data from a large naturalistic trial of 843 inpatients with a major depressive episode who were assessed every second week until discharge. To evaluate valid cut-offs for response and remission, Berk and colleagues associated the corresponding mean values of the Young Mania Rating Scale (YMRS) and the MADRS in bipolar patients with a CGI-Severity score of 1 for remission and a CGI-Improvement score of at least 2 (i.e., 1 or 2) for response. Using the same thresholds (Berk et al., 2008), we aimed to empirically verify remission and response criteria for the HAMD-21, HAMD-17, MADRS and BDI against the CGI in a sample of depressed inpatients, by means of receiver operating characteristic (ROC) curve analysis and bootstrap techniques.
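The ROC-based cut-off search described above (and detailed in Section 2.3) can be illustrated with a small, self-contained sketch. It assumes that a patient counts as a scale-defined remitter when the score is at or below a candidate threshold, picks the threshold that maximizes the unweighted sum of sensitivity and specificity (equivalent to maximizing Youden's J), and computes the AUC as the Mann–Whitney probability that a CGI remitter scores lower than a non-remitter. The study used R 2.9.1; this Python version, its function names and the toy data are illustrative assumptions only.

```python
from typing import NamedTuple, Sequence


class CutOffResult(NamedTuple):
    threshold: int      # a score <= threshold counts as scale-defined remission
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _safe_ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def evaluate_threshold(scores: Sequence[int], cgi_remitter: Sequence[bool],
                       threshold: int) -> CutOffResult:
    """Compare the rule 'score <= threshold' with the CGI reference classification."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, cgi_remitter):
        predicted = score <= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return CutOffResult(threshold,
                        _safe_ratio(tp, tp + fn), _safe_ratio(tn, tn + fp),
                        _safe_ratio(tp, tp + fp), _safe_ratio(tn, tn + fn))


def optimal_cutoff(scores: Sequence[int], cgi_remitter: Sequence[bool]) -> CutOffResult:
    """Threshold maximizing the unweighted sum sensitivity + specificity."""
    if len(scores) != len(cgi_remitter):
        raise ValueError("scores and reference must have equal length")
    if all(cgi_remitter) or not any(cgi_remitter):
        raise ValueError("both remitters and non-remitters are required")
    candidates = [evaluate_threshold(scores, cgi_remitter, t)
                  for t in sorted(set(scores))]
    return max(candidates, key=lambda r: r.sensitivity + r.specificity)


def roc_auc(scores: Sequence[int], cgi_remitter: Sequence[bool]) -> float:
    """AUC as P(remitter scores lower than non-remitter), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, cgi_remitter) if y]
    neg = [s for s, y in zip(scores, cgi_remitter) if not y]
    if not pos or not neg:
        raise ValueError("both remitters and non-remitters are required")
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: endpoint scale scores and CGI-S == 1 flags.
    scores = [3, 5, 6, 7, 8, 9, 12, 15, 20, 25]
    remit = [True, True, True, False, True, False, False, False, False, False]
    print(optimal_cutoff(scores, remit))
    print(round(roc_auc(scores, remit), 3))
```

In this toy example the best rule is "score ≤ 8", i.e. a cut-off of 9 in the "<" convention that Section 3.1 uses for the HAMD-21 ("a cut-off of 8 … a score of ≤7").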

2. Method

2.1. Sample and data collection

The main objective and details of the study protocol are described elsewhere (Seemuller et al., 2010). In brief, data from a large prospective, naturalistic, multicenter study (N = 1014) were analyzed. The study was part of the German Research Network on Depression, funded by the German Federal Ministry of Education and Research (BMBF). Subjects were recruited from seven German psychiatric university or research hospitals (two in Munich, two in Berlin, Tübingen, Düsseldorf and Halle) and five psychiatric district hospitals (Munich, Gabersee, and three in Berlin).

Inclusion criteria were age between 18 and 65 years and signed written informed consent. Patients had to meet ICD-10 diagnostic criteria for a major depressive episode (ICD-10: F31.3x–5x, F32, F33, F34, F38, F39) or for a depressive disorder not otherwise specified. In addition, the Structured Clinical Interview for DSM-IV (SCID-I and SCID-II) was used to confirm the diagnosis of a depressive spectrum disorder according to DSM-IV and to detect relevant axis I and axis II comorbidities (Wittchen et al., 1997).

Psychopathological symptoms were assessed using the Hamilton Depression Rating Scale (HAMD-21) (Baumann, 1976). Its German 17-item version has shown good reliability, with Cronbach’s α ranging from 0.72 to 0.83 (Baumann, 1976; Maier et al., 1985).
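Cronbach’s α, the internal-consistency index quoted for the scales in this section, is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ) for k items with item variances σ²ᵢ and total-score variance σ²ₜ. A minimal Python sketch, assuming a complete patients × items score matrix (no missing items) and sample variances; the function name is hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a patients x items matrix of complete item scores."""
    if len(item_scores) < 2:
        raise ValueError("at least two patients are needed")
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("need a rectangular matrix with at least two items")
    # Sum of the sample variances of the individual items (columns).
    item_var_sum = sum(variance(column) for column in zip(*item_scores))
    # Sample variance of each patient's total score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Invented ratings: four patients, three items.
    print(round(cronbach_alpha([[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 4, 5]]), 3))
```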

The German translation of the Montgomery–Asberg Depression Rating Scale (MADRS) (Schmidtke et al., 1988) has been shown to have high internal consistency (Cronbach’s α = 0.86) and high sensitivity to change (Schmidtke et al., 1988). Its validity has been demonstrated by moderate to good correlations with the German 17-item version of the HAMD, ranging from 0.51 to 0.89 (Schmidtke et al., 1988).

The German version of the self-rated Beck Depression Inventory (BDI) (Hautzinger, 1991) has a similar internal consistency (Cronbach’s α = 0.86), good correlations with the self-rated Zung Depression Scale, and moderate to poor correlations with the HAMD (Pearson correlation = 0.37) (Hautzinger, 1991).

Ratings were performed by clinicians who had completed a minimum of four years’ clinical training in psychiatry. All ratings for a given patient were performed by the same clinician. According to the protocol, patients were rated at baseline and every two weeks until discharge. Patients were included in the analysis if at least two assessments were available.

2.2. Treatment

Patients were treated at the discretion of the psychiatrist in charge, taking into account the international clinical guidelines for the treatment of depression (APA, WFSBP) (Bauer et al., 2007; American Psychiatric Association, 2000; Deutsche Gesellschaft für Psychiatrie, 2000). In addition, the medication class, active compounds, dosage and treatment duration were recorded. Furthermore, the duration and type of other biological treatments such as ECT, sleep deprivation and TMS, as well as psychotherapy, were carefully recorded.

Table 1
Mean scores and mean baseline-to-endpoint change of the respective depression rating scales across all visits.

Scale      Mean (±SD)       Change (±SD)
HAMD-21    14.06 (±8.13)    42% (±35)
HAMD-17    12.53 (±7.42)    42% (±36)
MADRS      17.68 (±10.54)   41% (±36)
BDI        16.68 (±11.45)   34% (±47)

Table 3
Negative predictive value (NPV), positive predictive value (PPV), sensitivity, specificity, area under the curve (AUC) and 95% bootstrap confidence intervals (CI) for the ROC analysis of the optimal cut-off values best approximating remission, defined as a CGI-Severity score of 1.

Scale      NPV    PPV    Sensitivity  Specificity  AUC    95% CI    Cut-off
HAMD-21    1      0.14   0.92         0.78         0.92   [7, 9]    ≤7
HAMD-17    1      0.14   0.90         0.78         0.92   [6, 8]    ≤6
MADRS      1      0.17   0.94         0.82         0.94   [6, 11]   ≤7
BDI        0.99   0.10   0.90         0.64         0.83   [8, 12]   ≤11

2.3. Statistical analyses

Patients were included in the analysis if at least one post-baseline assessment was available in addition to the baseline assessment; patients with only a baseline assessment were excluded. In addition, patients experiencing manic symptoms (defined as YMRS > 7) were excluded from the analysis.

Response was defined as a CGI-I score of 2 (= much improved) or better. Remission was defined as an overall CGI-S score of 1 (= normal or only minimal symptoms). Sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) for response and remission were calculated for each scale using receiver operating characteristic (ROC) curves. The optimal cut-off values were those maximizing the unweighted sum of sensitivity and specificity.

Confidence intervals for the optimal cut-off values were calculated using non-parametric bootstrap samples: the variance of each cut-off was estimated from 10,000 resamples of the original data drawn randomly with replacement.

For further validation, the interrelationship between remission and response definitions on the different scales was calculated using Φ-coefficients (Cramér’s V).

All statistical analyses were performed using the statistical software package R 2.9.1.

3. Results

From the original data set of 1014 patients, 1014 patients had complete HAMD ratings, 969 complete BDI ratings and 919 complete MADRS ratings. Thus, data on 3690 visits of 846 patients with all three scales were available for the final analysis. Patients had a mean age of 45.5 (±11.9) years, and 62.2% of the sample were female. The mean total scores of all scales across all visits, with standard deviations, are shown in Table 1. The distribution of all CGI-S and CGI-I values across all visits is shown in Table 2.

Table 2
Number of visits with each CGI-Severity and CGI-Improvement score across all visits.

Score    1     2      3     4     5     6     7
CGI-S    127   430    591   832   962   286   54
CGI-I    472   1126   967   523   140   38    1

3.1. HAMD-21

Table 3 shows the optimal cut-offs with the respective sensitivity, specificity, positive predictive value and negative predictive value corresponding to a CGI-S of 1. For the 21-item version of the HAMD, an optimal cut-off of 8 emerged, meaning that a HAMD-21 score of ≤7 corresponds to an overall CGI-S of 1 with a sensitivity of 92% and a specificity of 78% (NPV = 100%, PPV = 14%). The AUC of 0.92 indicated excellent predictability (Table 3).

Response, corresponding to a CGI-I score of 2 (much improved), was best captured by a reduction of ≥47% on the HAMD-21, with a sensitivity of 83% and a specificity of 82% (NPV = 83%, PPV = 82%). The AUC of 0.90 indicated excellent predictability for this response criterion (Table 4).

3.2. HAMD-17

For the 17-item version of the HAMD, a cut-off of ≤6 emerged (Table 3). Sensitivity and specificity were 90% and 78%, respectively (NPV = 100%, PPV = 14%). The AUC of 0.92 indicated favorable predictability of the CGI (Fig. 1 and Table 3).

Response was best captured by a reduction of ≥47%. Sensitivity and specificity were both 82% (NPV = 83%, PPV = 81%) (Table 4). The AUC was again excellent at 0.89 (Fig. 2 and Table 4).

Fig. 1. ROC curve for HAMD-17 remission (x-axis: 1 − specificity; y-axis: sensitivity). Area under the curve: 0.916; sensitivity 78.2%, specificity 90.2%, PV+ 99.5%, PV− 13.8%.

3.3. MADRS

A MADRS score of ≤7 corresponded best to the CGI definition of remission, with a sensitivity of 94% and a specificity of 82% (NPV = 100%, PPV = 17%, AUC = 0.94) (Table 3). A reduction of ≥46% on the MADRS corresponded best to the proposed response criterion, with a sensitivity of 85% and a specificity of 80% (NPV = 82%, PPV = 84%, AUC = 0.91) (Table 4).

3.4. BDI

On the BDI self-rating scale, a cut-off of ≤11 emerged for remission, with a sensitivity of 90% and a specificity of 64% (NPV = 99%, PPV = 10%, AUC = 0.83) (Table 3).

Response corresponded best to a BDI reduction of ≥47%, with a sensitivity of 80% and a specificity of 67% (NPV = 69%, PPV = 78%, AUC = 0.78) (Table 4).

Table 4
Negative predictive value (NPV), positive predictive value (PPV), sensitivity, specificity, area under the curve (AUC) and 95% bootstrap confidence intervals (CI) for the ROC analysis of the optimal cut-off values best approximating response, defined as a CGI-Improvement score of at least 2.

Scale      NPV    PPV    Sensitivity  Specificity  AUC    95% CI         Cut-off
HAMD-21    0.83   0.82   0.83         0.82         0.90   [0.43, 0.50]   ≥47%
HAMD-17    0.83   0.81   0.82         0.82         0.89   [0.43, 0.49]   ≥47%
MADRS      0.82   0.84   0.85         0.80         0.91   [0.38, 0.50]   ≥46%
BDI        0.69   0.78   0.80         0.67         0.78   [0.35, 0.50]   ≥47%

3.5. Interrelation between remission and response definitions of different scales

Table 5 lists Φ-coefficients for the relations between the different definitions of remission on the respective scales, and Table 6 lists the corresponding Φ-coefficients for the definitions of response. The Φ-coefficient (here computed as Cramér’s V) measures the relation between two dichotomous variables and ranges between 0 (no relation) and 1 (exact relation). All relations in Tables 5 and 6 were highly significant (p < 0.001). Notably stronger relations were found between the clinician rating scales (HAMD versions vs. MADRS: 0.69–0.73) than between a clinician rating scale and the BDI (0.44–0.48).

Table 5
Φ-coefficients for the interrelation of the optimal cut-off values for remission on different scales.

           HAMD-17   BDI    MADRS
HAMD-21    0.85      0.46   0.71
HAMD-17              0.44   0.69
BDI                         0.47

3.6. Bipolar vs. unipolar depression

Since bipolar depression was not an exclusion criterion, the sample included a small proportion of bipolar subjects (N = 60; 7%).

There was no difference between the two samples in baseline severity or in the relative change from baseline to endpoint on any of the scales (Table 7). The cut-off values for remission and response in bipolar and unipolar subjects are given in Tables 8 and 9, respectively. The cut-offs and their 95% confidence intervals overlap broadly, suggesting no significant difference between the two groups with respect to remission or response cut-offs.

Fig. 2. ROC curve for HAMD-17 response (x-axis: 1 − specificity; y-axis: sensitivity). Area under the curve: 0.891; sensitivity 82.1%, specificity 81.8%, PV+ 81.0%, PV− 82.8%.
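The percentile bootstrap behind such confidence intervals (Section 2.3: 10,000 resamples drawn with replacement) can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors’ R code: the cut-off rule is "score ≤ threshold", resamples containing only one outcome class are skipped, and the 2.5th and 97.5th percentiles of the resampled optimal thresholds form the interval. Function names and the toy data are hypothetical.

```python
import random
from typing import Sequence, Tuple


def best_threshold(scores: Sequence[int], truth: Sequence[bool]) -> int:
    """Threshold t maximizing sensitivity + specificity for the rule 'score <= t'."""
    n_pos = sum(1 for y in truth if y)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_t, best_sum = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, truth) if y and s <= t)
        tn = sum(1 for s, y in zip(scores, truth) if not y and s > t)
        total = tp / n_pos + tn / n_neg
        if total > best_sum:  # strict '>' keeps the lowest threshold on ties
            best_t, best_sum = t, total
    return best_t


def bootstrap_cutoff_ci(scores: Sequence[int], truth: Sequence[bool],
                        n_boot: int = 10_000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[int, int]:
    """Percentile bootstrap interval for the optimal threshold."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        s = [scores[i] for i in idx]
        y = [truth[i] for i in idx]
        if 0 < sum(y) < n:  # sensitivity and specificity need both classes
            estimates.append(best_threshold(s, y))
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    m = len(estimates)
    return (estimates[int(alpha / 2 * (m - 1))],
            estimates[int((1 - alpha / 2) * (m - 1))])


if __name__ == "__main__":
    scores = [3, 5, 6, 7, 8, 9, 12, 15, 20, 25]
    remit = [True, True, True, False, True, False, False, False, False, False]
    print(best_threshold(scores, remit), bootstrap_cutoff_ci(scores, remit))
```

Comparing such intervals between subgroups, as done for Tables 8 and 9, is only an informal check; overlapping intervals do not by themselves constitute a formal test of equality.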

4. Discussion

Berk and co-workers recently used a CGI-BP score of 1 to empirically define remission on the Young Mania Rating Scale (YMRS) and the MADRS in patients with bipolar disorder (Berk et al., 2008). We adopted this method and applied it to the two most often used versions of the HAMD, the MADRS and the BDI in unipolar depression. Thus, for the first time, the remission and response cut-offs of the most often used scales could be evaluated simultaneously in a representative naturalistic sample.

Conceptually, the task force led by Ellen Frank defined response as the time point at which “a period of sufficient improvement is observed so that the individual is not fully symptomatic.” Remission, on the other hand, was conceived as a “brief period during which an improvement of sufficient magnitude is observed that the individual is asymptomatic.” Response is therefore not remission, and the two should be clearly distinguished, as they represent very different concepts. Response may reflect the initiation of a healing process induced by antidepressant therapy and usually precedes remission by a mean of 10 to 14 days (Trivedi et al., 2001). In the present analysis, response was best captured by a minimum reduction of 47% (HAMD-21), 47% (HAMD-17), 46% (MADRS) and 47% (BDI) on the respective scales. Other definitions include a HAMD-17 score between ≤6 and ≤15 or a minimum reduction of 20–75% from the initial baseline score. In total, about 46% of all clinical depression trials use cut-off scores on a depression rating scale and 63% use percentage changes from baseline to endpoint (Prien et al., 1991). In recent years, a 50% reduction from baseline to endpoint on a depression rating scale has gained acceptance, although several authors point out that it remains unclear how this value became the standard (Nierenberg and DeCecco, 2001; Bandelow et al., 2006). Until now, this definition has rarely been evaluated empirically (Frank et al., 1991; Keller, 2003). Our results now confirm, for the HAMD, MADRS and BDI, the 50% limit for response that has established itself as the gold standard. Although the empirically derived percentages differ from the 50% limit by 3–4 percentage points, we recommend adhering to the conventional definition of 50% for practical reasons and better comparability.
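The response criterion discussed above is a percentage reduction from the baseline score. A small sketch (hypothetical function names, invented patient values) shows how a patient can qualify as a responder under the empirically derived 47% threshold but not under the conventional 50% threshold:

```python
def percent_reduction(baseline: float, current: float) -> float:
    """Reduction from the baseline score in percent (positive = improvement)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - current) / baseline


def is_responder(baseline: float, current: float, min_reduction: float = 50.0) -> bool:
    """Response as a reduction of at least `min_reduction` percent from baseline."""
    return percent_reduction(baseline, current) >= min_reduction


if __name__ == "__main__":
    # Hypothetical HAMD-17 patient: baseline 17, endpoint 9 (about 47.1% reduction).
    print(round(percent_reduction(17, 9), 1))                        # 47.1
    print(is_responder(17, 9), is_responder(17, 9, min_reduction=47.0))  # False True
```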

The situation for remission is probably more complex. Since the early 1990s, remission has become widely recognized as the optimal outcome not only for depression but also for other mood disorders (Keller, 2003; Moller, 2008). Definitions vary between less than 7 and less than 10 for the HAMD-21, between less than 6 and 12 for the HAMD-17, between 8 or less and 15 for the MADRS, and between 9 and 13 for the BDI (Keller, 2003). Although our results clearly lie within the respective ranges, they are not in full agreement with the currently most frequently used remission criteria (HAMD-17 ≤7; MADRS ≤10; BDI ≤8) (Nierenberg and DeCecco, 2001; Keller, 2003). In the present analysis, a HAMD-21 score of ≤7, a HAMD-17 score of ≤6, a MADRS score of ≤7 and a BDI score of ≤11 corresponded best to a CGI-S of 1. The corresponding AUC values of 0.92, 0.92, 0.94 and 0.83 indicated excellent discriminative capacity. In the context of clinical trials, a one-point difference can decide whether a significant drug–placebo difference is reached. Depending on the cut-off values, different subsets of patients were classified as remitters. For example, with the conventional cut-offs (HAMD-17 ≤7; MADRS ≤10; BDI ≤8) we found 64.89% HAMD-17, 53.19% MADRS and 43.25% BDI remitters, compared with 58.04%, 40.64% and 54.14%, respectively, with the cut-offs derived here (HAMD-17 ≤6; MADRS ≤7; BDI ≤11). Furthermore, a difference of one, two or three points may be associated with the absence or presence of specific residual symptoms, which should be minimal at study endpoint (Nierenberg et al., 1999, 2009). However, since the established criteria diverge from ours by at most 1 to 3 points, and for reasons of comparability, we currently recommend using the widely accepted cut-off values for the HAMD-17, the MADRS and the BDI.

Table 6
Φ-coefficients for the interrelation of the optimal cut-off values for response on different scales.

           HAMD-17   BDI    MADRS
HAMD-21    0.89      0.48   0.73
HAMD-17              0.47   0.72
BDI                         0.47

Table 7
Comparison of bipolar (N = 60) and unipolar subjects (N = 786).

                                 Bipolar          Unipolar         Test       p-Value
                                 Mean (SD)        Mean (SD)
Male/female                      27/33            293/493          Fisher     0.2693
With job/without job             29/23            434/294          Fisher     0.6615
Age                              45.27 (10.95)    45.41 (11.98)    t-test     0.9242
HAMD-21 baseline                 23.17 (7.74)     24.32 (6.75)     t-test     0.2665
HAMD-21 change                   −60.07 (33.44)   −60.49 (30.88)   t-test     0.9254
HAMD-17 baseline                 20.75 (7.13)     21.87 (6.16)     t-test     0.239
HAMD-17 change                   −64.36 (34.55)   −67.33 (29.22)   t-test     0.5233
MADRS baseline                   30.02 (9.11)     29.54 (7.74)     t-test     0.6938
MADRS change                     −59.49 (35.24)   −61.47 (31.86)   t-test     0.6768
BDI baseline                     26.14 (12.61)    24.98 (10.74)    t-test     0.5063
BDI change                       −53.04 (43.09)   −46.83 (45.27)   t-test     0.3054
                                 Median (IQR)     Median (IQR)
No. of prev. hospitalizations    3 (5)            1 (2)            Wilcoxon   <0.0001
CGI-S baseline                   5 (1)            5 (1)            Wilcoxon   0.7378
CGI-I discharge                  2 (1)            2 (1)            Wilcoxon   0.9954

Table 9
Response cut-offs for unipolar and bipolar subjects.

           Bipolar                    Unipolar
           Cut-off   95% CI           Cut-off   95% CI
HAMD-21    0.43      [0.32; 0.52]     0.47      [0.43; 0.49]
HAMD-17    0.47      [0.31; 0.50]     0.47      [0.42; 0.51]
MADRS      0.40      [0.37; 0.45]     0.46      [0.36; 0.49]
BDI        0.53      [0.29; 0.65]     0.45      [0.34; 0.48]
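How strongly a one-point change in the cut-off can shift the remission rate is easy to illustrate with hypothetical endpoint scores (the data below are invented and are not from the study; the function name is also hypothetical):

```python
from typing import Sequence


def remission_rate(endpoint_scores: Sequence[int], max_score: int) -> float:
    """Share of patients whose endpoint score is at or below `max_score`."""
    if not endpoint_scores:
        raise ValueError("no scores given")
    return sum(1 for s in endpoint_scores if s <= max_score) / len(endpoint_scores)


# Hypothetical HAMD-17 endpoint scores for ten patients (invented data).
endpoint_hamd17 = [3, 6, 7, 7, 9, 12, 5, 7, 15, 4]

if __name__ == "__main__":
    print(remission_rate(endpoint_hamd17, 7))  # conventional HAMD-17 <= 7: 0.7
    print(remission_rate(endpoint_hamd17, 6))  # cut-off derived here, HAMD-17 <= 6: 0.4
```

Because several patients sit exactly at the boundary score of 7, moving the cut-off by a single point changes the classification of three of the ten patients in this toy sample.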

To recheck the validity of the newly found best-fitting cut-off values, we also examined the agreement between HAMD, MADRS and BDI remitters independently of the CGI, using Φ-coefficients to estimate the interrelation between remitters on different scales. The results revealed good agreement between the remission criteria on both HAMD versions and the MADRS, ranging from 0.69 to 0.71. By contrast, we found only moderate associations between the BDI and the HAMD and between the BDI and the MADRS, ranging from 0.44 to 0.47 (Table 5). This may well be due to principal differences between self- and observer-ratings. Results from factor analyses suggest that self-ratings generally reflect “a subjective state” rather than providing detailed information on psychopathology (Moller and von Zerssen, 1995; Moller, 2000, 2008). In addition, the accuracy of self-ratings is, among other factors, strongly influenced by overall illness severity, which, taken together, argues against their sole use for assessing symptom severity in categories such as remission or response.

Table 8
Remission cut-offs for unipolar and bipolar subjects.

           Bipolar                Unipolar
           Cut-off   95% CI       Cut-off   95% CI
HAMD-21    <9        [5; 9]       <7        [6; 8]
HAMD-17    <7        [5; 7]       <6        [4; 7]
MADRS      <6        [4; 6]       <7        [5; 10]
BDI        <7        [2; 14]      <11       [7; 11]

Since the study sample also included a small number of bipolar subjects, we also explored possible differences in the respective cut-off values. As Berk and co-workers found a comparably low threshold for remission on the MADRS, we expected a similar result in the bipolar subsample. Indeed, the cut-off of below 6 (Table 8) was close to Berk’s MADRS remission cut-off of below 5 in bipolar depressed subjects (Berk et al., 2008). However, we did not find any significant differences between unipolar and bipolar subjects.

An important limitation of the present analysis is that we defined response and remission psychometrically and relied solely on the CGI as the gold standard. One alternative, for example, would have been to include functional outcome variables, which might have resulted in even stricter cut-off values. However, a CGI-S of 1 for remission and a CGI-I of 1 or 2 for response have consistently been used in antidepressant trials since the introduction of the CGI in the mid-1970s (Guy, 1976; Keller, 2003). In addition, the CGI is known to have high face validity, since many clinicians agree on the CGI criteria, which apparently correspond to the language clinicians use in their daily routine when talking about the efficacy of a treatment (Bandelow et al., 2006).

Closely connected with this limitation is the fact that the CGI, HAMD and MADRS ratings for a given patient were all performed by the same clinician. This might have contributed to the high AUC values found in our ROC analysis (up to 0.94). Independent assessment of the CGI and the other rating scales by two separate interviewers would have ensured that one rating could not influence the other. On the other hand, separate ratings of the CGI and the observer scales would also have increased the overall variance of all ratings, thus increasing the “background noise” that hinders association analyses. A third important limitation is that the specific population of German inpatients, despite very liberal inclusion and exclusion criteria, limits the generalizability of our cut-offs to other populations such as outpatients, the elderly and adolescents. In particular, the findings are limited to patients with a primary diagnosis of a major depressive episode; in other populations and diagnoses (e.g., schizoaffective disorder), different outcome thresholds might correspond to response and remission. However, probably the most important limitation of this report is the lack of a control group. Consequently, we were unable to demonstrate the ability of the outcome criteria to discriminate between an active drug and placebo.

To conclude, in addition to mean value courses, clinical depression trials should use outcome measures on different levels, including responder and remitter analyses. In the present study, we could empirically replicate and confirm the current definition of response as a 50% reduction on the respective scales. For remission, somewhat lower cut-offs emerged on the observer scales (a HAMD-21 score of ≤7, a HAMD-17 score of ≤6 and a MADRS score of ≤7), and a higher cut-off on the self-rating scale (a BDI score of ≤11). However, since the differences from the cut-offs currently in use (HAMD-17 ≤7; MADRS ≤10; BDI ≤8) were small, we recommend adhering to the traditional established criteria for reasons of comparability.

Other researchers should therefore join the process of re-evaluating current definitions to ensure optimal validity of the phenomena under investigation. In this context, long-term outcome, with emphasis on relapse rates and functional outcome, should also be considered in the concept of remission.

Role of Funding Source

The study was performed within the framework of the German Research Network on Depression, which was funded by the German Federal Ministry for Education and Research BMBF (01GI0219). The BMBF had no further role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.

Conflicts of interest

All other authors declare that they have no conflicts of interest.

Contributors

In detail, Professor Hans-Jürgen Möller designed the study and wrote the protocol together with Christoph Mundt, Florian Holsboer, Peter Brieger, Gerd Laux, Wolfram Bender, Mazda Adli, Isabella Heuser, Joachim Zeiler and Wolfgang Gaebel. Authors Klaus Kronmüller, Michael Bauer, Thomas Nickel, Peter Brieger, Gerd Laux, Wolfram Bender, Mazda Adli, Isabella Heuser, Joachim Zeiler and Wolfgang Gaebel also provided the infrastructure for the recruitment of patients. Authors Hans-Jürgen Möller and Michael Riedel carefully supervised and corrected the manuscript drafts. Rebecca Schennach-Wolff recruited patients at the Munich site and maintained and supervised the electronic database. Authors Hans-Jürgen Möller and Michael Riedel managed the literature searches and analyses. Michael Obermeier undertook the statistical analysis, and authors Michael Riedel and Florian Seemüller wrote all drafts of the manuscript. All authors have seen and worked on the final draft of the manuscript, which was approved by all authors before submission.

Acknowledgement

The network study was conducted in 12 psychiatric hospitals: Berlin Charité Campus Mitte (Andreas Heinz, Mazda Adli, Katja Wiethoff), Berlin Charité Campus Benjamin Franklin (Isabella Heuser, Gerd Bischof), Berlin Auguste Viktoria Klinik (Joachim Zeiler, Robert Fisher, Cornelia Fähser), Berlin St. Hedwig (Florian Standfest), Berlin St. Joseph (Dorothea Schloth), Düsseldorf (Wolfgang Gaebel, Joachim Cordes, Arian Mobascher), Gabersee (Gerd Laux, Sissi Artmann), Haar (Wolfram Bender, Nicole Theyson), Halle (Andreas Marneros, Dörthe Strube, Yvonne Reinelt, Peter Brieger), Heidelberg (Christoph Mundt, Klaus Kronmüller, Daniela Victor), München LMU (Hans-Jürgen Möller, Ulrich Hegerl, Roland Mergel, Michael Riedel, Florian Seemüller, Florian Wickelmaier, Markus Jäger, Thomas Baghai, Ingrid Borski, Constanze Schorr, Roland Bottlender), München MPI (Florian Holsboer, Matthias Majer, Marcus Ising, Thomas Nickel). We would like to thank Thelma Coutts for her native-speaker language revision of the manuscript.

References

American Psychiatric Association. Practice guideline for the treatment of patients with major depressive disorder (revision). American Journal of Psychiatry 2000;157:1–45.

Bandelow B, Baldwin DS, Dolberg OT, Andersen HF, Stein DJ. What is the threshold for symptomatic response and remission for major depressive disorder, panic disorder, social anxiety disorder, and generalized anxiety disorder? Journal of Clinical Psychiatry 2006;67:1428–34.

Bauer M, Bschor T, Pfennig A, Whybrow PC, Angst J, Versiani M, Moller HJ. World Federation of Societies of Biological Psychiatry (WFSBP) Guidelines for Biological Treatment of Unipolar Depressive Disorders in Primary Care. World Journal of Biological Psychiatry 2007;8:67–104.

Baumann U. Methodologic studies of the Hamilton rating scale for depression (author's transl). Archiv für Psychiatrie und Nervenkrankheiten 1976;222:359–75.

Berk M, Ng F, Wang WV, Calabrese JR, Mitchell PB, Malhi GS, Tohen M. The empirical redefinition of the psychometric criteria for remission in bipolar disorder. Journal of Affective Disorders 2008;106:153–8.

Deutsche Gesellschaft für Psychiatrie, Psychotherapie und Nervenheilkunde (DGPPN). Praxisleitlinien in Psychiatrie und Psychotherapie, 5: Behandlungsleitlinien affektive Erkrankungen [Practice guideline for the treatment of affective disorders]. Darmstadt: Steinkopff; 2000.

Frank E, Prien RF, Jarrett RB, Keller MB, Kupfer DJ, Lavori PW, Rush AJ, Weissman MM. Conceptualization and rationale for consensus definitions of terms in major depressive disorder. Remission, recovery, relapse, and recurrence. Archives of General Psychiatry 1991;48:851–5.

Guy W. Clinical global impressions. In: ECDEU assessment manual for psychopharmacology, revised. Rockville, MD: National Institute of Mental Health; 1976.

Hamilton M. Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology 1967;6:278–96.

Hautzinger M. The Beck depression inventory in clinical practice. Nervenarzt 1991;62:689–96.

Keller MB. Past, present, and future directions for defining optimal treatment outcome in depression: remission and beyond. Journal of the American Medical Association 2003;289:3152–60.

Maier W, Philipp M, Gerken A. Dimensions of the Hamilton depression scale. Factor analysis studies. European Archives of Psychiatry and Clinical Neurosciences 1985;234:417–22.

Moller HJ. Rating depressed patients: observer- vs self-assessment. European Psychiatry 2000;15:160–72.

Moller HJ. Standardised rating scales in psychiatry: methodological basis, their possibilities and limitations and descriptions of important rating scales. World Journal of Biological Psychiatry 2008:1–21.

Moller HJ, von Zerssen D. Self-rating procedures in the evaluation of antidepressants. Psychopathology 1995;28:291–306.

Montgomery SA, Asberg M. A new depression scale designed to be sensitive to change. British Journal of Psychiatry 1979;134:382–9.

Nierenberg AA, DeCecco LM. Definitions of antidepressant treatment response, remission, nonresponse, partial response, and other relevant outcomes: a focus on treatment-resistant depression. Journal of Clinical Psychiatry 2001;62(Suppl 16):5–9.

Nierenberg AA, Husain MM, Trivedi MH, Fava M, Warden D, Wisniewski SR, Miyahara S, Rush AJ. Residual symptoms after remission of major depressive disorder with citalopram and risk of relapse: a STAR*D report. Psychological Medicine 2009:1–10.

Nierenberg AA, Keefe BR, Leslie VC, Alpert JE, Pava JA, Worthington III JJ, Rosenbaum JF, Fava M. Residual symptoms in depressed patients who respond acutely to fluoxetine. Journal of Clinical Psychiatry 1999;60:221–5.

Prien RF, Carpenter LL, Kupfer DJ. The definition and operational criteria for treatment outcome of major depressive disorder. A review of the current research literature. Archives of General Psychiatry 1991;48:796–800.

Riso LP, Thase ME, Howland RH, Friedman ES, Simons AD, Tu XM. A prospective test of criteria for response, remission, relapse, recovery, and recurrence in depressed patients treated with cognitive behavior therapy. Journal of Affective Disorders 1997;43:131–42.

Schmidtke A, Fleckenstein P, Moises W, Beckmann H. Studies of the reliability and validity of the German version of the Montgomery-Asberg Depression Rating Scale (MADRS). Schweizer Archiv für Neurologie und Psychiatrie 1988;139:51–65.

Schwab J, Bialow M, Clemmons R, Martin P, Holzer C. The Beck depression inventory with medical inpatients. Acta Psychiatrica Scandinavica 1967;43:255–66.

Seemuller F, Riedel M, Obermeier M, Bauer M, Adli M, Kronmuller K, Holsboer F, Brieger P, Laux G, Bender W, Heuser I, Zeiler J, Gaebel W, Dichgans E, Bottlender R, Musil R, Moller HJ. Outcomes of 1014 naturalistically treated inpatients with major depressive episode. European Neuropsychopharmacology 2010.

Trivedi MH, Rush AJ, Pan JY, Carmody TJ. Which depressed patients respond to nefazodone and when? Journal of Clinical Psychiatry 2001;62:158–63.

Wittchen HU, Wunderlich U, Gruschwitz S, Zaudig M. Strukturiertes Klinisches Interview für DSM-IV [Structured Clinical Interview for DSM-IV]. Göttingen: Hogrefe; 1997.
