The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease
Article in Aging, Neuropsychology, and Cognition · June 2014
Impact Factor: 1.07 · DOI: 10.1080/13825585.2014.932325 · Source: PubMed
Available at: http://www.researchgate.net/publication/263512643 (retrieved on 01 September 2015)


Publisher: Routledge. Informa Ltd, registered in England and Wales, registered number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Aging, Neuropsychology, and Cognition: A Journal on Normal and Dysfunctional Development. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/nanc20

The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease
Kasper Jørgensen (a), Maria K. Kristensen (b), Gunhild Waldemar (a) & Asmus Vogel (a)
(a) Department of Neurology, Danish Dementia Research Centre, Rigshospitalet, Copenhagen, Denmark
(b) Department of Mental Health, Odense University Clinic, Odense, Denmark
Published online: 30 Jun 2014.

To cite this article: Kasper Jørgensen, Maria K. Kristensen, Gunhild Waldemar & Asmus Vogel (2014): The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease, Aging, Neuropsychology, and Cognition: A Journal on Normal and Dysfunctional Development, DOI: 10.1080/13825585.2014.932325

To link to this article: http://dx.doi.org/10.1080/13825585.2014.932325


The six-item Clock Drawing Test – reliability and validity in mild Alzheimer’s disease

Kasper Jørgensen (a)*, Maria K. Kristensen (b), Gunhild Waldemar (a) and Asmus Vogel (a)

(a) Department of Neurology, Danish Dementia Research Centre, Rigshospitalet, Copenhagen, Denmark; (b) Department of Mental Health, Odense University Clinic, Odense, Denmark

(Received 24 January 2014; accepted 4 June 2014)

This study presents a reliable, short and practical version of the Clock Drawing Test (CDT) for clinical use and examines its diagnostic accuracy in mild Alzheimer’s disease versus elderly nonpatients. Clock drawings from 231 participants were scored independently by four clinical neuropsychologists blind to diagnostic classification. The interrater agreement of individual scoring criteria was analyzed and items with poor or moderate reliability were excluded. The classification accuracy of the resulting scoring system – the six-item CDT – was examined. We explored the effect of further reducing the number of scoring items on classification accuracy and estimated classification accuracy associated with performances deviating from the optimal cutoff score. At a cutoff of 5/6, the six-item CDT had a sensitivity (SN) of 0.65 and a specificity of 0.80. Stepwise removal of up to three items reduced SN slightly. Classification accuracy associated with a score of four or less out of six was very high.

Keywords: Clock Drawing Test; cognitive screening; dementia; dementia screening; Alzheimer’s disease

*Corresponding author. Email: [email protected]

Aging, Neuropsychology, and Cognition, 2014. http://dx.doi.org/10.1080/13825585.2014.932325

© 2014 Taylor & Francis

Since the initial study of the reliability and validity of the Clock Drawing Test (CDT) in an elderly sample (Shulman, Shedletsky, & Silver, 1986), a multiplicity of methods for administration and scoring of the test have been developed. In an early review, Shulman (2000) listed 15 original CDT scoring systems, while a review covering the years 1966–2008 identified an additional seven scoring systems (Pinto & Peters, 2009). A recent review covering the years 1983–2013 and focusing on dementia samples includes 19 CDT scoring systems, noting that some second-generation scoring systems designed for mild cognitive impairment were not included (Mainland, Amodeo, & Shulman, 2013). Most systems are variations on a basic theme targeting various aspects of the clock face, numbers, hands, center, and error types. Although some scoring systems (Mendez, Ala, & Underwood, 1992; Shulman et al., 1986; Wolf-Klein, Silverstone, Levy, & Brod, 1989) appear to be widely used, no international consensus exists as to which system should be preferred. According to Shulman (2000, p. 558), “the more complicated and lengthy scoring systems do not appear to add significant value to the psychometric properties or clinical utility.” A similar conclusion was reached in a recent review article (Mainland et al., 2013). Some scoring systems are extraordinarily elaborate, rendering them impractical for clinical use although large bases of normative data have been collected (Freedman, Leach, Kaplan, Delis, & Shulman, 1994; Nyborn et al., 2013). Interrater reliability of the CDT is in most cases reported on scale level (summarizing all items in a given scoring system), whereas documentation regarding interrater reliability of individual scoring items is sparse. However, documentation regarding reliability on item level is important as reliability is an essential prerequisite for validity.

The ability of the CDT to assist in identifying dementia is well documented (Mainland et al., 2013; Pinto & Peters, 2009; Shulman, 2000; Strauss, Sherman, & Spreen, 2006). Generally, studies on patients with moderate and severe dementia report that the CDT works well as a screening test, whereas studies using patients with mild or questionable dementia report varying levels of sensitivity (SN).

The overall purpose of the present study was to develop a reliable, short, and practical version of the CDT for clinical use. To obtain this, we wanted to (1) identify items with high interrater agreement, (2) examine the classification accuracy of the resulting scoring system in mild Alzheimer’s disease (AD) versus elderly nonpatients, (3) explore the effect of reducing the number of scoring items on the discriminatory power of the CDT, and (4) estimate the classification accuracy associated with performances deviating from the optimal cutoff score.

Participants and methods

Subjects

Participants included mild AD patients and elderly nonpatients. Patients were consecutively recruited from the Copenhagen University Hospital Memory Clinic, an outpatient clinic based in neurology. The clinic offers diagnostic evaluation and treatment of cognitive disorders and dementia and receives secondary and tertiary referrals from general practitioners and other hospitals. AD was diagnosed according to the National Institute of Neurological and Communicative Disorders and the Alzheimer’s Disease and Related Disorders Association criteria (McKhann et al., 1984). The included patients fulfilled criteria for probable AD. Routine diagnostic work-up included clinical history, neurological and physical examination, laboratory tests, structural neuroimaging, screening for comorbid conditions, assessment of cognitive functions including the Mini-Mental State Examination (MMSE) and Addenbrooke’s Cognitive Examination (ACE) (Mathuranath, Nestor, Berrios, Rakowicz, & Hodges, 2000), and assessment of instrumental activities of daily living (Pfeffer, Kurosaki, Harrah Jr., Chance, & Filos, 1982). The diagnostic work-up was supplemented with more specialized assessment methods when judged clinically relevant.

All patients had mild dementia (MMSE score ≥20), were 60 years or older, and were recruited from two consecutive, previously described samples. The first sample included clock drawings from 41 consecutive AD patients and 45 nonpatients recruited by local newspaper advertisement (Vogel, Gade, Stokholm, & Waldemar, 2005). Participants were presented with a blank sheet of paper, a pencil, and the instruction “Draw a clock and put in all the numbers” and further instructed to set the hands to “10 minutes past 11.” The second sample consisted of participants enrolled in a validation study of the Danish version of the ACE (Stokholm, Vogel, Johannsen, & Waldemar, 2009). Clock drawings from 95 consecutive memory clinic AD patients and 50 nonpatients recruited by local newspaper advertisement were included from this sample. As part of the ACE, participants were instructed to draw a clock and set the hands to “5 minutes to 2,” the remainder of the CDT procedure being the same as described earlier. In both samples, elderly nonpatients were recruited through local newspaper advertisements. The study adhered to the Declaration of Helsinki for research involving human subjects, and written informed consent was obtained from all participants. All nonpatients underwent a comprehensive neuropsychological assessment. Exclusion criteria for all participants were a history of neurological or psychiatric disease (other than AD), alcohol consumption above recommended national levels, or use of medicine that could negatively affect cognitive functions. In total, 136 mild AD patients and 95 elderly nonpatients were included in the present study.

Clock Drawing Test

Ideally, the choice of a CDT scoring system as the starting point for the present study should be based on recommendations from existing reviews, but apart from the very general principle of “simpler is better,” it was impossible to identify one system as clearly superior to others. Instead, a CDT scoring system was developed based on a comprehensive 13-item system described by Lin et al. (2003), which in turn integrates salient elements from four previously described systems. The original validation of the Lin et al. system found high interrater reliability on scale level and intermediate SN and specificity (SP) in the classification of moderate AD versus no dementia.

We conducted a pilot study regarding the applicability of the Lin et al. scoring system resulting in exclusion of four items. Two items were excluded a priori: (1) “Are the numerals 12, 6, 3 and 9 placed first?” was impossible for the raters to score retrospectively and (2) “Are both hands represented as arrows?” was deemed irrelevant as this was not implied in our instruction. Two more items were excluded after preliminary testing and blind scoring of a subsample: (1) “Are all numerals spaced equally and symmetrically?” and (2) “Does each quadrant include 3 numerals?”. Both items proved difficult to implement as the current study did not use a predrawn circle. Many clock drawings were markedly asymmetrical, and attempts to divide them into quadrants produced arbitrary results. Preliminary analysis further revealed that two items, (1) “Is the hour hand pointing to 10 o'clock?” and (2) “Is the minute hand pointing to numeral 2?”, had poor interrater agreement because some clock drawings either had hands of equal length or had the assumed hour hand longer than the assumed minute hand, creating uncertainty regarding which hand was the hour hand and which was the minute hand. Consequently, these two items were modified to “Is one hand (regardless of length) pointing to the designated hour?” and “Is one hand (regardless of length) pointing to the designated minute?” Finally, one item from the Lin system, “Does the drawing resemble a clock?” was rephrased as “Do you think this clock was drawn by a nonpatient?”. The resulting nine-item scoring system is presented in Table 1. All clock drawings were scored independently and in random order by four experienced clinical neuropsychologists blind to classification of participants.

Statistical analysis

Difference in the female/male ratio between the patient and nonpatient subsamples was analyzed using the chi-square test for independence. Difference in age between the subsamples was explored using an independent sample t-test and difference in MMSE performance was analyzed using the Mann–Whitney U-test. Interrater reliability for individual scoring items was calculated using Cohen’s kappa measurement of agreement. A mean kappa value per item was calculated based on six pairwise comparisons made possible by four raters. Based on mean kappa values, scoring items were rank-ordered and items with poor or moderate reliability (kappa ≤0.75) were excluded from further analysis. For the retained items (kappa >0.75), interrater reliability on scale level was determined using Pearson’s product–moment correlation of the participants’ total CDT scores by the four raters.
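The item-level reliability step described above can be illustrated with a minimal Python sketch. It computes Cohen's kappa for each of the six rater pairs and averages them for one yes/no item. The ratings shown are hypothetical and are not data from the study; the function names are illustrative only, and the analysis in the study itself was run in SPSS.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' binary (0/1) ratings of the same drawings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a1 = sum(a) / n  # proportion of "yes" ratings by rater A
    p_b1 = sum(b) / n  # proportion of "yes" ratings by rater B
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)  # chance agreement
    if expected == 1.0:
        return 1.0  # degenerate case: both raters gave one constant rating
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings):
    """Average kappa over all rater pairs (six pairs for four raters)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ratings of one item on eight drawings by four raters.
ratings = [
    [1, 1, 0, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 0],
]
n_pairs = len(list(combinations(ratings, 2)))
mean_kappa = mean_pairwise_kappa(ratings)
retained = mean_kappa > 0.75  # retention threshold used in the study
```

With real data, the same calculation would be repeated for each of the nine items before rank-ordering them by mean kappa.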

Association between individual scoring items and diagnosis (mild AD vs. nonpatient) was analyzed using the chi-square test for independence. To avoid Type 1 errors, a Bonferroni adjustment of the level of significance was applied. The CDT’s performance as a screening tool for mild AD versus nonpatient status was analyzed by calculating SN and SP and area under the receiver operating characteristic (ROC) curve (AUC) for all possible cutoff scores. Logistic regression analysis was conducted to predict diagnosis using the six scoring items and age as predictor variables (entered simultaneously) and diagnosis as dependent variable. To analyze the performance of reduced scoring systems, scoring items were rank-ordered according to their contribution to the logistic regression model and items with the smallest contribution were removed stepwise. SN, SP, and AUC were recalculated at each step for the remaining combinations of items.
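As a rough illustration of how SN, SP, and AUC follow from total scores, the sketch below treats scores at or below a cutoff as screen-positive and computes the AUC as the probability that a randomly chosen patient scores lower than a randomly chosen nonpatient, with ties counted as one half. The score lists are hypothetical and the function names are assumptions made for this example; they do not reproduce the study data or the SPSS procedure.

```python
def sn_sp_at_cutoff(patient_scores, nonpatient_scores, cutoff):
    """Scores <= cutoff are screen-positive (cutoff=5 corresponds to '5/6')."""
    sn = sum(s <= cutoff for s in patient_scores) / len(patient_scores)
    sp = sum(s > cutoff for s in nonpatient_scores) / len(nonpatient_scores)
    return sn, sp


def auc(patient_scores, nonpatient_scores):
    """Rank-based AUC: P(patient score < nonpatient score), ties count 0.5."""
    wins = 0.0
    for p in patient_scores:
        for q in nonpatient_scores:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(patient_scores) * len(nonpatient_scores))


# Hypothetical CDT-6 totals (0-6), for illustration only.
patients = [6, 6, 5, 5, 4, 3, 6, 5, 2, 4]
nonpatients = [6, 6, 6, 5, 6, 6, 6, 5, 6, 6]

# SN and SP at every possible cutoff, as in a ROC analysis.
table = {c: sn_sp_at_cutoff(patients, nonpatients, c) for c in range(0, 6)}
area = auc(patients, nonpatients)
```

Lowering the cutoff in such a table trades sensitivity for specificity, which is the trade-off summarized by the ROC curve.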

Classification accuracy associated with performances deviating from the optimal cutoff score was analyzed by calculating likelihood ratios, posttest odds, posttest probability, and predictive validity for those scores where mild AD patients and nonpatients had overlapping performance. The positive likelihood ratio (LR+) is the probability of a positive test result in mild AD patients divided by the probability of a positive test result in nonpatients (SN/(1 − SP)); the negative likelihood ratio (LR−) is the probability of a negative test result in mild AD patients divided by the probability of a negative test result in nonpatients ((1 − SN)/SP); pretest likelihood equals the base rate of mild AD in the present sample; pretest odds is pretest likelihood/(1 − pretest likelihood); posttest odds is pretest odds × LR+; posttest probability is posttest odds/(1 + posttest odds); positive predictive power (PPV) is true positives/(true positives + false positives); and negative predictive power (NPV) is true negatives/(true negatives + false negatives) (Sackett, 1992). Statistical analyses were performed using SPSS 19 for Windows with a two-tailed level of significance set at 0.05.
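The definitions above translate directly into a few lines of Python. The sketch below is illustrative only: SN and SP are the rounded values reported for the 5/6 cutoff, and the 2 × 2 counts are approximate values back-calculated from those rounded figures and the group sizes (136 and 95), so they are assumptions rather than counts reported in the study.

```python
def likelihood_ratios(sn, sp):
    """LR+ = SN / (1 - SP); LR- = (1 - SN) / SP."""
    lr_pos = sn / (1 - sp)
    lr_neg = (1 - sn) / sp
    return lr_pos, lr_neg


def predictive_values(tp, fp, tn, fn):
    """PPV = TP / (TP + FP); NPV = TN / (TN + FN)."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Rounded SN and SP reported for the six-item CDT at a cutoff of 5/6.
lr_pos, lr_neg = likelihood_ratios(sn=0.65, sp=0.80)

# Approximate counts implied by SN 0.65 of 136 patients and SP 0.80 of
# 95 nonpatients (back-calculated, not taken from the study).
ppv, npv = predictive_values(tp=88, fp=19, tn=76, fn=48)
```

Because the inputs are rounded, the results can differ slightly in the second decimal from values computed on the raw data.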

Table 1. Scoring items rank ordered according to interrater agreement.

Rating: if the answer is yes: 1 point; if no: 0 points. The value after each item is the mean kappa.

(1) Are the numerals inside the circle?   0.96
(2) Is one hand (regardless of length) pointing to the designated minute?   0.92
(3) Is 12 placed correctly (approximately at the top position of the clock)?   0.90
(4) Are both hands present?   0.88
(5) Is the sequence 1–12 correct and complete?   0.88
(6) Is one hand (regardless of length) pointing to the designated hour?   0.83
(7) Is the hour hand shorter than the minute hand?   0.65
(8) Score “0” if (1) the sequence is counterclockwise, (2) the time is written as on a digital clock (“11:10”; “2:05”), (3) the time is written in words. Score “1” in all other cases   0.63
(9) Do you think this clock was drawn by a nonpatient?   0.48
All items (mean)   0.79

Note: Items (1)–(6) selected for further analysis.
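For readers who want to record the retained items electronically, the following minimal sketch sums the six yes/no items of Table 1 into a total score from 0 to 6. The item keys and the function name are hypothetical labels chosen for this example and are not part of the published scoring system.

```python
# Hypothetical short keys for items (1)-(6) of Table 1.
CDT6_ITEMS = (
    "numerals_inside_circle",
    "hand_points_to_minute",
    "twelve_placed_correctly",
    "both_hands_present",
    "sequence_1_to_12_correct",
    "hand_points_to_hour",
)


def cdt6_score(answers):
    """Sum the six yes/no items: yes = 1 point, no = 0 points."""
    missing = [item for item in CDT6_ITEMS if item not in answers]
    if missing:
        raise ValueError(f"missing items: {missing}")
    return sum(1 for item in CDT6_ITEMS if answers[item])


# Example: every item answered "yes" except the minute-hand item.
example = dict.fromkeys(CDT6_ITEMS, True)
example["hand_points_to_minute"] = False
score = cdt6_score(example)
```

Requiring all six answers avoids silently treating an unscored item as a failed one.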


Results

As expected, patients had a significantly lower MMSE score than nonpatients (Table 2). As patients were significantly older than nonpatients, age was included as a covariate in the subsequent analyses wherever relevant. The difference in female/male ratio in the two subsamples was not statistically significant.

Interrater agreement

Interrater agreement (mean kappa) for individual scoring items ranged from 0.48 to 0.96 (Table 1). Three items were excluded due to poor interrater agreement, whereas six items with high interrater agreement (kappa values >0.75) were retained for further analysis. These were (1) “Numerals inside circle”; (2) “Hand pointing to designated minute”; (3) “12 placed correctly”; (4) “Both hands present”; (5) “Sequence 1–12 correct”; and (6) “Hand pointing to designated hour.” The retained items had kappa values in the range of 0.83–0.96 (mean kappa = 0.90). Using the six items as a scale (CDT-6; range 0–6), correlations among the four raters’ total CDT scores ranged from 0.96 to 0.97 (mean r = 0.97, n = 231, P < 0.0005).

Item performance: bivariate analyses

On item level, significant associations were found between four scoring items and diagnosis (nonpatient vs. mild AD) according to the chi-square test (1, n = 231) (Table 3). Table 3 also shows the difficulty of individual scoring items expressed as the proportion of participants obtaining a score of “1” (“correct”) on each item. For nonpatients, proportions of subjects obtaining a score of “1” vary from 92% to 100%. For mild AD patients, even the most difficult item (“Hand pointing to designated minute”) is correct in half of the patients. The overlap in performance of patients and nonpatients on item level is so pronounced that no single item discriminates well between the two subsamples.

Table 2. Characteristics of the mild Alzheimer’s disease and nonpatient sample.

                              Mild AD (n = 136)   Nonpatients (n = 95)
Female/male                   82/54               56/39
Age, mean (SD)                77.4 (7.7)          71.0 (6.3)*
Age, range                    60–92               60–85
MMSE score, mean (SD)         24.3 (2.4)          29.2 (1.1)
MMSE score, median (range)    25 (20–30)          29 (25–30)*

Note: AD = Alzheimer’s disease; *significant difference from mild AD (P < 0.0005).

Table 3. Proportion of participants obtaining a score of “1” on individual items (item difficulty).

Item                                  Mild AD (%)   Nonpatient (%)   Chi square   P
Hand pointing to designated minute    50.7          91.6             40.7         <0.005*
Hand pointing to designated hour      70.6          100              31.8         <0.005*
Both hands present                    76.5          98.9             21.3         <0.005*
Sequence 1–12 correct                 72.8          95.8             18.7         <0.005*
Numerals inside circle                80.9          92.6             5.4          0.020
12 placed correctly                   91.2          98.9             5.0          0.026

Note: AD = Alzheimer’s disease; *significant differences; level of significance was adjusted to 0.008 to avoid Type 1 errors; items are rank ordered according to their chi-squared values.
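To make the item-level test concrete, the sketch below computes a 2 × 2 chi-square statistic for the item “Both hands present”. Two assumptions are made that the source does not state: the cell counts are back-calculated from the reported percentages (76.5% of 136 and 98.9% of 95), and a continuity (Yates) correction is applied, since that choice appears to match the reported value of 21.3. The Bonferroni-adjusted significance level is 0.05 divided by the six items tested.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    With yates=True, the continuity correction is applied
    (an assumption about how the reported statistic was computed).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts back-calculated from percentages (assumed, not reported).
ad_yes, ad_no = 104, 32  # mild AD: item scored 1 / scored 0
np_yes, np_no = 94, 1    # nonpatients: item scored 1 / scored 0

chi2 = chi_square_2x2(ad_yes, ad_no, np_yes, np_no)
alpha_adjusted = 0.05 / 6  # Bonferroni adjustment for six items
```

Without the correction, the same counts give a somewhat larger statistic, so the choice matters when comparing against published values.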

Discriminative validity

On scale level, a considerable overlap between the performance of mild AD patients and nonpatients also exists as the most frequent score in both groups is the maximum CDT-6 score of 6. Only two nonpatients had a score <5 and none had a score <4. Inspection of the SNs and SPs of the CDT-6 score (data not shown) and the ROC curves (Figure 1) reveals that the optimal cutoff score is 5/6 producing a SN of 0.65 and a SP of 0.80. The AUC for the CDT-6 is 0.76 (SE = 0.03, 95% CI 0.70–0.82).

The CDT-6 scoring items and age were entered into a logistic regression analysis and rank-ordered according to their contribution to the model according to the Wald test (data not shown). Stepwise removal of the three items with the least discriminative power (“Hand pointing to designated hour”; “Sequence 1–12 correct”; “12 placed correctly”) and adjustment of the cutoff score accordingly (4/5 at five items, 3/4 at four items, etc.) resulted in only a small reduction in SN (from 0.65 to 0.59) and had even less impact on SP (from 0.80 to 0.84) (Table 4). Age was the variable with the strongest predictive power, but removing age from the regression model had no effect on the rank ordering of the remaining variables.

Figure 1. ROC curves of classification performance of the clock drawing test based on four combinations of scoring items.


To examine if the discriminative validity of the CDT-6 in mild AD could be improved by restricting the analysis to cases with a clinically manifest cognitive deterioration, the mild AD sample was temporarily split into two subsamples: a “MMSE 25–30” (very mild AD) subsample (n = 69) and a “MMSE 20–24” subsample (n = 67). Using only the “MMSE 20–24” subsample increased SN slightly from 0.65 to 0.72 at a cutoff score of 5/6 (splitting the clinical sample does not affect SP).

Classification accuracy of performances deviating from the optimal cutoff score

The positive likelihood ratio (LR+) of a CDT-6 score of 5 was 3.27, indicating that a score of 5 is about three times more likely to be obtained from a patient with mild AD than from a nonpatient (Table 5). Posttest odds were a little higher (4.68) as they are positively influenced by pretest odds (1.4:1) in favor of mild AD in the present sample. Posttest probability (and PPV) is 0.82, indicating that there is an 82% probability that cases in this sample with a score of 5 have mild AD. But in a low base rate setting (low prevalence of mild AD), posttest odds and posttest probability would also be low. Considerably higher classification accuracy was found at a CDT-6 score of 4 out of 6, as the LR+ increased to 18.51, posttest odds were 26.50, and posttest probability was 0.96. A CDT-6 score of 4 implies a strong probability that the drawing was made by a mild AD subject. At a CDT-6 score of 3 or less, the probability of mild AD approaches certainty in the present sample, although LR+ and posttest statistics cannot be formally calculated (as no nonpatients score less than 4).
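The dependence of these figures on the base rate can be checked with a short calculation. The sketch below applies the chain pretest odds × LR+ = posttest odds to the reported LR+ values using the sample base rate of 136/231, and then repeats the calculation for a prevalence of 5%. The 5% figure is a hypothetical value chosen purely for illustration and does not come from the study.

```python
def posttest(base_rate, lr_pos):
    """Return (posttest odds, posttest probability) for a given base rate."""
    pretest_odds = base_rate / (1 - base_rate)
    posttest_odds = pretest_odds * lr_pos
    return posttest_odds, posttest_odds / (1 + posttest_odds)


sample_base_rate = 136 / 231  # mild AD patients / all participants

odds_5, prob_5 = posttest(sample_base_rate, 3.27)   # CDT-6 score of 5
odds_4, prob_4 = posttest(sample_base_rate, 18.51)  # CDT-6 score of 4

# Hypothetical low-prevalence setting (5%), for illustration only.
low_odds_5, low_prob_5 = posttest(0.05, 3.27)
```

With the sample base rate, the results agree with the reported posttest odds and probabilities to two decimals, whereas at the hypothetical 5% prevalence a score of 5 corresponds to a posttest probability of roughly 0.15.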

Table 5. Likelihood ratios, posttest odds, posttest probabilities and predictive validity of multiple levels of CDT-6 performances.

Score   LR+     LR−    Posttest odds   Posttest probability (=PPV)   NPV
≤3      N/A     0.71   N/A             N/A                           0.50
4       18.51   0.62   26.50           0.96                          0.53
5       3.27    0.43   4.68            0.82                          0.62
6       1.00    N/A    1.43            0.59                          N/A

Note: Base rate of mild Alzheimer’s disease (AD) = 59%; pretest odds for mild AD = 1.4; LR+ = positive likelihood ratio; LR− = negative likelihood ratio; PPV = positive predictive power; NPV = negative predictive power.

Table 4. Cutoff scores, sensitivity, specificity, and area under the curve of the clock drawing test based on four combinations of scoring items.

CDT version   Cutoff   SN     SP     AUC    SE     95% CI
CDT-6         5/6      0.65   0.80   0.76   0.03   0.70–0.82
CDT-5         4/5      0.63   0.80   0.75   0.03   0.68–0.81
CDT-4         3/4      0.63   0.80   0.75   0.03   0.68–0.81
CDT-3         2/3      0.59   0.84   0.73   0.03   0.67–0.80

Notes: SN = Sensitivity; SP = Specificity; CDT = clock drawing test; AUC = area under the ROC curve; SE = standard error; CI = confidence interval; CDT-3 = “Hand pointing to minute”, “Numerals inside circle”, “Both hands present”; CDT-4 = CDT-3 + “12 placed correctly”; CDT-5 = CDT-4 + “Sequence 1–12 correct.”


Discussion

The present study used as its starting point the 13-item CDT scoring system of Lin et al., which after pilot testing and adaptation to retrospective scoring of clock drawings was initially shortened and modified to nine items. In the next phase, four independent raters blind to diagnosis scored 231 clock drawings each and interrater agreement was calculated on item level. Unexpectedly, simple and seemingly straightforward scoring items (“Do you think this clock was drawn by a nonpatient?”; “Is the hour hand shorter than the minute hand?”) were characterized by poor interrater reliability, indicating that trained clinicians are not able to discriminate, at a glance, between clocks drawn by nonpatients and mild AD patients. The first of these items may obtain a higher interrater agreement in patient samples with more pronounced dementia (Korner, Lauritzen, Nilsson, Lolk, & Christensen, 2012). As reliability is essential for validity, three items with poor interrater agreement were excluded. Reducing the scoring system from nine to six items with high interrater agreement produced a simple, reliable system easily applied in clinical settings and scored in less than a minute.

According to literature reviews, most of the existing CDT scoring systems have high interrater reliability (Pinto & Peters, 2009; Shulman, 2000). However, inspection of the individual studies reveals that interrater reliability in most cases was analyzed on scale level rather than on item level (thus potentially low reliability items were not identified) and most studies used only a subsample for the calculation of interrater reliability rather than the full sample (see, for instance, Heinik, Solomesh, & Berkman, 2004; Scanlan & Borson, 2001; Tuokko, Hadjistavropoulos, Rae, & O'Rourke, 2000). In the present study, reliability was investigated using the full sample with ratings from four independent neuropsychologists.

In the present study, the discriminative validity of the six-item CDT regarding mild AD versus elderly nonpatients can be described as intermediate (SN 0.65; SP 0.80) due to a considerable performance overlap between subsamples. The MMSE scores of the mild AD patients were in the range from 20 to 30, indicating that some patients must belong to the “very mild” end of the mild AD spectrum where cognitive deficits are subtle. Evidently, the six-item CDT has a ceiling effect for both nonpatients and mild AD patients. Focusing only on the “MMSE 20–24” subsample with evident cognitive deficits, the SN of the six-item CDT could be increased to 0.72. This is not ideal but our results are fairly consistent with results of previous studies regarding the discriminative validity of the CDT in mild AD patients versus nonpatients (Chiu, Li, Lin, Chiu, & Liu, 2008; Connor, Seward, Bauer, Golden, & Salmon, 2005; Esteban-Santillan, Praditsuwan, Ueda, & Geldmacher, 1998; Lee, Swanwick, Coen, & Lawlor, 1996; Powlishta et al., 2002). Not surprisingly, discriminative validity tends to be intermediate to low in studies including patients with “very mild” or “questionable” AD. Attempts to use the CDT as a screening method for mild cognitive impairment have been largely unsuccessful (Forti, Olivelli, Rietti, Maltoni, & Ravaglia, 2010; Lee et al., 2008; Yamamoto et al., 2004). One of the reasons for the moderate discriminative validity of the CDT found in the present study is the heterogeneity of the cognitive profiles of mild AD patients. At this stage of the disease, not all patients have developed clinically significant deficits in visuospatial processing or planning (Stopford, Snowden, Thompson, & Neary, 2008). We are, however, not able to explore this issue as only selected patients underwent formal neuropsychological assessment.

We found that the six-item CDT may be reduced to five, four, or even three items (“Hand pointing to designated minute”; “Numerals inside circle”; “Both hands present”) with little loss of discriminative power. But a CDT with only three items has its drawbacks as the excluded items often provide relevant clinical (qualitative) information at the cost of very little time and effort. Using only three items to discriminate between two groups also raises the problem of restriction of range. We advocate that the six-item CDT be utilized as a continuous scale, enhancing the discriminatory power of the test. Most diagnostic validity studies of the CDT use a traditional dichotomous approach focussing on classification accuracy at the optimal cutoff score but ignoring discriminative information associated with performances deviating from the cutoff. This simplification may be relevant for practical reasons, but performances deviating from the cutoff score are associated with a higher degree of classification accuracy than a performance just below or above the cutoff. Likelihood ratios for multiple levels of performance provide estimates of how likely it is that a test result at any point of the scale was obtained from a patient as opposed to a nonpatient. As a rule of thumb, when using the six-item CDT to identify mild AD, a score of 6 is inconclusive, a score of 5 implies a weak probability of mild AD, a score of 4 involves a strong probability of mild AD, and a score of 3 or less equals almost certainty. This does not imply that the cutoff should be adjusted to 4 or 3 as SN would suffer.
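The rule of thumb above can be written as a simple lookup. The sketch below is only an illustration of that verbal rule; the function name is hypothetical, and the interpretations reflect the present sample (base rate of mild AD of 59%), so they should not be read as prevalence-independent probabilities.

```python
def interpret_cdt6(score):
    """Map a CDT-6 total (0-6) to the rule-of-thumb interpretation."""
    if not 0 <= score <= 6:
        raise ValueError("CDT-6 scores range from 0 to 6")
    if score == 6:
        return "inconclusive"
    if score == 5:
        return "weak probability of mild AD"
    if score == 4:
        return "strong probability of mild AD"
    return "almost certain mild AD"  # score of 3 or less


interpretations = {s: interpret_cdt6(s) for s in range(7)}
```

Keeping all levels of the scale, rather than a single pass/fail cutoff, is what preserves the extra information carried by scores of 4 or less.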

The item “Hand pointing to designated minute” was the single most powerful item regarding discriminatory ability, whereas the value of the item “Hand pointing to designated hour” was questionable. Placement of the minute hand on an analog clock face is relatively difficult as minutes are converted to a duodecimal system (5 minutes equals “1”, 10 minutes equals “2,” etc.) whereas hours are unconverted. The transformation of minutes requires access to semantic conceptual knowledge about the outline of analog clocks, and failure to perform this operation may reflect deterioration of semantic memory (Cacho et al., 2005; Kitabayashi et al., 2001; Lessig, Scanlan, Nazemi, & Borson, 2008; Leyhe, Saur, Eschweiler, & Milian, 2009). This “minute hand phenomenon” (Leyhe, Milian, Muller, Eschweiler, & Saur, 2009) makes the CDT an interesting screening test for AD. We have not analyzed the added value (incremental validity) of the CDT in a diagnostic setting – such as the Copenhagen University Hospital Memory Clinic – that routinely includes the MMSE, but it can be speculated that the CDT, with its emphasis on visuospatial construction and planning, complements the MMSE, which touches only cursorily on these cognitive domains.
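The conversion of minutes to clock-face numerals described above amounts to a division by five. The short sketch below shows it for the two time settings used in the study; the function name is a hypothetical label for this example.

```python
def minute_hand_numeral(minutes):
    """Numeral the minute hand points to for a multiple of five minutes."""
    if not 0 <= minutes < 60 or minutes % 5 != 0:
        raise ValueError("expected a multiple of 5 between 0 and 55")
    return 12 if minutes == 0 else minutes // 5


ten_past_eleven = minute_hand_numeral(10)  # "10 minutes past 11"
five_to_two = minute_hand_numeral(55)      # "5 minutes to 2"
```

The hour, by contrast, is read directly from the instruction (11 or 2), which is the asymmetry the paragraph above points to.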

The limitations of the present study are as follows. (1) The retrospective study design involving two slightly different instructions for the CDT (“set the hands to 5 minutes to 2” and “10 minutes past 11”). (2) Patients and nonpatients were not matched for age and education. (3) The CDT was not administered using a predrawn circle, possibly reducing the interrater agreement of some scoring items. (4) We cannot rule out the possibility that some patients with non-AD dementia were erroneously included as our patient sample only met diagnostic criteria for probable AD (i.e., without histopathological evidence). Consequently, the results of this study may not be specific to mild AD. (5) Although participants with a history of psychiatric disease were excluded from this study and patients with manifest psychiatric symptoms were referred to specialized psychiatric assessment, there was no systematic assessment of psychiatric or behavioral symptoms in the patient sample. The lack of data regarding (subtle) psychiatric or behavioral symptoms may in some cases have compromised the validity of the AD diagnosis. (6) As the CDT was included in the primary cognitive assessment of the mild AD participants, there may be a risk of circular evidence concerning diagnostic validity. However, as CDT performance contributes to only 3% of the total ACE score, the primary impact of the CDT on diagnostic classification was small.


The main results of this study are (1) a highly reliable, short, and practical CDT for clinical use is provided; (2) only six items applied from the comprehensive scoring system used as the starting point were found to have adequate interrater reliability; (3) the classification accuracy of the resulting six-item scoring system in mild AD versus elderly nonpatients was intermediate; and (4) performances deviating from the optimal cutoff score were associated with high classification accuracy.

Acknowledgments
The authors are grateful to the neuropsychologists Camilla W. Overbeck, Ida Stuart, Bodil Dahl Henriksen, and Jytte Dock for blind scoring of clock drawings. The authors have no financial interest or benefit arising from the direct applications of this research.

Funding
The Danish Dementia Research Centre would like to thank the Ministry of Health and the Health Insurance Fund for financial support.

References
Cacho, J., Garcia-Garcia, R., Fernandez-Calvo, B., Gamazo, S., Rodriguez-Perez, R., Almeida, A., & Contador, I. (2005). Improvement pattern in the clock drawing test in early Alzheimer’s disease. European Neurology, 53, 140–145.

Chiu, Y. C., Li, C. L., Lin, K. N., Chiu, Y. F., & Liu, H. C. (2008). Sensitivity and specificity of the clock drawing test, incorporating Rouleau scoring system, as a screening instrument for questionable and mild dementia: Scale development. International Journal of Nursing Studies, 45, 75–84.

Connor, D. J., Seward, J. D., Bauer, J. A., Golden, K. S., & Salmon, D. P. (2005). Performance of three clock scoring systems across different ranges of dementia severity. Alzheimer Disease & Associated Disorders, 19, 119–127.

Esteban-Santillan, C., Praditsuwan, R., Ueda, H., & Geldmacher, D. S. (1998). Clock drawing test in very mild Alzheimer’s disease. Journal of American Geriatrics Society, 46, 1266–1269.

Forti, P., Olivelli, V., Rietti, E., Maltoni, B., & Ravaglia, G. (2010). Diagnostic performance of an executive clock drawing task (CLOX) as a screening test for mild cognitive impairment in elderly persons with cognitive complaints. Dementia and Geriatric Cognitive Disorders, 30, 20–27.

Freedman, M., Leach, L., Kaplan, E., Delis, D., & Shulman, K. I. (1994). Clock-drawing: A neuropsychological analysis. New York, NY: Oxford University Press.

Heinik, J., Solomesh, I., & Berkman, P. (2004). Correlation between the CAMCOG, the MMSE, and three clock drawing tests in a specialized outpatient psychogeriatric service. Archives of Gerontology and Geriatrics, 38, 77–84.

Kitabayashi, Y., Ueda, H., Narumoto, J., Nakamura, K., Kita, H., & Fukui, K. (2001). Qualitative analyses of clock drawings in Alzheimer’s disease and vascular dementia. Psychiatry and Clinical Neurosciences, 55, 485–491.

Korner, E. A., Lauritzen, L., Nilsson, F. M., Lolk, A., & Christensen, P. (2012). Simple scoring of the clock-drawing test for dementia screening. Danish Medical Journal, 59, A4365.

Lee, H., Swanwick, G. R., Coen, R. F., & Lawlor, B. A. (1996). Use of the clock drawing task in the diagnosis of mild and very mild Alzheimer’s disease. International Psychogeriatrics, 8, 469–476.

Lee, K. S., Kim, E. A., Hong, C. H., Lee, D. W., Oh, B. H., & Cheong, H. K. (2008). Clock drawing test in mild cognitive impairment: Quantitative analysis of four scoring methods and qualitative analysis. Dementia and Geriatric Cognitive Disorders, 26, 483–489.

Lessig, M. C., Scanlan, J. M., Nazemi, H., & Borson, S. (2008). Time that tells: Critical clock-drawing errors for dementia screening. International Psychogeriatrics, 20, 459–470.


Leyhe, T., Milian, M., Muller, S., Eschweiler, G. W., & Saur, R. (2009). The minute hand phenomenon in the clock test of patients with early Alzheimer disease. Journal of Geriatric Psychiatry and Neurology, 22, 119–129.

Leyhe, T., Saur, R., Eschweiler, G. W., & Milian, M. (2009). Clock test deficits are associated with semantic memory impairment in Alzheimer disease. Journal of Geriatric Psychiatry and Neurology, 22, 235–245.

Lin, K. N., Wang, P. N., Chen, C., Chiu, Y. H., Kuo, C. C., Chuang, Y. Y., & Liu, H. (2003). The three-item clock-drawing test: A simplified screening test for Alzheimer’s disease. European Neurology, 49, 53–58.

Mainland, B. J., Amodeo, S., & Shulman, K. I. (2013). Multiple clock drawing scoring systems: Simpler is better. International Journal of Geriatric Psychiatry, 29, 127–136.

Mathuranath, P. S., Nestor, P. J., Berrios, G. E., Rakowicz, W., & Hodges, J. R. (2000). A brief cognitive test battery to differentiate Alzheimer’s disease and frontotemporal dementia. Neurology, 55, 1613–1620.

McKhann, G., Drachman, D., Folstein, M., Katzman, R., Price, D., & Stadlan, E. M. (1984). Clinical diagnosis of Alzheimer’s disease: Report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurology, 34, 939–944.

Mendez, M. F., Ala, T., & Underwood, K. L. (1992). Development of scoring criteria for the clock drawing task in Alzheimer’s disease. Journal of American Geriatrics Society, 40, 1095–1099.

Nyborn, J. A., Himali, J. J., Beiser, A. S., Devine, S. A., Du, Y., Kaplan, E., ... Au, R. (2013). The Framingham Heart Study clock drawing performance: Normative data from the offspring cohort. Experimental Aging Research, 39, 80–108.

Pfeffer, R. I., Kurosaki, T. T., Harrah Jr., C. H., Chance, J. M., & Filos, S. (1982). Measurement of functional activities in older adults in the community. Journal of Gerontology, 37, 323–329.

Pinto, E., & Peters, R. (2009). Literature review of the clock drawing test as a tool for cognitive screening. Dementia and Geriatric Cognitive Disorders, 27, 201–213.

Powlishta, K. K., Von Dras, D. D., Stanford, A., Carr, D. B., Tsering, C., Miller, J. P., & Morris, J. C. (2002). The clock drawing test is a poor screen for very mild dementia. Neurology, 59, 898–903.

Sackett, D. L. (1992). The rational clinical examination. A primer on the precision and accuracy of the clinical examination. JAMA: The Journal of the American Medical Association, 267, 2638–2644.

Scanlan, J., & Borson, S. (2001). The mini-cog: Receiver operating characteristics with expert and naive raters. International Journal of Geriatric Psychiatry, 16, 216–222.

Shulman, K. I. (2000). Clock-drawing: Is it the ideal cognitive screening test? International Journal of Geriatric Psychiatry, 15, 548–561.

Shulman, K. I., Shedletsky, R., & Silver, I. L. (1986). The challenge of time: Clock-drawing and cognitive function in the elderly. International Journal of Geriatric Psychiatry, 1, 135–140.

Stokholm, J., Vogel, A., Johannsen, P., & Waldemar, G. (2009). Validation of the Danish Addenbrooke’s Cognitive Examination as a screening test in a memory clinic. Dementia and Geriatric Cognitive Disorders, 27, 361–365.

Stopford, C. L., Snowden, J. S., Thompson, J. C., & Neary, D. (2008). Variability in cognitive presentation of Alzheimer’s disease. Cortex, 44, 185–195.

Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A compendium of neuropsychological tests. Administration, norms, and commentary (3rd ed.). New York, NY: Oxford University Press.

Tuokko, H., Hadjistavropoulos, T., Rae, S., & O’Rourke, N. (2000). A comparison of alternative approaches to the scoring of clock drawing. Archives of Clinical Neuropsychology, 15, 137–148.

Vogel, A., Gade, A., Stokholm, J., & Waldemar, G. (2005). Semantic memory impairment in the earliest phases of Alzheimer’s disease. Dementia and Geriatric Cognitive Disorders, 19, 75–81.

Wolf-Klein, G. P., Silverstone, F. A., Levy, A. P., & Brod, M. S. (1989). Screening for Alzheimer’s disease by clock drawing. Journal of American Geriatrics Society, 37, 730–734.

Yamamoto, S., Mogi, N., Umegaki, H., Suzuki, Y., Ando, F., Shimokata, H., & Iguchi, A. (2004). The clock drawing test as a valid screening method for mild cognitive impairment. Dementia and Geriatric Cognitive Disorders, 18, 172–179.
