
Patient-Reported Outcomes in Rheumatoid Arthritis: Assessing the Equivalence of Electronic and Paper Data Collection

Brian Tiplady,1,2 Kirsteen Goodman,3 Geraldine Cummings,4 Dawn Lyle,4 Robert Carrington,5 Clare Battersby6 and Stuart H. Ralston3

1 PRO Consulting, Twickenham, UK

2 Department of Anaesthesia, Critical Care and Pain Medicine, Royal Infirmary of Edinburgh, University of Edinburgh, Edinburgh, UK

3 Edinburgh Clinical Trials Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, University of Edinburgh, Edinburgh, UK

4 Wellcome Trust Clinical Research Facility, Western General Hospital, Edinburgh, UK

5 AstraZeneca R&D Charnwood, Loughborough, UK

6 AstraZeneca R&D Alderley Park, Macclesfield, Cheshire, UK

Abstract

Background: The questionnaires used in clinical research have often been developed and validated using paper, and in such cases it is necessary to show that the electronic versions are equivalent to the originals.

Objective: To determine if electronic versions of questionnaires assessing severity and impact of rheumatoid arthritis (RA) are equivalent to the original paper versions.

Methods: Patients (n = 43; 31 female) aged 32–83 years (25 aged <60 years) took part in a single session during which they completed paper and electronic assessments in randomized order, with an interval of 45 minutes between the two modes. Electronic assessments were set up on a Palm® TX handheld device. Assessments included measures of pain, fatigue, disability, and health status.

Results: Scores were similar between the two modes. All effect sizes for electronic-paper differences were <0.2, and there was no overall tendency for one mode to show higher scores than the other. Intraclass correlation coefficients (ICCs) ranged from 0.72 to 0.96, and were generally similar to reported retest reliabilities of the scales in their paper versions. The Disability Index of the Health Assessment Questionnaire (HAQ-DI) showed an ICC of 0.96. ICCs for the EQ-5D health status scale were utility: 0.79; profile: 0.91; visual analog scale: 0.75. Most patients reported that both modes were easy to use. In general, patients preferred the electronic version over the paper, and this was true for the older as well as the younger patients.

Conclusions: Electronic versions of questionnaires assessing severity and impact of RA provide data that correspond closely to those of paper originals, are easy to use, and are acceptable to patients.

Original Research Article. Patient 2010; 3 (3): 133-143. © 2010 Adis Data Information BV. All rights reserved.

Background

Electronic methods are increasingly being used to collect diary and questionnaire data directly from patients (electronic patient-reported outcomes [ePRO]). There are many benefits with ePRO. Only valid, in-range data options can be entered. Missing data within a questionnaire can be prevented. Feedback and reminders can be given to patients to help them comply with study procedures. All entries are date and time stamped, to help ensure data integrity. Data can be immediately transmitted to a secure central server, allowing rapid review of results. Question branching can be invisible to the patient, making it easier for the patient than using a paper questionnaire. Electronic methods have been found to be generally acceptable to patients, who often prefer them to paper questionnaires.[1-3]

The questionnaires used in clinical research have often been developed and validated using paper, and in such cases it is necessary to show that the electronic versions are equivalent to the originals. In general, migration from paper to electronic modes does not lead to problems of equivalence. There is a substantial literature comparing electronic and paper questionnaires, and this has been reviewed by Gwaltney et al.,[4] who presented a meta-analysis of 46 studies comparing 278 distinct assessment instruments. They concluded that computer and paper administration of PROs produce closely similar results across text-based electronic platforms, and that this was the case independent of scale length, screen size, and those changes made during migration that were reported.

This general conclusion does not, of course, guarantee that a specific migration will lead to a valid ePRO measure. It is necessary to review any changes that have been made in developing the electronic version and to assess the potential for these to affect the way the patient completes the questionnaire. The most common changes are those that result from the available screen size, for example presenting questions one at a time, rather than in groups, or presenting instructions or introductory material separately from the questions. Shields et al.[5] suggested a hierarchical framework for evaluating the potential impact of such factors, and thus the degree of validation work that may be needed for a specific ePRO implementation. A similar approach has been advocated in the recent guidelines from the International Society for Pharmacoeconomics and Outcomes Research (ISPOR).[6]

PROs are particularly useful in a chronic condition such as rheumatoid arthritis (RA), and may offer important advantages over physician-rated measures.[7] There are a variety of instruments available for collecting PROs in RA. These include assessments of quality of life (QOL) specifically focused on RA; generic assessments of health-related QOL (HR-QOL); measures focused on specific areas such as pain and fatigue; assessments of disability; and global assessments. A battery of ePRO assessments for RA has been implemented on a handheld computer by invivodata inc., and two of the scales included, the EQ-5D[8,9] and the Health Assessment Questionnaire – Disability Index (HAQ-DI),[10,11] had been changed sufficiently that it was considered appropriate to carry out a study to evaluate the equivalence of the electronic and paper modes. The opportunity was taken to include all scales in the battery in order to obtain additional evidence for scales that had not been significantly modified.

Methods

Study Design

The study used a two-period within-subjects design comparing electronic and paper modes.


Patients took part in a single session in which they completed all assessments in the first mode, then in the second mode, with an interval of 45 minutes in between. Half the patients completed electronic then paper, half paper then electronic, the sequence being allocated to patients using a blocked randomization. The order of questionnaires within the session was randomized between patients, but was the same for both modes for a particular patient. At the end of the session, patients completed a questionnaire evaluating familiarity with, and attitudes and preferences towards, electronic and paper administration.

Patients

Patients (n = 43; 12 male, 31 female) with a diagnosis of RA took part in the study and were recruited from rheumatology outpatient clinics at the Western General Hospital, Edinburgh, UK. They were aged between 32 and 83 years, with 25 (58%) aged <60 years and 18 (42%) aged ≥60 years (median age 57 years). Patients were excluded if they had any condition other than RA likely to cause pain or fatigue, to affect QOL, or to impair functioning; if they had a diagnosis of any psychiatric condition; or if they were considered by the responsible investigator to be unlikely to be able to comply with study procedures. All patients gave written informed consent to participate in the study, which was approved by the Lothian Research Ethics Committee.

Assessments

Electronic assessments were carried out on Palm® TX devices (screen size 75 × 55 mm) programmed by invivodata inc. (Pittsburgh, PA, USA). One question at a time was presented on the screen. Patients responded by tapping on the screen with a stylus. The selection was highlighted, and could be changed if necessary by tapping on an alternative option. The patient confirmed the choice and moved to the next screen by tapping a 'Next' button on the screen.

Paper assessments used the layouts described by the scale developers. The questionnaires included in both paper and electronic versions are described in the following sections.

Health Assessment Questionnaire – Disability Index (HAQ-DI)

The HAQ-DI instrument has three types of question.

The first type collects data concerning a set of activities within eight domains. Each activity is rated on a four-point scale: (i) without any difficulty; (ii) with some difficulty; (iii) with much difficulty; (iv) unable to do.

The second type of question asks whether aids or devices have been used to perform any of the above activities. A selection of devices is offered, and also an 'other' option, in which the respondent writes the aid or device used.

The third type of question asks whether the respondent has required help from another person in performing activities in the stated domains.[12]

Two main modifications were made in implementing the ePRO version. The first was to eliminate free-text entry. The scoring scheme for the HAQ-DI only requires a 'yes' or 'no' record for assistance in each domain, so it was possible to make this change without modifying the scale logic.

The second change was to strengthen the scale structure by grouping all questions relating to a particular domain together. This compensated for the fact that patients cannot review previous answers as easily when a single question per screen is displayed. Two questions from the domain 'grip' are illustrated in figure 1. As can be seen, each screen is self-contained with respect to necessary instructions and time-frame information.

EQ-5D

The EQ-5D has two parts. The first part consists of five questions, each of which has three response options. The second part is an overall health status evaluation in which the patient makes a mark on a 200 mm vertical scale. The bottom of the scale is labeled '0 – worst imaginable health state' and the top is marked '100 – best imaginable health state'. In between, the scale is marked with 100 subdivisions, with each 10th point labeled (10, 20, etc.) so that the patient can see the exact numerical value that corresponds to the marked value.[9]

Implementation of the five questions on the ePRO device was simple. In all cases the question and response text fitted easily in the available screen area. The health status scale was less straightforward. The scale clearly had to be reduced in size, and this would make it impossible to see the exact numerical value of the scale position. This could have changed the patients' responses, so a numeric read-out of the value was added to the display (figure 2). In this way, the information made available to the patient was similar in both modes. A similar approach to administering the EQ-5D was adopted by Ramachandran et al.[13]

Brief Pain Inventory (BPI)

All questions in the Brief Pain Inventory (BPI) have 11 responses, except the first (yes/no) question, and two questions (location of pain and medication) that are not included in the scoring scheme and were omitted from the electronic version. Each set of response options is laid out as numbers (0–10 or, in one case, 0–100%) in a horizontal row, with text anchors at the left and right ends.[14]

In the ePRO version, the questions were presented one at a time. The horizontal arrangement of the response options was maintained, and all necessary information fitted easily in the available screen space.

McGill Pain Questionnaire (MPQ-SF)

The McGill Pain Questionnaire (MPQ-SF) obtains current ratings for 15 pain words, each of which is scored as 'none', 'mild', 'moderate', or 'severe'. In the paper version, pain words are presented one per line. Response options are laid out to the right as horizontal rows of tick boxes, with labels on a header line above the responses. Two additional questions rate overall pain using a visual analog scale (VAS) and a six-point scale.[15,16]

Fig. 1. Two screen shots from the handheld implementation of the Health Assessment Questionnaire – Disability Index (HAQ-DI) 'grip' domain. Patients tapped on the appropriate option to select it, and could change their response by tapping on a different option before moving to the next question by tapping the right arrow.

Fig. 2. Screen shot of the handheld implementation of the health status scale from the EQ-5D. Patients tapped on the scale to indicate their current state, and could adjust the selection by tapping or dragging the cursor. The numeric score was displayed in the text box.

The electronic version presented one question per screen. The heading 'current pain' was presented at the top of each screen, then the specific pain word, then the response options arranged vertically with 'none' at the top and 'severe' at the bottom. Thus, there was a change in the alignment of the response options from horizontal to vertical in the ePRO version. The paper VAS was 10 cm long, and this was reduced to about 40 mm to fit onto the ePRO screen. Responses were made by tapping on the screen. A cursor then appeared that could be adjusted to the desired position. Scores were recorded as percentage of scale length, to correspond with the scoring of the 10 cm paper scale in millimeters.
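As an illustrative sketch only (not code from the study), the conversion from a tap position on the reduced on-screen VAS to the recorded 0–100 score might look like the following; the pixel coordinates and the rounding are assumptions, since the article does not describe the device geometry.

```python
def vas_score(tap_px: float, scale_start_px: float, scale_length_px: float) -> float:
    """Convert a tap position along the on-screen VAS into a score
    expressed as percentage of scale length (0-100), matching the
    millimeter scoring of the 100 mm paper scale.

    All pixel values are hypothetical; clamping keeps taps just outside
    the drawn line within the valid range."""
    fraction = (tap_px - scale_start_px) / scale_length_px
    fraction = min(max(fraction, 0.0), 1.0)
    return round(fraction * 100.0, 1)


# Example: a tap 45 px along a 150 px scale records a score of 30.0
print(vas_score(tap_px=95.0, scale_start_px=50.0, scale_length_px=150.0))
```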

Functional Assessment of Chronic Illness Therapy – Fatigue (FACIT-F)

The fatigue section of the Functional Assessment of Chronic Illness Therapy (FACIT-F) questionnaire has 13 questions, each with five response options (0–4). Response options are laid out to the right of each question in horizontal rows, with response labels presented as headers.[17,18]

The electronic version presented one question per screen. The question was presented at the top of each screen, with the numbered response options below arranged vertically, with 0 at the top and 4 at the bottom. Thus, there was a change in the alignment of the response options from horizontal to vertical in the ePRO version.

SF-36

The SF-36 instrument asks questions that vary in layout. For example, one group of questions asks about limitations to activities, with the three response options 'yes, limited a lot', 'yes, limited a little', 'no, not limited at all', while another group presents a series of statements with the possible responses 'definitely true', 'mostly true', 'don't know', 'mostly false', 'definitely false'. Response options are the same within a group of questions, and are laid out in horizontal rows of tick boxes to the right of the question text. Definitions of the response options are shown as headers.[19]

The electronic version presented one question per screen, with the question text at the top of each screen and response options arranged vertically, with the left-hand option from the paper scale appearing at the top and the right-hand option at the bottom. Thus, there was again a change in the alignment of the response options from horizontal to vertical in the ePRO version.

Subject’s Assessment of Rheumatoid Arthritis (SARA)

In the Subject's Assessment of Rheumatoid Arthritis (SARA) questionnaire, the patient is first asked about morning stiffness. On the paper version there are two questions side by side, one asking for average duration of morning stiffness, the other asking if stiffness usually lasts all day. Only one of these should be filled in. There then follow three questions using a horizontal VAS.

In the ePRO version, in order to make the question sequence as logical as possible, the question about all-day stiffness was asked as a 'yes/no' question first, and if the response was 'no', the duration was obtained. VASs were implemented in the same way as for the MPQ-SF.
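A minimal sketch of that branching logic is shown below; it is illustrative only, and the question wording and the `ask` prompt callback are assumptions rather than the actual device software.

```python
def sara_morning_stiffness(ask) -> dict:
    """Illustrative branching for the SARA morning-stiffness items:
    the all-day yes/no question is asked first, and the duration is
    requested only when the answer is 'No'. `ask` is a hypothetical
    callback that displays a prompt and returns the response."""
    all_day = ask("Does your morning stiffness usually last all day?", ["Yes", "No"])
    if all_day == "Yes":
        return {"all_day": True, "duration_minutes": None}
    duration = ask("On average, how long does your morning stiffness last (minutes)?")
    return {"all_day": False, "duration_minutes": float(duration)}
```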

Data Analysis

Summary Outcome Measures

HAQ-DI

The HAQ-DI was computed as described by Bruce and Fries[12] to give a scale from zero (no disability) to three (completely disabled).
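The detailed rule is given by Bruce and Fries;[12] the sketch below is an illustrative reading of that standard scoring scheme (highest item score per domain, raised to at least 2 where aids, devices, or help are reported, then averaged over the eight domains), not code used in the study.

```python
from typing import Sequence

def haq_di(domain_item_scores: Sequence[Sequence[int]],
           domain_aid_or_help: Sequence[bool]) -> float:
    """Sketch of standard HAQ-DI scoring (after Bruce and Fries).

    domain_item_scores: item ratings (0-3) for each of the 8 domains.
    domain_aid_or_help: whether aids/devices or help from another
    person were reported for that domain.

    Each domain is scored as its worst-rated item; reported aid or help
    raises the domain score to at least 2. The index is the mean of the
    domain scores. (The published rule also requires a minimum number of
    completed domains, which this sketch does not enforce.)"""
    domain_scores = []
    for items, aided in zip(domain_item_scores, domain_aid_or_help):
        score = max(items)
        if aided:
            score = max(score, 2)
        domain_scores.append(score)
    return sum(domain_scores) / len(domain_scores)
```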

BPI Composite Index

The BPI Composite Index was calculated as the mean of four items scoring worst, average, and least pain in the past 24 hours, and pain now.[20]

EQ-5D

The utility score was a weighted mean of the five question scores, using the weights derived from a UK population.[21] A profile score was also calculated as the arithmetic mean of the five question scores, each being given the values 1, 2, or 3.
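As a sketch of the simpler of the two summary scores (the UK utility tariff[21] is not reproduced here), the profile score is just the arithmetic mean of the five dimension levels:

```python
def eq5d_profile(levels):
    """Profile score as defined above: the arithmetic mean of the five
    EQ-5D dimension levels, each coded 1, 2, or 3 (illustrative sketch,
    not study code)."""
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError("expected five levels, each 1, 2, or 3")
    return sum(levels) / 5.0


# Example: levels (2, 2, 1, 2, 1) give a profile score of 1.6
print(eq5d_profile([2, 2, 1, 2, 1]))
```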

FACIT-F

A score was calculated for the fatigue items on the FACIT-F as the arithmetic mean of the question scores, with scores being reversed as appropriate so that a greater score always represented a worse health state.

MPQ-SF

Three summary scores (intensity, affective, and total) of the MPQ-SF were calculated as the arithmetic mean of the responses to the corresponding descriptor words.


SF-36

SF-36 physical (PCS) and mental component scores (MCS) were derived using the formulae described by Spritzer.[22]

SARA

An aggregate stiffness score for the SARA was calculated by giving a maximum score to respondents with all-day stiffness, and otherwise taking the duration of stiffness. The scores were then ranked for the entire dataset and the ranked values analyzed.[23]
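A minimal sketch of this rank transformation is shown below, assuming durations recorded in minutes and mid-ranks for ties; the ceiling value given to all-day stiffness is arbitrary, since only the ranks are analyzed.

```python
import numpy as np
from scipy.stats import rankdata

def aggregate_stiffness_ranks(all_day, duration_minutes):
    """Illustrative rank-transformed stiffness score. Patients reporting
    all-day stiffness receive a ceiling value above any plausible
    duration; everyone else contributes their reported duration. The
    pooled values are then converted to ranks (mid-ranks for ties)."""
    ceiling = 10_000.0  # arbitrary value larger than any duration in minutes
    raw = np.where(np.asarray(all_day, dtype=bool),
                   ceiling,
                   np.asarray(duration_minutes, dtype=float))
    return rankdata(raw)


# Example: the two all-day responders share the top ranks
print(aggregate_stiffness_ranks([False, True, False, True], [30.0, 0.0, 90.0, 0.0]))
# -> [1.  3.5 2.  3.5]
```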

Individual Scale Items

A number of individual scale items were also analyzed: the health status scale from the EQ-5D; 'pain' and 'global' VAS ratings from the HAQ-DI; the 'present pain' VAS from the MPQ; and the 'pain', 'global', and 'fatigue' VAS ratings from the SARA. The score from each scale was taken as the percentage of scale length, corresponding to millimeters along the 100 mm scale in the case of paper versions.

Statistical Analysis

For all of the above numeric scores, agreement between paper and electronic versions was assessed with the intraclass correlation coefficient (ICC), using the absolute agreement form described by McGraw and Wong[24] [ICC(A,1), Case 3, p. 35]. Statistical analyses were carried out using SAS software (version 9.1). Differences between the scores for paper and electronic administration were also calculated, together with confidence intervals. The sample standard deviations of the two modes were also computed to allow the electronic-paper difference to be expressed as effect sizes.[25]
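The analyses were run in SAS; purely as an illustrative sketch (not the authors' code), the ICC(A,1) formula of McGraw and Wong[24] and the effect-size rescaling used in figure 3 can be written as follows, assuming complete paired data with one row per patient and two columns (paper, electronic):

```python
import numpy as np

def icc_a1(scores: np.ndarray) -> float:
    """ICC(A,1): two-way model, absolute agreement, single measure
    (McGraw & Wong 1996). `scores` has shape (n_patients, k_modes);
    here k = 2 (paper, electronic). Assumes no missing values."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)    # between patients
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)    # between modes
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

def effect_size(paper: np.ndarray, electronic: np.ndarray) -> float:
    """Electronic-minus-paper mean difference divided by the sample SD
    of the paper scores, the rescaling used for figure 3."""
    paper = np.asarray(paper, dtype=float)
    electronic = np.asarray(electronic, dtype=float)
    return (electronic.mean() - paper.mean()) / paper.std(ddof=1)
```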

Results

Completion of the paper questionnaire took 11–69 minutes (mean 29.3) and, for the electronic questionnaire, 10–58 minutes (mean 26.9).

Data for the agreement between electronic and paper composite measures are shown in table I.

Table I. Agreement between paper and electronic assessments of composite variables

Questionnaire/item or subscale    n    Paper [mean (SD)]   Electronic [mean (SD)]   ICC
BPI: composite index              42a  3.55 (2.03)         3.74 (2.06)              0.93
EQ-5D: utility                    43   0.612 (0.239)       0.608 (0.249)            0.79
EQ-5D: profile                    43   1.65 (0.35)         1.64 (0.35)              0.91
FACIT-F: fatigue                  43   1.70 (0.95)         1.61 (0.93)              0.93
HAQ: disability index             43   1.37 (0.73)         1.34 (0.72)              0.96
MPQ-SF: intensity                 43   0.790 (0.605)       0.831 (0.585)            0.72
MPQ-SF: affective                 43   0.686 (0.774)       0.738 (0.744)            0.86
MPQ-SF: overall                   43   0.761 (0.602)       0.806 (0.592)            0.79
SF-36: PCS                        43   -1.57 (1.03)        -1.52 (1.03)             0.92
SF-36: MCS                        43   -0.40 (1.22)        -0.29 (1.33)             0.89
SARA: aggregate stiffness         43   43.4 (25.1)         43.7 (24.8)              0.95

a Due to an error in administration, paper data are missing for one patient.

BPI = Brief Pain Inventory; FACIT-F = Functional Assessment of Chronic Illness Therapy – fatigue; HAQ = Health Assessment Questionnaire; ICC = intraclass correlation coefficient; MCS = mental component score; MPQ-SF = McGill Pain Questionnaire – Short Form; PCS = physical component score; SARA = Subject's Assessment of Rheumatoid Arthritis; SD = standard deviation.


The ICC varied between 0.72 and 0.96 (mean 0.88). Corresponding data for scores from single VAS questions are shown in table II. For these items, ICCs were in the range 0.75 to 0.91 (mean 0.83).

Tables I and II also show data for the mean ratings using paper and electronic modes in original scale units. The differences between the two modes are shown in figure 3, rescaled as effect sizes (i.e. differences were divided by the standard deviation of the scores for the patient group in paper mode). All effect sizes for electronic-paper differences were <0.2. The 95% confidence interval (CI) of the electronic-paper difference always included zero and, in the majority of cases, the 95% CIs were completely contained within an effect size of ±0.25.

Responses to the questionnaire completed at the end of the study showed that most patients found both modes either very easy or fairly easy to use (paper 91%; electronic 95%). All patients found both modes acceptable. Patient preferences are shown in figure 4. More patients preferred electronic than preferred paper, and this was true for the older patients as well as for the younger. A similar pattern was found for experience with technology: those patients who were less familiar or less comfortable with technology also preferred the electronic mode to paper.

Discussion

Interpretation of the data on agreement requires consideration of what constitutes acceptable agreement in general, a topic that is just as relevant for the reliability of measures within a particular mode (e.g. test-retest reliability of a paper instrument) as it is for agreement between modes.

Gwaltney et al.,[4] in their presentation of a meta-analysis of electronic-paper equivalence, suggested that an ICC of at least 0.75 is an appropriate criterion for good agreement between modes. Other authors have suggested a value of at least 0.7 for group comparisons and 0.9 for individual comparisons.[26] One of the 11 composite measures (MPQ intensity) and one of the seven single-item measures (EQ-5D health status) were ≤0.75; both were >0.7. Our findings are similar to the data summarized by Gwaltney et al.,[4] where 94% of measures had an ICC >0.75. The mean values for ICC found here are also very similar to the mean value of 0.90 reported by Gwaltney et al.[4] The ICCs for the single-item measures tend to be slightly lower than those for the composite measures. This is likely to be due to the greater intrinsic reliability of composite measures rather than to any mode-specific issues. Thus, in general, the agreement between electronic and paper measures was as expected.

Table II. Agreement between paper and electronic assessments of single-item (visual analog scale [VAS]) variables

Questionnaire/item or subscale    n    Paper [mean (SD)]   Electronic [mean (SD)]   ICC
EQ-5D: health thermometer         43   64.2 (22.5)         64.5 (19.2)              0.75
HAQ: pain VAS                     43   38.9 (26.3)         40.3 (24.1)              0.91
HAQ: global VAS                   42   32.7 (26.0)         32.5 (22.4)              0.81
MPQ-SF: present pain VAS          43   39.7 (24.8)         35.7 (23.6)              0.83
SARA: pain VAS                    43   35.9 (24.3)         39.9 (24.5)              0.87
SARA: global VAS                  42   29.3 (23.1)         29.7 (22.4)              0.89
SARA: fatigue VAS                 43   46.0 (27.6)         49.3 (29.1)              0.77

HAQ = Health Assessment Questionnaire; ICC = intraclass correlation coefficient; MPQ-SF = McGill Pain Questionnaire – Short Form; SARA = Subject's Assessment of Rheumatoid Arthritis; SD = standard deviation.


Two measures were of particular importance in this study, the HAQ-DI and the health status VAS from the EQ-5D (EQ-VAS), as they had been significantly modified in migration to electronic mode. As indicated above, these modifications were designed to maintain the scoring model of the scales.

The HAQ-DI had the highest ICC of any measure in the study (0.96), indicating near-perfect agreement between the two modes. This indicated that the changes made in migration had not affected the nature of the information collected. The electronic version was intended to preserve the logic of the original scoring scheme as exactly as possible, and this aim was clearly achieved. In particular, the elimination of free-text entry from the HAQ-DI did not lead to any loss of precision in the instrument.

Free-text entry is problematic in PRO instruments in general, not only for electronic implementations. Free text needs to be transcribed, interpreted, and scored. At best, these are tedious and error-prone processes. At worst, entries may be illegible, ambiguous, or impossible to interpret; in fact, several errors with free text were found in the paper entries for this scale. In a number of cases, entries were made in the 'other' categories that were either in the wrong section or referred to items that had their own specific questions. In two cases the entries could not be classified and no allocation to specific categories could be made. This was a small amount of data, and clearly had little impact on overall data quality, given the high ICC obtained, but it illustrates the difficulties that can arise with this type of data entry. Thus free-text entry should be avoided in PRO instruments, both paper and electronic, wherever practicable.

Fig. 3. Differences between scores for the paper and electronic versions of patient-reported outcome instruments for (a) summary measures and (b) individual scale items. Data are expressed as mean effect sizes (ratio of effect to population standard deviation), with 95% confidence intervals. BPI = Brief Pain Inventory; FACIT-F = Functional Assessment of Chronic Illness Therapy – fatigue; HAQ-DI = Health Assessment Questionnaire – Disability Index; MPQ = McGill Pain Questionnaire; SARA = Subject's Assessment of Rheumatoid Arthritis; VAS = visual analog scale.

The other measure of particular interest, the EQ-VAS, showed a much lower ICC of 0.75, though still within an acceptable range. Previous published work has shown a substantially higher test-retest reliability for the HAQ-DI than for the EQ-VAS using paper assessments. One group[27] has compared both measures within a study on RA patients. They found ICCs of 0.92–0.94 for the HAQ-DI in patients tested at intervals of 2 weeks or 3 months who reported that there had been no change in their conditions between assessments. For the EQ-VAS, the ICCs were 0.70–0.85. Other studies have used only one of these measures, but found similar values: HAQ-DI correlations in the range of 0.91–0.98,[28-30] and EQ-VAS correlations in the range 0.65–0.85.[31-34]

Ramachandran et al.[13] have compared an electronic (tablet) version of the EQ-VAS with a paper version, using a horizontal rather than a vertical scale layout, but with a similar numeric display to that used here. The ICC for this comparison was 0.75, the same as that found here. Bushnell et al.[35] reported ICCs of 0.77–0.82 for a comparison of electronic and paper EQ-VAS, but gave no details of the electronic version of the scale.

The lower ICC for the paper-electronic comparison of the EQ-VAS compared with the HAQ-DI is thus likely to reflect a lower overall reliability of the scale itself, rather than a problem with the electronic implementation. There are several possible reasons for the lower reliability of the EQ-VAS. One is that the EQ-VAS is a single-item measure, while the HAQ-DI is a composite measure. In general, one would expect composite measures to be more reliable, and the results from the present study (tables I and II) suggest a trend in this direction, though not a large one. Another possible reason is that the HAQ-DI records specific factual information about the patient's life and condition, while the EQ-VAS rating is more one of opinion about general health state. The general and composite nature of the health status assessment may also account for the reliability of the EQ-VAS being generally lower than that for other VAS, particularly the pain scales, which rate more specific information, and which have ICCs in the range 0.83–0.91.

In a number of cases, instruments in the study required a change in the orientation of response options from horizontal to vertical. Several of the instruments with this shift in orientation showed ICCs >0.9, including the HAQ-DI (already discussed), the EQ-5D profile, FACIT-F, and SF-36 (aggregate physical). The correlations were, in general, similar to the paper test-retest reliabilities reported for these scales.[36-39] This suggests that a shift in response orientation is not in itself an issue for scale reliability.[40]

Patients in this study found electronic questionnaires easy to use and generally preferred them to paper. This is in agreement with other studies with ePRO applications set up on handheld computers, tablets, and conventional desktop PCs.[1,41-46] No patient gave unwillingness to use electronic assessments as a reason for declining to take part in the study. Age seemed to have little effect on user preferences. Comparatively little published work has explicitly addressed this issue, but reports suggest that ePRO is just as practicable in older patients as in the young.[1,47] Patients with little or no previous computer experience have also been shown to be able to use ePRO effectively.[48-50]

Conclusions

The results from the HAQ-DI show that the electronic implementation with modified layout performed in a virtually identical fashion to the original scale. In all cases, the agreements between electronic and paper versions were in the same range as test-retest reliabilities for the paper scales, indicating that the electronic scales were functioning as intended. Electronic questionnaires were shown to be acceptable and easy to use, with the majority of patients preferring them to paper. A wide range of ePROs may therefore be employed in large-scale clinical studies in RA, with important benefits for data quality and efficiency of study management.

Fig. 4. Patient preferences for electronic and paper scales (percentage of patients choosing paper, no preference, or electronic) as a function of age (<60 years vs ≥60 years).

Acknowledgments

This study was financially supported by AstraZeneca. Robert Carrington and Clare Battersby are employed by, and own shares in, AstraZeneca, who commissioned the clinical study on which this article is based. Brian Tiplady is employed by PRO Consulting, a division of invivodata inc., the study sponsor; owns shares in AstraZeneca; has acted as consultant for various companies; and will receive royalties from a forthcoming book on ePRO. Professor Stuart Ralston acts as a consultant for Novartis and Merck Pharmaceuticals.

References

1. Drummond HE, Ghosh S, Ferguson A, et al. Electronic quality of life questionnaires: a comparison of pen-based electronic questionnaires with conventional paper in a gastrointestinal study. Qual Life Res 1995; 4: 21-6
2. Crawley JA, Kleinman L, Dominitz J. User preferences for computer administration of quality of life instruments. Drug Inf J 2000; 34: 137-44
3. Ring AE, Cheong KA, Watkins CL, et al. A randomized study of electronic diary versus paper and pencil collection of patient-reported outcomes in patients with non-small cell lung cancer. Patient 2008; 1 (2): 105-13
4. Gwaltney CJ, Shields AL, Shiffman S. Equivalence of electronic and paper-and-pencil administration of patient-reported outcome measures: a meta-analytic review. Value Health 2008; 11 (2): 322-33
5. Shields A, Gwaltney C, Tiplady B, et al. Grasping the FDA's PRO guidance. Appl Clin Trials 2006; 15 (8): 69-72
6. Coons SJ, Gwaltney CJ, Hays RD, et al. Recommendations on evidence needed to support measurement equivalence between electronic and paper-based patient-reported outcome (PRO) measures: ISPOR ePRO good research practices task force report. Value Health 2009; 12 (4): 419-29
7. Cohen SB, Strand V, Aguilar D, et al. Patient- versus physician-reported outcomes in rheumatoid arthritis patients treated with recombinant interleukin-1 receptor antagonist (anakinra) therapy. Rheumatology 2004; 43 (6): 704-11
8. EuroQol Group. EuroQol: a new facility for the measurement of health-related quality of life. Health Policy 1990; 16 (3): 199-208
9. Hurst NP, Jobanputra P, Hunter M, et al. Validity of Euroqol – a generic health status instrument – in patients with rheumatoid arthritis. Economic and Health Outcomes Research Group. Br J Rheumatol 1994; 33 (7): 655-62
10. Fries JF, Spitz P, Kraines RG, et al. Measurement of patient outcome in arthritis. Arthritis Rheum 1980; 23 (2): 137-45
11. Ramey DA, Fries JF, Singh G. The Health Assessment Questionnaire 1995: status and review. In: Spilker B, editor. Quality of life and pharmacoeconomics in clinical trials. Philadelphia (PA): Lippincott Williams & Wilkins, 1996: 227-37
12. Bruce B, Fries JF. The Health Assessment Questionnaire (HAQ). Clin Exp Rheumatol 2005; 23 (5 Suppl. 39): S14-8
13. Ramachandran S, Lundy J, Coons S. Testing the measurement equivalence of paper and touch-screen versions of the EQ-5D visual analog scale (EQ VAS). Qual Life Res 2008; 17 (8): 1117-20
14. Cleeland CS, Ryan KM. Pain assessment: global use of the Brief Pain Inventory. Ann Acad Med Singapore 1994; 23 (2): 129-38
15. Melzack R. The short-form McGill Pain Questionnaire. Pain 1987; 30 (2): 191-7
16. Melzack R. The McGill pain questionnaire: from description to measurement. Anesthesiology 2005; 103 (1): 199-202
17. Yellen SB, Cella DF, Webster K, et al. Measuring fatigue and other anemia-related symptoms with the Functional Assessment of Cancer Therapy (FACT) measurement system. J Pain Symptom Manage 1997; 13 (2): 63-74
18. Mallinson T, Cella D, Cashy J, et al. Giving meaning to measure: linking self-reported fatigue and function to performance of everyday activities. J Pain Symptom Manage 2006; 31 (3): 229-41
19. Ware Jr JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Med Care 1992; 30 (6): 473-83
20. Mystakidou K, Mendoza T, Tsilika E, et al. Greek brief pain inventory: validation and utility in cancer pain. Oncology 2001; 60 (1): 35-42
21. Dolan P. Modeling valuations for EuroQol health states. Med Care 1997; 35 (11): 1095-108
22. Spritzer K. SAS code for scoring 36-item health survey version 2.0 standard form (not acute!) [online]. Available from URL: http://gim.med.ucla.edu/FacultyPages/Hays/UTILS/sf36v2-4-public.sas [Accessed 2007 Sep 4]
23. Conover WJ, Iman R. Rank transformations as a bridge between parametric and nonparametric statistics. Am Stat 1981; 35: 124-9
24. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods 1996; 1 (1): 30-46
25. Vacha-Haase T, Thompson B. How to estimate and interpret various effect sizes. J Couns Psychol 2004; 51 (4): 473-81
26. Haywood KL, Garratt AM, Dziedzic K, et al. Generic measures of health-related quality of life in ankylosing spondylitis: reliability, validity and responsiveness. Rheumatology 2002; 41 (12): 1380-7
27. Hurst NP, Kind P, Ruta D, et al. Measuring health-related quality of life in rheumatoid arthritis: validity, responsiveness and reliability of EuroQol (EQ-5D). Br J Rheumatol 1997; 36 (5): 551-9


28. Kirwan JR, Reeback JS. Stanford Health Assessment Questionnaire modified to assess disability in British patients with rheumatoid arthritis. Br J Rheumatol 1986; 25 (2): 206-9
29. Sullivan FM, Eagers RC, Lynch K, et al. Assessment of disability caused by rheumatic diseases in general practice. Ann Rheum Dis 1987; 46 (8): 598-600
30. Ekdahl C, Eberhardt K, Andersson SI, et al. Assessing disability in patients with rheumatoid arthritis: use of a Swedish version of the Stanford Health Assessment Questionnaire. Scand J Rheumatol 1988; 17 (4): 263-71
31. Guillemin F, Briancon S, Pourel J. Validity and discriminant ability of the HAQ Functional Index in early rheumatoid arthritis. Disabil Rehabil 1992; 14 (2): 71-7
32. Harper R, Brazier JE, Waterhouse JC, et al. Comparison of outcome measures for patients with chronic obstructive pulmonary disease (COPD) in an outpatient setting. Thorax 1997; 52 (10): 879-87
33. Fransen M, Edmonds J. Reliability and validity of the EuroQol in patients with osteoarthritis of the knee. Rheumatology (Oxford) 1999; 38 (9): 807-13
34. Konig HH, Ulshofer A, Gregor M, et al. Validation of the EuroQol questionnaire in patients with inflammatory bowel disease. Eur J Gastroenterol Hepatol 2002; 14 (11): 1205-15
35. Bushnell DM, Reilly MC, Galani C, et al. Validation of electronic data capture of the Irritable Bowel Syndrome – Quality of Life Measure, the Work Productivity and Activity Impairment Questionnaire for Irritable Bowel Syndrome and the EuroQol. Value Health 2006; 9 (2): 98-105
36. Daut RL, Cleeland CS, Flanery RC. Development of the Wisconsin Brief Pain Questionnaire to assess pain in cancer and other diseases. Pain 1983; 17 (2): 197-210
37. Ware Jr JE. SF-36 health survey update. Spine 2000; 25 (24): 3130-9
38. Hagell P, Hoglund A, Reimer J, et al. Measuring fatigue in Parkinson's disease: a psychometric study of two brief generic fatigue questionnaires. J Pain Symptom Manage 2006; 32 (5): 420-32
39. Strand LI, Ljunggren AE, Bogen B, et al. The Short-Form McGill Pain Questionnaire as an outcome measure: test-retest reliability and responsiveness to change. Eur J Pain 2008; 12 (7): 917-25
40. Hanscom B, Lurie JD, Homa K, et al. Computerized questionnaires and the quality of survey data. Spine 2002; 27 (16): 1797-801
41. Velikova G, Wright EP, Smith AB, et al. Automated collection of quality-of-life data: a comparison of paper and computer touch-screen questionnaires. J Clin Oncol 1999; 17 (3): 998-1007
42. Bushnell DM, Martin ML, Parasuraman B. Electronic versus paper questionnaires: a further comparison in persons with asthma. J Asthma 2003; 40 (7): 751-62
43. Cook MR, Gerkovich MM, Graham C, et al. Effects of the nicotine patch on performance during the first week of smoking cessation. Nicotine Tob Res 2003; 5 (2): 169-80
44. Gaertner J, Elsner F, Pollmann-Dahmen K, et al. Electronic pain diary: a randomized crossover study. J Pain Symptom Manage 2004; 28 (3): 259-67
45. Kvien TK, Mowinckel P, Heiberg T, et al. Performance of health status measures with a pen-based personal digital assistant. Ann Rheum Dis 2005; 64 (10): 1480-4
46. Richter JG, Becker A, Koch T, et al. Self-assessments of patients via tablet PC in routine patient care: comparison with standardised paper questionnaires. Ann Rheum Dis 2008; 67: 1739-41
47. Yarnold PR, Stewart MJ, Stille FC, et al. Assessing functional status of elderly adults via microcomputer. Percept Mot Skills 1996; 82 (2): 689-90
48. Begg A, Drummond G, Tiplady B. Assessment of postsurgical recovery after discharge using a pen computer diary. Anaesthesia 2003; 58 (11): 1101-5
49. Kurt R, Bogner HR, Straton JB, et al. Computer-assisted assessment of depression and function in older primary care patients. Comput Methods Programs Biomed 2004; 73 (2): 165-71
50. Millsopp L, Frackleton S, Lowe D, et al. A feasibility study of computer-assisted health-related quality of life data collection in patients with oral and oropharyngeal cancer. Int J Oral Maxillofac Surg 2006; 35 (8): 761-4

Correspondence: Professor Brian Tiplady, 8 Braid Crescent, Edinburgh EH10 6AU, UK. E-mail: [email protected]
