REVIEW ARTICLE

Evaluation of shoulder-specific patient-reported outcome measures: a systematic and standardized comparison of available evidence

Stefanie Schmidt, MPH a,b,c, Montse Ferrer, PhD, MD a,c,d,*, Marta González, PhD e,f, Nerea González, PhD f,g, José Maria Valderas, PhD, MD h, Jordi Alonso, PhD, MD a,b,c, Antonio Escobar, PhD, MD e,f, Kalliopi Vrotsou, MSc f,i, EMPRO Group †

a IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
b Universitat Pompeu Fabra, Barcelona, Spain
c CIBER Epidemiología y Salud Pública, Barcelona, Spain
d Universitat Autònoma de Barcelona, Barcelona, Spain
e Research Unit, University Hospital of Basurto, Bilbao, Spain
f Health Services Research on Chronic Patients Network (REDISSEC), Barcelona, Spain
g Research Unit, Hospital of Galdakao-Usansolo, Usansolo, Spain
h Health Services and Policy Research Group, Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
i Research Unit, Primary Care-Organization of Integrated Health Services, Gipuzkoa, Spain

Background: The aim of this study was to perform a standardized and systematic evaluation of the available evidence on multi-item shoulder-specific patient-reported outcome measures that are applicable to a wide spectrum of disorders.

Materials and methods: A systematic review was conducted in PubMed to identify articles with information regarding the development process, metric properties, and administration issues of shoulder-specific patient-reported outcome measures. Two experts independently reviewed all the articles identified for one instrument and applied the EMPRO (Evaluating Measures of Patient Reported Outcomes) tool, which was designed to assess the quality of attributes in a standardized way. An overall EMPRO score and 6 attribute-specific scores were calculated (range, 0-100) to describe the quality of instrument performance.

Ethical committee approval: not applicable.
*Reprint requests: Montse Ferrer, PhD, MD, Health Services Research Group, IMIM (Hospital del Mar Medical Research Institute), Doctor Aiguader, 88, 08003 Barcelona, Spain. E-mail address: [email protected] (M. Ferrer).

† The EMPRO (Evaluating Measures of Patient Reported Outcomes) Group participants are as follows: Jordi Alonso; Montse Ferrer; Stefanie Schmidt; Olatz Garin; Gemma Vilagut; Angels Pont; Yolanda Pardo; Gabriela Barbaglia; Pere Castellvi; Carlos García-Forero; Ana Redondo; Virginia Becerra; Ester Villalonga; Mireya Garcia Duran; Sonia Rojas; Oriol Cunillera; José María Ramada Rodilla (IMIM [Hospital del Mar Medical Research Institute]); Luis Rajmil and Silvia López (Catalan Agency for Health Information, Assessment and Quality); Michael Herdman (Insight Consulting & Research SL); José M. Valderas (University of Oxford); Pablo Rebollo (BAP LA-SER Outcomes); Juan Ignacio Arrarás (Hospital de Navarra); Aida Ribera (Hospital Universitario Vall d’Hebron); Nerea González and Miren Orive (Hospital of Galdakao); Gabriela Medin (Hospital General Universitario Gregorio Marañón); Amado Ribero (Fundación Canaria de Investigación y Salud); Susana García and Iría Meléndez (Hospital Sant Joan de Déu); Marcela Cortes (Iberoamerican Cochrane Network); and Carlota las Hayas (Universidad de Deusto).

J Shoulder Elbow Surg (2014) 23, 434-444
www.elsevier.com/locate/ymse
1058-2746/$ - see front matter © 2014 Journal of Shoulder and Elbow Surgery Board of Trustees.
http://dx.doi.org/10.1016/j.jse.2013.09.029



Results: We identified 11 instruments and 112 articles (2-30 articles per instrument). The American Shoulder and Elbow Surgeons (ASES) shoulder assessment, Simple Shoulder Test (SST), and Oxford Shoulder Score (OSS) were the best rated, with overall scores of 77.4 points, 72.6 points, and 69.7 points, respectively. They have been shown to be valid, reliable, and responsive, with a low administration burden. Acceptable results were also found for the Flexilevel Scale of Shoulder Function, Shoulder Pain and Disability Index, and Dutch Shoulder Disability Questionnaire, but some of their attributes need further evaluation.

Conclusions: Current evidence supports the use of the ASES, SST, or OSS. We recommend the SST for longitudinal studies or clinical trials, the Dutch Shoulder Disability Questionnaire for clinical practice to minimize administration burden, and the ASES or OSS to discriminate among patients’ or groups’ evaluations at one point of time.

Level of evidence: Validation of Outcome Instruments, Systematic Review.

© 2014 Journal of Shoulder and Elbow Surgery Board of Trustees.

Keywords: Shoulder pain; disability evaluation; quality of life; questionnaires; outcome assessment; psychometrics; validation studies

The shoulder is one of the most complex joints of the human body. Shoulder-related disorders account for substantial medical, economic, and social costs [21,43,46] and comprise a wide spectrum of problems. Shoulder disorders are mostly accompanied by pain and restricted movement of the arm or shoulder that lead to difficulties in performing certain activities [1,21,34]. Recent research suggests that shoulder pain not only affects function during work and leisure-time activities but also may interfere with psychological and social well-being [30]. A systematic review showed that the estimated prevalence of shoulder pain in the general population varies greatly among studies, with a lifetime prevalence from 7% to 67% [24]. In fact, shoulder or neck pain is one of the most frequent work-related complaints and a frequent reason for work absence [26]. Data from a prospective study conducted in the Netherlands showed that 30% of the workers diagnosed with a new episode of shoulder pain reported taking sick leave during the 6-month follow-up time because of the shoulder disorder [19].

The impact of shoulder disorders can be assessed in different ways. Traditionally, the assessment has been performed locally by focusing on the functional aspects of the pathology and evaluating the range of motion, strength, or pain [3]. However, as the value of patient-reported outcome (PRO) measures becomes recognized and they are more widely used in medical research, this approach is changing. Nowadays, research aims to determine the overall impact this problem has on daily life activities and how the psychological well-being of the patient is affected [3]. PRO instruments provide subjective information given by the patient himself or herself. PROs generally focus on the assessment of physical function, psychosocial issues, or general health-related quality of life, trying to capture the possible effect of a condition, a disease, or an intervention by incorporating the experience and perception of the patient [4,41]. Numerous generic and disease-specific PRO measures exist [13]. Several share a similar purpose, content, and applicability, yet slight differences might exist, calling for an evaluation of those instruments that considers their strengths and weaknesses. For example, some of the PRO measures have been designed for the whole upper extremity; others, specifically for the shoulder. Some instruments are shoulder disease-specific (eg, rotator cuff disease or osteoarthritis) or population specific (eg, wheelchair users) [9,25,48], whereas others are independent of the underlying condition. Therefore, it is a complicated task to select the correct PRO measure for a specific purpose from among all those available.

PRO measurement requires reliable and valid instruments, which must be adequately selected based on the individual study purpose, setting, and available resources. Direct comparison among instruments regarding their performance characteristics, such as measurement model, metric properties, and administration issues, can facilitate this task. Efforts have been made to classify or evaluate shoulder-specific PRO measures [2,3,16,27,29,33,37,38], but so far, neither has the whole spectrum of the performance characteristics been examined nor has a direct comparison among shoulder-specific PRO measures been undertaken.

The EMPRO (Evaluating Measures of Patient Reported Outcomes) tool was developed to facilitate a standardized, comprehensive, and comparative evaluation of PRO measures [42]. It combines 3 fundamental requirements: (1) well-described and established quality attributes for assessment, (2) expert reviewers to conduct the assessment, and (3) scores that allow direct comparisons among outcome measures. The EMPRO tool is based on an exhaustive series of recommendations regarding the ideal attributes of PRO measures [40]. It has been shown to be valid and useful in the evaluation of generic patient-reported outcome measures [42], as well as for specific pathologies such as heart failure [12] and localized prostate cancer [39].

The aim of this study was to perform a standardized and systematic evaluation of the available evidence on the development process, metric properties, and administration issues of multi-item shoulder-specific PRO measures that are applicable to a wide spectrum of shoulder disorders. Our results should help clinicians and researchers to select the best-performing shoulder-specific PRO measure.

Materials and methods

Identification of shoulder-specific PRO measures and their relevant information

We carried out a systematic literature review in the PubMed database (March 2011) to obtain all the available published evidence. We combined keywords using Medical Subject Headings (MeSH) terms and free-text entries: (Shoulder or Shoulder Joint or Shoulder Pain or Rotator Cuff) and (Quality of Life or Questionnaires or Disability Evaluation or Cross-Cultural Comparison) (Appendix 1, available on the journal’s website at www.jshoulderelbow.org). Articles were eligible for inclusion if they contained information on the development process, the metric properties, or the administration issues of multi-item shoulder-specific PRO measures. We excluded articles about PRO measures designed for musculoskeletal conditions in general, the upper extremity as a whole, specific shoulder conditions (eg, osteoarthritis or instability), specific populations (eg, wheelchair users or athletes), and systemic diseases (eg, breast or oral cancer). We furthermore excluded research protocols, congress abstracts, secondary research articles, and articles not available in English.

In a 3-step process, titles, abstracts, and full-text articles were independently reviewed by 2 investigators (one trained in health sciences [S.S.] and the other trained in medical statistics [K.V.], both holding a Master of Public Health). A third researcher (M.F.) was appointed to mediate and resolve possible discrepancies in each of the steps. In addition, to complete the search, we manually examined the reference lists of the articles selected for full review.

EMPRO tool

The EMPRO tool was designed to measure the quality of PRO measures and is composed of 8 attributes and 39 items [42]. It assesses how well the development process of the outcome measure was designed and how it is described (conceptual and measurement model), how well the instrument performs in terms of metric properties (reliability, validity, responsiveness to change, and interpretability), and its administrative issues (burden, alternative modes of administration, and cross-cultural and linguistic adaptations). The EMPRO tool is a valid and reliable tool that has been used successfully in the comparison of both generic and condition-specific PRO measures [12,39,42].

All EMPRO attributes and items are accompanied by a short description to explain what the expert should focus on, as well as to facilitate the understanding of the intended meaning of each item in the evaluation process to guarantee standardization. Agreement with each item is measured on a 4-point Likert scale, from 4 (strongly agree) to 1 (strongly disagree). Experts can check the "no information" box in case of insufficient information. Five items allow a reply of "not applicable." Experts are asked to provide detailed comments to justify their ratings on each item. These comments aid in the interpretation of the EMPRO scores.

Standardized and systematic EMPRO evaluation

Each shoulder-specific PRO measure was assigned to 2 different experts, who had been identified and invited because of their expertise and experience in PRO measurement (6 belonged to the EMPRO tool development working group and 16 had previously been accredited as EMPRO experts by undergoing a training course). To minimize the potential for bias, the experts were not authors and had not been involved in the development, evaluation, or adaptation process of any of the instruments evaluated.

The EMPRO evaluation process consisted of 2 consecutive rounds. In the first round, every expert evaluated the assigned shoulder-specific PRO measure independently by reviewing the provided full-text articles that were identified in the systematic literature review and then applying the EMPRO tool [42]. In the second round, each expert was provided with the rating results of the other reviewer. In case of discrepancies, they were invited to resolve them through discussion to reach a final consensus. A third reviewer was available to settle discrepancies if needed.

Calculation of EMPRO scores

The attribute-specific scores were obtained by calculating the response mean of the applicable items when at least 50% of them were rated. Items for which the option "no information" had been selected were assigned a score of 1 (lowest possible score). The response means were then linearly transformed to a range of 0 to 100 (worst to best). Separate subscores for reliability and burden were calculated because these attributes are divided into 2 components: "internal consistency" and "reproducibility" for reliability and "respondent" and "administrative" for burden. For reliability, the highest subscore was then chosen to represent the attribute score. In addition, we calculated an overall score that consisted of the mean of the 5 metric-related attributes: conceptual and measurement model, reliability, validity, responsiveness to change, and interpretability. The overall score was only calculated when at least 3 of these 5 attributes had a rating (0 was assigned to attributes with insufficient information). EMPRO scores were considered reasonably acceptable [12] if they reached at least 50 points (half of the maximum score). We calculated the weighted κ statistics for ordinal response scales to assess the degree of agreement between experts in the EMPRO ratings. The agreement coefficient is interpreted as follows: less-than-chance agreement, 0 or less; slight agreement, 0.01 to 0.20; fair agreement, 0.21 to 0.40; moderate agreement, 0.41 to 0.60; substantial agreement, 0.61 to 0.80; and almost perfect agreement, 0.81 to 0.99 [45]. Analysis was performed with SPSS statistics software, version 12 (SPSS, Chicago, IL, USA), and graphics were designed with Microsoft Excel 2003 (Microsoft, Redmond, WA, USA).
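The scoring rules described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' SPSS procedure: the function names, the use of `None` for "no information" and `"NA"` for "not applicable" are assumptions made for the example, and "rated" is taken to mean "not marked as no information". The example ratings are the ASES-p, PSS, and SSI item ratings for the conceptual and measurement model attribute as reported in Table II (one point per plus sign).

```python
from statistics import mean

NO_INFO = None          # assumed encoding of the "no information" box
NOT_APPLICABLE = "NA"   # assumed encoding of a "not applicable" reply


def attribute_score(ratings):
    """Return a 0-100 attribute score, or None if too few items were rated."""
    applicable = [r for r in ratings if r != NOT_APPLICABLE]
    rated = [r for r in applicable if r is not NO_INFO]
    # Rule from the text: at least 50% of the applicable items must be rated.
    if not applicable or len(rated) < 0.5 * len(applicable):
        return None
    # "No information" items receive the lowest possible score (1).
    values = [1 if r is NO_INFO else r for r in applicable]
    # Linear transformation of the 1-4 response mean to a 0-100 range.
    return (mean(values) - 1) / 3 * 100


def overall_score(attribute_scores):
    """Mean of the 5 metric-related attributes; None if fewer than 3 have a rating."""
    if sum(s is not None for s in attribute_scores) < 3:
        return None
    # Attributes with insufficient information count as 0.
    return mean(0 if s is None else s for s in attribute_scores)


# Conceptual and measurement model items (7 items, Table II).
ases_p_concept = [4, 4, 4, NO_INFO, 4, 3, 4]
pss_concept = [4, NO_INFO, NO_INFO, NO_INFO, 2, NO_INFO, 3]
ssi_concept = [4, NO_INFO, NO_INFO, NO_INFO, 1, 1, 1]

print(round(attribute_score(ases_p_concept), 1))  # 81.0
print(attribute_score(pss_concept))               # None (only 3 of 7 items rated)
print(round(attribute_score(ssi_concept), 1))     # 14.3

# ASES-p attribute scores as reported in the Results:
# concept, reliability, validity, responsiveness, interpretability.
print(round(overall_score([81.0, 75.0, 86.7, 77.8, 66.7]), 1))  # 77.4
```

Under these assumptions the sketch reproduces the reported ASES-p scores (81 points for the conceptual and measurement model, 77.4 points overall) and the absence of a score for the PSS on that attribute.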

Results

We identified 2,325 articles in our systematic literature search (Fig. 1). After the title review, we excluded 1,726 articles because they were not topic related. Abstracts were reviewed, and a further 224 articles were excluded: 111 did not contain any PRO measure; 40 used only generic PRO measures; 33 were secondary research articles; 30 included disease-specific outcome measures other than shoulder disorder measures; 8 were lacking information regarding the development process, metric properties, or administration issues; and 2 articles were written in a language other than English. We identified 375 articles with information concerning 52 different instruments. After application of the defined exclusion criteria, 274 articles related to 41 instruments were excluded, mostly because they were only applicable to patients with a shoulder-specific condition (11 instruments) or they were not patient-reported (9) or not shoulder specific (5). By reviewing the bibliographic lists of the identified articles, we included 11 additional articles that met the inclusion criteria. Finally, 112 articles provided information about the development process, metric properties, or administration issues of 11 shoulder-specific PRO measures at the end of the review process.

Figure 1 Flowchart of systematic literature review. Information about the number of articles included and excluded at each step is presented.

Review of titles (n = 2325)
1726 articles excluded
Review of abstracts (n = 599)
224 articles excluded:
- No PRO measure used (111)
- Generic PRO measure used (40)
- Not original article format (33)
- Measure other than for shoulder pathology used (30)
- Article without metric property information (8)
- Article not available in English (2)
Review of articles (n = 375) [52 instruments]
274 articles excluded [41 instruments]:
- Measure for specific shoulder condition [11]
- Measure not patient-reported [9]
- Measure not shoulder-specific [5]
- Measure for musculoskeletal conditions [3]
- Whole upper extremity measure [3]
- Measure for special population [3]
- Measure for systemic disease [3]
- Measure not available in English [3]
- Not multi-item measure [1]
Articles identified by hand-search (n = 11)
Articles identified and used in the EMPRO evaluation (n = 112) [11 instruments]
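The article flow reported above can be checked with simple arithmetic. The short Python sketch below only restates the counts given in the text and Figure 1; the variable names are illustrative and are not taken from the study.

```python
# Counts as reported in the text and Figure 1.
titles_screened = 2325
excluded_on_title = 1726

abstract_exclusions = {
    "no PRO measure used": 111,
    "generic PRO measure only": 40,
    "not original article format": 33,
    "measure for other pathology": 30,
    "no metric property information": 8,
    "not available in English": 2,
}
full_text_excluded = 274
hand_search_additions = 11

# Each screening step removes the excluded articles from the previous total.
abstracts_screened = titles_screened - excluded_on_title
full_texts_reviewed = abstracts_screened - sum(abstract_exclusions.values())
articles_included = full_texts_reviewed - full_text_excluded + hand_search_additions

print(abstracts_screened, full_texts_reviewed, articles_included)  # 599 375 112
```

The three totals match the 599 abstracts, 375 full-text articles, and 112 included articles stated in the review.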

Eleven shoulder-specific PRO measures, together with information regarding their performance characteristics, were identified and evaluated with the EMPRO tool (Table I). The number of published articles found varied per measure from 2 to 30. The instruments were developed between 1987 and 2003 and are applicable to a variety of shoulder disorders. Of the 11 instruments, 7 are unidimensional; the others include 2 to 7 dimensions. Their content is mainly related to pain and function and is assessed by the evaluation of daily life activities. The outcome measures with a broader focus may additionally include psychosocial issues (appetite or social contacts) or satisfaction with shoulder performance. Answer options are based on dichotomous scales (yes/no answer options) or Likert, numeric, or visual analog scales. The number of items included varies from 5 to 30. They take between 3 and 10 minutes to complete, and the time framework ranges from the last 24 hours to the last month.

Agreement between pairs of experts on the first independent evaluations was moderate to substantial (weighted κ coefficient ≥ 0.4) for most instruments, whereas for 3 instruments, the agreement was fair (0.26 for Flexilevel Scale of Shoulder Function [FLEX-SF] and 0.25 for United Kingdom Shoulder Disability Questionnaire) or slight (0.17 for Subjective Shoulder Rating System [SSRS]). The detailed EMPRO results after consensus are presented in Table II and summarized graphically in Figure 2. Final EMPRO scores were determined by a consensus rating between the 2 experts for every outcome measure; in most cases, the third reviewer was not needed for discrepancy resolution. The overall summary scores ranged from 26.7 to 77.4 points. Six of the 11 shoulder-specific PRO measures scored above the threshold of 50 points, thus presenting acceptable overall results: the American Shoulder and Elbow Surgeons shoulder assessment, patient self-evaluation section (ASES-p), the Simple Shoulder Test (SST), the Oxford Shoulder Score (OSS), the FLEX-SF, the Shoulder Pain and Disability Index (SPADI), and the Dutch Shoulder Disability Questionnaire (SDQ-NL). Appendix 2 (available on the journal’s website at www.jshoulderelbow.org) shows the articles used in the EMPRO evaluation.
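To make the agreement categories concrete, the following Python sketch maps a weighted κ value to the interpretation bands listed in the Materials and methods section. It is only an illustration: the function name is hypothetical, upper bounds are assumed to be inclusive, and any value above 0.80 is labeled "almost perfect" even though the source band ends at 0.99.

```python
# Upper bounds of the agreement bands cited in the Materials and methods section.
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
]


def interpret_kappa(kappa: float) -> str:
    """Return the agreement label for a weighted kappa coefficient."""
    if kappa <= 0:
        return "less than chance"
    for upper_bound, label in BANDS:
        if kappa <= upper_bound:
            return label
    # Assumption: everything above 0.80 is treated as almost perfect.
    return "almost perfect"


# Coefficients reported for the 3 instruments with the lowest agreement.
for instrument, kappa in [("FLEX-SF", 0.26), ("SDQ-UK", 0.25), ("SSRS", 0.17)]:
    print(instrument, interpret_kappa(kappa))
```

With the reported coefficients, the sketch labels the FLEX-SF and SDQ-UK agreement as fair and the SSRS agreement as slight, in line with the text.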

The scores for the conceptual and measurement model ranged from 81 to 14.3 points, whereby the ASES-p (81 points) and the OSS, FLEX-SF, and SDQ-NL (66.7 points for each) reached the highest scores. Four instruments scored below 50 points, whereas for the Penn Shoulder Score (PSS), we could not find sufficient information to calculate this attribute. Eight measures were judged to be reliable, with reliability scores ranging from 83.3 points (SPADI) to 50 points (Shoulder Rating Questionnaire). The SDQ-NL and the SSRS scored low (41.6 points), and for the United Kingdom Shoulder Disability Questionnaire, we could not find sufficient information to calculate a reliability score. Validity scores in general were quite high. The SDQ-NL reached the highest rating (93.4 points), followed by the ASES-p, FLEX-SF, and SST (all ≥80 points). In addition, the OSS and the SPADI were shown to be valid instruments (75 points and 66.6 points, respectively). The SSRS, as well as the Shoulder Rating Questionnaire, scored below the threshold. For the PSS, we could not find sufficient information to calculate a score. The responsiveness-to-change attribute scores were also high and ranged from 100 points (SST and SDQ-NL) to 33.3 points for the FLEX-SF, which received its worst result for this attribute. Seven of the 11 instruments presented information to evaluate their interpretability, but only 4 presented acceptable results: the ASES-p and the OSS (66.7 points), as well as the SST and the FLEX-SF (55.6 points).

Table I Summarized characteristics of identified shoulder disorder–specific instruments

| Instrument | Author (year) | EMPRO articles | Purpose of development | Shoulder disorder | Response options and comments | Time framework | No. of items (time to complete) | Dimensions (No. of items) |
|---|---|---|---|---|---|---|---|---|
| ASES-p | Richards et al [35] (1994) | 30 | Standardized form for assessment of shoulder function | Variety of shoulder disorders | Visual analog scale (pain item), 4-point Likert scales (activities of daily living). Score range, 0-100 (worst to best) | Not restricted to any period | 11 (<5 min) | Pain (1), Function (10) |
| FLEX-SF | Cook et al [6] (2003) | 2 | To develop adaptive scale that combines measurement precision with low response burden | Variety of shoulder disorders | 6-point Likert scales. Consists of 3 testlets: easy, medium, and hard. Patient completes 1 of 3 testlets based on his or her response on initial screening question. Score range, 0-60 (worst to best) | Not restricted to any period | 15 (NI) | - |
| OSS | Dawson et al [10] (1996) | 17 | To assess outcomes after shoulder operation | Shoulder operations (not stabilization) | 5-point Likert scales. Score range, 12-60 (best to worst). New scoring system recommended: 0-48 (worst to best) [11] | Last month | 12 (<4 min) | - |
| PSS | Leggin and Iannotti [22] (1999) | 5 | To develop region-specific shoulder outcome measure | Variety of shoulder disorders | 0- to 3-point or 0- to 10-point scales. Score range, 0-100 (worst to best) | Not restricted to any period | 24 (<10 min) | Pain (3), Function (20), Satisfaction (1) |
| SDQ-NL (also known as van der Heijden shoulder disability questionnaire) | van der Heijden [44] (2000) | 6 | To evaluate functional disability for clinical trial patients | Soft tissue shoulder disorders | Yes/no answer options. All items are pain related. Score range, 0-100 (best to worst) | Last 24 h | 16 (3 min) | - |
| SDQ-UK (also known as Croft shoulder disability questionnaire) | Croft et al [8] (1994) | 3 | To assess restriction in everyday activities resulting from shoulder symptoms | Shoulder pain | Yes/no answer options. Score range, 0-100 (best to worst) | Last 24 h | 22 (NI) | - |
| SPADI | Roach et al [36] (1991) | 26 | To measure pain and disability associated with shoulder pathology | Shoulder pain | Initially visual analog scales. Later, scales were transformed to numeric scales to be suitable for telephone administration. Score range, 0-100 (best to worst) [47] | Last week | 13 (5-8 min) | Pain (5), Function (8) |
| SRQ (also known as L’Insalata Self-Administered Questionnaire [SAQ]) | L’Insalata et al [20] (1997) | 6 | Designed to assess symptoms and function of shoulder | Variety of shoulder disorders | 5-point Likert scales, visual analog scale (global assessment). Non-graded question to select 2 areas in which patient believes improvement is most important. Score range, 17-100 (worst to best) | Last month | 21 (5-10 min) | Global assessment (1), Pain (4), Activities of daily living (6), Work (5), Recreational and athletic activities (3), Satisfaction (1), Improvement (1) |
| SSI | Patte [32] (1987) | 2 | Disability outcome assessment for functioning and daily activities | Variety of shoulder disorders | Yes/no answer options | - | 30 (7 min) | - |
| SSRS | Kohn and Geyer [18] (1997) | 3 | Disability outcome assessment for functioning and daily activities | Variety of shoulder disorders | 0- to 5-point or 0- to 35-point scales. Score range, 0-100 (worst to best) | - | 5 (<3 min) | - |
| SST (also known as Patte score) | Lippitt et al [23] (1993) | 12 | Function-based outcome assessment tool | Variety of shoulder disorders | Yes/no answer options. Score range, 0-12 (worst to best) | Not restricted to any period | 12 (<3 min) | - |

NI, No information; SDQ-UK, United Kingdom Shoulder Disability Questionnaire; SRQ, Shoulder Rating Questionnaire; SSI, Shoulder Severity Index.

For the burden attribute (Table II), the SDQ-NL reached the maximum score (100 points), whereas the ASES-p, OSS, SDQ-UK, and SSRS also presented acceptable EMPRO scores (66.7-91.7 points), meaning that they present both a low respondent and administrative burden. The attribute of alternative forms of administration was only applicable for the FLEX-SF and the SPADI, which developed a computer adaptive test version [7] and a telephone-interview version [47], respectively. For the other evaluated shoulder-specific PRO measures, only the original self-administered paper version exists. Finally, the attribute of cross-cultural and linguistic adaptation (3 items) was not considered in this report because our study did not aim to assess the quality of country-specific versions. Articles reporting on the metric properties of these adapted versions (eg, Arabic [49], Italian [31], German [15], Portuguese [17], and Turkish [5] ASES-p versions) were considered in our EMPRO evaluation but were not evaluated separately.

Discussion

In this study, we assessed the quality of multi-item shoulder-specific PRO measures that are designed for patients with a wide spectrum of shoulder disorders by systematically evaluating conceptual, metric, and administrative characteristics. Twenty-two experts in PRO measurement assessed the 11 identified outcome measures, and the best rated according to the EMPRO standard criteria were the ASES-p, SST, and OSS. Acceptable results were also found for 3 other questionnaires, the FLEX-SF, SPADI, and SDQ-NL. All 6 of these instruments are relatively short and easy to administer and should be serious candidate options for a wide range of purposes and settings, especially the first 3 instruments.

The ASES-p obtained the best overall score (around 80 points), followed by the SST and OSS (both around 70 points). The ASES-p was always among the top 3 outcome measures in the 5 attributes that were used for the overall score calculation, except for the responsiveness attribute, where it obtained the fourth place because of the little information available on stable group comparison. The ASES-p repeatedly scored above 70 points, except for


Table II Expert ratings of each EMPRO item and attribute for every identified shoulder disorder–specific instrument

| Attribute | ASES-p | FLEX-SF | OSS | PSS | SDQ-NL | SDQ-UK | SPADI | SRQ | SSI | SSRS | SST |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Concept and measurement model | 81 | 66.7 | 66.7 | | 66.7 | 47.6 | 52.4 | 52.4 | 14.3 | 28.6 | 52.4 |
| Concept of measurement stated | ++++ | ++++ | ++++ | ++++ | ++++ | ++++ | ++++ | ++++ | ++++ | ++ | ++++ |
| Obtaining and combining items described | ++++ | ++++ | ++ | – | ++++ | +++ | ++ | ++ | – | ++ | ++ |
| Rationality for dimensionality and scales | ++++ | +++ | ++ | – | ++ | ++ | ++ | ++ | – | ++ | ++ |
| Involvement of target population | – | ++++ | ++++ | – | ++ | +++ | + | ++++ | – | – | ++ |
| Scale variability described and adequate | ++++ | – | ++++ | ++ | +++ | ++ | +++ | ++ | + | ++ | ++++ |
| Level of measurement described | +++ | +++ | ++ | – | ++ | + | ++ | ++ | + | ++ | ++ |
| Procedures for deriving scores | ++++ | ++ | +++ | +++ | ++++ | ++ | ++++ | ++ | + | ++ | ++ |
| Reliability: global score | 75 | 66.7 | 62.5 | 55.6 | 41.7 | | 83.3 | 50 | 66.7 | 41.7 | 75 |
| Reliability: internal consistency | 75 | 66.7 | 62.5 | 55.6 | 41.7 | | 83.3 | 50 | | | 58.3 |
| Data collection methods described | ++++ | +++ | +++ | ++++ | +++ | – | ++++ | ++ | – | – | +++ |
| Cronbach alpha adequate | ++++ | ++++ | ++++ | +++ | ++++ | – | ++++ | +++ | – | – | +++ |
| IRT estimates provided | – | ++ | – | + | – | – | +++ | – | – | – | +++ |
| Testing in different populations | ++++ | NA | NA | NA | – | – | +++ | ++++ | – | – | ++ |
| Reliability: reproducibility | 75 | 58.3 | 58.3 | 50 | | | 66.6 | 50 | 66.7 | 41.7 | 75 |
| Data collection methods described | ++++ | ++ | +++ | ++ | – | – | +++ | ++ | +++ | ++ | ++++ |
| Test-retest and time interval adequate | ++++ | ++++ | +++ | +++ | – | ++++ | ++++ | ++++ | ++++ | +++ | ++++ |
| Reproducibility coefficients adequate | ++++ | ++++ | ++++ | ++++ | – | – | ++++ | +++ | ++++ | +++ | ++++ |
| IRT estimates provided | – | – | – | – | – | – | – | – | – | – | – |
| Validity | 86.7 | 83.3 | 75 | | 93.3 | 50 | 66.7 | 25 | 50 | 40 | 80 |
| Content validity adequate | +++ | ++++ | +++ | – | ++++ | ++ | ++ | ++ | + | ++ | ++ |
| Construct/criterion validity adequate | ++++ | +++ | +++ | +++ | ++++ | +++ | +++ | ++ | ++ | ++ | ++++ |
| Sample composition described | ++++ | +++ | +++ | – | ++++ | +++ | +++ | + | +++ | ++ | ++++ |
| Prior hypothesis stated | +++ | ++++ | ++++ | +++ | +++ | ++ | ++++ | ++ | ++++ | ++ | +++ |
| Rationale for criterion validity | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Tested in different populations | ++++ | NA | NA | NA | ++++ | NA | +++ | NA | NA | +++ | ++++ |
| Responsiveness to change | 77.8 | 33.3 | 77.8 | 44.4 | 100 | 88.9 | 77.8 | 77.8 | 44.4 | 66.7 | 100 |
| Adequacy of methods | ++++ | ++ | +++ | ++ | ++++ | +++ | +++ | ++ | +++ | ++ | ++++ |
| Description of estimated magnitude of change | ++++ | ++ | ++++ | ++++ | ++++ | ++++ | ++++ | ++++ | +++ | +++ | ++++ |
| Comparison of stable and unstable groups | ++ | ++ | +++ | + | ++++ | ++++ | +++ | ++++ | + | ++++ | ++++ |

Interpretability 66.7 55.6 66.7 33.3 22.2 11.1 0 55.6Rationale of external criteria þþþ þþþ þþþ þþ – – þþ þþ þ – þþþDescription of interpretation strategies þþþ þþ þþ þþ – – þþ þ þ – þþHow data should be reported stated þþþ þþþ þþþþ þþ – – þ þ – – þþþ

Overall score 77.4 61.1 69.7 26.7 60.3 37.3 60.5 43.3 35.1 35.4 72.6Burden scoreBurden I: respondent 55.6 88.9 11.1 100 77.8 22.2 22.2 11.1 66.7 88.9

Skills and time needed þþþ – þþþ þþ þþþþ þþþþ þþ þþ þþ þþþþ þþþþImpact on respondents þþ þþþ þþþþ þ þþþþ þþþ þþ þþ þ þþþþ þþþþNot suitable circumstances þþþ – þþþþ – þþþþ þþþ – þ – – þþþ

440

S.Sch

midtet

al.

Page 8: Evaluation of shoulder-specific patient-reported outcome measures: a systematic and standardized comparison of available evidence

Burden

II:administrative

91.7

16.7

66.7

75

100

58.3

50

33.3

25

50

41.7

Resources

required

þþþ

þþþþ

þþþþ

þþþþ

þþþþ

þþþ

þþþ

þþþ

þþþþ

þTimerequired

þþþþ

––

þþþþ

þþþþ

þþþþ

–þþ

þþþþ

––

Trainingandexpertise

needed

þþþþ

–þþ

þþ–

þþþþ

þþþ

–þ

þ–

–Burden

ofscore

calculation

þþþþ

þþþþ

þþþ

þþþþ

þþ–

þþþþ

þþþþ

þþþ

þþþþ

þþAlternativeform

sofadministration

66.7

83.3

Metriccharacteristicsofalternativeform

sNA

þþþ

NA

NA

NA

NA

þþþþ

NA

NA

NA

NA

Comparabilityofalternativeform

sNA

þþþ

NA

NA

NA

NA

þþþ

NA

NA

NA

NA

Ascore

ofþþ

þþindicates4(strongly

agree);þþ

þ,3;þþ

,2;þ,

1(strongly

disagree);and–,noinform

ation.Thehigher

theagreem

ent,thebettertherating.

NA,Notapplicable;SDQ-UK,United

Kingdom

Shoulder

DisabilityQuestionnaire;SRQ,Shoulder

RatingQuestion

naire;SSI,Shoulder

Severity

Index;IRT,

Item

Response

Theory.

Standardized comparison of shoulder disorder instruments 441

interpretability (66.7 points). It uses a minimal clinicallyimportant difference for score interpretation, which wasestimated to be 6.5 points.28 The SST scored among the top 3in reliability, responsiveness to change, and interpretability.In contrast, it scored low (52.4 points) for the attribute ofconceptual and measurement model because insufficientinformation was found about its development process,involvement of the target population, andmeasurement level.An anchor-based strategy is proposed for its interpretation bylinking its scores with different levels of disease severity.14
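As a small illustration of how a minimal clinically important difference can support score interpretation, the sketch below labels the change between two ASES-p scores using the 6.5-point estimate cited above. The function name, the labels, and the convention that higher scores mean better shoulder status are assumptions made for this example; they are not taken from the instrument's documentation.

```python
MCID_ASES_P = 6.5  # points; estimate cited in the text (reference 28)


def classify_change(baseline: float, follow_up: float, mcid: float = MCID_ASES_P) -> str:
    """Label a score change relative to the MCID.

    Assumes higher scores indicate better status (illustrative convention).
    """
    change = follow_up - baseline
    if change >= mcid:
        return "important improvement"
    if change <= -mcid:
        return "important deterioration"
    return "no important change"


if __name__ == "__main__":
    print(classify_change(52.0, 60.0))  # change of +8.0 points
    print(classify_change(52.0, 55.0))  # change of +3.0 points
```

Changes smaller than the threshold in either direction are treated as not clinically important, which is the usual reading of an MCID but remains a simplification for individual patients.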

The OSS was among the top 3 in conceptual and measurement model and in interpretability, and it also reached good results for validity and responsiveness. Its reliability was below 70 points because some aspects of methods (such as data collection or time interval for test-retest evaluation) could be either improved or better described. Because these 3 instruments are similar in content, number of items, and administration time, the choice among them could be made based on their dimensionality or answer options: the ASES-p is bidimensional and permits separate scores to be obtained for pain and function, using a visual analog scale for pain and Likert scales to assess function; the SST and OSS are unidimensional, with dichotomous and Likert response options, respectively.

The FLEX-SF, SPADI, and SDQ-NL shared an overall score around 60 points. These 3 instruments presented acceptable results for all attribute-specific scores except 1: the FLEX-SF failed on responsiveness, the SPADI on interpretability, and the SDQ-NL on reliability. Regarding the FLEX-SF,6 its primary feature comes from its structure of 3 different testlets designed to minimize the respondent’s burden. Each testlet (easy, medium, and hard) consists of 15 items that can then be flexibly administered, offering each patient only adequate questions, although the initial screening question could require a higher administrative burden. In addition, a computer adaptive test version7 has been developed and evaluated to facilitate data administration in large studies, although it requires greater resources such as hardware and software. Nevertheless, the low expert ratings on the responsiveness attribute deserve comment: Although high standardized coefficients were reported, it was not clear which methods were used in the longitudinal design to obtain them.

The SPADI36 is a commonly used instrument but it clearly requires further research for interpretability. The SPADI’s answer options initially consisted of visual analog scales but were later transformed to numerical scales with the purpose of making it suitable for telephone administration, an alternative version which was also judged to be reliable and valid.47 The SDQ-NL requires further reliability testing. However, it could be a very good option for measuring change over time in longitudinal studies or clinical surveillance, not only because of its excellent responsiveness but also because of its low respondent burden (average time needed to complete <3 minutes and easy yes/no answer options).44


Figure 2 Overall ranking of instruments and their attribute-specific EMPRO scores. EMPRO scores ranged from 0 to 100 (worst to best). Instruments included the following: ASES-p; FLEX-SF; OSS; PSS; SDQ-NL; United Kingdom Shoulder Disability Questionnaire (SDQ-UK); SPADI; Shoulder Rating Questionnaire (SRQ); Shoulder Severity Index (SSI); SSRS; and SST.


Our study has some limitations that deserve discussion. First, the basis of the EMPRO evaluation is the information retrieved from a systematic literature review conducted only in the PubMed database. Although PubMed is the leading database in health sciences, we may have failed to identify all the eligible shoulder-specific PRO measures or all the published articles with their specific information on the development process, metric properties, and administration issues. However, our sensitive search strategy, as well as the additional hand search of identified articles, may have minimized this problem. Second, because the EMPRO assessment is based on the published evidence, it is affected by the quantity and quality of this available information. A lack of evidence on a few items or attributes penalizes the EMPRO results because the worst possible rating is assigned in these cases. Nevertheless, to avoid a strong penalization, the score of the EMPRO attribute was not obtained if more than half of the items were missing. Most of the evaluated instruments were penalized because of missing information on the interpretability attribute, pointing out the necessity of developing interpretability strategies as a facilitator for the extension of these measures beyond the research setting. Third, the EMPRO ratings may have been biased by the individual expertise of the evaluators. However, the review by pairs, the consensus round, and the instructions on EMPRO items may have attenuated this concern. Finally, because our objective was to conduct an EMPRO evaluation of the instrument, studies conducted with country-specific versions were considered but not evaluated separately because it was not feasible. Despite the noise that these country-specific versions could have introduced in the final ratings, we think that adding them reflects the whole spectrum of currently available evidence.
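The penalization rule described above can be sketched in a few lines of Python. The rule itself (worst rating for items without information, no attribute score when more than half of the items are missing) comes from the text; the linear rescaling of the mean item rating from the 1-4 range to 0-100, the exclusion of "NA" items, and the treatment of an unscored attribute as 0 in the overall score are assumptions of this sketch. They reproduce the Table II values used in the usage example, but they are not spelled out in this section, and the function names are illustrative.

```python
from typing import Optional, Sequence

NA = "NA"  # item not applicable; left out of the calculation (assumption)


def attribute_score(ratings: Sequence[object]) -> Optional[float]:
    """Rescale item ratings (1-4) to a 0-100 attribute score.

    None means "no information". Returns None when more than half of the
    applicable items have no information; otherwise each missing item is
    given the worst rating (1) before averaging.
    """
    applicable = [r for r in ratings if r != NA]
    if not applicable:
        return None
    missing = sum(1 for r in applicable if r is None)
    if missing > len(applicable) / 2:
        return None
    filled = [1 if r is None else r for r in applicable]
    mean = sum(filled) / len(filled)
    return round((mean - 1) / 3 * 100, 1)


def overall_score(attribute_scores: Sequence[Optional[float]]) -> float:
    """Mean of the attribute scores; an unscored attribute counts as 0 (assumption)."""
    values = [0.0 if s is None else s for s in attribute_scores]
    return round(sum(values) / len(values), 1)


if __name__ == "__main__":
    # Concept and measurement model items as rated in Table II.
    ases_p = [4, 4, 4, None, 4, 3, 4]
    pss = [4, None, None, None, 2, None, 3]
    print(attribute_score(ases_p))  # one missing item, scored as 1
    print(attribute_score(pss))     # four of seven items missing
    # Five attribute scores of the ASES-p from Table II.
    print(overall_score([81.0, 75.0, 86.7, 77.8, 66.7]))
```

With these assumptions the ASES-p concept score comes out at 81.0 and its overall score at 77.4, matching Table II, while the PSS concept attribute receives no score because four of its seven items lack information.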

To our knowledge, this is the first study that provides a standardized and reliable expert-based evaluation of the available shoulder-specific PRO measures used in patients with different disorders. The basis of our assessment is the available published information retrieved in a systematic literature review. Each outcome measure was independently reviewed by 2 experts who reached final ratings by consensus. Our findings can be of interest in clinical practice as well as in research to help in selecting the correct shoulder-specific PRO measure for a certain purpose, facilitating decision making for individual patient care, or improving patient-doctor communication by understanding how the patient feels and acts in daily life. We would like to highlight that we excluded measures specific to a single shoulder condition (eg, osteoarthritis or instability) from our evaluation because they were beyond the scope of this article. Nevertheless, their use might be more adequate in some situations. Furthermore, we would like to add that instruments developed to be completed by a clinician (eg, the Constant shoulder score) were not included because the EMPRO tool is specifically designed for PRO measures and contains certain items that are only applicable to this purpose.

Conclusions

The evidence presented suggests that the ASES-p, SST, and OSS are the first options for measuring function and disability in patients with shoulder disorders. These instruments have been shown to be highly reliable, valid, and responsive, with an acceptable conceptual and measurement model, interpretability, and low administrative burden. The use of the FLEX-SF, SPADI, and SDQ-NL can also be recommended because they presented acceptable properties for most of the attributes. Choosing among these instruments will mainly depend on particular study requirements. For use in longitudinal studies or clinical trials, where responsiveness to change and reproducibility are the maximum priority, the SST would be recommended. In clinical practice, for patient surveillance, the SDQ-NL might be preferred to minimize respondent and administrative burden, but further information on its reliability is needed. To discriminate among patients’ or groups’ evaluations at one point of time, the ASES-p or OSS could be the most reliable and valid option. Our results may facilitate the decision-making process regarding the correct instrument selection and its use and interpretation for a certain study purpose or setting. Nevertheless, more research on the metric properties of these instruments is necessary because some of the evaluations were based on a small number of articles. In addition, there is room for improvement in the overall score even for the best instruments currently available.
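One way to make the purpose-driven choice described above concrete is to rank the six recommended instruments on the attributes that matter for a given study. The sketch below is only an illustration: the scores are as read from Table II, while the unweighted mean, the treatment of unscored attributes as 0, and the names `SCORES` and `rank_instruments` are assumptions of this example and not the procedure the authors used to reach their recommendations.

```python
from typing import Dict, List, Optional

# Attribute scores (0-100) as read from Table II; None marks an attribute
# for which no score was obtained.
SCORES: Dict[str, Dict[str, Optional[float]]] = {
    "ASES-p": {"reproducibility": 75.0, "responsiveness": 77.8,
               "respondent_burden": 55.6, "administrative_burden": 91.7},
    "SST": {"reproducibility": 75.0, "responsiveness": 100.0,
            "respondent_burden": 88.9, "administrative_burden": 41.7},
    "OSS": {"reproducibility": 58.3, "responsiveness": 77.8,
            "respondent_burden": 88.9, "administrative_burden": 66.7},
    "FLEX-SF": {"reproducibility": 58.3, "responsiveness": 33.3,
                "respondent_burden": None, "administrative_burden": 16.7},
    "SPADI": {"reproducibility": 66.6, "responsiveness": 77.8,
              "respondent_burden": 22.2, "administrative_burden": 50.0},
    "SDQ-NL": {"reproducibility": None, "responsiveness": 100.0,
               "respondent_burden": 100.0, "administrative_burden": 100.0},
}


def rank_instruments(priorities: List[str]) -> List[str]:
    """Order instruments by the unweighted mean of the prioritized attributes.

    Unscored attributes count as 0 (an assumption of this sketch).
    """
    def mean_score(name: str) -> float:
        values = [SCORES[name][attr] or 0.0 for attr in priorities]
        return sum(values) / len(values)

    return sorted(SCORES, key=mean_score, reverse=True)


if __name__ == "__main__":
    # Longitudinal studies: responsiveness and reproducibility first.
    print(rank_instruments(["responsiveness", "reproducibility"]))
    # Clinical surveillance: lowest respondent and administrative burden.
    print(rank_instruments(["respondent_burden", "administrative_burden"]))
```

Under these assumptions the SST ranks first for the longitudinal priorities and the SDQ-NL first for the burden priorities, in line with the recommendations above; a real selection would also weigh content, dimensionality, and the reliability caveats noted for the SDQ-NL.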

Disclaimer

This study was funded by the Department of Health, Government of the Basque Country, Spain (Project No. 2010111156).

The authors, their immediate families, and any research foundations with which they are affiliated have not received any financial payments or other benefits from any commercial entity related to the subject of this article.

Supplementary data

Supplementary data related to this article can be found online at http://dx.doi.org/10.1016/j.jse.2013.09.029.

References

1. Allander E. Prevalence, incidence, and remission rates of some common rheumatic diseases or syndromes. Scand J Rheumatol 1974;3:145-53.
2. Angst F, Pap G, Mannion AF, Herren DB, Aeschlimann A, Schwyzer HK, et al. Comprehensive assessment of clinical outcome and quality of life after total shoulder arthroplasty: usefulness and validity of subjective outcome measures. Arthritis Rheum 2004;51:819-28. http://dx.doi.org/10.1002/art.20688
3. Beaton DE, Richards RR. Measuring function of the shoulder. A cross-sectional comparison of five questionnaires. J Bone Joint Surg Am 1996;78:882-90.
4. Black N. Patient reported outcome measures could help transform healthcare. BMJ 2013;346:f167. http://dx.doi.org/10.1136/bmj.f167
5. Celik D, Atalar AC, Demirhan M, Dirican A. Translation, cultural adaptation, validity and reliability of the Turkish ASES questionnaire. Knee Surg Sports Traumatol Arthrosc 2013;21:2184-9. http://dx.doi.org/10.1007/s00167-012-2183-3
6. Cook KF, Roddey TS, Gartsman GM, Olson SL. Development and psychometric evaluation of the Flexilevel Scale of Shoulder Function. Med Care 2003;41:823-35.
7. Cook KF, Roddey TS, O’Malley KJ, Gartsman GM. Development of a Flexilevel Scale for use with computer-adaptive testing for assessing shoulder function. J Shoulder Elbow Surg 2005;14:90S-4S. http://dx.doi.org/10.1016/j.jse.2004.09.024
8. Croft P, Pope D, Zonca M, O’Neill T, Silman A. Measurement of shoulder related disability: results of a validation study. Ann Rheum Dis 1994;53:525-8.
9. Curtis KA, Roach KE, Applegate EB, Amar T, Benbow CS, Genecco TD, et al. Reliability and validity of the Wheelchair User’s Shoulder Pain Index (WUSPI). Paraplegia 1995;33:595-601.
10. Dawson J, Fitzpatrick R, Carr A. Questionnaire on the perceptions of patients about shoulder surgery. J Bone Joint Surg Br 1996;78:593-600.
11. Dawson J, Rogers K, Fitzpatrick R, Carr A. The Oxford shoulder score revisited. Arch Orthop Trauma Surg 2009;129:119-23. http://dx.doi.org/10.1007/s00402-007-0549-7


12. Garin O, Herdman M, Vilagut G, Ferrer M, Ribera A, Rajmil L, et al. Assessing health-related quality of life in patients with heart failure: a systematic, standardized comparison of available measures. Heart Fail Rev. In press 2013. http://dx.doi.org/10.1007/s10741-013-9394-7
13. Garratt A, Schmidt L, Mackintosh A, Fitzpatrick R. Quality of life measurement: bibliographic study of patient assessed health outcome measures. BMJ 2002;324:1417. http://dx.doi.org/10.1136/bmj.324.7351.1417
14. Godfrey J, Hamman R, Lowenstein S, Briggs K, Kocher M. Reliability, validity, and responsiveness of the simple shoulder test: psychometric properties by age and injury type. J Shoulder Elbow Surg 2007;16:260-7. http://dx.doi.org/10.1016/j.jse.2006.07.003
15. John M, Angst F, Awiszus F, King GJ, MacDermid JC, Simmen BR. The American Shoulder and Elbow Surgeons elbow questionnaire: cross-cultural adaptation into German and evaluation of its psychometric properties. J Hand Ther 2010;23:301-13. http://dx.doi.org/10.1016/j.jht.2010.03.001
16. Kirkley A, Griffin S, Dainty K. Scoring systems for the functional assessment of the shoulder. Arthroscopy 2003;19:1109-20. http://dx.doi.org/10.1016/j.arthro.2003.10.030
17. Knaut LA, Moser AD, Melo SA, Richards RR. Translation and cultural adaptation to the Portuguese language of the American Shoulder and Elbow Surgeons Standardized Shoulder assessment form (ASES) for evaluation of shoulder function. Rev Bras Reumatol 2010;50:176-89. http://dx.doi.org/10.1590/S0482-50042010000200007
18. Kohn D, Geyer M. The subjective shoulder rating system. Arch Orthop Trauma Surg 1997;116:324-8.
19. Kuijpers T, van der Windt DA, van der Heijden GJ, Twisk JW, Vergouwe Y, Bouter LM. A prediction rule for shoulder pain related sick leave: a prospective cohort study. BMC Musculoskelet Disord 2006;7:97. http://dx.doi.org/10.1186/1471-2474-7-97
20. L’Insalata JC, Warren RF, Cohen SB, Altchek DW, Peterson MG. A self-administered questionnaire for assessment of symptoms and function of the shoulder. J Bone Joint Surg Am 1997;79:738-48.
21. Largacha M, Parsons IM, Campbell B, Titelman RM, Smith KL, Matsen F III. Deficits in shoulder function and general health associated with sixteen common shoulder diagnoses: a study of 2674 patients. J Shoulder Elbow Surg 2006;15:30-9. http://dx.doi.org/10.1016/j.jse.2005.04.006
22. Leggin BG, Iannotti J. Shoulder outcome measurement. In: Iannotti J, Williams G, editors. Disorders of the shoulder: diagnosis and management. Philadelphia: Lippincott, Williams & Wilkins; 1999. p. 1024-40.
23. Lippitt S, Harryman D, Matsen F. A practical tool for evaluating function: the simple shoulder test. In: Matsen F, Hawkins R, editors. The shoulder: a balance of mobility and stability. Rosemont (IL): American Academy of Orthopaedic Surgeons; 1993. p. 501-18.
24. Luime JJ, Koes BW, Hendriksen IJ, Burdorf A, Verhagen AP, Miedema HS, et al. Prevalence and incidence of shoulder pain in the general population; a systematic review. Scand J Rheumatol 2004;33:73-81. http://dx.doi.org/10.1080/03009740310004667
25. McClure P, Michener L. Measures of adult shoulder function: the American Shoulder and Elbow Surgeons Standardized Shoulder Form Patient Self-Report Section (ASES), Disabilities of the Arm, Shoulder, and Hand (DASH), Shoulder Disability Questionnaire, Shoulder Pain and Disability Index (SPADI), and Simple Shoulder Test. Arthritis Care Res 2003;S49:50-8. http://dx.doi.org/10.1002/art.11404
26. Mehlum IS, Kjuus H, Veiersted KB, Wergeland E. Self-reported work-related health problems from the Oslo Health Study. Occup Med (Lond) 2006;56:371-9. http://dx.doi.org/10.1093/occmed/kql034
27. Michener LA, Leggin BG. A review of self-report scales for the assessment of functional limitation and disability of the shoulder. J Hand Ther 2001;14:68-76.
28. Michener LA, McClure PW, Sennett BJ. American Shoulder and Elbow Surgeons Standardized Shoulder Assessment Form, patient self-report section: reliability, validity, and responsiveness. J Shoulder Elbow Surg 2002;11:587-94. http://dx.doi.org/10.1067/mse.2002.127096
29. Oh JH, Jo KH, Kim WS, Gong HS, Han SG, Kim YH. Comparative evaluation of the measurement properties of various shoulder outcome instruments. Am J Sports Med 2009;37:1161-8. http://dx.doi.org/10.1177/0363546508330135
30. Paananen M, Taimela S, Auvinen J, Tammelin T, Zitting P, Karppinen J. Impact of self-reported musculoskeletal pain on health-related quality of life among young adults. Pain Med 2011;12:9-17. http://dx.doi.org/10.1111/j.1526-4637.2010.01029.x
31. Padua R, Padua L, Ceccarelli E, Bondi R, Alviti F, Castagna A. Italian version of ASES questionnaire for shoulder assessment: cross-cultural adaptation and validation. Musculoskelet Surg 2010;94(Suppl 1):S85-90. http://dx.doi.org/10.1007/s12306-010-0064-9
32. Patte D. Directions for the use of the index severity for painful and/or chronically disabled shoulders. First Open Congress of the European Society of Surgery of the Shoulder and Elbow; Paris; 1987. p. 36-41.
33. Placzek JD, Lukens SC, Badalanmenti S, Roubal PJ, Freeman DC, Walleman KM, et al. Shoulder outcome measures: a comparison of 6 functional tests. Am J Sports Med 2004;32:1270-7. http://dx.doi.org/10.1177/0363546503262193
34. Pope DP, Croft PR, Pritchard CM, Silman AJ. Prevalence of shoulder pain in the community: the influence of case definition. Ann Rheum Dis 1997;56:308-12.
35. Richards RR, An KN, Bigliani LU, Friedman RJ, Gartsman GM, Gristina AG, et al. A standardized method for the assessment of shoulder function. J Shoulder Elbow Surg 1994;3:347-52.
36. Roach KE, Budiman-Mak E, Songsiridej N, Lertratanakul Y. Development of a shoulder pain and disability index. Arthritis Care Res 1991;4:143-9.
37. Roe Y, Soberg HL, Bautz-Holter E, Ostensjo S. A systematic review of measures of shoulder pain and functioning using the International Classification of Functioning, Disability and Health (ICF). BMC Musculoskelet Disord 2013;14:73. http://dx.doi.org/10.1186/1471-2474-14-73
38. Romeo AA, Bach BR Jr, O’Halloran KL. Scoring systems for shoulder conditions. Am J Sports Med 1996;24:472-6.
39. Schmidt S, Pardo Y, Vilagut G, Garin O, Pont A, Cunillera O, et al. Systematic evaluation of specific quality of life instruments for prostate cancer. ISPOR 14th Annual European Congress. Rational Health Care Decision Making in Challenging Economic Times. 5-8 November 2011, Madrid, Spain. Value Health 2011;14:A458.
40. Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002;11:193-205.
41. Testa MA, Simonson DC. Assessment of quality-of-life outcomes. N Engl J Med 1996;334:835-40.
42. Valderas JM, Ferrer M, Mendivil J, Garin O, Rajmil L, Herdman M, et al. Development of EMPRO: a tool for the standardized assessment of patient-reported outcome measures. Value Health 2008;11:700-8. http://dx.doi.org/10.1111/j.1524-4733.2007.00309.x
43. van der Heijden GJ. Shoulder disorders: a state-of-the-art review. Baillieres Best Pract Res Clin Rheumatol 1999;13:287-309.
44. van der Heijden GJ, Leffers P, Bouter LM. Shoulder disability questionnaire design and responsiveness of a functional status measure. J Clin Epidemiol 2000;53:29-38.
45. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med 2005;37:360-3.
46. Virta L, Joranger P, Brox JI, Eriksson R. Costs of shoulder pain and resource use in primary health care: a cost-of-illness study in Sweden. BMC Musculoskelet Disord 2012;13:17. http://dx.doi.org/10.1186/1471-2474-13-17
47. Williams JW Jr, Holleman DR Jr, Simel DL. Measuring shoulder function with the Shoulder Pain and Disability Index. J Rheumatol 1995;22:727-32.
48. Wright RW, Baumgarten KM. Shoulder outcomes measures. J Am Acad Orthop Surg 2010;18:436-44.
49. Yahia A, Guermazi M, Khmekhem M, Ghroubi S, Ayedi K, Elleuch MH. Translation into Arabic and validation of the ASES index in assessment of shoulder disabilities. Ann Phys Rehabil Med 2011;54:54-72. http://dx.doi.org/10.1016/j.rehab.2010.12.002