Community Dent Oral Epidemiol 2010; 38: 129–135
© 2010 John Wiley & Sons A/S
All rights reserved

Differential item functioning in a Brazilian–Portuguese version of the Child Perceptions Questionnaire (CPQ11-14)

Jefferson Traebert1, Josimari Telino de Lacerda2, W. Murray Thomson3, Lyndie Foster Page3* and David Locker4

1Grupo de Pesquisa em Saude Bucal Coletiva, Universidade do Sul de Santa Catarina, Tubarao, SC, Brazil, 2Departamento de Saude Publica, Universidade Federal de Santa Catarina, Florianopolis, SC, Brazil, 3Department of Oral Sciences, University of Otago, Dunedin, Otago, New Zealand, 4Community Dental Health Services Research Unit, Faculty of Dentistry, University of Toronto, Toronto, ON, Canada

*Correction added after online publication January 14, 2010: Lindye Foster Page was changed to Lyndie Foster Page.

Traebert J, de Lacerda JT, Thomson WM, Foster Page L, Locker D. Differential item functioning in a Brazilian–Portuguese version of the Child Perceptions Questionnaire (CPQ11-14). Community Dent Oral Epidemiol 2010. © 2010 John Wiley & Sons A/S.

Abstract – Objective: To determine whether a Portuguese language version of the Child Perceptions Questionnaire for 11–14-year-olds (CPQ11-14) showed differential item functioning (DIF) when compared with the original English language version. Methods: CPQ11-14 data from a school-based Brazilian study (n = 138) were compared with CPQ11-14 data collected as part of a school-based study conducted in New Zealand (n = 322). In order to detect DIF, ordinal logistic regression analysis was performed with each CPQ11-14 item as the dependent variable. The independent variables were language group (English versus Portuguese), the CPQ11-14 sub-scale score of which the item was a part, and an interaction term for language × sub-scale score. Nonuniform DIF was deemed to be present if the interaction term was significant. Moderate to large uniform DIF was deemed to be present if, after removing the interaction term, the b coefficient (log odds ratio) for language group was significant and greater than 0.64 in absolute value. Analyses were also undertaken to detect pseudo-DIF. Results: Nonuniform DIF was found in five items and moderate to large uniform DIF in an additional four items. Analyses using 'purified' sub-scale scores indicated that little of the DIF detected was pseudo-DIF. A comparison of the language groups using DIF-affected and DIF-free overall and sub-scale CPQ11-14 scores revealed that the DIF detected had only a marginal effect on the differences in scores between language groups. Conclusion: Oral health-related quality of life questionnaires, particularly those that have been translated, need to be assessed for DIF and its likely impact on group comparisons.

Key words: differential item functioning; measurement equivalence; oral health-related quality of life measures; test bias

Prof. Jefferson Traebert, Av. Jose Acacio Moreira, 787, Dehon, Tubarao, Santa Catarina, Brazil 88704900. Tel: 55 48 36213363; Fax: 55 48 36213363; e-mail: [email protected]

Submitted 18 December 2008; accepted 31 October 2009

doi: 10.1111/j.1600-0528.2009.00525.x

A number of scales have been developed that measure the functional and psychosocial impacts of oral disorders. The scores derived from these scales are usually used to compare the 'oral health-related quality of life' (OHRQoL) of groups defined by age, gender, socioeconomic status, education and ethnicity. The assumption underlying such comparisons is that of measurement equivalence. That is, it is assumed that the scale and the items comprising the scale function in the same way across groups defined by these sociodemographic variables. If a scale and its items are not equivalent, then any differences detected may be an artefact of the measurement process rather than a reflection of actual sub-group differences in the underlying trait or construct being measured by the scale (1). Lack of equivalence may occur when one or more items in a scale are interpreted differently by different groups or when there are group variations in the relevance of the concepts the items represent. One way of establishing the equivalence of items across the sub-groups participating in a study is through the use of differential item functioning (DIF) (1).
While DIF has been used to assess the assumption
of equivalence with respect to the quality of life
instruments used in medicine, it has not yet been
used with respect to the instruments in common
use in oral health research. Consequently, research
is needed to ensure that OHRQoL scales and items
do function in the same way irrespective of age,
gender, socioeconomic status, education and eth-
nicity. This is particularly important given the
current emphasis on disparities in oral health.
Measurement equivalence is also of concern
where a measure has been developed in one
language or culture and is translated and adapted
for use in another language or culture. Most
measures of OHRQoL were developed in English-
speaking countries such as the UK, USA, Australia
and Canada and almost all have been translated
and used in studies involving European, Asian,
Middle Eastern and Central and South American
populations (2, 3). Although most investigators
follow strict guidelines for the translation of ques-
tionnaires, differences between the original and
translated version can still occur (4), either because
items have been poorly translated or because exact
equivalence of words, meaning or concept may be
difficult to achieve. Where this happens, the trans-
lated items may not function in the same way as
their English language counterparts. This can be
ascertained through an examination of DIF.
Differential item functioning analysis has its
origin in educational psychology and is used to
determine whether the questions on a test have the
same level of difficulty for individuals from differ-
ent social groups who are equivalent in terms of
intelligence or aptitude. A simple definition of DIF
as applied to health status instruments is as
follows: ‘An item in a scale exhibits DIF if
responses to the item differ across groups (such
as, gender or race) after controlling for an estimate
of the construct the scale is intended to measure’
(1). For example, if an item in a scale intended to
measure physical functioning does not exhibit DIF,
then people with the same level of physical
functioning should respond similarly to that item,
irrespective of group membership. If an item does
exhibit DIF then people from different groups but
with the same level of physical functioning would
have different responses to the item (1).
Two types of DIF can occur, uniform and
nonuniform DIF. Uniform DIF indicates that the
responses of a group to an item are systematically
higher or lower than the responses of a
comparison group across the full range of the
construct being measured. Nonuniform DIF is
present if there is an interaction between group
membership and construct level so that the direc-
tion of the DIF varies across the range of the
construct. For example, if responses are systemat-
ically higher at low levels of the trait but system-
atically lower at high levels of the trait then an
item manifests nonuniform DIF (5, 6). An item can
exhibit nonuniform and uniform DIF at the same
time and this has been referred to as 'nonuniform
asymmetrical DIF' (5). However, since the ultimate
aim of DIF analysis is to develop a DIF-free scale,
that is, one that exhibits homogeneous item functioning
(7), once an item is found to manifest one
form of DIF it is unnecessary for all practical
purposes to test for other forms of DIF (1).
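The difference between the two types can be made concrete with a logistic item model in which the log odds of a higher response depends on the construct level θ, a 0/1 group indicator g and their product. The Python sketch below is illustrative only: the coefficient values and function names are hypothetical. It shows that the between-group gap in log odds at equal θ is constant when the interaction coefficient is zero (uniform DIF) and varies with θ, possibly reversing direction, when it is not (nonuniform DIF).

```python
"""Illustrative sketch: uniform vs nonuniform DIF in a logistic item model.

Assumed (hypothetical) linear predictor for the log odds of a higher response:
    logit = b0 + b_theta * theta + b_group * g + b_inter * theta * g
where theta is the construct level and g is 0/1 group membership.
"""


def log_odds(theta: float, g: int, b0: float, b_theta: float,
             b_group: float, b_inter: float) -> float:
    """Linear predictor (log odds) for one respondent."""
    return b0 + b_theta * theta + b_group * g + b_inter * theta * g


def group_gap(theta: float, b_group: float, b_inter: float) -> float:
    """Group 1 minus group 0 log odds at the same theta.

    The intercept and construct effect cancel, leaving b_group + b_inter * theta.
    """
    return (log_odds(theta, 1, 0.0, 1.0, b_group, b_inter)
            - log_odds(theta, 0, 0.0, 1.0, b_group, b_inter))


def classify_pattern(thetas: list[float], b_group: float, b_inter: float) -> str:
    """Label the DIF pattern implied by the coefficients over a theta range."""
    gaps = [group_gap(t, b_group, b_inter) for t in thetas]
    if all(abs(gap) < 1e-12 for gap in gaps):
        return "no DIF"
    if b_inter == 0:
        return "uniform"  # constant gap across the whole construct range
    if min(gaps) < 0 < max(gaps):
        return "nonuniform (direction reverses)"
    return "nonuniform (same direction, varying size)"


if __name__ == "__main__":
    thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print(classify_pattern(thetas, b_group=0.8, b_inter=0.0))
    print(classify_pattern(thetas, b_group=0.0, b_inter=0.5))
```

With a zero interaction the gap is b_group everywhere; with b_group = 0 and a positive interaction the gap is negative at low θ and positive at high θ, which is the reversal pattern described in the text.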
A number of analytic methods have been used to
detect DIF in health status questionnaires that have
been translated or used in heterogeneous popula-
tions that include different racial or ethnic groups.
The most common methods used are three-way
contingency table analysis, logistic regression and
statistical techniques derived from Item Response
Theory (IRT) (1). Logistic regression can be used to
detect nonuniform and uniform DIF, can be used
with items that have dichotomous or polytomous
response formats, and criteria are available for the
estimation of the magnitude of the DIF (8).
The study reported in this paper used ordinal
logistic regression to assess the item equivalence of
the English language version of the Child Percep-
tions Questionnaire for 11–14 year olds (CPQ11-14)
(9) and a Portuguese language version developed
in Brazil (10). The CPQ11-14 is a generic measure of
the functional and psychosocial impacts of oral
disorders in children of that age. Children are
asked if, in the past 3 months, they have experienced
the problems described by 37 items in four
domains: oral symptoms (OS, 6 items), functional
limitations (FL, 9 items), emotional well-being (EW,
9 items) and social well-being (SW, 13 items). The
response options are: never (0), once/twice (1),
sometimes (2), often (3) and every day/almost
every day (4). The Portuguese language version
had psychometric properties comparable to those of
the original English language version as used in
Canada (10). The primary purpose of the paper is
to illustrate the use of ordinal logistic regression
analysis in DIF detection in order to ascertain the
conceptual and operational comparability of the
English language instrument and its Portuguese
counterpart.
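The scoring structure just described (37 items in four sub-scales, each scored 0–4, with sub-scale and overall scores obtained by summing item responses, as described under Data analysis) can be sketched as follows. The dictionary layout and function name are assumptions made for illustration; they are not part of the instrument.

```python
"""Sketch of CPQ11-14 sub-scale and overall scoring (hypothetical data layout).

Each item is scored 0 (never) to 4 (every day/almost every day). Sub-scale
scores are sums of their item responses; the overall score sums all 37 items.
"""

# Number of items per sub-scale, as described in the text.
SUBSCALE_SIZES = {"OS": 6, "FL": 9, "EW": 9, "SW": 13}
MAX_ITEM_SCORE = 4


def score_cpq(responses: dict[str, list[int]]) -> dict[str, int]:
    """Return each sub-scale sum plus an 'Overall' total.

    `responses` maps a sub-scale code to one child's item responses.
    """
    scores: dict[str, int] = {}
    for code, n_items in SUBSCALE_SIZES.items():
        items = responses[code]
        if len(items) != n_items:
            raise ValueError(f"{code}: expected {n_items} items, got {len(items)}")
        if any(not 0 <= r <= MAX_ITEM_SCORE for r in items):
            raise ValueError(f"{code}: responses must lie in 0..{MAX_ITEM_SCORE}")
        scores[code] = sum(items)
    scores["Overall"] = sum(scores[code] for code in SUBSCALE_SIZES)
    return scores


if __name__ == "__main__":
    # A hypothetical child who answers 'once/twice' (1) to every item.
    child = {code: [1] * n for code, n in SUBSCALE_SIZES.items()}
    print(score_cpq(child))  # {'OS': 6, 'FL': 9, 'EW': 9, 'SW': 13, 'Overall': 37}
```

It follows from the item counts that the possible score ranges are 0–24 (OS), 0–36 (FL and EW), 0–52 (SW) and 0–148 overall.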
Traebert et al.
Methods
Study population and design
Two data sets were used in the analysis: (i) a
population-based study of Brazilian children who
completed the Brazilian–Portuguese language ver-
sion of the CPQ11-14; and, (ii) a population-based
study of New Zealand children who completed the
original English language version. The primary
objective of both studies was to assess the oral
health-related quality of life of children using the
CPQ11-14.
The Brazilian data was obtained as part of a cross-
sectional study involving all 12–14 year-old school-
children (n = 138) from two schools in a deprived
area of the city of Florianopolis in the Southern
Brazilian State of Santa Catarina. Children com-
pleted the CPQ11-14 in their own classrooms, before
having a dental examination. Parental consent was
obtained for the participation of each child in the
research. The research proposal was approved by
the Ethics Committee for Research at the Univer-
sidade Federal de Santa Catarina.
The New Zealand study involved a simple
random sample of 322 12- and 13-year-old children
of non-Maori origin enrolled in the Taranaki District
Health Board’s school dental service (11). Each child
completed the CPQ11-14 in the dental clinic waiting
room just prior to having a dental examination.
Consent was obtained from both parents and
children before proceeding, and ethical approval
was obtained from the Taranaki Ethics Committee.
Data analysis
Detecting nonuniform and uniform DIF. The two
data sets were merged and analyses undertaken to
compare the item responses of the Brazilian and
New Zealand children. The null hypothesis exam-
ined in each comparison was that there was no
association between responses to the CPQ11-14
items and language group after controlling for an
estimate of the construct measured by the sub-scale
that included the item. An estimate of the construct
was obtained by summing the responses to the
items in each subscale. For example, we examined
whether or not there was an association between
responses to the item ‘In the past 3 months, how
often have you had pain in your teeth, lips, jaws or
mouth?’ and language group after controlling for
the Oral Symptoms (OS) sub-scale score. The item
concerning pain was one of six comprising the OS
sub-scale. All of the other items in the OS sub-scale
were examined in the same way.
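For each item, the comparison described above amounts to one analysis row per child: the item response (the outcome), a language indicator, the score of the sub-scale containing the item (the construct estimate, which includes the studied item) and the language × sub-scale product used to test for nonuniform DIF. The Python sketch below shows this data preparation; the field names and the 0/1 language coding are assumptions made for illustration.

```python
"""Sketch: per-item analysis rows for logistic-regression DIF testing.

Field names and the 0/1 language coding are illustrative assumptions.
The construct estimate is the sum of the sub-scale's item responses,
including the studied item, as described in the text.
"""
from dataclasses import dataclass


@dataclass
class Row:
    response: int        # ordinal item response, 0-4 (outcome)
    language: int        # 1 = English, 0 = Portuguese (assumed coding)
    subscale_score: int  # sum of responses to all items in the sub-scale
    interaction: int     # language * subscale_score (nonuniform DIF term)


def build_rows(children: list[dict], item_index: int) -> list[Row]:
    """Build analysis rows for one item of one sub-scale.

    Each child dict holds 'language' ('en' or 'pt') and 'items', that
    child's responses to every item in the sub-scale.
    """
    rows = []
    for child in children:
        lang = 1 if child["language"] == "en" else 0
        score = sum(child["items"])
        rows.append(Row(response=child["items"][item_index],
                        language=lang,
                        subscale_score=score,
                        interaction=lang * score))
    return rows


if __name__ == "__main__":
    # Two hypothetical children answering the six OS items.
    kids = [{"language": "en", "items": [2, 0, 1, 0, 3, 1]},
            {"language": "pt", "items": [1, 1, 0, 2, 2, 0]}]
    for row in build_rows(kids, item_index=0):  # hypothetically, the pain item
        print(row)
```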
Since the CPQ11-14 items are scored on an ordinal
scale, the analytic procedure used was Ordinal
Logistic Regression (OLR) (12, 13). The main
assumption of this technique is that the odds ratio
for a covariate, such as language, is constant for all
categories of the ordinal outcome variable; in this
case, the item response. Given the distribution of
responses to the individual items, with responses
to the lower scored categories being more probable,
the negative log–log link function was employed.
The software used in the analysis was SPSS 15.0.
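To make the model concrete, the sketch below computes category probabilities for a cumulative ordinal model with the negative log–log link, written in the parameterization SPSS uses for ordinal regression, link(γ_j) = θ_j − η, where γ_j = P(Y ≤ j) and η is the weighted sum of covariates. The thresholds and coefficients are hypothetical. Under this parameterization a positive coefficient shifts responses toward higher categories (with a logit link in place of the negative log–log, exp(b) would be a cumulative odds ratio).

```python
"""Sketch: cumulative ordinal model with a negative log-log link.

Parameterization (as in SPSS ordinal regression):
    link(P(Y <= j | x)) = theta_j - eta,   eta = sum_k b_k * x_k
Negative log-log link:  link(p) = -log(-log(p)),
so its inverse is       P(Y <= j | x) = exp(-exp(-(theta_j - eta))).
All threshold and coefficient values below are hypothetical.
"""
import math


def neg_log_log_inverse(z: float) -> float:
    """Inverse of the negative log-log link: maps a real number into (0, 1)."""
    return math.exp(-math.exp(-z))


def category_probs(thresholds: list[float], eta: float) -> list[float]:
    """P(Y = j) for j = 0..J, given J increasing thresholds and predictor eta."""
    cumulative = [neg_log_log_inverse(t - eta) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


if __name__ == "__main__":
    # Four hypothetical thresholds for the five response options (0-4).
    thetas = [0.5, 1.5, 2.5, 3.5]
    b_language, b_score = 0.7, 0.1  # hypothetical coefficients
    for lang in (0, 1):
        eta = b_language * lang + b_score * 10  # sub-scale score held at 10
        probs = category_probs(thetas, eta)
        mean = sum(j * p for j, p in enumerate(probs))
        print(lang, [round(p, 3) for p in probs], round(mean, 3))
```

Holding the sub-scale score fixed, the group with the positive language coefficient has a higher expected item response; this is the comparison the DIF tests formalize.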
The analytic strategy used was that suggested by
Teresi and Fleishman (1) and Petersen et al. (4). For
each item the response to the item was first
modelled as a logit-linear function of a dichoto-
mous variable denoting language group, the sub-
scale score and an interaction term which is the
product of language group and the sub-scale score.
If the interaction term was significant this provided
evidence of nonuniform DIF. If the interaction term
was not significant it was then removed and the
analysis repeated with the remaining two variables
in the model, language group and sub-scale score.
Moderate-to-large uniform DIF was considered to
be present if the language group variable was
significant and the regression coefficient was
larger than 0.64 in absolute value; that is, the odds ratio
was outside the interval 0.53–1.89. This standard
was supplied by the Educational Testing Service
(14) as adapted by Bjorner et al. (7). In the analyses,
the English-speaking group was coded 1 and the
Brazilian group was coded 2 and was assigned by
SPSS to be the reference category. Consequently,
when the regression coefficient (log odds ratio) for
language group was positive, the English language
respondents had an increased likelihood of a
higher score on the item in question, meaning they
reported more frequent impacts. When it was
negative, the English language respondents had
an increased likelihood of a lower score on the item
and reported less frequent impacts.
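The two-step rule above can be summarised as a small function. This Python sketch assumes the ordinal models have already been fitted elsewhere (e.g. in SPSS) and that the relevant P-values and the language coefficient are passed in; the function and parameter names are hypothetical. Positive coefficients are read as more frequent impacts among English-language respondents, following the coding described above.

```python
"""Sketch of the item-level DIF decision rule described in the text.

Inputs come from already-fitted ordinal models (hypothetical names):
  p_interaction : P-value of the language x sub-scale term (full model)
  b_language    : language coefficient after dropping the interaction
  p_language    : P-value of that coefficient
  alpha         : multiple-comparison-adjusted significance level
"""
import math

LOG_OR_CUTOFF = 0.64  # moderate-to-large threshold on |b|


def classify_item(p_interaction: float, b_language: float,
                  p_language: float, alpha: float) -> str:
    # Step 1: a significant interaction indicates nonuniform DIF; the item
    # is not tested further for uniform DIF.
    if p_interaction < alpha:
        return "nonuniform DIF"
    # Step 2: with the interaction removed, test the language main effect.
    if p_language < alpha and abs(b_language) > LOG_OR_CUTOFF:
        direction = "English higher" if b_language > 0 else "English lower"
        return f"moderate-to-large uniform DIF ({direction})"
    if p_language < alpha:
        return "statistically significant but small uniform DIF"
    return "no DIF detected"


if __name__ == "__main__":
    # Odds-ratio bounds implied by the |b| > 0.64 criterion.
    print(round(math.exp(-LOG_OR_CUTOFF), 3), round(math.exp(LOG_OR_CUTOFF), 3))
    # b values from Table 3 with the FL-level alpha; the interaction P-value
    # (0.5) and the exact language P-value are assumed for illustration.
    print(classify_item(0.5, 0.881, 0.0001, 0.003))  # breathed through mouth
    print(classify_item(0.5, -0.425, 0.025, 0.003))  # difficult to say words
```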
Since the analysis of items within each sub-scale
involved multiple comparisons, the P-value was
adjusted to account for the number of analyses per
sub-scale. Since the OS sub-scale has six items a
P-value was considered to be significant if it was
less than 0.05 ⁄ 12 (6 items and a test for nonuniform
and uniform DIF for each) or 0.004. For the other
sub-scales the P-values were set as follows: FL –
0.003; EW – 0.003 and SW – 0.002.
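The adjusted significance levels follow from dividing 0.05 by twice the number of items in each sub-scale (one nonuniform and one uniform DIF test per item). A minimal Python check of that arithmetic, reusing the item counts given earlier:

```python
"""Bonferroni-style thresholds per sub-scale: 0.05 / (2 tests x items)."""

SUBSCALE_SIZES = {"OS": 6, "FL": 9, "EW": 9, "SW": 13}
TESTS_PER_ITEM = 2  # one test for nonuniform DIF, one for uniform DIF


def adjusted_alpha(n_items: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level for a sub-scale with n_items items."""
    return family_alpha / (TESTS_PER_ITEM * n_items)


if __name__ == "__main__":
    for code, n in SUBSCALE_SIZES.items():
        a = adjusted_alpha(n)
        print(f"{code}: 0.05/{TESTS_PER_ITEM * n} = {a:.5f} (rounded {a:.3f})")
```

Rounding to three decimal places reproduces the thresholds given in the text (0.004, 0.003, 0.003 and 0.002).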
Detecting pseudo-DIF. Where a sub-scale contains
more than one item with significant DIF, true DIF
in one item can produce DIF in other items (4, 7).
Differential item functioning in a questionnaire
This is known as ‘pseudo-DIF’. This means that the
sub-scale score might not be the best control
variable to use in the analysis. An ideal control
would be a valid measure of the underlying
construct that is not derived from the items being
tested for DIF. Such measures rarely exist (4). A
compromise is to construct a ‘purified’ scale score
from the items that are not affected by DIF and use
this score as the control variable in the regression
analysis (1). Consequently, all items showing DIF
were reassessed in OLR analyses in which the
control variables were sub-scale scores obtained by
summing the responses to the items that did not
manifest DIF in the initial analyses.
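The 'purified' control variable is simply the sub-scale sum restricted to items that were DIF-free in the first pass; each flagged item is then re-tested with this score in place of the full sub-scale score. In the Python sketch below the item keys and one child's responses are hypothetical; the flagged set mirrors the three FL items reported with DIF in Table 3.

```python
"""Sketch: 'purified' sub-scale score used as the pseudo-DIF control.

The control variable is the sum of responses to items that did not show
DIF in the initial analyses. Item keys and responses are hypothetical.
"""


def purified_score(responses: dict[str, int], dif_items: set[str]) -> int:
    """Sum responses over DIF-free items only."""
    return sum(value for item, value in responses.items() if item not in dif_items)


if __name__ == "__main__":
    # One child's responses to the nine FL items (hypothetical values).
    fl = {"breathed_mouth": 3, "longer_to_eat": 1, "trouble_sleeping": 0,
          "bite_chew": 1, "open_wide": 0, "say_words": 0,
          "eat_foods": 1, "straw": 0, "hot_cold": 2}
    # FL items flagged with DIF in the initial analyses.
    flagged = {"breathed_mouth", "longer_to_eat", "hot_cold"}
    print(sum(fl.values()), purified_score(fl, flagged))  # full vs purified
```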
Results
Characteristics of the participants
The age and gender characteristics of the study
groups are given in Table 1. The groups were
broadly comparable in terms of gender but there
were some differences in the age distributions.
However, the mean ages were identical, both being
12.7 years. Independent samples t-tests indicated
that the Brazilian sample had a significantly higher
overall CPQ11-14 score than the New Zealand
sample and significantly higher scores for all four
sub-scales (Table 2).
Nonuniform and uniform DIF
The interaction term for language and sub-scale
score was significant, indicating nonuniform DIF,
for five items: one from the FL sub-scale, three from
the EW sub-scale and one from the SW sub-scale.
These items were FL – 'Taken longer than others to
complete a meal' (P < 0.001); EW – 'Felt unsure of
yourself' (P < 0.001), 'Felt shy or embarrassed'
(P = 0.001), and 'Worried that you are not as
good-looking as others' (P = 0.002); SW – 'Other
children teased or called you names' (P = 0.002).
These items are italicized in Table 3.
Table 3 also presents the results of the uniform
DIF analyses. Four of the 32 items tested met the
criteria for moderate to large DIF; one from the OS
sub-scale, two from the FL sub-scale and one from
the SW sub-scale. The b coefficients (log odds
ratios) and P-values for these items are shown in
bold. Two of the b coefficients (log odds ratios)
were positive and two negative.
Pseudo-DIF
Since only one of the OS items manifested DIF,
pseudo-DIF detection was not undertaken. For the
FL sub-scale a ‘purified’ score was constructed by
summing the responses to the six DIF-free items.
The three items showing DIF were then reassessed
using the DIF-free sub-scale score as the control. All
still showed evidence of DIF, with only a modest
reduction in the b coefficients (log odds ratios) for
those manifesting uniform DIF. The same proce-
dure was used with the emotional well-being
sub-scale, where three items had nonuniform DIF.
The P-values for these three items remained signif-
icant. In the analysis of items from the SW sub-scale,
the item with uniform DIF remained significant but
the interaction term for the item with nonuniform
DIF failed to reach statistical significance.
Table 1. Gender and age distribution of subjects by language/country

 | English (New Zealand) n (%) | Portuguese (Brazil) n (%)
Gender
  Male | 167 (51.9) | 63 (45.6)
  Female | 155 (48.1) | 75 (54.4)
Age (years)
  12 | 86 (26.7) | 52 (37.7)
  13 | 236 (73.3) | 74 (53.6)
  14 | – | 12 (8.7)
Total | 322 (100.0) | 138 (100.0)

Values are represented as n (%).

Table 2. Mean CPQ11-14 overall and sub-scale scores by language/country

CPQ11-14 | English (New Zealand) Mean (SD) | English (New Zealand) Median | Portuguese (Brazil) Mean (SD) | Portuguese (Brazil) Median
Oral symptoms | 4.8 (2.8)a | 4.0 | 5.7 (4.0) | 5.0
Functional limitations | 5.6 (4.3)b | 5.0 | 7.9 (5.8) | 7.0
Emotional well-being | 3.5 (4.7)b | 2.0 | 7.5 (6.4) | 7.0
Social well-being | 4.0 (5.4)b | 2.0 | 7.0 (6.8) | 5.0
Overall | 17.8 (13.9)b | 14.0 | 28.1 (20.0) | 24.5

a Difference between mean scores for English and Portuguese language groups: P < 0.05 (independent samples t-tests).
b Difference between mean scores for English and Portuguese language groups: P < 0.001 (independent samples t-tests).
Comparison of DIF-free CPQ11-14 scores
In order to assess the practical, as opposed to the
statistical, significance of the DIF observed,
CPQ11-14 scores were calculated excluding those
items with moderate to large uniform DIF. The
mean scores of the language groups were then
compared. The overall score and all four sub-scale
scores remained significantly different, with the
Brazilian sample having higher mean scores
(Table 4). An effect size statistic for the overall
score, calculated from the difference in means
divided by the pooled standard deviation, changed
only marginally, from 0.65 to 0.62.
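The effect size used here (difference in means divided by the pooled standard deviation) can be computed from group summary statistics. This Python sketch assumes the common pooled SD that weights each group's variance by its degrees of freedom (n − 1); the text does not state which pooling formula was used, and the numbers in the usage example are hypothetical.

```python
"""Sketch: standardized mean difference using a pooled SD (assumed formula)."""
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation, weighting each variance by n - 1."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(mean1: float, sd1: float, n1: int,
                mean2: float, sd2: float, n2: int) -> float:
    """(mean2 - mean1) divided by the pooled SD."""
    return (mean2 - mean1) / pooled_sd(sd1, n1, sd2, n2)


if __name__ == "__main__":
    # Hypothetical groups: means 10 and 15, both SD 5, 50 children each.
    print(effect_size(10.0, 5.0, 50, 15.0, 5.0, 50))
```

Computing the statistic once with all items and once with DIF-affected items removed, and comparing the two values, is the practical-significance check described above.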
Discussion
The primary purpose of this paper is to illustrate
the use of DIF analysis in assessing the measure-
ment equivalence of items in an oral health
outcome questionnaire. Although we used DIF
analysis to assess a translated version of the CPQ11-14,
the same analytic approach can be used to assess
whether or not its items function in the same way
across sub-groups from the same population.
Table 3. Results of uniform DIF analyses

Item | b | P-value
Oral symptoms (OS)
  Pain in your teeth, lips, jaws or mouth | −0.245 | 0.074
  Bleeding gums | −0.144 | 0.337
  Sores in your mouth | 0.221 | 0.173
  Bad breath | 0.166 | 0.227
  Food stuck in or between your teeth | 0.452 | <0.001
  Food stuck in the top of your mouth | **0.855** | **<0.001**
Functional limitations (FL)
  Breathed through your mouth | **0.881** | **<0.001**
  *Taken longer than others to eat a meal* | – | –
  Had trouble sleeping | 0.136 | 0.457
  Difficult to bite or chew food like apples, corn on the cob or steak | −0.305 | 0.055
  Difficult to open your mouth wide | 0.188 | 0.414
  Difficult to say any words | −0.425 | 0.025
  Difficult to eat foods you would like to eat | −0.409 | 0.030
  Difficult to drink with a straw | −0.501 | 0.217
  Difficult to drink or eat hot or cold foods | **−1.091** | **<0.001**
Emotional well-being (EW)
  Felt irritable or frustrated | 0.538 | 0.004
  *Felt unsure of yourself* | – | –
  *Felt shy or embarrassed* | – | –
  Been concerned what other people think about your teeth, lips, mouth or jaws | −0.305 | 0.067
  *Worried that you are not as good-looking as others* | – | –
  Been upset | −0.430 | 0.016
  Felt nervous or afraid | 0.018 | 0.923
  Worried that you are not as healthy as others | 0.064 | 0.724
  Worried that you are different than other people | 0.070 | 0.742
Social well-being (SW)
  Missed school because of pain, appointments, or surgery | −0.179 | 0.358
  Had a hard time paying attention in school | 0.296 | 0.182
  Had difficulty doing your homework | −0.224 | 0.295
  Not wanted to speak or read out loud in class | −0.048 | 0.797
  Avoided taking part in activities like sports, clubs, drama, music, school trips | 0.340 | 0.191
  Not wanted to talk to other children | −0.521 | 0.026
  Avoided smiling or laughing when around other children | **−0.708** | **<0.001**
  Had difficulty playing a musical instrument | −0.563 | 0.054
  Not wanted to spend time with other children | −0.637 | 0.006
  Argued with other children or your family | 0.221 | 0.196
  *Other children teased you or called you names* | – | –
  Other children made you feel left out | 0.187 | 0.426
  Other children asked you questions about your teeth, lips, jaws or mouth | 0.118 | 0.526

Regression coefficients (log odds ratios) and P-values in bold indicate items with moderate to large uniform DIF. Items in italics manifest nonuniform DIF; regression coefficients (log odds ratios) and P-values are not given for these items, and analyses to detect uniform DIF were not undertaken for them.

Using ordinal logistic regression analysis we
found evidence of both uniform and nonuniform
DIF in some items of the Brazilian–Portuguese
version of the CPQ11-14. For nonuniform DIF, the
P-value of the interaction term was used to detect
its presence but there are no standards for assessing
the magnitude of the DIF observed. Following
Petersen et al. (4), we deemed moderate-to-large
uniform DIF to be present if the language
group variable was statistically significant after
accounting for multiple comparisons and the
absolute magnitude of the log odds ratio was
greater than 0.64. Other investigators using this
approach have used different criteria to identify
items that manifest DIF, such as changes in −2 log
likelihood values or changes in pseudo-R2 values,
but do not provide benchmarks for judging
whether the DIF observed is small and unlikely to
be of practical significance or large with potential
practical implications (6).
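One of the alternative criteria mentioned above, the change in −2 log likelihood between nested models, amounts to a likelihood-ratio test. The Python sketch below assumes the two models (with and without one extra term, such as the language × sub-scale interaction) have already been fitted and their log likelihoods are available; the values in the usage example are hypothetical. It relies on the fact that a chi-square variable with 1 degree of freedom has upper-tail probability erfc(√(x/2)).

```python
"""Sketch: likelihood-ratio test for nested DIF models (one added term).

Compares -2 log likelihood of a reduced model with that of a model adding a
single parameter. For 1 degree of freedom the chi-square upper-tail
probability is erfc(sqrt(x / 2)). Log-likelihood values are hypothetical.
"""
import math


def lr_test_1df(loglik_reduced: float, loglik_full: float) -> tuple[float, float]:
    """Return (chi-square statistic, P-value) for a 1-df nested comparison."""
    stat = -2.0 * (loglik_reduced - loglik_full)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    stat, p = lr_test_1df(loglik_reduced=-412.7, loglik_full=-408.1)
    print(f"chi2(1) = {stat:.2f}, P = {p:.4f}")
```

As the text notes, a P-value alone says nothing about whether the DIF is large enough to matter in practice.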
Once significant DIF is detected its causes need
to be identified. Petersen et al. (4) have reviewed a
number of potential causes and how they might be
investigated. DIF in an item can occur due to
random variation, although the significance levels
we used in the analyses reported here suggest that
this is unlikely. It may be due to ‘pseudo-DIF’; that
is, the lack of independence between an item and
the scale score used as a control variable means
that DIF in one item may be caused by DIF in other
items. We investigated this possibility by using
‘purified’ scale scores derived from DIF-free items
and re-assessed all items that had manifested DIF
in the original analysis. Most of the items mani-
festing DIF continued to do so.
Differential item functioning can also be caused
by confounding. Since DIF can also occur in
relation to variables such as gender, age, ethnicity,
socioeconomic status and clinical status, popula-
tions included in cross-cultural studies should be
comparable with respect to these variables insofar
as this is feasible (7). Alternatively, where DIF is
detected the analyses can be repeated controlling
for these potential confounders. Although the
Brazilian and New Zealand populations included
in the study were derived from school-based
surveys and similar in gender and age, they may
well have differed according to other variables,
such as socioeconomic status, that could have
confounded the association between item
responses and language group. The Brazilian chil-
dren, for example, came from schools in a deprived
area whereas the New Zealand children comprised
a socioeconomically representative sample of the
Taranaki population. These differences may ex-
plain why mean CPQ scores were higher in the
Brazilian population even after removing items
that showed moderate to large language-based
uniform DIF. It is also possible that some of the
differences were due to DIF related to socioeconomic
status (SES). This can only be examined if data on
SES are collected, using measures that are broadly
comparable.
If ‘pseudo-DIF’ and confounding can be elimi-
nated as causes, then the DIF observed is probably
due to problems with translation, called linguistic
DIF, to cross-cultural biases or true cross-cultural
differences that are independent of other sociode-
mographic factors.
It is generally recommended that items mani-
festing DIF should be excluded when comparing
results from a translated and the original English
language version of a questionnaire (7). Depending
upon the number of items excluded, this may have
some effect on the psychometric properties of the
instrument and its measurement sensitivity. Alter-
natively, the translation could be reviewed to see if
a more accurate rendition of the items can be
achieved. Cross-cultural differences in language
use, concepts and meanings may preclude such a
solution. For example, Saub et al. (15) report that
many of the items in the Oral Health Impact Profile
(16) were difficult to translate into the Malay
language. In such circumstances, a balance needs
to be achieved between creating an instrument that
functions in an identical manner to the English
language original and one that works well in
another language (7).
Table 4. Mean (SD) DIF-free CPQ11-14 overall and sub-scale scores by language/country

CPQ11-14 | English (New Zealand) | Portuguese (Brazil) | P-value a
Oral symptoms | 4.4 (2.6) | 5.4 (3.7) | <0.001
Functional limitations | 1.9 (2.6) | 3.7 (3.8) | <0.001
Emotional well-being | 2.2 (3.0) | 4.9 (4.4) | <0.001
Social well-being | 3.7 (5.0) | 6.3 (6.3) | <0.001
Overall | 12.2 (10.6) | 20.2 (15.6) | <0.001

a Independent samples t-tests.

Research should be undertaken to assess the
nature, extent and impact of DIF in all measures of
OHRQoL. Since DIF has been identified in a broad
range of patient-based outcome measures, includ-
ing those that assess functional status, health-
related quality of life, satisfaction with care and
mental and cognitive functioning (17), it would be
unreasonable to suppose that patient-based mea-
sures of oral health outcomes are DIF-free.
Although the work of Petersen et al. (4) and Bjorner
et al. (7) suggests that DIF detection using methods
such as ordinal logistic regression and contingency
table analysis is relatively straightforward, there
are a number of neglected issues that need further
research into their impact on DIF detection rates
(8). These include model assumptions, model fit,
and the impact of differences in the distribution of
the latent variable in the groups being compared.
For example, most methods for detecting DIF
assume that the scale being assessed is unidimen-
sional. Since multidimensionality can give rise to
the appearance of DIF, assessing this unidimen-
sionality assumption is important. Exploratory and
confirmatory factor analyses have not been under-
taken with most oral health outcome measures so
that the dimensionality of these scales and their
component sub-scales is not known. Consequently,
further work regarding the construct validity of
these measures is needed as a prelude to wide-
spread adoption of DIF analysis.
To conclude, the results of this study indicate
that several of the items in the CPQ11-14 exhibited
differential functioning when the Portuguese and
English language versions were compared, and that
most of this DIF remained after 'purified' sub-scale
scores were used as control variables. Whether this
DIF is linguistic in origin or due to confounding by
other variables that produce DIF needs to be explored further.
Acknowledgements
When undertaking this research, J. Traebert was supported by a post-doctoral scholarship from CAPES, Ministry of Education, Brazilian Federal Government. The authors are grateful to the New Zealand Dental Association Research Foundation and the Taranaki District Health Board for funding the New Zealand data collection, and to the iwi of Taranaki for their support.
References
1. Teresi JA, Fleishman JA. Differential item functioning and health assessment. Qual Life Res 2007;16:33–42.
2. Herdman M, Fox-Rushby J, Badia X. 'Equivalence' and the translation and adaptation of health-related quality of life questionnaires. Qual Life Res 1997;6:237–47.
3. Castro RAL, Portela MC, Leao AT. Cross-cultural adaptation of quality of life indices for oral health. Cad Saude Publica 2007;23:2275–84.
4. Petersen MA, Groenvold M, Bjorner JB, Aaronson N, Conroy T, Cull A et al. Use of differential item functioning to assess the equivalence of translations of a questionnaire. Qual Life Res 2003;12:373–85.
5. Hidalgo MD, Gomez J. Nonuniform DIF detection using discriminant logistic analysis and multinomial logistic regression: a comparison for polytomous items. Qual Quant 2006;40:805–23.
6. Crane PK, Gibbons LE, Ocepek-Welikson K, Cook K, Cella D, Narasimhalu K et al. A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Qual Life Res 2007;16:69–84.
7. Bjorner J, Kreiner S, Ware J, Damsgaard M, Bech P. Differential item functioning in the Danish translation of the SF-36. J Clin Epidemiol 1998;51:1189–202.
8. Teresi JA. Different approaches to differential item functioning in health applications: advantages, disadvantages and some neglected topics. Med Care 2006;44:S152–70.
9. Jokovic A, Locker D, Stephens A, Kenny D, Tompson B, Guyatt G. Validity and reliability of a questionnaire for measuring child oral-health-related quality of life. J Dent Res 2002;81:459–63.
10. Goursand D, Paiva SM, Zarzar PM, Ramos-Jorge ML, Cornacchia ML, Pordeus IA et al. Cross-cultural adaptation of the Child Perceptions Questionnaire 11-14 (CPQ11-14) for the Brazilian Portuguese language. Health Qual Life Outcomes 2008;6:2.
11. Foster Page LA, Thomson WM, Jokovic A, Locker D. Validation of the Child Perceptions Questionnaire (CPQ11-14). J Dent Res 2005;84:649–52.
12. Swaminathan H, Rogers HJ. Detecting differential item functioning using logistic regression procedures. J Educ Meas 1990;27:361–70.
13. French AW, Miller TR. Logistic regression and its use in detecting differential item functioning in polytomous items. J Educ Meas 1996;33:315–32.
14. Zieky M. Practical questions in the use of DIF statistics in test development. In: Holland PW, Wainer H, editors. Differential item functioning, Vol. 14. Hillsdale, NJ: Lawrence Erlbaum Associates, 1993; 93–106.
15. Saub R, Locker D, Allison P, Disman M. Cross-cultural adaptation of the Oral Health Impact Profile for the Malaysian population. Community Dent Health 2007;24:166–75.
16. Slade GD, Spencer AJ. Development and evaluation of the Oral Health Impact Profile. Community Dent Health 1994;11:3–11.
17. McHorney C, Fleishman J. Assessing and understanding measurement equivalence in health outcome measures: issues for further quantitative and qualitative enquiry. Med Care 2006;44(Suppl. 3):S205–10.