
Differential item functioning in a Brazilian–Portuguese version of the Child Perceptions Questionnaire (CPQ11-14)


Community Dent Oral Epidemiol 2010; 38: 129–135
All rights reserved
© 2010 John Wiley & Sons A/S

Traebert J, de Lacerda JT, Thomson WM, Foster Page L, Locker D. Differential item functioning in a Brazilian–Portuguese version of the Child Perceptions Questionnaire (CPQ11-14). Community Dent Oral Epidemiol 2010. © 2010 John Wiley & Sons A/S.

Jefferson Traebert1, Josimari Telino de Lacerda2, W. Murray Thomson3, Lyndie Foster Page3* and David Locker4

1 Grupo de Pesquisa em Saúde Bucal Coletiva, Universidade do Sul de Santa Catarina, Tubarão, SC, Brazil
2 Departamento de Saúde Pública, Universidade Federal de Santa Catarina, Florianópolis, SC, Brazil
3 Department of Oral Sciences, University of Otago, Dunedin, Otago, New Zealand
4 Community Dental Health Services Research Unit, Faculty of Dentistry, University of Toronto, Toronto, ON, Canada

*Correction added after online publication January 14, 2010: Lindye Foster Page was changed to Lyndie Foster Page.

Abstract – Objective: To determine whether a Portuguese language version of the Child Perceptions Questionnaire for 11–14-year-olds (CPQ11-14) showed differential item functioning (DIF) when compared with the original English language version. Methods: CPQ11-14 data from a school-based Brazilian study (n = 138) was compared with CPQ11-14 data collected as part of a school-based study conducted in New Zealand (n = 322). In order to detect DIF, ordinal logistic regression analysis was performed with each CPQ11-14 item as the dependent variable. The independent variables were language group (English versus Portuguese), the CPQ11-14 sub-scale score of which the item was a part, and an interaction term for language*sub-scale score. Nonuniform DIF was deemed to be present if the interaction term was significant. Moderate to large uniform DIF was deemed to be present if after removing the interaction term the b coefficient (log odds ratio) for language group was significant and numerically greater than 0.64. Analyses were also undertaken to detect pseudo-DIF. Results: Nonuniform DIF was found in five items and moderate to large uniform DIF in an additional four items. Analyses using ‘purified’ sub-scale scores indicated that little of the DIF detected was pseudo-DIF. A comparison of the language groups using DIF affected and DIF-free overall and subscale CPQ11-14 scores revealed that the DIF detected had only a marginal effect on the differences between language groups in scores. Conclusion: Oral health-related quality of life questionnaires, particularly those that have been translated, need to be assessed for DIF and its likely impact on group comparisons.

Key words: differential item functioning; measurement equivalence; oral health-related quality of life measures; test bias

Prof. Traebert, Jefferson, Av. José Acácio Moreira, 787, Dehon, Tubarão, Santa Catarina, Brazil 88704900
Tel: 55 48 36213363
Fax: 55 48 36213363
e-mail: [email protected]

Submitted 18 December 2008; accepted 31 October 2009

doi: 10.1111/j.1600-0528.2009.00525.x

A number of scales have been developed that measure the functional and psychosocial impacts of oral disorders. The scores derived from these scales are usually used to compare the ‘oral health-related quality of life’ (OHRQoL) of groups defined by age, gender, socioeconomic status, education and ethnicity. The assumption underlying such comparisons is that of measurement equivalence. That is, it is assumed that the scale and the items comprising the scale function in the same way across groups defined by these sociodemographic variables. If a scale and its items are not equivalent then any differences detected may be an artefact of the measurement process rather than a reflection of actual sub-group differences in the underlying trait or construct being measured by the scale (1). Lack of equivalence may occur when one or more items in a scale is interpreted differently by different groups or there are group variations in the relevance of the concepts the items represent. One way of establishing the equivalence of items across the sub-groups participating in a study is through the use of differential item functioning (DIF) (1). While DIF has been used to assess the assumption of equivalence with respect to the quality of life instruments used in medicine, it has not yet been used with respect to the instruments in common use in oral health research. Consequently, research is needed to ensure that OHRQoL scales and items do function in the same way irrespective of age, gender, socioeconomic status, education and ethnicity. This is particularly important given the current emphasis on disparities in oral health.

Measurement equivalence is also of concern where a measure has been developed in one language or culture and is translated and adapted for use in another language or culture. Most measures of OHRQoL were developed in English-speaking countries such as the UK, USA, Australia and Canada and almost all have been translated and used in studies involving European, Asian, Middle Eastern and Central and South American populations (2, 3). Although most investigators follow strict guidelines for the translation of questionnaires, differences between the original and translated version can still occur (4), either because items have been poorly translated or because exact equivalence of words, meaning or concept may be difficult to achieve. Where this happens, the translated items may not function in the same way as their English language counterparts. This can be ascertained through an examination of DIF.

Differential item functioning analysis has its origin in educational psychology and is used to determine whether the questions on a test have the same level of difficulty for individuals from different social groups who are equivalent in terms of intelligence or aptitude. A simple definition of DIF as applied to health status instruments is as follows: ‘An item in a scale exhibits DIF if responses to the item differ across groups (such as, gender or race) after controlling for an estimate of the construct the scale is intended to measure’ (1). For example, if an item in a scale intended to measure physical functioning does not exhibit DIF, then people with the same level of physical functioning should respond similarly to that item, irrespective of group membership. If an item does exhibit DIF then people from different groups but with the same level of physical functioning would have different responses to the item (1).

Two types of DIF can occur: uniform and nonuniform. Uniform DIF indicates that the responses of a group to an item are systematically higher or lower than the responses of a comparison group across the full range of the construct being measured. Nonuniform DIF is present if there is an interaction between group membership and construct level so that the direction of the DIF varies across the range of the construct. For example, if responses are systematically higher at low levels of the trait but systematically lower at high levels of the trait then an item manifests nonuniform DIF (5, 6). An item can exhibit nonuniform and uniform DIF at the same time and this has been referred to as ‘nonuniform asymmetrical DIF’ (5). However, since the ultimate aim of DIF analysis is to develop a DIF-free scale, that is, one that exhibits homogeneous item functioning (7), once an item is found to manifest one form of DIF it is unnecessary for all practical purposes to test for other forms of DIF (1).
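The distinction between the two forms of DIF can be made concrete with a small numerical sketch. The example below assumes a simplified dichotomous logistic model rather than the ordinal model used in this study, and every coefficient value is hypothetical, chosen only to show the pattern: a group main effect gives a constant gap in log odds (uniform DIF), whereas a group-by-construct interaction gives a gap that changes sign across the range of the construct (nonuniform DIF).

```python
import math


def prob_endorse(theta, group, b0=-1.0, b_theta=1.0, b_group=0.0, b_inter=0.0):
    """P(endorsing an item) under a hypothetical dichotomous logistic DIF model.

    theta is the construct level, group is 0 or 1; all coefficients are
    illustrative values, not estimates from the study.
    """
    linear = b0 + b_theta * theta + b_group * group + b_inter * theta * group
    return 1.0 / (1.0 + math.exp(-linear))


def log_odds(p):
    return math.log(p / (1.0 - p))


def log_odds_gap(theta, **coef):
    """Log odds for group 1 minus log odds for group 0 at the same construct level."""
    return log_odds(prob_endorse(theta, 1, **coef)) - log_odds(prob_endorse(theta, 0, **coef))


levels = [-2.0, 0.0, 2.0]

# Uniform DIF: group main effect only, so the gap is the same at every level.
uniform_gaps = [log_odds_gap(t, b_group=0.8) for t in levels]

# Nonuniform DIF: interaction only, so the gap is positive at low levels
# of the construct and negative at high levels.
nonuniform_gaps = [log_odds_gap(t, b_inter=-0.6) for t in levels]

for t, u, n in zip(levels, uniform_gaps, nonuniform_gaps):
    print(f"theta={t:+.1f}  uniform gap={u:+.2f}  nonuniform gap={n:+.2f}")
```

Setting both `b_group` and `b_inter` to non-zero values in this sketch would correspond to the combined pattern described above as nonuniform asymmetrical DIF.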

A number of analytic methods have been used to detect DIF in health status questionnaires that have been translated or used in heterogeneous populations that include different racial or ethnic groups. The most common methods used are three-way contingency table analysis, logistic regression and statistical techniques derived from Item Response Theory (IRT) (1). Logistic regression can be used to detect nonuniform and uniform DIF, can be used with items that have dichotomous or polytomous response formats, and criteria are available for the estimation of the magnitude of the DIF (8).

The study reported in this paper used ordinal logistic regression to assess the item equivalence of the English language version of the Child Perceptions Questionnaire for 11–14 year olds (CPQ11-14) (9) and a Portuguese language version developed in Brazil (10). The CPQ11-14 is a generic measure of the functional and psychosocial impacts of oral disorders in children of that age. Children are asked if, in the past 3 months, they have experienced the problems described by 37 items in four domains: oral symptoms (OS, 6 items), functional limitations (FL, 9 items), emotional well-being (EW, 9 items) and social well-being (SW, 13 items). The response options are: never (0), once/twice (1), sometimes (2), often (3) and every day/almost every day (4). The Portuguese language version had psychometric properties comparable to the original English language version when used in Canada (10). The primary purpose of the paper is to illustrate the use of ordinal logistic regression analysis in DIF detection in order to ascertain the conceptual and operational comparability of the English language instrument and its Portuguese counterpart.

Methods

Study population and design

Two data sets were used in the analysis: (i) a population-based study of Brazilian children who completed the Brazilian–Portuguese language version of the CPQ11-14; and, (ii) a population-based study of New Zealand children who completed the original English language version. The primary objective of both studies was to assess the oral health-related quality of life of children using the CPQ11-14.

The Brazilian data was obtained as part of a cross-sectional study involving all 12–14 year-old schoolchildren (n = 138) from two schools in a deprived area of the city of Florianópolis in the Southern Brazilian State of Santa Catarina. Children completed the CPQ11-14 in their own classrooms, before having a dental examination. Parental consent was obtained for the participation of each child in the research. The research proposal was approved by the Ethics Committee for Research at the Universidade Federal de Santa Catarina.

The New Zealand study involved a simple random sample of 322 12- and 13-year-old children of non-Maori origin enrolled in the Taranaki District Health Board’s school dental service (11). Each child completed the CPQ11-14 in the dental clinic waiting room just prior to having a dental examination. Consent was obtained from both parents and children before proceeding, and ethical approval was obtained from the Taranaki Ethics Committee.

Data analysis

Detecting nonuniform and uniform DIF. The two data sets were merged and analyses undertaken to compare the item responses of the Brazilian and New Zealand children. The null hypothesis examined in each comparison was that there was no association between responses to the CPQ11-14 items and language group after controlling for an estimate of the construct measured by the sub-scale that included the item. An estimate of the construct was obtained by summing the responses to the items in each subscale. For example, we examined whether or not there was an association between responses to the item ‘In the past 3 months, how often have you had pain in your teeth, lips, jaws or mouth?’ and language group after controlling for the Oral Symptoms (OS) sub-scale score. The item concerning pain was one of six comprising the OS sub-scale. All of the other items in the OS sub-scale were examined in the same way.

Since the CPQ11-14 items are scored on an ordinal scale, the analytic procedure used was Ordinal Logistic Regression (OLR) (12, 13). The main assumption of this technique is that the odds ratio for a covariate, such as language, is constant for all categories of the ordinal outcome variable; in this case, the item response. Given the distribution of responses to the individual items, with responses to the lower scored categories being more probable, the negative log–log link function was employed. The software used in the analysis was SPSS 15.0.
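For readers less familiar with this class of model, a cumulative ordinal regression with a negative log–log link can be sketched as follows. The notation is illustrative and not taken from the source: Y is the item response (0 to 4), x is the vector of covariates, the thresholds θ_k separate adjacent response categories, and the threshold-minus-predictor sign convention is an assumption, under which a positive coefficient shifts responses towards higher categories.

```latex
\[
  g\bigl(\Pr(Y \le k \mid \mathbf{x})\bigr) \;=\; \theta_k - \boldsymbol{\beta}^{\top}\mathbf{x},
  \qquad k = 0, 1, 2, 3,
\]
\[
  g(\gamma) \;=\; -\log\bigl(-\log \gamma\bigr)
  \qquad \text{(negative log--log link)} .
\]
```

Because the coefficient vector β does not depend on k, the effect of each covariate is the same at every cut-point of the response scale, which is the constancy assumption described above.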

The analytic strategy used was that suggested by Teresi and Fleishman (1) and Petersen et al. (4). For each item the response to the item was first modelled as a logit-linear function of a dichotomous variable denoting language group, the sub-scale score and an interaction term which is the product of language group and the sub-scale score. If the interaction term was significant this provided evidence of nonuniform DIF. If the interaction term was not significant it was then removed and the analysis repeated with the remaining two variables in the model, language group and sub-scale score. Moderate-to-large uniform DIF was considered to be present if the language group variable was significant and the regression coefficient was numerically larger than 0.64; that is, the odds ratio was outside the interval 0.53–1.89. This standard was supplied by the Educational Testing Service (14) as adapted by Bjorner et al. (7). In the analyses, the English-speaking group was coded 1 and the Brazilian group was coded 2 and was assigned by SPSS to be the reference category. Consequently, when the regression coefficient (log odds ratio) for language group was positive, the English language respondents had an increased likelihood of a higher score on the item in question, meaning they reported more frequent impacts. When it was negative, the English language respondents had an increased likelihood of a lower score on the item and reported less frequent impacts.
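The correspondence between the 0.64 cut-off on the log odds scale and the odds ratio interval can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the direction labels simply restate the coding described above (Brazilian group as reference category).

```python
import math

LOG_OR_CUTOFF = 0.64  # magnitude criterion quoted in the text


def odds_ratio(b):
    """Convert a regression coefficient (log odds ratio) to an odds ratio."""
    return math.exp(b)


def direction(b):
    """Interpret the sign of the language-group coefficient,
    assuming the Brazilian group is the reference category."""
    if b > 0:
        return "English language respondents more likely to score higher"
    if b < 0:
        return "English language respondents more likely to score lower"
    return "no difference between language groups"


lower = odds_ratio(-LOG_OR_CUTOFF)
upper = odds_ratio(LOG_OR_CUTOFF)
print(f"odds ratio interval: {lower:.3f} to {upper:.3f}")
print(direction(0.881))
```

The lower bound agrees with the 0.53 quoted above. The upper bound evaluates to just under 1.90, so the quoted value of 1.89 appears to be truncated rather than rounded.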

Since the analysis of items within each sub-scale involved multiple comparisons, the P-value was adjusted to account for the number of analyses per sub-scale. Since the OS sub-scale has six items a P-value was considered to be significant if it was less than 0.05/12 (6 items and a test for nonuniform and uniform DIF for each) or 0.004. For the other sub-scales the P-values were set as follows: FL – 0.003; EW – 0.003 and SW – 0.002.
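The adjusted significance thresholds follow directly from the number of items in each sub-scale. A minimal sketch of the arithmetic, assuming a nominal alpha of 0.05 and two tests per item as described above (the function and variable names are hypothetical):

```python
# Number of items in each CPQ11-14 sub-scale, as given in the text.
ITEMS_PER_SUBSCALE = {"OS": 6, "FL": 9, "EW": 9, "SW": 13}


def adjusted_threshold(n_items, alpha=0.05, tests_per_item=2):
    """Bonferroni-style threshold: alpha divided by the number of tests
    in the sub-scale (one nonuniform and one uniform DIF test per item)."""
    return alpha / (n_items * tests_per_item)


thresholds = {name: adjusted_threshold(n) for name, n in ITEMS_PER_SUBSCALE.items()}

for name, value in thresholds.items():
    print(f"{name}: {value:.4f}")
```

Rounded to three decimal places these values reproduce the thresholds stated above: 0.004 for OS, 0.003 for FL and EW, and 0.002 for SW.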

Detecting pseudo-DIF. Where a sub-scale contains more than one item with significant DIF, true DIF in one item can produce DIF in other items (4, 7). This is known as ‘pseudo-DIF’. This means that the sub-scale score might not be the best control variable to use in the analysis. An ideal control would be a valid measure of the underlying construct that is not derived from the items being tested for DIF. Such measures rarely exist (4). A compromise is to construct a ‘purified’ scale score from the items that are not affected by DIF and use this score as the control variable in the regression analysis (1). Consequently, all items showing DIF were reassessed in OLR analyses in which the control variables were sub-scale scores obtained by summing the responses to the items that did not manifest DIF in the initial analyses.
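The construction of a ‘purified’ control score can be sketched in a few lines. Everything in the example is hypothetical: the item labels (FL1 to FL9), the set of items assumed to show DIF, and the responses of the single child are invented for illustration and are not data from the study.

```python
def subscale_score(responses, items):
    """Ordinary sub-scale score: the sum of responses (0-4) over all items."""
    return sum(responses[item] for item in items)


def purified_score(responses, items, dif_items):
    """'Purified' score: the sum over the items that did not show DIF."""
    return sum(responses[item] for item in items if item not in dif_items)


# Hypothetical labels for a nine-item sub-scale.
fl_items = [f"FL{i}" for i in range(1, 10)]

# Hypothetical set of items flagged for DIF in an initial analysis.
fl_dif_items = {"FL1", "FL2", "FL9"}

# Hypothetical responses from one child.
child = {"FL1": 2, "FL2": 0, "FL3": 1, "FL4": 0, "FL5": 3,
         "FL6": 1, "FL7": 0, "FL8": 2, "FL9": 4}

full = subscale_score(child, fl_items)
purified = purified_score(child, fl_items, fl_dif_items)
print(full, purified)
```

In the reassessment step, the purified score would replace the ordinary sub-scale score as the control variable in the regression for each flagged item.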

Results

Characteristics of the participants

The age and gender characteristics of the study groups are given in Table 1. The groups were broadly comparable in terms of gender but there were some differences in the age distributions. However, the mean ages were identical, both being 12.7 years. Independent samples t-tests indicated that the Brazilian sample had a significantly higher overall CPQ11-14 score than the New Zealand sample and significantly higher scores for all four sub-scales (Table 2).

Nonuniform and uniform DIF

The interaction term for language and sub-scale score was significant, indicating nonuniform DIF, for five items: one from the FL sub-scale, three from the EW sub-scale and one from the SW sub-scale. These items were FL – ‘Taken longer than others to complete a meal’ (P < 0.001); EW – ‘Felt unsure of yourself’ (P < 0.001), ‘Felt shy or embarrassed’ (P = 0.001), and ‘Worried that you are not as good-looking as others’ (P = 0.002); SW – ‘Other children teased or called you names’ (P = 0.002). These items are italicized in Table 3.

Table 3 also presents the results of the uniform DIF analyses. Four of the 32 items tested met the criteria for moderate to large DIF: one from the OS sub-scale, two from the FL sub-scale and one from the SW sub-scale. The b coefficients (log odds ratios) and P-values for these items are shown in bold. Two of the b coefficients (log odds ratios) were positive and two negative.
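The two-part criterion for moderate to large uniform DIF can be applied mechanically to the coefficients in Table 3. The sketch below uses a subset of rows from that table and is illustrative only. Two encoding choices are assumptions: P-values reported as <0.001 are entered as 0.001, an upper bound, and the significance threshold for each sub-scale is taken as 0.05 divided by twice its number of items.

```python
ITEMS_PER_SUBSCALE = {"OS": 6, "FL": 9, "EW": 9, "SW": 13}
LOG_OR_CUTOFF = 0.64

# (sub-scale, item, b coefficient, P-value) for selected rows of Table 3.
# P-values reported as "<0.001" are entered as 0.001 (an upper bound).
ROWS = [
    ("OS", "Food stuck in or between your teeth", 0.452, 0.001),
    ("OS", "Food stuck in the top of your mouth", 0.855, 0.001),
    ("FL", "Breathed through your mouth", 0.881, 0.001),
    ("FL", "Difficult to say any words", -0.425, 0.025),
    ("FL", "Difficult to drink or eat hot or cold foods", -1.091, 0.001),
    ("EW", "Felt irritable or frustrated", 0.538, 0.004),
    ("SW", "Avoided smiling or laughing when around other children", -0.708, 0.001),
    ("SW", "Not wanted to spend time with other children", -0.637, 0.006),
]


def threshold(subscale, alpha=0.05):
    """Adjusted significance threshold: two tests per item in the sub-scale."""
    return alpha / (2 * ITEMS_PER_SUBSCALE[subscale])


def has_uniform_dif(subscale, b, p):
    """Both conditions must hold: significant and |b| above the cut-off."""
    return p < threshold(subscale) and abs(b) > LOG_OR_CUTOFF


flagged = [(s, item, b) for s, item, b, p in ROWS if has_uniform_dif(s, b, p)]

for s, item, b in flagged:
    print(f"{s}: {item} (b = {b:+.3f})")
```

Under these assumptions the rule flags four items, one from OS, two from FL and one from SW, with two positive and two negative coefficients, which matches the summary above. ‘Food stuck in or between your teeth’ is significant but falls below the magnitude cut-off, so it is not flagged.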

Pseudo-DIF

Since only one of the OS items manifested DIF, pseudo-DIF detection was not undertaken. For the FL sub-scale a ‘purified’ score was constructed by summing the responses to the six DIF-free items. The three items showing DIF were then reassessed using the DIF-free sub-scale score as the control. All still showed evidence of DIF, with only a modest reduction in the b coefficients (log odds ratios) for those manifesting uniform DIF. The same procedure was used with the emotional well-being sub-scale, where three items had nonuniform DIF. The P-values for these three items remained significant. In the analysis of items from the SW sub-scale, the item with uniform DIF remained significant but the interaction term for the item with nonuniform DIF failed to reach statistical significance.

Table 1. Gender and age distribution of subjects by language/country

| | English (New Zealand), n (%) | Portuguese (Brazil), n (%) |
|---|---|---|
| Gender | | |
| Male | 167 (51.9) | 63 (45.6) |
| Female | 155 (48.1) | 75 (54.4) |
| Age (years) | | |
| 12 | 86 (26.7) | 52 (37.7) |
| 13 | 236 (73.3) | 74 (53.6) |
| 14 | – | 12 (8.7) |
| Total | 322 (100.0) | 138 (100.0) |

Values are represented as n (%).

Table 2. Mean CPQ11-14 overall and sub-scale scores by language/country

| CPQ11-14 | English (New Zealand), mean (SD) | English (New Zealand), median | Portuguese (Brazil), mean (SD) | Portuguese (Brazil), median |
|---|---|---|---|---|
| Oral symptoms | 4.8 (2.8)a | 4.0 | 5.7 (4.0) | 5.0 |
| Functional limitations | 5.6 (4.3)b | 5.0 | 7.9 (5.8) | 7.0 |
| Emotional well-being | 3.5 (4.7)b | 2.0 | 7.5 (6.4) | 7.0 |
| Social well-being | 4.0 (5.4)b | 2.0 | 7.0 (6.8) | 5.0 |
| Overall | 17.8 (13.9)b | 14.0 | 28.1 (20.0) | 24.5 |

a Difference between mean scores for English and Portuguese language groups: P < 0.05 – Independent samples t-tests.
b Difference between mean scores for English and Portuguese language groups: P < 0.001 – Independent samples t-tests.

Comparison of DIF-free CPQ11-14 scores

In order to assess the practical, as opposed to the statistical, significance of the DIF observed, CPQ11-14 scores were calculated excluding those items with moderate to large uniform DIF. The mean scores of the language groups were then compared. The overall score and all four sub-scale scores remained significantly different, with the Brazilian sample having higher mean scores (Table 4). An effect size statistic for the overall score, calculated from the difference in means divided by the pooled standard deviation, changed only marginally, from 0.65 to 0.62.
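The effect size calculation can be sketched as below. The text does not state which pooled standard deviation formula was used, so the version weighted by degrees of freedom is an assumption, and the inputs are the rounded overall means and standard deviations from Table 2. For those reasons the result should be read as an approximation rather than a reproduction of the reported figures.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation weighted by degrees of freedom (assumed formula)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(mean1, sd1, n1, mean2, sd2, n2):
    """Difference in means (group 2 minus group 1) divided by the pooled SD."""
    return (mean2 - mean1) / pooled_sd(n1, sd1, n2, sd2)


# Overall CPQ11-14 scores from Table 2 (rounded values).
d_overall = effect_size(17.8, 13.9, 322, 28.1, 20.0, 138)
print(f"effect size for the overall score: {d_overall:.3f}")
```

With these rounded inputs the statistic comes out close to the 0.65 reported for the full scale; small discrepancies are to be expected from rounding and from the choice of pooling formula.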

Discussion

The primary purpose of this paper is to illustrate the use of DIF analysis in assessing the measurement equivalence of items in an oral health outcome questionnaire. Although we used DIF analysis to assess a translated version of the CPQ11-14, the same analytic approach can be used to assess whether or not its items function in the same way across sub-groups from the same population.

Using ordinal logistic regression analysis we found evidence of both uniform and nonuniform DIF in some items of the Brazilian-Portuguese version of the CPQ11-14. For nonuniform DIF, the P-value of the interaction term was used to detect its presence but there are no standards for assessing the magnitude of the DIF observed. Following Petersen et al. (4), we deemed moderate-to-large uniform DIF to be present if the language group variable was statistically significant after accounting for multiple comparisons and the absolute magnitude of the log odds ratio was greater than 0.64. Other investigators using this approach have used different criteria to identify items that manifest DIF, such as changes in −2 log likelihood values or changes in pseudo-R² values, but do not provide benchmarks for judging whether the DIF observed is small and unlikely to be of practical significance or large with potential practical implications (6).

Table 3. Results of uniform DIF analyses

| Item | b | P-value |
|---|---|---|
| Oral symptoms (OS) | | |
| Pain in your teeth, lips, jaws or mouth | −0.245 | 0.074 |
| Bleeding gums | −0.144 | 0.337 |
| Sores in your mouth | 0.221 | 0.173 |
| Bad breath | 0.166 | 0.227 |
| Food stuck in or between your teeth | 0.452 | <0.001 |
| Food stuck in the top of your mouth | **0.855** | **<0.001** |
| Functional limitations (FL) | | |
| Breathed through your mouth | **0.881** | **<0.001** |
| *Taken longer than others to eat a meal* | – | – |
| Had trouble sleeping | 0.136 | 0.457 |
| Difficult to bite or chew food like apples, corn on the cob or steak | −0.305 | 0.055 |
| Difficult to open your mouth wide | 0.188 | 0.414 |
| Difficult to say any words | −0.425 | 0.025 |
| Difficult to eat foods you would like to eat | −0.409 | 0.030 |
| Difficult to drink with a straw | −0.501 | 0.217 |
| Difficult to drink or eat hot or cold foods | **−1.091** | **<0.001** |
| Emotional well-being (EW) | | |
| Felt irritable or frustrated | 0.538 | 0.004 |
| *Felt unsure of yourself* | – | – |
| *Felt shy or embarrassed* | – | – |
| Been concerned what other people think about your teeth, lips, mouth or jaws | −0.305 | 0.067 |
| *Worried that you are not as good-looking as others* | – | – |
| Been upset | −0.430 | 0.016 |
| Felt nervous or afraid | 0.018 | 0.923 |
| Worried that you are not as healthy as others | 0.064 | 0.724 |
| Worried that you are different than other people | 0.070 | 0.742 |
| Social well-being (SW) | | |
| Missed school because of pain, appointments, or surgery | −0.179 | 0.358 |
| Had a hard time paying attention in school | 0.296 | 0.182 |
| Had difficulty doing your homework | −0.224 | 0.295 |
| Not wanted to speak or read out loud in class | −0.048 | 0.797 |
| Avoid taking part in activities like sports, clubs, drama, music, school trips | 0.340 | 0.191 |
| Not wanted to talk to other children | −0.521 | 0.026 |
| Avoided smiling or laughing when around other children | **−0.708** | **<0.001** |
| Had difficulty playing a musical instrument | −0.563 | 0.054 |
| Not wanted to spend time with other children | −0.637 | 0.006 |
| Argued with other children or your family | 0.221 | 0.196 |
| *Other children teased you or called you names* | – | – |
| Other children made you feel left out | 0.187 | 0.426 |
| Other children asked you questions about your teeth, lips, jaws or mouth | 0.118 | 0.526 |

Regression coefficients (log odds ratios) and P-values in bold indicate items with moderate to large uniform DIF. Items in italics manifest nonuniform DIF. Regression coefficients (log odds ratios) and P-values are not given for these items because analyses to detect uniform DIF were not undertaken for them.

Once significant DIF is detected its causes need to be identified. Petersen et al. (4) have reviewed a number of potential causes and how they might be investigated. DIF in an item can occur due to random variation, although the significance levels we used in the analyses reported here suggest that this is unlikely. It may be due to ‘pseudo-DIF’; that is, the lack of independence between an item and the scale score used as a control variable means that DIF in one item may be caused by DIF in other items. We investigated this possibility by using ‘purified’ scale scores derived from DIF-free items and re-assessed all items that had manifested DIF in the original analysis. Most of the items manifesting DIF continued to do so.

Differential item functioning can also be caused by confounding. Since DIF can also occur in relation to variables such as gender, age, ethnicity, socioeconomic status and clinical status, populations included in cross-cultural studies should be comparable with respect to these variables insofar as this is feasible (7). Alternatively, where DIF is detected the analyses can be repeated controlling for these potential confounders. Although the Brazilian and New Zealand populations included in the study were derived from school-based surveys and similar in gender and age they may well have differed according to other variables, such as socioeconomic status, that could have confounded the association between item responses and language group. The Brazilian children, for example, came from schools in a deprived area whereas the New Zealand children comprised a socioeconomically representative sample of the Taranaki population. These differences may explain why mean CPQ scores were higher in the Brazilian population even after removing items that showed moderate to large language-based uniform DIF. It is also possible that some of the differences were due to DIF related to socioeconomic status. This can only be examined if data on SES are collected, using measures that are broadly comparable.

If ‘pseudo-DIF’ and confounding can be eliminated as causes, then the DIF observed is probably due to problems with translation (called linguistic DIF), to cross-cultural biases, or to true cross-cultural differences that are independent of other sociodemographic factors.

It is generally recommended that items manifesting DIF should be excluded when comparing results from a translated and the original English language version of a questionnaire (7). Depending upon the number of items excluded, this may have some effect on the psychometric properties of the instrument and its measurement sensitivity. Alternatively, the translation could be reviewed to see if a more accurate rendition of the items can be achieved. Cross-cultural differences in language use, concepts and meanings may preclude such a solution. For example, Saub et al. (15) report that many of the items in the Oral Health Impact Profile (16) were difficult to translate into the Malay language. In such circumstances, a balance needs to be achieved between creating an instrument that functions in an identical manner to the English language original and one that works well in another language (7).

Research should be undertaken to assess the nature, extent and impact of DIF in all measures of OHRQoL. Since DIF has been identified in a broad range of patient-based outcome measures, including those that assess functional status, health-related quality of life, satisfaction with care and mental and cognitive functioning (17), it would be unreasonable to suppose that patient-based measures of oral health outcomes are DIF-free.

Table 4. Mean (SD) DIF-free CPQ11-14 overall and sub-scale scores by language/country

| CPQ11-14 | English (New Zealand) | Portuguese (Brazil) | P-valuea |
|---|---|---|---|
| Oral symptoms | 4.4 (2.6) | 5.4 (3.7) | <0.001 |
| Functional limitations | 1.9 (2.6) | 3.7 (3.8) | <0.001 |
| Emotional well-being | 2.2 (3.0) | 4.9 (4.4) | <0.001 |
| Social well-being | 3.7 (5.0) | 6.3 (6.3) | <0.001 |
| Overall | 12.2 (10.6) | 20.2 (15.6) | <0.001 |

a Independent samples t-tests.

Although the work of Petersen et al. (4) and Bjorner

et al. (7) suggests that DIF detection using methods

such as ordinal logistic regression and contingency

table analysis is relatively straightforward, there

are a number of neglected issues that need further

research into their impact on DIF detection rates

(8). These include model assumptions, model fit,

and the impact of differences in the distribution of

the latent variable in the groups being compared.

For example, most methods for detecting DIF

assume that the scale being assessed is unidimen-

sional. Since multidimensionality can give rise to

the appearance of DIF, assessing this unidimen-

sionality assumption is important. Exploratory and

confirmatory factor analyses have not been under-

taken with most oral health outcome measures so

that the dimensionality of these scales and their

component sub-scales is not known. Consequently,

further work regarding the construct validity of

these measures is needed as a prelude to wide-

spread adoption of DIF analysis.
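To illustrate the contingency table approach mentioned above, the sketch below computes a Mantel-Haenszel common odds ratio for a single dichotomised item, stratified by the matching score. This is one contingency-table method and is not the ordinal logistic regression used in this study. The group labels, strata and counts are hypothetical, and dichotomising the item is an assumption made to keep the example short.

```python
import math
from collections import defaultdict


def mantel_haenszel_or(records):
    """Common odds ratio of endorsing an item (reference vs focal group),
    pooled over strata of the matching score.

    records: iterable of (group, stratum, endorsed), where group is
    "ref" or "focal" and endorsed is a bool.
    """
    # Per stratum: [ref endorsed, ref not, focal endorsed, focal not].
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, endorsed in records:
        idx = (0 if group == "ref" else 2) + (0 if endorsed else 1)
        tables[stratum][idx] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined for these tables")
    return num / den


def expand(group, stratum, n_yes, n_no):
    """Build individual records from cell counts (illustrative helper)."""
    return [(group, stratum, True)] * n_yes + [(group, stratum, False)] * n_no


# Hypothetical counts: "ref" = English version, "focal" = Portuguese version,
# with respondents matched on a low or high sub-scale score.
records = (
    expand("ref", "low", 10, 30) + expand("focal", "low", 20, 20)
    + expand("ref", "high", 20, 20) + expand("focal", "high", 30, 10)
)

odds_ratio = mantel_haenszel_or(records)
log_or = math.log(odds_ratio)
print(f"MH odds ratio: {odds_ratio:.3f}, log odds ratio: {log_or:.3f}")
```

An odds ratio far from 1 at matched score levels is consistent with uniform DIF. Like most DIF methods, this one assumes the matching score is a valid, unidimensional measure of the underlying trait, which is the assumption questioned in the paragraph above.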

To conclude, the results of this study indicate that several items in the CPQ11-14 exhibited differential functioning when the Portuguese and English language versions were compared, and that this DIF remained after ‘purified’ sub-scale scores were used as control variables. Whether this DIF is linguistic in origin or due to confounding by other variables that produce DIF needs to be explored further.

Acknowledgements

When undertaking this research, J. Traebert was supported by a post-doctoral scholarship from CAPES, Ministry of Education, Brazilian Federal Government. The authors are grateful to the New Zealand Dental Association Research Foundation and the Taranaki District Health Board for funding the New Zealand data collection, and to the iwi of Taranaki for their support.

References

1. Teresi JA, Fleishman JA. Differential item functioning and health assessment. Qual Life Res 2007;16:33–42.

2. Herdman M, Fox-Rushby J, Badia X. ‘Equivalence’ and the translation and adaptation of health-related quality of life questionnaires. Qual Life Res 1997;6:237–47.

3. Castro RAL, Portela MC, Leao AT. Cross cultural adaptation of quality of life indices for oral health. Cad Saude Publica 2007;23:2275–84.

4. Petersen MA, Groenvold M, Bjorner JB, Aaronson N, Conroy T, Cull A et al. Use of differential item functioning to assess the equivalence of translations of a questionnaire. Qual Life Res 2003;12:373–85.

5. Hidalgo MD, Gomez J. Nonuniform DIF detection using discriminant logistic analysis and multinomial logistic regression: a comparison for polytomous items. Qual Quant 2006;40:805–23.

6. Crane PK, Gibbons LE, Ocepek-Welikson K, Cook K, Cella D, Narasimhalu K et al. A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Qual Life Res 2007;16:69–84.

7. Bjorner J, Kreiner S, Ware J, Damsgaard M, Bech P. Differential item functioning in the Danish translation of the SF-36. J Clin Epidemiol 1998;51:1189–202.

8. Teresi JA. Different approaches to differential item functioning in health applications: advantages, disadvantages and some neglected topics. Med Care 2006;44:S152–70.

9. Jokovic A, Locker D, Stephens A, Kenny D, Tompson B, Guyatt G. Validity and reliability of a questionnaire for measuring child oral-health-related quality of life. J Dent Res 2002;81:459–63.

10. Goursand D, Paiva SM, Zarzar PM, Ramos-Jorge ML, Cornacchia ML, Pordeus IA et al. Cross-cultural adaptation of the Child Perceptions Questionnaire 11-14 (CPQ11-14) for the Brazilian Portuguese language. Health Qual Life Outcomes 2008;6:2.

11. Foster Page LA, Thomson WM, Jokovic A, Locker D. Validation of the Child Perceptions Questionnaire (CPQ11-14). J Dent Res 2005;84:649–52.

12. Swaminathan H, Rogers HJ. Detecting differential item functioning using logistic regression procedures. J Educ Meas 1990;27:361–70.

13. French AW, Miller TR. Logistic regression and its use in detecting differential item functioning in polytomous items. J Educ Meas 1996;33:315–32.

14. Zieky M. Practical questions in the use of DIF statistics in test development. In: Holland PW, Wainer H, editors. Differential item functioning, Vol. 14. Hillsdale, NJ: Lawrence Erlbaum Associates, 1993; 93–106.

15. Saub R, Locker D, Allison P, Disman M. Cross-cultural adaptation of the oral health impact profile for the Malaysian population. Community Dent Health 2007;24:166–75.

16. Slade GW, Spencer AJ. Development and evaluation of the Oral Health Impact Profile. Community Dent Health 1994;11:3–11.

17. McHorney C, Fleishman J. Assessing and understanding measurement equivalence in health outcome measures: issues for further quantitative and qualitative enquiry. Med Care 2006;44(Suppl. 3):S205–10.
