
1

Class 9

Interpreting Pretest Data, Considerations in Modifying or Adapting Measures

November 13, 2008

Anita L. Stewart Institute for Health & Aging

University of California, San Francisco

2

Overview of Class 9

Analyzing pretest data

Modifying/adapting measures

Keeping track of your study measures

Creating and testing scales in your sample

3

Summarize Data on Pretest Interviews

Summarize problems and nature of problems for each item

Determine how important problems are

Results become basis for possible revisions/adaptations

4

Methods of Analysis

Optimal: transcripts of all pretest interviews

For each item, summarize all problems

Analyze dialogue (narrative) for clues to solve problems

5

Behavioral Coding

Systematic approach to identifying problems with items

– “Interviewer” and “respondent” problems

Can code problems based on:

– Standard administration

– Responses to specific probes

6

Examples of Interviewer “Behaviors” Indicating Problem Items

Question misread or altered

– Slight change: meaning not affected

– Major change: alters meaning

Question skipped

7

Examples of Respondent “Behaviors” Indicating Problem Items

Asks for clarification or repeat of question

Did not understand question

Doesn’t know the answer

Qualified answer (e.g., it depends)

Indicates answer falls between existing response choices

Refusal

8

Summarize Behavioral Coding For Each Item

Proportion of interviews (respondents) with each problematic behavior

Number of occurrences of the problem divided by N

– e.g., 7/48 respondents requested clarification

9

Behavioral Coding Summary Sheet: Standard Administration (N=20)

Item # | Interviewer: difficulty reading | Subject: asks to repeat Q | Subject: asks for clarification
1 | 2/20 | 0 | 1/20
2 | 0 | 0 | 0
3 | 1/20 | 3/20 | 2/20
4 | 0 | 1/20 | 0
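Once each interview has been coded, a summary sheet like the one above can be tallied mechanically. The following is a minimal sketch, assuming each interview's codes are stored as a set of behavior labels per item; the data layout, the function name `summary_sheet`, and labels such as `"repeat"` are hypothetical, not part of the course handouts.

```python
from collections import Counter

def summary_sheet(interviews, items, behaviors):
    """Tally coded problem behaviors into a "k/N" summary sheet.

    interviews: one dict per interview, mapping item number to the set of
    problem behaviors coded for that item. Using a set means each behavior
    is counted at most once per interview.
    """
    n = len(interviews)
    counts = Counter()
    for coded in interviews:
        for item, observed in coded.items():
            for behavior in observed:
                counts[(item, behavior)] += 1
    sheet = {}
    for item in items:
        # Zero counts are written as "0", as on the summary sheet
        sheet[item] = {
            b: f"{counts[(item, b)]}/{n}" if counts[(item, b)] else "0"
            for b in behaviors
        }
    return sheet

# Hypothetical codes from 4 standard administrations
interviews = [
    {1: {"difficulty_reading"}, 3: {"repeat", "clarify"}},
    {1: {"difficulty_reading", "clarify"}},
    {3: {"repeat"}, 4: {"repeat"}},
    {},
]
behaviors = ["difficulty_reading", "repeat", "clarify"]
for item, row in summary_sheet(interviews, [1, 2, 3, 4], behaviors).items():
    print(item, *row.values())
```

Keeping the raw codes per interview (rather than only the totals) makes it easy to re-tally later, for example separately by group.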

10

Can Identify Problems Even When No Problem “Behaviors” Found

Respondents appear to answer question appropriately

Additional problems identified with probes

– Probe on meaning: response indicates lack of understanding

– Probe on use of response options: response indicates options are problematic

11

Behavioral Coding of Probe Results

I asked you how often doctors asked you about your health beliefs. What does the term “health beliefs” mean to you?

Behavioral coding: number of times the response indicated the term was not understood as intended

– e.g., 2/15 respondents did not understand the meaning, based on their responses to the probe

12

Behavioral Coding Summary: Standard Administration (N=20) + Probes (N=10)

Item # | Probe (n) | Probe: meaning unclear | Interviewer: difficulty reading | Subject: asks to repeat Q | Subject: asks for clarification
1 | 10 | 2/10 | 2/20 | 0 | 1/20
2 | 0 | 0 | 0 | 0 | 0
3 | 10 | 4/15 | 1/20 | 3/20 | 2/20
4 | 10 | 0 | 0 | 1/20 | 0

13

Interpret Behavioral Coding Results

Determine if problems are common

– Items with only a few problems may be fine

Quantifying “common” problems

– Several types of problems (many row entries)

– Several subjects experienced a problem

» Problem with item identified in >15% of interviews
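The ">15% of interviews" rule can be applied to every item at once. A minimal sketch, assuming the numerator for each item is the number of interviews in which *any* problem was identified with that item (so one respondent with two problems counts once); the function name and counts are illustrative.

```python
def flag_common_problems(problem_counts, n_interviews, threshold=0.15):
    """Return {item: rate} for items whose problem rate exceeds threshold.

    problem_counts maps item -> number of interviews in which at least one
    problem (interviewer or respondent) was identified for that item.
    """
    flagged = {}
    for item, k in problem_counts.items():
        rate = k / n_interviews
        if rate > threshold:  # "more than 15%", so exactly 15% is not flagged
            flagged[item] = rate
    return flagged

# Hypothetical counts from 20 standard administrations
counts = {1: 2, 2: 0, 3: 5, 4: 4}
print(flag_common_problems(counts, 20))
```

Flagged items are candidates for the closer review described next, not automatic candidates for revision.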

14

Continue Analyzing Items with “Common” Problems

Identify “serious” common problems

– Gross misunderstanding of the question

– Yields completely erroneous answer

– Couldn’t answer the question at all

Some less serious problems can be addressed by improved instructions or a slight modification

15

Addressing More Serious Problems

Conduct content analysis of transcripts

– Use qualitative analysis software (e.g., NVivo)

For these items: review dialogue that ensued during administration of item and probes

– Can reveal source of problems

– Can help in deciding whether to keep, modify, or drop items

16

Results: Probing Meaning of a Phrase

I asked you how often doctors asked you about your health beliefs. What does the term ‘health beliefs’ mean to you?

“.. I don’t want medicine”

“.. How I feel, if I was exercising…”

“.. Like religion? --not believing in going to doctors?”

17

Results: Probing Meaning of a Phrase

What does the phrase “office staff” mean to you?

“the receptionist and the nurses”

“nurses and appointment people”

“the person who takes your blood pressure and the clerk in the front office”

18

Results: Probing Meaning of a Phrase

On about how many of the past 7 days did you eat foods that are high in fiber, like whole grains, raw fruits, and raw vegetables?

– Probe: What does the term “high fiber” mean to you?

Behavioral coding of item

– Over half of respondents exhibited a problem

Review answers to probe

– Over ¼ did not understand the term

Blixt S et al., Proceedings of section on survey research methods, American Statistical Association, 1993:1442.

19

Results: No Behavioral Coding Issues but Probe Detected Problems

I seem to get sick a little easier than other people (definitely true, mostly true, mostly false, definitely false)

Behavioral coding of item

– Very few problems

Review answers to probe

– Almost 3/4 had comprehension problems

– Most problems centered on the term “mostly” (respondents felt that either it’s true or it’s not)

Blixt S et al., Proceedings of section on survey research methods, American Statistical Association, 1993:1442.

20

Results: Beck Depression Inventory (BDI) and Literacy

Cognitive interviews: older adults, oncology patients, and less educated adults

Administered REALM (reading literacy test) and some selected BDI items

Asked to paraphrase items

TL Sentell, Community Mental Health Journal, 2008;39:323

21

Results: Beck Depression Inventory (BDI) and Literacy (cont)

Across items, 0% to 62% of respondents correctly paraphrased the item

Most misunderstandings: vocabulary confusion

Phrase: I am critical of myself for my weaknesses and mistakes

– “Critical is when you’re very sick”

– “I don’t know how to explain mistakes”

22

Interpreting Pretest Results of Self-Administered Questionnaires

Missing data is a clue to problematic items

– More missing data is associated with unclear, difficult, or irrelevant items

– Cognitive interviewing can help determine reasons for missing data

23

How Missing Data Prevalence Helps

Items with a large percentage of responses missing: clue to a problem

In H-CAHPS® pretest: “Did hospital staff talk with you about whether you would have the help you needed when you left the hospital?”

– 35% missing for Spanish group

– 29% missing for English group

MP Hurtado et al., Health Serv Res, 2005;40(6, Part II):2140-2161
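Comparing missing-data prevalence by group, as in the H-CAHPS® example, only needs a count per group. A minimal sketch, assuming `None` (or an absent key) marks a missing response and that every respondent listed was eligible to answer the item; the records and the item name `q_help` are hypothetical.

```python
from collections import defaultdict

def percent_missing_by_group(records, item, group_key):
    """Percent of respondents in each group with no answer to `item`."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec.get(item) is None:  # absent or explicitly None = missing
            missing[group] += 1
    return {g: 100.0 * missing[g] / totals[g] for g in totals}

# Hypothetical pretest records
records = [
    {"language": "Spanish", "q_help": None},
    {"language": "Spanish", "q_help": 1},
    {"language": "English", "q_help": 2},
    {"language": "English", "q_help": None},
    {"language": "English", "q_help": 1},
    {"language": "English", "q_help": 2},
]
print(percent_missing_by_group(records, "q_help", "language"))
```

If the item sits inside a skip pattern, respondents who were correctly skipped past it should be removed first, or they will inflate the missing rate.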

24

Exploring Differences by Diverse Groups

Back to issue of “equivalence” of meaning across groups

All cognitive interview analyses can be done separately by group

25

Results: Use of Response Scale

Do diverse groups use the response scale in similar ways?

Regarding questions about the cultural competence of providers

– Interviewers reported that Asian respondents who were completely satisfied did not like to use the highest score on the rating scale

California Pan-Ethnic Health Network (CPEHN) Report, 2001

26

Results: Use of Response Scale (cont)

Behavioral Risk Factor Surveillance System (BRFSS) pretesting

Found that Puerto Rican, Mexican American, and African American respondents were more likely than Whites to choose extreme response categories

RB Warnecke et al., Ann Epidemiol, 1997;7:334-342

27

Differential Use of CAHPS® 0-10 Global Rating Scale

Compared Medicaid and commercially insured adults on use of scale

Medicaid enrollees were more likely than commercially insured participants to use the extreme ends of the scale

– All other things being equal

PC Damiano et al., Health Serv Outcomes Res Method, 2004;5:193-205
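A simple first look at whether groups use a 0-10 scale differently is the share of answers at either end. A minimal sketch, assuming "extreme" means a rating of 0 or 10; that definition and the data are illustrative. Note that this is a raw comparison: it does not adjust for other differences between groups, as the cited study did ("all other things being equal").

```python
def extreme_response_rate(ratings, low=0, high=10):
    """Proportion of non-missing ratings at either end of the scale."""
    answered = [r for r in ratings if r is not None]
    if not answered:
        return None
    extreme = sum(1 for r in answered if r in (low, high))
    return extreme / len(answered)

# Hypothetical ratings; None = missing
by_group = {
    "Group A": [10, 10, 9, 0, 8, None],
    "Group B": [8, 9, 10, 7, 8, 9],
}
for group, ratings in by_group.items():
    print(group, round(extreme_response_rate(ratings), 2))
```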

28

Results: Probe on Difficulty: CES-D Item

“During the past week, how often have you felt that you could not shake off the blues, even with help from family and friends”

Probe: Do you feel this is a question that people would or would not have difficulty understanding?

– Latinos were more likely than other groups to report that people would have difficulty

TP Johnson, Health Survey Research Methods, 1996

29

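Overview of Class 9

Analyzing pretest data

Modifying/adapting measures

Keeping track of your study measures

Creating and testing scales in your sample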

30

Now What!

Issues in adapting measures based on pretest results

Cognitive interview pretesting during development phases of a measure

– Can modify items and continue pretesting

Cognitive interview pretesting prior to using a published survey

– More problematic

31

Modification: Probing the Meaning of a Phrase

What does the phrase “office staff” mean to you?

“the receptionist and the nurses”

“nurses and appointment people”

“the person who takes your blood pressure and the clerk in the front office”

We changed the question to “receptionist and appointment staff”

32

Results: Probing Meaning and Cultural Appropriateness

I asked you how often doctors asked you about your health beliefs. What does the term ‘health beliefs’ mean to you?

“.. I don’t want medicine”

“.. How I feel, if I was exercising…”

“.. Like religion? --not believing in going to doctors?”

We changed the question to “personal beliefs about your health”

33

Criteria for Whether or Not to Modify Measure

Contact the author

– May be open to modifications and to working with you

Be sure your opinion is based on extensive pretests with consistent problems

– Don’t rely on a few comments in a small pretest

Work with a measurement specialist to ensure that proposed modifications are likely to solve the problem

34

Tradeoffs of Using Adapted Measures

Advantages

– Improve internal validity

Disadvantages

– Lose external validity

– Know less about modified measure

– Need to defend new measure

35

Adding New (Modified) Items

One approach if you find serious problems with a standard measure

– Write new items you think will be better (use the same format)

– Retain the original items intact and add the modified items

Can then test both the original scale and a revised scale in which the modified items replace the original items

36

Modifying response categories

If response choices are too few and/or too coarse, you can improve them without compromising too much

– Try adding levels within the existing response scale

37

One Modification: Too Many Response Choices

SF-36 version 1:
1 - All of the time
2 - Most of the time
3 - A good bit of the time
4 - Some of the time
5 - A little of the time
6 - None of the time

SF-36 version 2:
1 - All of the time
2 - Most of the time
3 - Some of the time
4 - A little of the time
5 - None of the time

38

Modification of Health Perceptions Response Choices for Thai Translation

Usual responses:
1 - Definitely true
2 - Mostly true
3 - Don’t know
4 - Mostly false
5 - Definitely false

Modified responses:
1 - Not at all true
2 - A little true
3 - Somewhat true
4 - Mostly true
5 - Definitely true

Example items: “My health is excellent”; “I expect my health to get worse”

39

Modifying Item Stems

If item wording will not be clear to your population

– Can add parenthetical phrases

Have you ever been told by a doctor that you have diabetes (high blood sugar)?

40

Strategy for Modified Measures

Test measure in original and adapted form

Choose the measure that performs best

41

Analyzing New (Modified) Measure

Factor analysis

– All original items

– Original items plus new items replacing original items

Correlations with other variables

– Does the new measure detect stronger associations?

Outcome measure

– Does the new measure detect more change over time?

42

Analytic Strategy: CAHPS® 0-10 Global Rating Scale: Response

Usual classification: 0-9 vs. 10 (dichotomy)

Proposed classification: 0-8 vs. 9-10

PC Damiano et al., Health Serv Outcomes Res Method, 2004;5:193-205

Can’t change the scale itself: it is part of a standardized survey
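Because the scale itself stays fixed, the alternative lives entirely in how responses are grouped at analysis time. A minimal sketch of both cutpoints; the function name `dichotomize` is illustrative, and treating out-of-range values as missing is an assumption.

```python
def dichotomize(rating, top_box_min):
    """Return 1 if a 0-10 rating falls in the top category, else 0.

    top_box_min=10 gives the usual 0-9 vs. 10 split;
    top_box_min=9 gives the proposed 0-8 vs. 9-10 split.
    Missing (None) and out-of-range values stay missing.
    """
    if rating is None or not 0 <= rating <= 10:
        return None
    return 1 if rating >= top_box_min else 0

ratings = [10, 9, 8, 10, None, 7]
usual = [dichotomize(r, 10) for r in ratings]
proposed = [dichotomize(r, 9) for r in ratings]
print(usual)     # [1, 0, 0, 1, None, 0]
print(proposed)  # [1, 1, 0, 1, None, 0]
```

Running analyses under both classifications and reporting both is one way to show whether conclusions depend on the cutpoint.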

43

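Overview of Class 9

Analyzing pretest data

Modifying/adapting measures

Keeping track of your study measures

Creating and testing scales in your sample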

44

Questionnaire Guides

Organizing your survey measures

– Keep track of measurement decisions

Sample guide to measures (last week)

– Documents sources of measures

– Any modifications and the reason for each modification

45

“Sample Guide to Measures” Handout

Type of variable

Concept

Measure

Data source

Number of items/survey question numbers

Number of scores or scales for each measure

References

46

Sample “Summary of Survey Variables..” Handout

Develop a “codebook” of scoring rules

Serves several purposes:

– Variable list

– Meaning of scores (direction of high score)

– Special coding

– How missing data handled

– Type of variable (helps in analyses)

47

Item Naming Conventions

Optimal coding is to name raw items by their questionnaire number

– Can always link back to the questionnaire easily

Some people assign a descriptive variable name to each questionnaire item

– This will drive you crazy

48

Variable Naming Conventions

Assigning variable names is an important step

– Make them as meaningful as possible

– Plan them for all questionnaires at the beginning

For studies with more than one time point or source of data, suffixes can indicate the time point and the questionnaire

– B for baseline, 6 for 6-month, Y for one year

– M for medical history, L for lab tests

49

Variable Naming Conventions (cont)

Medical History Questionnaire

HYPERTMB: baseline

HYPERTM6: 6 months

50

Variable Naming Conventions (cont)

A prefix can help sort variable groupings alphabetically

– e.g., S for symptoms

SPAINB, SFATIGB, SSOBB
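When names are generated by a program, a small helper can enforce the prefix + stem + suffix convention consistently. A minimal sketch using the suffix codes from these slides; the helper `make_var_name` is hypothetical, and the default 8-character limit is an assumption chosen to match the examples (older SAS and SPSS versions limited names to 8 characters).

```python
# Suffix codes from the slides; the dictionaries themselves are illustrative
TIME_SUFFIX = {"baseline": "B", "6-month": "6", "1-year": "Y"}
SOURCE_SUFFIX = {None: "", "medical history": "M", "lab tests": "L"}

def make_var_name(stem, time, source=None, prefix="", max_len=8):
    """Build a name as prefix + stem + source suffix + time suffix."""
    name = (prefix + stem + SOURCE_SUFFIX[source] + TIME_SUFFIX[time]).upper()
    if len(name) > max_len:
        raise ValueError(f"{name} is longer than {max_len} characters")
    return name

print(make_var_name("hypert", "baseline", "medical history"))  # HYPERTMB
print(make_var_name("hypert", "6-month", "medical history"))   # HYPERTM6
print([make_var_name(s, "baseline", prefix="s") for s in ("pain", "fatig", "sob")])
```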

51

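Overview of Class 9

Analyzing pretest data

Modifying/adapting measures

Keeping track of your study measures

Creating and testing scales in your sample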

52

On to Your Field Test or Study

What to do once you have your baseline data

How to create summated scale scores

53

Preparing Surveys for Data Entry: 4 Steps

Review surveys for data quality

Reclaim missing and ambiguous data

Address ambiguities in the questionnaire prior to data entry

Code open-ended items

54

Review Surveys for Data Quality

Examine each survey in detail as soon as it is returned, and mark any:

– Missing data

– Inconsistent or ambiguous answers

– Skip patterns that were not followed

55

Reclaim Missing and Ambiguous Data

Go over problems with respondent

– If survey returned in person, review then

– If mailed, call respondent ASAP, go over missing and ambiguous answers

– If you cannot reach by telephone, make a copy for your files and mail back the survey with request to clarify missing data

56

Address Ambiguities in the Questionnaire Prior to Data Entry

When two choices are circled for one question, randomly choose one (flip a coin)

Clarify entries that might not be clear to data entry person

57

Code Open-Ended Items

Open-ended responses have no numeric code

– e.g., name of physician, reason for visiting physician

Goals of coding open-ended items

– Create meaningful categories from a variety of responses

– Minimize number of categories for better interpretability

– Assign a numeric score for data entry

58

Example of Open-Ended Responses

1. What things do you think are important for doctors at this clinic to do to give you high quality care?

Listen to your patients more often
Pay more attention to the patient
Not to wait so long
Be more caring toward the patient
Not to have so many people at one time
Spend more time with the patients
Be more understanding

59

Process of Coding Open-Ended Data

Develop classification scheme

– Review responses from 25 or more questionnaires

– Begin a classification scheme

– Assign unique numeric codes to each category

– Maintain a list of codes and the verbatim answers for each

– Add new codes as new responses are identified

If a response cannot be classified, assign a unique code and address it later

60

Example of Open-Ended Codes

Communication = 1
– Listen to your patients more often = 1
– Pay more attention to the patient = 1

Access to care = 2
– Not to wait so long = 2
– Not to have so many people at one time = 2

Allow more time = 3
– Spend more time with the patients = 3

Emotional support = 4
– Be more understanding = 4
– Be more caring toward the patient = 4
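Once the verbatim answers for each code have been listed, applying the scheme to matching answers can be partly automated. A minimal sketch, assuming exact matching after normalizing case and spacing; the holding code 99 for unclassified answers is a hypothetical choice. A lookup like this only applies an agreed list: new or reworded answers still need a coder's judgment, which is why unmatched answers are returned for review.

```python
CODES = {
    "listen to your patients more often": 1,      # Communication
    "pay more attention to the patient": 1,
    "not to wait so long": 2,                     # Access to care
    "not to have so many people at one time": 2,
    "spend more time with the patients": 3,       # Allow more time
    "be more understanding": 4,                   # Emotional support
    "be more caring toward the patient": 4,
}
UNCLASSIFIED = 99  # hypothetical holding code; address these later

def code_response(verbatim):
    """Look up a verbatim answer after normalizing case and whitespace."""
    key = " ".join(verbatim.lower().split())
    return CODES.get(key, UNCLASSIFIED)

def code_all(responses):
    """Return numeric codes plus the answers that still need review."""
    coded, to_review = [], []
    for response in responses:
        code = code_response(response)
        coded.append(code)
        if code == UNCLASSIFIED:
            to_review.append(response)
    return coded, to_review

print(code_all(["Not to wait so long", "Be more  understanding", "Better parking"]))
```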

61

Verify Assigned Codes

Have a second person independently classify each response using final codes

Investigator can review a small subset of questionnaires to ensure that coding assignment criteria are clear and are being followed

62

Reliability of Open-Ended Codes

Depends on quality of question, of codes assigned, and the training and supervision of coders

Initial coder and second coder should be concordant in over 90% of cases
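Concordance between the two coders is the share of responses given the same code. A minimal sketch with hypothetical codes; note that simple percent agreement does not correct for agreement expected by chance (Cohen's kappa does), which can matter when one category dominates.

```python
def percent_agreement(coder1, coder2):
    """Percent of responses assigned the same code by both coders."""
    if len(coder1) != len(coder2):
        raise ValueError("Both coders must code the same responses")
    matches = sum(a == b for a, b in zip(coder1, coder2))
    return 100.0 * matches / len(coder1)

# Hypothetical codes for 10 responses; the coders disagree on one
first = [1, 2, 2, 3, 4, 1, 1, 2, 4, 3]
second = [1, 2, 2, 3, 4, 1, 2, 2, 4, 3]
pct = percent_agreement(first, second)
print(pct, "meets >90% criterion" if pct > 90 else "below criterion")
```

Here 90% agreement falls just short of "over 90%", which would suggest revisiting the code definitions or coder training.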

63

Data Entry

Set up file

Double entry of about 10% of surveys

– SAS or SPSS will compare the two entries for accuracy

» Acceptable: 0-5% error

» If 6% or greater: consider re-entering data
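The comparison of the two entries is the same whatever package runs it. A minimal, package-neutral sketch, assuming the error rate is counted per field (the slide does not say whether the denominator is fields or surveys) and treating anything above 5% as the signal to consider re-entry; the survey IDs and values are hypothetical.

```python
def double_entry_error_rate(first_pass, second_pass):
    """Percent of fields that differ between two independent entries.

    Each pass maps survey ID -> list of entered values in questionnaire
    order. Only surveys entered in both passes are compared, and both
    entries of a survey are assumed to have the same number of fields.
    """
    compared = mismatched = 0
    for survey_id, values1 in first_pass.items():
        values2 = second_pass.get(survey_id)
        if values2 is None:
            continue
        for v1, v2 in zip(values1, values2):
            compared += 1
            mismatched += v1 != v2
    return 100.0 * mismatched / compared if compared else 0.0

first = {"001": [1, 2, 3, 4, 5], "002": [2, 2, 1, 5, 3]}
second = {"001": [1, 2, 3, 4, 5], "002": [2, 2, 1, 3, 3]}
rate = double_entry_error_rate(first, second)
print(f"{rate:.1f}% error;", "consider re-entering" if rate > 5 else "acceptable")
```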

64

Print Frequencies of Each Item and Review: Range Checks

Verify that responses for each item are within the acceptable range

– Out-of-range values can be checked on the original questionnaire

» Corrected or considered “missing”

– Sometimes out-of-range values mean that an item has been entered in the wrong column

» A check on data entry quality
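Range checks can be run directly from the codebook's valid ranges instead of scanning frequency printouts by eye. A minimal sketch; the ranges, item names, and records are hypothetical, and `None` is treated as missing rather than out of range.

```python
def range_check(records, valid_ranges):
    """List (id, item, value) for non-missing values outside the valid range."""
    problems = []
    for rec in records:
        for item, (low, high) in valid_ranges.items():
            value = rec.get(item)
            if value is not None and not low <= value <= high:
                problems.append((rec["id"], item, value))
    return problems

# Hypothetical codebook ranges and entered data
valid_ranges = {"q1": (1, 5), "q2": (1, 5), "q3": (0, 10)}
records = [
    {"id": 1, "q1": 3, "q2": 5, "q3": 7},
    {"id": 2, "q1": 6, "q2": None, "q3": 10},
    {"id": 3, "q1": 2, "q2": 4, "q3": 44},
]
for problem in range_check(records, valid_ranges):
    print("Check original questionnaire:", problem)
```

A value such as 44 on a 0-10 item is the kind of entry that often means two columns were run together during data entry.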

65

Print Frequencies of Each Item and Review: Consistency Checking

Determine that skip patterns were followed

Number of responses within a skip pattern needs to equal the number who appropriately answered the “skip in” question

66

Print Frequencies of Each Item and Review: Consistency Checking (N=90)

1. Did your doctor prescribe any medications? (75 = yes, 15 = no)

1a. If yes, did your doctor explain the side effects of the medication? (80 responses)

Often will find that more people answered the second question than were supposed to
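Frequencies show *that* counts don't match (80 answers to 1a vs. 75 "yes"); a record-level check shows *which* respondents to look up. A minimal sketch, assuming the filter question is coded 1 = yes, 2 = no; the coding, item names, and records are hypothetical.

```python
def skip_pattern_check(records, filter_item, follow_up, skip_in_value):
    """Find respondents whose answers break a skip pattern.

    Returns two lists of IDs: those who answered the follow-up without
    giving the "skip in" answer, and those who gave it but left the
    follow-up blank.
    """
    answered_not_eligible, eligible_not_answered = [], []
    for rec in records:
        eligible = rec.get(filter_item) == skip_in_value
        answered = rec.get(follow_up) is not None
        if answered and not eligible:
            answered_not_eligible.append(rec["id"])
        elif eligible and not answered:
            eligible_not_answered.append(rec["id"])
    return answered_not_eligible, eligible_not_answered

# Hypothetical coding: q1 (prescribed medication?) 1 = yes, 2 = no;
# q1a (side effects explained?) should be answered only when q1 == 1
records = [
    {"id": 101, "q1": 1, "q1a": 1},
    {"id": 102, "q1": 2, "q1a": 2},
    {"id": 103, "q1": 1, "q1a": None},
    {"id": 104, "q1": 2},
]
too_many, too_few = skip_pattern_check(records, "q1", "q1a", skip_in_value=1)
print("Answered q1a but said no to q1:", too_many)
print("Said yes to q1 but skipped q1a:", too_few)
```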

67

Print Frequencies of Each Item and Review: Consistency Checking (cont.)

Go back to the questionnaires of those with problems

– Check whether the initial “filter” item was incorrectly answered or whether the respondent inadvertently answered the subset

– Sometimes you won’t know which was correct

Hopefully this was caught during initial review of the questionnaire and corrected by asking the respondent

68

Deriving Scale Scores

Create scores with computer algorithms in SAS, SPSS, or other program

Review scores to detect programming errors

Revise computer algorithms as needed

Review final scores

69

Creating Likert Scale Scores

Translate codebook scoring rules into program code (SAS, SPSS):

– Reverse all items as specified

– Apply scoring rules

– Apply missing data rules

Sample for SAS (see handout)
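The SAS handout is not reproduced here, but the three steps can be sketched in a language-neutral way. This is a minimal sketch under stated assumptions: items are on a 1-5 scale, reversal uses (low + high) - value, and the missing-data rule (an assumption, not taken from the handout) scores the scale only when at least half the items are answered, averaging the answered items. Multiplying the average by the number of items gives the equivalent summated score.

```python
def reverse_score(value, low, high):
    """Reverse an item on a low..high scale, e.g. 1..5 maps 5 -> 1."""
    return low + high - value

def likert_score(responses, reverse_items, low=1, high=5, min_fraction=0.5):
    """Average of item scores after reversal, or None if too much is missing.

    responses: dict of item name -> value (None = missing).
    Assumed missing-data rule: score only if at least min_fraction of the
    items were answered, averaging the answered items.
    """
    answered = []
    for item, value in responses.items():
        if value is None:
            continue
        if item in reverse_items:
            value = reverse_score(value, low, high)
        answered.append(value)
    if not answered or len(answered) < min_fraction * len(responses):
        return None
    return sum(answered) / len(answered)

# Hypothetical 4-item scale; q2 is worded in the opposite direction
print(likert_score({"q1": 4, "q2": 2, "q3": None, "q4": 5}, {"q2"}))
print(likert_score({"q1": 4, "q2": None, "q3": None, "q4": None}, {"q2"}))
```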

70

Testing Scaling Properties and Reliability in Your Sample for Multi-Item Scales

Obtain item-scale correlations

– Part of internal consistency reliability program

Calculate reliability in your sample (regardless of known reliability in other studies)

– Internal consistency for multi-item scales

– Test-retest if you obtained it

71

SAS – Chapter 3: Assessing Reliability with Coefficient Alpha

Review statements and output

How to test your scales for internal consistency and appropriate item-scale correlations

72

SAS/SPSS Both Make Item Convergence Analysis Easy

Reliability programs provide:

– Item-scale correlations corrected for overlap

– Internal consistency reliability (coefficient alpha)

– Reliability with each item removed

» To see the effect of removing an item
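To see what these programs compute, here is a minimal standard-library sketch of coefficient alpha, item-scale correlations corrected for overlap (each item correlated with the total of the *other* items), and alpha with each item removed. Like the NOMISS option in SAS, it assumes complete data; the scale and responses are hypothetical. It is meant for understanding the output, not as a replacement for the SAS, SPSS, or Stata procedures.

```python
from math import sqrt
from statistics import fmean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cronbach_alpha(items):
    """Coefficient alpha; items is a list of columns (one list per item).

    Population variances are used throughout; the n - 1 factors would
    cancel in the ratio, so the result matches sample-variance formulas.
    """
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    sum_item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))

def item_analysis(items):
    """For each item: (item number, corrected item-scale r, alpha if removed)."""
    results = []
    for i, col in enumerate(items):
        rest = [c for j, c in enumerate(items) if j != i]
        rest_total = [sum(row) for row in zip(*rest)]
        # Correlating with the total *excluding* this item corrects for overlap
        r_corrected = pearson(col, rest_total)
        alpha_without = cronbach_alpha(rest) if len(rest) > 1 else None
        results.append((i + 1, r_corrected, alpha_without))
    return results

# Hypothetical 3-item scale answered by 5 respondents (no missing data)
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 4],
    [1, 3, 2, 5, 4],
]
print(f"alpha = {cronbach_alpha(items):.3f}")
for item, r, a in item_analysis(items):
    print(f"item {item}: corrected r = {r:.2f}, alpha if removed = {a:.3f}")
```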

73

SAS – Obtaining Item-Scale Correlations and Coefficient Alpha

PROC CORR DATA=data-set-name ALPHA NOMISS;
   VAR list-of-variables;
RUN;

Output:

– Coefficient alpha

– Item correlations

– Item-scale correlations corrected for overlap

SAS Manual, Chapter 3: Assessing Scale Reliability with Coefficient Alpha


75

Testing Reliability in STATA

Syntax: alpha varlist [if] [in] [, options]

SEE HANDOUT

Online help: http://www.stata.com/help.cgi?alpha

76

What if Reliability is Too Low?

Have to decide if you need to modify a scale

New scales under development

– Modify using item-scale criteria

Standard scales: cannot change

– Simply report problems as caveats in your analyses

If problem is substantial

– Can create a modified scale and report results using both the standard and the modified scale

77

Value of Pretesting: Experts Say..

…evidence from our work suggests that many survey questions are seriously underevaluated

Evaluating items at final pretest phase is often too late in the process

– Too late for extensive question redesign

A series of question evaluation steps is needed beginning well before the survey

FJ Fowler and CF Cannell. Using behavioral coding to identify problems with survey questions. In Answering Questions…, eds N Schwarz et al, Jossey-Bass, 1996

78

Homework for Class 10

Conduct 2 pretest interviews with individuals similar to your target population

– Administer all questions

– Administer your 4 probes

Summarize briefly your pretest results

Indicate whether the measure appears to be appropriate for the 2 pretest subjects

– No inferences to broader sample needed