Designing Surveys and Interpreting the Results: A LISA Short Course. Eric Vance, Director of LISA, Assistant Research Professor, Department of Statistics. July 16, 2012, Fralin Auditorium.

Page 1: Eric Vance Director of  LISA Assistant Research Professor Department of  Statistics

Eric Vance
Director of LISA
Assistant Research Professor
Department of Statistics

Designing Surveys and Interpreting the Results
A LISA Short Course

July 16, 2012
Fralin Auditorium

This workshop will consist of six sections:

1. Introductions
2. Survey Fundamentals
3. Questionnaire Design
4. Survey Implementation
5. Interpreting Results
6. Questions and Answers

Section 1: Introductions

[Diagram labels: Decisions, Data, Experiments, Understanding, Knowledge]

Laboratory for Interdisciplinary Statistical Analysis

www.lisa.stat.vt.edu

LISA helps VT researchers benefit from the use of Statistics

Experimental Design • Data Analysis • Interpreting Results • Grant Proposals • Software (R, SAS, JMP, SPSS...)

LISA’s mission is to train statisticians to become interdisciplinary collaborators and promote the value of statistical thinking in all phases of scientific research.

In 2011 there were 1046 total clients of LISA’s three services:
• 355 Collaborative Projects from 62 VT departments
• 304 visitors to Walk-in Consulting (M-F, 1-3PM in the GLC)
• 387 attendees at LISA Short Courses

www.lisa.stat.vt.edu/?=short_courses

Slides, plots, and R code are available on this website by clicking on the course title

Statistics in All Phases of Research

[Diagram labels: Decisions, Data, Experiments, Understanding, Knowledge]

• Design experiments to answer research questions
• Control sources of variation, detect outliers
• Visualize the data; analyze with statistical models
• Determine practical and statistical significance of results
• Make scientifically sound decisions and communicate them

Section 2: Survey Fundamentals

Statistical analyses should start with a research question

1. What are your questions? What do you want to know?

2. Do you need data to answer these questions? If so, what data do you need?

3. How will you collect these data?

There are many ways to collect data besides surveys

• Direct measurements or observations: instead of asking someone how much water they use, install a water meter.

• Use existing data sources: collect data from the water utility

• Explore complex issues with focus groups: interview a group of women about how they use water in their daily lives.

Scientific surveys measure the statistics behind the stories

• Personal interviews and focus groups are designed to collect anecdotes and to understand “why” something is happening

• Surveys can tell you “what” is happening to a large number of people under different situations

• Surveys can also help explain “why” something is happening

• Surveys can transform stories into numbers and allow us to make statistical comparisons: before vs. after; here vs. there; Group 1 vs. Group 2

Scientific surveys elicit unbiased answers from samples representative of the overall population

• In Section 3 we will discuss ways to elicit accurate responses

• The key concept of a survey is that one can draw conclusions about the overall population based on the results from a much smaller sample

• An exit poll of 1000 randomly sampled voters will be within about 3 percentage points of the final vote percentage 95% of the time

• The more precise you need your final result to be, the more people you must sample: an exit poll of 9600 voters will be within about 1 percentage point of the final vote percentage 95% of the time
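
The exit poll figures above can be checked with a short calculation. This is a minimal sketch, assuming a simple random sample, the normal approximation, and the worst case of a 50/50 split (p = 0.5); none of these assumptions are stated on the slide, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Approximate margin of error for a proportion from a simple random sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for_margin(margin, p=0.5, confidence=0.95):
    """Smallest n whose approximate margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


for n in (1000, 9600):
    print(n, round(margin_of_error(n), 4))
```

Under these assumptions, 1000 voters give a margin of roughly 0.031 and 9600 voters give roughly 0.010, which matches the slide: cutting the margin to a third takes about nine times as many people.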

5 steps involved in designing a survey

1. Clearly define your research objectives (What do you want to know?)

2. Define the population to be sampled (Who do you want to know about?)

3. Develop a sampling plan (Will my sample be representative of the population I care about?)

4. Design a questionnaire to minimize errors and biases (How does each question relate to your research objectives?)

5. Pilot test and retest your survey (Fix errors and start again at Step 1)

Step 1: Clearly define your research objectives

• State CLEARLY and CONCISELY your
  – Overall research goals
  – Specific scientific questions

• Refer to these objectives constantly throughout the design of your survey to ensure your survey is answering the desired questions of interest

Step 2: Define the population to be sampled

• Subject: Any material we measure (plant, person, piano, etc.)

• Population: representation of all the possible outcomes or measurements of interest

• Sample: subset of the population to be measured (i.e., a group of subjects that represent the population)

Who will you interview to answer your research questions? The population is the overall group of interest or the target group.

Step 3: Develop a sampling plan

• Once the target population has been identified, the sampling plan must be devised

• Goal: Randomly select a small percentage of the population that will in turn represent the ideas of the population as a whole.

• The sampling plan involves:
A. The technique used to select the subjects for your study
  • Simple random sampling (draw names out of a hat or from a list)
  • Stratified random sampling (sample one person per team/table)
  • Cluster sampling (randomly pick a team/table and sample everyone from that table)
  • Systematic sample (sample the first person from each team/table)
B. The number of people needed for your study
  • Sample size calculations (sample enough people for precise results given the budget)

Simple random sampling

• Subjects chosen by random mechanism
• Each subject has an equal chance of being in the study
• Easiest to summarize BUT most tedious to implement in the field

Example: Randomly select 10 students from the Stat 3005 class roster to ask a question.
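
A simple random sample like the one in this example could be drawn as follows. This is only a sketch: the roster is a made-up stand-in for the real class list, and the fixed seed is there solely to make the draw reproducible.

```python
import random

# Hypothetical roster; the real class list is not part of the slides.
roster = [f"student_{i:02d}" for i in range(1, 41)]

rng = random.Random(2012)          # fixed seed only so the draw can be repeated
sample = rng.sample(roster, k=10)  # without replacement; every name has an equal chance
print(sample)
```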

Stratified random sampling

• First divide population into strata (groups) based on similarity
• Then randomly select subjects within each stratum
  o Easier to implement
  o May result in more precise summary

Example: Randomly select 5 male students and randomly select 5 female students from the STAT 5615 class roster to ask a question.
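
The example above can be sketched in code by grouping the roster into strata first and then sampling within each group. The roster, the "M"/"F" labels, and the helper name `stratified_sample` are all hypothetical.

```python
import random

# Hypothetical roster of (name, sex) pairs.
roster = [(f"student_{i:02d}", "M" if i % 2 else "F") for i in range(1, 41)]


def stratified_sample(units, stratum_of, per_stratum, rng):
    """Group units into strata, then draw a simple random sample inside each one."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {label: rng.sample(members, per_stratum) for label, members in strata.items()}


rng = random.Random(5615)  # fixed seed only for reproducibility
picked = stratified_sample(roster, lambda unit: unit[1], per_stratum=5, rng=rng)
print(picked)
```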

Cluster sampling

• Population has many clusters
• First randomly select a number of clusters
• Then sample all the units within each cluster
• Requires clusters to be representative of the population

Example:
Population: opinions of all students (attending class) at Virginia Tech
1. Randomly select a certain number of classes
2. Ask all students in each class their opinion

Note: Cluster sampling is often NOT as efficient as stratified sampling for surveying.
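
The two stages of the cluster example can be sketched as follows. The class names and student names are invented for illustration.

```python
import random

# Hypothetical clusters (classes) and their members (students).
classes = {
    "Class A": ["Ana", "Ben", "Cho"],
    "Class B": ["Dev", "Eli"],
    "Class C": ["Fay", "Gus", "Hal", "Ivy"],
    "Class D": ["Jon", "Kim"],
}

rng = random.Random(64)  # fixed seed only for reproducibility

# Stage 1: randomly select whole clusters.
chosen_classes = rng.sample(sorted(classes), k=2)

# Stage 2: survey every unit inside the chosen clusters.
respondents = [student for name in chosen_classes for student in classes[name]]
print(chosen_classes, respondents)
```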

Systematic sampling

• Select every kth subject from a list of all possible subjects

Example:
Telemarketers call every 10th phone number in the Yellow Book to make marketing calls.
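
Selecting every kth subject is a one-line slice once the list exists. The sketch below adds a random starting point within the first k entries, which is a common practice but an assumption here, since the slide does not say where to start; the phone list is made up.

```python
import random


def systematic_sample(frame, k, rng):
    """Pick a random start among the first k entries, then take every kth entry."""
    start = rng.randrange(k)
    return frame[start::k]


phone_list = [f"555-{i:04d}" for i in range(100)]  # hypothetical listing
calls = systematic_sample(phone_list, k=10, rng=random.Random(10))
print(calls)
```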

Sample size calculations

• How many people do we interview? Answer: It depends

• Sample size calculations can be computed using statistical methods (Ask a LISA statistical collaborator for help!)

• Sample size calculations also involve characteristics of the study: time, money, precision required

• For many Gallup polls, the population of interest is all adult Americans. To represent this population, the sample usually consists of around 1,000 adults. When sample sizes approach 500 or more the gains in accuracy get smaller and smaller for the increase in sample size.

Sample size calculation for a proportion

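Let
n = sample size
σ = standard deviation
d = confidence interval size (width)
α = significance level

Then, to get a (1-α)*100% confidence interval, we need a sample size of:

$$ n = \left( \frac{2\, z_{\alpha/2}\, \sigma}{d} \right)^{2} $$

where $z_{\alpha/2}$ is the upper α/2 critical value of the standard normal distribution (about 1.96 for α = 0.05).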

Sample size calculation for a proportion

For example, suppose we want a 95% confidence interval of width 0.2 (meaning we have a 0.1 margin of error). If we know from a pilot study that the standard deviation of the population is 1, then

σ = 1
d = 0.2
α = 0.05

And plugging these numbers into the previous equation, we get,

n = 384.15

Which means we need to sample 385 people.
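
The worked example can be reproduced in a few lines. This sketch assumes the usual normal-theory formula n = (2 · z · σ / d)², with d the full width of the interval and z the upper α/2 critical value of the standard normal; that formula gives the 384.15 quoted above. The function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_for_width(sigma, width, alpha=0.05):
    """Sample size for a (1 - alpha) confidence interval of the given full width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return (2 * z * sigma / width) ** 2


n = sample_size_for_width(sigma=1, width=0.2, alpha=0.05)
print(round(n, 2), math.ceil(n))
```

The result is always rounded up, because rounding down would give an interval slightly wider than requested.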

Section 3: Questionnaire Design

Step 4: Minimize biases and errors when designing the questionnaire and sampling plan

Three major types of biases and errors:

Selection bias or coverage error: Your sample is not representative of your population

Nonresponse bias: Those who respond to your survey are different in important ways from those who choose not to respond

Measurement error: Survey responses are inaccurate

Selection bias or coverage error

• Definition:
  – Not all members of a population have a known, nonzero chance of being selected for survey
• Problem:
  – Survey may turn out to be biased
• Possible Solutions:
  – Identify target population (might require some expertise in the subject of the survey)
  – Construct a sampling frame - a list of all possible respondents
  – Avoid duplicates, respondents outside the target population, and excluding a portion of the target population
  – Randomize

Nonresponse bias

• Definition:
  – Survey error that happens when respondents are different from nonrespondents in a significant way
• Problems:
  – Filters out certain types of respondents
  – The reason for which a person responds (or, conversely, does not respond) to a survey is related to the subject of the survey
• Possible Solutions:
  – Provide incentives for completing survey
  – Explain why the survey is important
  – Keep the survey short and sweet
  – Give more weight to answers from hard-to-reach respondents (and ask a statistician for advice)
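
One simple way to give more weight to hard-to-reach respondents is to reweight each group to its known share of the population. The sketch below uses entirely hypothetical numbers and group labels, and it is only one of several weighting approaches, so the advice to consult a statistician still applies.

```python
# Hypothetical inputs: population shares, respondent counts, and mean answers per group.
population_share = {"easy_to_reach": 0.60, "hard_to_reach": 0.40}
respondents = {"easy_to_reach": 80, "hard_to_reach": 20}
group_mean = {"easy_to_reach": 3.0, "hard_to_reach": 5.0}

total = sum(respondents.values())

# Weight = (share of the population) / (share of the respondents).
weights = {g: population_share[g] / (respondents[g] / total) for g in respondents}

unweighted = sum(group_mean[g] * respondents[g] for g in respondents) / total
weighted = sum(group_mean[g] * respondents[g] * weights[g] for g in respondents) / sum(
    respondents[g] * weights[g] for g in respondents
)
print(round(unweighted, 2), round(weighted, 2))
```

With these made-up numbers the unweighted mean is 3.4 and the weighted mean is 3.8, because the under-represented group counts for its full 40% share after weighting.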

Nonresponse bias

In a national sample of board-certified physicians, a short survey was mailed asking physicians to nominate the five best hospitals in their specialty regardless of cost or location. Up to three follow-ups were mailed to nonresponders to gain participation. The final response rate was 47.3%.

Males were significantly more likely to respond than females, which would not be an issue if men and women answered in the same way…

But, men were significantly more likely to nominate one or two top hospitals in their specialty. In addition, women were significantly more likely to nominate hospitals only in their region.

Measurement error

• Definition:
  – Inaccurate answers to survey questions (sometimes due to lack of clarity in writing)
• Problems:
  – Makes it difficult to judge if answers are accurate
  – May lead to incorrect conclusions about target population
• Possible Solutions:
  – Write clear, concise questions
  – Be aware of leading questions
  – Be aware of social factors that may influence responses
  – Explain why the survey is important

Measurement error

In a study about measurement error in earnings data, respondents were asked to report their annual wages. The reported wages were then compared to earnings statements on detailed W-2 records.

Not surprisingly, the study found that respondents tended to over-report their wages when compared to their W-2 records. Also, the discrepancy between reported and official wages decreased as official wage increased.

The best way to write good survey questions is to pilot test and re-write

• An article appearing in the International Journal of Market Research gives great advice about questionnaire design. This YouTube video summarizes 10 things to look out for. http://youtu.be/53mASVzGRF4

Keep the questionnaire as short as possible

• Creative Research Systems has the following useful suggestions: (http://www.surveysystem.com/sdesign.htm)

• Follow the “KISS” method, meaning “Keep it short and simple!”
• Categorize questions into 3 groups:
  – Must Know
  – Useful to Know
  – Nice to Know

• If the questionnaire seems too long, start omitting the “nice to know” questions

• Don’t get caught in the trap where you find that you have a captive audience, so you begin asking questions that are not pertinent

Think about the order of questions

• Group related questions together
• Choose first question carefully. The first question should:
  – Apply to everyone
  – Be easy to read
  – Be interesting
• Place sensitive questions near the end
  – Give respondents a chance to become comfortable with questionnaire
• Ask about sequential events in the order that they occurred
• Avoid unintended question order effects

Avoid potential question order effects

• Priming
  – Early questions refresh respondents’ memory for subsequent questions
• Carryover
  – Respondents believe questions are similar and answer them with same criteria
• Consistency
  – Respondents answer questions similarly to try to appear consistent
• Norm of evenhandedness
  – Respondents answer questions similarly to try to be fair
• Anchoring
  – Early questions set a standard for comparison to later questions
• Subtraction
  – Considerations in answers to early questions are left out of subsequent judgments
• Avoiding extremeness
  – Respondents try to seem neutral by choosing some items while rejecting others

Save demographic questions for the end of the survey

• The following demographic questions should be saved for the end of the questionnaire: age, education, income, marital status, etc.

• Ensures that respondents will not feel that they are losing their anonymity when answering the rest of the questions

• Choose the most important questions for your survey to be asked at the beginning of the survey

Open versus closed questions

• Open questions allow the respondent to freely answer the question.
  Advantage: Imposes fewer restrictions and allows for more depth in the overall answer

• Closed questions force the respondent to answer the question by choosing from predetermined choices.
  Advantage: Ease in analysis

• One suggestion is to test the survey on a small group with an open question. From those responses form a closed question that encompasses the categories expressed in the responses to the open questions.

• Allow for an “other” option in closed questions, to permit respondents to write their own responses

Avoid double-barreled questions

• Refrain from having two concepts embedded in one question

Example: “Do you have time to read the newspaper every day?”

• Notice you are asking about “time” and “reading the newspaper every day”.

Revision: “Do you read the newspaper every day?”

• If the answer is no, you can create a question to determine the reasons the person does not read the newspaper.

Convert opinions and words into numbers using the Likert scale

• A popular technique in survey design is the use of scaling questions.
  – Respondents are able to select a number or category that represents their answer to the survey question.

• Likert scaling is a common technique used in questionnaires.
  – A Likert item is a question or statement on a questionnaire where the respondent gives a rating for their response on a topic.
  – The rating is usually the level of agreement the respondent has concerning the statement or question.
  – A Likert item is balanced, meaning there is an equal number of positive and negative positions.

http://en.wikipedia.org/wiki/File:Example_Likert_Scale.jpg

Convert opinions and words into numbers using the Likert scale

• The 5-point and 7-point scale responses are the most common
• Make sure the visual middle option actually corresponds to the middle value:

Example: Disagree, Neither agree nor disagree, Slightly Agree (visual middle), Agree, Strongly Agree

Revision: Disagree, Slightly Disagree, Neither agree nor disagree (visual middle), Slightly Agree, Agree

• Likert items can be analyzed separately or the items may be summed and the sum can be analyzed. The sum of Likert items is called the Likert Scale.
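
Turning Likert responses into numbers and summing them can be sketched as follows. The 1 to 5 coding and the three answers are assumptions made for illustration; the slides do not prescribe a particular coding.

```python
# Assumed coding for a balanced 5-point item (labels follow the revised scale).
CODES = {
    "Disagree": 1,
    "Slightly Disagree": 2,
    "Neither agree nor disagree": 3,
    "Slightly Agree": 4,
    "Agree": 5,
}

# Hypothetical answers from one respondent to three Likert items.
answers = ["Agree", "Neither agree nor disagree", "Slightly Agree"]

item_scores = [CODES[a] for a in answers]  # each item can be analyzed on its own
likert_scale_score = sum(item_scores)      # or summed into a single scale score
print(item_scores, likert_scale_score)
```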

Step 5: Pilot test (and retest) your survey

• You should pretest the survey on a smaller sample whenever possible!

• This pilot test can
  – Allow you to revise the questionnaire if needed
  – Allow you to create a closed question from the responses for an open question
  – Help you estimate the variability in the responses to your questions and determine the necessary sample size

Section 4: Survey Implementation

Decide how to collect the survey data

• After we know what data we want to collect, who we want to survey, and how we ask the questions, we must determine the best instrument for collecting the survey data

• Data collection options:
  – Personal Interviews (either paper or PDA/Smartphone)
  – Telephone Interviews
  – Mail Surveys
  – Email Surveys

• For more discussion of data collection options see http://www.surveysystem.com/sdesign.htm.

Personal interviews

• A face-to-face encounter between the interviewer and the subject

• Advantages:
  – People usually respond when confronted face-to-face
  – Can get a better sense of the reaction of the subject
  – Prevent misunderstandings

• Disadvantages:
  – More costly
  – Interviewers who are not trained properly may introduce bias into the sample

Telephone interviews

• Most popular instrument for surveys in the United States since 96% of homes have telephones
• Personal interviews and telephone interviews are usually the most successful forms of surveying with response rates around 60 to 75%

• Advantages:
  – Less expensive than personal interviewing
  – Random phone numbers can be dialed
  – Fast results

• Disadvantages:
  – People are reluctant to answer phone interviews
  – Phone calls can usually only be made from around 6pm-9pm
  – Phone surveys normally need to be shorter in length than personal interviews

Mail surveys

• Advantages:
  – Cheap
  – Questionnaire can include pictures
  – People are able to answer on their own time

• Disadvantages:
  – Time-consuming process
  – Response rates have a tendency to be low

Email surveys

• Advantages:
  – Cheap
  – Fast
  – You can attach pictures or sound files

• Disadvantages:
  – People may respond multiple times
  – People who have email may not be representative of the population as a whole

In 2011 in Mozambique we interviewed 1600 households using PDAs

• An on-the-ground statistician ensured the data were of high quality by downloading the data every night, checking them for errors, giving feedback to the surveyors, and correcting any errors found

Section 5: Interpreting the Results

Data must be processed, analyzed, and reported

• Download or input your data onto a computer

• Clean your data—start with the most important variables

1. Ensure all data are in the correct format
2. Decide what to do with missing data
3. Detect outliers and coding errors by visual or graphical inspection

• Process textual data by reading, classifying, and counting

Example of processing data in Excel

• Pre-workshop survey data downloaded to Excel

• Clean your data

1. Ensure all data are in the correct format
   Fix numerical data so they are all numerical

2. Decide what to do with missing data
   Ignore some missing values
   Decide which missing values should be 0
   Decide what to do with “I Don’t Know” responses

3. Detect outliers and coding errors by visual or graphical inspection

• Process text by classifying it and creating a Pareto chart
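
The workshop demonstrates these cleaning steps in Excel; the same three steps can be sketched in Python for comparison. The raw values, the choice to treat unparseable entries as missing, and the 48 to 84 inch plausibility range are all assumptions made for illustration.

```python
# Hypothetical raw survey column: heights in inches as typed by respondents.
raw_heights = ["64", "70.5", " 68 ", "", "I Don't Know", "5'4\"", "690"]


def to_number(text):
    """Step 1: return a float when the entry is numeric, otherwise None."""
    try:
        return float(text.strip())
    except ValueError:
        return None


cleaned = [to_number(value) for value in raw_heights]

# Step 2: count missing entries (blank, "I Don't Know", or unparseable).
missing = sum(value is None for value in cleaned)

# Step 3: flag values outside an assumed plausible range as possible coding errors.
plausible = [v for v in cleaned if v is not None and 48 <= v <= 84]
suspect = [v for v in cleaned if v is not None and not 48 <= v <= 84]
print(plausible, suspect, missing)
```

Flagged values such as 690 should be checked against the original response rather than deleted automatically; it may simply be 69.0 with a misplaced decimal.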

The best statistical analysis is often just a plot or graph of your data

• Summarize your data one variable (height) at a time
  Histograms show the distribution of the data points

The best statistical analysis is often just a plot or graph of your data

• Summarize your data one variable (height) at a time
  Box plots summarize the distribution of the data

[Box plot labels: 75th percentile, Median, 25th percentile, Outlier?]
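
The quantities labeled on the box plot can be computed directly. In this sketch the heights are made up, and the rule that flags points more than 1.5 interquartile ranges beyond the quartiles is a common plotting convention assumed here, not something stated on the slide.

```python
from statistics import quantiles

# Hypothetical heights in inches.
heights = [61, 62, 63, 64, 64, 65, 65, 66, 67, 68, 69, 70, 71, 72, 84]

# 25th percentile, median, and 75th percentile.
q1, q2, q3 = quantiles(heights, n=4, method="inclusive")

iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [h for h in heights if h < low_fence or h > high_fence]
print(q1, q2, q3, outliers)
```

A flagged point is a prompt to look at the record, which is why the slide labels it "Outlier?" with a question mark.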

Summarize two variables based on quant/quant, quant/qual, qual/qual

• The relationship between two quantitative variables can be visualized in a scatter plot and quantified by correlation or regression

• The relationship between a quantitative and qualitative variable can be shown in a side-by-side box plot and summarized with a t-test

• The relationship between two qualitative variables can be shown in a table or a mosaic plot and summarized by Fisher’s Exact Test or a Chi-squared test
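
For the quantitative/quantitative case, correlation and a simple regression line can be computed by hand. This is a minimal sketch on made-up data; the variable names are illustrative.

```python
import math

# Hypothetical paired measurements for two quantitative variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squares and cross-products around the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)        # Pearson correlation
slope = sxy / sxx                     # least-squares regression slope
intercept = mean_y - slope * mean_x   # least-squares regression intercept
print(round(r, 4), slope, intercept)
```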

Plot two quantitative variables on a scatter plot

• The relationship between two quantitative variables can be visualized in a scatter plot and quantified by correlation or regression

[Scatter plot legend: Males, Females]

Use side-by-side box plots to show the difference between two groups

• The relationship between a quantitative and qualitative variable can be shown in a side-by-side box plot and summarized with a t-test

Summarize qualitative responses by classifying and using a Pareto chart

• Process textual data by reading, classifying, and counting
• A Pareto chart orders categories from highest to lowest frequency
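
Once open-ended answers have been read and classified, the counts behind a Pareto chart are easy to produce. The categories below are hypothetical, and the cumulative percentage column is a common addition to Pareto charts rather than something the slide requires.

```python
from collections import Counter

# Hypothetical categories assigned by a reader to open-ended answers.
coded = ["cost", "time", "cost", "distance", "cost", "time", "other"]

counts = Counter(coded).most_common()  # ordered from highest to lowest frequency
total = sum(count for _, count in counts)

pareto = []
running = 0
for category, count in counts:
    running += count
    pareto.append((category, count, round(100 * running / total, 1)))

for row in pareto:
    print(row)
```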

Guideline for reporting statistical results

• Focus on the statistical estimates of quantities you care about and how they compare to other quantities rather than describing the results of a statistical test

Example:
The average height of females (n=15) in our sample was 64.9 in. This was 4.5 inches shorter than the average male (69.4 in, n=5).

A 95% Confidence Interval for this difference is (-9.4 to 0.7).

A t-test showed that this difference was not statistically significant (p=0.08).

Section 6: Questions and Answers

• What is the question you most want answered today?

• Can you convince your team that your question should be the one Dr. Vance will answer?

References

• Dillman, Don A., Jolene D. Smyth, and Leah Melani Christian. Internet, Mail, and Mixed-Mode Surveys: The Tailored Design Method. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc, 2009.
• Lietz, P. (2010) Research into Questionnaire Design. International Journal of Market Research, 52, 2, pp. 249-272.
• Scheaffer, Richard L., William Mendenhall III, and R. Lyman Ott. Elementary Survey Sampling. 6th ed. Belmont, CA: Duxbury, 2006.
• http://en.wikipedia.org/wiki/Likert_scale
• http://www.surveysystem.com/sdesign.htm
• http://www.csudh.edu/dearhabermas/sampling01.htm
• http://www.youtube.com/watch?v=53mASVzGRF4
• Eric Vance [email protected]
• LISA, http://www.lisa.stat.vt.edu

Mozambique Survey

• Asked people living in villages to rate how “painful” a task it was to fetch water on a 6-point Likert scale (ranging from 1- not painful at all, to 6- extremely painful)
  – Question was given to households in villages both with and without a water pump
  – Some households, especially those without water pumps, must travel hours per day to fetch water
• How can we best depict the resulting data?
  – Histograms
  – Box-and-whisker plots

Water Fetch Pain