Chapter 8 Sampling 1 Chapter 8 Producing Data: Sampling


Page 1: Chapter 8 Sampling1 Chapter 8 Producing Data: Sampling

Chapter 8 Sampling 1

Chapter 8

Producing Data: Sampling


Objectives (BPS chapter 8)

Producing Data: Sampling

Observation versus experiment

Population versus sample

Sampling methods

How to sample badly

Simple random samples

Other sampling designs

Caution about sample surveys

Learning about populations from samples (inference)


Experiments vs. Observational Studies

Experiment

– The experimenter determines which units receive which treatments (ideally using some form of random allocation).

– Deliberately imposes some treatment on individuals in order to observe their responses.

– Studies whether the treatment causes change in the response.

Observational study

– Compares units that happen to have received each of the treatments.

– Often useful for identifying possible causes of effects, but cannot reliably establish causation.

Only properly designed and executed experiments can reliably demonstrate causation.


Population

The complete collection of all subjects or objects (scores, people, measurements, and so on) that are being studied.

The collection is complete in the sense that it includes all subjects to be studied.


Census: The collection of data from every individual in a population.

Sample: A subset of elements drawn from a population from which we collect data.

The sample should be representative of the entire population.

A sampling design describes exactly how to choose a sample from the population.


[Figure: a population, made up of individuals.]


Sampling Frame

List of individuals that could possibly be selected for the sample (not necessarily the same as the population).


[Figure: Census. A list of individuals numbered 1 to 17, all of whom are included.]


Sampling Frame

[Figure: Sample. A list of individuals numbered 1 to 17, from which a sample is drawn.]


Example

Suppose we are interested in the average age of all Malaspina students.

The relevant population is all Malaspina students (including students at all campuses).

Possible Sampling Frame: List of Malaspina students at the Nanaimo campus.


Example Cont.

A sample could be the students in this Math 161 class, or 50 randomly selected Malaspina students at the Nanaimo campus.

If we use the ages of all Malaspina students, then we have a census.


Thought Question

Popular magazines often contain surveys that ask their readers to answer questions about hot topics in the news. Do you think the responses the magazines receive are representative of public opinion? Explain why or why not.


Thought Question

Suppose you access an online listing of all courses at your institution, alphabetized by department, to determine what proportion of all courses have a statistics course as a prerequisite. If you decide to sample 50 courses in order to get a representative sample of courses, how would you select them? Would it be appropriate to simply select the first 50 courses listed?


Bad Sampling Plans

Convenience sampling

– selecting individuals who are easiest to reach

– Problem: the sample might not be representative of the target population.


Convenience Sampling

Sampling mice from a large cage to study how a drug affects physical activity

– A lab assistant reaches into the cage to select the mice one at a time until 10 are chosen.

Which mice will likely be chosen?

– Could this sample yield biased results?


Bad Sampling Plans

Voluntary response sampling

– allowing individuals to choose to be in the sample

– Problem: people with strong opinions (or feelings) about the issue tend to respond.

– Example: RateMyProfessor.com


Voluntary Response

To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.

– 4.5% responded (about 4,500 women)

– Hite used those responses to write her book

Moore (Statistics: Concepts and Controversies, 1997) noted:

– respondents “were fed up with men and eager to fight them…”

– “the anger became the theme of the book…”

– “but angry women are more likely” to respond


CNN on-line surveys

Bias: People have to care enough about an issue to bother replying.

This sample is probably a combination of people who hate “wasting the taxpayers’ money” and “animal lovers.”


Bias

The design of a statistical study is biased if it systematically favours certain outcomes.

Convenience sampling and voluntary response sampling often produce biased samples.
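A small simulation can make "systematically favours certain outcomes" concrete. The sketch below is illustrative only: the population size, the 40% true level of support, and the reply rates (2% of supporters, 10% of opponents) are assumptions invented for the example, not figures from the text.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = in favour, 0 = opposed (true support is 40%).
population = [1] * 4000 + [0] * 6000
true_support = sum(population) / len(population)

# Simple random sample of 500 people: every person is equally likely to be picked.
srs = rng.sample(population, 500)
srs_estimate = sum(srs) / len(srs)

# Voluntary response: each person decides whether to reply.
# Assumed reply rates: 2% of supporters, 10% of opponents.
reply_rate = {1: 0.02, 0: 0.10}
volunteers = [p for p in population if rng.random() < reply_rate[p]]
voluntary_estimate = sum(volunteers) / len(volunteers)

print(f"true support:       {true_support:.2f}")
print(f"SRS estimate:       {srs_estimate:.2f}")
print(f"voluntary estimate: {voluntary_estimate:.2f}")
```

Under these assumptions the voluntary-response figure falls well below the true 40%, however many people reply, while the random sample lands close to it.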


Polls and Surveys

Carelessly collected data are subject to a high degree of bias, even if the sample size is large.

To avoid biases, samples must be randomly chosen.


Avoiding Bias

We select a sample in order to get information about some population.

How can we choose a sample that fairly represents the population?

Probability Sample: A sample chosen by chance. We must know what samples are possible and what chance (probability) each possible sample has.


Simple Random Sampling

Each individual in the population has the same chance of being chosen for the sample

Each group of individuals in the population of the required size (n) has the same chance of being the sample actually selected


Simple Random Sample (SRS)

A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance of being selected.


How to choose an SRS

– Label each individual in the population with a unique number

– “drawing names (numbers) out of a hat”

– random number table (see Table B on pg. 686 of text)

– computer software (www.randomizer.org), or the textbook website (http://bcs.whfreeman.com/bps4e): Statistical Applets – Simple Random Sample
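The "computer software" option can be sketched in a few lines of Python. This is a minimal illustration using the standard library rather than the tools named on the slide; the population of 30 labelled individuals and the sample size of 5 are made-up values, and the function name is hypothetical.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Return n distinct labels from 1..population_size.

    random.sample draws without replacement, so every set of n labels
    is equally likely to be the sample.
    """
    rng = random.Random(seed)
    labels = range(1, population_size + 1)  # step 1: label each individual
    return sorted(rng.sample(labels, n))    # step 2: draw n labels by chance

# Hypothetical example: choose 5 of 30 labelled individuals.
chosen = simple_random_sample(30, 5, seed=1)
print(chosen)
```

Passing a seed only makes the draw repeatable for checking; leaving it out gives a fresh sample each run.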


Simple Random Sampling

Example: Courses with Statistics Prerequisite

Suppose there are 800 courses at an institution, alphabetized by department (and numbered 001-800), and you decide to randomly select 50 of them to determine what proportion of all the courses have a statistics course as a prerequisite. Use a random number table to select which 50 courses to sample.

Page 686 of textbook: Pick a line and column at random. Suppose we get line 111, column 3.

Random numbers: 605 130 929 700 412 712

Three-digit groups that do not match a course number (such as 929, which is above 800) are skipped, as are repeats. Continue along the table until 50 courses are chosen.

TRY: Use line 126, column 1.

Random numbers: 969 271 993 136 809 741
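The table look-up for line 111 can be mimicked in code. The sketch below assumes the table is read three digits at a time and that labels outside 001-800 and repeated labels are skipped; the helper name `select_from_table` is hypothetical.

```python
def select_from_table(digit_groups, population_size, n):
    """Walk through three-digit groups, keeping valid, unused labels until n are found."""
    chosen = []
    for group in digit_groups:
        label = int(group)
        # Keep the label only if it names a course and has not been picked already.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

# The six groups quoted for line 111, column 3.
line_111 = "605 130 929 700 412 712".split()
picked = select_from_table(line_111, population_size=800, n=50)
print(picked)  # [605, 130, 700, 412, 712]
```

Only six groups are quoted, so this run stops at five courses; in practice you would keep reading along the table until all 50 are chosen.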


Systematic Sample

– Randomly select a member of the sampling frame for the sample.

– Using a set procedure or rule, select the rest of the individuals for the sample. For example, randomly select an individual from the sampling frame, and then select every 25th member of the sampling frame to be in the sample.
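A systematic sample is easy to express in code. The sketch below uses one common variant, which is an assumption on top of the slide's wording: the random starting point is chosen among the first k members of the frame, and every kth member is taken from there. The frame of 800 numbered entries is hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k entries, then take every kth entry."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index from 0 to k - 1
    return frame[start::k]

# Hypothetical sampling frame: 800 entries numbered 1 to 800.
frame = list(range(1, 801))
every_25th = systematic_sample(frame, k=25, seed=0)
print(len(every_25th), every_25th[:4])
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the rule.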


Stratified Random Sample

– First, divide the population into groups of similar individuals, called strata.

– Second, choose a separate simple random sample in each stratum.

– Third, combine these simple random samples to form the full sample.

Cluster sample

– If instead only certain groups are (randomly) chosen to be used, and all subjects in the chosen groups make up the sample, then we have a cluster sample.

– For cluster sampling, the population is often divided according to geographic regions (called clusters).
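The difference between the two designs shows up clearly side by side. The sketch below is a minimal illustration; the three campuses and their rosters are hypothetical, as are the function names.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate simple random sample from every stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep everyone in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(clusters[name])
    return chosen, sample

# Hypothetical groups: three campuses with 40, 30 and 20 students.
campuses = {
    "campus_A": [f"A{i}" for i in range(1, 41)],
    "campus_B": [f"B{i}" for i in range(1, 31)],
    "campus_C": [f"C{i}" for i in range(1, 21)],
}

strat = stratified_sample(campuses, n_per_stratum=2, seed=3)
chosen_campuses, clust = cluster_sample(campuses, n_clusters=1, seed=3)
print(strat)
print(chosen_campuses, len(clust))
```

The stratified sample contains a few students from every campus, while the cluster sample contains every student from the one campus that was drawn.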


Multistage Sample

– Divide the population of interest into groups.

– Randomly select some of those groups.

– Divide the resulting collection of individuals into smaller groups.

– Randomly select some of those groups.

– Continue dividing the resulting collection of individuals into groups and randomly selecting some of those groups until you can simply list all of the resulting individuals and randomly select n of them for your sample.
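A two-stage version of this procedure can be sketched as follows. It stops after a single round of group selection, which is a simplification; the regions, households, and function name are hypothetical.

```python
import random

def two_stage_sample(groups, n_groups, n_individuals, seed=None):
    """Stage 1: randomly select groups. Stage 2: SRS from the individuals in those groups."""
    rng = random.Random(seed)
    chosen_groups = rng.sample(sorted(groups), n_groups)
    pool = [member for g in chosen_groups for member in groups[g]]
    return chosen_groups, rng.sample(pool, n_individuals)

# Hypothetical population: 6 regions with 20 households each.
regions = {
    f"region_{r}": [f"region_{r}-house_{h}" for h in range(1, 21)]
    for r in range(1, 7)
}

chosen_regions, households = two_stage_sample(regions, n_groups=2, n_individuals=10, seed=5)
print(chosen_regions)
print(households)
```

Further stages would repeat the first step on smaller groups inside the chosen ones before the final random draw.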


Probability Sampling Plans

– Simple random sampling (SRS)

– Systematic sampling

– Stratified random sampling

– Cluster sampling

– Multistage sampling


Steps for Designing a Study

1. Identify your objective

2. Develop a plan: Experiment or Observational study

3. Use a random procedure to collect data

4. Analyze the data and form conclusions


“Hey! Do you believe in the death penalty?”

_________________ Sampling - use results that are readily available


__________________ - selection so that each individual has an equal chance of being selected


_____________ Sampling - Select some starting point and then select every Kth element in the population


_______________ Sampling - subdivide the population into subgroups (strata) that share the same characteristic, then draw a sample from each stratum


__________________ Sampling - divide the population into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters


Thought Question

When surveying students on their opinions on their professor’s teaching methods, do you think it matters who conducts the interviews? Explain your answer with an example.


Sources of Error in Surveys

Random sampling reduces bias in choosing a sample and allows control of variability.

Sampling in the real world is more complex and less reliable than we might hope for.

Confidence statements do not reflect all sources of error that are present in sampling.


Sampling Errors – Errors that are caused by the act of taking a sample.

Random Sampling Error: the difference between a sample result and the true population result; such an error results from chance sample fluctuations.

- Measured by the margin of error.

Nonsampling Errors – Errors that are not related to the act of taking a sample.

Example: Sample data that are incorrectly collected, recorded, or analyzed (such as using a defective instrument, or copying the data incorrectly).

Nonsampling errors can be much larger than the sampling errors.


Sampling Errors

Using the wrong sampling frame.

Undercoverage: Excluding some units in the population.


Sampling Errors

Disasters

– Using voluntary response (self-selection)

– Using a convenience or haphazard sample

– Cannot extend results to the population of interest (need a broad cross-section of the population)


Nonsampling Errors

Difficulties

– Processing errors (data entry, calculations)

– Wording of questions / response error

Disasters

– Nonresponse (cannot contact subjects or they do not respond)


Sources of Nonsampling Errors

Non-response bias: Cannot contact subjects or they do not respond.

– Nonrespondents often behave or think differently from respondents.

– Low response rates can lead to huge biases.

Processing errors: Data that are incorrectly collected, recorded, calculated, etc.


Nonsampling errors cont.

Survey format effects: Factors such as question order, questionnaire layout, and whether the questionnaire is self-administered or given by an interviewer can affect the results.

Interviewer effects: Different interviewers asking the same questions can tend to obtain different answers.

Response bias: Respondents may give untruthful answers when they feel they should not tell the truth, for example when a family doctor asks, “How much do you drink?” or a survey of female students asks, “How many men do you date per week?” People also simply forget and often give erroneous answers to questions about the past.


Concerns when Asking Survey Questions

– Deliberate bias

– Unintentional bias

– Desire to please

– Asking the uninformed

– Unnecessary complexity

– Ordering of questions

– Confidentiality and anonymity


Confidentiality and Anonymity

Confidential answer

– The respondent is known, but the information is a secret.

– Facilitates follow-up studies.

Anonymous answer

– The respondent is not known, or cannot be linked to his/her response.

– Usually yields more truthful answers.


Dealing with errors

Statistical methods are available for estimating the likely size of sampling errors.

– The margin of error measures the likely size of the random sampling error.

All we can do with nonsampling errors is to try to minimize them at the study-design stage.

Pilot Survey: One tests a survey on a relatively small group of people to try to identify any problems with the survey design before conducting the survey proper.


Learning about Populations from Samples

The techniques of inferential statistics allow us to draw inferences or conclusions about a population from a sample.

– Your estimate of the population is only as good as your sampling design. Be sure to eliminate possible biases.

– Your sample result is only an estimate, and if you randomly sampled again, you would probably get a somewhat different result.

– The bigger the sample, the better: larger random samples give less variable results. We’ll get back to this in later chapters.
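The last two points can be seen in a short simulation. This is a sketch under stated assumptions: the population of 10,000 student ages (drawn uniformly from 17 to 60) is invented for illustration, as are the sample sizes of 25 and 400 and the 200 repetitions.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 student ages (assumed range 17 to 60).
ages = [rng.randint(17, 60) for _ in range(10_000)]

def spread_of_sample_means(n, repetitions=200):
    """Draw repeated SRSs of size n and return the standard deviation of their means."""
    means = [statistics.mean(rng.sample(ages, n)) for _ in range(repetitions)]
    return statistics.pstdev(means)

spread_small = spread_of_sample_means(25)
spread_large = spread_of_sample_means(400)

print(f"population mean age: {statistics.mean(ages):.1f}")
print(f"spread of sample means, n = 25:  {spread_small:.2f}")
print(f"spread of sample means, n = 400: {spread_large:.2f}")
```

Each repeated sample gives a slightly different mean, and the means from the larger samples cluster more tightly around the population value. A larger sample does not remove bias from a poor sampling design.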


Quiz

For each study, identify the population, the sample, the sampling method, and any possible biases.

1) To assess the opinions of students at VIU regarding campus safety, a reporter interviews 15 students he meets walking on the campus late at night who are willing to give their opinions.

2) An SRS of 1200 adult Americans is selected and asked: “In light of the huge national deficit, should the government at this time spend additional money to establish a national system of health insurance?” Thirty-nine percent of those responding answered yes.