howard-stevens
Chapter 12
Sampling Design
How do we gather data?
• Surveys
• Opinion polls
• Interviews
• Studies
– Observational
– Retrospective (past)
– Prospective (future)
• Experiments
Population
• the entire group of individuals that we want information about
Census
• a complete count of the population
How good is a census?
Do frog fairy tale . . .
The answer is 83!
Why would we not use a census all the time?
1) Not accurate
2) Very expensive
3) Perhaps impossible
4) If using destructive sampling, you would destroy population
• Breaking strength of soda bottles
• Lifetime of flashlight batteries
• Safety ratings for cars
Look at the U.S. census – it has a huge amount of error in it; plus it takes a long time to compile the data, making the data obsolete by the time we get it!
Suppose you wanted to know the average weight
of the white-tail deer population in Texas –
would it be feasible to do a census?
Since taking a census of any population takes
time, censuses are VERY costly to do!
Sample
• A part of the population that we actually examine in order to gather information
• Use sample to generalize to population
Sampling design
• refers to the method used to choose the sample from the population
Sampling frame
• a list of every individual in the population
Jelly Blubber Activity
• Select 10 Jelly blubbers that you think are representative of the population of blubbers with regard to length.
• Find the mean length of your sample
Simple Random Sample (SRS)
• consists of n individuals from the population chosen in such a way that
– every individual has an equal chance of being selected
– every set of n individuals has an equal chance of being selected
Suppose we were to take an SRS of 100 PWSH students. Put each student's name in a hat, then randomly select 100 names from the hat. Each student has the same chance to be selected!
Not only does each student have the same chance to be selected, but every possible group of 100 students has the same chance to be selected! Therefore, it has to be possible for all 100 students to be seniors in order for it to be an SRS!
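The names-in-a-hat procedure can be sketched in Python. This is only an illustration: the roster of 2000 made-up student IDs and the helper name `simple_random_sample` are assumptions, not part of the slides. `random.sample` draws without replacement, so every set of n names is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement (names from a hat)."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical roster: the real enrollment list is not given in the slides.
roster = [f"student_{i:04d}" for i in range(1, 2001)]

sample = simple_random_sample(roster, 100, seed=12)
print(len(sample), len(set(sample)))  # 100 100 (no student is drawn twice)
```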
Stratified random sample
• population is divided into homogeneous groups called strata
• SRSs are pulled from each stratum
Homogeneous groups are groups that are alike based upon some
characteristic of the group members.
Suppose we were to take a stratified random sample of 100 Nimitz students. Since students
are already divided by grade level, grade level can be our strata.
Then randomly select 50 seniors and randomly select 50 juniors.
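A minimal Python sketch of the senior and junior example, assuming made-up rosters of 500 students per grade (the real class sizes are not given) and a hypothetical helper `stratified_sample`. A separate SRS is drawn inside each stratum.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }

# Hypothetical strata: grade level is the characteristic that defines each group.
strata = {
    "juniors": [f"jr_{i}" for i in range(500)],
    "seniors": [f"sr_{i}" for i in range(500)],
}

picked = stratified_sample(strata, {"juniors": 50, "seniors": 50}, seed=12)
print({grade: len(names) for grade, names in picked.items()})
```

Unlike an SRS of 100, this design can never return 100 seniors, because the number taken from each grade is fixed in advance.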
Systematic random sample
• select sample by following a systematic approach
• randomly select where to begin
Suppose we want to do a systematic random sample of Nimitz students. Number a list of students. (There are approximately 2000 students. If we want a sample of 100, 2000/100 = 20.) Select a number between 1 and 20 at random. That student will be the first student chosen; then choose every 20th student from there.
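The arithmetic above (2000/100 = 20, random start, then every 20th student) can be sketched in Python. The numbered list of exactly 2000 students and the helper name `systematic_sample` are assumptions for illustration.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first interval, then every k-th individual."""
    rng = random.Random(seed)
    k = len(frame) // n        # sampling interval, e.g. 2000 // 100 = 20
    start = rng.randint(1, k)  # random start between 1 and k, inclusive
    return frame[start - 1::k][:n]

# Hypothetical numbered list of students 1..2000.
frame = list(range(1, 2001))

sample = systematic_sample(frame, 100, seed=12)
print(sample[:3], len(sample))
```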
Cluster Sample
• based upon location
• randomly pick a location & sample all there
Suppose we want to do a cluster sample of Nimitz students. One
way to do this would be to randomly select 10 classrooms during 2nd period. Sample all
students in those rooms!
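A rough Python sketch of the classroom example, assuming 80 made-up second-period rooms of 25 students each and a hypothetical helper `cluster_sample`. The random choice is made among rooms, and then everyone in a chosen room is sampled.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # randomness is at the room level
    return {room: list(clusters[room]) for room in chosen}

# Hypothetical 2nd-period classrooms; real room lists are not given in the slides.
classrooms = {
    f"room_{r:02d}": [f"room_{r:02d}_student_{s}" for s in range(25)]
    for r in range(80)
}

picked = cluster_sample(classrooms, 10, seed=12)
print(len(picked), sum(len(students) for students in picked.values()))  # 10 250
```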
For the Jelly Blubber colony:
μ = 19.41 (population mean length)
Multistage sample
• select successively smaller groups within the population in stages
• SRS used at each stage
To use a multistage approach to sampling Nimitz students, we could
first divide 2nd period classes by level (AP, Honors, Regular, etc.) and
randomly select 4 second period classes from each group. Then we could randomly select 5 students from each of those classes. The
selection process is done in stages!
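The two stages described above can be sketched in Python. The three levels with 10 classes of 25 students each are invented for illustration, as is the helper name `multistage_sample`. Stage 1 is an SRS of classes within each level, and stage 2 is an SRS of students within each chosen class.

```python
import random

def multistage_sample(levels, classes_per_level, students_per_class, seed=None):
    """Stage 1: SRS of classes in each level. Stage 2: SRS of students in each class."""
    rng = random.Random(seed)
    sample = []
    for level, classes in levels.items():
        for class_name in rng.sample(sorted(classes), classes_per_level):
            sample.extend(rng.sample(classes[class_name], students_per_class))
    return sample

# Hypothetical 2nd-period classes grouped by level.
levels = {
    level: {
        f"{level}_class_{c}": [f"{level}_class_{c}_student_{s}" for s in range(25)]
        for c in range(10)
    }
    for level in ("AP", "Honors", "Regular")
}

sample = multistage_sample(levels, classes_per_level=4, students_per_class=5, seed=12)
print(len(sample))  # 60 = 3 levels x 4 classes x 5 students
```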
SRS
• Advantages
– Unbiased
– Easy
• Disadvantages
– Large variance
– May not be representative
– Must have sampling frame (list of population)
Stratified
• Advantages
– More precise unbiased estimator than SRS
– Less variability
– Cost reduced if strata already exist
• Disadvantages
– Difficult to do if you must divide the population into strata
– Formulas for SD & confidence intervals are more complicated
– Need sampling frame
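The "less variability" claim can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the two equal-size strata, their made-up values (centered near 10 and near 100), and the sample sizes. It repeatedly estimates the population mean with an SRS of 20 and with a stratified sample of 10 per stratum, then compares how much the estimates vary.

```python
import random
import statistics

rng = random.Random(12)

# Hypothetical population: alike within each stratum, very different between strata.
stratum_a = [rng.gauss(10, 2) for _ in range(500)]
stratum_b = [rng.gauss(100, 2) for _ in range(500)]
population = stratum_a + stratum_b

def srs_mean(n):
    """Sample mean from an SRS of the whole population."""
    return statistics.mean(rng.sample(population, n))

def stratified_mean(n_each):
    """Sample mean from equal-size SRSs of each stratum.

    The strata are the same size, so the plain mean of the combined
    sample estimates the population mean without reweighting.
    """
    combined = rng.sample(stratum_a, n_each) + rng.sample(stratum_b, n_each)
    return statistics.mean(combined)

srs_sd = statistics.stdev(srs_mean(20) for _ in range(1000))
strat_sd = statistics.stdev(stratified_mean(10) for _ in range(1000))
print(f"SD of SRS estimates:        {srs_sd:.2f}")
print(f"SD of stratified estimates: {strat_sd:.2f}")
```

The stratified estimates vary less here because each sample is forced to contain the same number from each group. The gain shrinks when the strata are not very different from one another.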
Systematic Random Sample
• Advantages
– Unbiased
– Ensure that the sample is spread across population
– More efficient, cheaper, etc.
• Disadvantages
– Large variance
– Can be confounded by trend or cycle
– Formulas are complicated
Cluster Samples
• Advantages
– Unbiased
– Cost is reduced
• Disadvantages
– Clusters may not be representative of population
– Formulas are complicated
Identify the sampling design
1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, etc.). Then they randomly selected 3 colleges from each group.
Stratified random sample
Identify the sampling design
2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.
Cluster sampling
Identify the sampling design
3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 & 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave.
Systematic random sampling
Random digit table
• each entry is equally likely to be any of the 10 digits
• digits are independent of each other
The following is part of the random digit table found on page 847 of your textbook:
Row   Digits
1     4 5 1 8 5 0 3 3 7 1
2     4 2 5 5 8 0 4 5 7 0
3     8 9 9 3 4 3 5 0 6 3
Numbers can be read across.
Numbers can be read vertically.
Numbers can be read diagonally.
Suppose your population consisted of these 20 people:
1) Aidan     6) Fred      11) Kathy     16) Paul
2) Bob       7) Gloria    12) Lori      17) Shawnie
3) Chico     8) Hannah    13) Matthew   18) Tracy
4) Doug      9) Israel    14) Nan       19) Uncle Sam
5) Edward    10) Jung     15) Opus      20) Vernon
Use the following random digits to select a sample of five from these people.
We will need to use double-digit random numbers, ignoring 00, any number greater than 20, and any repeats. Start with Row 1 and read across.
Row   Digits
1     4 5 1 8 0 5 1 3 7 1
2     0 1 5 5 8 0 1 5 7 0
3     8 9 9 3 4 3 5 0 6 3
45: Ignore.
18: 18) Tracy
05: 5) Edward
13: 13) Matthew
71: Ignore.
01: 1) Aidan
55: Ignore.
80: Ignore.
15: 15) Opus
Stop when five people are selected. So my sample would consist of:
Aidan, Edward, Matthew, Opus, and Tracy
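The selection above can be replayed in Python as a check. The helper name `select_from_digits` is made up for this sketch. The digits are the three rows used in the worked example, read across in pairs, and the rule for skipping 00 and repeats is an added assumption that does not come up with these particular digits.

```python
def select_from_digits(digits, population_size, sample_size, width=2):
    """Read fixed-width labels left to right, skipping unusable ones."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip 00, labels above the population size, and repeats.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

people = [
    "Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria", "Hannah",
    "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nan", "Opus", "Paul",
    "Shawnie", "Tracy", "Uncle Sam", "Vernon",
]
rows = ["4518051371", "0155801570", "8993435063"]  # rows 1 to 3, read across

labels = select_from_digits("".join(rows), len(people), 5)
sample = sorted(people[k - 1] for k in labels)
print(labels)  # [18, 5, 13, 1, 15]
print(sample)  # ['Aidan', 'Edward', 'Matthew', 'Opus', 'Tracy']
```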
Bias
• A systematic error in measuring the estimate
• favors certain outcomes
Anything that causes the data to be wrong! It might be attributed to the researchers, the respondent, or to the sampling method!
Sources of Bias
• things that can cause bias in your sample
• cannot do anything with bad data
Voluntary response
• People choose to respond
• Usually only people with very strong opinions respond
An example would be the surveys in magazines that ask readers to mail in the survey.
Other examples are call-in shows, American Idol, etc.
Remember, the respondent selects themselves to
participate in the survey!
Remember – the way to determine
voluntary response is:
Self-selection!!
Convenience sampling
• Ask people who are easy to ask
• Produces biased results
An example would be stopping friendly-looking people in the
mall to survey. Another example is the surveys left on
tables at restaurants - a convenient method!
The data obtained by a convenience sample will be
biased – however this method is often used for surveys &
results reported in newspapers and magazines!
Undercoverage
• some groups of population are left out of the sampling process
Suppose you take a sample by
randomly selecting names from the phone
book – some groups will not
have the opportunity of being selected!
People with unlisted phone numbers – usually high-income families
People without phone numbers – usually low-income families
People with ONLY cell phones – usually young adults
Nonresponse
• occurs when an individual chosen for the sample can’t be contacted or refuses to cooperate
• telephone surveys 70% nonresponse
People are chosen by the researchers, BUT refuse to
participate.
NOT self-selected!
This is often confused with voluntary response!
Because of huge telemarketing efforts in the past few years, telephone surveys have a MAJOR problem with nonresponse! One way to help with the problem of nonresponse is to make follow-up contact with the people who are not home when you first contact them.
Response bias
• occurs when the behavior of respondent or interviewer causes bias in the sample
• wrong answers
Suppose we wanted to survey high school students on drug
abuse and we used a uniformed police officer to
interview each student in our sample – would we get honest
answers?
Response bias occurs when for some reason (interviewer’s or
respondent’s fault) you get incorrect answers.
Wording of the Questions
• wording can influence the answers that are given
• connotation of words
• use of “big” words or technical words
Questions must be worded as neutrally as possible to avoid influencing the response.
The level of vocabulary should be appropriate for
the population you are surveying
– if surveying Podunk, TX, then you should
avoid complex vocabulary.
– if surveying doctors, then use more
complex, technical wording.
Source of Bias?
1) Before the presidential election of 1936, FDR against Republican Alf Landon, the magazine Literary Digest predicted that Landon would win the election in a 3-to-2 victory, based on a survey of 2.8 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest’s survey came from magazine subscribers, car owners, telephone directories, etc.
Undercoverage – since the Digest’s survey comes from car owners, etc., the people selected were mostly from high-income families and thus mostly Republican! (other answers are possible)
2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at SMU. You collect register receipts for students as they leave the bookstore during lunch one day.
Convenience sampling – easy way to collect data
or
Undercoverage – students who buy books from online bookstores are not included.
3) To find the average value of a home in Irving, one averages the price of homes that are listed for sale with a realtor.
Undercoverage – leaves out homes that are not for sale or
homes that are listed with different realtors.
(other answers are possible)