BUS304 – Data Collection 1
Chapter 1 Data Collection
Descriptive Statistics
Tools that collect, present and describe data
Collecting Data
survey
observation
experiments
etc.
Characterizing Data
Mathematical description of data, e.g.
average housing price;
stock price volatility.
Presenting Data
Population and Samples
A statistical study always starts with a question:
What is the average starting salary for a business major?
How are housing prices in the San Diego area?
Are college textbooks too expensive?
Who is the more valuable player, Reggie Bush or Vince Young?
What else?
Population:
-- All the items that are of interest
Sample:
-- A subset of the population
(or say, part of the population)
[Figure: the population is the letters a through z; the sample is the subset b, c, g, i, n, o, r, u, y]
How to determine? -- Check whether it covers all the items of interest
Exercise: Determine the population for each question listed above
Sampling:
Techniques to select only part of the population to conduct the
study
The result will be less reliable than one from studying the whole
population
But sometimes it is more reasonable to use a sample than the whole
population
Less time-consuming
Lower cost
Sometimes the study is destructive, e.g. testing matches
Think of some examples of sampling.
Sampling Techniques
Non-Statistical Sampling
Samples are selected at
convenience
Results will be subject to
bias
Examples:
• Ask a friend, a neighbor, etc.
• A survey on the Internet;
• Judges.
Statistical Sampling
Use probability theory to
guide the selection
Sampling bias can be
estimated (we will learn how
to estimate it later in the
semester)
We will learn four
techniques in this category.
Four Statistical Sampling Techniques
Simple Random Sampling
The most basic statistical sampling method.
Select items at random, e.g. with dice, cards, or a random number
generator (calculator, Excel)
Exercise: Use the random number generator in
Excel to select a sample of ten NBA players and find their average height.
Simple Random
Systematic
Stratified
Cluster
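The simple random sampling exercise above can also be sketched in Python instead of Excel. This is only an illustration: the roster, the player names, and the heights are made-up placeholders (not real NBA data), and the function name simple_random_sample is an assumption introduced here.

```python
import random

# Hypothetical roster: made-up heights in inches, NOT real NBA data.
heights = {f"Player {i}": 72 + (i * 7) % 15 for i in range(1, 31)}

def simple_random_sample(items, n, seed=None):
    """Pick n distinct items so that every item is equally likely to be chosen."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(list(items), n)  # sampling without replacement

sample = simple_random_sample(heights, 10, seed=304)
average_height = sum(heights[name] for name in sample) / len(sample)

print(sample)
print(f"Average height of the sample: {average_height:.1f} in")
```

Drawing without replacement matters here: the same player should not be counted twice in a sample of ten.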
Four Statistical Sampling Techniques
Systematic Sampling
A simplified version of simple random
sampling
Select a random starting point, then pick items at equal intervals from there
Question: how do we determine the interval so that everyone has a chance to be selected?
Formula:
Interval = Population size / sample size
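A minimal sketch of the interval formula in Python, assuming a hypothetical list of 450 players (the list, the names, and the function systematic_sample are placeholders introduced for illustration). It uses integer division for the interval, so when the population size is not an exact multiple of the sample size, the last few items can never be selected.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Random start within the first interval, then every interval-th item."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    # Interval = population size / sample size (rounded down)
    interval = len(population) // sample_size
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position in the first interval
    return [population[start + i * interval] for i in range(sample_size)]

# Hypothetical population of 450 players, so the interval is 450 // 10 = 45.
players = [f"Player {i}" for i in range(1, 451)]
sample = systematic_sample(players, 10, seed=304)
print(sample)
```

Restricting the start to the first interval is what gives every item in the list a chance of being selected when the interval divides the population evenly.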
Systematic sampling exercise
Use the systematic sampling technique to select 10 NBA players
and find their average height.
Think: How many random numbers do you need to generate?
Four Statistical Sampling Techniques
Stratified Sampling
Divide the population into subgroups
Use simple random sampling (or systematic sampling) to select from each subgroup
Combine the selections to form one big sample
Think: what is the benefit of using stratified sampling?
• More representative
Stratified sampling exercise
Use the stratified sampling technique to select a sample of 10 NBA
players, including 2 PFs, 2 SFs, 2 SGs, 2 PGs, and 2 Cs.
Find out the average weight.
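The selection step of this exercise could look roughly like the Python sketch below. It assumes a hypothetical roster with 20 placeholder players per position; the names and the function stratified_sample are illustrative and not part of the course material. Computing the average weight would then be a plain mean over the ten selected players.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of equal size from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        # simple random sampling inside each subgroup
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical strata: 20 placeholder players for each of the five positions.
positions = ["PF", "SF", "SG", "PG", "C"]
strata = {pos: [f"{pos} player {i}" for i in range(1, 21)] for pos in positions}

sample = stratified_sample(strata, 2, seed=304)
print(sample)
```

Because every position contributes exactly two players, the sample cannot end up with, say, ten centers, which is the sense in which it is more representative.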
Four Statistical Sampling Techniques
Cluster Sampling
Divide the population into subgroups, called “clusters”.
Randomly select some of the subgroups (not all!)
In each selected subgroup, use a random sampling technique to select a sub-sample
Combine the sub-samples to form one aggregate sample
Think: when do we use cluster sampling?
(e.g. market research: select towns first)
Cluster sampling exercise
Use each NBA team as a cluster
Randomly select 5 teams to conduct the study
In each of the selected teams, select 2 players
Combine them into an aggregate sample of ten.
Think: how many times do you need to use the random
number generator?
Discuss the difference between cluster sampling technique and
stratified sampling technique.
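The cluster sampling exercise above can be sketched as a two-stage draw in Python. The league below is a hypothetical placeholder with 30 teams of 15 players each, and the function cluster_sample is a name introduced only for this illustration.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Stage 1: pick some clusters at random. Stage 2: sample inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: select teams
    sample = []
    for name in chosen:
        # stage 2: simple random sampling within the selected team only
        sample.extend(rng.sample(clusters[name], per_cluster))
    return chosen, sample

# Hypothetical league: 30 placeholder teams with 15 placeholder players each.
teams = {
    f"Team {t}": [f"Team {t} player {i}" for i in range(1, 16)]
    for t in range(1, 31)
}

chosen_teams, sample = cluster_sample(teams, 5, 2, seed=304)
print(chosen_teams)
print(sample)
```

Unlike stratified sampling, most subgroups contribute nothing here: only the five selected teams are ever looked at.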
Compare different techniques
Simple random sampling and systematic sampling:
Need to know the population size
Do not take the composition of the population into account
Stratified sampling:
Uses information about the population composition to control the sample
The sample can be more representative of the population
Cluster sampling:
Generally used when the population is geographically distributed
Divide the population into several geographical areas
Randomly select some areas (not all) to study, which saves cost.
Sometimes, a combination of techniques can be used.
Discussion
Which sampling techniques should be used for (or are used in)
the following studies? Discuss the potential bias of each
technique.
1. NBC wants to conduct an opinion poll to understand people’s opinion
on Hillary Clinton’s chance of being elected president in 2008.
2. CSUSM wants to collect opinions about how the junior faculty
members teach their classes
3. Policemen want to detect drunk drivers to prevent potential accidents.
4. Oscar judges determine the best pictures of the year.
5. Fans vote for the NBA all-star team.
6. American Citizens vote for president.
Summary
In today’s lecture:
Two important concepts: Population and Sample
Four Sampling Techniques:
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling