Introduction to Statistical Research Methods (829N1) Maria Savona SPRU (Science Policy Research Unit) Lecture 2: Samples & Data

Samples and Data


Page 1: Samples and Data

Introduction to Statistical Research Methods (829N1)

Maria Savona

SPRU (Science Policy Research Unit)

Lecture 2: Samples & Data

Page 2: Samples and Data

Basics on sampling


Page 3: Samples and Data

From last week’s lecture

Mr. Smith owns a company and wants to know his customers’ tastes. Would he interview all of them?

I need to test the quality of my products. Should I test all of them? How many?

A pharmaceutical company developed a revolutionary drug that is supposed to reduce patients’ fever. How do they test if it works?

Sampling

Page 4: Samples and Data

What is sampling?

In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.

[Diagram: a sample is drawn from the population; inference goes from the sample back to the population]

Why do we need sampling? It has some advantages:
– it is faster, cheaper and at times the only feasible choice
– it allows us to accurately estimate general features of a population using data collected from a tiny fraction of the total.

Page 5: Samples and Data

How do we sample from a population?

Sampling process

1. Define the population

2. Specify a sampling frame

3. Specify a sampling method

4. Determine the sample size

5. Collect the data


Page 6: Samples and Data

1) Defining the population

The population is the set of all possible cases of interest (e.g. firms, people, students, countries, patents, etc.).

A sample requires a clearly defined population, from which to draw the sample.

Requires conceptual clarity

e.g. consider the question: ‘How many hours of study do students have each week?’ But who are the ‘student’ population?
– Anyone who sees themselves as a student?
– Or restricted to those registered in higher education?
– Anywhere in the UK? Restricted to Sussex?
– Full time only? Home and overseas? Etc.

Page 7: Samples and Data

2) Specifying a sampling frame

Once the population of interest is defined, we have to specify how we access that population

A sampling frame is the source from which a sample is drawn. It is a list of all those within a population who can be sampled

Population and corresponding sampling frame:
– Population: Sussex University students. Sampling frame: list of registered students at Sussex University.
– Population: all the firms based in London working in the pharmaceutical sector. Sampling frame: list of firms located in London from Companies House, selecting those companies with some specific SIC (Standard Industrial Classification) codes.
– Population: all the patents owned by a company which are about semiconductors. Sampling frame: patent database from WIPO, selecting the patents which have some specific IPC (International Patent Classification) codes.

Page 8: Samples and Data

2) Specifying a sampling frame

[Diagram: target population, sampling frame, and sample (drawn from sampling frame)]

Page 9: Samples and Data

2) Specifying a sampling frame

Sampling frame definition is very important; errors in the sampling frame affect the representativeness of the sample

Sometimes the sampling frame does not completely match the population:
– e.g. if we are conducting household surveys on poverty in the Brighton area, our sampling frame misses homeless people. This introduces bias, because homeless people are disproportionately poor and are not included in the sampling frame.

Other potential issues with sampling frames:
– Missing elements: some members of the population are not included in the sampling frame
– Foreign elements: the sampling frame includes some non-members of the population
– Duplicate elements: some elements could be included more than once
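The three sampling frame problems can be checked with simple set operations. This is only a minimal sketch: the names in `population` and `frame` are made up for illustration, and in practice the full population list is usually not available (which is why a frame is needed in the first place).

```python
from collections import Counter

# Hypothetical toy data: the true population and an imperfect sampling frame.
population = {"ann", "bob", "cat", "dan"}
frame = ["ann", "bob", "bob", "eve"]

# Missing elements: in the population but not in the frame.
missing = population - set(frame)

# Foreign elements: in the frame but not members of the population.
foreign = set(frame) - population

# Duplicate elements: listed more than once in the frame.
duplicates = {unit for unit, count in Counter(frame).items() if count > 1}

print("missing:", sorted(missing))
print("foreign:", sorted(foreign))
print("duplicates:", sorted(duplicates))
```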

Page 10: Samples and Data

3) Specifying a sampling method

The sampling method is the way in which the sample units are to be selected.

Probability sampling: the probability of extraction of each population unit is known. It allows inference from the sample to the population.
a) Simple random sampling
b) Systematic sampling
c) Stratified sampling

Non-probability sampling: these methods do not allow statistical inference, but they are still informative and allow some kinds of generalization.
d) Quota sampling
e) Convenience sampling
f) Snowball sampling

Page 11: Samples and Data

a) Simple random sampling

A simple random sample is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process.

Imagine a school with 1000 students and you want to select 100 for further study. You could select them randomly by pulling their names out of a hat. This random process means that each student has an equal chance (or probability) of being selected.
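The "names out of a hat" procedure can be sketched in Python with the standard library. The student labels and the seed are illustrative assumptions, not part of the lecture; `random.Random.sample` draws without replacement, so no student is picked twice.

```python
import random

# Hypothetical sampling frame: 1000 student labels.
students = [f"student_{i:04d}" for i in range(1, 1001)]

# A fixed seed only makes the example reproducible; it is an arbitrary choice.
rng = random.Random(42)

# Draw 100 students without replacement; every student is equally likely.
sample = rng.sample(students, k=100)

print(len(sample), "students selected, e.g.", sample[:3])
```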


Page 12: Samples and Data

b) Systematic sampling

The population (sampling frame) units are sorted according to some characteristic.
– The first sample unit is extracted randomly
– The other units are extracted following a step n

[Diagram: LIST OF ALL POSTGRADUATES IN SOCIAL SCIENCES (A, B, C, D, E, F, G, …). You randomly start at one point in the list, then pick every nth case from the list to form the SAMPLE.]
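A minimal sketch of systematic sampling, assuming the frame is already sorted and held in a Python list. The function name `systematic_sample` is hypothetical, and `step` plays the role of n on the slide (pick every nth case). The step is derived here from the desired sample size, which is one common choice rather than the only one.

```python
import random

def systematic_sample(frame, sample_size, rng):
    """Pick a random start, then every `step`-th unit of the sorted frame."""
    step = len(frame) // sample_size      # the "n" in "every nth case"
    start = rng.randrange(step)           # random start within the first step
    return frame[start::step][:sample_size]

# Hypothetical frame of 1000 units, already sorted.
frame = list(range(1000))
sample = systematic_sample(frame, sample_size=100, rng=random.Random(1))

print(sample[:5])
```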

Page 13: Samples and Data

c) Stratified sampling

The sampling frame is divided into sub-groups (strata) with respect to some relevant population characteristics (e.g. gender, age, etc.) so that units are relatively similar within each stratum and different across strata.

Stratified sampling consists of performing simple random sampling within each stratum, so that all the sub-groups are adequately represented.

For example, suppose we have a population of 1000 people, 500 males and 500 females, and we would like to extract a sample of 100 people stratified by gender. Our sample will include 50 randomly selected males and 50 randomly selected females.
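The gender example can be sketched as a simple random sample drawn separately within each stratum. The function `stratified_sample` and the tuple layout of `people` are assumptions made for this illustration.

```python
import random

def stratified_sample(frame, stratum_of, sizes, rng):
    """Draw a simple random sample of the requested size within each stratum."""
    sample = []
    for stratum, size in sizes.items():
        members = [unit for unit in frame if stratum_of(unit) == stratum]
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: 500 males and 500 females, stored as (gender, id).
people = [("male", i) for i in range(500)] + [("female", i) for i in range(500)]

sample = stratified_sample(
    people,
    stratum_of=lambda person: person[0],
    sizes={"male": 50, "female": 50},
    rng=random.Random(0),
)

print(len(sample), "people sampled")
```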

Page 14: Samples and Data

c) Stratified sampling

[Diagram: LIST OF ALL SOCIAL SCIENCE POSTGRADUATES SORTED BY DEPARTMENT]

• SPRU
• Sociology
• Geography
• History
• Etc.

SAMPLE: separate random sample drawn from each department (stratum)

How many from each stratum?

– Total number of students (N=1000)
– Sample size (n=50)
– Number of students at SPRU (80)
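One common answer to "how many from each stratum?" is proportional allocation: each stratum contributes to the sample in proportion to its share of the population. The slide does not name the rule it has in mind, so proportional allocation is an assumption here, and the function name is hypothetical.

```python
def proportional_allocation(stratum_size, population_size, sample_size):
    """Sample units for one stratum under proportional allocation."""
    return sample_size * stratum_size / population_size

# Figures from the slide: N = 1000 students, n = 50, SPRU has 80 students.
spru_units = proportional_allocation(stratum_size=80, population_size=1000, sample_size=50)

print(spru_units)  # 50 * 80 / 1000 = 4.0
```

Under this rule 4 of the 50 sampled students would come from SPRU. When the result is not a whole number it has to be rounded, and the rounded stratum sizes should be checked to still add up to n.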

Page 15: Samples and Data

d) Quota sampling

Probability sampling allows a precise and accurate estimation of population parameters. However, in some cases probability sampling is not possible. In this case, non-probability sampling is common practice.

Quota sampling is similar to stratified sampling. The population is segmented into sub-groups (strata). Quota sampling does not need a sampling frame for each stratum, since extraction does not follow a probabilistic rule.

Quota sampling only requires that the same proportions apply to the sample. For example, an interviewer is told to sample 50 females and 50 males between the ages of 18 and 30.
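A rough sketch of how quota sampling differs from stratified sampling: respondents are accepted in whatever order the interviewer meets them, until each quota is full. The function `quota_sample`, the dictionary layout and the stream of passers-by are all hypothetical.

```python
def quota_sample(respondents, quotas, group_key="gender"):
    """Accept respondents in order of encounter until every quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in respondents:
        group = person[group_key]
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas filled, stop interviewing
    return chosen

# Hypothetical stream of passers-by: no randomisation, just order of arrival.
passers_by = [
    {"id": i, "gender": "male" if i % 3 == 0 else "female"} for i in range(300)
]

sample = quota_sample(passers_by, quotas={"female": 50, "male": 50})
print(len(sample), "respondents interviewed")
```

Note that nothing in the selection is random, which is why selection probabilities are unknown and statistical inference is not supported.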

Page 16: Samples and Data

e) Availability (convenience) Sampling

As the name suggests, the sample is merely based on those who are “easy to find”:
– a local factory to provide a sample of workers
– this class to provide a sample of students
– a bus station near my house to provide a sample of users of public transport
– university students used to provide a sample of consumers

Some studies do not need a representative sample because the aim is to demonstrate a methodology or to test a theory.


Page 17: Samples and Data

f) Snowball Sampling

Uses initial respondents to contact new respondents.

A useful option where relevant contacts are hard to identify, e.g. where the roles and responsibilities of different personnel in a company are unclear, or where you’re interested in surveying a minority population but do not know how to locate more than one or two members of that population.

It depends upon whether members of these populations know other members.

Again, it does not guarantee a representative sample.


Page 18: Samples and Data

4) Determining the sample size

Size matters. More is better!

Suppose this class is my population (54 students) and I want to find out the mean amount of money in your pockets. Let’s say that on average you have £5 per person; obviously some will have more, some less.

– If I ask 2, I may well pick 2 with nothing, or 2 with £20.
– If I ask 10, the amounts are more likely to balance out and give a mean closer to the true value.
– If I ask 20, even more likely.

However, working out the appropriate sample size depends on other factors, such as the precision level required and budget constraints.
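The pocket-money example can be simulated. In this sketch the 54 amounts are invented so that the class mean is exactly £5; the seed and the number of repetitions are arbitrary choices. For each sample size we draw many samples and look at how much the sample mean varies.

```python
import random
import statistics

# Hypothetical class of 54 students: amounts in pounds, chosen so the mean is 5.
population = [0] * 22 + [5] * 18 + [10] * 10 + [20] * 4
true_mean = statistics.mean(population)

rng = random.Random(0)  # fixed seed for reproducibility only

def spread_of_sample_means(sample_size, repetitions=2000):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    return statistics.pstdev(means)

spreads = {n: spread_of_sample_means(n) for n in (2, 10, 20)}

print("true mean:", true_mean)
for n, spread in spreads.items():
    print(f"asking {n:2d} students: sample means vary by about £{spread:.2f}")
```

The spread shrinks as the sample grows, which is the sense in which "more is better".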


Page 19: Samples and Data

4) Determining the sample size

Household expenditure on pizza (with ‘true’ population mean=$20, standard deviation=9, significance level 5%). Source: adapted from Mazzocchi 2008, Box 5.4, p.116.

– With a sample of 30 units and a population of 100 units we commit an 11.75% error.
– With the same sample size (30) and a population of 1000 units, the error increases to 16.28%.
– With a sample of 500 units and a population of 1000 units, the error is only 1.98%.
– With a sample of 1000 units the error is below 3% even with a population of 100 million units.
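For orientation, a textbook margin of error for a sample mean can be sketched as below. This is an assumption-laden illustration: it uses a normal critical value and the usual finite population correction, and the function name `relative_error` is hypothetical. The slide's figures are adapted from Mazzocchi and may rest on a different formula or critical value, so this sketch is not expected to reproduce them exactly; it only shows the same qualitative pattern.

```python
from math import sqrt
from statistics import NormalDist

def relative_error(n, N, mean=20.0, sd=9.0, confidence=0.95):
    """Margin of error of the sample mean, as a fraction of the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value
    fpc = sqrt((N - n) / (N - 1))                    # finite population correction
    return z * sd / sqrt(n) * fpc / mean

for n, N in [(30, 100), (30, 1000), (500, 1000), (1000, 100_000_000)]:
    print(f"n={n}, N={N}: {relative_error(n, N):.2%}")
```

Two things stand out: once the population is much larger than the sample, its size hardly matters, and the error is driven almost entirely by the sample size.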

Page 20: Samples and Data

Sampling and non-sampling error

An estimate based on a sample can differ from the true population figure because of:

Sampling error: the random chance involved when sampling. Two main factors affect the amount of sampling error:
a) The size of the sample: increasing sample size reduces sampling error
b) The amount of variation in the population in the characteristic being measured (age, income etc.): the more variation, the greater the sampling error (for any given sample size)

Non-sampling error: errors arising from all other aspects of the procedure:
a) Poorly designed sampling frame
b) Measurement errors during fieldwork
c) Systematic non-response
d) Systematic attrition
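The two factors behind sampling error appear directly in the standard error of a sample mean, sd / sqrt(n). The sketch below assumes simple random sampling from a population much larger than the sample (no finite population correction); the function name is hypothetical.

```python
from math import sqrt

def standard_error(population_sd, sample_size):
    """Standard error of the sample mean under simple random sampling."""
    return population_sd / sqrt(sample_size)

# a) Larger samples reduce sampling error: four times the sample halves it.
print(standard_error(9, 30), standard_error(9, 120))

# b) More variation in the population increases sampling error.
print(standard_error(9, 30), standard_error(18, 30))
```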

Page 21: Samples and Data

Data...!


Page 22: Samples and Data

What type of data?

Cross-sectional data

Longitudinal data

– Time series

– Panel data
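As a rough illustration of these data types: cross-sectional data observe many units at one point in time, a time series follows one unit over time, and panel data follow many units over time. The firms, years and patent counts below are made up for the example.

```python
# Hypothetical panel: (firm, year) -> number of patents. All values are invented.
panel = {
    ("A", 2019): 3,
    ("A", 2020): 5,
    ("B", 2019): 0,
    ("B", 2020): 2,
}

# Cross-section: all firms, one year.
cross_section_2020 = {firm: v for (firm, year), v in panel.items() if year == 2020}

# Time series: one firm, all years.
time_series_a = {year: v for (firm, year), v in panel.items() if firm == "A"}

print(cross_section_2020)
print(time_series_a)
```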


Page 23: Samples and Data

Variables

A variable is a condition or a quality that can differ from one case to another.
– Conceptual definition: literal or general definition of the variable
– Operational definition: specifying the criteria for taking a measurement of that variable

For example:
– We want to measure a firm’s innovativeness
– We define innovativeness as the capacity to produce new inventions
– We measure innovativeness by taking the number of patents of the firm

Page 24: Samples and Data

Scales of measurement

The scale of measurement specifies a range of values that the variable can take

Discrete versus continuous
– Discrete data means that there is a finite number of values within a specified range (e.g. number of children per household: 1, 2, 3, 4, 5)
– Continuous data means that there are infinitely many values within a specified range (e.g. age: between 3 and 4 years we can have 3 years and 6 months; 3 years, 6 months and 2 days; etc.)

Page 25: Samples and Data

4 levels of measurement

Nominal measures use numbers simply as labels for different values (e.g., Female=1, Male=2; or Bus=1, Train=2, Car=3).

Ordinal measures are like nominal ones in that they too use numbers simply as labels, but in this case a higher number does indicate more and a lower number less (e.g., ‘How often do you smoke?’ Never=1, Sometimes=2, Frequently=3, Very often=4).

Interval/ratio scales are those that allow us to say by how much a case is better or stronger than another.
– Interval scales measure the order of data points and the size of the intervals in between data points.
– Ratio scales are interval scales with a true zero point.

Interval scales can have an arbitrary zero reference point, while ratio scales have a true zero point. For example, 0 age and 0 income mean no age and no income (i.e. the zero reference point is non-arbitrary), while 0°C only indicates the point at which water freezes; it does not mean no heat at all! However, this distinction is not relevant for the kind of analysis we will carry out.

[Arrow alongside the list: increasing precision and meaning]

Page 26: Samples and Data

Some more examples


*** The level of measurement of a variable is important because it determines the techniques that you can use to analyse it ***
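A small sketch of what this means in practice, using the conventional pairing of each level with a summary of central tendency: mode for nominal data, median for ordinal data, mean for interval/ratio data. The function `summarise` and the example values are hypothetical, and the pairing is the usual textbook convention rather than something stated on the slide.

```python
import statistics

def summarise(values, level):
    """Pick a measure of central tendency suited to the level of measurement."""
    if level == "nominal":
        return statistics.mode(values)        # labels only: most frequent value
    if level == "ordinal":
        return statistics.median_low(values)  # ordered labels: middle category
    if level in ("interval", "ratio"):
        return statistics.mean(values)        # distances are meaningful
    raise ValueError(f"unknown level of measurement: {level}")

print(summarise(["bus", "car", "bus"], "nominal"))
print(summarise([1, 2, 3, 4], "ordinal"))      # e.g. Never=1 ... Very often=4
print(summarise([10, 20, 30], "ratio"))
```

Taking the mean of nominal codes (e.g. Bus=1, Train=2, Car=3) would run without error but would be meaningless, which is why the level has to be known before choosing a technique.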

Page 27: Samples and Data

How does SPSS classify data?


Page 28: Samples and Data

Before next week…

Read “Introduction to SPSS” available on Study Direct

Download and install SPSS by following the instructions available on this page:

http://www.sussex.ac.uk/its/services/software/owncomputer
