nicholas-robbins
Course Overview
1. What is Evaluation?
2. Outcomes, Impact, and Indicators
3. Why Randomise?
4. How to Randomise?
5. Sampling and Sample Size
6. Threats and Analysis
7. Project from Start to Finish
8. Cost Effectiveness and Scaling
Lecture Outline
• Precision and accuracy
• Statistical tools
  - Population and sampling distribution
  - Law of Large Numbers and Central Limit Theorem
  - Standard deviation and standard error
Accuracy versus Precision
[Figure: grid of panels, each showing estimates scattered around the truth. One axis is labelled "Precision (Sample Size)", the other "Accuracy (Randomization)".]
This session’s question
• How large does the sample need to be for you to be able to detect a given treatment effect?
• Randomization removes the bias (ensures accuracy) but it does not remove noise
• We control precision with sample size
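The two claims above can be checked with a small simulation. This is a minimal sketch under stated assumptions, not part of the original slides: the true effect of 5 points, the normally distributed outcome, and the names `estimate_effect` and `summarize` are all hypothetical and chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 5.0  # hypothetical treatment effect, in test-score points


def estimate_effect(n):
    """Randomly assign n people to treatment or control and return the
    difference in mean outcomes (treatment minus control)."""
    treated, control = [], []
    for _ in range(n):
        baseline = random.gauss(50, 15)  # hypothetical noisy outcome
        if random.random() < 0.5:        # coin-flip assignment
            treated.append(baseline + TRUE_EFFECT)
        else:
            control.append(baseline)
    return statistics.mean(treated) - statistics.mean(control)


def summarize(n, repetitions=2000):
    """Repeat the experiment many times; return (average estimate, spread)."""
    estimates = [estimate_effect(n) for _ in range(repetitions)]
    return statistics.mean(estimates), statistics.stdev(estimates)


results = {n: summarize(n) for n in (40, 400)}
for n, (average, spread) in results.items():
    print(f"n={n}: average estimate={average:.2f}, spread of estimates={spread:.2f}")
```

At both sample sizes the average estimate should sit close to the assumed true effect (no bias), while the spread of individual estimates should be clearly smaller at n=400 than at n=40 (less noise).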
Population distribution
[Figure: histogram of the population distribution. Horizontal axis: test scores (0 to 100). Series: population frequency. Marked: population mean, 1 standard deviation, and a label at 26.]
Take a random sample : Sampling distribution
[Figure: population distribution overlaid with sampling distribution (1). Horizontal axis: test scores (0 to 100). Marked: population mean and a label at 26.]
• We generally don't have our population distribution, but we have our sampling distribution.
• What do we know about our sampling distribution?
• Two statistical laws help us here:
  (1) The Central Limit Theorem
  (2) The Law of Large Numbers
(1) Central Limit Theorem
[Figure: two distributions of test scores, with the annotation "To here…".]
This is the distribution of the population (Population Distribution).
This is the distribution of means from all random samples (Sampling Distribution).
Central Limit Theorem
[Figure: repeated draws from the population (Draw 7, Draw 8, Draw 9, Draw 10), each giving a mean test score.]
Draw 10 random students, take the average, plot it: 10 times.
Inadequate sample size
No clear distribution around population mean
[Figure: Frequency of Means With 10 draws]
More sample means around population mean
Still spread a good deal
Draw 10 random students: 50 and 100 times
[Figure: Frequency of Means With 50 draws]
[Figure: Frequency of Means With 100 draws]
Distribution now significantly more normal
Starting to see peaks
Draw 10 random students: 500 and 1000 times
[Figure: Frequency of Means With 500 draws]
[Figure: Frequency of Means With 1000 draws]
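The repeated-draw exercise can be reproduced in a few lines. This is only a sketch: the slides' actual test-score data are not available, so the skewed population below (a capped exponential with a mean near 26) is an assumption, and `sample_means` and `text_histogram` are hypothetical helper names.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population of 10,000 test scores between 0 and 100.
population = [min(100.0, random.expovariate(1 / 26)) for _ in range(10_000)]
population_mean = statistics.mean(population)


def sample_means(num_draws, sample_size=10):
    """Draw `sample_size` random students `num_draws` times; return each draw's mean."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(num_draws)
    ]


def text_histogram(means, bin_width=4, max_bar=50):
    """Print a crude histogram of the means, with bars scaled to `max_bar` characters."""
    counts = {}
    for m in means:
        bin_start = int(m // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    tallest = max(counts.values())
    for bin_start in sorted(counts):
        bar = "#" * max(1, round(counts[bin_start] * max_bar / tallest))
        print(f"{bin_start:3d}-{bin_start + bin_width:<3d} | {bar}")


draws = {num_draws: sample_means(num_draws) for num_draws in (10, 100, 1000)}
for num_draws, means in draws.items():
    print(f"\nFrequency of means with {num_draws} draws "
          f"(population mean = {population_mean:.1f})")
    text_histogram(means)
```

With only 10 draws the histogram has no clear shape; with 1000 draws the means should pile up around the population mean in a roughly bell-shaped pattern, even though the assumed population itself is strongly skewed.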
• This is a theoretical exercise. In reality we do not have multiple draws, we only have one draw.
• BUT, we can control the number of people in that draw. This is what we refer to as SAMPLE SIZE.
• The previous example was based on a sample size of 10
• What happens if we take a sample size of 50?
What happens to the sampling distribution if we draw a sample size of 50 instead of 10, and take the mean (thousands of times)?
A. We will approach a bell curve faster (than with a sample size of 10)
B. The bell curve will be narrower
C. Both A & B
D. Neither. The underlying sampling distribution does not change.
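One way to explore this question empirically is to compare the spread and the lopsidedness (skewness) of the sample means at the two sample sizes. The sketch below is illustrative only: the skewed population is an assumption standing in for the slides' test-score data, and `skewness` and `describe_means` are hypothetical helpers.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population of test scores (an assumption for illustration).
population = [min(100.0, random.expovariate(1 / 26)) for _ in range(10_000)]


def skewness(values):
    """Third standardized moment: 0 for a symmetric shape such as a bell curve."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def describe_means(sample_size, num_draws=5000):
    """Return (spread, skewness) of the means of `num_draws` random samples."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(num_draws)
    ]
    return statistics.stdev(means), skewness(means)


spread_10, skew_10 = describe_means(10)
spread_50, skew_50 = describe_means(50)
print(f"sample size 10: spread of means = {spread_10:.2f}, skewness = {skew_10:.2f}")
print(f"sample size 50: spread of means = {spread_50:.2f}, skewness = {skew_50:.2f}")
```

Under these assumptions the means from samples of 50 should be both less spread out and less skewed than the means from samples of 10.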
[Figure: N = 10 versus N = 50. Panels for each sample size: Frequency of Means With 5 Samples; Frequency of Means With 10 Samples.]
(2) Law of Large Numbers
[Figure: N = 10 versus N = 50. Panels for each sample size: Frequency of Means With 500 Samples; Frequency of Means With 1000 Samples.]
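The Law of Large Numbers can also be seen within a single sample: as the sample grows, its mean tends to settle near the population mean. The following is a minimal sketch, again assuming a hypothetical skewed population rather than the slides' data.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population of test scores (an assumption for illustration).
population = [min(100.0, random.expovariate(1 / 26)) for _ in range(10_000)]
population_mean = statistics.mean(population)

errors = {}
for n in (10, 100, 1_000, 10_000):
    sample = random.choices(population, k=n)  # one sample, drawn with replacement
    errors[n] = abs(statistics.mean(sample) - population_mean)
    print(f"sample size {n:>6}: |sample mean - population mean| = {errors[n]:.2f}")
```

The error is not guaranteed to shrink at every step, since each line is a single random sample, but it should be small for the largest sample sizes.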
Standard deviation/error
• What’s the difference between the standard deviation and the standard error?
• The standard error is the standard deviation of the sampling distribution; the standard deviation describes the spread of individual observations.
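The distinction can be checked numerically. The sketch below compares the standard deviation of many sample means with the textbook formula σ/√n, which the slides do not state explicitly. The population, the sample size of 25, and the variable names are assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population of test scores (an assumption for illustration).
population = [min(100.0, random.expovariate(1 / 26)) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # standard deviation of individual scores

sample_size = 25
means = [
    statistics.mean(random.choices(population, k=sample_size))  # with replacement
    for _ in range(5000)
]

empirical_se = statistics.stdev(means)       # SD of the sampling distribution
formula_se = sigma / math.sqrt(sample_size)  # sigma / sqrt(n)

print(f"standard deviation of scores:       {sigma:.2f}")
print(f"standard error (simulated):         {empirical_se:.2f}")
print(f"standard error (sigma / sqrt(n)):   {formula_se:.2f}")
```

The two standard-error figures should agree closely, and both should be much smaller than the standard deviation of the individual scores.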
Standard Deviation
[Figure: histogram of sample frequency. Horizontal axis: test scores (0 to 100). Marked: population mean, standard deviation, and a label at 26.]
Standard Error
[Figure: histogram of sample frequency. Horizontal axis: test scores (0 to 100). Marked: population mean, standard deviation, standard error, and a label at 26.]
Sample size ↑ x4, SE ↓ ½
[Figure: sample frequency and sample distribution. Horizontal axis: test scores (0 to 100). Marked: population mean, standard deviation, standard error, and a label at 26.]
Sample size ↑ x9, SE ↓ ?
[Figure: sample frequency and sample distribution. Horizontal axis: test scores (0 to 100). Marked: population mean, standard deviation, standard error, and a label at 26.]
Sample size ↑ x100, SE ↓?
[Figure: sample frequency and sample distribution. Horizontal axis: test scores (0 to 100). Marked: population mean, standard deviation, standard error, and a label at 26.]
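The scaling questions on these last slides can be worked out from the usual formula for the standard error of a sample mean. The slides do not state this formula, so it is an assumption here; it holds for independent draws from a population with standard deviation σ, where n is the sample size and k is the factor by which the sample size is multiplied.

```latex
\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
\mathrm{SE}(kn) = \frac{\sigma}{\sqrt{kn}} = \frac{\mathrm{SE}(n)}{\sqrt{k}}

k = 4:\ \mathrm{SE} \to \tfrac{1}{2}\,\mathrm{SE}, \qquad
k = 9:\ \mathrm{SE} \to \tfrac{1}{3}\,\mathrm{SE}, \qquad
k = 100:\ \mathrm{SE} \to \tfrac{1}{10}\,\mathrm{SE}
```

Under this formula, halving the standard error takes four times the sample, and cutting it to a tenth takes one hundred times the sample.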