Descriptive Statistics: Overview
Measures of Center: Mode, Median, Mean
Measures of Symmetry: Skewness
Measures of Spread: Range, Inter-quartile Range, Variance, Standard deviation
Measures of Position: Percentile, Deviation Score, Z-score
Central tendency
• Seeks to provide a single value that best represents a distribution
[Figure: histogram, Nightly Hours of Sleep (x-axis) vs. No. of People (y-axis)]
[Figure: histogram, # of wheels (x-axis) vs. # of vehicles (y-axis)]
[Figure: histogram, Income in 1,000s (x-axis) vs. No. of People (y-axis)]
• Typical measures are
– mode
– median
– mean
Mode
• the most frequently occurring score value
• corresponds to the highest point on the frequency distribution
[Figure: frequency distribution, Score (x-axis) vs. Frequency (y-axis)]
For a given sample N=16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45
The mode = 39
Mode
• The mode is not sensitive to extreme scores.
[Figure: frequency distribution with one extreme score, Score (x-axis) vs. Frequency (y-axis)]
For a given sample N=16:
33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 50
The mode = 39
Mode
• a distribution may have more than one mode
[Figure: frequency distribution with two peaks, Score (x-axis) vs. Frequency (y-axis)]
For a given sample N=16:
34 34 35 35 35 35 36 37 38 38 39 39 39 39 40 40
The modes = 35 and 39
Mode
• there may be no unique mode, as in the case of a rectangular distribution
[Figure: rectangular frequency distribution, Score (x-axis) vs. Frequency (y-axis)]
For a given sample N=16:
33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40
No unique mode
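The four samples from the mode slides can be checked with a short Python sketch. It uses `statistics.multimode` from the standard library, which returns every value tied for the highest frequency; the dictionary labels are names chosen here for illustration and are not from the slides.

```python
from statistics import multimode

# The four N=16 samples shown on the mode slides.
# The dictionary keys are illustrative labels, not slide terminology.
samples = {
    "single mode": [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45],
    "extreme score": [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 50],
    "two modes": [34, 34, 35, 35, 35, 35, 36, 37, 38, 38, 39, 39, 39, 39, 40, 40],
    "rectangular": [33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40],
}

# multimode returns a list of all values that share the highest count.
modes = {label: multimode(scores) for label, scores in samples.items()}

for label, found in modes.items():
    print(f"{label}: {found}")
```

In the rectangular case every score value ties for the highest count, so the list contains all eight values, which is one way to read "no unique mode".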
Median
• the score value that cuts the distribution in half (the “middle” score)
• 50th percentile
[Figure: frequency distribution, Score (x-axis) vs. Frequency (y-axis)]
For N = 15, the median is the eighth score = 37
Median
[Figure: frequency distribution, Score (x-axis) vs. Frequency (y-axis)]
For N = 16, the median is the average of the eighth and ninth scores = 37.5
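The odd and even cases can be sketched in Python. The data behind the median charts is not listed on the slides, so this example assumes the two-mode N=16 sample from the mode slides and drops its last score to get an N=15 sample; both choices are assumptions made for illustration.

```python
from statistics import median

# Assumption: reuse the two-mode N=16 sample from the mode slides.
scores_16 = [34, 34, 35, 35, 35, 35, 36, 37, 38, 38, 39, 39, 39, 39, 40, 40]
# Assumption: drop the last score to obtain an odd-sized sample (N=15).
scores_15 = scores_16[:-1]

# Odd N: the median is the middle (eighth) score.
median_15 = median(scores_15)
# Even N: the median is the average of the eighth and ninth scores.
median_16 = median(scores_16)

print(median_15, median_16)
```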
Mean
• this is what people usually have in mind when they say “average”
• the sum of the scores divided by the number of scores
Changing the value of a single score may not affect the mode or median, but it will affect the mean.
For a population:
$\mu = \frac{\sum X}{N}$
For a sample:
$\bar{X} = \frac{\sum X}{n}$
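A small Python sketch of the point that changing a single score moves the mean but may leave the mode and median alone. It uses the N=16 sample from the mode slides and its variant in which the top score of 45 becomes 50; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Sample from the mode slides (top score 45) and its variant (top score 50).
original = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
changed = original[:-1] + [50]

# Collect (mode, median, mean) for each version of the sample.
summary = {
    "original": (mode(original), median(original), mean(original)),
    "changed": (mode(changed), median(changed), mean(changed)),
}

for label, (mo, md, mn) in summary.items():
    print(f"{label}: mode={mo}, median={md}, mean={mn}")
```

Only the last value in each tuple differs between the two versions.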
Mean
In many cases the mean is the preferred measure of central tendency, both as a description of the data and as an estimate of the parameter.
[Figure: histogram, Nightly Hours of Sleep (x-axis) vs. No. of People (y-axis); $\bar{X} = 7.07$]
In order for the mean to be meaningful, the variable of interest must be measured on an interval scale.
[Figure: frequency distribution over the categories Buddhist, Protestant, Catholic, Jewish, Muslim (x-axis) vs. Frequency (y-axis); $\bar{X} = 2.4$]
Mean
The mean is sensitive to extreme scores and is appropriate for more symmetrical distributions.
[Figure: frequency distribution, Score (x-axis) vs. Frequency (y-axis); $\bar{X} = 36.8$]
[Figure: frequency distribution, Score (x-axis) vs. Frequency (y-axis); $\bar{X} = 36.5$]
[Figure: histogram, Income in 1,000s (x-axis) vs. No. of People (y-axis); $\bar{X} = 93.2$]
• a symmetrical distribution exhibits no skewness
• in a symmetrical distribution the Mean = Median = Mode
[Figure: histogram, Nightly Hours of Sleep (x-axis) vs. No. of People (y-axis)]
Symmetry
• Skewness refers to the asymmetry of the distribution
[Figure: histogram, Income in 1,000s (x-axis) vs. No. of People (y-axis)]
Skewed distributions
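• A positively skewed distribution is asymmetrical and points in the positive direction.
• mode < median < mean
Mode = 70,000$
Median = 88,700$
Mean = 93,600$
[Figure: positively skewed income distribution with the mode, median, and mean marked]
Skewed distributions
• A negatively skewed distribution is asymmetrical and points in the negative direction.
• mode > median > mean
[Figure: negatively skewed distribution, Test score (x-axis) vs. No. of People (y-axis), with the mean, median, and mode marked]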
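The ordering of mode, median, and mean under skew can be illustrated with a short Python sketch. Both samples below are hypothetical, invented here to have a long right tail and a long left tail respectively; they are not the data behind the charts.

```python
from statistics import mean, median, mode

# Hypothetical incomes in 1,000s with a long right tail (positive skew).
incomes = [20, 30, 30, 30, 40, 50, 60, 90, 200]
# Hypothetical test scores with a long left tail (negative skew).
test_scores = [10, 60, 70, 80, 85, 90, 90, 90, 95]

pos = (mode(incomes), median(incomes), mean(incomes))
neg = (mode(test_scores), median(test_scores), mean(test_scores))

print("positive skew: mode, median, mean =", pos)
print("negative skew: mode, median, mean =", neg)
```

In each case the mean is pulled furthest toward the tail, the median less so, and the mode stays at the peak.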
Measures of central tendency
Mode
+ quick & easy to compute
+ useful for nominal data
- poor sampling stability
Median
+ not affected by extreme scores
- somewhat poor sampling stability
Mean
+ sampling stability
+ related to variance
- inappropriate for discrete data
- affected by skewed distributions
Distributions
• Center: mode, median, mean
• Shape: symmetrical, skewed
• Spread
[Figure: frequency distribution, Scores (x-axis) vs. # of People (y-axis)]
Measures of Spread
• the dispersion of scores from the center
• a distribution of scores is highly variable if the scores differ wildly from one another
• Three statistics to measure variability
– range
– interquartile range
– variance
Range
• largest score minus the smallest score
• these two distributions have the same range (80), but their spreads look different
• says nothing about how scores vary around the center
• greatly affected by extreme scores (defined by them)
[Figure: two frequency distributions with the same range, Scores (x-axis) vs. # of People (y-axis)]
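A minimal Python sketch of why the range alone can mislead. The two samples are hypothetical, made up so that both span 10 to 90 while one is spread evenly and the other is bunched near 50; `score_range` is a helper name introduced here.

```python
def score_range(scores):
    """Largest score minus the smallest score."""
    return max(scores) - min(scores)

# Hypothetical samples: same extremes, very different spread in between.
evenly_spread = [10, 20, 30, 40, 50, 60, 70, 80, 90]
bunched = [10, 48, 49, 50, 50, 50, 51, 52, 90]

range_even = score_range(evenly_spread)
range_bunched = score_range(bunched)

# Count how many scores lie within 5 points of 50 in each sample.
near_center_even = sum(abs(s - 50) <= 5 for s in evenly_spread)
near_center_bunched = sum(abs(s - 50) <= 5 for s in bunched)

print(range_even, range_bunched)
print(near_center_even, near_center_bunched)
```

The ranges are identical because only the two extreme scores enter the calculation, while the counts near the center differ sharply.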
Interquartile range
• the distance between the 25th percentile and the 75th percentile
• Q3-Q1 = 70 - 30 = 40
• Q3-Q1 = 52.5 - 47.5 = 5
• effectively ignores the top and bottom quarters, so extreme scores are not influential
• dismisses 50% of the distribution
[Figure: two frequency distributions with different interquartile ranges, Scores (x-axis) vs. # of People (y-axis)]
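The interquartile range can be sketched with `statistics.quantiles`. Several conventions exist for locating quartiles, and the `method="inclusive"` setting used here is an assumption; other conventions can give slightly different Q1 and Q3. The nine scores 10 to 90 are taken from the deviation tables in this deck, and the version with 900 is a hypothetical extreme score added for illustration.

```python
from statistics import quantiles

def iqr(scores):
    """Q3 minus Q1, using the 'inclusive' quartile convention (an assumption)."""
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# Hypothetical: replace the top score with an extreme value.
with_extreme = scores[:-1] + [900]

iqr_plain = iqr(scores)
iqr_extreme = iqr(with_extreme)
range_plain = max(scores) - min(scores)
range_extreme = max(with_extreme) - min(with_extreme)

print("IQR:", iqr_plain, iqr_extreme)
print("Range:", range_plain, range_extreme)
```

The extreme score inflates the range but leaves the interquartile range where it was, since only the middle half of the ranked scores is used.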
Deviation measures
• Might be better to see how much scores differ from the center of the distribution -- using distance
• Scores further from the mean have higher deviation scores
Score Deviation
Amy 10 -40
Theo 20 -30
Max 30 -20
Henry 40 -10
Leticia 50 0
Charlotte 60 10
Pedro 70 20
Tricia 80 30
Lulu 90 40
AVERAGE 50
Deviation measures
• To see how ‘deviant’ the distribution is relative to another, we could sum these scores
• But this would leave us with a big fat zero
Score Deviation
Amy 10 -40
Theo 20 -30
Max 30 -20
Henry 40 -10
Leticia 50 0
Charlotte 60 10
Pedro 70 20
Tricia 80 30
Lulu 90 40
SUM 0
Deviation measures
So we use squared deviations from the mean
Score Deviation Sq. Deviation
Amy 10 -40 1600
Theo 20 -30 900
Max 30 -20 400
Henry 40 -10 100
Leticia 50 0 0
Charlotte 60 10 100
Pedro 70 20 400
Tricia 80 30 900
Lulu 90 40 1600
SUM 0 6000
This is the sum of squares (SS)
$SS = \sum (X - \bar{X})^2$
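The deviation table can be reproduced in a few lines of Python. This is a sketch using the nine scores from the slide; the variable names are chosen here for illustration.

```python
# The nine scores from the deviation tables (Amy through Lulu).
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

mean_score = sum(scores) / len(scores)

# Deviation score: each score minus the mean.
deviations = [x - mean_score for x in scores]

# Raw deviations cancel out, so their sum is zero.
sum_of_deviations = sum(deviations)

# Squaring removes the signs; the total is the sum of squares (SS).
sum_of_squares = sum(d ** 2 for d in deviations)

print(mean_score, sum_of_deviations, sum_of_squares)
```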
Variance
We take the “average” squared deviation from the mean and call it VARIANCE
For a population:
$\sigma^2 = \frac{SS}{N}$
For a sample:
$s^2 = \frac{SS}{n-1}$
(dividing by n-1 corrects for the fact that the sample variance tends to underestimate the population variance)
Variance
1. Find the mean.
2. Subtract the mean from every score.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the SS by N or N-1.
Score Dev’n Sq. Dev.
Amy 10 -40 1600
Theo 20 -30 900
Max 30 -20 400
Henry 40 -10 100
Leticia 50 0 0
Charlotte 60 10 100
Pedro 70 20 400
Tricia 80 30 900
Lulu 90 40 1600
SUM 0 6000 6000/8=750
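The five steps can be followed literally in Python and cross-checked against the standard library, where `statistics.variance` divides by n-1 and `statistics.pvariance` divides by N. This is a sketch using the nine scores from the table; the variable names are illustrative.

```python
from statistics import pvariance, variance

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)

# 1. Find the mean.
mean_score = sum(scores) / n
# 2. Subtract the mean from every score.
deviations = [x - mean_score for x in scores]
# 3. Square the deviations.
squared = [d ** 2 for d in deviations]
# 4. Sum the squared deviations.
ss = sum(squared)
# 5. Divide the SS by N (population) or n-1 (sample).
population_variance = ss / n
sample_variance = ss / (n - 1)

print(sample_variance, variance(scores))
print(population_variance, pvariance(scores))
```

The sample result matches the 6000/8 = 750 shown in the table.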
Standard deviation
The standard deviation is the square root of the variance
$s = \sqrt{s^2} = \sqrt{\frac{SS}{n-1}}$
The standard deviation measures spread in the original units of measurement, while the variance does so in units squared.
Variance is good for inferential stats. Standard deviation is nice for descriptive stats.
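As a quick check, the sample standard deviation for the nine-score example is the square root of 6000/8. The sketch below computes it by hand and with `statistics.stdev`, which also uses the n-1 divisor.

```python
import math
from statistics import stdev

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

ss = 6000                  # sum of squares from the worked table
n = len(scores)

sample_variance = ss / (n - 1)        # in squared score units
sample_sd = math.sqrt(sample_variance)  # back in the original score units

print(round(sample_sd, 2), round(stdev(scores), 2))
```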
Example
[Figure: two frequency distributions, Scores (x-axis) vs. # of People (y-axis)]
N = 28, $\bar{X} = 50$, $s^2 = 555.55$, $s = 23.57$
N = 28, $\bar{X} = 50$, $s^2 = 140.74$, $s = 11.86$
Descriptive Statistics: Quick Review
Measures of Center: Mode, Median, Mean
Measures of Symmetry: Skewness
Measures of Spread: Range, Inter-quartile Range, Variance, Standard deviation
Descriptive Statistics: Quick Review
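Mean
For a population: $\mu = \frac{\sum X}{N}$
For a sample: $\bar{X} = \frac{\sum X}{n}$
Variance
For a population: $\sigma^2 = \frac{SS}{N}$
For a sample: $s^2 = \frac{SS}{n-1}$
Standard Deviation
For a population: $\sigma = \sqrt{\sigma^2}$
For a sample: $s = \sqrt{s^2}$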
Exercise
• Treat this little distribution as a sample and calculate:
– Mode, median, mean
– Range, variance, standard deviation
[Figure: small frequency distribution over the score values 1 to 5]
Measures of Position
How to describe a data point in relation to its distribution
• Quantile
• Deviation Score
• Z-score
Quantiles
Quartile
Divides ranked scores into four equal parts
(minimum) 25% 25% (median) 25% 25% (maximum)
Decile
Divides ranked scores into ten equal parts
10% 10% 10% 10% 10% 10% 10% 10% 10% 10%
Percentile rank
Divides ranked scores into 100 equal parts
$\text{Percentile rank of score } x = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100$
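The percentile rank formula translates directly into Python. This sketch applies it to the nine scores used in the deviation tables; `percentile_rank` is a helper name introduced here, not a library function.

```python
def percentile_rank(x, scores):
    """Percentage of scores strictly less than x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

# The nine scores from the deviation tables (Amy through Lulu).
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

rank_lowest = percentile_rank(10, scores)   # no scores below the minimum
rank_middle = percentile_rank(50, scores)   # four of nine scores are below 50
rank_highest = percentile_rank(90, scores)  # eight of nine scores are below 90

print(rank_lowest, round(rank_middle, 1), round(rank_highest, 1))
```

With this definition the top score never reaches a rank of 100, because a score is not counted as being below itself.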
Deviation Scores
Score Deviation
Amy 10 -40
Theo 20 -30
Max 30 -20
Henry 40 -10
Leticia 50 0
Charlotte 60 10
Pedro 70 20
Tricia 80 30
Lulu 90 40
Average 50
For a population:
$\text{deviation} = X - \mu$
For a sample:
$\text{deviation} = X - \bar{X}$
• What if we want to compare scores from distributions that have different means and standard deviations?
• Example
– Nine students' scores on two different tests
– Tests scored on different scales
Nine Students on Two Tests
Test 1 Test 2
Amy 10 1
Theo 20 2
Max 30 3
Henry 40 4
Leticia 50 5
Charlotte 60 6
Pedro 70 7
Tricia 80 8
Lulu 90 9
Average 50 5
Nine Students on Two Tests
Test 1 Test 2 Deviation Score 1 Deviation Score 2
Amy 10 1 -40 -4
Theo 20 2 -30 -3
Max 30 3 -20 -2
Henry 40 4 -10 -1
Leticia 50 5 0 0
Charlotte 60 6 10 1
Pedro 70 7 20 2
Tricia 80 8 30 3
Lulu 90 9 40 4
Average 50 5
Z-Scores
• Z-scores modify a distribution so that it is centered on 0 with a standard deviation of 1
• Subtract the mean from a score, then divide by the standard deviation
For a population:
$z = \frac{X - \mu}{\sigma}$
For a sample:
$z = \frac{X - \bar{X}}{s}$
Z-Scores
Test 1 Test 2 Z-Score 1 Z-Score 2
Amy 10 1 -1.5 -1.5
Theo 20 2 -1.2 -1.2
Max 30 3 -.77 -.77
Henry 40 4 -.39 -.39
Leticia 50 5 0 0
Charlotte 60 6 .39 .39
Pedro 70 7 .77 .77
Tricia 80 8 1.2 1.2
Lulu 90 9 1.5 1.5
Average 50 5 0 0
St Dev 25.8 2.58 1 1
Z-Scores
A distribution of Z-scores…
• Always has a mean of zero
• Always has a standard deviation of 1
• Converting to standard or z scores does not change the shape of the distribution: z scores cannot normalize a non-normal distribution
A Z-score is interpreted as “number of standard deviations above/below the mean”
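The two-test comparison can be sketched in Python. The St Dev values in the table (25.8 and 2.58) match the population formula, which divides by N, so this example uses `statistics.pstdev`; the n-1 formula would give about 27.4 for Test 1. `z_scores` is a helper name introduced here.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Subtract the mean from each score, then divide by the standard deviation.

    Assumption: the population standard deviation (divide by N) is used,
    since that is what reproduces the 25.8 and 2.58 in the table.
    """
    m = mean(scores)
    sd = pstdev(scores)
    return [(x - m) / sd for x in scores]

test_1 = [10, 20, 30, 40, 50, 60, 70, 80, 90]
test_2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

z1 = z_scores(test_1)
z2 = z_scores(test_2)

for a, b in zip(z1, z2):
    print(f"{a:6.2f} {b:6.2f}")

# The z-score distribution itself has mean 0 and standard deviation 1.
print(round(mean(z1), 6), round(pstdev(z1), 6))
```

Although the tests are scored on different scales, each student ends up with the same z-score on both.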
Exercise
On their third test, the class average was 45 and the standard deviation was 6. Fill in the rest.
Test 3 Z-Score
Amy 52 ___
Theo 39 ___
Max ___ -1.5
Henry ___ 1.3
Descriptive Statistics: Quick Review
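Mean
For a population: $\mu = \frac{\sum X}{N}$
For a sample: $\bar{X} = \frac{\sum X}{n}$
Variance
For a population: $\sigma^2 = \frac{SS}{N}$
For a sample: $s^2 = \frac{SS}{n-1}$
Standard Deviation
For a population: $\sigma = \sqrt{\sigma^2}$
For a sample: $s = \sqrt{s^2}$
Z-score
For a population: $z = \frac{X - \mu}{\sigma}$
For a sample: $z = \frac{X - \bar{X}}{s}$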
Messing with Units
If you add or subtract a constant from each value in a distribution, then
• the mean is increased/decreased by that amount
• the standard deviation is unchanged
• the z-scores are unchanged
If you multiply or divide each value in a distribution by a positive constant, then
• the mean is multiplied/divided by that amount
• the standard deviation is multiplied/divided by that amount
• the z-scores are unchanged
Example
Score Dev’s Sq dev Z-score
Theo 5 -1 1 -.5
Max 3 -3 9 -1.5
Henry 5 -1 1 -.5
Leticia 7 1 1 .5
Charlotte 7 1 1 .5
Pedro 8 2 4 1.0
Tricia 4 -2 4 -1.0
Lulu 9 3 9 1.5
MEAN 6 STDEV 1.94
Adding 1
Score Dev’s Sq dev Z-score
Theo 6 -1 1 -.5
Max 4 -3 9 -1.5
Henry 6 -1 1 -.5
Leticia 8 1 1 .5
Charlotte 8 1 1 .5
Pedro 9 2 4 1.0
Tricia 5 -2 4 -1.0
Lulu 10 3 9 1.5
MEAN 7 STDEV 1.94
Example
Score Dev’s Sq dev Z-score
Theo 5 -1 1 -.5
Max 3 -3 9 -1.5
Henry 5 -1 1 -.5
Leticia 7 1 1 .5
Charlotte 7 1 1 .5
Pedro 8 2 4 1.0
Tricia 4 -2 4 -1.0
Lulu 9 3 9 1.5
MEAN 6 STDEV 1.94
Multiplying by 10
Score Dev’s Sq dev Z-score
Theo 50 -10 100 -.5
Max 30 -30 900 -1.5
Henry 50 -10 100 -.5
Leticia 70 10 100 .5
Charlotte 70 10 100 .5
Pedro 80 20 400 1.0
Tricia 40 -20 400 -1.0
Lulu 90 30 900 1.5
MEAN 60 STDEV 19.4
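The effect of shifting and rescaling can be verified with a short Python sketch on the eight example scores. The STDEV of 1.94 in the tables matches the population formula (divide by N), so `statistics.pstdev` is assumed here; `describe` is a helper name introduced for illustration.

```python
from statistics import mean, pstdev

def describe(scores):
    """Return (mean, standard deviation, z-scores) using the divide-by-N formula."""
    m = mean(scores)
    sd = pstdev(scores)
    z = [(x - m) / sd for x in scores]
    return m, sd, z

# Theo, Max, Henry, Leticia, Charlotte, Pedro, Tricia, Lulu
scores = [5, 3, 5, 7, 7, 8, 4, 9]
plus_one = [x + 1 for x in scores]
times_ten = [x * 10 for x in scores]

base_mean, base_sd, base_z = describe(scores)
add_mean, add_sd, add_z = describe(plus_one)
mul_mean, mul_sd, mul_z = describe(times_ten)

print(base_mean, round(base_sd, 2))
print(add_mean, round(add_sd, 2))
print(mul_mean, round(mul_sd, 2))
```

Adding 1 moves only the mean, multiplying by 10 scales both the mean and the standard deviation, and the z-scores come out the same in all three cases.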
Other Standardized Distributions
The Z distribution is not the only standardized distribution. You can easily create others (it’s just messing with units, really).
Score
Theo 5
Max 3
Henry 5
Leticia 7
Charlotte 7
Pedro 8
Tricia 4
Lulu 9
Average 6
St Dev 1.94
Example:
Let’s change these test scores into ETS type scores (mean 500, stdev 100)
Other Standardized Distributions
Score Z-Score ETS type score
Theo 5 -.5 450
Max 3 -1.5 350
Henry 5 -.5 450
Leticia 7 .5 550
Charlotte 7 .5 550
Pedro 8 1.0 600
Tricia 4 -1.0 400
Lulu 9 1.5 650
Average 6 0 500
St Dev 1.94 1 100
Here’s How:
Convert to Z scores
Multiply by 100 to increase the st dev
Add 500 to increase the mean
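The three steps can be sketched in Python. `to_standard_scale` is a helper name introduced here, and the population standard deviation (divide by N) is assumed because it reproduces the St Dev of 1.94 in the table. The slide rounds z-scores to one decimal before rescaling, so its values (for example 350) differ slightly from the unrounded results computed below.

```python
from statistics import mean, pstdev

def to_standard_scale(scores, new_mean, new_sd):
    """Convert to z-scores, multiply by the new st dev, then add the new mean."""
    m = mean(scores)
    sd = pstdev(scores)  # assumption: divide-by-N formula, as in the table
    return [new_mean + new_sd * (x - m) / sd for x in scores]

# The eight example scores from the table.
scores = [5, 3, 5, 7, 7, 8, 4, 9]

# ETS-type scale: mean 500, standard deviation 100.
ets_scores = to_standard_scale(scores, new_mean=500, new_sd=100)

print([round(s) for s in ets_scores])
print(round(mean(ets_scores), 6), round(pstdev(ets_scores), 6))
```

The same helper produces any other standardized scale by changing the two target values.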
Other Standardized Distributions
Exercise
Score Percentile Deviation Score Z-Score IQ type score (Mean 100, Stdev 10)
Theo 20
Max 18
Henry 13
Leticia 17
Charlotte 19
Pedro 16
Tricia 11
Lulu 9