Measures of Variability:
“The crowd was scattered all across the park, but a fairly large group was huddled together around the
statue in the middle.”
Why Can’t Everyone Be Like Me?
Have you ever noticed that while many objects are very similar, they are not exactly alike?
• Is every quarter-pounder exactly a quarter-pound?
• Why do two pairs of pants the same size fit slightly differently?
• How much do adults differ in the amount of sleep they need?
All the above are concerned with variability.
Measure of variability - a value which indicates the degree to which a set of scores is clustered or scattered around a measure of central tendency.
Measures of variability do not:
1. specify how far a particular score diverges from the mean
2. provide information about the level of performance of a set of scores
3. describe the shape of a distribution
We will examine four measures of variability:
1. range
2. interquartile range (and semiinterquartile range)
3. standard deviation
4. index of dispersion
Range
The range is the difference between the upper-exact limit of the highest score and the lower-exact limit of the lowest score.
In the data below, 28 is the highest score and 12 is the lowest. Therefore, the range is 28.5 - 11.5 = 17.

Score   f      Score   f
28      1      19      2
27      0      18      5
26      1      17      7
25      2      16      2
24      2      15      5
23      2      14      0
22      1      13      2
21      2      12      1
20      5
n = 40
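The range calculation above can be sketched in a few lines of Python. This is only an illustration: the `freq` dictionary and the `exact_limit_range` helper are names made up here, and the measurement unit is assumed to be 1 (whole-number scores), so the exact limits sit half a point beyond the extreme scores.

```python
# Frequency table from the text: score -> frequency (n = 40).
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

def exact_limit_range(freq, unit=1.0):
    """Upper exact limit of the highest score minus lower exact limit of the lowest.

    `unit` is the measurement unit (assumed to be 1 for whole-number scores).
    Only scores that actually occur (f > 0) are considered.
    """
    observed = [score for score, f in freq.items() if f > 0]
    return (max(observed) + unit / 2) - (min(observed) - unit / 2)

score_range = exact_limit_range(freq)
print(score_range)  # 17.0
```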
Advantages of the Range
The range:
– is easy to calculate
– is easily understood by general audiences
– can provide a very quick and dirty idea of dispersion
Disadvantages of the Range
The range:
- does not tell us about the scores between the end points
[Figure: two distributions, each running from 10 to 30, so range = 20 for both]
- a single extreme score can grossly distort the degree of variability
- in general, the larger the sample size, the larger the range
- the range is a terminal statistic
Interquartile Range
The interquartile range is the difference between the 1st and 3rd quartiles.
[Figure: a distribution divided into four segments of 25% each by Q1, Q2, and Q3]
You can see from the diagram that Q2 is actually the median.
Another way to think of it is that Q2 is the same as the centile with a rank of 50 (i.e., the score below which there are 50% of the cases).
In the same way, Q1 and Q3 are the centiles with ranks of 25 and 75, respectively. That is, they are the scores below which there are 25% and below which there are 75% of the cases.
Once the centiles are calculated, you simply calculate the difference:
Interquartile range = Q3 - Q1
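Consider the following data:

Score     f     cum f
27 - 29   1     40
24 - 26   5     39
21 - 23   5     34
18 - 20   14    29
15 - 17   12    15
12 - 14   3     3
n = 40

Q3 (C75): 40 x .75 = 30; [(1/5) x 3] + 20.5 = 21.10
Q2 (C50): 40 x .5 = 20; [(5/14) x 3] + 17.5 = 18.57 (unnecessary for calculation of the interquartile range)
Q1 (C25): 40 x .25 = 10; [(7/12) x 3] + 14.5 = 16.25

In each bracket, the fraction is the number of cases still needed beyond the cumulative frequency below the interval, divided by the frequency in the interval; 3 is the interval width; and the added term is the interval's lower exact limit.

Interquartile range = Q3 - Q1
= 21.10 - 16.25 = 4.85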
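The centile arithmetic above can be checked with a short Python sketch. It assumes linear interpolation within the interval that contains the target case, with exact limits half a point beyond the stated score limits; the `intervals` list and the `centile` function are illustrative names, not part of the original slides.

```python
# Grouped frequency table from the text, lowest interval first:
# (lower score limit, upper score limit, frequency)
intervals = [(12, 14, 3), (15, 17, 12), (18, 20, 14),
             (21, 23, 5), (24, 26, 5), (27, 29, 1)]

def centile(intervals, rank):
    """Score below which `rank` percent of the cases fall (linear interpolation)."""
    n = sum(f for _, _, f in intervals)
    target = n * rank / 100          # number of cases that must fall below the centile
    cum_below = 0                    # cumulative frequency below the current interval
    for low, high, f in intervals:
        if f > 0 and cum_below + f >= target:
            lower_exact = low - 0.5              # lower exact limit of the interval
            width = (high + 0.5) - lower_exact   # interval width (3 here)
            return lower_exact + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("rank must be between 0 and 100")

q1 = centile(intervals, 25)
q2 = centile(intervals, 50)
q3 = centile(intervals, 75)
iqr = q3 - q1
print(round(q1, 2), round(q2, 2), round(q3, 2), round(iqr, 2))  # 16.25 18.57 21.1 4.85
```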
Advantages of the Interquartile Range
The interquartile range:
– is not sensitive to extreme scores
– is the only reasonable measure of variability with open-ended distributions
– should be used with highly skewed distributions
Disadvantages of the Interquartile Range
The interquartile range:
– is a terminal statistic
– is unfamiliar to most people
A related measure is the semiinterquartile range.
It is half the distance between the first and third quartiles:
Semiinterquartile range = (Q3 - Q1) / 2
A Short Tangent
Below are several people standing near a tree.
Distances from the tree: 10 ft., 7 ft., 0 ft., 6 ft., 9 ft.
If we wanted to find out, on average, how far the people were from the tree, we could simply add the distances and divide by the number of people: (10 + 7 + 0 + 6 + 9) / 5 = 6.4 ft.
Standard Deviation
Now consider the following data:

Score   f      Score   f
28      1      19      2
27      0      18      5
26      1      17      7
25      2      16      2
24      2      15      5
23      2      14      0
22      1      13      2
21      2      12      1
20      5
n = 40

X̄ = 18.85
You can see that some scores are closer to the mean than are others.
We can determine the distance a score is from the mean by calculating a deviation score which indicates how far a score is above or below the mean.
Deviation Score: A Brief Review
x = X - X̄ tells us the position of X relative to X̄.
For example, a score of 24 would have a deviation score of 5.15: x = 24 - 18.85 = 5.15. That is, it is 5.15 points above the mean.
A score of 16, in contrast, would have a deviation score of -2.85: x = 16 - 18.85 = -2.85. That is, it is 2.85 points below the mean.
[Figure: number line marking the scores 16 and 24 at deviations of -2.85 and 5.15 from the mean of 18.85]
[Figure: individual scores plotted around the mean of 18.85]
Using deviation scores, we could find out how far away each score is from the mean.
x = X - X̄: -5.82, 3.71, -2.62, 2.75, . . . , 2.64, -2.83, -2.02, 2.33
If we wanted to find the average of those distances, we could add them all and divide by the number of scores. Unfortunately, since the mean is the balance point, Σx = 0.
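The balance-point property can be checked with a small Python sketch. It assumes the 40-score frequency table shown earlier in the deck as input (the deviation scores listed above are abbreviated, so they are not used here); the `freq` dictionary and variable names are illustrative. Exact fractions are used so that rounding error does not blur the result.

```python
from fractions import Fraction

# Frequency table from earlier in the deck: score -> frequency (n = 40).
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

n = sum(freq.values())
mean = Fraction(sum(score * f for score, f in freq.items()), n)

# Sum of deviation scores x = X - mean, each weighted by its frequency.
deviation_sum = sum((score - mean) * f for score, f in freq.items())
print(float(mean), deviation_sum)  # 18.85 0
```

Because the positive and negative deviations cancel exactly, their plain average carries no information about spread, which is why the next step works with absolute values.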
What we can do, however, is take the absolute value of each deviation score and find the mean of them:
|x| = |X - X̄|: 5.82, 3.71, 2.62, 2.75, . . . , 2.64, 2.83, 2.02, 2.33
Σ|x| = 142.56
Mean distance = 142.56 / 40 = 3.56
Standard Deviation
…the “average” distance a set of scores is from the mean.
DO NOT FORGET THIS!
Well, Not Exactly…
The definition just given, while an excellent way of understanding and interpreting “standard deviation,” is not technically correct (but it is a mean):

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{\sum x^2}{n}}$$

Calculation formula:

$$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}}$$

(just a reminder)

$$\bar{X} = \frac{\sum X}{n}$$
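As a rough check that the definitional and calculation formulas agree, here is a Python sketch. It assumes the 40-score frequency table shown earlier in the deck as input and divides by n, as the formulas in this deck do; the `freq` dictionary and variable names are illustrative.

```python
import math

# Frequency table from earlier in the deck: score -> frequency (n = 40).
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

n = sum(freq.values())
sum_x = sum(score * f for score, f in freq.items())        # sum of X
sum_x2 = sum(score ** 2 * f for score, f in freq.items())  # sum of X squared
mean = sum_x / n

# Definitional formula: square root of the mean squared deviation.
s_def = math.sqrt(sum(f * (score - mean) ** 2 for score, f in freq.items()) / n)

# Calculation formula: uses only the sum of X and the sum of X squared.
s_calc = math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

print(round(s_def, 2), round(s_calc, 2))  # 3.73 3.73
```

The calculation formula avoids computing each deviation score, which is convenient when working by hand from a frequency table.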
Advantages of the Standard Deviation
The standard deviation:
– is quite resistant to sampling variability
– is mathematically tractable
Disadvantages of the Standard Deviation
The standard deviation:
– is not a good index of variability with a few very extreme scores
– should not be used with highly skewed distributions
– cannot be used with open-ended distributions
Coefficient of Variation
Consider the following:
X̄1 = 9.00, S1 = 3.00
X̄2 = 90.00, S2 = 3.00
Note that the dispersion relative to the mean appears considerably greater in the first case (S1 relative to X̄1) than in the second (S2 relative to X̄2).
Coefficient of Variation
If two means are very different, we may consider a relative measure of dispersion:
CV = 100 (S / X̄)
In our example:
CV1 = 100 (3.00 / 9.00) = 33.33
CV2 = 100 (3.00 / 90.00) = 3.33
The larger the CV, the larger the dispersion relative to the mean.
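The two CV values can be reproduced with a minimal Python sketch. The function name `coefficient_of_variation` is illustrative, and the guard against a zero mean is an added assumption (the ratio is undefined there), not something stated in the slides.

```python
def coefficient_of_variation(sd, mean):
    """CV = 100 * (S / mean); undefined when the mean is zero."""
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return 100 * sd / mean

cv1 = coefficient_of_variation(3.00, 9.00)
cv2 = coefficient_of_variation(3.00, 90.00)
print(round(cv1, 2), round(cv2, 2))  # 33.33 3.33
```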
Coefficient of Variation
The coefficient of variation is also useful when comparing the standard deviations of two variables with different units of measure (e.g., SAT scores vs. age).
Index of Dispersion (D)
• When you have a qualitative variable, the index of dispersion is available as a measure of variability.
• It is defined as the ratio of the number of distinguishable pairs (DP) to the number of distinguishable pairs under the condition of maximum dispersion (DPmax):
D = DP / DPmax
Consider the following data from a survey asking individuals their political affiliation:

Political Affiliation
Category A: a1, a2
Category B: b1, b2, b3, b4

Eight pairs of observations can be distinguished:
a1b1 a1b2 a1b3 a1b4
a2b1 a2b2 a2b3 a2b4

Can distinguish between this pair: (a2b3)
Cannot distinguish between this pair: (b2b4)
The layout below illustrates the “condition of maximum dispersion” (i.e., if the observations were equally spread across the available categories):

Political Affiliation
Category A: a1, a2, a3
Category B: b1, b2, b3

Nine pairs of observations can be distinguished under the condition of maximum dispersion:
a1b1 a1b2 a1b3
a2b1 a2b2 a2b3
a3b1 a3b2 a3b3
D = DP / DPmax = 8 / 9 = .8889
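The pair counting can be sketched in Python by enumerating every pair of observations and keeping those that fall in different categories. The category labels "A" and "B" and the helper name `distinguishable_pairs` are illustrative; the counts (2 and 4 observed, 3 and 3 under maximum dispersion) come from the example above.

```python
from itertools import combinations

def distinguishable_pairs(labels):
    """Count pairs of observations that fall in different categories."""
    return sum(1 for a, b in combinations(labels, 2) if a != b)

observed = ["A"] * 2 + ["B"] * 4      # survey data from the example
max_spread = ["A"] * 3 + ["B"] * 3    # same 6 observations, spread evenly

dp = distinguishable_pairs(observed)
dp_max = distinguishable_pairs(max_spread)
d = dp / dp_max
print(dp, dp_max, round(d, 4))  # 8 9 0.8889
```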
• D can range from 0 to 1.
• “0” if all observations are in one category and none are in any other
• “1” if all observations are equally divided among the categories
• D should be interpreted as the percent of DPmax
• Useful when comparing two distributions with an equal number of categories
Index of Dispersion
Computational Formula for D

$$D = \frac{c\left(n^2 - \sum_{j=1}^{c} n_j^2\right)}{n^2 (c - 1)}$$

where:
n = number of observations
c = number of categories
n_j = number of observations in category j
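A minimal Python sketch of the computational formula follows, applied to the political affiliation example (2 observations in Category A, 4 in Category B). The function name `index_of_dispersion` and the input checks are assumptions added for illustration.

```python
def index_of_dispersion(counts):
    """D = c * (n^2 - sum of n_j^2) / (n^2 * (c - 1)) for category counts n_j."""
    c = len(counts)   # number of categories
    n = sum(counts)   # number of observations
    if c < 2 or n == 0:
        raise ValueError("need at least two categories and one observation")
    return c * (n ** 2 - sum(nj ** 2 for nj in counts)) / (n ** 2 * (c - 1))

print(round(index_of_dispersion([2, 4]), 4))  # 0.8889
print(index_of_dispersion([6, 0]))            # 0.0
print(index_of_dispersion([3, 3]))            # 1.0
```

The last two calls show the endpoints: D is 0 when every observation sits in one category and 1 when the observations are split evenly.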