Measures of Variability: “The crowd was scattered all across the park, but a fairly large group was huddled together around the statue in the middle.”


Why Can’t Everyone Be Like Me?

Have you ever noticed that while many objects are very similar, they are not exactly alike?

• Is every quarter-pounder exactly a quarter-pound?

• Why do two pairs of pants the same size fit slightly differently?

• How much do adults differ in the amount of sleep they need?

All of the above questions concern variability.


Measure of variability - a value which indicates the degree to which a set of scores is clustered or scattered around a measure of central tendency.

What measures of variability do not do is:

1. specify how far a particular score diverges from the mean

2. provide information about the level of performance of a set of scores

3. describe the shape of a distribution


We will examine four measures of variability:

1. range

2. interquartile range (and semiinterquartile range)

3. standard deviation

4. index of dispersion


Range

The range is the difference between the upper-exact limit of the highest score and the lower-exact limit of the lowest score.

In the data below, 28 is the highest score and 12 is the lowest. Therefore, the range is 28.5-11.5 = 17.

Score   f     Score   f
28      1     19      2
27      0     18      5
26      1     17      7
25      2     16      2
24      2     15      5
23      2     14      0
22      1     13      2
21      2     12      1
20      5
n = 40
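As a rough sketch, the range for the table above could be computed in Python along the following lines. The name `freq` and the assumption that scores are whole numbers (so that exact limits lie 0.5 above and below a score) are illustrative and not part of the original slides.

```python
# Frequency table from the slide: score -> frequency (f).
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

# Only scores that actually occur (f > 0) count toward the range.
observed = [score for score, f in freq.items() if f > 0]

# Assumption: whole-number scores, so exact limits are +/- 0.5.
upper_exact = max(observed) + 0.5   # 28.5
lower_exact = min(observed) - 0.5   # 11.5
score_range = upper_exact - lower_exact

print(score_range)  # 17.0
```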


Advantages of the Range

The range:

– is easy to calculate

– is easily understood by general audiences

– can provide a very quick and dirty idea of dispersion


Disadvantages of the Range

The range:

- does not tell us about the scores between the end points

[Figure: two distributions, each running from 10 to 30, so each has range = 20]


- a single extreme score can grossly distort the degree of variability

- in general, the larger the sample size, the larger the range

- the range is a terminal statistic


Interquartile Range

The interquartile range is the difference between the 1st and 3rd quartiles.

[Figure: a distribution divided by Q1, Q2, and Q3 into four segments, each containing 25% of the cases]

You can see from the diagram that Q2 is actually the median.


Another way to think of it is that Q2 is the same as the centile with a rank of 50 (i.e., the score below which there are 50% of the cases).

In the same way, Q1 and Q3 are the centiles with ranks of 25 and 75, respectively. That is, they are the scores below which there are 25% and 75% of the cases.

Once the centiles are calculated, you simply calculate the difference:

Interquartile range = Q3 - Q1


Consider the following data:

Score      f     cum f
27 - 29    1     40
24 - 26    5     39
21 - 23    5     34
18 - 20    14    29
15 - 17    12    15
12 - 14    3     3
n = 40

Q3 (C75): 40 x .75 = 30; Q3 = [(1/5) x 3] + 20.5 = 21.10

Q2 (C50): 40 x .5 = 20; Q2 = [(5/14) x 3] + 17.5 = 18.57 (unnecessary for calculation of interquartile range)

Q1 (C25): 40 x .25 = 10; Q1 = [(7/12) x 3] + 14.5 = 16.25

Interquartile range = Q3 - Q1 = 21.10 - 16.25 = 4.85
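The interpolation used above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `centile` and the tuple layout of `intervals` are hypothetical, and scores are assumed to be whole numbers, so the lower exact limit of an interval is its lowest score minus 0.5.

```python
# Grouped frequency table from the slide, lowest interval first:
# (lowest score, highest score, frequency)
intervals = [(12, 14, 3), (15, 17, 12), (18, 20, 14),
             (21, 23, 5), (24, 26, 5), (27, 29, 1)]


def centile(intervals, rank):
    """Score below which `rank` percent of the cases fall (linear interpolation)."""
    n = sum(f for _, _, f in intervals)
    target = n * rank / 100            # e.g. 40 x .75 = 30 cases
    cum_below = 0                      # cases below the current interval
    for low, high, f in intervals:
        if f > 0 and cum_below + f >= target:
            lower_exact = low - 0.5    # assumes whole-number scores
            width = high - low + 1     # interval width (3 in this table)
            return lower_exact + (target - cum_below) / f * width
        cum_below += f
    raise ValueError("rank must be between 0 and 100")


q1 = centile(intervals, 25)
q3 = centile(intervals, 75)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))  # 16.25 21.1 4.85
```

Calling `centile(intervals, 50)` gives the median in the same way.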


Advantages of the Interquartile Range

The interquartile range:

– is not sensitive to extreme scores

– is the only reasonable measure of variability with open-ended distributions

– should be used with highly skewed distributions

[Figure: a distribution divided into quarters by Q1, Q2, and Q3]


Disadvantages of the Interquartile Range

The interquartile range:

– is a terminal statistic

– is unfamiliar to most people


A related measure is the semiinterquartile range.

It is half the distance between the first and third quartiles:

Semiinterquartile range = (Q3 - Q1) / 2


A Short Tangent

Below are several people standing near a tree.

Distances from the tree: 10 ft., 7 ft., 0 ft., 6 ft., 9 ft.

If we wanted to find out, on average, how far the people were from the tree, we could simply add the distances and divide by the number of people: (10 + 7 + 0 + 6 + 9) / 5 = 6.4 ft.


Standard Deviation

Now consider the following data:

Score   f     Score   f
28      1     19      2
27      0     18      5
26      1     17      7
25      2     16      2
24      2     15      5
23      2     14      0
22      1     13      2
21      2     12      1
20      5
n = 40

$\bar{X} = 18.85$


You can see that some scores are closer to the mean than are others.


We can determine the distance a score is from the mean by calculating a deviation score which indicates how far a score is above or below the mean.


Deviation Score: A Brief Review

$x = X - \bar{X}$ tells us the position of $X$ relative to $\bar{X}$.

For example, a score of 24 would have a deviation score of 5.15: x = 24 - 18.85 = 5.15. That is, it is 5.15 points above the mean.

A score of 16, in contrast, would have a deviation score of -2.85: x = 16 - 18.85 = -2.85. That is, it is 2.85 points below the mean.

[Figure: number line marking 16, the mean (18.85), and 24, with deviations of -2.85 and 5.15]


[Figure: the individual scores plotted around the mean, 18.85]

Using deviation scores, we could find out how far away each score is from the mean.

$x = X - \bar{X}$: -5.82, 3.71, -2.62, 2.75, . . ., 2.64, -2.83, -2.02, 2.33

If we wanted to find the average of those distances, we could add them all and divide by the number of scores. Unfortunately, since the mean is the balance point, the deviation scores always sum to zero: $\sum x = 0$.
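The balance-point property can be checked numerically. The sketch below uses the 40-score frequency table from the standard deviation slides; the variable names are illustrative, and the comparison uses a small tolerance because floating-point sums are rarely exactly zero.

```python
import math

# Frequency table from the slides: score -> frequency (f).
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

n = sum(freq.values())
mean = sum(score * f for score, f in freq.items()) / n

# Deviation score for each distinct score value: x = X - mean.
deviation = {score: score - mean for score in freq}

# Sum of the deviation scores over all n observations.
sum_of_deviations = sum(deviation[score] * f for score, f in freq.items())

print(round(mean, 2))                                      # 18.85
print(round(deviation[24], 2), round(deviation[16], 2))    # 5.15 -2.85
print(math.isclose(sum_of_deviations, 0.0, abs_tol=1e-9))  # True
```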


What we can do, however, is take the absolute value of each deviation score and find the mean of them:

$|x| = |X - \bar{X}|$: 5.82, 3.71, 2.62, 2.75, . . ., 2.64, 2.83, 2.02, 2.33

$\sum |x| = 142.56$

Mean distance = 142.56 / 40 = 3.56


Standard Deviation

…the “average” distance a set of scores is from the mean.

DO NOT FORGET THIS!


Well, Not Exactly…

The definition just given, while an excellent way of understanding and interpreting “standard deviation,” is not technically correct (but it is a mean):

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{\sum x^2}{n}}$$

Calculation formula:

$$S = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}}$$

(just a reminder)

$$\bar{X} = \frac{\sum X}{n}$$
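As an illustrative check, the definitional and calculation formulas can be applied to the 40-score frequency table from the earlier slides. The variable names are assumptions made for this sketch; both formulas divide by n, as on the slide, and should agree.

```python
import math

# Frequency table from the slides: score -> frequency (f).
freq = {28: 1, 27: 0, 26: 1, 25: 2, 24: 2, 23: 2, 22: 1, 21: 2, 20: 5,
        19: 2, 18: 5, 17: 7, 16: 2, 15: 5, 14: 0, 13: 2, 12: 1}

n = sum(freq.values())
sum_x = sum(score * f for score, f in freq.items())        # sum of X
sum_x2 = sum(score ** 2 * f for score, f in freq.items())  # sum of X squared
mean = sum_x / n

# Definitional formula: root of the mean squared deviation score.
s_definitional = math.sqrt(
    sum(f * (score - mean) ** 2 for score, f in freq.items()) / n
)

# Calculation formula: uses only the sum of X and the sum of X squared.
s_calculation = math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

print(round(s_definitional, 2), round(s_calculation, 2))  # 3.73 3.73
```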


Advantages of the Standard Deviation

The standard deviation:

– is quite resistant to sampling variability

– is mathematically tractable


Disadvantages of the Standard Deviation

The standard deviation:

– is not a good index of variability with a few very extreme scores

– should not be used with highly skewed distributions

– cannot be used with open-ended distributions


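Coefficient of Variation

Consider the following:

$\bar{X}_1 = 9.00$, $S_1 = 3.00$

$\bar{X}_2 = 90.00$, $S_2 = 3.00$

Note that although the two standard deviations are equal, the dispersion around $\bar{X}_1$ appears considerably greater, relative to the size of the mean, than the dispersion around $\bar{X}_2$.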


If two means are very different, we may consider a relative measure of dispersion:

$$CV = 100\left(\frac{S}{\bar{X}}\right)$$

In our example:

$$CV_1 = 100\left(\frac{3.00}{9.00}\right) = 33.33$$

$$CV_2 = 100\left(\frac{3.00}{90.00}\right) = 3.33$$

The larger the CV, the larger the dispersion relative to the mean.
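The two values above can be reproduced with a few lines of Python. The function name `coefficient_of_variation` and the guard against a zero mean are additions for this sketch, not part of the slides.

```python
def coefficient_of_variation(s, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * s / mean


cv1 = coefficient_of_variation(3.00, 9.00)
cv2 = coefficient_of_variation(3.00, 90.00)
print(round(cv1, 2), round(cv2, 2))  # 33.33 3.33
```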


Coefficient of Variation

The coefficient of variation is also useful when comparing the standard deviations of two variables with different units of measure (e.g., SAT scores vs. age).


Index of Dispersion (D)

• When you have a qualitative variable, the index of dispersion is available as a measure of variability.

• It is defined as the ratio between distinguishable pairs (DP) and the number of distinguishable pairs under the condition of maximum dispersion (DPmax):

$$D = \frac{DP}{DP_{max}}$$


Consider the following data of a survey asking individuals their political affiliation:

[Figure: Political Affiliation. Category A contains two observations (a1, a2); Category B contains four (b1, b2, b3, b4). A pair from different categories, such as (a2, b3), can be distinguished; a pair from the same category, such as (b2, b4), cannot.]

Eight pairs of observations can be distinguished:

a1b1 a1b2 a1b3 a1b4

a2b1 a2b2 a2b3 a2b4


The diagram below illustrates the “condition of maximum dispersion” (i.e., if the observations were equally spread across the available categories):

[Figure: Political Affiliation. Category A contains a1, a2, a3; Category B contains b1, b2, b3.]

Nine pairs of observations can be distinguished under the condition of maximum dispersion:

a1b1 a1b2 a1b3

a2b1 a2b2 a2b3

a3b1 a3b2 a3b3


Index of Dispersion

$$D = \frac{DP}{DP_{max}} = \frac{8}{9} = .89$$

• D can range from 0 to 1.

• “0” if all observations are in one category and none in any others

• “1” if all observations are equally divided between categories

• D should be interpreted as the percentage of DPmax

• Useful when comparing two distributions of equal number of categories


Computational Formula for D

$$D = \frac{c\left(n^2 - \sum_{j=1}^{c} n_j^2\right)}{n^2 (c - 1)}$$

where:

$n$ = number of observations

$c$ = number of categories

$n_j$ = number of observations in category $j$
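To see that the computational formula agrees with counting pairs, the political affiliation example (two observations in one category, four in the other) can be worked both ways. This is a sketch under stated assumptions: the labels "A" and "B" and the variable names are illustrative, and the expression used for DPmax, n²(c − 1) / (2c), is inferred from the computational formula rather than stated on the slides. It gives the nine pairs shown earlier for n = 6 and c = 2.

```python
from itertools import combinations

# Observations per category (labels are illustrative).
counts = {"A": 2, "B": 4}

# Expand the counts into one label per observation.
observations = [cat for cat, n_j in counts.items() for _ in range(n_j)]

# DP: pairs of observations that fall in different categories.
dp = sum(1 for first, second in combinations(observations, 2) if first != second)

n = len(observations)
c = len(counts)

# Assumed expression for DPmax (observations spread equally over c categories).
dp_max = n ** 2 * (c - 1) / (2 * c)
d_pairs = dp / dp_max

# Computational formula for D.
sum_nj2 = sum(n_j ** 2 for n_j in counts.values())
d_formula = c * (n ** 2 - sum_nj2) / (n ** 2 * (c - 1))

print(dp, dp_max, round(d_pairs, 2), round(d_formula, 2))  # 8 9.0 0.89 0.89
```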