
Measures of Variability

• In addition to knowing where the center of the distribution is, it is often helpful to know the degree to which individual values cluster around the center.

• This is known as variability.


Range

• There are various measures of variability, the most straightforward being the range of the sample:

Highest value minus lowest value

• While range provides a good first pass at variability, it is not the best measure because of its sensitivity to extreme scores.


• While range provides a good first pass at variability, it is not the best measure because:
– It is calculated from only 2 points of data
– Those two values are the most extreme in the sample (obviously sensitive to outliers)
– It can change dramatically from sample to sample
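A minimal Python sketch of the range, using the example scores that appear later in this section (2, 3, 3, 4, 4, 6, 6, 12). The function name `sample_range` and the altered score of 30 are illustrative choices, not part of the original slides; the point is that changing a single extreme value changes the range dramatically.

```python
# Range = highest value minus lowest value.
def sample_range(values):
    """Return max(values) - min(values); only the two most extreme points matter."""
    return max(values) - min(values)

scores = [2, 3, 3, 4, 4, 6, 6, 12]        # example data used later in this section
print(sample_range(scores))                # 10

# Hypothetical change: replace the largest score (12) with 30.
with_outlier = [2, 3, 3, 4, 4, 6, 6, 30]
print(sample_range(with_outlier))          # 28
```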


Interquartile Range

• A range based on percentiles. The data can be ordered and then, much as we did for the median (the 50th percentile, also known as the second quartile or Q2), we look for the values corresponding to the 25th and 75th percentiles (Q1 and Q3). Q3 − Q1 gives us the interquartile range.

• Note that by concerning oneself only with the middle 50% of scores, extreme scores will not affect this measure of variability. On the other hand, we also lose half our data in its calculation.
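A short sketch of the IQR for the same example scores, assuming Python's standard `statistics.quantiles`. Its default `method="exclusive"` is only one of several quartile conventions, so other textbooks or software may report slightly different Q1 and Q3 values.

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(q1, q2, q3)   # 3.0 4.0 6.0
print(iqr)          # 3.0
```

Changing the extreme score 12 to any larger value would leave Q1, Q3 and the IQR unchanged, which is the insensitivity to extreme scores described above.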


IQR and SIQR

• The semi-interquartile range (SIQR) is simply half the IQR
– It represents the average spread of the scores falling in the quartiles just above and below the median

• If a scale had a median of 20 and an SIQR of 5, we could say that the typical deviation of scores about the median is about 5 points above or below.

• However: what if our data is skewed?
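One way to see the problem with skew is to compare the distance from the median to each quartile. This sketch reuses the example scores and the default quartile method of `statistics.quantiles` (an assumption; conventions vary). The SIQR is 1.5, yet the upper quartile lies twice as far from the median as the lower one.

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]
q1, median, q3 = statistics.quantiles(scores, n=4)   # default "exclusive" method

siqr = (q3 - q1) / 2          # semi-interquartile range = half the IQR
print(siqr)                   # 1.5

# In skewed data the quartiles are not equidistant from the median,
# so a single SIQR hides the asymmetry.
print(median - q1)            # 1.0 (spread below the median)
print(q3 - median)            # 2.0 (spread above the median)
```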


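IQR: Graphical Representation

• Box and whisker plots (Tukey) are graphical representations of the IQR
– "Hinges" mark the interquartile range
– "Whiskers" extend to the most extreme scores within 1.5 × IQR of the hinges (roughly 99% of the data when the distribution is normal)
– Any outliers beyond the whiskers are designated in some fashion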
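A sketch of how the whiskers and outliers could be computed with Tukey's 1.5 × IQR rule. It uses `statistics.quantiles` for the hinges, which is an approximation: Tukey's hinges are defined slightly differently, although for these example scores both give 3 and 6.

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Tukey's fences lie 1.5 * IQR beyond each hinge.
lower_fence = q1 - 1.5 * iqr      # -1.5
upper_fence = q3 + 1.5 * iqr      # 10.5

# Whiskers reach the most extreme observations still inside the fences.
lower_whisker = min(x for x in scores if x >= lower_fence)
upper_whisker = max(x for x in scores if x <= upper_fence)
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(lower_whisker, upper_whisker, outliers)   # 2 6 [12]
```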


The Average Deviation

• Another approach to estimating variability is to directly measure the degree to which individual data points differ from the mean and then average those deviations.

• That is:

$$\frac{\sum (X - \bar{X})}{N}$$


The Average Deviation (cont.)

• However, if we try to do this with real data, the result will always be zero, because the positive and negative deviations from the mean cancel out exactly:

Example: (2,3,3,4,4,6,6,12)

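$$\bar{X} = \frac{\sum X}{N} = \frac{40}{8} = 5$$

$$\frac{\sum (X - \bar{X})}{N} = \frac{(-3) + (-2) + (-2) + (-1) + (-1) + 1 + 1 + 7}{8} = \frac{0}{8} = 0$$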
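The cancellation can be checked directly. This small sketch computes each deviation from the mean for the example scores and then averages them.

```python
scores = [2, 3, 3, 4, 4, 6, 6, 12]
mean = sum(scores) / len(scores)            # 40 / 8 = 5.0

deviations = [x - mean for x in scores]
print(deviations)                           # [-3.0, -2.0, -2.0, -1.0, -1.0, 1.0, 1.0, 7.0]
print(sum(deviations) / len(scores))        # 0.0
```

With arbitrary floating-point data the sum may come out as a tiny nonzero number because of rounding error, but mathematically it is always exactly zero.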


Average Deviation

• One way to get around the problem with the average deviation is to use the absolute value of the differences, instead of the differences themselves.

• The absolute value of some number is just the number without any sign:

For example: |-3| = 3


Average Deviation

• Thus, we could rewrite and solve our average deviation problem as follows:

$$\frac{\sum |X - \bar{X}|}{N} = \frac{3 + 2 + 2 + 1 + 1 + 1 + 1 + 7}{8} = \frac{18}{8} = 2.25$$

• The data set in question has a mean of 5 and a mean absolute deviation (MAD) of 2.25.
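A minimal sketch of the mean absolute deviation for the example scores. Note that some software uses "MAD" for the *median* absolute deviation instead; this code follows the slides' meaning (mean of |X − mean|).

```python
scores = [2, 3, 3, 4, 4, 6, 6, 12]
mean = sum(scores) / len(scores)                          # 5.0

# Mean absolute deviation: average of |X - mean|.
mad = sum(abs(x - mean) for x in scores) / len(scores)
print(mad)                                                # 2.25 (= 18 / 8)
```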


The Variance

• Although the MAD is an acceptable measure of variability, the most commonly used measure is the variance (denoted $s^2$ for a sample and $\sigma^2$ for a population) and its square root, termed the standard deviation (denoted $s$ for a sample and $\sigma$ for a population).


The Variance (cont.)

• The computation of variance is also based on the basic notion of the average deviation; however, instead of getting around the "zero problem" by using absolute deviations (as in MAD), the variance eliminates the "zero problem" by squaring the differences from the mean.

Specifically:

$$s_X^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$$


Example: (2,3,3,4,4,6,6,12)

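$$\bar{X} = \frac{\sum X}{N} = \frac{40}{8} = 5$$

$$s_X^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{9 + 4 + 4 + 1 + 1 + 1 + 1 + 49}{7} = \frac{70}{7} = 10$$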
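The worked example translates directly into code. This sketch sums the squared deviations, divides by N − 1, and cross-checks against the standard library's `statistics.variance`, which also uses the N − 1 denominator.

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]
n = len(scores)
mean = sum(scores) / n                             # 5.0

# Sample variance: sum of squared deviations divided by N - 1.
ss = sum((x - mean) ** 2 for x in scores)          # 70.0
s2 = ss / (n - 1)
print(s2)                                          # 10.0

# The standard library's sample variance divides by N - 1 as well.
print(statistics.variance(scores) == s2)           # True
```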


• Because it is expressed in squared units, variance is not very meaningful on its own. What we'd like is something that's on the same scale as the original variable.

• Standard deviation is just the square root of the variance, and gets our measure of variability back to the original scale units.

$$s = \sqrt{s^2} = \sqrt{10} \approx 3.16$$
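A short sketch confirming the arithmetic: taking the square root of the sample variance gives the same value as Python's `statistics.stdev`.

```python
import math
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]
s2 = statistics.variance(scores)      # 10, in squared units
s = math.sqrt(s2)                     # standard deviation, in the original units
print(round(s, 2))                    # 3.16

# statistics.stdev computes the same quantity directly.
print(math.isclose(s, statistics.stdev(scores)))   # True
```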


• An equivalent formula that is easier to work with when calculating variances by hand is:

$$s_X^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$$

• Although this second formula may look more intimidating, a few examples will show you that it is actually easier to work with.
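To check that the two formulas agree, this sketch evaluates both on the example scores (ΣX = 40, ΣX² = 270).

```python
scores = [2, 3, 3, 4, 4, 6, 6, 12]
n = len(scores)

sum_x = sum(scores)                       # 40
sum_x2 = sum(x * x for x in scores)       # 270

# Computational ("by hand") formula: (sum X^2 - (sum X)^2 / N) / (N - 1)
s2_computational = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Definitional formula: sum (X - mean)^2 / (N - 1)
mean = sum_x / n
s2_definitional = sum((x - mean) ** 2 for x in scores) / (n - 1)

print(s2_computational, s2_definitional)  # 10.0 10.0
```

The computational form is convenient by hand, but in floating-point code it can lose precision when the values are large relative to their spread, which is why software usually works with deviations from the mean.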


Relation of range to s

• If the variable is normally distributed, a rule of thumb is that:

$s \approx R/6$

• This will become clearer when we talk about the normal distribution and its properties, but the idea is that roughly 99.7% of the data fall within 3 SD above and below the mean, so the range spans about 6 SD.
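A sketch of both ideas, assuming Python's `statistics.NormalDist` for the exact normal proportion and one simulated normal sample (hypothetical mean 50, SD 10, N = 500, fixed seed) for the R/6 comparison. The rule of thumb depends on sample size: the expected range grows with N, so R/6 tends to underestimate s in small samples and overestimate it in very large ones.

```python
import random
import statistics
from statistics import NormalDist

# Proportion of a normal distribution within 3 SD of the mean.
z = NormalDist()                         # standard normal: mean 0, SD 1
coverage = z.cdf(3) - z.cdf(-3)
print(round(coverage, 4))                # 0.9973

# Compare R/6 with s for one simulated normal sample (hypothetical settings).
rng = random.Random(1)                   # fixed seed for reproducibility
sample = [rng.gauss(50, 10) for _ in range(500)]
r = max(sample) - min(sample)
print(r / 6, statistics.stdev(sample))   # both should be in the neighborhood of 10
```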


Comparing variability

The coefficient of variation (CV) allows us to compare variability on measures of different scales.

Often we want a measure that has enough variability to accurately reflect the nature of the underlying variable. Suppose someone came up with a measure of, say, motivation, using questionnaire items with Likert scales ranging from 1 to 4. Someone else comes up with a better one that uses a 7-point scale. We could use the CV to compare their relative spread.

$$CV = \frac{s}{\bar{X}}$$
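A minimal sketch of the CV, using the example scores rather than real questionnaire data; the function name `coefficient_of_variation` is illustrative. Multiplying every score by a constant leaves the CV unchanged, which is what lets it compare spread across measures expressed in different units.

```python
import statistics

def coefficient_of_variation(values):
    """CV = s / mean (assumes the mean is positive)."""
    return statistics.stdev(values) / statistics.mean(values)

scores = [2, 3, 3, 4, 4, 6, 6, 12]
print(round(coefficient_of_variation(scores), 3))     # 0.632

# Rescaling every score by a constant leaves the CV unchanged.
rescaled = [x * 10 for x in scores]
print(round(coefficient_of_variation(rescaled), 3))   # 0.632
```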


Visualizing Means and Standard Deviations

This demonstration allows you to play with the mean and standard deviation of a distribution. Note that changing the mean of the distribution simply moves the entire distribution to the left or right without changing its shape. In contrast, changing the standard deviation alters the spread of the data but does not affect where the distribution is "centered."

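In place of the interactive demo, a small sketch shows the same two effects on the example scores: adding a constant (hypothetically 10) moves the mean but not the SD, while stretching every deviation from the mean (hypothetically by a factor of 2) doubles the SD but leaves the mean where it was.

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]

# Adding a constant shifts the center but leaves the spread unchanged.
shifted = [x + 10 for x in scores]

# Stretching deviations about the mean changes the spread but not the center.
m = statistics.fmean(scores)
stretched = [m + 2 * (x - m) for x in scores]

for label, data in [("original", scores), ("shifted", shifted), ("stretched", stretched)]:
    print(f"{label}: mean={statistics.fmean(data):.2f}, s={statistics.stdev(data):.2f}")
# original: mean=5.00, s=3.16
# shifted: mean=15.00, s=3.16
# stretched: mean=5.00, s=6.32
```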


Your turn

• Find the mean, range and standard deviation of the following scores:

• 3 1 7 9 8 8 5 7 4

$$s_X^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$$


Your turn

• Mean = 5.78

• Range = 8

• s = 2.68
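To check your answers, this sketch computes all three statistics for the nine scores, using the computational formula for $s^2$ shown in the exercise.

```python
import math

scores = [3, 1, 7, 9, 8, 8, 5, 7, 4]
n = len(scores)

mean = sum(scores) / n                            # 52 / 9
data_range = max(scores) - min(scores)            # 9 - 1
# Computational formula for s^2, then take the square root.
s2 = (sum(x * x for x in scores) - sum(scores) ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(round(mean, 2), data_range, round(s, 2))    # 5.78 8 2.68
```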


Estimating Population Parameters

• The mean ($\bar{X}$) and variance ($s^2$) are the descriptive statistics most commonly used to represent the data points of a sample.

• The real reason they are the preferred measures of central tendency and variability is the set of properties they have as estimators of their corresponding population parameters, $\mu$ and $\sigma^2$.


Estimating Population Parameters (cont.)

• Four properties are considered desirable in a population estimator: sufficiency, unbiasedness, efficiency, & resistance.

• Both the mean and the variance are the best estimators in their class in terms of the first three of these four properties.

• To understand these properties, you first need to understand a concept in statistics called the sampling distribution.


Sampling Distribution Demo

• We will discuss sampling distributions off and on throughout the course, and I only want to touch on the notion now.

• Basically, the idea is this: in order to examine the properties of a statistic, we often want to take repeated samples from some population of data and calculate the relevant statistic on each sample. We can then look at the distribution of the statistic across these samples and ask a variety of questions about it.

• Check out this demonstration.
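A minimal simulation of the repeated-sampling idea, assuming a hypothetical normal population (mean 50, SD 10), samples of size 25, and 5,000 repetitions with a fixed seed. The list of sample means is an empirical sampling distribution of the mean.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is reproducible

# Hypothetical population: normally distributed scores with mean 50 and SD 10.
POP_MEAN, POP_SD = 50, 10
SAMPLE_SIZE, N_SAMPLES = 25, 5000

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

# Questions we can now ask about the sampling distribution:
print(statistics.fmean(sample_means))   # close to 50 (its center)
print(statistics.stdev(sample_means))   # close to 10 / sqrt(25) = 2 (its spread)
```

Swapping `statistics.fmean` for another statistic, such as `statistics.median`, gives that statistic's sampling distribution instead.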


Properties of a Statistic

• 1) Sufficiency

A sufficient statistic is one that makes use of all of the information in the sample to estimate its corresponding parameter. For example, this property makes the mean more attractive as a measure of central tendency compared to the mode or median.


Estimating Population Parameters

• 2) Unbiasedness

A statistic is said to be an unbiased estimator if its expected value (i.e., the mean of the statistic over a large number of samples, such as the mean of many sample means) is equal to the population parameter it is estimating.

Explanation of $N-1$ in the $s^2$ formula:


Assessing the Bias of an Estimator

• Using this repeated-sampling procedure, the mean can be shown to be an unbiased estimator.

• However, if the more intuitive formula for $s^2$ is used:

$$s_X^2 = \frac{\sum (X - \bar{X})^2}{N}$$

• it turns out to underestimate $\sigma^2$.


Assessing the Bias of an Estimator (cont.)

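• This bias to underestimate is caused by the act of sampling (deviations are measured from the sample mean rather than the population mean, and the sample mean always minimizes the sum of squared deviations), and it can be shown that this bias is eliminated if $N-1$ is used in the denominator instead of $N$.

• Note that this is only true when calculating $s^2$. If you have a measurable population and you want to calculate $\sigma^2$, you use $N$ in the denominator, not $N-1$.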

$$s_X^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$$
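Rather than simulating, this sketch enumerates every possible sample of size 2 drawn with replacement from a hypothetical three-value population (2, 4, 6), so the expected values are exact fractions. The helper names `mean` and `sum_sq_dev` are illustrative.

```python
from fractions import Fraction
from itertools import product

def mean(xs):
    return Fraction(sum(xs), len(xs))

def sum_sq_dev(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

population = [2, 4, 6]                                # hypothetical tiny population
mu = mean(population)                                 # 4
sigma2 = sum_sq_dev(population) / len(population)     # 8/3 (population: divide by N)

# Every ordered sample of size 2 drawn with replacement (9 equally likely samples).
samples = list(product(population, repeat=2))
k = len(samples)

expected_mean = sum(mean(s) for s in samples) / k
expected_var_n = sum(sum_sq_dev(s) / len(s) for s in samples) / k
expected_var_n1 = sum(sum_sq_dev(s) / (len(s) - 1) for s in samples) / k

print(expected_mean, mu)          # 4 4
print(expected_var_n, sigma2)     # 4/3 8/3   dividing by N underestimates
print(expected_var_n1, sigma2)    # 8/3 8/3   dividing by N - 1 is unbiased
```

With the $N$ denominator the expected value is $\frac{N-1}{N}\sigma^2$ (here $\frac{1}{2} \cdot \frac{8}{3} = \frac{4}{3}$), which is exactly the shortfall the $N-1$ correction removes.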


Degrees of Freedom

• The mean of 6, 8, & 10 = 8.

• If I allow you to change as many of these numbers as you want BUT the mean must stay 8, how many of the numbers are you free to vary?


Degrees of Freedom

• The point of this exercise is that once the mean is fixed, only two of the three numbers are free to vary; the third is determined by the others. Fixing the mean removes a degree of freedom from your sample, which is like subtracting 1 from the number of observations in your sample.

• It is for exactly this reason that we use $N-1$ in the denominator when we calculate $s^2$ (i.e., the calculation requires that the mean be fixed first, which effectively fixes one of the data points).
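The exercise can be sketched in a few lines: choose any N − 1 values freely, and the mean forces the last one. The helper `complete_sample` is a hypothetical name used only for illustration.

```python
values = [6, 8, 10]
target_mean = sum(values) / len(values)     # 8.0

def complete_sample(free_values, mean, n):
    """Given n - 1 freely chosen values and a fixed mean, return the full sample.

    The last value is not free: it must equal n * mean - sum(free_values).
    """
    last = n * mean - sum(free_values)
    return list(free_values) + [last]

# Pick any two values you like; the third is forced.
print(complete_sample([1, 20], target_mean, 3))    # [1, 20, 3.0]
print(complete_sample([7, 7], target_mean, 3))     # [7, 7, 10.0]
```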


Estimating Population Parameters

• 3) Efficiency

The efficiency of a statistic is reflected in the variance that is observed when one examines the statistic over many independently chosen samples. The smaller the variance, the more efficient the statistic is said to be.
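A simulation sketch comparing the efficiency of the mean and the median as estimators of the center. It assumes a standard normal population, samples of size 25, and 4,000 repetitions with a fixed seed; under those assumptions the sample mean varies less from sample to sample. For heavy-tailed populations the comparison can come out differently.

```python
import random
import statistics

rng = random.Random(7)          # fixed seed for reproducibility
SAMPLE_SIZE, N_SAMPLES = 25, 4000

means, medians = [], []
for _ in range(N_SAMPLES):
    # Hypothetical population: standard normal scores.
    sample = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Smaller variance across samples means a more efficient estimator.
var_mean = statistics.variance(means)      # theory: about 1/25 = 0.04
var_median = statistics.variance(medians)
print(var_mean, var_median)
```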


Estimating Population Parameters

• 4) Resistance

The resistance of an estimator refers to the degree to which that estimate is affected by extreme values.

As mentioned previously, both $\bar{X}$ and $s^2$ are highly sensitive to extreme values.
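A short sketch of resistance using the example scores, with one score hypothetically changed from 12 to 120. The median does not move, while the mean and especially the variance change dramatically.

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]
extreme = [2, 3, 3, 4, 4, 6, 6, 120]   # hypothetical: one score made far more extreme

for label, data in [("original", scores), ("extreme", extreme)]:
    print(
        f"{label}: mean={statistics.fmean(data):.2f}, "
        f"median={statistics.median(data):.2f}, "
        f"s2={statistics.variance(data):.2f}"
    )
# original: mean=5.00, median=4.00, s2=10.00
# extreme: mean=18.50, median=4.00, s2=1684.00
```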


Estimating Population Parameters

• 4) Resistance

Despite this, they are still the most commonly used estimates of the corresponding population parameters, mostly because of their superiority over other measures in terms of sufficiency, unbiasedness, & efficiency.