Measures of Dispersion

Measures of Dispersion - pdfs.semanticscholar.org€¦ · Measures of Dispersion. There are many, many model probability distributions Here’s a link to a map of 50+ probability



There are many, many model probability distributions

Here’s a link to a map of 50+ probability distributions, showing how they all relate!

Features of Probability Distributions

● The center is the mean, median, or mode.

● The spread is the variability of the data.

● Shape can be described by symmetry, skewness, number of peaks (modes), etc.

Descriptive Statistics

● Measures of central tendency

○ Mean, median, mode

● Measures of spread

○ Variance, standard deviation

○ Range, max, min, quartiles

● Measures of shape

○ Skew, modalities, etc.


Spread

Measurements Vary

● Measurements vary for one of two reasons:

○ Systematic bias: underlying problems with how the data were collected, leading to inaccuracies (e.g., thermometer off by a degree)

○ Randomness: natural fluctuations in measurements (aka “noise”)

● You can and will get different results taking the same measurements over and over again, as a result of the randomness.
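To make the two sources of variation concrete, here is a minimal simulation sketch. All of the numbers in it (a true temperature of 20 degrees, a bias of one degree, a noise standard deviation of 0.5) are hypothetical values chosen for illustration; they do not come from the slides.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_TEMP = 20.0  # hypothetical true temperature
BIAS = 1.0        # hypothetical systematic bias: thermometer reads one degree high
NOISE_SD = 0.5    # hypothetical size of the random fluctuations ("noise")

# Take the "same" measurement 1000 times: bias shifts every reading,
# noise makes each reading slightly different from the last.
readings = [TRUE_TEMP + BIAS + random.gauss(0, NOISE_SD) for _ in range(1000)]

print("first five readings:", [round(r, 2) for r in readings[:5]])
print(f"average reading:  {mean(readings):.2f}")    # close to TRUE_TEMP + BIAS
print(f"spread (std dev): {pstdev(readings):.2f}")  # close to NOISE_SD
```

Averaging many readings shrinks the effect of the noise but leaves the bias untouched, which is why the average settles near 21 rather than 20 in this sketch.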

Spread

● Measures of central tendency tell us something about the center of a probability distribution or a histogram.

● But they don’t tell us anything about spread.

One Measure of Spread

● Data: Change in the valuation of different financial sectors.

● Let’s think up a metric we can use to measure the spread of a distribution.

● A starting point: what is the average deviation of the values from the mean?

○ [(1.66 - 0.58) + (1.08 - 0.58) + (0.89 - 0.58) + … + (0.28 - 0.58) + (0.13 - 0.58) + (-0.28 - 0.58)] / 10 = 0

● What is the problem with this measure?

One Measure of Spread

● The problem is that we get both positive and negative deviations, and they end up cancelling each other out, so the average deviation is 0!

● How can we make all deviations positive?

○ We can take absolute values, or

○ We can square everything!

● Our measure of spread will be the average squared deviation of values from the mean. This is called the sample variance.
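The cancellation can be checked directly. The sketch below uses the ten sector values that appear in the worked example; the variable names are illustrative only.

```python
from statistics import mean

# Changes in valuation for the ten sectors in the worked example.
changes = [1.66, 1.08, 0.89, 0.69, 0.52, 0.47, 0.36, 0.28, 0.13, -0.28]

center = mean(changes)                       # about 0.58
deviations = [x - center for x in changes]   # signed distances from the mean
n = len(deviations)

avg_deviation = sum(deviations) / n                       # positives and negatives cancel
avg_abs_deviation = sum(abs(d) for d in deviations) / n   # option 1: absolute values
avg_sq_deviation = sum(d * d for d in deviations) / n     # option 2: squares

print(f"average deviation:          {avg_deviation:.4f}")
print(f"average absolute deviation: {avg_abs_deviation:.4f}")
print(f"average squared deviation:  {avg_sq_deviation:.4f}")
```

The signed average is zero up to floating-point rounding for any data set, because the mean is by definition the point at which the deviations balance. Only the absolute and squared versions carry information about spread.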

Calculating Sample Variance

● The total sum of the squared deviations:

○ (1.66 - 0.58)² + (1.08 - 0.58)² + (0.89 - 0.58)² + (0.69 - 0.58)² + (0.52 - 0.58)² + (0.47 - 0.58)² + (0.36 - 0.58)² + (0.28 - 0.58)² + (0.13 - 0.58)² + (-0.28 - 0.58)² ≈ 2.62

● To find the average, we divide by 10.

● So, the sample mean is ≈ 0.58.

● And the sample variance is ≈ 0.262.

● Why such a big spread? Well, because we squared the deviations!

● Arguably of more interest, since it is on the same scale as the values, is the square root of the sample variance: i.e., the sample standard deviation, here √0.262 ≈ 0.51.
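The arithmetic above can be reproduced in a few lines. This is a sketch that follows the slide's definition (divide by the number of values, 10). Note, as an aside not taken from the slides, that many libraries reserve the name "sample variance" for the version that divides by n - 1; in Python's standard library, `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1.

```python
import math
from statistics import pvariance, variance

changes = [1.66, 1.08, 0.89, 0.69, 0.52, 0.47, 0.36, 0.28, 0.13, -0.28]

n = len(changes)
m = sum(changes) / n                          # mean, about 0.58
sum_sq = sum((x - m) ** 2 for x in changes)   # total squared deviation, about 2.62
var_n = sum_sq / n                            # slide's variance (divide by n), about 0.262
std_n = math.sqrt(var_n)                      # standard deviation, about 0.51

print(f"sum of squared deviations: {sum_sq:.4f}")
print(f"variance (divide by n):    {var_n:.4f}")
print(f"standard deviation:        {std_n:.4f}")

# Library equivalents: pvariance matches the slide; variance divides by n - 1.
print(f"statistics.pvariance:      {pvariance(changes):.4f}")
print(f"statistics.variance:       {variance(changes):.4f}")
```

The standard deviation is back on the same scale as the original values, which makes it easier to read next to the mean of 0.58 than the variance is.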


Interquartile Range (IQR)

Maximum, minimum, and range

● The maximum is an outcome that is greater than or equal to all others in our sample

● The minimum is an outcome that is less than or equal to all others in our sample

● The range is the difference between the maximum and the minimum

● The maximum and minimum are sensitive to outliers

○ If an outcome is added to our sample that is less than the minimum, then the minimum changes

○ If an outcome is added to our sample that is greater than the maximum, then the maximum changes

● To determine if the maximum and the minimum in our data set are indeed outliers, we can use a rule of thumb called the interquartile range rule
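A short sketch of how sensitive the range is to a single extreme value. It reuses the ten sector values from the variance example; the added value of 9.0 is hypothetical and chosen only to show the effect.

```python
changes = [1.66, 1.08, 0.89, 0.69, 0.52, 0.47, 0.36, 0.28, 0.13, -0.28]

smallest, largest = min(changes), max(changes)
value_range = largest - smallest              # about 1.94

# Add one hypothetical extreme outcome and recompute.
with_extreme = changes + [9.0]
new_range = max(with_extreme) - min(with_extreme)   # about 9.28

print(f"min = {smallest}, max = {largest}, range = {value_range:.2f}")
print(f"range after adding 9.0 = {new_range:.2f}")
```

One added value is enough to change the maximum and stretch the range several times over, even though the other ten values are unchanged.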

Quartiles

● Quartiles within an ordered set of data are three points that divide the dataset into four equal groups, each containing a quarter of the data

● The first quartile (Q1) is the middle value between the minimum and the median, i.e., the median of the lower half of the data

● The second quartile (Q2) is the median

● The third quartile (Q3) is the middle value between the median and the maximum, i.e., the median of the upper half of the data

● The interquartile range (IQR) is the difference between Q3 and Q1 (IQR = Q3 - Q1)

Computing quartiles

● Find the median, and use it to divide the dataset into two halves

○ If there is an odd number of data points in the dataset, do not include the median in either half

○ If there is an even number of data points in the dataset, split it exactly in half

● The lower quartile value is the median of the lower half of the data

● The upper quartile value is the median of the upper half of the data

Example of computing quartiles

● Computing quartiles when the sample size is odd

● Ordered sample: 3, 4, 4, 5, 6, 8, 8

● The median (5) is left out of both halves, so the lower half is 3, 4, 4 and the upper half is 6, 8, 8

○ First Quartile (Q1): 4

○ Second Quartile (Q2) (Median): 5

○ Third Quartile (Q3): 8

○ Interquartile Range (IQR = Q3 - Q1): 4

Another example of computing quartiles

● Computing quartiles when the sample size is even

● Ordered sample: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

● The sample splits exactly in half, so the lower half is 1, 3, 3, 4, 5 and the upper half is 6, 6, 7, 8, 8

○ First Quartile (Q1): 3

○ Second Quartile (Q2) (Median): 5.5

○ Third Quartile (Q3): 7

○ Interquartile Range (IQR = Q3 - Q1): 4
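The median-of-halves procedure can be sketched as a small function and checked against the two worked examples. The function name `quartiles` is illustrative. Library routines such as `statistics.quantiles` use interpolation conventions of their own, so they may return slightly different values for the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumes at least two data points. With an odd count, the median
    itself is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

odd_sample = [3, 4, 4, 5, 6, 8, 8]
even_sample = [1, 3, 3, 4, 5, 6, 6, 7, 8, 8]

for sample in (odd_sample, even_sample):
    q1, q2, q3 = quartiles(sample)
    print(f"{sample}: Q1={q1}, Q2={q2}, Q3={q3}, IQR={q3 - q1}")
```

Both samples give an IQR of 4, matching the values worked out by hand.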

Box and whisker plots: for visualizing quartiles

● Depict the minimum, first quartile, median, third quartile, and maximum

● The upper whisker (from the third quartile to the maximum) represents the upper 25% of the distribution

● The interquartile range (IQR) represents the middle 50% of the data

● The lower whisker (from the first quartile to the minimum) represents the lower 25% of the distribution

Do pets relieve stress?

● Does someone experience different levels of stress when doing tasks with a pet, a good friend, or alone?

● Allen et al. had 45 people count backwards by 13s and 17s

● The people were randomly assigned to 3 different groups: pet (P), friend (F), and alone (C, for control)

● The dependent variable measured was the subject’s average heart rate during the task

Study Results

● The task was most stressful around friends and least stressful around pets

● We are comparing levels of a quantitative variable (heart rate) across levels of a categorical (qualitative) variable (treatment)

Side-by-side box and whisker plots (double the fun!)

● Side-by-side box and whisker plots are a method for visualizing data when one variable is categorical (qualitative) and the other is quantitative

● They can be used to compare the distributions of a quantitative variable across the levels of a categorical variable

● In this plot, the stars are outliers

Interquartile range rule for detecting outliers

1. Calculate the interquartile range (Q3 - Q1).

2. Multiply the interquartile range (IQR) by 1.5.

3. Add 1.5*IQR to the third quartile. This value is called the upper fence. Values greater than this are suspected outliers.

4. Subtract 1.5*IQR from the first quartile. This value is called the lower fence. Values less than this are suspected outliers.
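The four steps of the interquartile range rule can be sketched as follows. The function names are illustrative, the quartiles are computed with the median-of-halves method, and the sample is hypothetical: a small ordered data set with one deliberately extreme value (20) added.

```python
from statistics import median

def q1_q3(values):
    """Q1 and Q3 by the median-of-halves method (needs at least two values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(upper)

def iqr_fences(values, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = q1_q3(values)
    iqr = q3 - q1                        # step 1
    margin = k * iqr                     # step 2
    return q1 - margin, q3 + margin      # steps 4 and 3

def suspected_outliers(values, k=1.5):
    """Values that fall outside the fences."""
    low, high = iqr_fences(values, k)
    return [x for x in values if x < low or x > high]

# Hypothetical sample with one extreme value added.
sample = [3, 4, 4, 5, 6, 8, 8, 20]

low, high = iqr_fences(sample)
print(f"lower fence = {low}, upper fence = {high}")
print("suspected outliers:", suspected_outliers(sample))
```

Here Q1 = 4 and Q3 = 8, so the IQR is 4, the fences sit at -2 and 14, and only the value 20 is flagged. The rule marks values as suspected outliers; deciding whether to keep or drop them still takes judgment about the data.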


Summary

Descriptive statistics don’t tell the whole story

[Figure: example distribution shapes, with panels labeled Symmetric unimodal, Skewed right, Skewed left, Bimodal, Multimodal, and Symmetric]

Summary

● We can summarize data sets by the features of their frequency distributions, such as their center, dispersion, shape, etc.

○ Mean, median, and mode are three measures of central tendency

○ Variance, standard deviation, and IQR are three measures of dispersion

● Descriptive statistics can be more informative than raw data, but they do not tell all, so they can also be misleading