Describing Quantitative Data with Numbers


Page 1: Describing quantitative data with numbers

Describing Quantitative Data with Numbers

Page 2: Describing quantitative data with numbers

Summarizing distributions of univariate data

1. Measuring center: median, mean

2. Measuring spread: range, interquartile range, standard deviation

3. Measuring position: quartiles, percentiles, standardized scores (z-scores)

4. Using boxplots

5. The effect of changing units on summary measures

Page 3: Describing quantitative data with numbers

Measuring Center

When describing the “center” of a set of data, we can use the mean or the median.

Mean: “Average” value

Median: “Center” value (Q2)

Page 4: Describing quantitative data with numbers

Where is the Center of the Distribution?

If you had to pick a single number to describe all the data, what would you pick?

It’s easy to find the center when a histogram is unimodal and symmetric—it’s right in the middle.

On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode.

Page 5: Describing quantitative data with numbers

Mean

To find the mean of a set of observations, add their values and divide by the number of observations.

$\bar{x} = \frac{\sum x_i}{n}$

Page 6: Describing quantitative data with numbers

Find the mean of:

2 3 4 6 8 12

$\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 12}{6} = \frac{35}{6} \approx 5.833$
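The same calculation can be sketched in a few lines of Python. This is only an illustration of "add the values and divide by the number of observations"; the function name `mean` is our own choice, not something from the slides.

```python
def mean(values):
    """Add the observations and divide by how many there are."""
    return sum(values) / len(values)

data = [2, 3, 4, 6, 8, 12]   # the six observations from the slide
x_bar = mean(data)           # 35 / 6
print(round(x_bar, 3))       # 5.833
```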

Page 7: Describing quantitative data with numbers

Although the mean is the most popular measure of center, it is not always the most appropriate.

The mean is very sensitive to extreme observations (outliers).

Because outliers affect the mean, we say that the mean is NOT a resistant measure of center.

So if the mean is not a resistant measure of center, what is? Median

Page 8: Describing quantitative data with numbers

Median

The median is the value with exactly half the data values below it and half above it.

It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas.

It has the same units as the data

The median is not influenced by extreme observations, so we say that the median is a resistant measure of center.
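A small experiment can make "resistant" concrete. The sketch below reuses the values 2, 3, 4, 6, 8, 12 and then swaps the largest value for 120; that replacement value is a made-up outlier chosen for illustration. It relies on Python's standard `statistics` module.

```python
import statistics

original = [2, 3, 4, 6, 8, 12]
with_outlier = [2, 3, 4, 6, 8, 120]   # hypothetical: 12 replaced by an extreme value

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

# The mean is pulled toward the extreme value; the median does not move.
print(round(mean_before, 2), round(mean_after, 2))
print(median_before, median_after)
```

Here the mean jumps from about 5.83 to about 23.83, while the median stays at 5 in both lists.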

Page 9: Describing quantitative data with numbers

Finding the Median

First sort the values (arrange them in order), then follow one of these:

1. If the number of data values is even, the median is found by computing the mean of the two middle numbers.

2. If the number of data values is odd, the median is the number located in the exact middle of the list.

Page 10: Describing quantitative data with numbers

Example 1: 5.40 1.10 0.42 0.73 0.48 1.10

In order: 0.42 0.48 0.73 1.10 1.10 5.40

(even number of values: no exact middle; the middle is shared by two numbers)

MEDIAN is $\frac{0.73 + 1.10}{2} = 0.915$

Example 2: 5.40 1.10 0.42 0.73 0.48 1.10 0.66

In order: 0.42 0.48 0.66 0.73 1.10 1.10 5.40

(odd number of values: the exact middle is the fourth value)

MEDIAN is 0.73
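The sort-then-pick-the-middle procedure can be sketched as a short Python function. The function name `median` and the variable names are illustrative; the two data sets are the ones used in the examples above.

```python
def median(values):
    """Sort the values, then take the middle one (odd count)
    or the mean of the two middle ones (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]

even_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
odd_data = even_data + [0.66]

median_even = median(even_data)   # mean of 0.73 and 1.10
median_odd = median(odd_data)     # the fourth of seven ordered values
print(round(median_even, 3), median_odd)
```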

Page 11: Describing quantitative data with numbers

Mean vs Median

Mean | Median

Average value of variable | Typical value of variable

Not resistant to outliers | Resistant to outliers

A good measure when the data is symmetric | A reliable measure regardless of the shape of the distribution

Farther out in the long tail than the median when data is skewed | Close to the center even when the data is skewed

Easy to find | Less prone to mistakes

Page 12: Describing quantitative data with numbers

Check For Understanding

Page 13: Describing quantitative data with numbers

Check For Understanding

Page 14: Describing quantitative data with numbers

Measuring Spread

Range

Interquartile Range (IQR)

Standard Deviation

Page 15: Describing quantitative data with numbers

Range

Distance between largest and smallest values.

Range = Maximum – Minimum

Range is useful if there are no outliers.

Page 16: Describing quantitative data with numbers

Interquartile Range

How to find the IQR:

1. Find the median.

2. Find the median of both halves of the data: the lower median is the 1st quartile (Q1); the upper median is the 3rd quartile (Q3).

3. Subtract the two quartiles: IQR = Q3 - Q1.
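The three steps can be sketched in Python as follows. One assumption is labeled in the code: when the number of values is odd, the median itself is left out of both halves. That is a common classroom convention, but the slides do not say which convention they use. The data are the six ordered values from the median example.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(upper)

data = [0.42, 0.48, 0.73, 1.10, 1.10, 5.40]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, round(iqr, 2))
```

For these six values the lower half is 0.42, 0.48, 0.73 and the upper half is 1.10, 1.10, 5.40, so Q1 = 0.48, Q3 = 1.10, and the IQR is 0.62.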

Page 17: Describing quantitative data with numbers

Outliers

One general rule of thumb for identifying outliers is finding any data points that lie:

Lower than 1.5 × IQR below Q1 (that is, less than Q1 - 1.5 × IQR)

OR

Higher than 1.5 × IQR above Q3 (that is, greater than Q3 + 1.5 × IQR)
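As a rough sketch, the rule of thumb translates into two "fences" and a filter. The function name `find_outliers` is our own; the quartiles passed in (Q1 = 0.48, Q3 = 1.10) are the medians of the lower and upper halves of the six values from the median example, worked out by hand.

```python
def find_outliers(values, q1, q3):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [0.42, 0.48, 0.73, 1.10, 1.10, 5.40]
outliers = find_outliers(data, q1=0.48, q3=1.10)
print(outliers)  # [5.4]
```

With an IQR of 0.62 the fences sit at about -0.45 and 2.03, so only 5.40 is flagged.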

Page 18: Describing quantitative data with numbers

Check For Understanding

• The “Descriptive Statistics” of test grades for a certain class are listed below.

Mean = 74.71

Median = 76

Standard Deviation = 12.61

Minimum = 35

Maximum = 94

Q1 = 68

Q3 = 84

• (a) Determine the IQR for this data.

• (b) Using the answer from part (a), determine whether the lowest and highest values in the data are outliers.

Page 19: Describing quantitative data with numbers

Standard Deviation

The standard deviation measures roughly how far, on average, the observations lie from the mean.

$s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
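To see the sample standard deviation formula at work, here is a minimal Python sketch that follows it term by term: subtract the mean, square, sum, divide by n - 1, and take the square root. The function name `sample_std_dev` is illustrative, and the data are the six values from the earlier mean example.

```python
import math

def sample_std_dev(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 3, 4, 6, 8, 12]
s_x = sample_std_dev(data)
print(round(s_x, 2))  # 3.71
```

Python's built-in `statistics.stdev` uses the same n - 1 divisor, so it can serve as a cross-check.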

Page 20: Describing quantitative data with numbers

If the data is uniform or symmetric, use:

Center: Mean

Spread: Standard deviation

If the data is skewed, use:

Center: Median

Spread: Five-number summary, Range, IQR

Page 21: Describing quantitative data with numbers

Distributions with Outliers

Since outliers affect the mean and standard deviation, it is usually better to use the median and IQR.

However, if the distribution is unimodal, use mean and median and just report outliers separately.

Alternatively, if you find a simple reason for the outlier, eliminate it and use the mean and standard deviation (if the distribution is symmetric).

Page 22: Describing quantitative data with numbers

Measuring Position

Quartiles

Percentiles

Z-scores

• We can either use z-Scores or percentiles to declare the location of an observation in a distribution.

• z-Scores use the mean and standard deviation.

• Percentiles use a position relative to the starting point.

Page 23: Describing quantitative data with numbers

Percentiles/Quartiles

• $P_k$ is the notation for the kth percentile

• $Q_n$ is the notation for the nth quartile

$P_{25} = Q_1$

$P_{50} = Q_2 = \text{median}$

$P_{75} = Q_3$

Page 24: Describing quantitative data with numbers

Finding Percentiles

If you are trying to find the percentile corresponding to a certain score x:

$\text{Percentile of } x = \frac{\text{number of scores} < x}{\text{total number of scores}} \times 100$

• Percentiles are often used when reporting academic scores such as SAT scores. Let’s say you get a 620 on the math portion of the SAT. Your score report might also indicate that you are in the “78th percentile.” That means that you scored better than 78% of all students taking that particular SAT.
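The percentile calculation (number of scores below x, divided by the total number of scores, times 100) can be sketched in Python as below. The list of ten scores is invented purely for illustration, and `percentile_of` is a hypothetical helper name.

```python
def percentile_of(score, all_scores):
    """Percent of scores that are strictly below the given score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

scores = [54, 61, 68, 72, 76, 76, 80, 85, 91, 99]   # hypothetical class scores
p = percentile_of(80, scores)
print(p)  # 60.0
```

Six of the ten scores are below 80, so a score of 80 sits at the 60th percentile of this made-up class.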

Page 25: Describing quantitative data with numbers

Measuring Relative Standing With Standardized Values (z-Scores)

• One way to compare an individual to the whole distribution is to describe its location in the distribution relative to the mean.

• Let’s do this by describing how many standard deviations an individual is away from the mean value.

• We call this the “standardized value,” or the “z-Score.”

Page 26: Describing quantitative data with numbers

Here is how to interpret z-scores:

A z-score less than 0 represents an element less than the mean.

A z-score greater than 0 represents an element greater than the mean.

A z-score equal to 0 represents an element equal to the mean.

A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.

A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.
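These interpretations follow from the usual formula z = (x - mean) / standard deviation, which the slides describe in words ("how many standard deviations an individual is away from the mean") but do not write out. A minimal sketch, assuming a test with mean 150 and standard deviation 15; the individual scores 180, 135, and 150 are made up for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_above = z_score(180, mean=150, std_dev=15)
z_below = z_score(135, mean=150, std_dev=15)
z_equal = z_score(150, mean=150, std_dev=15)
print(z_above, z_below, z_equal)  # 2.0 -1.0 0.0
```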

Page 27: Describing quantitative data with numbers

Five-Number Summary

The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.

Minimum Q1 Median Q3 Maximum
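A five-number summary can be assembled from pieces already introduced. The sketch below assumes the "median of each half" method for quartiles and, for an odd number of values, leaves the median out of both halves; other textbooks and software use slightly different conventions. The data are the seven values from the odd-count median example.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the middle value is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

summary = five_number_summary([5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66])
print(summary)  # (0.42, 0.48, 0.73, 1.1, 5.4)
```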

Page 28: Describing quantitative data with numbers

Boxplots

The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.

Page 29: Describing quantitative data with numbers

How to make a boxplot:

1. Draw and label a number line that includes the range of the distribution.

2. Draw a central box from Q1 to Q3.

3. Note the median M inside the box.

4. Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.

Page 30: Describing quantitative data with numbers

Comparing Boxplots

Page 31: Describing quantitative data with numbers

Check For Understanding

Page 32: Describing quantitative data with numbers

Effect of Changing Units

Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of central tendency are affected when we change units:

If you add a constant to every value, the mean and median increase by the same constant.

Example: Suppose you have a set of scores with a mean equal to 5 and a median equal to 6. If you add 10 to every score, the new mean will be 5 + 10 = 15, and the new median will be 6 + 10 = 16.

If you multiply every value by a constant, the mean and the median are also multiplied by that constant.

Example: Assume that a set of scores has a mean of 5 and a median of 6. If you multiply each of these scores by 10, the new mean will be 5 × 10 = 50, and the new median will be 6 × 10 = 60.
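The effect of adding or multiplying by a constant can be checked directly on a small data set. The five scores below are invented so that their mean is 5 and their median is 6, matching the slide examples; the slides do not give actual data.

```python
import statistics

scores = [1, 2, 6, 7, 9]                 # hypothetical: mean 5, median 6
shifted = [x + 10 for x in scores]       # add 10 to every score
scaled = [x * 10 for x in scores]        # multiply every score by 10

# Adding a constant shifts both measures by that constant.
shifted_mean = statistics.mean(shifted)
shifted_median = statistics.median(shifted)

# Multiplying by a constant multiplies both measures by that constant.
scaled_mean = statistics.mean(scaled)
scaled_median = statistics.median(scaled)

print(shifted_mean, shifted_median)
print(scaled_mean, scaled_median)
```

The shifted scores have mean 15 and median 16, and the scaled scores have mean 50 and median 60, in line with the two worked examples.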

Page 33: Describing quantitative data with numbers

Check For Understanding

The average score on a test is 150 with a standard deviation of 15. Each score is then increased by 25. What are the new mean and standard deviation?

Page 34: Describing quantitative data with numbers

Check For Understanding

The test grades from a college statistics class are shown below.

85 72 64 65 98 78 75 76 82 80 61 92 72 58 65 74 92 85 74 76 77 77 62 68 68 54 62 76 73 85 88 91 99 82 80 74 76 77 70 60

(a) Construct two different graphs of these data.

(b) Calculate the five-number summary and the mean and standard deviation of the data.

(c) Describe the distribution of the data, citing both the plots and the summary statistics found in questions (a) and (b).