Warm-up The number of deaths among persons aged 15 to 24 years in the United States in 1997 due to the seven leading causes of death for this age group were accidents, 12,958; homicide, 5,793; suicide, 4,146; cancer, 1,583; heart disease, 1,013; congenital defects, 383; AIDS, 276. Make a bar graph to display these data. What additional information do you need to make a pie chart?




Section 1.2

Describing Distributions with Numbers


Specific Ways to Describe Shape, Center and Spread

Center:

• Mean – ordinary arithmetic average. Pronounced “x-bar.”

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Σ, pronounced “sigma,” means the sum of… In other words, you add up the terms 1 through n.

• Median – the midpoint of the data set. Denoted M.


Bonds vs. Aaron

Home runs per season:

Barry Bonds: 16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73

Hank Aaron: 13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, 20


Have no fear… Your calculator is here!

You can get all this information from your calculator.

Type your data into lists L1 and L2. Press Stat, choose 1-Var Stats, and enter L1. Do the same thing for L2.


Compare Centers

Find the mean and median of both Bonds’ and Aaron’s home runs.

Bonds: $\bar{X} = 35.4375$, $M_X = 34$

Aaron: $\bar{Y} = 34.9$, $M_Y = 38$

Bonds has a higher average number of home runs, but this average is pulled up by the extreme value of 73. Aaron’s median is higher than Bonds’, indicating that Aaron hit more home runs than Bonds in a typical season.
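If you want to check these numbers away from the calculator, here is a minimal Python sketch using the standard `statistics` module. The list names `bonds` and `aaron` are only illustrative; the values are the season totals from the Bonds vs. Aaron slide.

```python
from statistics import mean, median

# Home runs per season, copied from the Bonds vs. Aaron slide.
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24,
         32, 44, 39, 29, 44, 38, 47, 34, 40, 20]

# Print the two measures of center for each player.
for name, data in (("Bonds", bonds), ("Aaron", aaron)):
    print(f"{name}: mean = {mean(data):.4f}, median = {median(data)}")
```

The output should agree with the 1-Var Stats results: Bonds has the larger mean, Aaron the larger median.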


Resistant and Non-resistant

The mean is affected by extreme observations, such as Bonds’ single season record of 73 home runs. It is a non-resistant measure of center.

The median, however, is resistant to extreme observations. It is the preferable measure of center when a data set has outliers.


Think About This

Change Bonds’ single season record from 73 home runs to 100 home runs. How is the mean affected? The median?

How do the mean and median compare to each other in a symmetric distribution?

In a (unimodal) skewed right distribution?

In a (unimodal) skewed left distribution?
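The first question can be tried numerically. The sketch below is one possible way to do it in Python: it swaps the 73 for a hypothetical 100 (the name `changed` is just for illustration) and recomputes both measures of center.

```python
from statistics import mean, median

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

# Hypothetical data set: the record season of 73 becomes 100.
changed = [100 if x == 73 else x for x in bonds]

print("mean:  ", mean(bonds), "->", mean(changed))
print("median:", median(bonds), "->", median(changed))
```

Only the largest value moved, so the middle of the sorted list is untouched while the sum (and therefore the mean) grows.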


Introduction to Measures of Spread

Today, we’ll learn about quartiles. Oddly enough, they divide a data set into fourths (25% sections).

Finding quartiles is like finding the median. You locate the midpoint, and average the two middle numbers if there is an even number of data points.


A Visual Representation of Quartiles

Q1: Lower Quartile, 25th %ile

Q2: Median, 50th %ile

Q3: Upper Quartile, 75th %ile

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

So, there are really only THREE quartiles, and the middle one isn’t usually called a quartile (it’s called the median). We generally refer to Q1, M, and Q3.


To find Q1, you find the median of the lower half of the data.

To find Q3, you find the median of the upper half of the data.


Try it!

16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73

Find the Range, Median, Q1, and Q3


Solution

16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73

Median = 34; Q1 = 25; Q3 = 41

So, the Range is 73 – 16 = 57. This gives us a little information about the variability of Bonds’ home runs in a season.

The middle 50% of the data lies between 25 and 41, so we see where the spread of the middle half of the data lies.
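The median-of-halves method can be sketched in Python as below. The function name `quartiles` is made up for this example, and it assumes that when the number of data points is odd the overall median is left out of both halves (the slides only show an even-sized data set). Library quartile functions may use other conventions and give slightly different answers.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves method.

    Assumes at least two data points; for an odd count the middle
    value is excluded from both halves (an assumption, see above).
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]               # lower half of the sorted data
    upper = xs[len(xs) - half:]     # upper half of the sorted data
    return median(lower), median(xs), median(upper)

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
q1, m, q3 = quartiles(bonds)
data_range = max(bonds) - min(bonds)
print("Range:", data_range, "Q1:", q1, "Median:", m, "Q3:", q3)
```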


Interquartile Range and the Outlier Rule

IQR is simply Q3 - Q1.

In our Barry Bonds example, IQR = 41 – 25 = 16.

The IQR is a suitable measure of spread and is paired with the median.

We use the IQR to define what an outlier is. An outlier is any value (or values) that falls more than 1.5*IQR above the upper quartile or below the lower quartile.


“Fences”

Think of the 1.5*IQR rule as fences. They draw the boundary line beyond which values are outliers.

Is Barry Bonds’ 73 homer season an outlier???

Recall: Q1 = 25; Q3 = 41; IQR = 16

So, 1.5*IQR = 1.5*16 = 24.

Add 24 to Q3 and subtract 24 from Q1:

Upper boundary = 24 + 41 = 65

Lower boundary = 25 – 24 = 1

Conclusion: 73 falls above the outlier boundary of 65, so it is an outlier!!!
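The fence arithmetic above can be written as a short Python sketch. The helper name `outlier_fences` and its parameter `k` are assumptions made for this example; the quartiles 25 and 41 are the ones found earlier for Bonds.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k*IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

low, high = outlier_fences(25, 41)
# A value is an outlier only if it falls strictly beyond a fence.
outliers = [x for x in bonds if x < low or x > high]
print("fences:", low, high, "outliers:", outliers)
```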


5 Number Summary

The five number summary consists of the lowest value, Q1, the Median, Q3, and the highest value. It is important because we’ll use it to create a new kind of graph: a boxplot (also called a box-and-whiskers plot).


Bonds’ Boxplot

Recall his 5 number summary:

L = 16; Q1 = 25; M = 34; Q3 = 41; H = 73

[Boxplot of Bonds’ home runs. Horizontal axis: number of home runs in a season, marked from 10 to 70.]
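As a rough sketch, the five number summary that the boxplot is drawn from can be computed in Python like this. The function name `five_number_summary` is hypothetical, and the quartiles use the median-of-halves method with the middle value excluded for odd-sized data (an assumption, since the slides only work an even-sized example).

```python
from statistics import median

def five_number_summary(data):
    """Return (low, Q1, median, Q3, high); needs at least two values."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])              # median of the lower half
    q3 = median(xs[len(xs) - half:])    # median of the upper half
    return xs[0], q1, median(xs), q3, xs[-1]

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
summary = five_number_summary(bonds)
print("L, Q1, M, Q3, H =", summary)
```

In a regular boxplot the box runs from Q1 to Q3 with a line at M, and the whiskers reach out to L and H.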


Modified Boxplots

Modified boxplots show outliers as isolated points. Bonds’ 73 home run season was an outlier, so the whisker in a modified boxplot only extends to the last data point that was NOT an outlier. Any outlier is shown as a star (*).

CAUTION: Many students extend the whisker to the outlier “fence” (i.e., 65). This is WRONG! The whisker should stop at the last actual data point. So tell me – where should the upper whisker end in a modified boxplot of Bonds’ home runs per season???

49
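To see where that answer comes from, here is a small Python sketch that finds where each whisker of a modified boxplot should stop. The helper name `whisker_ends` is invented for this example; it keeps only the data points inside the 1.5*IQR fences and returns the smallest and largest of them.

```python
def whisker_ends(data, q1, q3, k=1.5):
    """Return the (lower, upper) whisker ends for a modified boxplot."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at actual data points, never at the fences themselves.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return min(inside), max(inside)

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
ends = whisker_ends(bonds, 25, 41)
print("whiskers end at:", ends)
```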


We can look at these in the calculator as well.

Go to StatPlot.


It’s Never Too Soon for a Practice AP Question

2005 AP Statistics Problem #1


Question 1 Part a)

Part a) is graded Essentially Correct, Partially Correct, or Incorrect

To receive an Essentially Correct, a student must successfully compare center, shape and spread. Specific numeric values are not required.

To receive a Partially Correct, a student must successfully compare 2 of the 3 measures of center, shape and spread.

All other responses are graded as Incorrect.


Special Notes

Compare means you state which is larger. For example, “the mean of the rural students’ daily caloric intake is greater than the mean for the urban students” is a correct comparison. However, stating “the mean of the rural students’ daily caloric intake is 40.45 while the mean for the urban students is 32.6” is not a COMPARISON.


In Conclusion

Graders were looking for three comparisons:

Center: the mean caloric intake of the rural students is greater than the mean caloric intake of the urban students.

Spread: the spread of the rural students’ distribution is larger than the spread of the urban students’ distribution.

Shape: the rural students’ caloric intakes are roughly symmetric while the urban students’ caloric intakes are skewed right.


There’s More to Spread than IQR

Section 1.2 Standard Deviation


Describing Data with Numbers

So far, we’ve learned the 5 Number Summary to describe a set of data: Min, Q1, M, Q3, and Max.

We’ve also used the mean as another measure of center.


Measuring Spread: Standard Deviation

The most commonly used measure of spread is the standard deviation.

Standard deviation tells us, on average, how far the observations are away from the mean.


Standard Deviation and Variance

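Variance is (roughly) the average of the squares of the deviations of the observations from the mean. The sum of squared deviations is divided by n - 1 rather than n.

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

WHAT??? But your calculator can tell you all of this!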
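If the formula looks intimidating, it may help to see it spelled out step by step. The Python sketch below applies the n - 1 formula to Bonds’ home runs; the function name `sample_variance` is made up for this example.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n                      # the mean
    squared_devs = [(x - x_bar) ** 2 for x in data]
    return sum(squared_devs) / (n - 1)

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
s2 = sample_variance(bonds)    # variance
s = sqrt(s2)                   # standard deviation
print(f"s^2 = {s2:.4f}, s = {s:.3f}")
```

The value of s here should match the Sx that 1-Var Stats reports for the same list.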


Properties of Standard Deviation

s² is called the variance. The square root of s² is __.

s measures spread about the mean and is called standard deviation.

s = 0 only when there is NO SPREAD (in other words, all the data values are the same). As the observations become more spread out about their mean, s gets larger.

s is not resistant to skewness or outliers. WHY?


Recap

Measures of spread: IQR, standard deviation

Measures of center: Median, Mean

When to use which???

The mean and the std. dev. are not resistant to outliers, so use them only when the distribution is roughly symmetric and there aren’t outliers.

Use the 5 Number Summary when the distribution is strongly skewed or has outliers.


How the AP Folks Test Your Ability to Reason

How do the following affect the mean? The median? The Std. Dev.?

Adding a certain amount to every value in a data set

Multiplying each value in a data set by the same number
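One way to explore these questions is to try them on real numbers. The Python sketch below uses Bonds’ home runs; the constants 10 and 2 and the helper name `summary` are arbitrary choices for illustration.

```python
from statistics import mean, median, stdev

def summary(data):
    """Return (mean, median, standard deviation) of the data."""
    return mean(data), median(data), stdev(data)

bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
shifted = [x + 10 for x in bonds]   # add 10 to every value
scaled = [x * 2 for x in bonds]     # multiply every value by 2

for label, data in (("original", bonds), ("plus 10", shifted), ("times 2", scaled)):
    m, med, s = summary(data)
    print(f"{label:9s} mean = {m:.4f}  median = {med}  std dev = {s:.4f}")
```

Compare the rows: adding a constant moves the mean and median by that constant and leaves the standard deviation alone, while multiplying by a positive constant multiplies all three by it.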


Homework

Day 1: Chapter 1 #40, 41, 45, 50, 52

Day 2: Chapter 1 #63, 91, 94, 96, 101