22
+ Chapter 1: Exploring Data Section 1.2 Displaying Quantitative Data with Graphs The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE

Chapter 1: Exploring Data - Miss Sadowski's Math Page · Chapter 1 Exploring Data ... The table and dotplot below displays the Environmental ... second dotplot shows the number of

  • Upload
    vunhan

  • View
    238

  • Download
    0

Embed Size (px)

Citation preview

+

Chapter 1: Exploring Data Section 1.2

Displaying Quantitative Data with Graphs

The Practice of Statistics, 4th edition - For AP*

STARNES, YATES, MOORE

+ Chapter 1

Exploring Data

Introduction: Data Analysis: Making Sense of Data

1.1 Analyzing Categorical Data

1.2 Displaying Quantitative Data with Graphs

1.3 Describing Quantitative Data with Numbers

+ Section 1.2 Displaying Quantitative Data with Graphs

After this section, you should be able to…

CONSTRUCT and INTERPRET dotplots, stemplots, and histograms

DESCRIBE the shape of a distribution

COMPARE distributions

USE histograms wisely

Learning Objectives

+

1)Draw a horizontal axis (a number line) and label it with the

variable name.

2)Scale the axis from the minimum to the maximum value.

3)Mark a dot above the location on the horizontal axis

corresponding to each data value.

Dis

pla

yin

g Q

uantita

tive D

ata

Dotplots

One of the simplest graphs to construct and interpret is a

dotplot. Each data value is shown as a dot above its

location on a number line.

How to Make a Dotplot

Number of Goals Scored Per Game by the 2004 US Women’s Soccer Team

3 0 2 7 8 2 4 3 5 1 1 4 5 3 1 1 3

3 3 2 1 2 2 2 4 3 5 6 1 5 5 1 1 5

+

Examining the Distribution of a Quantitative Variable

The purpose of a graph is to help us understand the data. After

you make a graph, always ask, “What do I see?”

In any graph, look for the overall pattern and for striking

departures from that pattern.

Describe the overall pattern of a distribution by its:

•Shape

•Center

•Spread

Note individual values that fall outside the overall pattern.

These departures are called outliers.

How to Examine the Distribution of a Quantitative Variable

Dis

pla

yin

g Q

uantita

tive D

ata

Don’t forget your

SOCS!

+

Examine this data

The table and dotplot below displays the Environmental

Protection Agency’s estimates of highway gas mileage in miles

per gallon (MPG) for a sample of 24 model year 2009 midsize

cars.

Dis

pla

yin

g Q

uantita

tive D

ata

Describe the shape, center, and spread of

the distribution. Are there any outliers?

2009 Fuel Economy Guide

MODEL MPG

1

2

3

4

5

6

7

8

9

Acura RL 22

Audi A6 Quattro 23

Bentley Arnage 14

BMW 5281 28

Buick Lacrosse 28

Cadillac CTS 25

Chevrolet Malibu 33

Chrysler Sebring 30

Dodge Avenger 30

2009 Fuel Economy Guide

MODEL MPG <new>

9

10

11

12

13

14

15

16

17

Dodge Avenger 30

Hyundai Elantra 33

Jaguar XF 25

Kia Optima 32

Lexus GS 350 26

Lincolon MKZ 28

Mazda 6 29

Mercedes-Benz E350 24

Mercury Milan 29

2009 Fuel Economy Guide

MODEL MPG <new>

16

17

18

19

20

21

22

23

24

Mercedes-Benz E350 24

Mercury Milan 29

Mitsubishi Galant 27

Nissan Maxima 26

Rolls Royce Phantom 18

Saturn Aura 33

Toyota Camry 31

Volksw agen Passat 29

Volvo S80 25

Example, page 28

+

Examine this data

Here is the estimated battery life for each of 9 different smart

phones (in minutes) according to http://cell-

phones.toptenreviews.com/smartphones/.

Dis

pla

yin

g Q

uantita

tive D

ata

Describe the shape, center, and spread of the distribution. Are there any outliers? Describe the shape, center, and spread of the distribution. Are there any

outliers?

Solution: Shape: There is a peak at 300 and the distribution has a long tail

to the right (skewed to the right). Center: The middle value is 330

minutes. Spread: The range is 460 – 300 = 160 minutes. Outliers: There

is one phone with an unusually long battery life, the HTC Droid at 460

minutes.

Alternate Example

Smart Phone Battery Life (minutes)

Apple iPhone 300

Motorola Droid 385

Palm Pre 300

Blackberry Bold 360

Blackberry Storm 330

Motorola Cliq 360

Samsung Moment 330

Blackberry Tour 300

HTC Droid 460

BatteryLife (minutes)

300 340 380 420 460

Collection 1 Dot Plot

+

Dis

pla

yin

g Q

uantita

tive D

ata

Describing Shape

When you describe a distribution’s shape, concentrate on

the main features. Look for rough symmetry or clear

skewness.

Definitions:

A distribution is roughly symmetric if the right and left sides of the

graph are approximately mirror images of each other.

A distribution is skewed to the right (right-skewed) if the right side of

the graph (containing the half of the observations with larger values) is

much longer than the left side.

It is skewed to the left (left-skewed) if the left side of the graph is

much longer than the right side.

Symmetric Skewed-left Skewed-right

+

Dis

pla

yin

g Q

uantita

tive D

ata

Alternate Example

Here are dotplots for two different sets of quantitative data.

DieRoll

0 2 4 6 8 10

Collection 2 Dot Plot

calories

100 140 180 220

Collection 1 Dot Plot

The first dotplot shows the outcomes of 100 rolls of a 10-sided die. This

distribution is roughly symmetric with no obvious modes. Don’t worry about

the small differences in the number of dots for each die roll—this is bound

to happen just by chance even if the frequencies should be the same. A

distribution with this shape can be called “approximately uniform.” The

second dotplot shows the number of calories in one serving of whole wheat

or multigrain bread. This distribution is skewed to the left with a peak at 220

calories.

+

Dis

pla

yin

g Q

uantita

tive D

ata

Comparing Distributions

Some of the most interesting statistics questions

involve comparing two or more groups.

Always discuss shape, center, spread, and

possible outliers whenever you compare

distributions of a quantitative variable.

Example, page 32

Compare the distributions of

household size for these

two countries. Don’t forget

your SOCS!

Pla

ce

U

.K

So

uth

Afr

ica

+

Dis

pla

yin

g Q

uantita

tive D

ata

Alternate Example

How do the annual energy costs (in dollars) compare for refrigerators

with top freezers and refrigerators with bottom freezers? The data

below is from the May 2010 issue of Consumer Reports.

Compare the distributions of annual energy costs for these two

types of refrigerators. Don’t forget your SOCS!

EnergyCost

Ty

pe

14012611298847056

bottom

top

Dotplot of EnergyCost vs Type

Solution: Shape: The distribution for bottom freezers looks skewed to the

right and possibly bimodal, with modes near $58 and $70 per year. The

distribution for top freezers looks roughly symmetric with its main peak

centered around $55. Center: The typical energy cost for the bottom

freezers is higher than the typical cost for the top freezers (median of $69

vs. median of $55). Spread: There is much more variability in the

energy costs for bottom freezers, since the range is $101 compared to

$17 for the top freezers. Outliers: There are a couple of bottom freezers

with unusually high energy costs (over $140 per year). There are no

outliers for the top freezers.

+

1)Separate each observation into a stem (all but the final

digit) and a leaf (the final digit).

2)Write all possible stems from the smallest to the largest in a

vertical column and draw a vertical line to the right of the

column.

3)Write each leaf in the row to the right of its stem.

4)Arrange the leaves in increasing order out from the stem.

5)Provide a key that explains in context what the stems and

leaves represent.

Dis

pla

yin

g Q

uantita

tive D

ata

Stemplots (Stem-and-Leaf Plots)

Another simple graphical display for small data sets is a

stemplot. Stemplots give us a quick picture of the distribution

while including the actual numerical values.

How to Make a Stemplot

+

Dis

pla

yin

g Q

uantita

tive D

ata

Stemplots (Stem-and-Leaf Plots)

These data represent the responses of 20 female AP

Statistics students to the question, “How many pairs of

shoes do you have?” Construct a stemplot.

50 26 26 31 57 19 24 22 23 38

13 50 13 34 23 30 49 13 15 51

Stems

1

2

3

4

5

Add leaves

1 93335

2 664233

3 1840

4 9

5 0701

Order leaves

1 33359

2 233466

3 0148

4 9

5 0017

Add a key

Key: 4|9

represents a

female student

who reported

having 49

pairs of shoes.

+

Dis

pla

yin

g Q

uantita

tive D

ata

Splitting Stems and Back-to-Back Stemplots

When data values are “bunched up”, we can get a better picture of

the distribution by splitting stems.

Two distributions of the same quantitative variable can be

compared using a back-to-back stemplot with common stems.

50 26 26 31 57 19 24 22 23 38

13 50 13 34 23 30 49 13 15 51

0

0

1

1

2

2

3

3

4

4

5

5

Key: 4|9

represents a

student who

reported

having 49

pairs of shoes.

Females

14 7 6 5 12 38 8 7 10 10

10 11 4 5 22 7 5 10 35 7

Males

0 4

0 555677778

1 0000124

1

2 2

2

3

3 58

4

4

5

5

Females

333

95

4332

66

410

8

9

100

7

Males

“split stems”

+

Dis

pla

yin

g Q

uantita

tive D

ata

Alternate Example – Who’s Taller? Which gender is taller, males or females? A sample of 14-year-olds

from the United Kingdom was randomly selected using the

CensusAtSchool website.

Here are the heights of the students (in cm):

Male: 154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175,

174, 165, 165, 183, 180

Female: 160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158,

153, 161, 165, 165, 159, 168, 153, 166, 158, 158, 166

15 1479

16 235579

17 4567

18 037

15 14

15 79

16 23

16 5579

17 4

17 567

18 03

18 7

Key: 15|1 represents a male who is 151 cm tall.

Here are two stemplots for Male

heights, one with split stems:

Here is a back-to-back stemplot

comparing male and female heights:

Female Male

332 15 14

98887 15 79

433100 16 23

99876655 16 5579

17 4

17 567

18 03

18 7

+

Dis

pla

yin

g Q

uantita

tive D

ata

Splitting Stems and Back-to-Back Stemplots

When data values are “bunched up”, we can get a better picture of

the distribution by splitting stems.

Two distributions of the same quantitative variable can be

compared using a back-to-back stemplot with common stems.

50 26 26 31 57 19 24 22 23 38

13 50 13 34 23 30 49 13 15 51

0

0

1

1

2

2

3

3

4

4

5

5

Key: 4|9

represents a

student who

reported

having 49

pairs of shoes.

Females

14 7 6 5 12 38 8 7 10 10

10 11 4 5 22 7 5 10 35 7

Males

0 4

0 555677778

1 0000124

1

2 2

2

3

3 58

4

4

5

5

Females

333

95

4332

66

410

8

9

100

7

Males

“split stems”

+

1)Divide the range of data into classes of equal width.

2)Find the count (frequency) or percent (relative frequency) of

individuals in each class.

3)Label and scale your axes and draw the histogram. The

height of the bar equals its frequency. Adjacent bars should

touch, unless a class contains no individuals.

Dis

pla

yin

g Q

uantita

tive D

ata

Histograms

Quantitative variables often take many values. A graph of the

distribution may be clearer if nearby values are grouped

together.

The most common graph of the distribution of one

quantitative variable is a histogram.

How to Make a Histogram

+

Making a Histogram

The table on page 35 presents data on the percent of

residents from each state who were born outside of the U.S.

Dis

pla

yin

g Q

uantita

tive D

ata

Example, page 35

Frequency Table

Class Count

0 to <5 20

5 to <10 13

10 to <15 9

15 to <20 5

20 to <25 2

25 to <30 1

Total 50 Percent of foreign-born residents

Nu

mb

er

of

Sta

tes

+

Making a Histogram

The table presents data on the average points scored per

game (PTSG) for the 30 NBA teams in the 2009-2010 regular

season..

Dis

pla

yin

g Q

uantita

tive D

ata

Calculator Example

101.7 99.2 95.3 97.5 102.1 102

106.5 94 108.8 102.4 100.8 95.7

101.7 102.5 96.5 97.7 98.2 92.4

100.2 102.1 101.5 102.8 97.7 110.2

98.1 100 101.4 104.1 104.2 96.2

+

1)Don’t confuse histograms and bar graphs.

2)Don’t use counts (in a frequency table) or percents (in a

relative frequency table) as data.

3)Use percents instead of counts on the vertical axis when

comparing distributions with different numbers of

observations.

4)Just because a graph looks nice, it’s not necessarily a

meaningful display of data.

Dis

pla

yin

g Q

uantita

tive D

ata

Using Histograms Wisely

Here are several cautions based on common mistakes

students make when using histograms.

Cautions

+

In this section, we learned that…

You can use a dotplot, stemplot, or histogram to show the distribution of a quantitative variable.

When examining any graph, look for an overall pattern and for notable departures from that pattern. Describe the shape, center, spread, and any outliers. Don’t forget your SOCS!

Some distributions have simple shapes, such as symmetric or skewed. The number of modes (major peaks) is another aspect of overall shape.

When comparing distributions, be sure to discuss shape, center, spread, and possible outliers.

Histograms are for quantitative data, bar graphs are for categorical data. Use relative frequency histograms when comparing data sets of different sizes.

Summary

Section 1.2 Displaying Quantitative Data with Graphs

+ Looking Ahead…

We’ll learn how to describe quantitative data with

numbers.

Mean and Standard Deviation

Median and Interquartile Range

Five-number Summary and Boxplots

Identifying Outliers

We’ll also learn how to calculate numerical summaries

with technology and how to choose appropriate

measures of center and spread.

In the next Section…