67
Statistics Numerical Representation of Data Part 2 – Measure of Variation

Statistics Numerical Representation of Data Part 2 – Measure of Variation

Embed Size (px)

Citation preview

Page 1: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Statistics

Numerical Representation of Data

Part 2 – Measure of Variation

Page 2: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Warm –up

Approximate the mean of the frequency distribution.

Class Frequency, f

1 – 6 21

7 – 12 16

13 – 18 28

19 – 24 13

Page 3: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Warm-up - What can be said about the relationship between the mean and median in the dotplot below?

a) The mean is smaller than the median.

b) The mean is bigger than the median.

c) The mean is equal to the median.

d) Nothing can be determined based on the graph.

Page 4: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Warm-up - An investor was interested in determining how much gain she had in her 401K plan in the last 6 quarters. The data is listed below. Find the median and the mean of the data.

-510 110 1230 1900 -680 1700

a) Mean = 1021.7 Median = 670b) Mean = 1021.7 Median = 1565 c) Mean = 625 Median = 3.5d) Mean = 625 Median = 670e) Mean = 625 Median = 1565

Page 5: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Agenda Warm-up Homework Review Lesson Objectives

Determine the range of a data set Determine the variance and standard deviation of a

population and of a sample Use the Empirical Rule and Chebychev’s Theorem to

interpret standard deviation Approximate the sample standard deviation for grouped data

Summary Homework

Page 6: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Measures of VariationMeasures of Variation

Consider the following:Consider the following:

Wait time in minutes:Wait time in minutes:Bank 1Bank 1 Bank 2Bank 2

MedianMedian 7.207.20 7.207.20

MeanMean 7.157.15 7.157.15

Complaints/moComplaints/mo 3 3 22 22

Page 7: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Objective 1

Compute the range of a variable from raw data

3-7

Page 8: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Range

Range The difference between the maximum and

minimum data entries in the set. The data must be quantitative. Range = (Max. data entry) – (Min. data entry)

Page 9: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Example: Finding the Range

A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the range of the starting salaries.

Starting salaries (1000s of dollars)

41 38 39 45 47 41 44 41 37 42

Page 10: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Finding the Range

Ordering the data helps to find the least and greatest salaries.

37 38 39 41 41 41 42 44 45 47

Range = (Max. salary) – (Min. salary)

= 47 – 37 = 10

The range of starting salaries is 10 or $10,000.

minimum maximum

Page 11: Statistics Numerical Representation of Data Part 2 – Measure of Variation

EXAMPLE Finding the Range of a Set of Data

The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.

23, 36, 23, 18, 5, 26, 43

Find the range.

Range = 43 – 5

= 38 minutes

3-11

Page 12: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Objective 2

Compute the variance of a variable from raw data

3-12

Page 13: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Deviation, Variance, and Standard Deviation

Deviation The difference between the data entry, x, and

the mean of the data set. Population data set:

Deviation of x = x – μ Sample data set:

Deviation of x = x – x

Page 14: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Example: Finding the Deviation

A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the deviation of the starting salaries.

Starting salaries (1000s of dollars)

41 38 39 45 47 41 44 41 37 42Solution:• First determine the mean starting salary.

41541.5

10

x

N

Page 15: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Finding the Deviation

Determine the deviation for each data entry.

Salary ($1000s), x

Deviation: x – μ

41 41 – 41.5 = –0.5

38 38 – 41.5 = –3.5

39 39 – 41.5 = –2.5

45 45 – 41.5 = 3.5

47 47 – 41.5 = 5.5

41 41 – 41.5 = –0.5

44 44 – 41.5 = 2.5

41 41 – 41.5 = –0.5

37 37 – 41.5 = –4.5

42 42 – 41.5 = 0.5Σx = 415 Σ(x – μ) = 0

Page 16: Statistics Numerical Representation of Data Part 2 – Measure of Variation

To order food at a McDonald’s Restaurant, or at Wendy’s Restaurant, you must stand in line. The following data represent the wait time (in minutes) in line for a simple random sample of 30 customers at each restaurant during the lunch hour. For each sample, answer the following:

(a) What was the mean wait time?

(b) Draw a histogram of each restaurant’s wait time.

(c ) Which restaurant’s wait time appears more dispersed? Which line would you prefer to wait in? Why?

3-16

Page 17: Statistics Numerical Representation of Data Part 2 – Measure of Variation

1.50 0.79 1.01 1.66 0.94 0.672.53 1.20 1.46 0.89 0.95 0.901.88 2.94 1.40 1.33 1.20 0.843.99 1.90 1.00 1.54 0.99 0.350.90 1.23 0.92 1.09 1.72 2.00

3.50 0.00 0.38 0.43 1.82 3.040.00 0.26 0.14 0.60 2.33 2.541.97 0.71 2.22 4.54 0.80 0.500.00 0.28 0.44 1.38 0.92 1.173.08 2.75 0.36 3.10 2.19 0.23

Wait Time at Wendy’s

Wait Time at McDonald’s

3-17

Page 18: Statistics Numerical Representation of Data Part 2 – Measure of Variation

(a) The mean wait time in each line is 1.39 minutes.

3-18

Page 19: Statistics Numerical Representation of Data Part 2 – Measure of Variation

(b)

3-19

Page 20: Statistics Numerical Representation of Data Part 2 – Measure of Variation

The population variance of a variable is the sum of squared deviations about the population mean divided by the number of observations in the population, N.

That is it is the mean of the sum of the squared deviations about the population mean.

3-20

Page 21: Statistics Numerical Representation of Data Part 2 – Measure of Variation

The population variance is symbolically represented by σ2 (lower case Greek sigma squared).

Note: When using the above formula, do not round until the last computation. Use as many decimals as allowed by your calculator in order to avoid round off errors.

3-21

Page 22: Statistics Numerical Representation of Data Part 2 – Measure of Variation

EXAMPLE Computing a Population Variance

The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.

23, 36, 23, 18, 5, 26, 43

Compute the population variance of this data. Recall that

17424.85714

7

3-22

Page 23: Statistics Numerical Representation of Data Part 2 – Measure of Variation

xi μ xi – μ (xi – μ)2

23 24.85714 -1.85714 3.44898

36 24.85714 11.14286 124.1633

23 24.85714 -1.85714 3.44898

18 24.85714 -6.85714 47.02041

5 24.85714 -19.8571 394.3061

26 24.85714 1.142857 1.306122

43 24.85714 18.14286 329.1633

902.8571 2

ix 2

2 902.8571

7ix

N

129.0 minutes2

3-23

Page 24: Statistics Numerical Representation of Data Part 2 – Measure of Variation

The Computational Formula

3-24

Page 25: Statistics Numerical Representation of Data Part 2 – Measure of Variation

EXAMPLE Computing a Population Variance Using the Computational Formula

The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.

23, 36, 23, 18, 5, 26, 43

Compute the population variance of this data using the computational formula.

3-25

Page 26: Statistics Numerical Representation of Data Part 2 – Measure of Variation

2 2 2 223 36 ... 43 5228ix Data Set : 23, 36, 23, 18, 5, 26, 43

23 36 ... 43 174ix

22

2

2

1745228

77

i

i

xx

NN

129.0

3-26

Page 27: Statistics Numerical Representation of Data Part 2 – Measure of Variation

The sample variance is computed by determining the sum of squared deviations about the sample mean and then dividing this result by n – 1.

3-27

Page 28: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Note: Whenever a statistic consistently overestimates or underestimates a parameter, it is called biased. To obtain an unbiased estimate of the population variance, we divide the sum of the squared deviations about the mean by n - 1.

3-28

Page 29: Statistics Numerical Representation of Data Part 2 – Measure of Variation

EXAMPLE Computing a Sample Variance

Previously, we obtained the following simple random sample for the travel time data: 5, 36, 26.

Compute the sample variance travel time.

Travel Time, xi Sample Mean, Deviation about the Mean,

Squared Deviations about the Mean,

5 22.333 5 – 22.333

= -17.333

(-17.333)2 = 300.432889

36 22.333 13.667 186.786889

26 22.333 3.667 13.446889

xix x 2

ix x

2500.66667ix x

2

2 500.66667

1 3 1

ix xs

n

250.333 square minutes

3-29

Page 30: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Objective 3

Compute the standard deviation of a variable from raw data

3-30

Page 31: Statistics Numerical Representation of Data Part 2 – Measure of Variation

The population standard deviation is denoted by

It is obtained by taking the square root of the population variance, so that

The sample standard deviation is denoted by

s

It is obtained by taking the square root of the sample variance, so that

2s s3-31

Page 32: Statistics Numerical Representation of Data Part 2 – Measure of Variation

EXAMPLE Computing a Population Standard Deviation

The following data represent the travel times (in minutes) to work for all seven employees of a start-up web development company.

23, 36, 23, 18, 5, 26, 43

Compute the population standard deviation of this data.

Recall, from the last objective that σ2 = 129.0 minutes2. Therefore,

2 902.857111.4 minutes

7

3-32

Page 33: Statistics Numerical Representation of Data Part 2 – Measure of Variation

EXAMPLE Computing a Sample Standard Deviation

Recall the sample data 5, 26, 36 results in a sample variance of

2

2 500.66667

1 3 1

ix xs

n

250.333 square minutes

Use this result to determine the sample standard deviation.

2 500.66666715.8 minutes

3 1s s

3-33

Page 34: Statistics Numerical Representation of Data Part 2 – Measure of Variation

EXAMPLE Comparing Standard Deviations

Determine the standard deviation waiting time for Wendy’s and McDonald’s. Which is larger? Why?

3-34

Page 35: Statistics Numerical Representation of Data Part 2 – Measure of Variation

1.50 0.79 1.01 1.66 0.94 0.672.53 1.20 1.46 0.89 0.95 0.901.88 2.94 1.40 1.33 1.20 0.843.99 1.90 1.00 1.54 0.99 0.350.90 1.23 0.92 1.09 1.72 2.00

3.50 0.00 0.38 0.43 1.82 3.040.00 0.26 0.14 0.60 2.33 2.541.97 0.71 2.22 4.54 0.80 0.500.00 0.28 0.44 1.38 0.92 1.173.08 2.75 0.36 3.10 2.19 0.23

Wait Time at Wendy’s

Wait Time at McDonald’s

3-35

Page 36: Statistics Numerical Representation of Data Part 2 – Measure of Variation

EXAMPLE Comparing Standard Deviations

Determine the standard deviation waiting time for Wendy’s and McDonald’s. Which is larger? Why?

Sample standard deviation for Wendy’s:

0.738 minutes

Sample standard deviation for McDonald’s:

1.265 minutes

3-36

Page 37: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Deviation, Variance, and Standard Deviation

Population Variance

Population Standard Deviation

22 ( )x

N

Sum of squares, SSx

22 ( )x

N

Page 38: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Finding the Population Variance & Standard Deviation

In Words In Symbols1. Find the mean of the

population data set.

2. Find deviation of each entry.

3. Square each deviation.

4. Add to get the sum of squares.

x

N

x – μ

(x – μ)2

SSx = Σ(x – μ)2

Page 39: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Finding the Population Variance & Standard Deviation

5. Divide by N to get the population variance.

6. Find the square root to get the population standard deviation.

22 ( )x

N

2( )x

N

In Words In Symbols

Page 40: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Example: Finding the Population Standard Deviation

A corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the population variance and standard deviation of the starting salaries.

Starting salaries (1000s of dollars)

41 38 39 45 47 41 44 41 37 42

Recall μ = 41.5.

Page 41: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Finding the Population Standard Deviation

Determine SSx

N = 10Salary, x Deviation: x – μ Squares: (x – μ)2

41 41 – 41.5 = –0.5 (–0.5)2 = 0.25

38 38 – 41.5 = –3.5 (–3.5)2 = 12.25

39 39 – 41.5 = –2.5 (–2.5)2 = 6.25

45 45 – 41.5 = 3.5 (3.5)2 = 12.25

47 47 – 41.5 = 5.5 (5.5)2 = 30.25

41 41 – 41.5 = –0.5 (–0.5)2 = 0.25

44 44 – 41.5 = 2.5 (2.5)2 = 6.25

41 41 – 41.5 = –0.5 (–0.5)2 = 0.25

37 37 – 41.5 = –4.5 (–4.5)2 = 20.25

42 42 – 41.5 = 0.5 (0.5)2 = 0.25

Σ(x – μ) = 0 SSx = 88.5

Page 42: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Finding the Population Standard Deviation

Population Variance

Population Standard Deviation

22 ( ) 88.5

8.910

x

N

2 8.85 3.0

The population standard deviation is about 3.0, or $3000.

Page 43: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Deviation, Variance, and Standard Deviation

Sample Variance

Sample Standard Deviation

22 ( )

1

x xs

n

22 ( )

1

x xs s

n

Page 44: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Finding the Sample Variance & Standard Deviation

In Words In Symbols1. Find the mean of the

sample data set.

2. Find deviation of each entry.

3. Square each deviation.

4. Add to get the sum of squares.

xx

n

2( )xSS x x

2( )x x

x x

Page 45: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Finding the Sample Variance & Standard Deviation

5. Divide by n – 1 to get the sample variance.

6. Find the square root to get the sample standard deviation.

In Words In Symbols2

2 ( )

1

x xs

n

2( )

1

x xs

n

Page 46: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Example: Finding the Sample Standard Deviation

The starting salaries are for the Chicago branches of a corporation. The corporation has several other branches, and you plan to use the starting salaries of the Chicago branches to estimate the starting salaries for the larger population. Find the sample standard deviation of the starting salaries.

Starting salaries (1000s of dollars)

41 38 39 45 47 41 44 41 37 42

Page 47: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Finding the Sample Standard Deviation

Determine SSx

n = 10

Salary, x Deviation: x – μ Squares: (x – μ)2

41 41 – 41.5 = –0.5 (–0.5)2 = 0.25

38 38 – 41.5 = –3.5 (–3.5)2 = 12.25

39 39 – 41.5 = –2.5 (–2.5)2 = 6.25

45 45 – 41.5 = 3.5 (3.5)2 = 12.25

47 47 – 41.5 = 5.5 (5.5)2 = 30.25

41 41 – 41.5 = –0.5 (–0.5)2 = 0.25

44 44 – 41.5 = 2.5 (2.5)2 = 6.25

41 41 – 41.5 = –0.5 (–0.5)2 = 0.25

37 37 – 41.5 = –4.5 (–4.5)2 = 20.25

42 42 – 41.5 = 0.5 (0.5)2 = 0.25

Σ(x – μ) = 0 SSx = 88.5

Page 48: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Finding the Sample Standard Deviation

Sample Variance

Sample Standard Deviation

22 ( ) 88.5

9.81 10 1

x xs

n

2 88.53.1

9s s

The sample standard deviation is about 3.1, or $3100.

Page 49: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Example: Using Technology to Find the Standard Deviation

Sample office rental rates (in dollars per square foot per year) for Miami’s central business district are shown in the table. Use a calculator or a computer to find the mean rental rate and the sample standard deviation. (Adapted from: Cushman & Wakefield Inc.)

Office Rental Rates

35.00 33.50 37.00

23.75 26.50 31.25

36.50 40.00 32.00

39.25 37.50 34.75

37.75 37.25 36.75

27.00 35.75 26.00

37.00 29.00 40.50

24.50 33.00 38.00

Page 50: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Using Technology to Find the Standard Deviation

Sample Mean

Sample Standard Deviation

Page 51: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Interpreting Standard Deviation

Standard deviation is a measure of the typical amount an entry deviates from the mean.

The more the entries are spread out, the greater the standard deviation.

Page 52: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Estimating Standard Deviation

Range Rule of Thumb S = R/4

A quick estimation tool to determine if the standard deviation calculation is approximately correct

Page 53: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Interpreting Standard Deviation: Empirical Rule (68 – 95 – 99.7 Rule)

For data with a (symmetric) bell-shaped distribution, the standard deviation has the following characteristics:

• About 68% of the data lie within one standard deviation of the mean.

• About 95% of the data lie within two standard deviations of the mean.

• About 99.7% of the data lie within three standard deviations of the mean.

Page 54: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Interpreting Standard Deviation: Empirical Rule (68 – 95 – 99.7 Rule)

3x s x s 2x s 3x sx s x2x s

68% within 1 standard deviation

34% 34%

99.7% within 3 standard deviations

2.35% 2.35%

95% within 2 standard deviations

13.5% 13.5%

Page 55: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Standard Deviation and Area Zσ Percentage within CI Percentage outside CI Ratio outside

CI 1σ 68.2689492% 31.7310508% 1 / 3.1514871 1.645σ 90% 10% 1 / 10 1.960σ 95% 5% 1 / 20 2σ 95.4499736% 4.5500264% 1 / 21.977894 2.576σ 99% 1% 1 / 100 3σ 99.7300204% 0.2699796% 1 / 370.398 3.2906σ 99.9% 0.1% 1 / 1000 4σ 99.993666% 0.006334% 1 / 15,788 5σ 99.9999426697% 0.0000573303% 1 / 1,744,278 6σ 99.9999998027% 0.0000001973%

1 / 506,800,000 7σ 99.9999999997440% 0.0000000002560%

1 / 390,600,000,000

Page 56: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Example: Using the Empirical Rule

In a survey conducted by the National Center for Health Statistics, the sample mean height of women in the United States (ages 20-29) was 64 inches, with a sample standard deviation of 2.71 inches. Estimate the percent of the women whose heights are between 64 inches and 69.42 inches.

Page 57: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Using the Empirical Rule

3x s x s 2x s 3x sx s x2x s55.87 58.58 61.29 64 66.71 69.42 72.13

34%

13.5%

• Because the distribution is bell-shaped, you can use the Empirical Rule.

34% + 13.5% = 47.5% of women are between 64 and 69.42 inches tall.

Page 58: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Chebychev’s Theorem

The portion of any data set lying within k standard deviations (k > 1) of the mean is at least:

2

11

k

• k = 2: In any data set, at least 2

1 31 or 75%

2 4

of the data lie within 2 standard deviations of the mean.

• k = 3: In any data set, at least 2

1 81 or 88.9%

3 9

of the data lie within 3 standard deviations of the mean.

Page 59: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Example: Using Chebychev’s Theorem

The age distribution for Florida is shown in the histogram. Apply Chebychev’s Theorem to the data using k = 2. What can you conclude?

Page 60: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Solution: Using Chebychev’s Theorem

k = 2: μ – 2σ = 39.2 – 2(24.8) = -10.4 (use 0 since age can’t be negative)

μ + 2σ = 39.2 + 2(24.8) = 88.8

At least 75% of the population of Florida is between 0 and 88.8 years old.

Page 61: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Standard Deviation for Grouped Data

Sample standard deviation for a frequency distribution

When a frequency distribution has classes, estimate the sample mean and standard deviation by using the midpoint of each class.

2( )

1

x x fs

n

where n= Σf (the number of entries in the data set)

Page 62: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Example: Finding the Standard Deviation for Grouped Data

You collect a random sample of the number of children per household in a region. Find the sample mean and the sample standard deviation of the data set.

Number of Children in 50 Households

1 3 1 1 1

1 2 2 1 0

1 1 0 0 0

1 5 0 3 6

3 0 3 1 1

1 1 6 0 1

3 6 6 1 2

2 3 0 1 1

4 1 1 2 2

0 3 0 2 4

Page 63: Statistics Numerical Representation of Data Part 2 – Measure of Variation

x f xf

0 10 0(10) = 0

1 19 1(19) = 19

2 7 2(7) = 14

3 7 3(7) =21

4 2 4(2) = 8

5 1 5(1) = 5

6 4 6(4) = 24

Solution: Finding the Standard Deviation for Grouped Data

First construct a frequency distribution.

Find the mean of the frequency distribution.

Σf = 50 Σ(xf )= 91

911.8

50

xfx

n

The sample mean is about 1.8 children.

Page 64: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Coefficient of Variation

The coefficient of variation (or CV) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean. This is a measure of the variability of the data. The higher the percentage, the more variable.

Sample Population

sxCV = 100% CV =

100%

Page 65: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Coefficient of Variation - Example

The average score in a calculus class is 110, with a standard deviation of 5; the average score in a statistics class is 106, with a standard deviation of 4. Which class is more variable in terms of scores?

CVc= 5/110 = 4.5% ; CVs = 4/106 = 3.8% The calculus class is more variable.

Page 66: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Summary

Determined the range of a data set Determined the variance and standard

deviation of a population and of a sample Used the Empirical Rule and Chebychev’s

Theorem to interpret standard deviation Approximated the sample standard deviation

for grouped data

Page 67: Statistics Numerical Representation of Data Part 2 – Measure of Variation

Homework

Pt 1 – Pg. 84 – 86; # 1-21 odd Pt 2 – Pg. 86 -91; # 23 – 49 odd