Measures of Central Tendency

11.220 Lecture 5

22 February 2006

R. Ryznar

Today’s Content

• Wrap-up from yesterday
• Frequency Distributions
• The Mean, Median and Mode
• Levels of Measurement and Measures of Central Tendency
• Measures of Dispersion
• Measures of Central Tendency and Dispersion Together

Definitions for Frequency Distributions

• Variable – the trait or characteristic on which the classification is based.

• Class – one of the grouped categories of the variable.

• Class Boundaries – the lowest and highest values that fall within the class.

• Class Midpoints – the point halfway between the upper and lower class boundaries.

• Class Interval – the distance between the upper limit of one class and the upper limit of the next higher class.

• Class Frequency – the number of observations or occurrences of the variable within a given class.

• Total Frequency – the total number of observations or cases in the table.

Creating and Using Frequency Distributions

• Bin the data
• Bins must be exclusive and exhaustive
• Look for overall patterns
• Look for any striking deviations to the pattern
• Look for any outliers
• Give the center and the spread
• Describe the shape

[Figure: histogram, "Percentage of Adult Population who are Obese by U.S. State (2001)". x-axis: Percentage (14.0 to 26.0); y-axis: Frequency (0 to 10). Mean = 20.492, Std. Dev. = 2.3883, N = 50.]

Stem-and-Leaf Plot
Percentage of adult population who are obese (2001)

Frequency    Stem & Leaf

   1.00      14 . 4
    .00      15 .
   1.00      16 . 1
   5.00      17 . 1 3 3 6 8
   5.00      18 . 2 4 4 8 9
   9.00      19 . 0 0 0 1 2 2 7 8 9
   8.00      20 . 0 0 0 1 5 6 7 9
   8.00      21 . 0 0 4 7 7 8 8 9
   5.00      22 . 1 1 4 5 6
   3.00      23 . 3 4 8
   4.00      24 . 0 2 4 6
   1.00      25 . 9

Stem width: 1.0
Each leaf: 1 case(s)
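As a rough illustration of how to read the display, the sketch below expands each stem and its leaves back into data values and recounts the class and total frequencies. The dictionary `STEMPLOT` is transcribed from the plot above; the names `STEMPLOT`, `expand`, `class_frequency` and `total_frequency` are illustrative and not part of the lecture.

```python
# Stems and leaves transcribed from the stem-and-leaf plot.
# Stem width is 1.0 and each leaf is one case, so stem 14 with leaf 4 means 14.4.
STEMPLOT = {
    14: "4",
    15: "",
    16: "1",
    17: "13368",
    18: "24489",
    19: "000122789",
    20: "00015679",
    21: "00477889",
    22: "11456",
    23: "348",
    24: "0246",
    25: "9",
}


def expand(stemplot):
    """Turn {stem: leaves} into a sorted list of values with one decimal place."""
    values = []
    for stem, leaves in stemplot.items():
        for leaf in leaves:
            values.append(round(stem + int(leaf) / 10, 1))
    return sorted(values)


values = expand(STEMPLOT)

# Class frequency: number of leaves on each stem. Total frequency: their sum.
class_frequency = {stem: len(leaves) for stem, leaves in STEMPLOT.items()}
total_frequency = sum(class_frequency.values())

print(total_frequency)                       # 50
print(values[0], values[-1])                 # 14.4 25.9
print(round(sum(values) / len(values), 3))   # 20.492
```

The recovered mean agrees with the value printed on the histogram, which suggests the transcription of the leaves is faithful.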

Babe Ruth’s home runs, 1920 to 1934:

54 59 35 41 46 25 47 60 54 46 49 46 41 34 22

Barry Bonds’s home runs, 1986 to 2004:

16 25 24 19 33 25 34 46 37 33 42 40 37 34 49 73 46 45 45

Back to back stemplot

        Bonds |   | Ruth
          9 6 | 1 |
        5 5 4 | 2 | 2 5
  7 7 4 4 3 3 | 3 | 4 5
9 6 6 5 5 2 0 | 4 | 1 1 6 6 6 7 9
              | 5 | 4 4 9
              | 6 | 0
            3 | 7 |
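One possible way to build a back-to-back stemplot from the two home run lists is sketched below. It assumes two-digit data, with the tens digit as the stem and the ones digit as the leaf; the helper names `leaves_by_stem` and `back_to_back` are made up for this example.

```python
from collections import defaultdict

ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]


def leaves_by_stem(values):
    """Group the ones digits (leaves) under their tens digit (stem), in order."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    return groups


def back_to_back(left, right):
    """Return rows of (left leaves, stem, right leaves) over a shared set of stems."""
    left_groups, right_groups = leaves_by_stem(left), leaves_by_stem(right)
    lowest = min(min(left), min(right)) // 10
    highest = max(max(left), max(right)) // 10
    rows = []
    for stem in range(lowest, highest + 1):
        # Left-hand leaves are read outward from the stem, so they are reversed.
        left_leaves = " ".join(str(d) for d in reversed(left_groups[stem]))
        right_leaves = " ".join(str(d) for d in right_groups[stem])
        rows.append((left_leaves, stem, right_leaves))
    return rows


rows = back_to_back(bonds, ruth)
for left_leaves, stem, right_leaves in rows:
    print(f"{left_leaves:>13} | {stem} | {right_leaves}")
```

Every stem between the smallest and largest is printed, even when it has no leaves on one side, so gaps such as Bonds's empty 50s and 60s stay visible.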

Number of Crews

Tons of Garbage    Normal    Moore

50 - 59              15        22
60 - 69              25        37
70 - 79              30        49
80 - 89              20        36
90 - 99              10        21
Total Crews         100       165

Percentage of Work Crews

Tons of Garbage    Normal    Moore

50 - 59              15        13
60 - 69              25        22
70 - 79              30        30
80 - 89              20        22
90 - 99              10        13
Total Percent       100       100
N =                 100       165
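The percentage table follows from the count table by dividing each class frequency by its column total. A minimal sketch, assuming whole-number rounding as in the table (the function name `to_percentages` is illustrative):

```python
classes = ["50 - 59", "60 - 69", "70 - 79", "80 - 89", "90 - 99"]
crews = {
    "Normal": [15, 25, 30, 20, 10],   # 100 crews in total
    "Moore": [22, 37, 49, 36, 21],    # 165 crews in total
}


def to_percentages(counts):
    """Express each class frequency as a whole-number percentage of the total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


percentages = {group: to_percentages(counts) for group, counts in crews.items()}

for label, normal, moore in zip(classes, percentages["Normal"], percentages["Moore"]):
    print(f"{label:8} {normal:6} {moore:6}")
```

Converting to percentages is what makes the two columns comparable, because the groups have different numbers of crews (100 versus 165).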

[Figure: bar chart, "Opinion on president's performance (number of responses by gender)". Categories: Strongly approve, Approve, Neither approve nor disapprove, Disapprove, Strongly disapprove. y-axis: 0 to 300. Series: Females (n=110), Males (n=683).]

[Figure: bar chart, "Opinion on president's performance (percentage of responses by gender)". Categories: Strongly approve, Approve, Neither approve nor disapprove, Disapprove, Strongly disapprove. y-axis: 0% to 60%. Series: Females, Males.]

[Figure: "Response Time of Metro Fire Department". x-axis: Under 5 minutes, Under 10 minutes, Under 15 minutes, Under 20 minutes; y-axis: Cumulative Percentage (0 to 100).]

Response Time        Percentage (Cumulative) of Response Times

Under 5 minutes       27.6
Under 10 minutes      82.4
Under 15 minutes      98.8
Under 20 minutes     100.0
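A cumulative table can be turned back into an ordinary percentage distribution by differencing successive rows. The sketch below does this for the table above; the function name `class_percentages` and the variable names are illustrative.

```python
upper_limits = [5, 10, 15, 20]            # "Under k minutes"
cumulative = [27.6, 82.4, 98.8, 100.0]    # cumulative percentages from the table


def class_percentages(cumulative_pct):
    """Difference successive cumulative percentages to get each class's share."""
    shares = []
    previous = 0.0
    for value in cumulative_pct:
        # Round to one decimal to match the precision of the table.
        shares.append(round(value - previous, 1))
        previous = value
    return shares


shares = class_percentages(cumulative)

lower = 0
for upper, share in zip(upper_limits, shares):
    print(f"{lower:>2} to under {upper:>2} minutes: {share:5.1f}%")
    lower = upper
```

Read this way, a little over half of the responses (54.8%) took between 5 and 10 minutes, which the cumulative column shows only indirectly.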

Percentage (Cumulative) of Response Times, 2001

Response Time        Metro Fire Department    Atlantis Fire Department

Under 5 minutes        27.6                     21.2
Under 10 minutes       82.4                     63.9
Under 15 minutes       98.8                     86.4
Under 20 minutes      100.0                    100.0

[Figure: "Fire Department Response Time". x-axis: Under 5 minutes, Under 10 minutes, Under 15 minutes, Under 20 minutes; y-axis: 0 to 100. Series: Atlantis, Metro.]

Population mean:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \dots + X_N}{N}$$

Sample mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

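The median is the middle number in an ordered dataset.

It can be found by ordering the values from lowest to highest and then finding the observation at position (n+1)/2. When n is even, this position falls halfway between two observations, and the median is the average of those two values.

The mode is the most frequently occurring value in the dataset.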
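The three definitions can be written out directly. The sketch below applies them to the Ruth and Bonds home run data given earlier; the function names are illustrative, and `modes` returns every value tied for the highest frequency.

```python
from collections import Counter

ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]


def mean(values):
    """Sum of the observations divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Value at position (n + 1) / 2 of the ordered data (positions start at 1)."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    if position.is_integer():
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between two observations.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


def modes(values):
    """All values that occur with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(round(mean(ruth), 2), median(ruth), modes(ruth))
print(round(mean(bonds), 2), median(bonds), modes(bonds))
```

Ruth has a single mode (46), while six of Bonds's values are tied at two occurrences each, so his data has several modes.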

Classic illustration of the relationship between skew, mean, median and mode

[Figure: curve of f against x with the mode, median and mean marked. Figure by MIT OCW.]

Source: Journal of Statistics Education, Volume 13, Number 2 (2005), www.amstat.org/publications/jse/v13n2/vonhippel.html. Copyright © 2005 by Paul T. von Hippel.

The mean is not a resistant measure of center.
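The statement that the mean is not resistant can be checked with a small experiment. The sketch below uses Bonds's home runs and a hypothetical data-entry error, assumed only for illustration, in which his largest value of 73 is typed as 730.

```python
from statistics import mean, median

bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]

# Hypothetical error (not in the lecture data): 73 is mistyped as 730.
bonds_with_error = [730 if v == 73 else v for v in bonds]

mean_before, median_before = mean(bonds), median(bonds)
mean_after, median_after = mean(bonds_with_error), median(bonds_with_error)

print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```

A single wild value nearly doubles the mean (from 37 to about 71.58) but leaves the median at 37, which is what resistance means here.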

            Ruth               Bonds

Mean        659/15 = 43.93     703/19 = 37
Median      46                 37
Mode        46                 multiple


[Figure: histogram, "Percentage of Adult Population who are Obese by U.S. State (2001)". x-axis: Percentage (14.0 to 26.0); y-axis: Frequency (0 to 10). Mean = 20.492, Std. Dev. = 2.3883, N = 50.]

Median = 20.3

Skewness = -.012

A normal distribution has a skew of 0 since it is symmetric.

Statistics: PctObese

N                          Valid       50
                           Missing      0
Mean                                   20.492
Median                                 20.300
Mode                                   19.0 (a)
Std. Deviation                          2.3883
Variance                                5.704
Skewness                                -.012
Std. Error of Skewness                   .337
Kurtosis                                -.100
Std. Error of Kurtosis                   .662
Percentiles                25          18.975
                           50          20.300
                           75          22.100

a. Multiple modes exist. The smallest value is shown.

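A normal distribution has a kurtosis of 0 on the excess-kurtosis scale reported here. Unlike skewness, kurtosis does not measure symmetry: it describes how heavy the tails are relative to a normal distribution.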
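For readers who want to see where numbers like these come from, the sketch below computes skewness, excess kurtosis and their standard errors for the obesity data, using values transcribed from the stem-and-leaf plot shown earlier. It assumes the common bias-adjusted sample formulas; that the statistics table was produced with exactly these formulas is an assumption, not something the lecture states.

```python
from math import sqrt

# Obesity percentages transcribed from the stem-and-leaf plot (stem . leaf).
STEMPLOT = {
    14: "4", 15: "", 16: "1", 17: "13368", 18: "24489", 19: "000122789",
    20: "00015679", 21: "00477889", 22: "11456", 23: "348", 24: "0246", 25: "9",
}
values = [stem + int(leaf) / 10 for stem, leaves in STEMPLOT.items() for leaf in leaves]

n = len(values)
mean = sum(values) / n
s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))   # sample standard deviation
z = [(x - mean) / s for x in values]                       # standardized values

# Bias-adjusted sample skewness and excess kurtosis (assumed formulas).
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# The standard errors depend only on the sample size.
se_skewness = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurtosis = 2 * se_skewness * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

print(round(skewness, 3), round(se_skewness, 3))
print(round(kurtosis, 3), round(se_kurtosis, 3))
```

Both statistics are small relative to their standard errors, consistent with the roughly symmetric, bell-like shape of the histogram.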

[Figure: histogram, "Percent of Population of Hispanic Origin by U.S. State (2000)". x-axis: Percent Hispanic (0.0 to 50.0); y-axis: Frequency (0 to 20). Mean = 7.73, Std. Dev. = 8.9125, N = 50.]

Median = 4.7

Skewness = 2.245

Kurtosis = 5.12

[Figure: histogram, "Percentage of Population without Health Insurance for One Year by U.S. State (Three Year Average from 2000 to 2002)". x-axis: Percent with no Health Insurance (5.0 to 25.0); y-axis: Frequency (0 to 8). Mean = 13.288, Std. Dev. = 3.6878, N = 50.]

Median = 13.05

Skewness = .732

Kurtosis = .262

Levels of Measurement and Measures of Central Tendency

            Nominal    Ordinal    Interval

Mean                                 X
Median                    X          X
Mode           X          X          X
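The table can be read as a lookup from level of measurement to the summaries that make sense. A minimal sketch follows; the survey responses are hypothetical, and `median_low` is used for the median so that the result is always an observed category rather than an average of two codes.

```python
from statistics import mean, median_low, multimode

# Measures of central tendency marked with an X in the table above.
APPROPRIATE = {
    "nominal": ["mode"],
    "ordinal": ["median", "mode"],
    "interval": ["mean", "median", "mode"],
}


def summarize(values, level):
    """Return only the measures the table marks as appropriate for `level`."""
    functions = {"mean": mean, "median": median_low, "mode": multimode}
    return {name: functions[name](values) for name in APPROPRIATE[level]}


# Hypothetical ordinal responses coded 1 (strongly disapprove) to 5 (strongly approve).
responses = [4, 2, 5, 4, 3, 4, 1, 2]
ordinal_summary = summarize(responses, "ordinal")

# Hypothetical nominal data: only the mode is meaningful.
nominal_summary = summarize(["bus", "car", "car"], "nominal")

print(ordinal_summary)
print(nominal_summary)
```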

[Figure: "Would you prefer to ride on this facility?" Active Living Visual Preference Survey, Collingwood, Ontario - November 2005. x-axis: Slide Number (1 to 40); y-axis: Mean (No -3 to Yes +3).]

[Figure: histogram of Slide1. x-axis: Slide1 (0 to 8); y-axis: Frequency (0 to 50). Mean = 3.31, Std. Dev. = 1.895, N = 184.]

[Figure: histogram of Slide40Rec. x-axis: Slide40Rec (-4.00 to 4.00); y-axis: Frequency (0 to 120). Mean = -1.30, Std. Dev. = 2.33003, N = 180.]

[Figure: histogram of Slide19. x-axis: Slide19 (0 to 60); y-axis: Frequency (0 to 80). Mean = 3.97, Std. Dev. = 4.097, N = 186.]

The Five Number Summary

Minimum, Q1, Median, Q3, Maximum

To calculate quartiles:

• Arrange the observations in increasing order and locate the median M in the ordered list of observations.

• The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.

• The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
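The quartile procedure above can be sketched as follows, applied to the Ruth and Bonds home run data from earlier in the lecture. The overall median is excluded from both halves when n is odd, which is how the steps are read here; other software may use slightly different quartile rules. Function names are illustrative.

```python
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]


def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # observations left of the median's location
    upper = ordered[half + n % 2:]     # observations right of the median's location
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


print(five_number_summary(ruth))    # (22, 35, 46, 54, 60)
print(five_number_summary(bonds))   # (16, 25, 37, 45, 73)
```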

[Figure: side-by-side boxplots of incws (y-axis, 0.00 to 125000.00) for male = 0 and male = 1.]

The ends of the box (hinges) are at the quartiles, so that the length of the box is the IQR.

The median is marked by a line within the box.

The two vertical lines (called whiskers) outside the box extend to the smallest and largest observations within 1.5 × IQR of the quartiles.

Observations more than 3 × IQR beyond the quartiles are called extreme outliers and are marked, for example, with an open circle.

Observations between 1.5 × IQR and 3 × IQR beyond the quartiles are called mild outliers and are distinguished by a different mark, e.g., a closed circle.
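These boxplot rules can be expressed as a short classification routine. The sketch below uses Bonds's home runs with Q1 = 25 and Q3 = 45, the quartiles given by the median-of-halves method described earlier; the values 80 and 110 are hypothetical and included only to exercise the mild and extreme cases. Function names are illustrative.

```python
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]
Q1, Q3 = 25, 45   # quartiles of the Bonds data


def classify(value, q1, q3):
    """Label a value by how far it lies beyond the nearer quartile, in IQR units."""
    iqr = q3 - q1
    distance = max(q1 - value, value - q3, 0)   # 0 for values inside the box
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "mild outlier"
    return "within whiskers"


def whisker_ends(values, q1, q3):
    """Smallest and largest observations within 1.5 x IQR of the quartiles."""
    iqr = q3 - q1
    inside = [v for v in values if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
    return min(inside), max(inside)


print(whisker_ends(bonds, Q1, Q3))   # (16, 73)
print(classify(73, Q1, Q3))          # within whiskers
print(classify(80, Q1, Q3))          # mild outlier (hypothetical value)
print(classify(110, Q1, Q3))         # extreme outlier (hypothetical value)
```

Under this rule Bonds's 73 is not flagged: it lies 28 above Q3, just short of the 1.5 × IQR = 30 cutoff.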

Variance and Standard Deviation

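Both measure how far the observations spread around their mean. The variance is the sum of the squared deviations from the mean divided by n − 1; the standard deviation is its square root, expressed in the same units as the data.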

Variance

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$

Standard Deviation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Visitors (x_i)    x_i - x̄                  (x_i - x̄)²

1792              1792 - 1600 = 192         192² = 36,864
1666              1666 - 1600 = 66          66² = 4,356
1362              1362 - 1600 = -238        (-238)² = 56,644
1614              1614 - 1600 = 14          14² = 196
1460              1460 - 1600 = -140        (-140)² = 19,600
1867              1867 - 1600 = 267         267² = 71,289
1439              1439 - 1600 = -161        (-161)² = 25,921

                  Sum = 0                   Sum = 214,870

$$s^2 = \frac{214{,}870}{6} = 35{,}811.67$$

$$s = \sqrt{35{,}811.67} = 189.24$$
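The worked example can be reproduced step by step. The sketch below follows the same columns as the table: deviations from the mean, squared deviations, then division by n − 1 and a square root. Variable names are illustrative.

```python
from math import sqrt

visitors = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(visitors)
mean = sum(visitors) / n                     # 1600.0
deviations = [x - mean for x in visitors]    # these always sum to zero
squared = [d ** 2 for d in deviations]

variance = sum(squared) / (n - 1)            # divide by n - 1, not n
std_dev = sqrt(variance)

print(mean, sum(deviations), sum(squared))   # 1600.0 0.0 214870.0
print(round(variance, 2), round(std_dev, 2)) # 35811.67 189.24
```

The deviations sum to zero by construction, which is why they are squared before averaging.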

[Figure: histogram, "Percentage of Adult Population who are Obese by U.S. State (2001)". x-axis: Percentage (14.0 to 26.0); y-axis: Frequency (0 to 10). Mean = 20.492, Std. Dev. = 2.3883, N = 50.]

Variable 1    Variable 2

 9.14           6.58
 8.14           5.76
 8.74           7.71
 8.77           8.84
 9.26           8.47
 8.10           7.04
 6.13           5.25
 3.10           5.56
 9.13           7.91
 7.26           6.89
 4.74          12.50

[Figure: histogram of Variable 1. x-axis: Variable 1 (3.00 to 10.00); y-axis: Frequency (0 to 4). Mean = 7.5009, Std. Dev. = 2.03166, N = 11.]

[Figure: histogram of Variable 2. x-axis: Variable 2 (4.00 to 14.00); y-axis: Frequency (0.0 to 3.0). Mean = 7.5009, Std. Dev. = 2.03058, N = 11.]
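The two variables make the case for reporting center and spread together with a look at the shape. The sketch below, using Python's standard `statistics` module, confirms that their means and standard deviations are nearly identical while their medians and ranges are not.

```python
from statistics import mean, median, stdev

variable_1 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
variable_2 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

summary = {}
for name, data in [("Variable 1", variable_1), ("Variable 2", variable_2)]:
    summary[name] = {
        "mean": round(mean(data), 4),
        "std_dev": round(stdev(data), 5),   # sample standard deviation (n - 1)
        "median": median(data),
        "min": min(data),
        "max": max(data),
    }
    print(name, summary[name])
```

Variable 1 has a median of 8.14 with a long tail toward low values, while Variable 2 has a median of 7.04 and one high value at 12.50. The mean and standard deviation alone would not reveal that difference.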
