Measures of Central Tendency
11.220 Lecture 5
22 February 2006, R. Ryznar
Today’s Content
• Wrap-up from yesterday
• Frequency Distributions
• The Mean, Median and Mode
• Levels of Measurement and Measures of Central Tendency
• Measures of Dispersion
• Measures of Central Tendency and Dispersion Together
Definitions for Frequency Distributions
• Variable – the trait or characteristic on which the classification is based.
• Class – one of the grouped categories of the variable.
• Class Boundaries – the lowest and highest values that fall within the class.
• Class Midpoint – the point halfway between the upper and lower class boundaries.
• Class Interval – the distance between the upper limit of one class and the upper limit of the next higher class.
• Class Frequency – the number of observations or occurrences of the variable within a given class.
• Total Frequency – the total number of observations or cases in the table.
Creating and Using Frequency Distributions
• Bin the data
• Bins must be mutually exclusive and exhaustive
• Look for overall patterns
• Look for any striking deviations from the pattern
• Look for any outliers
• Give the center and the spread
• Describe the shape
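The binning rule can be sketched in Python. This is a minimal illustration, not code from the lecture: it assumes integer data and equal-width, half-open classes [lo, lo + width), which makes the classes mutually exclusive; generating classes until the largest value is covered makes them exhaustive. The helper name `frequency_table` is hypothetical, and the data are Babe Ruth's home-run totals listed later in this lecture.

```python
from collections import Counter

# Babe Ruth's home runs, 1920 to 1934 (listed later in this lecture).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def frequency_table(values, start, width):
    """Hypothetical helper: count integer values in equal-width classes.

    Class k covers [start + k*width, start + (k+1)*width), so every value
    falls in exactly one class (exclusive), and classes are generated up to
    the one holding the largest value (exhaustive).
    """
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        lo = start + k * width
        # Report integer class limits such as "20 - 29".
        table.append((lo, lo + width - 1, counts.get(k, 0)))
    return table


for lo, hi, n in frequency_table(ruth, start=20, width=10):
    print(f"{lo} - {hi}: {n}")
```

The class counts always add up to the number of observations, which is a quick check that no value was dropped or double-counted.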
[Figure: histogram, Percentage of Adult Population who are Obese by U.S. State (2001). x-axis: Percentage (14.0 to 26.0); y-axis: Frequency. Mean = 20.492, Std. Dev. = 2.3883, N = 50.]
Stem-and-Leaf Plot: Percentage of adult population who are obese (2001)
Frequency   Stem & Leaf
     1.00     14 . 4
      .00     15 .
     1.00     16 . 1
     5.00     17 . 1 3 3 6 8
     5.00     18 . 2 4 4 8 9
     9.00     19 . 0 0 0 1 2 2 7 8 9
     8.00     20 . 0 0 0 1 5 6 7 9
     8.00     21 . 0 0 4 7 7 8 8 9
     5.00     22 . 1 1 4 5 6
     3.00     23 . 3 4 8
     4.00     24 . 0 2 4 6
     1.00     25 . 9
Stem width: 1.0
Each leaf: 1 case(s)
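A stem-and-leaf plot like this one can be rebuilt from the raw values. The sketch below is an illustration, not the statistics package's own algorithm: it assumes one-decimal data, a stem width of 1.0 and one case per leaf, and the function name `stem_and_leaf` is hypothetical. The 50 values are read back off the plot; as a check, they reproduce the reported mean (1024.6 / 50 = 20.492).

```python
# Percentage of adult population who are obese, 50 U.S. states (2001),
# read back off the stem-and-leaf plot.
pct_obese = [
    14.4, 16.1, 17.1, 17.3, 17.3, 17.6, 17.8, 18.2, 18.4, 18.4,
    18.8, 18.9, 19.0, 19.0, 19.0, 19.1, 19.2, 19.2, 19.7, 19.8,
    19.9, 20.0, 20.0, 20.0, 20.1, 20.5, 20.6, 20.7, 20.9, 21.0,
    21.0, 21.4, 21.7, 21.7, 21.8, 21.8, 21.9, 22.1, 22.1, 22.4,
    22.5, 22.6, 23.3, 23.4, 23.8, 24.0, 24.2, 24.4, 24.6, 25.9,
]


def stem_and_leaf(values):
    """Return {stem: [leaves]} with stem width 1.0 and one-decimal leaves."""
    # Work in integer tenths to avoid floating-point error (e.g. 14.4 -> 144).
    tenths = sorted(round(v * 10) for v in values)
    # Include every stem in the range, even empty ones (like stem 15 here).
    plot = {stem: [] for stem in range(tenths[0] // 10, tenths[-1] // 10 + 1)}
    for t in tenths:
        stem, leaf = divmod(t, 10)
        plot[stem].append(leaf)
    return plot


for stem, leaves in stem_and_leaf(pct_obese).items():
    print(f"{len(leaves):5.2f} {stem:3d} . {' '.join(map(str, leaves))}")
```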
Babe Ruth’s home runs, 1920 to 1934:
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
Barry Bonds home runs, 1986 to 2004:
16 25 24 19 33 25 34 46 37 33 42 40 37 34 49 73 46 45 45
Back-to-back stemplot (stems are tens of home runs)
          Bonds |   | Ruth
            9 6 | 1 |
          5 5 4 | 2 | 2 5
    7 7 4 4 3 3 | 3 | 4 5
  9 6 6 5 5 2 0 | 4 | 1 1 6 6 6 7 9
                | 5 | 4 4 9
                | 6 | 0
              3 | 7 |
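A back-to-back stemplot shares one column of stems (tens of home runs) between two datasets, with leaves growing outward on each side. A minimal sketch, assuming non-negative integer data; `back_to_back` and `leaves_by_stem` are hypothetical names:

```python
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]


def leaves_by_stem(values, stems):
    """Group integer values into {tens digit: sorted units digits}."""
    groups = {s: [] for s in stems}
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    return groups


def back_to_back(left, right):
    """Return text lines with left leaves, the shared stem, then right leaves."""
    both = left + right
    stems = range(min(both) // 10, max(both) // 10 + 1)
    lg, rg = leaves_by_stem(left, stems), leaves_by_stem(right, stems)
    lines = []
    for s in stems:
        # Left leaves read outward from the stem, so they are printed reversed.
        left_text = " ".join(str(d) for d in reversed(lg[s]))
        right_text = " ".join(str(d) for d in rg[s])
        lines.append(f"{left_text:>15} | {s} | {right_text}")
    return lines


print(f"{'Bonds':>15} |   | Ruth")
for line in back_to_back(bonds, ruth):
    print(line)
```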
Number of Crews
Tons of Garbage   Normal   Moore
50 - 59               15      22
60 - 69               25      37
70 - 79               30      49
80 - 89               20      36
90 - 99               10      21
Total Crews          100     165
Percentage of Work Crews
Tons of Garbage   Normal   Moore
50 - 59               15      13
60 - 69               25      22
70 - 79               30      30
80 - 89               20      22
90 - 99               10      13
Total Percent        100     100
N                    100     165
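Because the two departments have different numbers of crews (100 and 165), raw counts are not directly comparable; dividing each count by its column total puts both on the same percentage scale. A minimal sketch with the counts from the Number of Crews table (the helper `percentages` is hypothetical):

```python
classes = ["50 - 59", "60 - 69", "70 - 79", "80 - 89", "90 - 99"]
normal = [15, 25, 30, 20, 10]
moore = [22, 37, 49, 36, 21]


def percentages(counts):
    """Express each class count as a percentage of the column total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


for label, p_normal, p_moore in zip(classes, percentages(normal), percentages(moore)):
    print(f"{label}: Normal {p_normal:5.1f}%  Moore {p_moore:5.1f}%")
```

Rounded to whole numbers, the Moore column gives the 13, 22, 30, 22, 13 shown in the percentage table.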
[Figure: bar chart, Opinion on president's performance (number of responses by gender). Categories: Strongly approve, Approve, Neither approve nor disapprove, Disapprove, Strongly disapprove; y-axis: number of responses (0 to 300). Series: Females (n=110), Males (n=683).]
[Figure: bar chart, Opinion on president's performance (percentage of responses by gender). Categories: Strongly approve, Approve, Neither approve nor disapprove, Disapprove, Strongly disapprove; y-axis: percentage of responses (0% to 60%). Series: Females, Males.]
[Figure: bar chart, Response Time of Metro Fire Department. x-axis: Under 5 minutes, Under 10 minutes, Under 15 minutes, Under 20 minutes; y-axis: Cumulative Percentage (0 to 100).]
Response Time       Percentage (Cumulative) of Response Times
Under 5 minutes      27.6
Under 10 minutes     82.4
Under 15 minutes     98.8
Under 20 minutes    100
Response Time       Atlantis Fire Department, 2001   Metro Fire Department, 2001
                    (cumulative % of responses)      (cumulative % of responses)
Under 5 minutes                27.6                           21.2
Under 10 minutes               82.4                           63.9
Under 15 minutes               98.8                           86.4
Under 20 minutes              100.0                          100.0
[Figure: Fire Department Response Time, comparing Atlantis and Metro. x-axis: Under 5 minutes, Under 10 minutes, Under 15 minutes, Under 20 minutes; y-axis: cumulative percentage (0 to 100).]
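A cumulative percentage is a running total of the class percentages, so differencing successive entries recovers the share of responses in each 5-minute band, and summing those shares rebuilds the cumulative column. A minimal sketch using the first cumulative table (27.6, 82.4, 98.8, 100); the function names are hypothetical:

```python
from itertools import accumulate

cumulative = [27.6, 82.4, 98.8, 100.0]  # first cumulative table
bands = ["under 5 min", "5 to under 10 min", "10 to under 15 min", "15 to under 20 min"]


def decumulate(cum):
    """Percentage in each band: each cumulative value minus the one before it."""
    previous = [0.0] + cum[:-1]
    return [round(c - p, 1) for p, c in zip(previous, cum)]


def cumulate(shares):
    """Running total of per-band percentages, rounded to one decimal."""
    return [round(c, 1) for c in accumulate(shares)]


for band, share in zip(bands, decumulate(cumulative)):
    print(f"{band}: {share}%")
```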
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
The median is the middle number in an ordered dataset.
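It can be found by ordering the values from lowest to highest and then finding the observation at position (n+1)/2. When n is even, this position falls halfway between the two middle observations, and the median is their average.
Mode: the most frequently occurring value in the dataset.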
Journal of Statistics Education, Volume 13, Number 2 (2005), www.amstat.org/publications/jse/v13n2/vonhippel.html. Copyright © 2005 by Paul T. von Hippel.
Classic illustration of the relationship between skew, mean, median and mode
The mean is not a resistant measure of center: a few extreme values in one tail pull it toward that tail, while the median is much less affected.
[Figure: skewed density curve f(x) with the mode, median and mean marked. Figure by MIT OCW.]
            Ruth              Bonds
Mean        659/15 = 43.93    703/19 = 37
Median      46                37
Mode        46                multiple
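The figures in this table can be reproduced with a few lines of Python. This is a sketch, not the lecture's own code: `median` follows the (n+1)/2 rule stated earlier, and `modes` returns every value tied for the highest count, which is why Bonds has "multiple". Ruth's mean (43.93) sits below his median (46), as expected when a few low seasons pull the non-resistant mean down.

```python
from collections import Counter

ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]


def mean(xs):
    return sum(xs) / len(xs)


def median(xs):
    """Observation at 1-based position (n + 1) / 2 of the sorted data.

    For even n that position is halfway between two observations,
    so the median is their average.
    """
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """All values that share the highest frequency."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


for name, data in [("Ruth", ruth), ("Bonds", bonds)]:
    print(f"{name}: mean={mean(data):.2f} median={median(data)} mode(s)={modes(data)}")
```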
[Figure: histogram, Percentage of Adult Population who are Obese by U.S. State (2001). Mean = 20.492, Std. Dev. = 2.3883, N = 50, Median = 20.3, Skewness = -.012.]
A normal distribution has a skew of 0 since it is symmetric.
Statistics: PctObese
N                          Valid         50
                           Missing        0
Mean                                 20.492
Median                               20.300
Mode                                   19.0 (a)
Std. Deviation                       2.3883
Variance                              5.704
Skewness                              -.012
Std. Error of Skewness                 .337
Kurtosis                              -.100
Std. Error of Kurtosis                 .662
Percentiles                25        18.975
                           50        20.300
                           75        22.100
a. Multiple modes exist. The smallest value is shown.
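A normal distribution has a kurtosis of 0 on the scale reported here (excess kurtosis, i.e., kurtosis minus 3). This does not follow from symmetry: symmetric distributions can have positive kurtosis (heavier tails than the normal) or negative kurtosis (lighter tails).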
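The percentiles, skewness and kurtosis in the Statistics table can be recomputed from the 50 values in the stem-and-leaf plot. The formulas below are an assumption about what the software used: percentiles as a weighted average at position (n + 1)p, and the bias-adjusted skewness (G1) and excess-kurtosis (G2) estimators. With these data they should closely match the table's 18.975, 20.300, 22.100, -.012 and -.100. Function names are hypothetical.

```python
import math

# Percentage of adult population who are obese, 50 U.S. states (2001),
# read back off the stem-and-leaf plot.
pct_obese = [
    14.4, 16.1, 17.1, 17.3, 17.3, 17.6, 17.8, 18.2, 18.4, 18.4,
    18.8, 18.9, 19.0, 19.0, 19.0, 19.1, 19.2, 19.2, 19.7, 19.8,
    19.9, 20.0, 20.0, 20.0, 20.1, 20.5, 20.6, 20.7, 20.9, 21.0,
    21.0, 21.4, 21.7, 21.7, 21.8, 21.8, 21.9, 22.1, 22.1, 22.4,
    22.5, 22.6, 23.3, 23.4, 23.8, 24.0, 24.2, 24.4, 24.6, 25.9,
]


def percentile(xs, p):
    """Weighted average at 1-based position (n + 1) * p, clamped to the data range."""
    s = sorted(xs)
    n = len(s)
    pos = (n + 1) * p
    k = math.floor(pos)
    if k < 1:
        return s[0]
    if k >= n:
        return s[-1]
    # Interpolate between the k-th and (k+1)-th ordered observations.
    return s[k - 1] + (pos - k) * (s[k] - s[k - 1])


def skewness_kurtosis(xs):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    z3 = sum(((x - mean) / sd) ** 3 for x in xs)
    z4 = sum(((x - mean) / sd) ** 4 for x in xs)
    g1 = n / ((n - 1) * (n - 2)) * z3
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return g1, g2


for p in (0.25, 0.50, 0.75):
    print(f"P{int(p * 100)} = {percentile(pct_obese, p):.3f}")
g1, g2 = skewness_kurtosis(pct_obese)
print(f"skewness = {g1:.3f}, kurtosis = {g2:.3f}")
```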
[Figure: histogram, Percent of Population of Hispanic Origin by U.S. State (2000). x-axis: Percent Hispanic (0.0 to 50.0); y-axis: Frequency. Mean = 7.73, Std. Dev. = 8.9125, N = 50, Median = 4.7, Skewness = 2.245, Kurtosis = 5.12.]
[Figure: histogram, Percentage of Population without Health Insurance for One Year by U.S. State (Three Year Average from 2000 to 2002). x-axis: Percent with no Health Insurance (5.0 to 25.0); y-axis: Frequency. Mean = 13.288, Std. Dev. = 3.6878, N = 50, Median = 13.05, Skewness = .732, Kurtosis = .262.]
Levels of Measurement and Measures of Central Tendency
                Nominal   Ordinal   Interval
Mean                                   X
Median                       X         X
Mode              X          X         X
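The table can be read as a rule: each level permits the measures of the levels to its left plus one more. Below is a sketch of that rule using only Python's standard `statistics` module; the `ALLOWED` mapping mirrors the table, while the response data and the helper name `central_tendency` are hypothetical. For ordinal data the median is taken on category ranks with `median_low`, so it is always an actual category.

```python
import statistics

# Measures permitted at each level of measurement (mirrors the table).
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
}

# Hypothetical ordinal scale and responses, for illustration only.
SCALE = ["Strongly disapprove", "Disapprove", "Neither approve nor disapprove",
         "Approve", "Strongly approve"]
responses = ["Approve", "Strongly approve", "Disapprove", "Approve",
             "Neither approve nor disapprove", "Approve", "Strongly disapprove"]


def central_tendency(values, level, order=None):
    """Return only the measures the level of measurement permits.

    For ordinal data, `order` lists the categories from lowest to highest.
    """
    allowed = ALLOWED[level]
    result = {"mode": statistics.multimode(values)}  # every level allows the mode
    if "median" in allowed:
        if level == "ordinal":
            ranks = [order.index(v) for v in values]
            # median_low always returns an observed rank, i.e. a real category.
            result["median"] = order[statistics.median_low(ranks)]
        else:
            result["median"] = statistics.median(values)
    if "mean" in allowed:
        result["mean"] = statistics.mean(values)
    return result


print(central_tendency(responses, "ordinal", SCALE))
```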
Would you prefer to ride on this facility? Active Living Visual Preference Survey, Collingwood, Ontario, November 2005
[Figure: mean response by slide. x-axis: Slide Number (1 to 40); y-axis: Mean (No -3 to Yes +3).]
[Figure: histogram of Slide1 responses. y-axis: Frequency. Mean = 3.31, Std. Dev. = 1.895, N = 184.]
[Figure: histogram of Slide40Rec responses (-4.00 to 4.00). y-axis: Frequency. Mean = -1.30, Std. Dev. = 2.33003, N = 180.]
[Figure: histogram of Slide19 responses (0 to 60). y-axis: Frequency. Mean = 3.97, Std. Dev. = 4.097, N = 186.]
The Five-Number Summary: Minimum, Q1, Median, Q3, Maximum
To calculate quartiles:
• Arrange the observations in increasing order and locate the median M in the ordered list of observations.
• The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
• The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
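The quartile rule above (the median of each half, excluding the overall median's own position when n is odd) translates directly into code. A sketch using the Ruth and Bonds home-run data from earlier in the lecture; `five_number_summary` is a hypothetical name, and other software may define quartiles differently and report slightly different Q1 and Q3.

```python
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]


def median(sorted_xs):
    """Median of an already sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def five_number_summary(xs):
    """(minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]        # positions left of the median's location
    upper = s[(n + 1) // 2 :]  # positions right of the median's location
    return s[0], median(lower), median(s), median(upper), s[-1]


for name, data in [("Ruth", ruth), ("Bonds", bonds)]:
    print(name, five_number_summary(data))
```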
[Figure: side-by-side box plots of incws (0.00 to 125,000.00) by male (0, 1).]
The ends of the box (hinges) are at the quartiles, so the length of the box is the IQR (Q3 - Q1).
The median is marked by a line within the box.
The two vertical lines outside the box (called whiskers) extend to the smallest and largest observations within 1.5 × IQR of the quartiles.
Observations more than 3 × IQR beyond the quartiles are called extreme outliers and are marked, for example, with an open circle.
Observations between 1.5 × IQR and 3 × IQR beyond the quartiles are called mild outliers and are distinguished by a different mark, e.g., a closed circle.
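The fence rules can be written as a small classifier. This sketch uses hypothetical helpers `classify` and `whisker_ends` and Bonds' quartiles computed with the quartile rule above (Q1 = 25, Q3 = 45, so IQR = 20). The inner fences are then 25 - 30 = -5 and 45 + 30 = 75, so even his 73-home-run season falls inside the whiskers rather than being an outlier. The two values in the last line are hypothetical, added only to exercise the mild and extreme cases.

```python
bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 45]


def classify(xs, q1, q3):
    """Label each value 'inside', 'mild' or 'extreme' using 1.5*IQR and 3*IQR fences."""
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr
    labels = []
    for x in xs:
        if x < outer_lo or x > outer_hi:
            labels.append((x, "extreme"))
        elif x < inner_lo or x > inner_hi:
            labels.append((x, "mild"))
        else:
            labels.append((x, "inside"))
    return labels


def whisker_ends(xs, q1, q3):
    """Smallest and largest observations that are not outliers."""
    inside = [x for x, label in classify(xs, q1, q3) if label == "inside"]
    return min(inside), max(inside)


print(whisker_ends(bonds, 25, 45))   # Bonds' quartiles from the rule above
print(classify([80, 120], 25, 45))   # hypothetical values, for illustration
```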
Variance and Standard Deviation
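Measure how far the observations typically lie from their mean: the variance averages the squared deviations from the mean (dividing by n - 1 for a sample), and the standard deviation, its square root, expresses that spread in the original units.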
Variance: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
Standard Deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Visitors (x_i)   x_i - x̄                  (x_i - x̄)²
1792             1792 - 1600 = 192         192² = 36,864
1666             1666 - 1600 = 66          66² = 4,356
1362             1362 - 1600 = -238        (-238)² = 56,644
1614             1614 - 1600 = 14          14² = 196
1460             1460 - 1600 = -140        (-140)² = 19,600
1867             1867 - 1600 = 267         267² = 71,289
1439             1439 - 1600 = -161        (-161)² = 25,921
                 Sum = 0                   Sum = 214,870
Here x̄ = 11,200 / 7 = 1600.
$s^2 = \frac{214{,}870}{6} = 35{,}811.67$
$s = \sqrt{35{,}811.67} = 189.24$
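The table's arithmetic can be checked in a few lines. This sketch follows the n - 1 formula and cross-checks it against Python's standard `statistics.variance` and `statistics.stdev`, which also divide by n - 1. The helper name `sample_variance` is hypothetical.

```python
import math
import statistics

visitors = [1792, 1666, 1362, 1614, 1460, 1867, 1439]


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


s2 = sample_variance(visitors)
s = math.sqrt(s2)
print(f"mean = {sum(visitors) / len(visitors):.0f}, s^2 = {s2:,.2f}, s = {s:.2f}")
# The standard library uses the same n - 1 denominator.
print(statistics.variance(visitors), statistics.stdev(visitors))
```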
Variable 1   Variable 2
9.14         6.58
8.14         5.76
8.74         7.71
8.77         8.84
9.26         8.47
8.10         7.04
6.13         5.25
3.10         5.56
9.13         7.91
7.26         6.89
4.74         12.50
[Figure: histogram of Variable 1. x-axis: Variable 1 (3.00 to 10.00); y-axis: Frequency. Mean = 7.5009, Std. Dev. = 2.03166, N = 11.]
[Figure: histogram of Variable 2. x-axis: Variable 2 (4.00 to 14.00); y-axis: Frequency. Mean = 7.5009, Std. Dev. = 2.03058, N = 11.]
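Variables 1 and 2 have nearly identical means and standard deviations, yet their histograms differ, so center and spread alone do not describe a distribution. A sketch using Python's standard `statistics` module shows that the median and the direction of the mean-median gap separate them; treating that gap as a skewness hint is a rough heuristic, not a formal test.

```python
import statistics

variable_1 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
variable_2 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

for name, xs in [("Variable 1", variable_1), ("Variable 2", variable_2)]:
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    sd = statistics.stdev(xs)  # n - 1 denominator, as in the variance formula
    # Mean below median hints at a longer left tail; above, a longer right tail.
    tail = "left" if mean < median else "right"
    print(f"{name}: mean={mean:.4f} sd={sd:.5f} median={median:.2f} longer tail: {tail}")
```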