marilyn-quinn
Measures of Central Tendency
or
Measures of Location
or
Measures of Averages
Descriptive Statistics
The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.
• What is the pattern of scores over the range of possible values?
• Where, on the scale of possible scores, is a point that best represents the set of scores?
• Do the scores cluster about their central point or do they spread out around it?
Central Tendency
Measure of Central Tendency: A single summary score that best describes the central location of an entire distribution of scores.
• The typical score.
• The center of the distribution.
• One distribution can have multiple locations where scores cluster.
• Must decide which measure is best for a given situation.
Central Tendency
Measures of Central Tendency:
• Mean: The sum of all scores divided by the number of scores.
• Median: The value that divides the distribution in half when observations are ordered.
• Mode: The most frequent score.
Measure of central tendency
Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
Arithmetic Mean (Mean)
Definition: The sum of all the observations divided by the number of observations.
The arithmetic mean is the most common measure of the central location of a sample.
Mean
Population: $\mu = \frac{\sum X}{N}$
Sample: $\bar{X} = \frac{\sum X}{n}$
• $\mu$: “mu”, the population mean
• $\bar{X}$: “X bar”, the sample mean
• $\sum X$: “sigma”, the sum of X, add up all scores
• $N$: “N”, the total number of scores in a population
• $n$: “n”, the total number of scores in a sample
Mean: Example
Data: {1, 3, 6, 7, 2, 3, 5}
• Number of observations: 7
• Sum of observations: 27
• Mean: 27/7 ≈ 3.9
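The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the slides.

```python
# Data set from the "Mean: Example" slide.
data = [1, 3, 6, 7, 2, 3, 5]

n = len(data)      # number of observations
total = sum(data)  # sum of observations
mean = total / n   # arithmetic mean

print(n, total, round(mean, 1))  # 7 27 3.9
```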
Simple Frequency Distributions
Raw-score distribution:
name       X
Student1   20
Student2   23
Student3   15
Student4   21
Student5   15
Student6   21
Student7   15
Student8   20
Frequency distribution:
f   X
3   15
2   20
2   21
1   23
Mean $= \frac{\sum fX}{N}$
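As a rough sketch of how the raw-score distribution collapses into a frequency distribution, the snippet below counts each score and then applies the frequency formula for the mean. The names `raw_scores`, `freq`, and `sum_fx` are assumptions made for illustration.

```python
from collections import Counter

# Raw scores for Student1 through Student8, as listed in the table.
raw_scores = [20, 23, 15, 21, 15, 21, 15, 20]

# Frequency distribution: maps each score X to its frequency f.
freq = Counter(raw_scores)

N = sum(freq.values())                        # total number of scores
sum_fx = sum(f * x for x, f in freq.items())  # sum of f * X
mean_from_freq = sum_fx / N

# The frequency formula must agree with the mean of the raw scores.
mean_from_raw = sum(raw_scores) / len(raw_scores)

print(sorted(freq.items()))           # [(15, 3), (20, 2), (21, 2), (23, 1)]
print(mean_from_freq, mean_from_raw)  # 18.75 18.75
```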
Mean
• Is the balance point of a distribution.
• The sum of negative deviations from the mean exactly equals the sum of positive deviations from the mean.
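The balance-point property follows directly from the definition of the sample mean. A short derivation, using the same symbols as the sample-mean formula ($x_i$, $n$, $\bar{X}$):

```latex
\sum_{i=1}^{n} (x_i - \bar{X})
  = \sum_{i=1}^{n} x_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0
```

Because all deviations sum to zero, the positive deviations and the negative deviations must cancel. For the scores 7, 8, 9, 10, 11 with mean 9, for example, the deviations are -2, -1, 0, 1, 2.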
Pros and Cons of the Mean
Pros
• Mathematical center of a distribution.
• Good for interval and ratio data.
• Does not ignore any information.
• Inferential statistics is based on mathematical properties of the mean.
Cons
• Influenced by extreme scores and skewed distributions.
• May not exist in the data.
Some Important Properties of the Mean
• Interval-Ratio Level of Measurement
• Center of Gravity (the mean balances all the scores).
• Sensitivity to Extremes
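Median
Definition: The value that is larger than half the population and smaller than half the population.
• n is odd: the middle score. Example: 5, 8, 9, 10, 28; median = 9
• n is even: the $\frac{n+1}{2}$th score, that is, the average of the two middle scores. Example: 6, 17, 19, 20, 21, 27; position (6 + 1)/2 = 3.5, so median = (19 + 20)/2 = 19.5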
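The odd and even cases can be sketched as a small function. This is one possible implementation under the definition above, not a prescribed one; the function name `median` is an illustrative choice.

```python
def median(scores):
    """Return the median: the middle score for odd n,
    the average of the two middle scores for even n."""
    ordered = sorted(scores)  # the median is defined on ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 8, 9, 10, 28]))        # 9
print(median([6, 17, 19, 20, 21, 27]))  # 19.5
```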
Pros and Cons of Median
Pros
• Not influenced by extreme scores or skewed distributions.
• Good with ordinal data.
• Easier to compute than the mean.
Cons
• May not exist in the data.
• Doesn’t take actual values into account.
Mode
Most frequently occurring value
• Data {1, 3, 7, 3, 2, 3, 6, 7}: Mode: 3
• Data {1, 3, 7, 3, 2, 3, 6, 7, 1, 1}: Mode: 1, 3
• Data {1, 3, 7, 0, 2, -3, 6, 5, -1}: Mode: none
Central Tendency Example: Mode
52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
Mode: most frequent observation
Mode(s) for hotel rates: 264, 317, 384
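The mode examples, including the multimodal and "no mode" cases, can be reproduced with a short helper. This is a sketch only: `modes` is a hypothetical function name, and treating data in which every value occurs once as having no mode is an assumption taken from the convention the slides use.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: reported as "none"
    return sorted(v for v, c in counts.items() if c == highest)


hotel_rates = [52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280,
               282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402,
               417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891]

print(modes([1, 3, 7, 3, 2, 3, 6, 7]))        # [3]
print(modes([1, 3, 7, 3, 2, 3, 6, 7, 1, 1]))  # [1, 3]
print(modes([1, 3, 7, 0, 2, -3, 6, 5, -1]))   # []
print(modes(hotel_rates))                     # [264, 317, 384]
```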
Pros and Cons of the Mode
Pros
• Good for nominal data.
• Easiest to compute and understand.
• The score comes from the data set.
Cons
• Ignores most of the information in a distribution.
• Small samples may not have a mode.
Example: Central Location
Suppose the ages in years of the first 10 subjects enrolled in your study are:
34, 24, 56, 52, 21, 44, 64, 34, 42, 46
Then the mean age of this group is 41.7 years.
To find the median, first order the data: 21, 24, 34, 34, 42, 44, 46, 52, 56, 64
The median is (42 + 44)/2 = 43 years.
The mode is 34 years.
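These three values can be confirmed with Python's standard `statistics` module. This is a quick check and not part of the original slides; the variable names are illustrative.

```python
import statistics

# Ages of the first 10 enrolled subjects.
ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]

mean_age = statistics.mean(ages)      # sum of ages / number of ages
median_age = statistics.median(ages)  # average of the two middle ordered values
mode_age = statistics.mode(ages)      # most frequent age

print(mean_age, median_age, mode_age)  # 41.7 43.0 34
```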
Comparison of Mean and Median
• Mean is sensitive to a few very large (or small) values (“outliers”), so sometimes the mean does not reflect the quantity desired.
• Median is “resistant” to outliers.
• Mean is attractive mathematically.
• 50% of sample is above the median, 50% of sample is below the median.
Suppose the next patient enrolls and their age is 97 years. How do the mean and median change?
To get the median, order the data: 21, 24, 34, 34, 42, 44, 46, 52, 56, 64, 97
If the age were recorded incorrectly as 977 instead of 97, what would the new median be? What would the new mean be?
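One way to explore these questions is to recompute both measures for each version of the data. The sketch below uses the standard `statistics` module; the labels and the `results` dictionary are illustrative assumptions.

```python
import statistics

ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]

scenarios = {
    "first 10 subjects": ages,
    "11th subject aged 97": ages + [97],
    "11th subject misrecorded as 977": ages + [977],
}

# For each scenario, store (mean rounded to one decimal, median).
results = {
    label: (round(statistics.mean(data), 1), statistics.median(data))
    for label, data in scenarios.items()
}

for label, (mean, median) in results.items():
    print(f"{label}: mean = {mean}, median = {median}")
```

The mean moves from 41.7 to 46.7 with the 97-year-old and to 126.7 with the recording error, while the median is 44 in both of the 11-subject cases. This illustrates the median's resistance to outliers.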
Calculating the Mean from a Frequency Distribution
# of Children (Y)   Frequency (f)   Frequency*Y (fY)
0                   12              0
1                   25              25
2                   733             1466
3                   333             999
4                   183             732
5                   26              130
6                   15              90
7                   12              84
Total               1339            3526
$\bar{Y} = \frac{\sum fY}{N} = \frac{3526}{1339} = 2.6$
MEASURES OF Central Tendency
Geometric Mean & Harmonic Mean
The Shape of Distributions
Distributions can be either symmetrical or skewed, depending on whether there are more frequencies at one end of the distribution than the other.
Symmetrical Distributions
A distribution is symmetrical if the frequencies at the right and left tails of the distribution are identical, so that if it is divided into two halves, each will be the mirror image of the other.
In a symmetrical distribution with a single peak, the mean, median, and mode are identical.
Almost Symmetrical Distribution
[Figure: histogram of HIGHEST YEAR OF SCHOOL COMPLETED (x-axis, 0.0 to 20.0) against Frequency (y-axis, 0 to 400). Mean = 13.4, Mode = 13.0, Std. Dev = 2.97, N = 975.]
Skewed Distribution:
A few extreme values on one side of the distribution or on the other.
• Positively skewed distributions: distributions which have a few extremely high values (Mean > Median)
• Negatively skewed distributions: distributions which have a few extremely low values (Mean < Median)
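The relationship between skew and the ordering of mean and median can be seen on small made-up data sets. The numbers below are invented purely for illustration and do not come from the slides.

```python
import statistics

# Illustrative data only: one extremely high value pulls the mean upward.
positively_skewed = [1, 2, 2, 3, 3, 3, 4, 50]
# Illustrative data only: one extremely low value pulls the mean downward.
negatively_skewed = [1, 47, 48, 48, 49, 49, 50, 50]

pos_mean = statistics.mean(positively_skewed)
pos_median = statistics.median(positively_skewed)
neg_mean = statistics.mean(negatively_skewed)
neg_median = statistics.median(negatively_skewed)

print(pos_mean, pos_median)  # 8.5 3.0   (Mean > Median)
print(neg_mean, neg_median)  # 42.75 48.5 (Mean < Median)
```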
Positively Skewed Distribution
[Figure: histogram of GOVT INVESTIGATE WORKERS ILLEGAL DRUG USE (x-axis, 1.0 to 4.0) against Frequency (y-axis, 0 to 500). Mean = 1.13, Median = 1.0, Std. Dev = .39, N = 474.]
Negatively Skewed Distribution
[Figure: histogram of FAVOR PREFERENCE IN HIRING BLACKS (x-axis, 1.0 to 4.0) against Frequency (y-axis, 0 to 600). Mean = 3.3, Median = 4.0, Std. Dev = .98, N = 908.]
Mean, Median and Mode
Distributions
• Bell-Shaped (also known as “symmetric” or “normal”)
• Skewed:
  • positively (skewed to the right): it tails off toward larger values
  • negatively (skewed to the left): it tails off toward smaller values
Choosing a Measure of Central Tendency
• IF variable is Nominal: Mode
• IF variable is Ordinal: Mode or Median (or both)
• IF variable is Interval-Ratio and distribution is Symmetrical: Mode, Median or Mean
• IF variable is Interval-Ratio and distribution is Skewed: Mode or Median
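The decision rules above can be written as a simple lookup. This is only a sketch of the slide's rules: the function name `recommended_measures`, its parameters, and the string labels are all hypothetical.

```python
def recommended_measures(level, skewed=False):
    """Return the measures of central tendency suggested by the rules above.

    level:  "nominal", "ordinal", or "interval-ratio" (hypothetical labels)
    skewed: only relevant for interval-ratio variables
    """
    level = level.lower()
    if level == "nominal":
        return ["mode"]
    if level == "ordinal":
        return ["mode", "median"]
    if level == "interval-ratio":
        if skewed:
            return ["mode", "median"]
        return ["mode", "median", "mean"]
    raise ValueError(f"unknown level of measurement: {level!r}")


print(recommended_measures("ordinal"))                      # ['mode', 'median']
print(recommended_measures("interval-ratio", skewed=True))  # ['mode', 'median']
```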
EXAMPLE:
(1) 7, 8, 9, 10, 11: n = 5, $\sum x = 45$, $\bar{x} = 45/5 = 9$
(2) 3, 6, 9, 12, 15: n = 5, $\sum x = 45$, $\bar{x} = 45/5 = 9$
(3) 1, 5, 9, 13, 17: n = 5, $\sum x = 45$, $\bar{x} = 45/5 = 9$
S.D.: (1) 1.58 (2) 4.74 (3) 6.32
All three data sets have the same mean but differ in how widely the scores spread around it.
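The means and standard deviations in this example can be checked with the standard `statistics` module. Two assumptions are made in this sketch: the S.D. values are sample standard deviations (dividing by n - 1), and the second data set is 3, 6, 9, 12, 15, the values consistent with the stated sum of 45 and S.D. of 4.74.

```python
import statistics

datasets = {
    1: [7, 8, 9, 10, 11],
    2: [3, 6, 9, 12, 15],  # assumed values; they match sum = 45 and S.D. = 4.74
    3: [1, 5, 9, 13, 17],
}

# For each data set, store (sum, mean, sample standard deviation to 2 decimals).
summary = {
    label: (sum(data), statistics.mean(data), round(statistics.stdev(data), 2))
    for label, data in datasets.items()
}

for label, (total, mean, sd) in summary.items():
    print(f"({label}) sum = {total}, mean = {mean}, S.D. = {sd}")
```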