7/27/2019 Math 103 02 Central Tendency and Spread_1
http://slidepdf.com/reader/full/math-103-02-central-tendency-and-spread1 1/9
Math 103 Statistics and Probability
Central Tendency and Spread
CJD
Characteristics of Data
Center: A representative or average value that indicates where the middle of the data set is located
Variation: A measure of the amount that the values vary among themselves
Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed)
Outliers: Sample values that lie very far away from the vast majority of other sample values
Measures of Center
a value at the center or middle of a data set
Notation:
Σ denotes the addition of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
Mean
Mean (Arithmetic Mean), or average: the number obtained by adding the values and dividing the total by the number of values
$\mu$ is pronounced 'myu' and denotes the mean of all values in a population
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
Calculators can calculate the mean of data
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Population mean: $\mu = \frac{\sum x}{N}$
Mean
Example: Find the mean of the following weights (in kg) of sample carry-on luggage presented at an airport check-in counter in the last hour.
6.72 3.46 3.60 6.44 26.70
Solution :
Sum of all weights = 6.72 + 3.46 + 3.60 + 6.44 + 26.70 = 46.92
Number of weights = 5
Mean = 46.92 / 5 = 9.384 kg.
Notice the impact of the outlier 26.70 on the mean.
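To see how strongly the outlier pulls the mean, the calculation can be repeated with and without the 26.70 kg bag. This is a minimal sketch using Python's standard `statistics` module; dropping the heaviest bag is an illustrative comparison, not a step from the slide.

```python
from statistics import mean

# Carry-on weights in kg, taken from the example above
weights = [6.72, 3.46, 3.60, 6.44, 26.70]

mean_all = mean(weights)

# Illustrative comparison (an assumption, not part of the slide):
# leave out the heaviest bag and recompute the mean
heaviest = max(weights)
mean_without_outlier = mean(w for w in weights if w != heaviest)

print(round(mean_all, 3))              # mean of all five weights
print(round(mean_without_outlier, 3))  # mean of the four lighter bags
```

A single extreme value moves the mean from about 5 kg to more than 9 kg.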
Median
Median: the middle value when the original data values are arranged in order of increasing magnitude
often denoted by $\tilde{x}$ (pronounced 'x-tilde') or by $\tilde{\mu}$ (pronounced 'myu-tilde')
is not affected by an extreme value
Median
Odd number of values:
unsorted: 6.72 3.46 3.60 6.44 26.70
sorted: 3.46 3.60 6.44 6.72 26.70
exact middle: MEDIAN is 6.44
Even number of values:
unsorted: 6.72 3.46 3.60 6.44
sorted: 3.46 3.60 6.44 6.72
no exact middle, shared by two numbers: (3.60 + 6.44) / 2
MEDIAN is 5.02
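The odd and even cases can be checked with the standard `statistics.median` function, which sorts the data itself and averages the two middle values when the count is even. A short sketch; the variable names are illustrative.

```python
from statistics import median

# Odd number of values: the middle value of the sorted data is the median
odd_weights = [6.72, 3.46, 3.60, 6.44, 26.70]

# Even number of values: the median is the mean of the two middle values
even_weights = [6.72, 3.46, 3.60, 6.44]

median_odd = median(odd_weights)
median_even = median(even_weights)

print(median_odd)
print(round(median_even, 2))
```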
Mode
Mode: the score that occurs most frequently
- Unimodal, Bimodal, Multimodal or No Mode
- denoted by M
- the only measure of central tendency that can be used with nominal data
a. 5 5 5 3 1 5 1 4 3 5 (Mode is 5)
b. 1 2 2 2 3 4 5 6 6 6 7 9 (Bimodal: 2 and 6)
c. 1 2 3 6 7 8 9 10 (No Mode)
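The three cases above can be reproduced by counting occurrences. This is a sketch with a hypothetical helper `modes`; it assumes "No Mode" means that no value repeats, which matches case c but is only one possible convention.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Assumption: every value occurs once, so report "No Mode"
        return []
    return sorted(v for v, c in counts.items() if c == top)

a = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]
b = [1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]
c = [1, 2, 3, 6, 7, 8, 9, 10]

print(modes(a))  # unimodal
print(modes(b))  # bimodal
print(modes(c))  # no mode

# Counting also works for nominal data, where a mean or median is undefined
print(modes(["Honda", "Toyota", "Toyota", "Sarao"]))
```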
Qualitative Data
Maker        Model     Units Sold
Toyota       Vios      1,098
Toyota       Prius     104
Toyota       Altis     725
Toyota       Innova    417
Honda        Civic     732
Honda        City      960
Honda        CRV       459
Sarao        Jeepney   1,243
Mitsubishi   Lancer    581
Mitsubishi   Pajero    376
Mode: Sarao Jeepney
Comparison
Mean
• most useful
• easiest to compute for large n
• uses all data
• does not vary much from sample to sample
• distribution of means is well known
• affected by outliers
Median
• second most useful
• not affected by outliers, so it gives a truer average
• easy to compute if data is sorted or n is small
• varies greatly from sample to sample
Mode
• can be used also for nominal data
• hardly requires any calculation if data is sorted
• may not exist or may not be unique
• not useful for small n
Weighted Mean
Each individual value x may have a weight w associated with it.
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Example: A talent show is judged 40% execution, 30% difficulty, 20% originality and 10% audience impact. If a contestant scored 8, 9, 6 and 7, the weighted mean is
$\bar{x}$ = (0.40)(8) + (0.30)(9) + (0.20)(6) + (0.10)(7) = 3.2 + 2.7 + 1.2 + 0.7 = 7.8
(the weights sum to 1, so the denominator is 1)
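The weighted mean formula translates directly into code. A minimal sketch, with a hypothetical helper `weighted_mean`; dividing by the sum of the weights keeps it correct even when the weights do not add up to 1.

```python
def weighted_mean(values, weights):
    """Sum of (w * x) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Talent show example: execution, difficulty, originality, audience impact
scores = [8, 9, 6, 7]
weights = [0.40, 0.30, 0.20, 0.10]

result = weighted_mean(scores, weights)
print(round(result, 2))
```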
Raw Data
Raw Data: Test Scores in a Statistics Test
60 74 74 58 72
58 82 52 26 72
66 66 60 92 78
46 38 50 66 50
62 64 68 62 84
54 66 66 44 60
84 70 76 72 66
70 64 52 40 78
76 42 50 64 48
64 40 82 54 74
Sorted Data
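(sorted from highest to lowest, reading down each column)
92 74 66 60 50
84 74 66 60 50
84 72 66 60 48
82 72 66 58 46
82 72 64 58 44
78 70 64 54 42
78 70 64 54 40
76 68 64 52 40
76 66 62 52 38
74 66 62 50 26
Applying the formulas (and using a calculator), we get:
$\mu = 62.7$
$\tilde{\mu} = 64$
M = 66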
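The three measures of center for the 50 test scores can be recomputed in a few lines. This sketch uses the standard `statistics` module and the raw scores as listed on the Raw Data slide.

```python
from statistics import mean, median, mode

# Test scores in a Statistics test (Raw Data slide), one row per line
scores = [
    60, 74, 74, 58, 72,
    58, 82, 52, 26, 72,
    66, 66, 60, 92, 78,
    46, 38, 50, 66, 50,
    62, 64, 68, 62, 84,
    54, 66, 66, 44, 60,
    84, 70, 76, 72, 66,
    70, 64, 52, 40, 78,
    76, 42, 50, 64, 48,
    64, 40, 82, 54, 74,
]

score_mean = mean(scores)      # arithmetic mean
score_median = median(scores)  # average of the 25th and 26th sorted values
score_mode = mode(scores)      # most frequent score

print(len(scores), score_mean, score_median, score_mode)
```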
Measures of Spread or Variation
Range
Mean Deviation
Variance
Standard Deviation
Range and Midrange
Range = Highest Value - Lowest Value (a measure of spread)
Mid-Range = (Highest + Lowest) / 2 (a measure of center)
In the test score example (highest score 92, lowest score 26):
Range = 92 - 26 = 66
Mid-Range = (92 + 26) / 2 = 59
Mean Deviation
Mean Dev of a Sample: $\frac{\sum |x - \bar{x}|}{n}$
Mean Dev of a Population: $\frac{\sum |x - \mu|}{N}$
Example: Weights of Carry-on Luggage
6.72 3.46 3.60 6.44 26.70
Mean = 9.384 Range = 26.70 - 3.46 = 23.24
Mean Deviation = $\frac{|6.72 - 9.384| + |3.46 - 9.384| + |3.60 - 9.384| + |6.44 - 9.384| + |26.70 - 9.384|}{5} = \frac{34.632}{5} = 6.926$
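The mean deviation is the average absolute distance from the mean. A minimal sketch with a hypothetical helper `mean_deviation`, applied to the luggage weights; the range is computed alongside for comparison.

```python
def mean_deviation(values):
    """Average absolute distance of the values from their mean."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)

weights = [6.72, 3.46, 3.60, 6.44, 26.70]  # kg, from the luggage example

weight_range = max(weights) - min(weights)
weight_mean_dev = mean_deviation(weights)

print(round(weight_range, 2))
print(round(weight_mean_dev, 3))
```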
Variance
Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Using n-1 will reduce the bias. Using n will underestimate variance.
Computing Formula for Variance
$s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)}$
$\sigma^2 = \frac{N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2}{N^2}$
Variance Example
Example: Weights of sample Carry-on Luggage
6.72 3.46 3.60 6.44 26.70
$x_i$     $x_i^2$
6.72      45.158
3.46      11.972
3.60      12.960
6.44      41.474
26.70     712.890
Σ 46.92   824.454
$s^2 = \frac{5(824.454) - (46.92)^2}{5(4)} = 96.039$
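The definition and the computing formula for the sample variance should give the same number. The sketch below implements both for the luggage weights; the function names are illustrative.

```python
def sample_variance_definition(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def sample_variance_computing(xs):
    """Computing formula: (n * sum(x^2) - (sum(x))^2) / (n * (n - 1))."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

weights = [6.72, 3.46, 3.60, 6.44, 26.70]

s2_definition = sample_variance_definition(weights)
s2_computing = sample_variance_computing(weights)

print(round(s2_definition, 3), round(s2_computing, 3))
```

The computing formula needs only the running totals of x and x squared, which is why it suits calculator work.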
Standard Deviation
Population SD: $\sigma = \sqrt{\sigma^2}$
Sample SD: $s = \sqrt{s^2}$
In example, $s = \sqrt{96.039} = 9.800$
Symbols for Standard Deviation
                                 Sample    Population
Textbook                         s         σ
Some graphics calculators        Sx        σx
Some non-graphics calculators    xσn-1     xσn
Excel variance                   var       varp
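Python's standard `statistics` module draws the same sample versus population distinction as the symbols above: `variance` and `stdev` divide by n-1, while `pvariance` and `pstdev` divide by N. A short sketch on the luggage weights; pairing these functions with the Excel names is this example's own comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

weights = [6.72, 3.46, 3.60, 6.44, 26.70]
n = len(weights)

sample_var = variance(weights)       # divides by n - 1
population_var = pvariance(weights)  # divides by N
sample_sd = stdev(weights)           # square root of the sample variance
population_sd = pstdev(weights)      # square root of the population variance

print(round(sample_var, 3), round(sample_sd, 3))
print(round(population_var, 3), round(population_sd, 3))
```

Treating the same five values as a whole population always gives the smaller variance, by a factor of (n-1)/n.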
Comparison
Range
• Easiest to use and interpret
• Not useful for large n
• Says nothing about distribution of data between max and min
Mean Deviation
• Considers all data relative to the mean
• Simple but awkward to compute
Variance / SD
• most useful
• Considers all data relative to the mean
• Computation more involved
• interpretation not straightforward
Coefficient of Variation
To compare spreads of samples/populations with different means
Population: $CV = \frac{\sigma}{\mu} \times 100\%$
Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
Example: The prelim exam grades of a Statistics class and a Calculus class are summarized below:
Subject      Mean   Standard Deviation   Coefficient of Variation
Statistics   68.2   22.0                 32.26%
Calculus     56.5   17.8                 31.50%
Therefore, the statistics grades are relatively only slightly more variable than the calculus grades.
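The two coefficients of variation in the table can be reproduced from the means and standard deviations. A minimal sketch with a hypothetical helper `coefficient_of_variation`.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

statistics_cv = coefficient_of_variation(22.0, 68.2)
calculus_cv = coefficient_of_variation(17.8, 56.5)

print(f"Statistics: {statistics_cv:.2f}%")
print(f"Calculus:   {calculus_cv:.2f}%")
```

Because the CV is a ratio, it is only meaningful when the mean is positive and the data are measured on a scale with a true zero.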
z scores
z Score (or standard score)
- A measure of position relative to other data
- the number of standard deviations that a given value x is above or below the mean
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Round to 2 decimal places
Example
A student scored 67 in a calculus test and 74 in a statistics test. If the calculus test has a mean of 53 with SD of 8, and the statistics test has a mean of 65 with SD of 6, did the student fare better relative to his classmates in calculus or in statistics?
Calculus: z = (67-53)/8 = 1.75
Statistics: z = (74-65)/6 = 1.50
Conclusion: The student fared better in Calculus
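The comparison above amounts to computing one z score per test and picking the larger. A short sketch; `z_score` is a hypothetical helper that rounds to 2 decimal places as the slides recommend.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return round((x - mean) / sd, 2)

calculus_z = z_score(67, mean=53, sd=8)
statistics_z = z_score(74, mean=65, sd=6)

better = "Calculus" if calculus_z > statistics_z else "Statistics"
print(calculus_z, statistics_z, better)
```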
Interpreting z scores
[Figure: z number line from -3 to 3, with the central region labeled Ordinary Values and both tails labeled Unusual Values]
Measures of Location or Position
Percentiles – 100 parts (in 1%)
Deciles – 10 parts (in 10%)
Quartiles – 4 parts (in 25%)
Fractiles or Quantiles
Percentiles
P1, P2, P3, ..., P98, P99
i% of the data falls below (<=) Pi
Deciles
D1, D2, D3, D4, D5, D6, D7, D8, D9
divides ranked data into ten equal parts
10% 10% 10% 10% 10% 10% 10% 10% 10% 10%
D1 D2 D3 D4 D5 D6 D7 D8 D9
i*10% of the data falls below (<=) Di
D9 is the 90th Percentile or P90
Quartiles
Q1, Q2, Q3
divides ranked scores into four equal parts
25% 25% 25% 25%
(minimum) Q1 Q2 (median) Q3 (maximum)
i*25% of the data falls below (<=) Qi
Q2 is the 50th percentile P50 or 5th decile D5
Example
Using the 50 sorted test scores, with positions counted from the lowest score:
P5 = 40: (5/100)*50 = 2.5, rounds up to 3rd
P4 = 39: (4/100)*50 = 2, get mid of 2nd and 3rd (38 and 40)
P94 = 83: (94/100)*50 = 47, get mid of 47th and 48th
P99 = 92: (99/100)*50 = 49.5, rounds up to 50th
D4 = 61: (4/10)*50 = 20, get mid of 20th and 21st
D9 = 80: (9/10)*50 = 45, get mid of 45th and 46th
Q2 = 64: (2/4)*50 = 25, get mid of 25th and 26th
Q3 = 72: (3/4)*50 = 37.5, rounds up to 38th
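The position rule used in this example (round up when the position is not a whole number, otherwise take the midpoint with the next value) can be sketched as a single function. `quantile_value` is a hypothetical helper; it expects 0 < part < parts, and other textbooks and software use slightly different rules, so results may differ from library functions.

```python
# Test scores from the Raw Data slide
scores = [
    60, 74, 74, 58, 72, 58, 82, 52, 26, 72,
    66, 66, 60, 92, 78, 46, 38, 50, 66, 50,
    62, 64, 68, 62, 84, 54, 66, 66, 44, 60,
    84, 70, 76, 72, 66, 70, 64, 52, 40, 78,
    76, 42, 50, 64, 48, 64, 40, 82, 54, 74,
]

def quantile_value(data, part, parts):
    """Value of the part-th quantile when the data is cut into `parts` groups.

    Examples: part=94, parts=100 gives P94; part=3, parts=4 gives Q3.
    """
    ranked = sorted(data)
    n = len(ranked)
    # Position (part / parts) * n, kept in integers to avoid float rounding
    position, remainder = divmod(part * n, parts)
    if remainder:
        # Not a whole number: round up to the next position.
        # The 1-based position (position + 1) is 0-based index `position`.
        return ranked[position]
    # Whole number: midpoint of the values at this position and the next
    return (ranked[position - 1] + ranked[position]) / 2

for label, part, parts in [("P5", 5, 100), ("P4", 4, 100), ("P94", 94, 100),
                           ("P99", 99, 100), ("D4", 4, 10), ("D9", 9, 10),
                           ("Q2", 2, 4), ("Q3", 3, 4)]:
    print(label, quantile_value(scores, part, parts))
```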
Quantile of a Score
Percentile of score x = (number of scores <= x) / (total number of scores) • 100
Decile of score x = (number of scores <= x) / (total number of scores) • 10
Quartile of score x = (number of scores <= x) / (total number of scores) • 4
If result is not an integer, Round up to the next higher integer
Example: 32 of 50 test scores are <= 66.
32/50*100=64, 32/50*10=6.4, 32/50*4=2.56
So test score 66 is in P64, D7 and Q3
To improve estimate, include only half of other scores equal to x in the numerator.
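Going the other way, from a score to its percentile, decile or quartile, only requires counting the scores at or below it. This sketch uses a hypothetical helper `quantile_of_score` and the basic rule with rounding up; it does not apply the half-count refinement for tied scores.

```python
# Test scores from the Raw Data slide
scores = [
    60, 74, 74, 58, 72, 58, 82, 52, 26, 72,
    66, 66, 60, 92, 78, 46, 38, 50, 66, 50,
    62, 64, 68, 62, 84, 54, 66, 66, 44, 60,
    84, 70, 76, 72, 66, 70, 64, 52, 40, 78,
    76, 42, 50, 64, 48, 64, 40, 82, 54, 74,
]

def quantile_of_score(data, x, parts):
    """Quantile (out of `parts`) that score x falls in, rounded up."""
    at_or_below = sum(1 for s in data if s <= x)
    # Integer ceiling division: rounds up without float error
    return -(-at_or_below * parts // len(data))

count_66 = sum(1 for s in scores if s <= 66)
print(count_66)                             # scores at or below 66
print(quantile_of_score(scores, 66, 100))   # percentile
print(quantile_of_score(scores, 66, 10))    # decile
print(quantile_of_score(scores, 66, 4))     # quartile
```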
Interquartile and Percentile Range
Interquartile Range (or IQR) = Q3 - Q1
Percentile Range = P90 - P10
In the test score example:
Q3 = 72, Q1 = 52, IQR = 72 - 52 = 20
P10 = 43, P90 = 80, P10 to P90 Range = 80 - 43 = 37
Range = 92 - 26 = 66
Boxplots
Simple graph to indicate Median, IQR and Outliers
Also known as the box and whisker plot
Basic Boxplot: [Figure: boxplot of the test scores with Q1, Q2 and Q3 marked and the axis ends at 26 and 92]
Q1=52 Q2=64 Q3=72 IQR=20
Variation: Whiskers extend only to 1.5*IQR below Q1 and 1.5*IQR above Q3. Outside data are marked with circles to mark outliers.
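For the variation with 1.5*IQR whiskers, the outlier limits follow directly from Q1 and Q3. A minimal sketch using the quartiles and the extreme scores quoted above; the names `lower_limit`, `upper_limit` and `is_outlier` are illustrative.

```python
q1, q3 = 52, 72          # quartiles of the test scores
iqr = q3 - q1            # interquartile range

lower_limit = q1 - 1.5 * iqr  # whisker cannot extend below this value
upper_limit = q3 + 1.5 * iqr  # whisker cannot extend above this value

def is_outlier(score):
    """True if the score would be drawn as a circle beyond the whiskers."""
    return score < lower_limit or score > upper_limit

lowest, highest = 26, 92  # extreme test scores
print(lower_limit, upper_limit)
print(is_outlier(lowest), is_outlier(highest))
```

Both extremes fall inside the limits, so for this data set the variation draws the same whiskers as the basic boxplot.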
End