Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Mathematical Statistics
Anna Janicka
Lecture II, 24.02.2020
DESCRIPTIVE STATISTICS, PART II
Plan for today
1. Descriptive Statistics, part II:
mode
quantiles
measures of variability
measures of asymmetry
the boxplot
Measures of central tendency – reminder
Classic:
arithmetic mean
Position (order, rank):
median
mode
quartile
Example 1 – cont.
Mode – examples
Quartiles – example
Variance – example
Grade Number Frequency
2 74 29.84%
3 76 30.65%
3.5 48 19.35%
4 31 12.50%
4.5 9 3.63%
5 10 4.03%
Total 248 100%
Example 3 – cont.
Interval Class
mark Number Frequency
Cumulative
number cni
Cumulative
frequency cfi
(30,40] 35 11 0,11 11 0,11
(40,50] 45 23 0,23 34 0,34
(50,60] 55 33 0,33 67 0,67
(60,70] 65 12 0,12 79 0,79
(70,80] 75 6 0,06 85 0,85
(80,90] 85 8 0,08 93 0,93
(90,100] 95 3 0,03 96 0,96
(100,110] 105 2 0,02 98 0,98
(110,120] 115 2 0,02 100 1
Total 100 1 Mode – example
Quartiles – example
Variance – example
Mode
Mode
the value that appears most often
for grouped data:
Mo = most frequent value
for grouped class interval data:
where
nMo – number of elements in mode’s class,
cL, b – analogous to the median
bnnnn
nncMo
MoMoMoMo
MoMoL
)()( 11
1
Mode – examples
Example 1:
Mo = 3
Example 3:
the mode’s interval is (50,60], with 33 elements
nMo = 33, cL = 50, b = 10, nMo-1 = 23, nMo+1 = 12
23.5310)1233()2333(
233350
Mo
Example 1 –
cont.
Example 3 –
cont.
Which measure should we choose?
Arithmetic mean: for typical data series
(single max, monotonous frequencies)
Mode: for typical data series, grouped data
(the lengths of the mode’s class and
neighboring classes should be equal)
Median: no restrictions. The most robust (in
case of outlier observations, fluctuations
etc.)
Quantiles, quartiles
p-th quantile (quantile of rank p): number
such that the fraction of observations less
than or equal to it is at least p, and values
greater than or equal to it at least 1-p
Q1 : first quartile = quantile of rank ¼
Second quartile = median
= quantile of rank ½
Q3: Third quartile = quantile of rank ¾
Quantiles – cont.
Empirical quantile of rank p:
ZnpX
ZnpXX
Q
nnp
nnpnnp
p
:1][
:1:
2
Quartiles – cont.
Quantiles for p = ¼ and p = ¾.
For grouped class interval data –
analogous to the median
for k=1 or 3
where M1, M3 – number of the quartile’s class
b – length of quartile class interval
cL – lower end of the quartile class interval
1
14
k
k
M
i
i
M
Lk nnk
n
bcQ
Quartiles – examples
Example 1:
so
Example 3:
so
75100 25100 43
41
4M ,2 31 M
67,66)6775(12
106009,46)1125(
23
1040 31 QQ
Example1 –
cont.
Example 3 –
cont.
18624862248 41 4
3
5322321
2481872481862486324862 ., :::: XXXX
Q Q
Variability measures
Classical measures
variance, standard deviation
average (absolute) deviation
coefficient of variation
Measures based on order statistics
range
interquartile range
quartile deviation
coefficients of variation (based on order stats)
median absolute deviation
Measures based on order statistics
Range
the most simple measure, does not take into
account anything but the extreme values
Inter Quartile Range (midspread, middle fifty)
more robust than the range
nnn XXr :1:
13 QQIQR
may be further used to calculate quartile deviation Q= IQR/2, and
coefficients of variation VQ = Q/Med or VQ1Q3 = IQR/(Q3+Q1) (quartile
variation coefficient) or the typical range: [Med – Q, Med + Q]
length of the interval that covers the
middle 50% observations
Range, interquartile range – examples
Example 1:
Example 3:
(in reality
5.125.3
,325
IQR
r
58,2009,4667,66
)45,8645329118
9030120
IQR
,-,
r
Classical measures of dispersion
Variance
raw data
grouped data
grouped class interval data
+ Sheppard’s correction
in general
2
1
21
1
212 )()(ˆ
n
i
in
n
i
inXXXXS
2
1
21
1
212 )()(ˆ
k
i
iin
k
i
iinXXnXXnS
2
1
21
1
212 )()(ˆ
k
i
iin
k
i
iinXcnXcnS
12
22 2ˆ cSS
c=length of
class
interval (for
equal
intervals)
2
1
112122 )(ˆ
k
i
iiinccnSS
Variance – examples
Example 1:
Example 3:
in reality
1006359063543106344806353760633740632 222222
2481 ).()..().()..().().(
710
2
.
ˆ
S
98.32212
1031.331
31.331
ˆ
22
10012
S
S
)2)7.58115(2)7.58105(3)7.5895(8)7.5885(6)7.5875(
12)7.5865(33)7.5855(23)7.5845(11)7.5835((
22222
2222
85.333ˆ2 S
Example 1 – cont.
Example 3 – cont.
distrubution not normal or
sample too small for
Sheppard’s correction –
larger errors from small
sample size than from class
grouping.
Standard deviation
In the same units as the initial variable
Example 1:
Example 3:
22 ,ˆˆ SSSS
[grade] S 840.ˆ
][m 22.18ˆ S
Average (absolute) deviation, mean deviation
Nowadays seldom used. Simple
calculations.
for raw data
etc...
We have: d<S
n
i
inXXd
1
1 ||
Coefficient of variation (classical)
For comparisons of the same varaible
accross populations or different
variables for the same population
%)100( or
%),100(ˆ
X
dV
X
SV
d
S
Skewness (asymmetry)
left symmetry right
(negative) (zero) (positive)
(typical order)
MoMedX MoMedX MoMedX
Measures of asymmetry
Skewness
where M3 is the third
central moment
Skewness coefficient
Quartile skewness coefficient
3
3
S
MA
ˆ
or ˆ 11
S
MedXA
S
MoXA
13
132
2
QMedQA
measures skewness for the
middle 50% observations only
Interpretation
positive values= positive asymmetry (right
skewed distribution)
negative values = negative asymmetry
(left skewed distribution)
For the skewness coefficient (with the
median) and the quartile skewness
coefficient the strength of asymmetry
(absolute value):
0 – 0.33: weak
0.34 – 0.66: medium
0.67 – 1: strong
Asymmetry – examples
Example 1:
Example 3:
15.009.4667.66
09.4685.54267.66
24.02.18
85.547.583.0
2.18
23.537.58
,15.1
2
11
A
)( A or )( A
A
MedMo
3
1
253
23253
070840
3063
070840
3063
280
2
1
1
.
.
..
.
..
.
.
A
)Mo( A
)Med( A
A
0
0,05
0,1
0,15
0,2
0,25
0,3
0,35
25 35 45 55 65 75 85 95 105 115 125
Fre
qu
en
cy
Surface area
0%
5%
10%
15%
20%
25%
30%
35%
2 3 3.5 4 4.5 5
Fre
qu
en
cy
Grade
0%
5%
10%
15%
20%
25%
30%
35%
2 2.5 3 3.5 4 4.5 5
Fre
qu
en
cy
Grade
Boxplot
(Box and whisker plot)
Allows to compare two (or more)
populations
outliers:
xmax
outliers
X*
Q3
Med
Q1
X*
outliers
xmin
]},[:max{
]},[:min{
23
33
123
1
IQRQQXXX
QIQRQXXX
ii
ii
XxXx or
Boxpolot – example of comparison
05
10
15
1 2
Examples (1)
Source: CSO Poland, 2009
Examples (2)
Source: European
Commission
Dispersion (coeffcient of variability) of unemployment
rates
Examples (3)
Growth charts
Source: WHO
Examples (4)
Gross hourly earinings
Source: European Commission
Examples(5)
Salary by occupational group and gender
Source: New Zealand State Services