32
Mathematical Statistics Anna Janicka Lecture II, 24.02.2020 DESCRIPTIVE STATISTICS, PART II

Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Mathematical Statistics

Anna Janicka

Lecture II, 24.02.2020

DESCRIPTIVE STATISTICS, PART II

Page 2: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Plan for today

1. Descriptive Statistics, part II:

mode

quantiles

measures of variability

measures of asymmetry

the boxplot

Page 3: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Measures of central tendency – reminder

Classic:

arithmetic mean

Position (order, rank):

median

mode

quartile

Page 4: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Example 1 – cont.

Mode – examples

Quartiles – example

Variance – example

Grade Number Frequency

2 74 29.84%

3 76 30.65%

3.5 48 19.35%

4 31 12.50%

4.5 9 3.63%

5 10 4.03%

Total 248 100%

Page 5: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Example 3 – cont.

Interval Class

mark Number Frequency

Cumulative

number cni

Cumulative

frequency cfi

(30,40] 35 11 0,11 11 0,11

(40,50] 45 23 0,23 34 0,34

(50,60] 55 33 0,33 67 0,67

(60,70] 65 12 0,12 79 0,79

(70,80] 75 6 0,06 85 0,85

(80,90] 85 8 0,08 93 0,93

(90,100] 95 3 0,03 96 0,96

(100,110] 105 2 0,02 98 0,98

(110,120] 115 2 0,02 100 1

Total 100 1 Mode – example

Quartiles – example

Variance – example

Page 6: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Mode

Mode

the value that appears most often

for grouped data:

Mo = most frequent value

for grouped class interval data:

where

nMo – number of elements in mode’s class,

cL, b – analogous to the median

bnnnn

nncMo

MoMoMoMo

MoMoL

)()( 11

1

Page 7: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Mode – examples

Example 1:

Mo = 3

Example 3:

the mode’s interval is (50,60], with 33 elements

nMo = 33, cL = 50, b = 10, nMo-1 = 23, nMo+1 = 12

23.5310)1233()2333(

233350

Mo

Example 1 –

cont.

Example 3 –

cont.

Page 8: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Which measure should we choose?

Arithmetic mean: for typical data series

(single max, monotonous frequencies)

Mode: for typical data series, grouped data

(the lengths of the mode’s class and

neighboring classes should be equal)

Median: no restrictions. The most robust (in

case of outlier observations, fluctuations

etc.)

Page 9: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Quantiles, quartiles

p-th quantile (quantile of rank p): number

such that the fraction of observations less

than or equal to it is at least p, and values

greater than or equal to it at least 1-p

Q1 : first quartile = quantile of rank ¼

Second quartile = median

= quantile of rank ½

Q3: Third quartile = quantile of rank ¾

Page 10: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Quantiles – cont.

Empirical quantile of rank p:

ZnpX

ZnpXX

Q

nnp

nnpnnp

p

:1][

:1:

2

Page 11: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Quartiles – cont.

Quantiles for p = ¼ and p = ¾.

For grouped class interval data –

analogous to the median

for k=1 or 3

where M1, M3 – number of the quartile’s class

b – length of quartile class interval

cL – lower end of the quartile class interval

1

14

k

k

M

i

i

M

Lk nnk

n

bcQ

Page 12: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Quartiles – examples

Example 1:

so

Example 3:

so

75100 25100 43

41

4M ,2 31 M

67,66)6775(12

106009,46)1125(

23

1040 31 QQ

Example1 –

cont.

Example 3 –

cont.

18624862248 41 4

3

5322321

2481872481862486324862 ., :::: XXXX

Q Q

Page 13: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Variability measures

Classical measures

variance, standard deviation

average (absolute) deviation

coefficient of variation

Measures based on order statistics

range

interquartile range

quartile deviation

coefficients of variation (based on order stats)

median absolute deviation

Page 14: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Measures based on order statistics

Range

the most simple measure, does not take into

account anything but the extreme values

Inter Quartile Range (midspread, middle fifty)

more robust than the range

nnn XXr :1:

13 QQIQR

may be further used to calculate quartile deviation Q= IQR/2, and

coefficients of variation VQ = Q/Med or VQ1Q3 = IQR/(Q3+Q1) (quartile

variation coefficient) or the typical range: [Med – Q, Med + Q]

length of the interval that covers the

middle 50% observations

Page 15: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Range, interquartile range – examples

Example 1:

Example 3:

(in reality

5.125.3

,325

IQR

r

58,2009,4667,66

)45,8645329118

9030120

IQR

,-,

r

Page 16: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Classical measures of dispersion

Variance

raw data

grouped data

grouped class interval data

+ Sheppard’s correction

in general

2

1

21

1

212 )()(ˆ

n

i

in

n

i

inXXXXS

2

1

21

1

212 )()(ˆ

k

i

iin

k

i

iinXXnXXnS

2

1

21

1

212 )()(ˆ

k

i

iin

k

i

iinXcnXcnS

12

22 2ˆ cSS

c=length of

class

interval (for

equal

intervals)

2

1

112122 )(ˆ

k

i

iiinccnSS

Page 17: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Variance – examples

Example 1:

Example 3:

in reality

1006359063543106344806353760633740632 222222

2481 ).()..().()..().().(

710

2

.

ˆ

S

98.32212

1031.331

31.331

ˆ

22

10012

S

S

)2)7.58115(2)7.58105(3)7.5895(8)7.5885(6)7.5875(

12)7.5865(33)7.5855(23)7.5845(11)7.5835((

22222

2222

85.333ˆ2 S

Example 1 – cont.

Example 3 – cont.

distrubution not normal or

sample too small for

Sheppard’s correction –

larger errors from small

sample size than from class

grouping.

Page 18: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Standard deviation

In the same units as the initial variable

Example 1:

Example 3:

22 ,ˆˆ SSSS

[grade] S 840.ˆ

][m 22.18ˆ S

Page 19: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Average (absolute) deviation, mean deviation

Nowadays seldom used. Simple

calculations.

for raw data

etc...

We have: d<S

n

i

inXXd

1

1 ||

Page 20: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Coefficient of variation (classical)

For comparisons of the same varaible

accross populations or different

variables for the same population

%)100( or

%),100(ˆ

X

dV

X

SV

d

S

Page 21: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Skewness (asymmetry)

left symmetry right

(negative) (zero) (positive)

(typical order)

MoMedX MoMedX MoMedX

Page 22: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Measures of asymmetry

Skewness

where M3 is the third

central moment

Skewness coefficient

Quartile skewness coefficient

3

3

S

MA

ˆ

or ˆ 11

S

MedXA

S

MoXA

13

132

2

QQ

QMedQA

measures skewness for the

middle 50% observations only

Page 23: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Interpretation

positive values= positive asymmetry (right

skewed distribution)

negative values = negative asymmetry

(left skewed distribution)

For the skewness coefficient (with the

median) and the quartile skewness

coefficient the strength of asymmetry

(absolute value):

0 – 0.33: weak

0.34 – 0.66: medium

0.67 – 1: strong

Page 24: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Asymmetry – examples

Example 1:

Example 3:

15.009.4667.66

09.4685.54267.66

24.02.18

85.547.583.0

2.18

23.537.58

,15.1

2

11

A

)( A or )( A

A

MedMo

3

1

253

23253

070840

3063

070840

3063

280

2

1

1

.

.

..

.

..

.

.

A

)Mo( A

)Med( A

A

0

0,05

0,1

0,15

0,2

0,25

0,3

0,35

25 35 45 55 65 75 85 95 105 115 125

Fre

qu

en

cy

Surface area

0%

5%

10%

15%

20%

25%

30%

35%

2 3 3.5 4 4.5 5

Fre

qu

en

cy

Grade

0%

5%

10%

15%

20%

25%

30%

35%

2 2.5 3 3.5 4 4.5 5

Fre

qu

en

cy

Grade

Page 25: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Boxplot

(Box and whisker plot)

Allows to compare two (or more)

populations

outliers:

xmax

outliers

X*

Q3

Med

Q1

X*

outliers

xmin

]},[:max{

]},[:min{

23

33

123

1

IQRQQXXX

QIQRQXXX

ii

ii

XxXx or

Page 26: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Boxpolot – example of comparison

05

10

15

1 2

Page 27: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Examples (1)

Source: CSO Poland, 2009

Page 28: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Examples (2)

Source: European

Commission

Dispersion (coeffcient of variability) of unemployment

rates

Page 29: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Examples (3)

Growth charts

Source: WHO

Page 30: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Examples (4)

Gross hourly earinings

Source: European Commission

Page 31: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values

Examples(5)

Salary by occupational group and gender

Source: New Zealand State Services

Page 32: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values