WHY WE USE EXPLORATORY DATA ANALYSIS



DO DATA COME FROM A NORMAL DISTRIBUTION?
- YES: estimates based on the normal distribution
- NO: WHY?
  - kurtosis, skewness: transformations, or quantile (robust) estimates
  - outliers, extremes: CAN WE REMOVE THEM?
    - YES: remove them
    - NO: quantile (robust) estimates
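The NO branch of the flowchart starts from two summary numbers, skewness and kurtosis. The Python sketch below computes the simple moment-based coefficients g1 (skewness) and g2 (excess kurtosis). The function name is hypothetical, and statistical packages often report bias-corrected variants that differ slightly for small samples.

```python
def sample_skewness_kurtosis(data):
    """Return (g1, g2): moment estimates of skewness and excess kurtosis.

    g1 = m3 / m2**1.5 and g2 = m4 / m2**2 - 3, where mk is the k-th central
    moment computed with divisor n. For normally distributed data both are
    close to 0; g2 < 0 suggests a flat (platykurtic) shape, g2 > 0 a steep
    (leptokurtic) one.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    # Hypothetical example data: one symmetric, one skewed to the right.
    print(sample_skewness_kurtosis([1, 2, 3, 4, 5, 6, 7, 8, 9]))
    print(sample_skewness_kurtosis([1, 1, 1, 2, 2, 3, 4, 8, 20]))
```

For the symmetric sample g1 is exactly 0 and g2 is negative (a flat, uniform-like shape); the second sample has a clearly positive g1.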


METHODS OF EDA

Graphical:
- dot plot
- box plot
- notched box plot
- Q-Q plot
- histogram
- density plots

Tests:
- tests of normality
- minimal sample size
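The slide does not say which normality test is meant. As one illustrative choice (an assumption, not necessarily the test used in the original course), the Jarque-Bera test combines sample skewness and kurtosis. Because its reference distribution is χ² with 2 degrees of freedom, the p-value has the closed form exp(−JB/2), so this Python sketch needs only the standard library. The test is asymptotic and needs a reasonably large sample, which is one reason a minimal sample size matters; Shapiro-Wilk is a common alternative for small samples. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def jarque_bera(data):
    """Jarque-Bera normality test: returns (statistic, p_value).

    JB = n/6 * (g1**2 + g2**2 / 4) is asymptotically chi-square with 2
    degrees of freedom under normality; the chi-square(2) survival function
    is exp(-x/2). The approximation is poor for small n.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5        # sample skewness
    g2 = m4 / m2 ** 2 - 3.0    # sample excess kurtosis
    jb = n / 6.0 * (g1 ** 2 + g2 ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


if __name__ == "__main__":
    n = 200
    # Hypothetical data: evenly spaced normal and exponential quantiles.
    normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    exp_like = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
    print(jarque_bera(normal_like))  # large p-value: normality not rejected
    print(jarque_bera(exp_like))     # tiny p-value: normality rejected
```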


DOT PLOT


BOX PLOT

Figure labels: lower quartile, upper quartile, median, interquartile range (H), inner and outer fences on both sides of the box, number axis.
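A minimal Python sketch of the numbers behind the box plot, assuming Tukey's usual convention of inner fences at 1.5·H and outer fences at 3·H beyond the quartiles (the slide names the fences but not the multipliers). Quartiles come from `statistics.quantiles` with `method="inclusive"`; other software may interpolate quartiles slightly differently. The function name `box_plot_stats` is hypothetical.

```python
from statistics import quantiles


def box_plot_stats(data):
    """Quartiles, interquartile range H and fences for a box plot.

    Assumes inner fences at 1.5*H and outer fences at 3*H beyond the
    quartiles. Points between an inner and an outer fence are reported as
    outliers, points beyond an outer fence as extremes.
    """
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    h = q3 - q1  # interquartile range (H)
    inner = (q1 - 1.5 * h, q3 + 1.5 * h)
    outer = (q1 - 3.0 * h, q3 + 3.0 * h)
    outliers = [x for x in data
                if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    extremes = [x for x in data if x < outer[0] or x > outer[1]]
    return {"q1": q1, "median": median, "q3": q3, "iqr": h,
            "inner_fences": inner, "outer_fences": outer,
            "outliers": outliers, "extremes": extremes}


if __name__ == "__main__":
    # Hypothetical sample with one outlier (20) and one extreme (25).
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 25]))
```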


NOTCHED BOX PLOT

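Interval estimate of the median (the notch):

$$I = M \pm 1.57\,\frac{R_F}{\sqrt{n}}, \qquad R_F = F_H - F_D$$

where M is the median, F_D and F_H are the lower and upper fourths, R_F is the fourth spread and n is the sample size.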
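The notch formula can be sketched directly in Python. Here the fourths F_D and F_H are approximated by the quartiles from `statistics.quantiles` (an assumption; fourths can differ slightly from these quartiles for small n), and `median_notch` is a hypothetical helper name.

```python
import math
from statistics import median, quantiles


def median_notch(data):
    """Return (lower, upper) limits of the notch M +/- 1.57 * R_F / sqrt(n).

    R_F is taken as the interquartile range (upper minus lower quartile).
    """
    n = len(data)
    m = median(data)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(n)
    return m - half_width, m + half_width


if __name__ == "__main__":
    print(median_notch([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

Under the usual interpretation, when the notches of two box plots do not overlap, the two medians differ at roughly the 5 % significance level.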


Q-Q PLOT

X: theoretical quantiles of the analysed distribution
Y: sample quantiles (measured values)
Ideal case: the points lie on a straight line, i.e. the sample values coincide with the theoretical distribution.
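The coordinates of a normal Q-Q plot can be computed without any plotting library. This Python sketch pairs the sorted data with standard normal quantiles at the plotting positions (i − 0.5)/n; that choice of plotting position is an assumption, and others (for example Blom's (i − 3/8)/(n + 1/4)) are also in common use. The helper name is hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot.

    Sample quantiles are the sorted data; theoretical quantiles are
    standard normal quantiles at p_i = (i - 0.5) / n.
    """
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


if __name__ == "__main__":
    for x, y in normal_qq_points([3.0, 1.0, 2.0]):
        print(f"{x:+.3f}  {y}")
```

Points that bend away from a straight line indicate skewness or heavy/light tails.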


Q-Q PLOT

[Figure: observed values (approx. 25 to 65) plotted against expected normal values (−3 to 3).]


Q-Q PLOT

[Figure: observed values (approx. −20 to 120) plotted against expected normal values (−3 to 3).]


Q-Q plot

- right-sided: skewed to the left
- left-sided: skewed to the right
- platykurtic ("flat")
- leptokurtic ("steep")


HISTOGRAM

[Figure: histogram of the variable TLOUSTKY (thickness) from Sheet1; x-axis approx. 20 to 70, frequency on the y-axis (0 to 30).]


HISTOGRAM

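Correct width of interval: the number of intervals L_int is chosen as

$$L_{\mathrm{int}} = 2.46\,(n-1)^{0.4} \qquad \text{or} \qquad L_{\mathrm{int}} = 2\sqrt{n}$$

where n is the sample size; the interval width is then the data range divided by L_int.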
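A small Python sketch evaluating both rules for a data set. Truncating L_int to an integer (with a minimum of 1) and the helper name `histogram_intervals` are assumptions for illustration.

```python
import math


def histogram_intervals(data):
    """Number of histogram intervals by both rules and the resulting widths.

    L_power = int(2.46 * (n - 1) ** 0.4) and L_sqrt = int(2 * sqrt(n)),
    each at least 1; width = (max - min) / L.
    """
    n = len(data)
    l_power = max(1, int(2.46 * (n - 1) ** 0.4))
    l_sqrt = max(1, int(2 * math.sqrt(n)))
    data_range = max(data) - min(data)
    return {
        "L_power": l_power,
        "L_sqrt": l_sqrt,
        "width_power": data_range / l_power,
        "width_sqrt": data_range / l_sqrt,
    }


if __name__ == "__main__":
    # Hypothetical data: the integers 0 .. 99 (n = 100, range 99).
    print(histogram_intervals(list(range(100))))
```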


HISTOGRAM – kernel density function

[Figure: kernel density estimate of the variable TLOUSTKY (thickness) from Sheet1; x-axis approx. 10 to 80, density on the y-axis (0.000 to 0.060).]
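A kernel density estimate replaces the histogram bars with a smooth curve built from one kernel per observation. This Python sketch assumes a Gaussian kernel and Silverman's rule of thumb for the bandwidth; the software behind the slide's figure may use a different kernel or bandwidth. It also assumes the data are not all identical (otherwise the automatic bandwidth would be zero). Function names are hypothetical.

```python
from statistics import NormalDist, quantiles, stdev


def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(s, IQR / 1.34) * n ** (-1/5)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    spread = min(stdev(data), (q3 - q1) / 1.34)
    return 0.9 * spread * len(data) ** (-0.2)


def gaussian_kde(data, bandwidth=None):
    """Return a function f(x) estimating the density with Gaussian kernels."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(data)
    kernel = NormalDist()  # standard normal kernel
    n = len(data)

    def density(x):
        # Average of kernels centred at each observation, rescaled by 1/h.
        return sum(kernel.pdf((x - xi) / h) for xi in data) / (n * h)

    return density


if __name__ == "__main__":
    f = gaussian_kde([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print([round(f(x), 4) for x in (0, 2.5, 5, 7.5, 10)])
```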


TRANSFORMATION

Aims of transformation:
- reduction of variance
- better symmetry (normality) of the data

Transformation function:
- non-linear
- monotonic


TRANSFORMATION – basic concept

[Figure: transformed data (y-axis, approx. −0.4 to 0.8) against the original data (tree-ring widths in mm, x-axis 0 to 3.5); marked are the mean of the original data and the transformed mean with its projection back onto the original data scale.]
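The projection in the figure can be reproduced numerically. Assuming the logarithm as the transformation (the slide does not state which one was used for the tree-ring widths), the back-transformed mean is the geometric mean, which for right-skewed data lies below the arithmetic mean. The helper name and the data in the usage lines are hypothetical.

```python
import math
from statistics import fmean, geometric_mean


def back_transformed_mean(data, forward=math.log, inverse=math.exp):
    """Mean on the transformed scale, projected back to the original scale.

    With the default log transformation this equals the geometric mean of
    positive data.
    """
    return inverse(fmean(forward(x) for x in data))


if __name__ == "__main__":
    data = [1.0, 10.0, 100.0]            # hypothetical right-skewed sample
    print(fmean(data))                    # arithmetic mean of original data
    print(back_transformed_mean(data))    # projected transformed mean
    print(geometric_mean(data))           # same value for the log case
```

For this sample the back-transformed mean equals the median (10), while the arithmetic mean (37) is pulled up by the largest value.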


TRANSFORMATION – logarithmic transformation

x → ln x

[Figure: histograms (counts) of the original data x (labelled C2, approx. 0 to 800) and of the transformed data ln x (labelled C7, approx. 3.0 to 7.0).]
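The effect shown by the two histograms can be checked with the skewness coefficient. In this Python sketch the data are hypothetical powers of two, whose logarithms are evenly spaced, so the log transformation removes the skewness completely; real data will usually only become more symmetric. Function names are hypothetical.

```python
import math


def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def log_transform(data):
    """x -> ln x; defined only for strictly positive data."""
    if min(data) <= 0:
        raise ValueError("log transformation requires positive values")
    return [math.log(x) for x in data]


if __name__ == "__main__":
    data = [2.0 ** k for k in range(8)]   # 1, 2, 4, ..., 128
    print(skewness(data))                 # clearly positive (right skew)
    print(skewness(log_transform(data)))  # approximately 0
```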


TRANSFORMATION – power transformation

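$$x^{(\lambda)} = \begin{cases} x^{\lambda} & \text{for } \lambda > 0 \\ \ln x & \text{for } \lambda = 0 \\ -x^{\lambda} & \text{for } \lambda < 0 \end{cases}$$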
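A direct Python translation of the power transformation; the function name is hypothetical. The minus sign for λ < 0 keeps the transformation increasing, so the order of the observations is preserved.

```python
import math


def power_transform(x, lam):
    """Simple power transformation of a positive value x.

    x**lam for lam > 0, ln x for lam = 0, -x**lam for lam < 0.
    """
    if x <= 0:
        raise ValueError("power transformation requires x > 0")
    if lam > 0:
        return x ** lam
    if lam == 0:
        return math.log(x)
    return -(x ** lam)


if __name__ == "__main__":
    print([power_transform(x, -1) for x in (1.0, 2.0, 3.0)])  # increasing
```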


TRANSFORMATION – Box-Cox

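$$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \ln x & \text{for } \lambda = 0 \end{cases}$$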
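The Box-Cox formula as a small Python function (hypothetical name). Subtracting 1 and dividing by λ makes the transformation continuous in λ, since (x^λ − 1)/λ tends to ln x as λ → 0.

```python
import math


def box_cox(x, lam):
    """Box-Cox transformation of a positive value x.

    (x**lam - 1) / lam for lam != 0, ln x for lam = 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox transformation requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


if __name__ == "__main__":
    # Values for small lambda approach ln 5 = 1.609...
    print([box_cox(5.0, lam) for lam in (1.0, 0.5, 0.01, 0.0)])
```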


TRANSFORMATION – Box-Cox


TRANSFORMATION – estimate of optimal λ

[Figure: logarithm of the likelihood function for various values of λ, with the interval estimate of the parameter λ bounded by the level maxLF − 0.5·(χ² quantile); the value λ = 1.00 and the recommended λ are marked.]

λ = 1 is not included in the interval estimate of λ. This means that the transformation will probably be successful.
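The estimate of λ can be sketched in Python as a grid search over the profile log-likelihood used by Box and Cox, L(λ) = −(n/2) ln s²(λ) + (λ − 1) Σ ln xᵢ, where s²(λ) is the variance (divisor n) of the transformed data. The interval estimate collects all λ whose log-likelihood is at least maxLF − 0.5·χ²₁(1 − α); reading the slide's χ² quantile as having 1 degree of freedom is an assumption. The grid from −2 to 2 in steps of 0.01 and the function names are illustrative choices.

```python
import math
from statistics import NormalDist


def box_cox_loglik(data, lam):
    """Profile log-likelihood of the Box-Cox parameter, up to a constant."""
    n = len(data)
    if lam == 0:
        y = [math.log(x) for x in data]
    else:
        y = [(x ** lam - 1.0) / lam for x in data]
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(s2) + (lam - 1.0) * sum(math.log(x) for x in data)


def estimate_lambda(data, grid=None, alpha=0.05):
    """Return (lambda_hat, (lower, upper)) from a grid search.

    The interval contains every grid value whose log-likelihood is at least
    max - 0.5 * chi2_1(1 - alpha); assumes a unimodal likelihood.
    """
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # -2.00 ... 2.00
    values = [box_cox_loglik(data, lam) for lam in grid]
    best = max(range(len(grid)), key=values.__getitem__)
    # A chi-square quantile with 1 df equals a squared normal quantile.
    chi2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    cutoff = values[best] - 0.5 * chi2
    inside = [lam for lam, v in zip(grid, values) if v >= cutoff]
    return grid[best], (min(inside), max(inside))


if __name__ == "__main__":
    n = 50
    # Hypothetical log-normal-like data: exponentials of normal quantiles.
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    data = [math.exp(v) for v in z]
    print(estimate_lambda(data))  # lambda_hat near 0; 1 outside the interval
```

If λ = 1 falls outside the returned interval, the untransformed data are unlikely to be optimal, which corresponds to the slide's conclusion that the transformation will probably be successful.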