Summary Statistics Jake Blanchard Spring 2008 Uncertainty Analysis for Engineers 1


Page 1: Summary Statistics

Summary Statistics

Jake Blanchard

Spring 2008

Uncertainty Analysis for Engineers

Page 2: Summary Statistics

Summarizing and Interpreting Data

It is useful to have some metrics for summarizing statistical data (both input and output).

Three key characteristics are:
◦ Central tendency (mean, median, mode)
◦ Dispersion (variance)
◦ Shape (skewness, kurtosis)

Page 3: Summary Statistics

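Central Tendency

Mean = expected value, for discrete and continuous variables respectively:

$$E(x) = \sum_{i=1}^{n} x_i\, p(x_i), \qquad E(x) = \int_{-\infty}^{\infty} x\, f(x)\, dx$$

Median = point $z$ such that exactly half of the probability is associated with lower values and half with greater values:

$$\int_{-\infty}^{z} f(x)\, dx = 0.5$$

Mode = most likely value (maximum of the pdf)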

Page 4: Summary Statistics

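For One Die

Mean:

$$E(x) = \sum_{i=1}^{6} x_i\, p(x_i) = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = 3.5$$

Median:

$$x_{\text{median}} = 3.5$$

Mode: not unique, since all six faces are equally likely.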
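As a quick check of the single-die numbers, the following is a minimal sketch that evaluates the probability-weighted sum directly. The variable names (`x`, `p`, `meanDie`, `medianDie`) are illustrative and not taken from the slides.

```matlab
% Fair six-sided die: faces and their probabilities
x = 1:6;
p = ones(1, 6) / 6;

% Mean as the probability-weighted sum of the outcomes
meanDie = sum(x .* p);

% Median of the six equally likely faces (midpoint between 3 and 4)
medianDie = median(x);

fprintf('mean = %.2f, median = %.2f\n', meanDie, medianDie);
```

Because every face has the same probability, calling `mode(x)` here would simply return the smallest face, which is a tie-breaking convention rather than a meaningful "most likely" value.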

Page 5: Summary Statistics

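Radioactive Decay

For our example, with pdf $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$, the mean, median, and mode are given by

Mean:

$$E(t) = \int_{0}^{\infty} t\, f(t)\, dt = \int_{0}^{\infty} t\, \lambda e^{-\lambda t}\, dt = \frac{1}{\lambda}$$

Median:

$$\int_{0}^{z} \lambda e^{-\lambda t}\, dt = 0.5 \quad\Rightarrow\quad z = \frac{\ln(2)}{\lambda}$$

The mode is t = 0, where the pdf is largest.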
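The decay results can be checked numerically. This sketch assumes the exponential decay pdf f(t) = lambda*exp(-lambda*t) and picks an arbitrary decay constant `lambda = 2` purely for illustration; it uses MATLAB's `integral` function for the quadrature.

```matlab
lambda = 2;                              % assumed decay constant (illustrative only)
f = @(t) lambda .* exp(-lambda .* t);    % decay pdf, f(t) = lambda*exp(-lambda*t)

% Mean: numerically integrate t*f(t) over [0, Inf)
meanT = integral(@(t) t .* f(t), 0, Inf);

% Median: closed form z = ln(2)/lambda, then check that half
% of the probability lies below z
z = log(2) / lambda;
probBelow = integral(f, 0, z);

fprintf('mean = %.4f (1/lambda = %.4f)\n', meanT, 1 / lambda);
fprintf('P(t < %.4f) = %.4f\n', z, probBelow);
```

The mean exceeds the median because the long right tail of the distribution pulls the mean upward.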

Page 6: Summary Statistics

Other Characteristics

We can calculate the expected value of any function $h(x)$ of our random variable as

$$E[h(x)] = \int_{-\infty}^{\infty} h(x)\, f(x)\, dx \quad \text{(continuous)}$$

$$E[h(x)] = \sum_{i} h(x_i)\, p(x_i) \quad \text{(discrete)}$$
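To illustrate the discrete form, here is a small sketch that computes the expected value of h(x) = x^2 for a fair die. The choice of `h` is an assumption made for the example, not something from the slides.

```matlab
x = 1:6;                 % faces of a fair die
p = ones(1, 6) / 6;      % probability of each face
h = @(x) x.^2;           % illustrative function of the random variable

Eh = sum(h(x) .* p);     % E[h(x)] = sum of h(x_i) * p(x_i)
hOfMean = h(sum(x .* p)); % h evaluated at the mean, for comparison

fprintf('E[x^2] = %.4f, (E[x])^2 = %.4f\n', Eh, hOfMean);
```

The two printed values differ, which shows that in general E[h(x)] is not the same as h(E[x]).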

Page 7: Summary Statistics

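Some Results

For a constant $c$ and constants $b_j$:

$$E(c) = c$$

$$E(cx) = c\,E(x)$$

$$E\left(\sum_{j=1}^{n} x_j\right) = \sum_{j=1}^{n} E(x_j)$$

$$E\left(\sum_{j=1}^{n} b_j x_j\right) = \sum_{j=1}^{n} b_j\, E(x_j)$$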
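Linearity of the expected value also holds exactly for sample means, which makes it easy to check. In this sketch the data matrix `X` (one column per variable) and the weights `b` are made-up values.

```matlab
% Hypothetical data: 4 observations (rows) of 3 variables (columns)
X = [1 4 2;
     3 0 5;
     2 2 2;
     6 1 3];
b = [2; -1; 0.5];        % hypothetical weights b_j

lhs = mean(X * b);       % mean of the weighted sum, one sum per observation
rhs = mean(X) * b;       % weighted sum of the column means

fprintf('mean of sum = %.4f, sum of means = %.4f\n', lhs, rhs);
```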

Page 8: Summary Statistics

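Moments of Distributions

We can define many of these parameters in terms of moments of the distribution.

The first moment is the mean:

$$\mu_1 = E(x) = \int_{-\infty}^{\infty} x\, f(x)\, dx$$

Higher moments are taken about the mean:

$$\mu_k = \int_{-\infty}^{\infty} (x - \mu_1)^k f(x)\, dx \quad \text{(continuous)}$$

$$\mu_k = \sum_{i} (x_i - \mu_1)^k\, p(x_i) \quad \text{(discrete)}$$

The mean is the first moment. The variance is the second moment about the mean.

The third and fourth moments are related to skewness and kurtosis.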
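As a concrete case, the moments of a fair die can be computed directly from the discrete definition. This is a sketch; `centralMoment` is a helper name introduced here for illustration.

```matlab
x = 1:6;                 % faces of a fair die
p = ones(1, 6) / 6;      % probability of each face

mu1 = sum(x .* p);       % first moment (mean)

% k-th moment about the mean for this discrete distribution
centralMoment = @(k) sum((x - mu1).^k .* p);

mu2 = centralMoment(2);  % variance
mu3 = centralMoment(3);  % zero here, because the distribution is symmetric
mu4 = centralMoment(4);

fprintf('mu1 = %.4f, mu2 = %.4f, mu3 = %.4f, mu4 = %.4f\n', mu1, mu2, mu3, mu4);
```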

Page 9: Summary Statistics

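Spread (Variance)

Variance is a measure of spread or dispersion. With $\mu_1 = E(x)$ denoting the mean:

$$\sigma^2 = \mu_2 = E\left[(x - \mu_1)^2\right] = \int_{-\infty}^{\infty} (x - \mu_1)^2 f(x)\, dx$$

For discrete data sets, the biased variance is:

$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and the unbiased variance is

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The standard deviation is the square root of the variance.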
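The two sample-variance formulas differ only in the divisor. The sketch below computes both by hand on a made-up sample and compares them with the built-in `var`, whose second argument selects the normalization (`var(x,1)` divides by n, `var(x)` divides by n-1).

```matlab
x = [2 4 4 4 5 5 7 9];                 % hypothetical sample
n = numel(x);
xbar = mean(x);

s2Biased   = sum((x - xbar).^2) / n;        % divide by n
s2Unbiased = sum((x - xbar).^2) / (n - 1);  % divide by n-1

fprintf('biased:   %.4f (var(x,1) = %.4f)\n', s2Biased, var(x, 1));
fprintf('unbiased: %.4f (var(x)   = %.4f)\n', s2Unbiased, var(x));
fprintf('biased standard deviation: %.4f\n', sqrt(s2Biased));
```

For large n the two estimates are nearly identical; the distinction matters mostly for small samples.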

Page 10: Summary Statistics

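Skewness

Skewness is a measure of asymmetry. With $\mu_1 = E(x)$ denoting the mean:

$$\mu_3 = E\left[(x - \mu_1)^3\right] = \int_{-\infty}^{\infty} (x - \mu_1)^3 f(x)\, dx$$

For discrete data sets, the biased skewness is related to:

$$m_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3$$

The skewness is often defined as

$$\gamma_1 = \frac{\mu_3}{\sigma^3}$$

where $\sigma$ is the standard deviation.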
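A sample skewness can be computed from the biased second and third moments without any toolbox. This sketch uses a made-up sample with one large value, so the result should come out positive (a longer right tail).

```matlab
x = [1 2 3 10];                    % hypothetical sample with a long right tail
n = numel(x);
xbar = mean(x);

m2 = sum((x - xbar).^2) / n;       % biased second moment (variance)
m3 = sum((x - xbar).^3) / n;       % biased third moment

g1 = m3 / m2^(3/2);                % skewness: third moment over sigma cubed

fprintf('m3 = %.4f, skewness = %.4f\n', m3, g1);
```

If the Statistics and Machine Learning Toolbox is installed, its `skewness` function should give a comparable value with its default settings, though that is worth confirming against the toolbox documentation.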

Page 11: Summary Statistics


Skewness

Page 12: Summary Statistics

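Kurtosis

Kurtosis is a measure of peakedness. With $\mu_1 = E(x)$ denoting the mean:

$$\mu_4 = E\left[(x - \mu_1)^4\right] = \int_{-\infty}^{\infty} (x - \mu_1)^4 f(x)\, dx$$

For discrete data sets, the biased kurtosis is related to:

$$m_4 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4$$

The kurtosis is often defined as

$$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$$

where $\sigma$ is the standard deviation.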
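The same approach works for kurtosis. This sketch computes the biased fourth moment of a made-up sample and then the kurtosis with 3 subtracted, so that a normal distribution would score 0.

```matlab
x = [1 2 3 10];                    % hypothetical sample
n = numel(x);
xbar = mean(x);

m2 = sum((x - xbar).^2) / n;       % biased second moment (variance)
m4 = sum((x - xbar).^4) / n;       % biased fourth moment

g2 = m4 / m2^2 - 3;                % kurtosis relative to a normal distribution

fprintf('m4 = %.4f, kurtosis (minus 3) = %.4f\n', m4, g2);
```

Note that some library routines, including the toolbox `kurtosis` function as far as I am aware, report the ratio without subtracting 3, so it is worth checking which convention is in use before comparing numbers.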

Page 13: Summary Statistics

Kurtosis

[Figure: pdf of the Pearson type VII distribution with excess kurtosis of infinity (red), 2 (blue), and 0 (black)]

Page 14: Summary Statistics

Using Matlab

Sample data is the length of time a person was able to hold their breath (40 attempts).

Try a scatter plot:

    load RobPracticeHolds;
    y = ones(size(breathholds));   % put every point at the same height
    h1 = figure('Position',[100 100 400 100],'Color','w');
    scatter(breathholds,y);

Page 15: Summary Statistics

Adding Information

    % Report the mean and median, then mark them on the scatter plot
    disp(['The mean is ',num2str(mean(breathholds)),' seconds (green line).']);
    disp(['The median is ',num2str(median(breathholds)),' seconds (red line).']);
    hold all;
    line([mean(breathholds) mean(breathholds)],[0.5 1.5],'color','g');
    line([median(breathholds) median(breathholds)],[0.5 1.5],'color','r');

Page 16: Summary Statistics

Box Plot

    % Label the scatter plot
    title('Scatter with Min, 25%iqr, Median, Mean, 75%iqr, & Max lines');
    xlabel('');
    % Draw a horizontal box plot of the same data in a new figure
    h3 = figure('Position',[100 100 400 100],'Color','w');
    boxplot(breathholds,'orientation','horizontal','widths',.5);
    set(gca,'XLim',[40 140]);
    title('A Boxplot of the same data');
    xlabel('');
    set(gca,'Yticklabel',[]);
    ylabel('');

Page 17: Summary Statistics

Box Plot

[Figure: box plot of the breath-hold data with the Min, Max, Median, and an Outlier labeled. The box represents the inter-quartile range (half of the data).]

Page 18: Summary Statistics

Empirical cdf

    h3 = figure('Position',[100 100 600 400],'Color','w');
    cdfplot(breathholds);
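The empirical cdf can also be built by hand, which shows what the plot represents: after sorting, the k-th smallest value is assigned a cumulative fraction k/n. The sample below is made up for illustration, since the `breathholds` data file is not reproduced here.

```matlab
data = [62 75 58 90 75 81 66 104];   % hypothetical hold times in seconds

xs = sort(data);                     % sorted observations
n  = numel(xs);
F  = (1:n) / n;                      % cumulative fraction at each sorted value

% Fraction of observations at or below 75 seconds
frac75 = sum(data <= 75) / n;

fprintf('Fraction of holds <= 75 s: %.3f\n', frac75);
```

Calling `stairs(xs, F)` on these vectors draws a step plot of the same kind of curve, using only base plotting functions.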

Page 19: Summary Statistics

Multivariate Data Sets

When there are multiple input variables, we need some additional ways to characterize the data.

$$E[h(x,y)] = \int\!\!\int h(x,y)\, f(x,y)\, dx\, dy \quad \text{(continuous)}$$

$$E[h(x,y)] = \sum_{i} \sum_{j} h(x_i, y_j)\, p(x_i, y_j) \quad \text{(discrete)}$$

$$Cov(x,y) = E(xy) - E(x)\,E(y)$$

If x and y are independent, then Cov(x,y)=0
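For sample data, the covariance can be computed either from deviations about the means or from the shortcut E(xy) - E(x)E(y). The sketch below checks that the two agree on made-up paired samples; both use the biased (divide by n) form.

```matlab
x = [1 2 3 4 5];                     % hypothetical input samples
y = [2 1 4 3 6];                     % hypothetical output samples

% Definition: average product of deviations from the means
covDef = mean((x - mean(x)) .* (y - mean(y)));

% Shortcut: E(xy) - E(x)E(y)
covShort = mean(x .* y) - mean(x) * mean(y);

fprintf('cov (definition) = %.4f, cov (shortcut) = %.4f\n', covDef, covShort);
```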

Page 20: Summary Statistics

Correlation Coefficients

Two random variables may be related.

Define the correlation coefficient of input (x) and output (y) as

$$\rho_{x,y} = \frac{Cov(x,y)}{\sigma_x\, \sigma_y} = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2\, \sum_{k=1}^{m} (y_k - \bar{y})^2}}$$

◦ $\rho = 1$ implies linear dependence, positive slope
◦ $\rho = 0$ implies no linear dependence
◦ $\rho = -1$ implies linear dependence, negative slope
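The sample form of the correlation coefficient is short enough to compute by hand and compare with the built-in `corrcoef`, which returns a 2-by-2 matrix whose off-diagonal entry is the coefficient. The paired samples here are made up for illustration.

```matlab
x = [1 2 3 4 5];                     % hypothetical input samples
y = [2 1 4 3 6];                     % hypothetical output samples

dx = x - mean(x);                    % deviations from the means
dy = y - mean(y);

% Sum of cross products over the root of the product of squared sums
rho = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

R = corrcoef(x, y);                  % built-in; R(1,2) is the coefficient

fprintf('manual rho = %.4f, corrcoef = %.4f\n', rho, R(1, 2));
```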

Page 21: Summary Statistics

Example

[Figure: four scatter plots, labeled with correlation coefficients $\rho = 0.98$, $\rho = -0.38$, $\rho = 1$, and $\rho = -0.98$]

Page 22: Summary Statistics

Example

    x=rand(25,1)-0.5;
    % Perfect positive linear dependence
    y=x;
    corrcoef(x,y)
    subplot(2,2,1), plot(x,y,'o')
    % Positive slope with added noise
    y2=x+0.2*rand(25,1);
    corrcoef(x,y2)
    subplot(2,2,2), plot(x,y2,'o')
    % Negative slope with added noise
    y3=-x+0.2*rand(25,1);
    corrcoef(x,y3)
    subplot(2,2,3), plot(x,y3,'o')
    % Independent random values
    y4=rand(25,1)-0.5;
    corrcoef(x,y4)
    subplot(2,2,4), plot(x,y4,'o')