Summary Statistics
Jake Blanchard
Spring 2008
Uncertainty Analysis for Engineers


Summarizing and Interpreting Data

It is useful to have some metrics for summarizing statistical data (both input and output).

Three key characteristics are:
◦ Central tendency (mean, median, mode)
◦ Dispersion (variance)
◦ Shape (skewness, kurtosis)

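Central Tendency

Mean = expected value:

$$E(x) = \sum_{i=1}^{n} x_i\, p(x_i) \quad \text{(discrete)} \qquad E(x) = \int_{-\infty}^{\infty} x\, f(x)\, dx \quad \text{(continuous)}$$

Median = point z such that exactly half of the probability is associated with lower values and half with greater values:

$$\int_{-\infty}^{z} f(x)\, dx = 0.5$$

Mode = most likely value (maximum of the pdf)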

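For One Die

$$\text{mean: } E(x) = \sum_{i=1}^{6} x_i\, p(x_i) = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = 3.5$$

$$x_{\text{median}} = 3.5$$

The slide also lists $x_{\text{mode}} = 3.5$, but since every face has the same probability 1/6, a fair die has no unique mode.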
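The die calculation can be checked numerically. The following is a minimal MATLAB sketch, assuming a fair six-sided die; the variable names `faces`, `p`, `mu`, and `med` are illustrative and do not come from the slides.

```matlab
% Fair six-sided die: face values and their probabilities (fairness is assumed)
faces = 1:6;
p = ones(1,6) / 6;

% Mean: E(x) = sum of x_i * p(x_i)
mu = sum(faces .* p);

% Median: a point with half of the probability on each side.
% For a fair die any value between 3 and 4 qualifies; the midpoint is conventional.
k = find(cumsum(p) >= 0.5 - 1e-12, 1);   % first face where the cdf reaches 0.5 (with round-off tolerance)
med = (faces(k) + faces(k+1)) / 2;

fprintf('mean = %.2f, median = %.2f\n', mu, med);
```

The small tolerance in the `find` call guards against floating-point round-off when the cumulative sum lands on 0.5.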

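Radioactive Decay

For our example, with the exponential pdf $f(t) = \lambda e^{-\lambda t}$, the mean, median, and mode are given by

$$\text{mean: } E(t) = \int_0^{\infty} t\, f(t)\, dt = \int_0^{\infty} t\, \lambda e^{-\lambda t}\, dt = \frac{1}{\lambda}$$

$$\text{median: } \int_0^{z} \lambda e^{-\lambda t}\, dt = 0.5 \;\Rightarrow\; z = \frac{\ln(2)}{\lambda}$$

The mode is t = 0.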
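These closed-form results can be sanity-checked by numerical integration. The sketch below assumes the exponential pdf f(t) = lambda*exp(-lambda*t) and a hypothetical decay constant `lambda = 2`; neither the value nor the variable names come from the slides.

```matlab
lambda = 2;                              % hypothetical decay constant (1/s)
f = @(t) lambda * exp(-lambda * t);      % assumed exponential pdf

% Mean: integrate t*f(t) from 0 to infinity; should be close to 1/lambda
mean_t = integral(@(t) t .* f(t), 0, Inf);

% Median: find z such that the integral of f from 0 to z equals 0.5;
% should be close to log(2)/lambda
median_t = fzero(@(z) integral(f, 0, z) - 0.5, [0, 10/lambda]);

fprintf('mean   = %.4f (1/lambda     = %.4f)\n', mean_t, 1/lambda);
fprintf('median = %.4f (ln(2)/lambda = %.4f)\n', median_t, log(2)/lambda);
```

The median is smaller than the mean here because the long right tail of the distribution pulls the mean upward.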

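Other Characteristics

We can calculate the expected value of any function of our random variable as

$$E[h(x)] = \int_{-\infty}^{\infty} h(x)\, f(x)\, dx \quad \text{(continuous)}$$

$$E[h(x)] = \sum_{i} h(x_i)\, p(x_i) \quad \text{(discrete)}$$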
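As a small illustration of the discrete form, the sketch below evaluates E[h(x)] for h(x) = x^2 on a fair die. The choice of h and the variable names are assumptions made for the example.

```matlab
faces = 1:6;                 % fair die (assumed)
p = ones(1,6) / 6;
h = @(x) x.^2;               % illustrative function of the random variable

Eh = sum(h(faces) .* p);     % E[h(x)] = sum of h(x_i) * p(x_i)
Ex = sum(faces .* p);        % E(x), for comparison

fprintf('E(x^2) = %.4f, [E(x)]^2 = %.4f\n', Eh, Ex^2);
```

Note that E(x^2) = 91/6 differs from [E(x)]^2 = 12.25; in general the expected value of a function is not the function of the expected value.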

Some Results

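$$E(c) = c$$

$$E(cx) = c\,E(x)$$

$$E\left(\sum_{j=1}^{n} x_j\right) = \sum_{j=1}^{n} E(x_j)$$

$$E\left(\sum_{j=1}^{n} b_j x_j\right) = \sum_{j=1}^{n} b_j\, E(x_j)$$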
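A short worked example of these rules, using the fair die with E(x) = 3.5. The function 2x + 3 is chosen here purely for illustration:

```latex
E(2x + 3) = E(2x) + E(3) = 2\,E(x) + 3 = 2(3.5) + 3 = 10
```

The same value follows from the direct sum $\sum_{i=1}^{6} (2x_i + 3)\,\tfrac{1}{6} = \tfrac{60}{6} = 10$.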

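Moments of Distributions

We can define many of these parameters in terms of moments of the distribution:

$$\mu_1 = E(x) = \int x\, f(x)\, dx$$

$$\mu_k = E\left[(x - \mu_1)^k\right] = \int (x - \mu_1)^k f(x)\, dx \quad \text{(continuous)}$$

$$\mu_k = \sum_{i} (x_i - \mu_1)^k\, p(x_i) \quad \text{(discrete)}$$

The mean is the first moment. The variance is the second moment about the mean. The third and fourth moments are related to skewness and kurtosis.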
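As a worked instance, the second moment about the mean for the fair die (mean 3.5) can be computed term by term:

```latex
\mu_2 = \sum_{i=1}^{6} (x_i - 3.5)^2 \cdot \tfrac{1}{6}
      = \tfrac{1}{6}\left(6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25\right)
      = \tfrac{17.5}{6} = \tfrac{35}{12} \approx 2.92
```

The third moment about the mean is zero for the die, because the deviations are symmetric about 3.5 and their cubes cancel in pairs.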

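Spread (Variance)

Variance is a measure of spread or dispersion:

$$\sigma^2 = \mu_2 = E\left[(x - \mu_1)^2\right] = \int (x - \mu_1)^2 f(x)\, dx$$

For discrete data sets, the biased variance is:

$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and the unbiased variance is

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The standard deviation is the square root of the variance.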
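The two sample formulas differ only in the divisor. The sketch below computes both by hand on a small hypothetical data set and compares them with MATLAB's built-in `var`, where `var(x,1)` divides by n and `var(x)` divides by n-1. The data values are made up for illustration.

```matlab
x = [2 4 4 4 5 5 7 9];                        % hypothetical sample
n = numel(x);
xbar = mean(x);

s2_biased   = sum((x - xbar).^2) / n;         % divide by n
s2_unbiased = sum((x - xbar).^2) / (n - 1);   % divide by n-1

fprintf('biased   = %.4f (var(x,1) = %.4f)\n', s2_biased,   var(x, 1));
fprintf('unbiased = %.4f (var(x)   = %.4f)\n', s2_unbiased, var(x));
fprintf('std (biased) = %.4f\n', sqrt(s2_biased));
```

For large n the two estimates are nearly identical; the distinction matters mostly for small samples.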

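Skewness

Skewness is a measure of asymmetry:

$$\mu_3 = E\left[(x - \mu_1)^3\right] = \int (x - \mu_1)^3 f(x)\, dx$$

For discrete data sets, the biased skewness is related to:

$$m_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3$$

The skewness is often defined as

$$\gamma_1 = \frac{\mu_3}{\sigma^3}$$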
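A minimal sketch of the sample calculation, using only base MATLAB and a hypothetical data set with a long right tail. The sample version of the skewness is taken here as m3 divided by the biased standard deviation cubed, which is one common convention and an assumption of this example.

```matlab
x = [1 2 3 4 10];                 % hypothetical sample with a long right tail
n = numel(x);
xbar = mean(x);

m2 = sum((x - xbar).^2) / n;      % biased variance
m3 = sum((x - xbar).^3) / n;      % biased third central moment
g1 = m3 / m2^(3/2);               % sample skewness, m3 / s^3

fprintf('m3 = %.2f, skewness = %.4f\n', m3, g1);
```

The positive result reflects the single large value on the right; mirroring the data about its mean would flip the sign.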

Skewness

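Kurtosis

Kurtosis is a measure of peakedness:

$$\mu_4 = E\left[(x - \mu_1)^4\right] = \int (x - \mu_1)^4 f(x)\, dx$$

For discrete data sets, the biased kurtosis is related to:

$$m_4 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4$$

The kurtosis is often defined as

$$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$$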
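The same approach works for the fourth moment. This sketch uses base MATLAB and a hypothetical, evenly spaced data set; the sample value is taken as m4 divided by the biased variance squared, minus 3, which mirrors the definition above but is an assumption about the sample convention.

```matlab
x = [-2 -1 0 1 2];                % hypothetical, evenly spaced sample
n = numel(x);
xbar = mean(x);

m2 = sum((x - xbar).^2) / n;      % biased variance
m4 = sum((x - xbar).^4) / n;      % biased fourth central moment
g2 = m4 / m2^2 - 3;               % kurtosis with the -3 offset

fprintf('m4 = %.2f, kurtosis (minus 3) = %.2f\n', m4, g2);
```

The negative value indicates a flatter shape than a normal distribution, for which this measure is zero. Note that the `kurtosis` function in the Statistics and Machine Learning Toolbox does not subtract 3, so its result for normal data is near 3 rather than 0.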

Kurtosis

[Figure: pdf of the Pearson type VII distribution with excess kurtosis of infinity (red), 2 (blue), and 0 (black)]

Using Matlab

The sample data are the lengths of time a person was able to hold their breath (40 attempts).

Try a scatter plot:

load RobPracticeHolds;
y = ones(size(breathholds));
h1 = figure('Position',[100 100 400 100],'Color','w');
scatter(breathholds,y);

Adding Information

disp(['The mean is ',num2str(mean(breathholds)),' seconds (green line).']);

disp(['The median is ',num2str(median(breathholds)),' seconds (red line).']);

hold all;

line([mean(breathholds) mean(breathholds)],[0.5 1.5],'color','g');

line([median(breathholds) median(breathholds)],[0.5 1.5],'color','r');

Box Plot

title('Scatter with Min, 25%iqr, Median, Mean, 75%iqr, & Max lines');

xlabel('');

h3 = figure('Position',[100 100 400 100],'Color','w'); boxplot(breathholds,'orientation','horizontal','widths',.5);

set(gca,'XLim',[40 140]);

title('A Boxplot of the same data'); xlabel(''); set(gca,'Yticklabel',[]); ylabel('');

Box Plot

[Figure: box plot of the breath-hold data with the Min, Max, Median, and Outlier marked. The box represents the inter-quartile range (half of the data).]

Empirical cdf

h3 = figure('Position',[100 100 600 400],'Color','w');

cdfplot(breathholds);
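`cdfplot` requires the Statistics Toolbox. The empirical cdf itself is simple to build with base MATLAB, as the sketch below suggests. The data values are hypothetical stand-ins (the RobPracticeHolds file is not reproduced here), and the names `t`, `xs`, `F`, and `ecdf_at` are illustrative.

```matlab
t = [62 75 58 90 75 101 66 83];   % hypothetical hold times in seconds
n = numel(t);

xs = sort(t);                     % sorted sample values
F  = (1:n) / n;                   % F(k) = fraction of samples at or below the k-th sorted value

% Evaluate the empirical cdf at an arbitrary point q
ecdf_at = @(q) sum(t <= q) / n;
frac75 = ecdf_at(75);

fprintf('Fraction of holds at or below 75 s: %.3f\n', frac75);
```

Calling `stairs(xs, F)` on these vectors would draw a step plot similar to what `cdfplot` produces.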

Multivariate Data Sets

When there are multiple input variables, we need some additional ways to characterize the data:

$$E[h(x,y)] = \iint h(x,y)\, f(x,y)\, dx\, dy \quad \text{(continuous)}$$

$$E[h(x,y)] = \sum_{i} \sum_{j} h(x_i, y_j)\, p(x_i, y_j) \quad \text{(discrete)}$$

$$\mathrm{Cov}(x,y) = E(xy) - E(x)\,E(y)$$

If x and y are independent, then Cov(x,y) = 0.
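For sample data, the covariance identity can be checked directly. The sketch below uses two short hypothetical vectors and treats each sample as equally likely, so the result corresponds to the biased (divide by n) covariance; data and names are assumptions for illustration.

```matlab
x = [1 2 3 4 5];                  % hypothetical input samples
y = [2 4 5 4 5];                  % hypothetical output samples

% Cov(x,y) = E(xy) - E(x)E(y), with each sample weighted 1/n
cov_identity = mean(x .* y) - mean(x) * mean(y);

% Equivalent definition: mean product of deviations from the means
cov_direct = mean((x - mean(x)) .* (y - mean(y)));

fprintf('Cov via identity = %.4f, via deviations = %.4f\n', cov_identity, cov_direct);
```

A positive covariance indicates that y tends to increase with x in this sample.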

Correlation Coefficients

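Two random variables may be related. Define the correlation coefficient of input (x) and output (y) as

$$\rho_{x,y} = \frac{\mathrm{Cov}(x,y)}{\sigma_x\, \sigma_y} = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^2\; \sum_{k=1}^{m} (y_k - \bar{y})^2}}$$

ρ = 1 implies linear dependence, positive slope

ρ = 0 implies no linear dependence

ρ = -1 implies linear dependence, negative slope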
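The summation form can be evaluated by hand and compared with MATLAB's built-in `corrcoef`, which returns a 2-by-2 matrix whose off-diagonal entry is the coefficient. The data below are hypothetical.

```matlab
x = [1 2 3 4 5];                  % hypothetical input samples
y = [2 4 5 4 5];                  % hypothetical output samples

dx = x - mean(x);                 % deviations from the means
dy = y - mean(y);

rho = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

R = corrcoef(x, y);               % built-in; R(1,2) is the x-y coefficient
fprintf('rho (by hand) = %.4f, corrcoef = %.4f\n', rho, R(1,2));
```

Because the same normalization appears in the numerator and denominator, the coefficient does not depend on whether the biased or unbiased variance is used.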

Example

[Figure: four scatter plots, labeled ρ = 0.98, ρ = -0.38, ρ = 1, and ρ = -0.98]

Example

x=rand(25,1)-0.5;
y=x;
corrcoef(x,y)
subplot(2,2,1), plot(x,y,'o')
y2=x+0.2*rand(25,1);
corrcoef(x,y2)
subplot(2,2,2), plot(x,y2,'o')
y3=-x+0.2*rand(25,1);
corrcoef(x,y3)
subplot(2,2,3), plot(x,y3,'o')
y4=rand(25,1)-0.5;
corrcoef(x,y4)
subplot(2,2,4), plot(x,y4,'o')