Random Variable                 Probability Distribution

X = Amt in next bottle          N(μ=10.2, σ=0.16)
X = total of 2 tossed dice      [chart over the totals 2 3 4 5 6 7 8 9 10 11 12]
X = #G in 4                     B(n=4, p=.5)

Summary Characteristics

• Mean, Median, Mode
• Std dev, Variance
• Skew

We must admit that we cannot know exactly what value X will take, so that we can do the intelligent thing and talk about something we CAN know: the probability distribution of X.

There are summary characteristics of any probability distribution. But knowing these summary measures does not replace our need to know the probability distribution.
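As a rough illustration of the slide above, the sketch below builds two of its probability distributions exactly (the total of 2 tossed dice and B(n=4, p=.5)) and reads a few summary characteristics off them. The helper names `mean`, `variance` and `mode` are illustrative choices, not part of the course material, and fair dice are assumed.

```python
from fractions import Fraction
from itertools import product
from math import comb

# X = total of 2 tossed dice: count the 36 equally likely outcomes (fair dice assumed).
dice_pmf = {}
for a, b in product(range(1, 7), repeat=2):
    dice_pmf[a + b] = dice_pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

# X = #G in 4, modeled as B(n=4, p=.5).
n, p = 4, Fraction(1, 2)
girls_pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def mean(pmf):
    """Probability-weighted average of the values."""
    return sum(x * px for x, px in pmf.items())

def variance(pmf):
    """Expected squared distance from the mean."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * px for x, px in pmf.items())

def mode(pmf):
    """Most likely value."""
    return max(pmf, key=pmf.get)

print(mean(dice_pmf), variance(dice_pmf), mode(dice_pmf))     # 7 35/6 7
print(mean(girls_pmf), variance(girls_pmf), mode(girls_pmf))  # 2 1 2
```

The two distributions differ in mean and variance, yet even a matching set of summary numbers would not say how the probability is spread over the individual values, which is the slide's point.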

Class 06: Descriptive Statistics

EMBS: 3.1, 3.2, first part of 3.3

Characteristics of probability distributions

• Measures of Location
  – Mean
  – Median
  – Mode
• Measures of Variability
  – Standard Deviation
  – Variance
• Measure of Shape
  – Skewness

Descriptive Statistics (for numerical data)

• Measures of Location
  – Sample Mean
  – Sample Median
  – Sample Mode
• Measures of Variability
  – Sample StDev
  – Sample Variance
• Measure of Shape
  – Sample skewness

A positively-skewed pdf

• Mode is the most likely value
• P(X<median) = P(X>median) = 0.5
• Mean is the probability-weighted average
• Skewness > 0

http://dept.econ.yorku.ca/~jbsmith/ec2500_1998/lecture9/Lecture9.html

A negatively-skewed pdf

Skewness < 0

An exhibit at MOMA invites visitors to mark their heights on a wall. A normal distribution results:

Well, not quite. The distribution is actually slightly negatively skewed by the confounding presence of children, who are obviously shorter than adults - you can see this in the great number of names well below the central band which are not mirrored by names higher up. Rest assured, however, that the ex-children distribution is itself Gaussian.

http://www.thisisthegreenroom.com/2009/bell-curves-in-action/

The Normal pdf

http://www.comfsm.fm/~dleeling/statistics/notes06.html

Mean = μ, median = μ, mode = μ

Skewness = 0

Measures of Variability

http://www.comfsm.fm/~dleeling/statistics/normal_curve_diff_sx.gif

σ = 0.7, σ = 1.0, σ = 1.5

Skewed pdfs can also have different standard deviations

Which pdf has the largest σ?

Pdfs can have different means but identical standard deviations

Which pdf has the largest σ?

Which pdf has the largest μ?

Characteristics of probability distributions: what each one means

• Measures of Location
  – Mean: probability-weighted average
  – Median: 50% point
  – Mode: most likely value
• Measures of Variability
  – Standard Deviation: square root of the variance
  – Variance: expected squared distance from the mean
• Measure of Shape
  – Skewness: negative if skewed left, 0 if symmetric, positive if skewed right

Descriptive Statistics (for numerical data) are the sample counterparts: Sample Mean, Sample Median, Sample Mode, Sample StDev, Sample Variance, Sample skewness.
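The verbal definitions above can be written out as formulas. The block below is a sketch for a discrete random variable X with probabilities p(x); the standardized third moment is used for skewness, which is the usual textbook convention and an assumption here, since the slides give only the sign interpretation.

```latex
\begin{align*}
\mu &= E[X] = \sum_x x\,p(x) \\
\sigma^2 &= E\big[(X-\mu)^2\big] = \sum_x (x-\mu)^2\,p(x), \qquad \sigma = \sqrt{\sigma^2} \\
\text{skewness} &= \frac{E\big[(X-\mu)^3\big]}{\sigma^3}
\end{align*}
```

For example, with X = #G in 4 modeled as B(n=4, p=.5), the probabilities are 1/16, 4/16, 6/16, 4/16, 1/16 for x = 0, 1, 2, 3, 4, so μ = (0·1 + 1·4 + 2·6 + 3·4 + 4·1)/16 = 2, and the skewness is 0 because the distribution is symmetric about 2.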

Excel functions for the descriptive statistics

• Sample Mean: =average()
• Sample Median: =median()
• Sample Mode: =mode()
• Sample StDev: =stdev()
• Sample Variance: =var()
• Sample skewness: =skew()
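For readers working outside Excel, the sketch below shows rough Python counterparts of these six functions using the standard `statistics` module. The `describe` and `sample_skewness` helpers and the five-value data set are made up for illustration; the skewness formula is the adjusted sample skewness that Excel documents for SKEW.

```python
import statistics

def sample_skewness(data):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in data)

def describe(data):
    """Collect the sample statistics listed on the slide (hypothetical helper)."""
    return {
        "mean": statistics.mean(data),          # =average()
        "median": statistics.median(data),      # =median()
        "modes": statistics.multimode(data),    # =mode() gives a single value, or #N/A if nothing repeats
        "stdev": statistics.stdev(data),        # =stdev(), divides by n - 1
        "variance": statistics.variance(data),  # =var(), divides by n - 1
        "skewness": sample_skewness(data),      # =skew()
    }

sample = [2, 3, 3, 5, 12]  # made-up data with a long right tail
summary = describe(sample)
for name, value in summary.items():
    print(name, value)
```

One difference to keep in mind: `statistics.multimode` returns every most-common value as a list, so data with no repeats yields the whole list rather than an error value like #N/A.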

Get them all using Data Analysis, Descriptive Statistics, Summary Statistics.

Also in the Summary Statistics output:

• RANGE
• COUNT

The sample standard deviation

Understanding sample standard deviation

Three groups of data sets, with the sample standard deviation of each column:

          0     10
          4     14
         10     20
         16     26
         20     30
stdev  8.25   8.25

          0      0      0
          4      6      8
         10     10     10
         16     14     12
         20     20     20
stdev  8.25   7.62   7.21

          0      0
          2     10
          4     16
         10     18
         20     20
stdev  8.07   8.07

• It measures variability about the mean.
• All the data contribute to the measure.
• It measures variability in either direction.

[Dot plot: two rows of five X marks above a number line from 0 to 20]
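The numbers on this slide can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and `statistics.stdev` is used because, like Excel's =stdev(), it divides by n - 1.

```python
from statistics import stdev

base = [0, 4, 10, 16, 20]

# Adding 10 to every value moves the mean but not the spread about the mean.
shifted = [x + 10 for x in base]

# Pulling only the inner values toward the center still lowers the measure:
# every data point contributes.
tighter = [0, 6, 10, 14, 20]
tightest = [0, 8, 10, 12, 20]

# Mirroring a skewed data set (20 - x) leaves the measure unchanged:
# deviations count the same in either direction.
skewed = [0, 2, 4, 10, 20]
mirrored = sorted(20 - x for x in skewed)

for data in (base, shifted, tighter, tightest, skewed, mirrored):
    print(data, round(stdev(data), 2))
```

The printed values match the slide: 8.25 and 8.25, then 7.62 and 7.21, then 8.07 and 8.07.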

Our Data

Section  ND ID      Gender (M=1)  HS Stat?  Ht  Value
4        901526453  0             0         67  0
4        901533561  0             1         63  0.06
4        901536075  1             0         70  0
.        .          .             .         .   .
.        .          .             .         .   .
.        .          .             .         .   .
5        901636399  1             0         76  0
5        901643915  1             0         72  0.1
5        901643995  0             0         64  0

Data / Data Analysis / Descriptive Statistics / Summary Statistics

Statistic            Section         ND ID  Gender (M=1)  HS Stat?        Ht   Value
Mean                   4.493   901589800.3         0.609     0.217    69.351   0.185
Standard Error         0.061      4992.821         0.059     0.050     0.477   0.056
Median                     4     901596170             1         0        70       0
Mode                       4          #N/A             1         0        71       0
Standard Deviation     0.504     41473.487         0.492     0.415     3.959   0.465
Sample Variance        0.254    1720050147         0.242     0.173    15.673   0.216
Kurtosis              -2.060         0.555        -1.847    -0.039    -0.793   6.385
Skewness               0.030        -0.581        -0.455     1.401    -0.307   2.706
Range                      1        228090             1         1        16       2
Minimum                    4     901444465             0         0        60       0
Maximum                    5     901672555             1         1        76       2
Sum                      310   62209696222            42        15   4785.25   12.75
Count                     69            69            69        69        69      69

Fill Test Data: Normal(10.2, 0.16)?

EXHIBIT 2
LOREX PHARMACEUTICALS

Filling Line Test Results with Target = 10.2

 9.89  10.41  10.53  10.20  10.23  10.15
10.17  10.17  10.32  10.04  10.48  10.11
10.29  10.35  10.16  10.16  10.17  10.19
10.00  10.06  10.21  10.22   9.76  10.22
10.04  10.19  10.09  10.12  10.06  10.10
10.35  10.17  10.02  10.36  10.17   9.99
10.05  10.07  10.32  10.24  10.04  10.40
10.19  10.27  10.14  10.07  10.41  10.76
10.21  10.13  10.11  10.40  10.27  10.20
 9.79  10.24  10.20  10.29  10.00  10.31

10.53  10.14  10.35  10.21  10.23  10.16
10.47   9.84   9.96  10.10  10.11  10.23
10.24  10.36  10.30  10.23  10.19  10.17
10.17  10.11  10.33  10.19   9.97  10.00
10.15  10.42  10.36  10.19  10.05  10.11
10.06  10.16  10.17  10.29  10.12  10.30
10.13  10.21  10.15  10.25  10.33  10.64
10.04  10.01  10.14  10.18  10.18  10.10
10.20  10.25  10.07  10.42  10.54  10.23
10.37  10.44  10.37   9.85   9.91  10.45
10.24  10.44  10.40  10.45  10.28  10.17
10.03  10.44  10.25  10.37  10.23  10.19
10.01  10.13  10.24  10.22   9.98   9.98
10.20  10.29  10.03  10.19   9.99  10.13

Fill Test Data: Descriptive Statistics, Summary Statistics

Amount

Mean                  10.198
Standard Error         0.014
Median                10.190
Mode                    #N/A
Standard Deviation     0.163
Sample Variance        0.026
Kurtosis               0.771
Skewness               0.245
Range                  0.997
Minimum                9.758
Maximum               10.756
Sum                 1468.542
Count                    144
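Several rows of this output are tied together by simple arithmetic, which makes a handy sanity check. The sketch below recomputes the mean and the standard error from the other reported rows; it assumes the standard definitions (mean = sum / count, standard error = standard deviation / √count) and works from the rounded values shown, so agreement is only to about three decimals.

```python
from math import sqrt

# Values copied from the Summary Statistics output for Amount.
count = 144
total = 1468.542
std_dev = 0.163

mean = total / count               # should reproduce the reported Mean
std_error = std_dev / sqrt(count)  # should reproduce the reported Standard Error

print(round(mean, 3))       # 10.198
print(round(std_error, 3))  # 0.014
```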

Fill Test Data: Histogram

Bin      Frequency
 9.758    1
 9.841    2
 9.925    3
10.008   10
10.091   17
10.174   33
10.257   36
10.340   14
10.423   16
10.506    7
10.590    3
10.673    1
More      1

[Histogram chart: Frequency (axis 0 to 40) by Bin (9.758 through 10.673, then More)]

1 data point was at or below 9.758 (the minimum itself)

2 data points were above 9.758 and at or below 9.841

1 was above 10.673

Data / Data Analysis / Histogram

Check chart output
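To make the bin convention concrete: Excel's Histogram tool treats each bin value as an inclusive upper edge, counting a value in the first bin it is less than or equal to, with anything above the last bin going to "More". The sketch below imitates that rule; `excel_style_histogram` is a hypothetical helper, the data are the first row of the fill test exhibit, and the three bins are made up for illustration rather than taken from the slide.

```python
from bisect import bisect_left

def excel_style_histogram(data, bins):
    """Count values per bin, treating each bin value as an inclusive upper edge.

    Returns len(bins) + 1 counts; the last count is the "More" row.
    """
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for value in data:
        # Index of the first edge that is >= value; len(edges) means "More".
        counts[bisect_left(edges, value)] += 1
    return counts

sample = [9.89, 10.41, 10.53, 10.20, 10.23, 10.15]  # first row of the exhibit
bins = [10.0, 10.2, 10.4]                            # illustrative bins (assumption)
counts = excel_style_histogram(sample, bins)

for label, count in zip([*bins, "More"], counts):
    print(label, count)
```

Here 10.20 lands in the 10.2 bin rather than the next one, which is the same reason the minimum 9.758 is counted in the slide's first bin.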

Preview of Coming Attractions

• Class 07
  – Find out how to use these counts to test H0: these data came from N(10.2, .16)
  – Find out how to use the Denmark family counts to test H0: those data came from Binomial(4, .5)

[Histogram chart of the fill test data, repeated: Frequency by Bin]