Statistics: Data Analysis and Presentation Fr Clinic II


Page 1: Statistics: Data Analysis and Presentation Fr Clinic II

Statistics: Data Analysis and Presentation

Fr Clinic II

Page 2: Statistics: Data Analysis and Presentation Fr Clinic II

Overview

  • Tables and Graphs
  • Populations and Samples
  • Mean, Median, and Standard Deviation
  • Standard Error and 95% Confidence Interval (CI)
  • Error Bars
  • Comparing Means of Two Data Sets
  • Linear Regression (LR)

Page 3: Statistics: Data Analysis and Presentation Fr Clinic II

Warning

Statistics is a huge field, and I've simplified considerably here. For example:

  – Mean, Median, and Standard Deviation: there are alternative formulas.
  – Standard Error and the 95% Confidence Interval: there are other ways to calculate CIs (e.g., the z statistic instead of t; the difference between two means rather than a single mean).
  – Error Bars: don't go beyond the interpretations I give here!
  – Comparing Means of Two Data Sets: we cover only the t test for two means when the variances are unknown but equal; there are other tests.
  – Linear Regression: we look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!

Page 4: Statistics: Data Analysis and Presentation Fr Clinic II

Tables

Water         Turbidity (NTU)   True Color (Pt-Co)   Apparent Color (Pt-Co)
(1)           (2)               (3)                  (4)
Pond Water    10                13                   30
Sweetwater    4                 5                    12
Hiker         3                 8                    11
MiniWorks     2                 3                    5
Standard      5 (a)             15                   15

(a) Level at which humans can visually detect turbidity

Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters

Consistent format, title, units, big fonts. Differentiate headings, number columns.

Page 5: Statistics: Data Analysis and Presentation Fr Clinic II

Figures

[Figure 1: Turbidity of Pond Water, Treated and Untreated. Bar chart; x-axis: Filter (Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager); y-axis: Turbidity (NTU), 0 to 25.]

Consistent format, title, units. Good axis titles, big fonts.

Page 6: Statistics: Data Analysis and Presentation Fr Clinic II

Populations and Samples

  • Population
      – All of the possible outcomes of an experiment or observation
      – Examples: the US population; a particular type of steel beam
  • Sample
      – A finite number of outcomes measured or observations made
      – Examples: 1000 US citizens; 5 beams
  • We use samples to estimate population properties
      – Mean, variability (e.g., standard deviation), distribution
      – Example: the heights of 1000 US citizens are used to estimate the mean height of the US population

Page 7: Statistics: Data Analysis and Presentation Fr Clinic II

Mean and Median

Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10

  • Mean = sum of values divided by number of samples
      = (1 + 3 + 3 + 6 + 8 + 10)/6 = 5.2 NTU
  • Median = the middle number
      Rank:    1   2   3   4   5   6
      Number:  1   3   3   6   8   10
      – For an even number of sample points, average the middle two: (3 + 6)/2 = 4.5 NTU

Excel: mean – AVERAGE; median – MEDIAN
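As a cross-check outside Excel, the same two numbers can be reproduced with Python's standard `statistics` module. This is only a sketch using the six turbidity readings from the slide; the variable names are illustrative.

```python
from statistics import mean, median

# Turbidity of treated water (NTU), the six readings from the slide.
turbidity = [1, 3, 3, 6, 8, 10]

avg = mean(turbidity)    # sum of values divided by number of samples
mid = median(turbidity)  # even count, so the two middle values are averaged

print(f"mean = {avg:.1f} NTU, median = {mid} NTU")
```

`mean` and `median` play the same role here as Excel's AVERAGE and MEDIAN.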

Page 8: Statistics: Data Analysis and Presentation Fr Clinic II

Variance

  • Measure of variability
      – Sum of the squares of the deviations about the mean, divided by the degrees of freedom

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$

n = number of data points

Excel: variance – VAR
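The variance formula can be sketched in a few lines of Python. The function name `sample_variance` is illustrative, and the data are the turbidity readings from the Mean and Median slide; the standard library's `statistics.variance` uses the same n - 1 denominator, so it serves as a check.

```python
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Turbidity readings (NTU) from the Mean and Median slide.
turbidity = [1, 3, 3, 6, 8, 10]

s2 = sample_variance(turbidity)
print(f"s^2 = {s2:.2f} NTU^2")
print(f"statistics.variance gives {variance(turbidity):.2f} NTU^2")
```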

Page 9: Statistics: Data Analysis and Presentation Fr Clinic II

Standard Deviation, s

  • Square root of the variance

$$ s = \sqrt{s^2} $$

  • For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
  • Area under the curve is the probability of getting a value within a specified range

[Figure: Normal Distribution curve. x-axis: Standard Deviations from Mean, -4 to 4; the area between -1.96 and 1.96 is labeled 95%.]

Excel: standard deviation – STDEV
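The 1.96 figure can be checked numerically as the area under the standard normal curve between -1.96 and +1.96. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"within 1.96 standard deviations: {prob_within(1.96):.3f}")
```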

Page 10: Statistics: Data Analysis and Presentation Fr Clinic II

Standard Error of Mean

  • Standard error of the mean
      – For a sample of size n
      – taken from a population with standard deviation s

$$ s_{\bar{X}} = \frac{s}{\sqrt{n}} $$

      – The estimate of the mean depends on the sample selected
      – As n increases, the variance of the mean estimate goes down, i.e., the estimate of the population mean improves
      – As n increases, the distribution of the mean estimate approaches normal, regardless of the population distribution
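A small sketch of the standard error calculation, using the three Filter 2 readings that appear in Example 1 later in the deck. The function name `standard_error` is illustrative; `statistics.stdev` is the sample standard deviation (n - 1 denominator).

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(sample) / sqrt(len(sample))

# Filter 2 turbidity readings (NTU) from Example 1.
filter_2 = [3.2, 4.4, 5.0]

s = stdev(filter_2)
se = standard_error(filter_2)
print(f"s = {s:.2f} NTU, standard error = {se:.2f} NTU")
```

The results round to the St Dev and St Error entries listed for Filter 2 in the Example 1 table.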

Page 11: Statistics: Data Analysis and Presentation Fr Clinic II

95% Confidence Interval (CI) for Mean

  • Interval within which we are 95% confident the true mean lies

$$ \bar{X} \pm t_{95\%,\,n-1}\; s_{\bar{X}} $$

  • t_{95%,n-1} is the t-statistic for a 95% CI if sample size = n
      – If n ≥ 30, let t_{95%,n-1} = 1.96 (Normal Distribution)
      – Otherwise, use the Excel formula TINV(0.05, n-1)
  • n = number of data points
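The interval can be sketched as follows. Python's standard library has no t-distribution quantile, so this sketch takes the t value as an input (an assumption of the example, not part of the slide); here it is the 4.30 listed for n = 3 in the Example 1 table, which is what TINV(0.05, 2) is used for in the deck. The function name `ci95` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def ci95(sample, t_value):
    """Return (mean, half-width) of the CI, given t_{95%, n-1} supplied by the caller."""
    n = len(sample)
    half_width = t_value * stdev(sample) / sqrt(n)
    return mean(sample), half_width

# Filter 1 turbidity readings (NTU) from Example 1; t = 4.30 is the table's value for n = 3.
filter_1 = [2.1, 2.1, 2.2]
centre, half_width = ci95(filter_1, t_value=4.30)
print(f"{centre:.1f} +/- {half_width:.2f} NTU")
```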

Page 12: Statistics: Data Analysis and Presentation Fr Clinic II

Error Bars

  • Show data variability on a plot of mean values
  • Types of error bars include:
      – ± Standard Deviation
      – ± Standard Error
      – ± 95% CI
      – Maximum and minimum value

[Figure: bar chart with error bars. x-axis: Filter Type (Filter 1, Filter 2, Filter 3); y-axis: Turbidity (NTU), 0 to 10.]

Page 13: Statistics: Data Analysis and Presentation Fr Clinic II

Using Error Bars to Compare Data

  • Standard Deviation
      – Demonstrates data variability, but no comparison possible
  • Standard Error
      – If bars overlap, any difference in means is not statistically significant
      – If bars do not overlap, indicates nothing!
  • 95% Confidence Interval
      – If bars overlap, indicates nothing!
      – If bars do not overlap, difference is statistically significant
  • We'll use 95% CI

Page 14: Statistics: Data Analysis and Presentation Fr Clinic II

Example 1

Turbidity Data (values in NTU)

            1      2      3      mean   St Dev   n    St Error   t95%,2   +/- 95% CI
Filter 1    2.1    2.1    2.2    2.1    0.06     3    0.03       4.30     0.14
Filter 2    3.2    4.4    5.0    4.2    0.92     3    0.53       4.30     2.28
Filter 3    4.3    4.2    4.5    4.3    0.15     3    0.09       4.30     0.38

[Figure: bar chart of mean turbidity. x-axis: Portable Water Filter (Filter 1, Filter 2, Filter 3); y-axis: Turbidity (NTU), 0.0 to 7.0; bars labeled 2.1, 4.2, 4.3.]

Create Bar Chart of Name vs Mean. Right click on data. Select “Format Data Series”.
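The 95% CI rule from the error-bar slide can be applied to Example 1 with a short sketch. It uses the means and ± 95% CI values exactly as tabulated above; the names `filters`, `ci_bounds`, and `bars_overlap` are illustrative.

```python
from itertools import combinations

# (mean, 95% CI half-width) in NTU, as tabulated in Example 1.
filters = {
    "Filter 1": (2.1, 0.14),
    "Filter 2": (4.2, 2.28),
    "Filter 3": (4.3, 0.38),
}

def ci_bounds(mean, half_width):
    """Lower and upper ends of the error bar."""
    return mean - half_width, mean + half_width

def bars_overlap(a, b):
    """True if the two 95% CI error bars share any values."""
    lo_a, hi_a = ci_bounds(*a)
    lo_b, hi_b = ci_bounds(*b)
    return lo_a <= hi_b and lo_b <= hi_a

for name_a, name_b in combinations(filters, 2):
    if bars_overlap(filters[name_a], filters[name_b]):
        verdict = "bars overlap: indicates nothing"
    else:
        verdict = "bars do not overlap: difference is statistically significant"
    print(f"{name_a} vs {name_b}: {verdict}")
```

With these numbers only Filter 1 and Filter 3 have non-overlapping bars; the wide interval for Filter 2 overlaps both of the others.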

Page 15: Statistics: Data Analysis and Presentation Fr Clinic II

Example 2

Turbidity Measurements (values in NTU)

Time (min)   1      2      3      mean   St Dev   n    St Error   t95,2   +/- 95% CI
1            4.3    4.5    4.6    4.5    0.15     3    0.09       4.30    0.38
2            4.4    4.4    4.5    4.4    0.06     3    0.03       4.30    0.14
3            4.3    4.2    4.2    4.2    0.06     3    0.03       4.30    0.14

[Figure: plot of mean turbidity over time. x-axis: Time (min), 0 to 4; y-axis: Turbidity (NTU), 0.0 to 6.0.]

Page 16: Statistics: Data Analysis and Presentation Fr Clinic II

What can we do?

  • Plot mean water quality data for various filters with error bars
  • Plot mean water quality over time with error bars

Page 17: Statistics: Data Analysis and Presentation Fr Clinic II

Comparing Filter Performance

  • Use the t test to determine whether the means of two populations are different
      – Based on two data sets
      – E.g., turbidity produced by two different filters

Page 18: Statistics: Data Analysis and Presentation Fr Clinic II

Comparing Two Data Sets using the t test

  • Example: You pump 20 gallons of water through filter 1 and 2. After every gallon, you measure the turbidity.
      – Filter 1: Mean = 2 NTU, s = 0.5 NTU, n = 20
      – Filter 2: Mean = 3 NTU, s = 0.6 NTU, n = 20
  • You ask the question: Do the filters make water with a different mean turbidity?

Page 19: Statistics: Data Analysis and Presentation Fr Clinic II

Do the Filters make different water?

  • Use TTEST (Excel)
  • It gives the fractional probability of being wrong if you answer yes
      – We want the probability to be small: 0.01 to 0.10 (1 to 10%). Use 0.01.

Filter 1   Filter 2
1.5        3.0
2.0        2.4
2.2        2.2
1.8        2.6
3.0        3.4
1.6        3.6
1.2        3.8
2.1        3.5
1.9        2.7
2.2        2.4
2.6        3.5
1.7        3.8
1.8        2.1
1.5        2.5
2.4        3.4
2.5        3.3
2.7        2.4
1.4        3.6
1.5        2.3
2.6        3.7
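To show what sits behind the test, here is a sketch of the pooled (equal-variance) two-sample t statistic computed from the summary values in the example: means of 2 and 3 NTU, s of 0.5 and 0.6 NTU, n = 20 each. The function name `pooled_t` is illustrative. Turning t into the probability that TTEST reports needs the t distribution, which Python's standard library does not provide, so the sketch stops at the statistic and its degrees of freedom.

```python
from math import sqrt

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic assuming equal (pooled) variances, plus degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Summary statistics from the two-filter example (NTU).
t_stat, df = pooled_t(2.0, 0.5, 20, 3.0, 0.6, 20)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The magnitude of t here is well above the two-sided critical value of roughly 2.7 tabulated for 38 degrees of freedom at the 0.01 level, so the answer to "do the filters make different water?" would be yes.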

Page 20: Statistics: Data Analysis and Presentation Fr Clinic II

“t test” Questions

  • Do two filters make different water?
      – Take multiple measurements of a particular water quality parameter for 2 filters
  • Do two filters treat different amounts of water between cleanings?
      – Measure the amount of water filtered between cleanings for two filters
  • Does the amount of water a filter treats between cleanings differ after a certain amount of water is treated?
      – For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated

Page 21: Statistics: Data Analysis and Presentation Fr Clinic II

Linear Regression

  • Fit the best straight line to a data set

[Figure: scatter plot with fitted trendline, y = 1.897x + 0.8667, R² = 0.9762. x-axis: Height (m), 0 to 12; y-axis: Grade Point Average, 0 to 25.]

Right-click on a data point and use the “trendline” option. Use the “options” tab to get the equation and R².
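For readers who want to see roughly what the trendline option computes, here is a minimal least-squares sketch. The data points are hypothetical and chosen only for illustration (they are not the points behind the chart above), and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```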

Page 22: Statistics: Data Analysis and Presentation Fr Clinic II

R² - Coefficient of Multiple Determination

$$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} $$

ŷi = predicted y values, from the regression equation
yi = observed y values
ȳ = mean of the observed y values

R² = fraction of variance explained by the regression (variance = standard deviation squared)
   = 1 if the data lie along a straight line
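The definition can be sketched directly in Python as one minus the ratio of residual to total sum of squares. The observed values and the line y = 1.97x + 1.09 used for the predictions are hypothetical, for illustration only, and `r_squared` is an illustrative name.

```python
def r_squared(observed, predicted):
    """1 minus (residual sum of squares / total sum of squares about the mean)."""
    y_bar = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observations and a hypothetical fitted line, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [3.1, 4.9, 7.2, 8.8, 11.0]
predicted = [1.97 * x + 1.09 for x in xs]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.4f}")
```

If the predictions match the observations exactly, the residual sum is zero and the function returns 1, which is the straight-line case noted on the slide.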