Statistics: Data Analysis and Presentation
Fr Clinic II

Overview
Tables and Graphs
Populations and Samples
Mean, Median, and Standard Deviation
Standard Error & 95% Confidence Interval (CI)
Error Bars
Comparing Means of Two Data Sets
Linear Regression (LR)

Warning
Statistics is a huge field; I've simplified considerably here. For example:
  – Mean, Median, and Standard Deviation
      There are alternative formulas
  – Standard Error and the 95% Confidence Interval
      There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than single mean…)
  – Error Bars
      Don't go beyond the interpretations I give here!
  – Comparing Means of Two Data Sets
      We just cover the t test for two means when the variances are unknown but equal; there are other tests
  – Linear Regression
      We only look at simple LR and only calculate the intercept, slope and R². There is much more to LR!

Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters

Water (1)     Turbidity (NTU) (2)   True Color (Pt-Co) (3)   Apparent Color (Pt-Co) (4)
Pond Water    10                    13                       30
Sweetwater    4                     5                        12
Hiker         3                     8                        11
MiniWorks     2                     3                        5
Standard      5(a)                  15                       15
(a) Level at which humans can visually detect turbidity

Consistent Format, Title, Units, Big Fonts
Differentiate Headings, Number Columns

Figures
[Figure: bar chart. x-axis: Filter (Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager); y-axis: Turbidity (NTU), 0 to 25.]
Figure 1: Turbidity of Pond Water, Treated and Untreated
Consistent Format, Title, Units
Good Axis Titles, Big Fonts

Populations and Samples
Population
  – All of the possible outcomes of experiment or observation
      US population
      Particular type of steel beam
Sample
  – A finite number of outcomes measured or observations made
      1000 US citizens
      5 beams
We use samples to estimate population properties
  – Mean, Variability (e.g. standard deviation), Distribution
      Height of 1000 US citizens used to estimate mean of US population
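
The idea of estimating a population property from a sample can be sketched with a small simulation. Everything here is an assumption for illustration: the population is a list of simulated heights (mean 170 cm, standard deviation 10 cm), not real data, and the names `population` and `sample` are hypothetical.

```python
import random
import statistics

# Hypothetical population: 100,000 simulated heights in cm (illustrative, not real data).
rng = random.Random(0)
population = [rng.gauss(170.0, 10.0) for _ in range(100_000)]

# A sample is a finite number of observations; here, 1000 individuals drawn at random.
sample = rng.sample(population, 1000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean = {population_mean:.1f} cm")
print(f"sample mean     = {sample_mean:.1f} cm (estimate from 1000 values)")
```

The sample mean should land close to the population mean even though only 1% of the population was measured.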

Mean and Median
Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10
Mean = Sum of values divided by number of samples
  = (1+3+3+6+8+10)/6 = 5.2 NTU
Median = The middle number
  Rank:   1 2 3 4 5 6
  Number: 1 3 3 6 8 10
  For even number of sample points, average middle two
  = (3+6)/2 = 4.5 NTU
Excel: Mean – AVERAGE; Median – MEDIAN
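
For readers working outside Excel, the same two numbers can be reproduced with Python's standard `statistics` module. This is only a sketch using the six turbidity values from the slide; the variable names are arbitrary.

```python
import statistics

# Treated-water turbidity values from the slide, in NTU.
turbidity_ntu = [1, 3, 3, 6, 8, 10]

mean_ntu = statistics.mean(turbidity_ntu)      # counterpart of Excel AVERAGE
median_ntu = statistics.median(turbidity_ntu)  # counterpart of Excel MEDIAN

print(f"mean   = {mean_ntu:.1f} NTU")    # 5.2
print(f"median = {median_ntu:.1f} NTU")  # 4.5
```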

Variance
Measure of variability
  – Sum of the squares of the deviations about the mean, divided by the degrees of freedom
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
n = number of data points
Excel: variance – VAR
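
As a rough check of the definition, the sketch below computes the sample variance step by step and compares it with `statistics.variance`, which also divides by n - 1. Reusing the turbidity values 1, 3, 3, 6, 8, 10 from the mean and median slide is an assumption made for illustration.

```python
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # NTU, reused from the mean/median example
n = len(turbidity_ntu)
mean_ntu = sum(turbidity_ntu) / n

# Sum of squared deviations about the mean, divided by the degrees of freedom (n - 1).
sum_sq_dev = sum((x - mean_ntu) ** 2 for x in turbidity_ntu)
variance = sum_sq_dev / (n - 1)

print(f"variance by hand       = {variance:.2f} NTU^2")
print(f"statistics.variance    = {statistics.variance(turbidity_ntu):.2f} NTU^2")
```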

Standard Deviation, s
Square root of the variance
$$ s = \sqrt{s^2} $$
For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
Area under curve is probability of getting value within specified range
[Figure: Normal Distribution curve. x-axis: Standard Deviations from Mean, -4 to 4; the region between -1.96 and 1.96 is marked 95%.]
Excel: standard deviation – STDEV
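
A short sketch of both statements on this slide: the standard deviation as the square root of the variance, and the 95% figure for a normal distribution. The data set is again the six turbidity values used earlier (an assumption for illustration), and `statistics.NormalDist` needs Python 3.8 or later.

```python
import math
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # NTU, reused from the mean/median example

# Standard deviation is the square root of the sample variance (Excel: STDEV).
s = math.sqrt(statistics.variance(turbidity_ntu))
print(f"s = {s:.2f} NTU")

# Area under the standard normal curve between -1.96 and +1.96 standard deviations.
standard_normal = statistics.NormalDist(mu=0.0, sigma=1.0)
fraction_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
print(f"fraction within 1.96 standard deviations = {fraction_within:.3f}")
```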

Standard Error of Mean
Standard error of mean
  – Of sample of size n
  – taken from population with standard deviation s
$$ s_{\bar{X}} = \frac{s}{\sqrt{n}} $$
  – Estimate of mean depends on sample selected
  – As n increases, variance of mean estimate goes down, i.e., estimate of population mean improves
  – As n increases, mean estimate distribution approaches normal, regardless of population distribution
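
To see how the standard error shrinks as the sample grows, here is a minimal sketch. The function name `standard_error` and the value s = 0.5 NTU are illustrative assumptions.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return s / math.sqrt(n)

s = 0.5  # hypothetical standard deviation, NTU
for n in (5, 20, 80):
    print(f"n = {n:2d}: standard error = {standard_error(s, n):.3f} NTU")
```

Because of the square root, quadrupling the sample size only halves the standard error.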

95% Confidence Interval (CI) for Mean
Interval within which we are 95% confident the true mean lies
$$ \bar{X} \pm t_{95\%,\,n-1}\, s_{\bar{X}} $$
t_{95%,n-1} is t-statistic for 95% CI if sample size = n
  – If n ≥ 30, let t_{95%,n-1} = 1.96 (Normal Distribution)
  – Otherwise, use Excel formula: TINV(0.05,n-1)
n = number of data points
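
The half-width of the interval can be sketched as below. The standard library has no inverse t distribution, so the t value is passed in by the caller; the 2.571 used for n = 6 is taken from a standard t table (5 degrees of freedom) and is an assumption of this example, as is reusing the earlier turbidity values. The 1.96 for large samples is recovered from the normal distribution.

```python
import math
import statistics

def ci95_half_width(values, t_crit):
    """Return t_crit times the standard error of the mean."""
    s = statistics.stdev(values)
    return t_crit * s / math.sqrt(len(values))

# Large-sample case: two-sided 95% critical value of the normal distribution.
z95 = statistics.NormalDist().inv_cdf(0.975)
print(f"normal critical value = {z95:.2f}")

# Small-sample case: t value supplied from a t table (assumed 2.571 for n = 6).
turbidity_ntu = [1, 3, 3, 6, 8, 10]
half_width = ci95_half_width(turbidity_ntu, t_crit=2.571)
mean_ntu = statistics.mean(turbidity_ntu)
print(f"95% CI = {mean_ntu:.1f} +/- {half_width:.1f} NTU")
```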

Error Bars
Show data variability on plot of mean values
Types of error bars include:
  – ± Standard Deviation, ± Standard Error, ± 95% CI
  – Maximum and minimum value
[Figure: bar chart with error bars. x-axis: Filter Type (Filter 1, Filter 2, Filter 3); y-axis: Turbidity (NTU), 0 to 10.]

Using Error Bars to compare data
Standard Deviation
  – Demonstrates data variability, but no comparison possible
Standard Error
  – If bars overlap, any difference in means is not statistically significant
  – If bars do not overlap, indicates nothing!
95% Confidence Interval
  – If bars overlap, indicates nothing!
  – If bars do not overlap, difference is statistically significant
We'll use 95% CI

Example 1
Turbidity Data (measurements, mean, St Dev, St Error and 95% CI in NTU)

            1     2     3     mean   St Dev   n   St Error   t95%,2   +/- 95% CI
Filter 1    2.1   2.1   2.2   2.1    0.06     3   0.03       4.30     0.14
Filter 2    3.2   4.4   5     4.2    0.92     3   0.53       4.30     2.28
Filter 3    4.3   4.2   4.5   4.3    0.15     3   0.09       4.30     0.38

[Figure: bar chart of mean turbidity. x-axis: Portable Water Filter (Filter 1, Filter 2, Filter 3); y-axis: Turbidity (NTU), 0.0 to 7.0; bar labels 2.1, 4.2, 4.3.]
Create Bar Chart of Name vs Mean. Right click on data. Select "Format Data Series".
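
The Example 1 numbers can be reproduced, and the 95% CI overlap rule applied, with a sketch like the one below. The t value 4.30 is copied from the table rather than computed, and the helper names `ci95` and `overlap` are hypothetical.

```python
import math
import statistics

T95_DF2 = 4.30  # t value for n = 3 (2 degrees of freedom), as listed in the table

filters = {
    "Filter 1": [2.1, 2.1, 2.2],
    "Filter 2": [3.2, 4.4, 5.0],
    "Filter 3": [4.3, 4.2, 4.5],
}

def ci95(values, t_crit=T95_DF2):
    """Return the (low, high) ends of the 95% confidence interval for the mean."""
    mean = statistics.mean(values)
    half = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half, mean + half

def overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

intervals = {name: ci95(values) for name, values in filters.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.2f} to {high:.2f} NTU")

# Non-overlapping 95% CI bars indicate a statistically significant difference;
# overlapping bars indicate nothing.
names = list(intervals)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        verdict = "overlap" if overlap(intervals[first], intervals[second]) else "do not overlap"
        print(f"{first} vs {second}: bars {verdict}")
```

With these data only the Filter 1 and Filter 3 bars fail to overlap, so that is the only pair for which the error bars alone support a significant difference.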

Example 2
Turbidity Measurements (measurements, mean, St Dev, St Error and 95% CI in NTU)

Time (min)   1     2     3     mean   St Dev   n   St Error   t95,2   +/- 95% CI
1            4.3   4.5   4.6   4.5    0.15     3   0.09       4.30    0.38
2            4.4   4.4   4.5   4.4    0.06     3   0.03       4.30    0.14
3            4.3   4.2   4.2   4.2    0.06     3   0.03       4.30    0.14

[Figure: plot of mean turbidity over time. x-axis: Time (min), 0 to 4; y-axis: Turbidity (NTU), 0.0 to 6.0.]

What can we do?
Plot mean water quality data for various filters with error bars
Plot mean water quality over time with error bars

Comparing Filter Performance
Use t test to determine if the means of two populations are different.
  – Based on two data sets
      E.g., turbidity produced by two different filters

Comparing Two Data Sets using the t test
Example - You pump 20 gallons of water through filter 1 and 2. After every gallon, you measure the turbidity.
  – Filter 1: Mean = 2 NTU, s = 0.5 NTU, n = 20
  – Filter 2: Mean = 3 NTU, s = 0.6 NTU, n = 20
You ask the question - Do the Filters make water with a different mean turbidity?

Do the Filters make different water?
Use TTEST (Excel)
Fractional probability of being wrong if you answer yes
  – We want probability to be small
      0.01 to 0.10 (1 to 10 %). Use 0.01

Filter 1   Filter 2
1.5        3
2          2.4
2.2        2.2
1.8        2.6
3          3.4
1.6        3.6
1.2        3.8
2.1        3.5
1.9        2.7
2.2        2.4
2.6        3.5
1.7        3.8
1.8        2.1
1.5        2.5
2.4        3.4
2.5        3.3
2.7        2.4
1.4        3.6
1.5        2.3
2.6        3.7
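
For comparison with the spreadsheet result, here is a rough standard-library sketch of the equal-variance (pooled) two-sample t test on the 20 measurements per filter. It is not how Excel's TTEST is implemented: the two-tailed probability is approximated by numerically integrating the t density with Simpson's rule, and the function names `pooled_t`, `t_pdf` and `two_tailed_p` are hypothetical.

```python
import math
import statistics

filter1 = [1.5, 2.0, 2.2, 1.8, 3.0, 1.6, 1.2, 2.1, 1.9, 2.2,
           2.6, 1.7, 1.8, 1.5, 2.4, 2.5, 2.7, 1.4, 1.5, 2.6]
filter2 = [3.0, 2.4, 2.2, 2.6, 3.4, 3.6, 3.8, 3.5, 2.7, 2.4,
           3.5, 3.8, 2.1, 2.5, 3.4, 3.3, 2.4, 3.6, 2.3, 3.7]

def pooled_t(a, b):
    """t statistic and degrees of freedom, assuming unknown but equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    std_err_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err_diff, df

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed probability via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

t_stat, df = pooled_t(filter1, filter2)
p_value = two_tailed_p(t_stat, df)
print(f"t = {t_stat:.2f}, degrees of freedom = {df}, p = {p_value:.2g}")
print("Means differ at the 0.01 level" if p_value < 0.01 else "No difference shown at the 0.01 level")
```

The probability printed here plays the same role as the TTEST result: the chance of being wrong if you answer yes.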

"t test" Questions
Do two filters make different water?
  – Take multiple measurements of a particular water quality parameter for 2 filters
Do two filters treat different amounts of water between cleanings?
  – Measure amount of water filtered between cleanings for two filters
Does the amount of water a filter treats between cleanings differ after a certain amount of water is treated?
  – For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated

Linear Regression
Fit the best straight line to a data set
[Figure: scatter plot with fitted trendline. x-axis: Height (m), 0 to 12; y-axis: Grade Point Average, 0 to 25. Trendline: y = 1.897x + 0.8667, R² = 0.9762.]
Right-click on data point and use "trendline" option. Use "options" tab to get equation and R².
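
Outside Excel, the slope and intercept of the best-fit line can be computed with the usual least-squares formulas, as in this sketch. The data points are made up for illustration (the slide's underlying data are not given), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data, not taken from the slide.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.8, 4.9, 6.4, 8.6, 10.3]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```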

R² - Coefficient of Multiple Determination
$$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} $$
ŷ_i = Predicted y values, from regression equation
y_i = Observed y values
ȳ = Mean of observed y values
R² = fraction of variance explained by regression (variance = standard deviation squared)
   = 1 if data lies along a straight line
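
A minimal sketch of the R² calculation for a simple least-squares line follows. The noisy data set is hypothetical, and `r_squared` is an illustrative helper that refits the line internally so the block stands alone.

```python
def r_squared(xs, ys):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares) for a fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    predicted = [slope * x + intercept for x in xs]
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Points lying exactly on a straight line give R^2 = 1.
print(r_squared([0, 1, 2, 3], [1, 3, 5, 7]))

# Hypothetical noisy data give a value below 1.
print(round(r_squared([1.0, 2.0, 3.0, 4.0, 5.0], [2.8, 4.9, 6.4, 8.6, 10.3]), 4))
```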