Modify: use bio. IB book

IB Biology Topic 1: Statistical Analysis

http://www.patana.ac.th/Secondary/Science/c4b/1/stat1.htm


An investigation of shell length variation in a mollusc species

• A marine gastropod (Thersites bipartita) has been sampled from two different locations:
– Sample A: Shells found in full marine conditions
– Sample B: Shells found in brackish water conditions.
• sample size = 10 shells
• length of the shell measured as shown


Analysis of Gastropod Data

• measured height of shells (ruler)
• Units: mm +/- 1 mm (ERROR)
• Significant digits
• Uncertainty
– all measuring devices!
– reflects the precision of the measurement
• There should be no variation in the precision of raw data: it must be consistent.


1.1.1 Error bars and the representation of variability in data.

• Biological systems are subject to a genetic program and environmental variation
• collect a set of data: it shows variation
• Graphs: show variation using error bars
– show range of the data or
– standard deviation


Mean & Range for each group

• Marine

• Brackish


Graph Mean & Range for each group

• Quick comparison of the 2 data sets


1.1.2 Calculation of Mean and Std Dev

• 3 classes of data
• Mean
– arithmetic mean (avg): measure of the central tendency
• Std Dev
– Measures spread around the mean
– Measure of variation, or precision, of the measurements


1.1.2 Calculation of Mean and Std Dev

• Std Dev of sample = s
• is for the sample, not the total population
• Pop 1. Mean = 31.4, s = 5.7
• Pop 2. Mean = 41.6, s = 4.3
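A minimal Python sketch of the difference between the sample standard deviation (s, dividing by n - 1) and the population standard deviation (dividing by n). The shell lengths below are hypothetical values invented for illustration; they are not the data behind the slide's figures.

```python
import math

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def sample_std_dev(values):
    """Sample standard deviation s: divides by n - 1."""
    m = mean(values)
    sum_sq = sum((x - m) ** 2 for x in values)
    return math.sqrt(sum_sq / (len(values) - 1))

def population_std_dev(values):
    """Population standard deviation: divides by n."""
    m = mean(values)
    sum_sq = sum((x - m) ** 2 for x in values)
    return math.sqrt(sum_sq / len(values))

# Hypothetical shell lengths in mm (illustration only, not the slide data).
lengths = [32, 34, 34, 34, 35, 35, 37, 39]

print("mean =", mean(lengths))
print("s (sample) =", round(sample_std_dev(lengths), 3))
print("population std dev =", round(population_std_dev(lengths), 3))
```

Because it divides by n - 1, s is always slightly larger than the population formula applied to the same values, and the gap shrinks as the sample grows.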


Graphing Mean and Std Dev: Error Bars

• Mean +/- 1 std dev
• no overlap between these two populations
• The question being considered is:
– Is there a significant difference between the two samples from different locations?
• or
– Are the differences in the two samples just due to chance selection?
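The "no overlap" observation can be checked by arithmetic. The sketch below uses the means and standard deviations quoted in these slides (31.4 and 5.7; 41.6 and 4.3); the function names are hypothetical helpers introduced only for illustration.

```python
def error_bar(mean, s):
    """Return the (low, high) ends of a mean +/- 1 std dev error bar."""
    return (mean - s, mean + s)

def bars_overlap(bar_a, bar_b):
    """True if the two (low, high) intervals share any values."""
    low_a, high_a = bar_a
    low_b, high_b = bar_b
    return low_a <= high_b and low_b <= high_a

pop1 = error_bar(31.4, 5.7)  # roughly 25.7 to 37.1
pop2 = error_bar(41.6, 4.3)  # roughly 37.3 to 45.9

print("Pop 1 bar:", pop1)
print("Pop 2 bar:", pop2)
print("overlap?", bars_overlap(pop1, pop2))
```

The upper end of the Pop 1 bar (about 37.1) sits just below the lower end of the Pop 2 bar (about 37.3), so the bars narrowly miss each other. Non-overlapping bars hint at a real difference but are not a significance test in themselves.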


Graphing Mean and Std Dev: Error Bars

StdDev graph: compares about 68% of each population (mean +/- 1 std dev); begins to show that they look different.

Range graph: misleads us to think the data may be similar


1.1.3 Standard deviation and the spread of values around the mean.

1. StdDev is a measure of how spread out the data values are from the mean.
2. Assume:
   1. normal distribution of values around the mean
   2. data not skewed to either end
3. 68% of all the data values in a sample can be found between the mean +/- 1 standard deviation


http://www.patana.ac.th/Secondary/Science/c4b/1/stat1.htm#gastro

• Animation of mean and standard deviation


1.1.3 Standard deviation and the spread of values around the mean.

4. 95% of all the data values in a sample can be found between the mean + 2s and the mean - 2s.
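The 68% and 95% figures follow from the normal distribution assumed above. As a rough check, the fraction of a normal distribution lying within k standard deviations of the mean equals erf(k / sqrt(2)); the helper name below is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k std devs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within mean +/- {k}s: {fraction_within(k):.1%}")
```

The values are about 68.3% and 95.4%, which the slides round to 68% and 95%. They apply only when the data are close to normally distributed.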


1.1.4 Comparing means and standard deviations of 2 or more samples.

Sample w/ small StdDev suggests narrow variation
Sample w/ larger StdDev suggests wider variation

Example: molluscs
Pop 1. Mean = 31.4, Standard deviation (s) = 5.7
Pop 2. Mean = 41.6, Standard deviation (s) = 4.3


1.1.4 Comparing means and standard deviations of 2 or more samples.

Pop 2 has a greater mean shell length but slightly narrower variation.

Why this is the case would require further observation and experiment on environmental and genetic factors.


1.1.5 Comparing 2 samples with t-Test

Null Hypothesis:
There is no significant difference between the two samples except as caused by chance selection of data.

OR

Alternative hypothesis:
There is a significant difference between the height of shells in sample A and sample B.


1.1.5 Comparing 2 samples with t-Test

For the examples you'll use in biology, tails is always 2, and type can be:
1, paired
2, Two samples equal variance
3, Two samples unequal variance
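To make the "two samples, equal variance" case concrete, here is a hedged Python sketch that computes the t statistic from summary values alone. It uses the means and standard deviations quoted in these slides and assumes n = 10 shells in each sample (the slides state a sample size of 10). The critical value 2.101 is the standard two-tailed table value for P = 0.05 with 18 degrees of freedom. The function name is hypothetical.

```python
import math

def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """t statistic and degrees of freedom for two independent samples,
    assuming equal variances (pooled standard deviation)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

# Summary values from the slides; n = 10 per sample is an assumption.
t, df = pooled_t_statistic(31.4, 5.7, 10, 41.6, 4.3, 10)

T_CRITICAL = 2.101  # two-tailed, P = 0.05, 18 degrees of freedom

print("t =", round(t, 3), "df =", df)
print("significant at 5%?", abs(t) > T_CRITICAL)
```

Since |t| is well above the critical value, this sketch would reject the null hypothesis at the 5% level. Working from rounded summary values, it need not reproduce exactly the P value a spreadsheet gives from the raw measurements.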


Good idea to graph it

• Bar chart
• Error bars
• Stats


T-test: Are the mollusc shells from the two locations significantly different?

• T-test gives a probability (P): how likely a difference at least this large would be if the 2 sets really were the same (null hypothesis).
• P varies from 0 to 1.
– higher P = the data are consistent with the two sets being the same, and any differences may be just due to random chance.
– lower P = the difference is unlikely to be due to chance alone, so the two sets are more likely to be significantly different, and the differences real.


T-test: Are the mollusc shells from the two locations significantly different?

• In biology the critical P is usually 0.05 (5%) (biology experiments are expected to produce quite varied results)
– If P > 5% then there is no significant difference between the two sets
• (i.e. do not reject the null hypothesis).
– If P < 5% then the two sets are significantly different
• (i.e. reject the null hypothesis).
• For t test, # replicates as large as possible
– More than 5 at minimum


Drawing Conclusions

1. State null hypothesis & alternative hypothesis (based on research question)
2. Set critical P level at P = 0.05 (5%)
3. Write the decision rule:
   If P > 5% then there is no significant difference between the two sets (i.e. do not reject the null hypothesis).
   If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).
4. Write a summary statement based on the decision.
   The null hypothesis is rejected since calculated P = 0.003 (< 0.05; two-tailed test).
5. Write a statement of results in standard English.
   There is a significant difference between the height of shells in sample A and sample B.
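The decision rule in step 3 can be written as a small function. This is only a sketch: the function name and the wording of the returned strings are hypothetical. A result above the critical level is worded as "fail to reject" rather than "accept", because a high P does not prove the two sets are the same.

```python
def t_test_decision(p_value, alpha=0.05):
    """Apply the decision rule for a calculated P and critical level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis: the difference is significant"
    return "fail to reject the null hypothesis: no significant difference shown"

# P = 0.003 is the calculated value quoted in the summary statement.
print(t_test_decision(0.003))
```

With P = 0.003 the rule returns the "reject" branch, matching the summary statement in step 4.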


1.1.6 Correlation & Causation

• Sometimes you’re looking for an association between variables.
• Correlations see if 2 variables vary together
  +1 = perfect positive correlation
  0 = no correlation
  -1 = perfect negative correlation
• Relations see how 1 variable affects another


Pearson correlation (r)

• Data are continuous & normally distributed
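As an illustration of how r is obtained, the sketch below computes the Pearson coefficient directly from its definition (the covariance term divided by the product of the spread terms). The data are hypothetical and chosen so that the result is easy to check by hand.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: y rises (or falls) in exact step with x.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # perfect positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))   # perfect negative correlation
```

Real biological data would give values strictly between -1 and +1; the closer to either end, the stronger the linear association.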


Spearman’s rank-order correlation (r_s)

• Data are not continuous or not normally distributed
• Usually scatterplot for either type of correlation
• both correlation coefficients indicate a strong + corr.
– large females pair with large males
– Don’t know why, but it shows there is a correlation to investigate further.
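One way to see what the rank-order version does is to replace each value by its rank and then apply the Pearson formula to the ranks. The sketch below takes that route, giving tied values the average of their ranks. The female and male sizes are hypothetical numbers, not the data behind the scatterplot.

```python
import math

def ranks(values):
    """Rank values from 1 upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_rs(xs, ys):
    """Spearman's rank-order correlation: Pearson r of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical sizes: males grow with females, but not in a straight line.
female_size = [1, 2, 3, 4, 5]
male_size = [1, 4, 9, 16, 25]

print("r_s =", spearman_rs(female_size, male_size))
print("r   =", round(pearson_r(female_size, male_size), 3))
```

Here r_s is exactly 1 because the ordering of the two lists agrees perfectly, while Pearson's r falls a little short of 1 because the relationship is curved rather than linear.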


Causative: Use linear regression

• Fits a straight line to data
• Gives slope & intercept
– m and c in the equation y = mx + c

Doesn’t PROVE causation, but suggests it... need further investigation!
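A minimal least-squares sketch of how m and c in y = mx + c can be obtained. The data points are hypothetical and lie exactly on a line so the answer is easy to verify; the function name is an illustrative choice.

```python
def linear_regression(xs, ys):
    """Least-squares slope m and intercept c for the line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical points lying exactly on y = 2x + 1.
m, c = linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print("slope m =", m, "intercept c =", c)
```

With scattered real data the same formulas give the line that minimizes the squared vertical distances to the points. The fitted slope describes the association and does not by itself establish cause.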