The Tryptone Task
Group 7: Yuwu Chen, Alfonso R Croeze
Introduction
Staphylococcus aureus is a bacterium, commonly found on skin and in the respiratory tract, that can cause ailments such as skin infections and respiratory diseases.
Like other bacteria, Staphylococcus aureus can be grown in medical laboratories to aid in identifying and treating skin conditions.
Poor growth rates of Methicillin-resistant Staphylococcus aureus (MRSA) in one laboratory prompted the investigators to experiment with different culturing conditions.
Five strains of MRSA were examined in this experiment. Due to their complex names, they are referred to as 1, 2, 3, 4, and 5 in the data.
Data Description
The tryptone dataset contains bacteria counts after culturing five strains of Staphylococcus aureus.
The data was collected by Gavin Cooper at the Auckland University of Technology, New Zealand. The full dataset is available at:
http://www.amstat.org/publications/jse/datasets/Tryptone.dat.txt
There are no missing values.
Planned analyses: (a) tests on factorial models with interactions to identify significant factors; (b) estimation of optimal conditions by partial differentiation.
Treatments:
Time - Incubation time in hours: 24 and 48
Temperature - Incubation temperature in degrees Celsius: 27, 35, 43
Concentration - Concentration of the nutrient tryptone, as a percentage: 0.6, 0.8, 1.0, 1.2, 1.4
Block:
Count column - Five count columns: 1, 2, 3, 4, 5
Redundant variable:
Row - the case number
Response (dependent) variable:
Strain counts - Bacteria counts, ranging from 3 to 284
Data Management
Data transformation
The original dataset mixes two layouts: it is multivariate (wide) in the count column variable, whose five levels are spread across five columns, and univariate (long) in time, temperature and concentration, whose levels are each listed in a single column.
Row  Count1  Count2  Count3  Count4  Count5  Time  Temp  Conc
1    9       3       10      14      33      24    27    0.6
2    16      12      26      20      31      24    27    0.8
Because the strain counts are analyzed with a univariate procedure, the counts recorded in the five count columns must be stacked into a single column, and the count column variable must likewise be stored in a single column of its own.
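Data was transformed by a SAS DATA step that writes one observation per count column. The original fragment repeats "column = k; count = countk; output strain;" for each count column; the loop below does the same for all five columns. The INFILE path is a hypothetical local copy of the data file.
   data strain;
      infile 'Tryptone.dat.txt';   /* hypothetical path; add FIRSTOBS=2 if the file has a header row */
      input row count1 count2 count3 count4 count5 time temp conc;
      array counts{5} count1-count5;
      do column = 1 to 5;           /* one output observation per count column */
         count = counts{column};
         output strain;
      end;
      keep time temp conc column count;
   run;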
The new dataset strain (first three observations; the complete dataset and SAS code are in the output files):
Obs  time  temp  conc  column  count
1    24    27    0.6   1       9
2    24    27    0.6   2       3
3    24    27    0.6   3       10
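The stacking step can also be sketched outside SAS. The Python below is a minimal illustration, not the authors' program: `WideRow`, `LongRow` and `to_long` are hypothetical names, and only the two wide rows shown in the data transformation example are used as input. It emits one record per count column, which is what the repeated OUTPUT statements do.

```python
from typing import NamedTuple


class WideRow(NamedTuple):
    """One row of the original (wide) layout."""
    row: int
    counts: tuple  # (Count1, Count2, Count3, Count4, Count5)
    time: int
    temp: int
    conc: float


class LongRow(NamedTuple):
    """One observation of the stacked (long) dataset."""
    time: int
    temp: int
    conc: float
    column: int
    count: int


def to_long(rows):
    """Emit one LongRow per (row, count column), like one OUTPUT per column."""
    stacked = []
    for r in rows:
        for column, count in enumerate(r.counts, start=1):
            stacked.append(LongRow(r.time, r.temp, r.conc, column, count))
    return stacked


# The first two wide rows of the original dataset.
wide = [
    WideRow(1, (9, 3, 10, 14, 33), 24, 27, 0.6),
    WideRow(2, (16, 12, 26, 20, 31), 24, 27, 0.8),
]

long_rows = to_long(wide)
for obs, rec in enumerate(long_rows[:3], start=1):
    print(obs, rec.time, rec.temp, rec.conc, rec.column, rec.count)
```

The first three printed records correspond to observations 1 to 3 of the new dataset strain.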
Balance check:
When the treatment “time” is fixed, the tables below show that all 15 combinations of the other two treatments (3 temperatures × 5 concentrations) are present, and that each combination has the same number of replicates (5).
Similarly, the frequency tables with concentration or temperature fixed show that the experiment is balanced. (These results are shown in the output files.) A significance level of α = 0.05 is used throughout the analysis.
Table 1 of temp by conc, controlling for time = 24 (cell entries are frequencies)
temp \ conc  0.6  0.8  1.0  1.2  1.4  Total
27           5    5    5    5    5    25
35           5    5    5    5    5    25
43           5    5    5    5    5    25
Total        15   15   15   15   15   75
Table 2 of temp by conc, controlling for time = 48 (cell entries are frequencies)
temp \ conc  0.6  0.8  1.0  1.2  1.4  Total
27           5    5    5    5    5    25
35           5    5    5    5    5    25
43           5    5    5    5    5    25
Total        15   15   15   15   15   75
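A balance check amounts to counting replicates per treatment combination. The sketch below assumes the long layout (time, temp, conc, column) and, as a hypothetical stand-in for the real dataset, enumerates the full design (2 × 3 × 5 treatment cells × 5 count columns = 150 records); `is_balanced` is an illustrative helper, not part of the original analysis.

```python
from collections import Counter
from itertools import product

TIMES = (24, 48)
TEMPS = (27, 35, 43)
CONCS = (0.6, 0.8, 1.0, 1.2, 1.4)
COLUMNS = (1, 2, 3, 4, 5)

# Hypothetical stand-in for the long dataset: one (time, temp, conc, column)
# record per design cell and count column. Counts are not needed for balance.
records = list(product(TIMES, TEMPS, CONCS, COLUMNS))


def is_balanced(recs):
    """True if every time x temp x conc cell is present with equal replication."""
    cells = Counter((t, tp, c) for t, tp, c, _ in recs)
    all_present = len(cells) == len(TIMES) * len(TEMPS) * len(CONCS)
    return all_present and len(set(cells.values())) == 1


# Margins of the frequency tables, e.g. 75 observations per time level.
per_time = Counter(t for t, _, _, _ in records)
per_cell = Counter((t, tp, c) for t, tp, c, _ in records)

print(dict(per_time))
print(set(per_cell.values()))
print(is_balanced(records))
```

Dropping a single record from `records` makes `is_balanced` return False, since one cell then has 4 replicates instead of 5.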
Data Summary
Differences in means? Symmetric data? Homogeneous variances?
[Figure: box plots of count by time, temperature and concentration (left to right)]
First impressions from the box plots: within each treatment, the means at different levels differ noticeably. For temperature, the data is less symmetric and so possibly not normal; the other two treatments look more symmetric. Within each treatment, the variances at different levels may not be equal.
Method Description
Step 1: Test factorial models with interactions to identify significant factors.
ANOVA test on the factorial randomized block design (RBD), full model:
separate (heterogeneous) variances are estimated.
ANOVA test on the factorial RBD, reduced model:
homogeneous variance is assumed and the variance is pooled.
Step 2: Estimate optimal conditions by partial differentiation.
Multiple polynomial regression
The current protocol for culturing this bacterium has the time at 24 hours, the temperature at 35 degrees Celsius and the tryptone concentration at 1.0%.
Step 1: Test factorial models with interactions to identify significant factors
Full model vs. reduced model: which one is better?
Fit Statistics              Full model  Reduced model
-2 Res Log Likelihood       1107.3      1148.5
AIC (Smaller is Better)     1169.3      1152.5
AICC (Smaller is Better)    1191.9      1152.6
BIC (Smaller is Better)     1157.2      1151.7
The reduced model has the smaller AIC (1152.5 vs. 1169.3), and AICC and BIC also favor it, which indicates that it is the better model.
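The fit statistics can be reproduced from -2 Res Log Likelihood, under assumptions inferred from the reported numbers rather than stated in the slides: the full model has d = 31 covariance parameters and the reduced model d = 2 (AIC minus -2RLL, halved); AICC uses n* = 150 - 30 = 120 (observations minus the 30 fixed-effect parameters of the full factorial); and BIC uses ln(5), treating the five count-column blocks as subjects. These forms are assumed to match the REML definitions documented for SAS PROC MIXED; `reml_fit_statistics` is a hypothetical helper.

```python
import math


def reml_fit_statistics(neg2_rll, d, n_star, n_subjects):
    """AIC, AICC and BIC from -2 Res Log Likelihood (assumed SAS REML forms).

    d          -- number of covariance parameters
    n_star     -- observations minus fixed-effect parameters (used by AICC)
    n_subjects -- number of subjects used by BIC (assumed: the 5 blocks)
    """
    aic = neg2_rll + 2 * d
    aicc = neg2_rll + 2 * d * n_star / (n_star - d - 1)
    bic = neg2_rll + d * math.log(n_subjects)
    return aic, aicc, bic


N_STAR = 150 - 30  # assumed: 150 observations, 30 fixed-effect parameters
BLOCKS = 5         # assumed: the five count columns act as subjects

full = reml_fit_statistics(1107.3, d=31, n_star=N_STAR, n_subjects=BLOCKS)
reduced = reml_fit_statistics(1148.5, d=2, n_star=N_STAR, n_subjects=BLOCKS)

for name, stats in (("full", full), ("reduced", reduced)):
    print(name, "AIC=%.1f AICC=%.1f BIC=%.1f" % stats)
print("reduced preferred:", all(r < f for r, f in zip(reduced, full)))
```

Under these assumptions the values agree with the table to within 0.1; the small remaining gap comes from -2RLL being reported to one decimal place.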
The sources of variation and degrees of freedom (t1 = 2 time levels, t2 = 3 temperatures, t3 = 5 concentrations, b = 5 blocks):
Assumptions: independence, normal distribution of residuals, homogeneity of variances
Source                             Degrees of freedom    d.f.
Tmt1 (Time)                        t1-1                  1
Tmt2 (Temperature)                 t2-1                  2
Tmt3 (Concentration)               t3-1                  4
Block (Count column)               b-1                   4
Interaction1 (Tmt1 * Tmt2)         (t1-1)(t2-1)          2
Interaction2 (Tmt1 * Tmt3)         (t1-1)(t3-1)          4
Interaction3 (Tmt2 * Tmt3)         (t2-1)(t3-1)          8
Interaction4 (Tmt1 * Tmt2 * Tmt3)  (t1-1)(t2-1)(t3-1)    8
Experimental Error                 (b-1)(t1t2t3-1)       116
Total                              bt1t2t3-1             149
The experimental error d.f. is (b-1) times the sum of all treatment d.f.: (b-1)[(t1-1) + (t2-1) + (t3-1) + (t1-1)(t2-1) + (t1-1)(t3-1) + (t2-1)(t3-1) + (t1-1)(t2-1)(t3-1)] = (b-1)(t1t2t3-1) = 4 × 29 = 116.
The block × treatment interactions are pooled into this single error term because an RBD assumes no block × treatment interaction.
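The degrees-of-freedom bookkeeping can be checked mechanically. A small sketch; `rbd_factorial_df` is a hypothetical helper that applies the formulas in the table, with all block × treatment interactions pooled into error.

```python
def rbd_factorial_df(t1, t2, t3, b):
    """Degrees of freedom for a t1 x t2 x t3 factorial in an RBD with b blocks,
    pooling every block x treatment interaction into experimental error."""
    df = {
        "time": t1 - 1,
        "temp": t2 - 1,
        "conc": t3 - 1,
        "time*temp": (t1 - 1) * (t2 - 1),
        "time*conc": (t1 - 1) * (t3 - 1),
        "temp*conc": (t2 - 1) * (t3 - 1),
        "time*temp*conc": (t1 - 1) * (t2 - 1) * (t3 - 1),
    }
    treatment_df = sum(df.values())          # equals t1*t2*t3 - 1
    df["block"] = b - 1
    df["error"] = (b - 1) * treatment_df     # (b-1)(t1*t2*t3 - 1)
    df["total"] = b * t1 * t2 * t3 - 1
    return df


df = rbd_factorial_df(t1=2, t2=3, t3=5, b=5)
for source, value in df.items():
    print(f"{source:15s} {value}")
```

All sources other than the total add up to the total d.f., which is a quick consistency check on the table.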
ANOVA Test on factorial RBD, reduced model
According to the factorial RBD (reduced) model, do different levels in each treatment have significantly different effects on strain counts?
Type 3 Tests of Fixed Effects
Effect  Num DF  Den DF  F Value  Pr > F
time    1       116     444.27   <.0001
temp    2       116     80.12    <.0001
conc    4       116     64.86    <.0001
Yes: the p-values of all three treatments are < 0.05, so we reject H0: μ1 = μ2 = … = μt for each treatment.
Is there interaction between treatments?
Type 3 Tests of Fixed Effects
Effect          Num DF  Den DF  F Value  Pr > F
time*temp       2       116     38.07    <.0001
time*conc       4       116     3.99     0.0046
temp*conc       8       116     0.85     0.5613
time*temp*conc  8       116     2.17     0.0343
The hypothesis of no interaction between time and temperature is rejected (p < .0001).
The hypothesis of no interaction between time and concentration is rejected (p = 0.0046).
The hypothesis of no interaction between temperature and concentration is NOT rejected (p = 0.5613).
The hypothesis of no three-way interaction among time, temperature and concentration is rejected (p = 0.0343).
Which pairs of means within one treatment differ, at given levels of the other treatments?
The Least Squares Means table gives the least squares estimate of each mean, its standard error, a t test and a 95% confidence interval (excerpt):
Least Squares Means
Effect  time  temp  conc  Estimate  Standard Error  DF   t Value  Pr > |t|  Alpha  Lower    Upper
time    24                82.2800   3.4399          116  23.92    <.0001    0.05   75.4668  89.0932
time    48                162.75    3.4399          116  47.31    <.0001    0.05   155.93   169.56
temp          27          91.1200   3.9340          116  23.16    <.0001    0.05   83.3281  98.9119
Pairwise comparisons with Tukey adjustments are shown in the “Differences of Least Squares Means” table. Saxton’s macro was applied to the LSMeans output to produce a range test with letter groupings, e.g.:
Effect=time Method=Tukey-Kramer(P<0.05) Set=1
Obs  time  temp  conc  Estimate  Standard Error  Alpha  Lower    Upper    Letter Group
1    48    _     _     162.75    3.4399          0.05   155.93   169.56   A
2    24    _     _     82.2800   3.4399          0.05   75.4668  89.0932  B
Means with different letters differ significantly, so the two time levels differ.
The complete tables mentioned above are available in the output file.
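Each row of the Least Squares Means table follows from the estimate and its standard error: t Value = Estimate / Standard Error, and the 95% limits are Estimate ± t(0.975, 116) × Standard Error. In this sketch the critical value 1.9806 is hard-coded as an approximation to the 0.975 quantile of the t distribution with 116 df (an assumption, since the Python standard library has no t quantile function); `lsmean_row` is a hypothetical helper.

```python
T_CRIT_116 = 1.9806  # approx. 0.975 quantile of t with 116 df (assumed constant)


def lsmean_row(estimate, std_err, t_crit=T_CRIT_116):
    """Return (t value, lower, upper) for H0: mean = 0 with a 95% interval."""
    t_value = estimate / std_err
    half_width = t_crit * std_err
    return t_value, estimate - half_width, estimate + half_width


rows = {
    "time 24": (82.2800, 3.4399),
    "time 48": (162.75, 3.4399),
    "temp 27": (91.1200, 3.9340),
}
for label, (est, se) in rows.items():
    t_value, lower, upper = lsmean_row(est, se)
    print(f"{label}: t = {t_value:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The results match the reported t values and limits up to rounding of the displayed estimates.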
The last part of the ANOVA tests the hypothesis that the residuals are normally distributed:
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W     0.988251   Pr < W     0.2392
Kolmogorov-Smirnov  D     0.050081   Pr > D     >0.1500
Cramer-von Mises    W-Sq  0.040777   Pr > W-Sq  >0.2500
Anderson-Darling    A-Sq  0.333665   Pr > A-Sq  >0.2500
All p-values are > 0.05, so we fail to reject the hypothesis of normality in the residual distribution.
Contrasts to test linear/curved trends
Temperature and concentration treatments are quantitative and equally spaced, with 3 and 5 levels respectively. (Time has only 2 levels, so only a linear trend can be tested for it.)
Contrasts
Label      Num DF  Den DF  F Value  Pr > F
linear     1       116     57.32    <.0001
quadratic  1       116     102.93   <.0001
linear     1       116     189.36   <.0001
quadratic  1       116     19.69    <.0001
cubic      1       116     32.80    <.0001
quartic    1       116     17.59    <.0001
The first two rows are the tests for temperature; the last four rows are the tests for concentration.
All six contrasts are significant (p < .0001), so the treatment effects have both linear and curved components, and both linear and curved models can fit the data.
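The slides do not list the contrast coefficients. For equally spaced levels, the standard orthogonal polynomial coefficients can be generated by Gram-Schmidt orthogonalization of 1, x, x², … over the level indices; this is presumably what the linear, quadratic, cubic and quartic contrasts encode, but that is an assumption, and `orthogonal_poly_contrasts` is a hypothetical helper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def orthogonal_poly_contrasts(k):
    """Integer orthogonal polynomial contrasts (degrees 1..k-1) for k equally
    spaced levels, via Gram-Schmidt on 1, x, x^2, ... with x = 0..k-1."""
    xs = [Fraction(i) for i in range(k)]
    basis = []
    for degree in range(k):
        v = [x ** degree for x in xs]
        for u in basis:
            # Remove the component of v along the earlier (lower-degree) vector u.
            coef = sum(a * b for a, b in zip(v, u)) / sum(b * b for b in u)
            v = [a - coef * b for a, b in zip(v, u)]
        basis.append(v)

    contrasts = []
    for v in basis[1:]:  # degree 0 is the constant, not a contrast
        scale = reduce(lcm, (f.denominator for f in v))
        ints = [int(f * scale) for f in v]
        g = reduce(gcd, (abs(i) for i in ints))
        ints = [i // g for i in ints]
        if ints[-1] < 0:  # sign convention: last coefficient positive
            ints = [-i for i in ints]
        contrasts.append(ints)
    return contrasts


for k in (2, 3, 5):
    print(k, orthogonal_poly_contrasts(k))
```

For 3 levels this gives linear (-1, 0, 1) and quadratic (1, -2, 1); for 5 levels it gives (-2, -1, 0, 1, 2), (2, -1, -2, -1, 2), (-1, 2, 0, -2, 1) and (1, -4, 6, -4, 1). For the 2 time levels only (-1, 1) exists, which is why no curvature contrast is possible for time.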
Step 2: Estimate optimal conditions by partial differentiation
Multiple polynomial regression
Three separate single-predictor polynomial regressions are fitted, one for each treatment. Sequential (Type I) sums of squares are used to decide whether a lower-order polynomial fits as well as one with an additional higher-order term. Regression model:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_k X_i^k + e_i$
Based on the regression model, partial differentiation is used to
determine the optimal conditions. (Not displayed in this
presentation.)
Also, the fit plots are useful in finding the maxima.
Assumptions: Independence, normal distribution of residuals,
homogeneity of variances
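The partial differentiation step is not displayed in the presentation; a general sketch for a fitted single-predictor polynomial follows (with one predictor per model, the partial derivative reduces to an ordinary derivative). Any stationary point should also be compared with the tested range of X, since the fit carries no information outside it.

```latex
\hat{Y}(X) = \hat\beta_0 + \hat\beta_1 X + \hat\beta_2 X^2 + \cdots + \hat\beta_k X^k

\frac{d\hat{Y}}{dX} = \hat\beta_1 + 2\hat\beta_2 X + 3\hat\beta_3 X^2 + \cdots + k\,\hat\beta_k X^{k-1} = 0
\quad\Longrightarrow\quad X^{*}

\left.\frac{d^2\hat{Y}}{dX^2}\right|_{X^{*}} = 2\hat\beta_2 + 6\hat\beta_3 X^{*} + \cdots + k(k-1)\,\hat\beta_k (X^{*})^{k-2} < 0
\quad \text{(local maximum)}

\text{Quadratic case } (k = 2):\quad X^{*} = -\frac{\hat\beta_1}{2\hat\beta_2},
\quad \text{a maximum when } \hat\beta_2 < 0.
```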
Polynomial regression with “Time”
Time has only 2 levels, so it is fit with a linear model.
Is the linear effect significant?
[Figure: fit plot of count vs. time]
Source  DF  Type I SS    Mean Square  F Value  Pr > F
time    1   242808.1667  242808.1667  99.48    <.0001
Yes: the p-value for the linear term is < 0.05, so we reject H0: β1 = 0.
Polynomial regression model
Parameter  Estimate     Standard Error  t Value  Pr > |t|
Intercept  1.813333333  12.75597911     0.14     0.8872
time       3.352777778  0.33614956      9.97     <.0001
$\widehat{\text{Count}} = 1.813 + 3.353 \times \text{Time}$
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W     0.97416    Pr < W     0.0063
Kolmogorov-Smirnov  D     0.064387   Pr > D     0.1302
Cramer-von Mises    W-Sq  0.15658    Pr > W-Sq  0.0204
Anderson-Darling    A-Sq  1.023263   Pr > A-Sq  0.0106
Normality test: the Shapiro-Wilk p-value (0.0063) is < 0.05, so we reject the hypothesis that the residuals are normally distributed.
According to the regression model, the strain count increases with time: 48 hours is predicted to give a higher strain count than 24 hours. The current protocol for culturing this bacterium has the time at 24 hours, so the statistical results do NOT support this protocol.
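With only two time levels, the fitted line passes through the two time means, so it reproduces the least squares means for 24 and 48 hours (82.28 and 162.75). A quick check using the reported estimates; `predicted_count` is a hypothetical helper name. Because the slope is positive, the fitted count has no interior maximum and is largest at the upper end of the tested range.

```python
INTERCEPT = 1.813333333
SLOPE = 3.352777778  # estimated change in count per additional hour


def predicted_count(hours):
    """Fitted count from the linear time model."""
    return INTERCEPT + SLOPE * hours


for hours in (24, 48):
    print(hours, round(predicted_count(hours), 2))

# The derivative d(count)/d(time) is the constant SLOPE > 0, so over the
# tested range the fitted count is largest at the upper limit, 48 hours.
print("increasing in time:", SLOPE > 0)
```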
Polynomial regression with “Temperature”
Temperature has 3 levels, so it is fit with a quadratic model.
Is the quadratic effect significant?
[Figure: fit plot of count vs. temperature]
Source     DF  Type I SS    Mean Square  F Value  Pr > F
temp       1   31329.00000  31329.00000  8.92     0.0033
temp*temp  1   56252.21333  56252.21333  16.01    <.0001
Yes: the p-value for the quadratic term is < 0.05, so we reject H0: β2 = 0.
Polynomial regression model
Parameter  Estimate      Standard Error  t Value  Pr > |t|
Intercept  -713.8343750  191.4866910     -3.73    0.0003
temp       47.1437500    11.2532848      4.19     <.0001
temp*temp  -0.6418750    0.1604124       -4.00    <.0001
$\widehat{\text{Count}} = -713.834 + 47.144 \times \text{Temp} - 0.642 \times \text{Temp}^2$
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W     0.966924   Pr < W     0.0011
Kolmogorov-Smirnov  D     0.067926   Pr > D     0.0888
Cramer-von Mises    W-Sq  0.182315   Pr > W-Sq  0.0089
Anderson-Darling    A-Sq  1.229754   Pr > A-Sq  <0.0050
Normality test: the Shapiro-Wilk p-value (0.0011) is < 0.05, so we reject the hypothesis of normality.
According to the regression model, the fitted count peaks at about 36.7 degrees (where the derivative 47.144 - 2 × 0.642 × Temp equals zero), and among the tested levels it is highest at 35 degrees. The current protocol for culturing this bacterium has the temperature at 35 degrees, so the results support this protocol.
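Setting the derivative of the fitted quadratic to zero gives Temp* = -b1 / (2 b2). A short sketch using the reported estimates (the names `B0`, `B1`, `B2` and `fitted` are ours); the second derivative 2 b2 is negative, so the stationary point is a maximum.

```python
B0, B1, B2 = -713.8343750, 47.1437500, -0.6418750  # reported estimates


def fitted(temp):
    """Fitted count from the quadratic temperature model."""
    return B0 + B1 * temp + B2 * temp ** 2


# d(count)/d(temp) = B1 + 2*B2*temp = 0  ->  temp* = -B1 / (2*B2)
temp_star = -B1 / (2 * B2)
second_derivative = 2 * B2  # negative, so temp_star is a maximum

print(f"stationary point: {temp_star:.2f} degrees (maximum: {second_derivative < 0})")
print(f"fitted maximum: {fitted(temp_star):.1f}")
for temp in (27, 35, 43):
    print(temp, round(fitted(temp), 2))
```

The stationary point lies between the tested levels 35 and 43 degrees, closer to 35. The fitted value at 27 degrees, 91.12, equals the temperature-27 least squares mean reported in the ANOVA output, as expected when a quadratic is fitted to three levels of a balanced design.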
Polynomial regression with “Concentration”
Concentration has 5 levels, so it is first fit with a quartic model.
Is the quartic effect significant?
Source               DF  Type I SS    Mean Square  F Value  Pr > F
conc                 1   103490.6133  103490.6133  32.46    <.0001
conc*conc            1   10761.6095   10761.6095   3.38     0.0682
conc*conc*conc       1   17925.8700   17925.8700   5.62     0.0190
conc*conc*conc*conc  1   9612.8805    9612.8805    3.02     0.0846
No: the p-value for the quartic term is > 0.05, so we do not reject H0: β4 = 0.
Is the cubic effect significant?
[Figure: fit plot of count vs. concentration]
Now fit it with a cubic model.
Source          DF  Type I SS    Mean Square  F Value  Pr > F
conc            1   103490.6133  103490.6133  32.02    <.0001
conc*conc       1   10761.6095   10761.6095   3.33     0.0701
conc*conc*conc  1   17925.8700   17925.8700   5.55     0.0198
Yes: the p-value for the cubic term is < 0.05, so we reject H0: β3 = 0.
Polynomial regression model
Parameter       Estimate      Standard Error  t Value  Pr > |t|
Intercept       608.922857    302.692620      2.01     0.0461
conc            -1960.154762  989.107532      -1.98    0.0494
conc*conc       2289.077381   1028.037019     2.23     0.0275
conc*conc*conc  -805.208333   341.898415      -2.36    0.0198
$\widehat{\text{Count}} = 608.923 - 1960.155 \times \text{Conc} + 2289.077 \times \text{Conc}^2 - 805.208 \times \text{Conc}^3$
Tests for Normality
Test                Statistic        p Value
Shapiro-Wilk        W     0.978016   Pr < W     0.0166
Kolmogorov-Smirnov  D     0.069177   Pr > D     0.0787
Cramer-von Mises    W-Sq  0.161641   Pr > W-Sq  0.0179
Anderson-Darling    A-Sq  1.095717   Pr > A-Sq  0.0073
Normality test: the Shapiro-Wilk p-value (0.0166) is < 0.05, so we reject the hypothesis of normality.
According to the regression model, the fitted count has a local maximum near Conc = 1.24%, and among the tested levels it is highest at 1.2%. The current protocol for culturing this bacterium has the concentration at 1.0%, so the results do NOT support this protocol.
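For the cubic, the derivative is itself a quadratic in Conc, so the stationary points follow from the quadratic formula, and the sign of the second derivative classifies each one. A sketch using the reported estimates (the helper names are ours):

```python
import math

B0, B1, B2, B3 = 608.922857, -1960.154762, 2289.077381, -805.208333  # reported


def fitted(conc):
    """Fitted count from the cubic concentration model."""
    return B0 + B1 * conc + B2 * conc ** 2 + B3 * conc ** 3


# d(count)/d(conc) = B1 + 2*B2*conc + 3*B3*conc^2 = 0 is a quadratic in conc.
a, b, c = 3 * B3, 2 * B2, B1
disc = b * b - 4 * a * c
roots = sorted((-b + s * math.sqrt(disc)) / (2 * a) for s in (-1, 1))

for r in roots:
    second_derivative = 2 * B2 + 6 * B3 * r
    kind = "maximum" if second_derivative < 0 else "minimum"
    print(f"conc = {r:.3f}%: local {kind}, fitted count {fitted(r):.1f}")

for conc in (0.6, 0.8, 1.0, 1.2, 1.4):
    print(conc, round(fitted(conc), 2))
```

This gives a local minimum near 0.65% and a local maximum near 1.24%. At the tested levels the fitted count is highest at 1.2% (about 161.6), ahead of 1.4% (about 141.8) and the current 1.0% (about 132.6).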
Conclusion
Polynomial regression models support the temperature in the current protocol for culturing Staphylococcus aureus (35 degrees). However, they do not support its time and concentration: the models favor 48 hours over 24 hours, and a tryptone concentration of about 1.2% rather than 1.0%.
An ANOVA test on the factorial RBD was done, and the reduced model fits better. Different levels in each treatment have significantly different effects on strain counts. There are significant interactions between time and temperature, between time and concentration, and among all three treatments; the interaction between temperature and concentration is not significant. Other pairwise comparisons can be found in the output.
The polynomial regression models do not meet the assumption of normality according to the Shapiro-Wilk test (p < 0.05 for all three models), although they do according to the Kolmogorov-Smirnov test (p > 0.05). This may make the regression results less reliable.
Reference
“Using EDA, ANOVA and Regression to Optimize some Microbiology Data.” Journal of Statistics Education, Volume 12, Number 2 (July 2004). http://www.amstat.org/publications/jse/v12n2/datasets.binnie.html