Tryptone task

The Tryptone Task

Group 7: Yuwu Chen, Alfonso R Croeze

Introduction

Staphylococcus aureus is a bacterium, commonly found on skin and in the respiratory tract, that can cause ailments such as skin infections and respiratory diseases.

Like other bacteria, Staphylococcus aureus can be grown in medical laboratories to aid in identifying and treating skin conditions.

Poor growth rates of methicillin-resistant Staphylococcus aureus (MRSA) in one laboratory prompted the investigators to experiment with different culturing conditions.

Five strains of MRSA were examined in this experiment. Due to their complex names, they are referred to as 1, 2, 3, 4, and 5 in the data.

Data Description

The tryptone dataset contains bacteria counts after the culturing of five strains of Staphylococcus aureus.

The data was collected by Gavin Cooper at the Auckland University of Technology, New Zealand. The full dataset:

http://www.amstat.org/publications/jse/datasets/Tryptone.dat.txt

No missing values.

Tests on (a) factorial models with interactions to identify significant factors, (b) optimal conditions estimated by partial differentiation.

Data Description

Treatments:

Time - In hours: 24 and 48

Temperature - Temperature of incubation in degrees Celsius: 27, 35, 43

Concentration - The concentration of the nutrient tryptone as a percentage: 0.6, 0.8, 1.0, 1.2, 1.4

Block:

Count column - Five count columns, one per strain: 1, 2, 3, 4, 5

Redundant variable:

Row - this is the case number

Response (dependent) variable:

Strain counts - Bacteria counts: 3 to 284

Data Management

Data transformation

The original dataset mixes two layouts: the counts are arranged multivariate-style, with one column per count column, while the levels of the time, temperature and concentration variables are listed univariate-style in three columns.

Row  Count1  Count2  Count3  Count4  Count5  Time  Temp  Conc
1    9       3       10      14      33      24    27    0.6
2    16      12      26      20      31      24    27    0.8

Strain counts, which are analyzed in a univariate procedure, are recorded in different count columns, so they must be placed in a single column. The count column variable should be in its own single column as well.

The data was transformed by SAS code (excerpt):

    input row count1 count2 count3 count4 count5 time temp conc;
    column = 1; count = count1; output strain;
    column = 2; count = count2; output strain;

The new dataset:

Obs  time  temp  conc  column  count
1    24    27    0.6   1       9
2    24    27    0.6   2       3
3    24    27    0.6   3       10

The new dataset strain and the complete SAS code are in the output files.
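The same wide-to-long reshaping can be sketched in Python. This is only an illustration of the idea, not the authors' SAS program; the function name to_long is hypothetical, and the two input rows are the ones shown in the original-data excerpt.

```python
# Illustrative sketch of the wide-to-long reshaping (the authors used SAS).
# Each wide row is: row, count1..count5, time, temp, conc.
wide_rows = [
    (1, 9, 3, 10, 14, 33, 24, 27, 0.6),
    (2, 16, 12, 26, 20, 31, 24, 27, 0.8),
]


def to_long(rows):
    """Emit one (time, temp, conc, column, count) record per count column."""
    long_rows = []
    for row in rows:
        # Drop the redundant case number, keep the five counts together.
        _, *counts, time, temp, conc = row
        for column, count in enumerate(counts, start=1):
            long_rows.append((time, temp, conc, column, count))
    return long_rows


strain = to_long(wide_rows)
for obs, record in enumerate(strain, start=1):
    print(obs, *record)
```

Each wide row yields five long records, so the full 30-row dataset becomes 150 observations.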

Data Management

Balance check:

With the treatment “time” held fixed, the tables below show that all 15 combinations of the other two treatments (3 temperatures × 5 concentrations) exist, and that each combination has the same number of replicates.

Similarly, when concentration or temperature is held fixed, the frequency tables show that the experiment is balanced. (These results are shown in the output files.) α = 0.05 is used for the entire analysis.

Table 1 of temp by conc, controlling for time=24 (cell entries are frequencies)

temp \ conc  0.6  0.8  1    1.2  1.4  Total
27           5    5    5    5    5    25
35           5    5    5    5    5    25
43           5    5    5    5    5    25
Total        15   15   15   15   15   75

Table 2 of temp by conc, controlling for time=48 (cell entries are frequencies)

temp \ conc  0.6  0.8  1    1.2  1.4  Total
27           5    5    5    5    5    25
35           5    5    5    5    5    25
43           5    5    5    5    5    25
Total        15   15   15   15   15   75
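A balance check of this kind amounts to counting the records in every treatment cell and confirming the counts are all equal. The sketch below is a Python illustration under an assumption: it uses a stand-in list of design points built from the treatment levels listed earlier, rather than the real data file. With the real data, the reshaped records would replace the design list.

```python
from collections import Counter
from itertools import product

times = [24, 48]
temps = [27, 35, 43]
concs = [0.6, 0.8, 1.0, 1.2, 1.4]
columns = [1, 2, 3, 4, 5]

# Stand-in for the reshaped dataset: one record per design point (counts omitted).
design = list(product(times, temps, concs, columns))

# Number of replicates in each (time, temp, conc) cell.
cell_sizes = Counter((time, temp, conc) for time, temp, conc, _ in design)

# The design is balanced when every cell has the same number of replicates.
is_balanced = len(set(cell_sizes.values())) == 1
print(len(design), len(cell_sizes), is_balanced)
```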

Data Summary

Differences in means? Symmetric data? Homogeneous variances?

Figures (left to right): distribution of count by time, temperature and concentration.

First impressions from the box plots:

In each treatment, the means at different levels are quite different.

In the temperature treatment, the data is less symmetric, so possibly not normal. The other two treatments look more symmetric.

In each treatment, the variances may not be equal across levels.

Method Description

Step 1: Test on factorial models with interactions to identify significant factors.

ANOVA test on factorial randomized block design (RBD), full model:

The variances are estimated separately (not pooled).

ANOVA test on factorial RBD, reduced model:

Homogeneous variance is assumed and the variance is pooled.

Step 2: Test for optimal conditions estimated by partial differentiation.

Multiple polynomial regression

The current protocol for culturing this bacterium sets the time at 24 hours, the temperature at 35 degrees Celsius and the tryptone concentration at 1.0%.

Step 1: Test on factorial models with interactions to identify significant factors

Full model vs. reduced model: which one is better?

Full model:

Fit Statistics
-2 Res Log Likelihood       1107.3
AIC (Smaller is Better)     1169.3
AICC (Smaller is Better)    1191.9
BIC (Smaller is Better)     1157.2

Reduced model:

Fit Statistics
-2 Res Log Likelihood       1148.5
AIC (Smaller is Better)     1152.5
AICC (Smaller is Better)    1152.6
BIC (Smaller is Better)     1151.7

The reduced model has the smaller AIC value (1152.5 vs. 1169.3), which indicates that it is the better model.
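The two AIC values can be related to the reported likelihoods with a small Python sketch. It assumes the usual definition for restricted-likelihood fits, AIC = (-2 Res Log Likelihood) + 2d, where d is the number of estimated covariance parameters; the source does not state which definition the software used, so the implied parameter counts are an inference, not a reported result.

```python
def implied_parameter_count(neg2_res_loglik, aic):
    """Solve AIC = -2 Res Log Likelihood + 2d for d (assumed AIC definition)."""
    return (aic - neg2_res_loglik) / 2


full_params = round(implied_parameter_count(1107.3, 1169.3))
reduced_params = round(implied_parameter_count(1148.5, 1152.5))
aic_difference = round(1169.3 - 1152.5, 1)

print("full model covariance parameters:", full_params)
print("reduced model covariance parameters:", reduced_params)
print("AIC(full) - AIC(reduced):", aic_difference)
```

Under that assumption, the full model fits better in raw likelihood (1107.3 vs. 1148.5) but pays a much larger penalty for its extra variance parameters, which is why the reduced model ends up with the smaller AIC.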

The sources of variation and degrees of freedom:

Assumptions: Independence, normal distribution of residuals, homogeneity of variances

Source                              Degrees of freedom     d.f.
Tmt1 (Time)                         t1-1                   1
Tmt2 (Temperature)                  t2-1                   2
Tmt3 (Concentration)                t3-1                   4
Block (Count column)                b-1                    4
Interaction1 (Tmt1 * Tmt2)          (t1-1)(t2-1)           2
Interaction2 (Tmt1 * Tmt3)          (t1-1)(t3-1)           4
Interaction3 (Tmt2 * Tmt3)          (t2-1)(t3-1)           8
Interaction4 (Tmt1 * Tmt2 * Tmt3)   (t1-1)(t2-1)(t3-1)     8
Experimental Error                  (formula below)        116
Total                               bt1t2t3-1              149

Experimental Error d.f. = (b-1)[(t1-1) + (t2-1) + (t3-1) + (t1-1)(t2-1) + (t1-1)(t3-1) + (t2-1)(t3-1) + (t1-1)(t2-1)(t3-1)] = 4 × 29 = 116

Block interactions are pooled into a single error term because the RBD assumes no block-by-treatment interaction.
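The degrees of freedom can be checked with a few lines of arithmetic. The Python sketch below uses the level counts from the data description (2 times, 3 temperatures, 5 concentrations, 5 blocks); the variable names are illustrative only.

```python
t1, t2, t3, b = 2, 3, 5, 5  # levels of time, temperature, concentration; blocks

treatment_df = {
    "time": t1 - 1,
    "temp": t2 - 1,
    "conc": t3 - 1,
    "time*temp": (t1 - 1) * (t2 - 1),
    "time*conc": (t1 - 1) * (t3 - 1),
    "temp*conc": (t2 - 1) * (t3 - 1),
    "time*temp*conc": (t1 - 1) * (t2 - 1) * (t3 - 1),
}
block_df = b - 1

# All block-by-treatment interactions are pooled into the error term.
error_df = (b - 1) * sum(treatment_df.values())
total_df = b * t1 * t2 * t3 - 1

for source, df in treatment_df.items():
    print(source, df)
print("block", block_df)
print("error", error_df)
print("total", total_df)
```

The treatment, block and error degrees of freedom add up to the total, which is a quick consistency check on the table.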

ANOVA Test on factorial RBD, reduced model

According to the factorial RBD (reduced) model, do different levels in each treatment have significantly different effects on strain counts?

Type 3 Tests of Fixed Effects

Effect  Num DF  Den DF  F Value  Pr > F
time    1       116     444.27   <.0001
temp    2       116     80.12    <.0001
conc    4       116     64.86    <.0001

Yes: the p-values of all three treatments are <0.05, so we reject H0: μ1 = μ2 = … = μt for each treatment.

Is there interaction between treatments?

Type 3 Tests of Fixed Effects

Effect          Num DF  Den DF  F Value  Pr > F
time*temp       2       116     38.07    <.0001
time*conc       4       116     3.99     0.0046
temp*conc       8       116     0.85     0.5613
time*temp*conc  8       116     2.17     0.0343

The hypothesis of no significant interaction effect between time & temp was rejected.

The hypothesis of no significant interaction effect between time & conc was rejected.

The hypothesis of no significant interaction effect between temp & conc was NOT rejected.

The hypothesis of no significant interaction effect among the three treatments was rejected.


Which pairs of means within one treatment are different, at given levels of the other treatments?

The Least Squares Means table gives the least squares estimate, the standard error of the estimate, etc.:

Least Squares Means

Effect  time  temp  conc  Estimate  Standard Error  DF   t Value  Pr > |t|  Alpha  Lower    Upper
time    24                82.2800   3.4399          116  23.92    <.0001    0.05   75.4668  89.0932
time    48                162.75    3.4399          116  47.31    <.0001    0.05   155.93   169.56
temp          27          91.1200   3.9340          116  23.16    <.0001    0.05   83.3281  98.9119

Pairwise comparisons with Tukey adjustments are shown in the “Differences of Least Squares Means” table.

Saxton’s macro was applied to do a range test with the LSMeans output, e.g.:

Effect=time Method=Tukey-Kramer(P<0.05) Set=1

Obs  time  temp  conc  Estimate  Standard Error  Alpha  Lower    Upper    Letter Group
1    48    _     _     162.75    3.4399          0.05   155.93   169.56   A
2    24    _     _     82.2800   3.4399          0.05   75.4668  89.0932  B

The complete tables mentioned above are available in the output file.


The last part of the ANOVA tests the hypothesis of normality:

The p-value is >0.05, so we fail to reject the hypothesis of normality in the residual distribution.

Contrasts to test linear/curved trend

Temperature and concentration treatments are quantitative and equally spaced, having 3 levels and 5 levels respectively. (Time has only 2 levels.)

All of the contrasts are significant, which indicates both linear and curved (higher-order) trends in the response to temperature and to concentration.

Contrasts

Label      Num DF  Den DF  F Value  Pr > F
linear     1       116     57.32    <.0001
quadratic  1       116     102.93   <.0001
linear     1       116     189.36   <.0001
quadratic  1       116     19.69    <.0001
cubic      1       116     32.80    <.0001
quartic    1       116     17.59    <.0001

First two rows are test results for the treatment Temp.

Last four rows are test results for the treatment Conc.
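Trend contrasts for equally spaced levels are normally built from orthogonal polynomial coefficients. The presentation does not list the coefficients it used, so the Python sketch below uses the standard textbook sets for 3 and 5 equally spaced levels as an assumption, and checks that each set is a valid family of mutually orthogonal contrasts.

```python
from itertools import combinations

# Standard orthogonal polynomial coefficients for equally spaced levels
# (assumed here; the coefficients actually used are not shown in the slides).
temp_contrasts = {  # 3 levels: 27, 35, 43
    "linear": (-1, 0, 1),
    "quadratic": (1, -2, 1),
}
conc_contrasts = {  # 5 levels: 0.6, 0.8, 1.0, 1.2, 1.4
    "linear": (-2, -1, 0, 1, 2),
    "quadratic": (2, -1, -2, -1, 2),
    "cubic": (-1, 2, 0, -2, 1),
    "quartic": (1, -4, 6, -4, 1),
}


def is_orthogonal_set(contrasts):
    """True when every contrast sums to zero and all pairs have zero dot product."""
    sums_to_zero = all(sum(c) == 0 for c in contrasts.values())
    pairwise_zero = all(
        sum(x * y for x, y in zip(a, b)) == 0
        for a, b in combinations(contrasts.values(), 2)
    )
    return sums_to_zero and pairwise_zero


print(is_orthogonal_set(temp_contrasts), is_orthogonal_set(conc_contrasts))
```

With t levels there are t-1 such contrasts, which is why temperature has two rows and concentration four, each with 1 numerator degree of freedom.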

Tests for Normality (ANOVA residuals)

Test                Statistic          p Value
Shapiro-Wilk        W      0.988251    Pr < W      0.2392
Kolmogorov-Smirnov  D      0.050081    Pr > D      >0.1500
Cramer-von Mises    W-Sq   0.040777    Pr > W-Sq   >0.2500
Anderson-Darling    A-Sq   0.333665    Pr > A-Sq   >0.2500

Step 2: Test for optimal conditions estimated by partial differentiation

Multiple polynomial regression

Three simple polynomial regressions are done separately, one for each treatment.

Sequentially adjusted Type I SS were used to determine whether the lower-order polynomial model is as good as the one with a higher-order term.

Regression model:

Y_i = β0 + β1 X_i + β2 X_i^2 + … + βk X_i^k + e_i

Based on the regression model, partial differentiation is used to determine the optimal conditions. (Not displayed in this presentation.)

Also, the fit plots are useful in finding the maxima.

Assumptions: Independence, normal distribution of residuals, homogeneity of variances

Polynomial regression with “Time”

Is the linear effect significant?

Time has only 2 levels, so it is fit with a linear model.

Source  DF  Type I SS    Mean Square  F Value  Pr > F
time    1   242808.1667  242808.1667  99.48    <.0001

Yes: p-value for linear <0.05, reject H0: β1 = 0.

Fit plot (count vs. time)

Polynomial regression model

Parameter  Estimate     Standard Error  t Value  Pr > |t|
Intercept  1.813333333  12.75597911     0.14     0.8872
time       3.352777778  0.33614956      9.97     <.0001

Count = 1.813 + 3.353*Time

Normality test: p-value <0.05 (Shapiro-Wilk), reject the hypothesis of normality

Tests for Normality

Test                Statistic          p Value
Shapiro-Wilk        W      0.97416     Pr < W      0.0063
Kolmogorov-Smirnov  D      0.064387    Pr > D      0.1302
Cramer-von Mises    W-Sq   0.15658     Pr > W-Sq   0.0204
Anderson-Darling    A-Sq   1.023263    Pr > A-Sq   0.0106

According to the regression model, the strain count increases with time: the predicted count at 48 hours is higher than at 24 hours. The current protocol for culturing this bacterium sets the time at 24 hours, so the statistical results do NOT support this protocol.
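As a quick check of the fitted line, the Python sketch below evaluates it at the two tested times using the reported parameter estimates. The function name is illustrative. Because time has only two levels, the line passes through the two group means, so the predictions should agree with the least squares means for time reported in the ANOVA section (82.28 and 162.75).

```python
# Reported parameter estimates for Count = b0 + b1 * Time.
b0, b1 = 1.813333333, 3.352777778


def predicted_count(hours):
    """Predicted strain count from the fitted linear model."""
    return b0 + b1 * hours


pred_24 = predicted_count(24)
pred_48 = predicted_count(48)
print(round(pred_24, 2), round(pred_48, 2))
```

With a positive slope and no tested time beyond 48 hours, the model has no interior maximum; it can only say that 48 hours gives a higher predicted count than 24 hours.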

Polynomial regression with “Temperature”

Is the quadratic effect significant?

Temperature has 3 levels, so it is fit with a quadratic model.

Source     DF  Type I SS    Mean Square  F Value  Pr > F
temp       1   31329.00000  31329.00000  8.92     0.0033
temp*temp  1   56252.21333  56252.21333  16.01    <.0001

Yes: p-value for quadratic <0.05, reject H0: β2 = 0.

Fit plot (count vs. temperature)

Polynomial regression model

Parameter  Estimate      Standard Error  t Value  Pr > |t|
Intercept  -713.8343750  191.4866910     -3.73    0.0003
temp       47.1437500    11.2532848      4.19     <.0001
temp*temp  -0.6418750    0.1604124       -4.00    <.0001

Count = -713.834 + 47.144*Temp - 0.642*Temp^2

Normality test: p-value <0.05, so we reject the hypothesis of normality

Anderson-Darling    A-Sq   1.229754    Pr > A-Sq   <0.0050

According to the regression model, the fitted curve peaks at about 36.7 degrees (where dCount/dTemp = 47.144 - 2(0.642)*Temp = 0), and of the three tested temperatures 35 degrees gives the highest predicted count. The current protocol for culturing this bacterium sets the temperature at 35 degrees, so the results support this protocol.
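The differentiation step for temperature, which the presentation does not display, can be sketched in Python. Setting the derivative of the fitted quadratic to zero gives the stationary point at Temp = -b1 / (2*b2); the variable and function names below are illustrative, and the coefficients are the reported estimates.

```python
# Reported estimates for Count = b0 + b1 * Temp + b2 * Temp^2.
b0, b1, b2 = -713.8343750, 47.1437500, -0.6418750


def predicted_count(temp):
    """Predicted strain count from the fitted quadratic model."""
    return b0 + b1 * temp + b2 * temp**2


# dCount/dTemp = b1 + 2 * b2 * Temp = 0; b2 < 0 makes this a maximum.
vertex = -b1 / (2 * b2)

tested_temps = [27, 35, 43]
best_tested = max(tested_temps, key=predicted_count)

print("fitted maximum near", round(vertex, 1), "degrees")
for temp in tested_temps:
    print(temp, round(predicted_count(temp), 1))
print("best tested temperature:", best_tested)
```

The fitted maximum falls between the tested levels 35 and 43 but much closer to 35, and 35 has the highest predicted count among the temperatures actually tested.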

Polynomial regression with “Concentration”

Is the quartic effect significant?

Concentration has 5 levels, so we first fit it with a quartic model.

Source               DF  Type I SS    Mean Square  F Value  Pr > F
conc                 1   103490.6133  103490.6133  32.46    <.0001
conc*conc            1   10761.6095   10761.6095   3.38     0.0682
conc*conc*conc       1   17925.8700   17925.8700   5.62     0.0190
conc*conc*conc*conc  1   9612.8805    9612.8805    3.02     0.0846

No: p-value for quartic >0.05, do not reject H0: β4 = 0.

Is the cubic effect significant?

Now fit it with a cubic model.

Source          DF  Type I SS    Mean Square  F Value  Pr > F
conc            1   103490.6133  103490.6133  32.02    <.0001
conc*conc       1   10761.6095   10761.6095   3.33     0.0701
conc*conc*conc  1   17925.8700   17925.8700   5.55     0.0198

Yes: p-value for cubic <0.05, reject H0: β3 = 0.

Fit plot (count vs. concentration)

Polynomial regression model

Parameter       Estimate      Standard Error  t Value  Pr > |t|
Intercept       608.922857    302.692620      2.01     0.0461
conc            -1960.154762  989.107532      -1.98    0.0494
conc*conc       2289.077381   1028.037019     2.23     0.0275
conc*conc*conc  -805.208333   341.898415      -2.36    0.0198

Count = 608.923 - 1960.155*Conc + 2289.077*Conc^2 - 805.208*Conc^3

Normality test: p-value <0.05, reject the hypothesis of normality

Anderson-Darling    A-Sq   1.095717    Pr > A-Sq   0.0073

According to the regression model, the strain count has a maximum at about Conc = 1.24% (the root of dCount/dConc = 0 at which the second derivative is negative), close to the tested level of 1.2%. The current protocol for culturing this bacterium sets the concentration at 1.0%, so the results do NOT support this protocol.
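For the cubic model, the derivative is a quadratic in concentration, so the stationary points follow from the quadratic formula and the sign of the second derivative tells a maximum from a minimum. The Python sketch below illustrates this with the reported estimates; it is not the authors' own calculation, which is not shown, and the names are illustrative.

```python
import math

# Reported estimates for Count = b0 + b1*C + b2*C^2 + b3*C^3.
b0, b1, b2, b3 = 608.922857, -1960.154762, 2289.077381, -805.208333


def predicted_count(conc):
    """Predicted strain count from the fitted cubic model."""
    return b0 + b1 * conc + b2 * conc**2 + b3 * conc**3


# dCount/dC = b1 + 2*b2*C + 3*b3*C^2 = 0 is a quadratic in C.
qa, qb, qc = 3 * b3, 2 * b2, b1
discriminant = qb**2 - 4 * qa * qc
roots = sorted(
    (-qb + sign * math.sqrt(discriminant)) / (2 * qa) for sign in (1, -1)
)

# Second derivative 2*b2 + 6*b3*C: negative at a maximum, positive at a minimum.
maxima = [c for c in roots if 2 * b2 + 6 * b3 * c < 0]
minima = [c for c in roots if 2 * b2 + 6 * b3 * c > 0]

print("local maximum near", round(maxima[0], 2), "%")
print("local minimum near", round(minima[0], 2), "%")
print("predicted at 1.0%:", round(predicted_count(1.0), 1))
print("predicted at 1.2%:", round(predicted_count(1.2), 1))
```

The local maximum lies inside the tested range of 0.6% to 1.4% and close to the tested level 1.2%, where the predicted count is higher than at the protocol value of 1.0%.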

Conclusion

Polynomial regression models support the temperature in the current protocol for culturing Staphylococcus aureus. However, the models do not support the time and concentration in the protocol.

An ANOVA test on the factorial RBD was done, and the reduced model is better. Different levels in each treatment have significantly different effects on strain counts. There are significant interaction effects between time & temperature and between time & concentration, as well as a significant three-way interaction; the interaction between temperature & concentration is not significant. Other pair-wise comparisons can be found in the output.

The polynomial regression models did not meet the assumption of normality according to the Shapiro-Wilk test (although they do according to the Kolmogorov-Smirnov test). This might make the data analysis less reliable.

Reference

“Using EDA, ANOVA and Regression to Optimize some Microbiology Data.” Journal of Statistics Education, Volume 12, Number 2 (July 2004). http://www.amstat.org/publications/jse/v12n2/datasets.binnie.html
