7/30/2019 C2166_ch29_Principle of Medical Statistics
29
Analysis of Variance
CONTENTS
29.1 Conceptual Background
29.1.1 Clinical Illustration
29.1.2 Analytic Principles
29.2 Fisher's F Ratio
29.3 Analysis-of-Variance Table
29.4 Problems in Performance
29.5 Problems of Interpretation
29.5.1 Quantitative Distinctions
29.5.2 Stochastic Nonsignificance
29.5.3 Stochastic Significance
29.5.4 Substantive Decisions
29.6 Additional Applications of ANOVA
29.6.1 Multi-Factor Arrangements
29.6.2 Nested Analyses
29.6.3 Analysis of Covariance
29.6.4 Repeated-Measures Arrangements
29.7 Non-Parametric Methods of Analysis
29.8 Problems in Analysis of Trends
29.9 Use of ANOVA in Published Literature
References
The targeted analytic method called analysis of variance, sometimes cited acronymically as ANOVA,
was devised (like so many other procedures in statistics) by Sir Ronald A. Fisher. Although often marking
the conceptual boundary between elementary and advanced statistics, or between amateur fan and
professional connoisseur, ANOVA is sometimes regarded and taught as elementary enough to be used
for deriving subsequent simple procedures, such as the t test. Nevertheless, ANOVA is used much less
often today than formerly, for reasons to be noted in the discussions that follow.
29.1 Conceptual Background
The main distinguishing feature of ANOVA is that the independent variable contains polytomous
categories, which are analyzed simultaneously in relation to a dimensional or ordinal dependent (outcome) variable.
Suppose treatments A, B, and C are tested for effects on blood pressure in a randomized trial. When
the results are examined, we want to determine whether one of the treatments differs significantly from
the others. With the statistical methods available thus far, the only way to answer this question would
be to do multiple comparisons for pairs of groups, contrasting results in group A vs. B, A vs. C, and B vs. C. If more ambitious, we could compare A vs. the combined results of B and C, or group B vs. the
combined results of A and C, and so on. We could work out various other arrangements, but in each
instance, the comparison would rely on contrasting two collected groups, because we currently know
no other strategy.
The analysis of variance allows a single simultaneous comparison for three or more groups. The result
becomes a type of screening test that indicates whether at least one group differs significantly from the
others, but further examination is needed to find the distinctive group(s). Despite this disadvantage, ANOVA
has been a widely used procedure, particularly by professional statisticians, who often like to apply it even
when simpler tactics are available. For example, when data are compared for only two groups, a t test or
Z test is simpler, and, as noted later, produces exactly the same results as ANOVA. Nevertheless, many
persons will do the two-group comparison (and report the results) with an analysis of variance.
29.1.1 Clinical Illustration
Although applicable in experimental trials, ANOVA has been most often used for observational studies.
A real-world example, shown in Figure 29.1, contains data for the survival times, in months, of a random
sample of 60 patients with lung cancer,1,2 having one of the four histologic categories of WELL (well-differentiated), SMALL (small cell), ANAP (anaplastic), and CYTOL (cytology only). The other variable
(the five categories of TNM stage) listed in Figure 29.1 will be considered later. The main analytic
question now is whether histology in any of these groups has significantly different effects on survival.
29.1.1.1 Direct Examination The best thing to do with these data, before any formal statistical analyses begin, is to examine the results directly. In this instance, we can readily determine the group
sizes, means, and standard deviations for each of the four histologic categories and for the total. The results,
shown in Table 29.1, immediately suggest that the data do not have Gaussian distributions, because the
standard deviations are almost all larger than the means. Nevertheless, to allow the illustration to proceed,
the results can be further appraised. They show that the well-differentiated and small-cell groups, as
expected clinically, have the highest and lowest mean survival times, respectively. Because of relatively
small group sizes and non-Gaussian distributions, however, the distinctions may not be stochastically significant.
Again before applying any advanced statistics, we can check these results stochastically by using simple
t tests. For the most obvious comparison of WELL vs. SMALL, we can use the components of Formula [13.7] to calculate sp = √{[21(26.56)² + 10(3.77)²]/(21 + 10)} = 21.96; √[(1/nA) + (1/nB)] = √[(1/22) + (1/11)] = .369; and X̄A − X̄B = 24.43 − 4.45 = 19.98. These data could then be entered into Formula [13.7] to produce t = 19.98/[(21.96)(.369)] = 2.47. At 31 d.f., the associated 2P value is about .02. From this distinction, we might also expect that all the other paired comparisons will not be stochastically significant. (If you check the calculations, you will find that the appropriate 2P values are all >.05.)
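The same arithmetic can be reproduced directly from the summary statistics in Table 29.1. The sketch below (Python) follows the pooled-variance calculation just described; the variable names are illustrative, not part of the original formula's notation:

```python
# Pooled two-group t test from summary statistics (WELL vs. SMALL, Table 29.1)
n_well, mean_well, sd_well = 22, 24.43, 26.56
n_small, mean_small, sd_small = 11, 4.45, 3.77

# Pooled standard deviation from the two group variances
sp = (((n_well - 1) * sd_well**2 + (n_small - 1) * sd_small**2)
      / (n_well + n_small - 2)) ** 0.5          # about 21.96
factor = (1 / n_well + 1 / n_small) ** 0.5      # about .369
t = (mean_well - mean_small) / (sp * factor)    # about 2.47 at 31 d.f.
```

Small discrepancies from the text's 2.47 reflect rounding in the intermediate hand calculations.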
29.1.1.2 Holistic and Multiple-Comparison Problems The foregoing comparison indicates a significant difference in mean survival between the WELL and SMALL groups, but does
not answer the holistically phrased analytic question, which asked whether histology has significant
effects in any of the four groups in the entire collection. Besides, an argument could be made, using
TABLE 29.1
Summary of Survival Times in Four Histologic Groups
of Patients with Lung Cancer in Figure 29.1

Histologic   Group   Mean       Standard
Category     Size    Survival   Deviation
WELL         22      24.43      26.56
SMALL        11       4.45       3.77
ANAP         18      10.87      23.39
CYTOL         9      11.54      13.47
Total        60      14.77      22.29
distinctions discussed in Section 25.2.1.1, that the contrast of WELL vs. SMALL was only one of the
six (4 × 3/2) possible paired comparisons for the four histologic categories. With the Bonferroni correction, the working level of α for each of the six comparisons would be .05/6 = .008. With the latter criterion, the 2P value of about .02 for WELL vs. SMALL would no longer be stochastically significant.
We therefore need a new method to answer the original question. Instead of examining six pairs of
contrasted means, we can use a holistic approach by finding the grand mean of the data, determining
the deviations of each group of data from that mean, and analyzing those deviations appropriately.
OBS    ID   HISTOL  TNMSTAGE  SURVIVE
  1    62   WELL    I           82.3
  2   107   WELL    II           5.3
  3   110   WELL    IIIA        29.6
  4   157   WELL    I           20.3
  5   163   WELL    I           54.9
  6   246   SMALL   I           10.3
  7   271   WELL    IIIB         1.6
  8   282   ANAP    IIIA         7.6
  9   302   WELL    I           28.0
 10   337   CYTOL   I           12.8
 11   344   WELL    II           4.0
 12   352   ANAP    IIIA         1.3
 13   371   WELL    IIIB        14.1
 14   387   SMALL   IIIA         0.2
 15   428   SMALL   II           6.8
 16   466   ANAP    IIIB         1.4
 17   513   ANAP    I            0.1
 18   548   ANAP    IV           1.8
 19   581   ANAP    IV           6.0
 20   605   CYTOL   IV           1.0
 21   609   CYTOL   IV           6.2
 22   628   SMALL   IV           4.4
 23   671   SMALL   IV           5.5
 24   764   SMALL   IV           0.3
 25   784   ANAP    IV           1.6
 26   804   WELL    I           12.2
 27   806   ANAP    IIIB         6.5
 28   815   WELL    I           39.9
 29   852   WELL    IIIB         4.5
 30   855   WELL    II           1.6
 31   891   CYTOL   IIIB         8.1
 32   892   WELL    IIIB        62.0
 33   931   CYTOL   IIIB         8.8
 34   998   WELL    IIIB         0.2
 35  1039   SMALL   IV           0.6
 36  1044   ANAP    II          19.3
 37  1054   WELL    IIIB         0.6
 38  1057   ANAP    I           10.9
 39  1155   ANAP    I            0.2
 40  1192   SMALL   IV          11.2
 41  1223   ANAP    IV           0.9
 42  1228   ANAP    II          27.9
 43  1303   ANAP    IIIB         2.9
 44  1309   ANAP    II          99.9
 45  1317   ANAP    IV           4.7
 46  1355   CYTOL   IIIB         1.8
 47  1361   WELL    IV           1.0
 48  1380   CYTOL   IV          10.6
 49  1405   SMALL   IV           3.7
 50  1444   WELL    II          55.9
 51  1509   SMALL   IV           3.4
 52  1515   WELL    I           79.7
 53  1521   ANAP    IV           1.9
 54  1556   ANAP    IIIB         0.8
 55  1567   SMALL   IV           2.5
 56  1608   CYTOL   I            8.6
 57  1612   WELL    IIIA        13.3
 58  1666   CYTOL   IV          46.0
 59  1702   WELL    II          23.9
 60  1738   WELL    II           2.6
FIGURE 29.1 Printout of data on histologic type, TNM stage, and months of survival in a random sample of 60 patients with primary cancer of the lung. [OBS = observation number in sample; ID = original identification number; HISTOL = histologic type; TNMSTAGE = one of five ordinal anatomic TNM stages for lung cancer; SURVIVE = survival time (mos.); WELL = well-differentiated; SMALL = small cell; ANAP = anaplastic; CYTOL = cytology only.]
Many different symbols have been used to indicate the entities that are involved. In the illustration here, Yij will represent the target variable (survival time) for person i in group j. For example, if WELL is the first group in Figure 29.1, the eighth person in the group has Y8,1 = 4.0. The mean of the values in group j will be Ȳj = ΣYij/nj, where nj is the number of members in the group. Thus, for the last group (cytology) in Table 29.1, n4 = 9, ΣYi,4 = 103.9, and Ȳ4 = 103.9/9 = 11.54. The grand mean, Ḡ, will be Σ(njȲj)/N, where N = Σnj = size of the total group under analysis. From the data in Table 29.1, Ḡ = [(22 × 24.43) + (11 × 4.45) + (18 × 10.87) + (9 × 11.54)]/60 = 885.93/60 = 14.77.

We can now determine the distance, Ȳj − Ḡ, between each group's mean and the grand mean. For the ANAP group, the distance is 10.87 − 14.77 = −3.90. For the other three groups, the distances are −3.23 for CYTOL, −10.32 for SMALL, and +9.66 for WELL. This inspection confirms that the means of the SMALL and WELL groups are most different from the grand mean, but the results contain no attention to stochastic variation in the data.
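These calculations are easy to reproduce. A brief Python sketch, using the group sizes and means of Table 29.1 (the dictionary layout is an illustrative choice, not the book's notation):

```python
# Grand mean and group-mean deviations from the summary data in Table 29.1
groups = {"WELL": (22, 24.43), "SMALL": (11, 4.45),
          "ANAP": (18, 10.87), "CYTOL": (9, 11.54)}   # (size, mean survival)

N = sum(n for n, _ in groups.values())                # 60 patients in total
grand = sum(n * m for n, m in groups.values()) / N    # 885.93/60 = 14.77
deviations = {name: m - grand for name, (n, m) in groups.items()}
```

The deviations match the inspection above: about −3.90 for ANAP, −3.23 for CYTOL, −10.32 for SMALL, and +9.66 for WELL.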
29.1.2 Analytic Principles
To solve the stochastic challenge, we can use ANOVA, which, like many other classical statistical strategies, expresses real-world phenomena with mathematical models. We have already used such models both implicitly and explicitly. In univariate statistics, the mean, Ȳ, was an implicit model for fitting a group of data from only the values in the single set of data. The measured deviations from that model, Yi − Ȳ, were then converted to the group's basic variance, Σ(Yi − Ȳ)².

In bivariate statistics for the associations in Chapters 18 and 19, we used an explicit model based on an additional variable, expressed algebraically as Ŷi = a + bXi. We then compared variances for three sets of deviations: Yi − Ŷi, between the items of data and the explicit model; Yi − Ȳ, between the items of data and the implicit model; and Ŷi − Ȳ, between the explicit and implicit models. The group variances or sums of squares associated with these deviations were called residual (or error) for Σ(Yi − Ŷi)², basic for Σ(Yi − Ȳ)², and model for Σ(Ŷi − Ȳ)².
29.1.2.1 Distinctions in Nomenclature The foregoing symbols and nomenclature have been simplified for the sake of clarity. In strict statistical reasoning, any set of observed data is regarded as a sample from an unobserved population whose parameters are being estimated from the data. If modeled with a straight line, the parametric population would be cited as Y = α + βX. When the results for the observed data are expressed as Ŷi = a + bXi, the coefficients a and b are estimates of the corresponding α and β parameters.

Also in strict reasoning, variance is an attribute of the parametric population. Terms such as Σ(Yi − Ȳ)² or Σ(Yi − Ŷi)², which are used to estimate the parametric variances, should be called sums of squares, not group variances. The linguistic propriety has been violated here for two reasons: (1) the distinctions are more easily understood when called variance, and (2) the violations constantly appear in both published literature and computer printouts. The usage here, although a departure from strict formalism, is probably better than in many discussions elsewhere, where the sums of squares are called variances instead of group variances.
Another issue in nomenclature is syntactical rather than mathematical. In most English prose,
between is used for a distinction of two objects, and among for more than two. Nevertheless, in the
original description of the analysis of variance, R. A. Fisher used the preposition between rather than
among when more than two groups or classes were involved. The term between groups has been
perpetuated by subsequent writers, much to the delight of English-prose pedants who may denounce
the absence of literacy in mathematical technocracy. Nevertheless, Fisher and his successors have been
quite correct in maintaining between. Its use for the cited purpose is approved by diverse high-echelon
authorities, including the Oxford English Dictionary, which states that between has been, from its
earliest appearance, extended to more than two.3 [As one of the potential pedants, I was ready to use among in this text until I checked the dictionary and became enlightened.]
29.1.2.2 Partition of Group Variance The same type of partitioning that was used for group variance in linear regression is also applied in ANOVA. Conceptually, however, the models are
expressed differently. Symbolically, each observation can be labelled Yij, with j representing the group and i, the person (or other observed entity) within the group. The grand mean, Ḡ, is used for the implicit model when the basic group or system variance, Σ(Yi − Ḡ)², is summed for the individual values of Yi in all of the groups. The individual group means, Ȳj, become the explicit models when the total system is partitioned into groups. The residual group variance is the sum of the values of (Yi − Ȳj)² within each of the groups. [In more accurate symbolism, the two cited group variances would be written with double subscripts and summations as ΣΣ(Yij − Ḡ)² and ΣΣ(Yij − Ȳj)².] The model group variance, summed for each group of nj members with group mean Ȳj, is Σnj(Ȳj − Ḡ)². These results for data in the four groups of Figure 29.1 and Table 29.1 are shown in Table 29.2.
Except for minor differences due to rounding, the components of Table 29.2 have the same structure
noted earlier for simple linear regression in Section 19.2.2. The structure is as follows:
{Basic Group Variance} = {Model Variance between Groups} + { Residual Variance within Groups}
or Syy = SM + SR.
The structure is similar to that of the deviations
Total Deviation = Model Deviation + Residual Deviation
which arises when each individual deviation is expressed in the algebraic identity

Yij − Ḡ = (Ȳj − Ḡ) + (Yij − Ȳj)

If Ḡ is moved to the first part of the right side, the equation becomes

Yij = Ḡ + (Ȳj − Ḡ) + (Yij − Ȳj)

and is consistent with a parametric algebraic model that has the form

Yij = μ + αj + εij

In this model, each person's value of Yij consists of three contributions: (1) from the grand parametric mean, μ (which is estimated by Ḡ); (2) from the parametric increment, αj (estimated by Ȳj − Ḡ), between the grand mean and group mean; and (3) from an error term, εij (estimated by Yij − Ȳj), for the increment between the observed value of Yij and the group mean.

For stochastic appraisal of results, the null hypothesis assumption is that the m groups have the same parametric mean, i.e., μ1 = μ2 = … = μj = … = μm.
TABLE 29.2
Group-Variance Partitions of Sums of Squares for the Four Histologic Groups
in Figure 29.1 and Table 29.1

Group    Basic             Model                           Residual
         (Total System)    (Between Groups)                (Within Groups)
WELL     16866.67          22(24.43 − 14.77)² = 2052.94    14813.73
SMALL     1313.52          11(4.45 − 14.77)²  = 1171.53      141.99
ANAP      9576.88          18(10.87 − 14.77)² =  273.78     9303.10
CYTOL     1546.32           9(11.54 − 14.77)² =   93.90     1452.42
Total    29304.61*                              3593.38*   25711.24

* These are the correct totals. They differ slightly from the sum of the collection of individual values, calculated with rounding, in each column.
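The partition Syy = SM + SR can also be confirmed numerically on any small one-way layout. The Python sketch below uses made-up illustrative data, not the lung-cancer values:

```python
# Verify Syy = SM + SR for a one-way layout (made-up illustrative data)
data = {"A": [3.0, 5.0, 4.0], "B": [8.0, 9.0, 10.0], "C": [1.0, 2.0, 3.0]}

all_values = [y for ys in data.values() for y in ys]
grand = sum(all_values) / len(all_values)              # the grand mean

syy = sum((y - grand) ** 2 for y in all_values)        # basic (total) SS
sm = sum(len(ys) * ((sum(ys) / len(ys)) - grand) ** 2  # model (between) SS
         for ys in data.values())
sr = sum((y - sum(ys) / len(ys)) ** 2                  # residual (within) SS
         for ys in data.values() for y in ys)
```

Because the identity is algebraic, the equality holds for any grouped data, up to floating-point rounding.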
29.1.2.3 Mean Variances and Degrees of Freedom When divided by the associated degrees of freedom, each of the foregoing group variances is converted to a mean value. For the basic group variance, the total system contains N = Σnj members, and d.f. = N − 1. For the model variance, the m groups have m − 1 degrees of freedom. For the residual variance, each group has nj − 1 degrees of freedom, and the total d.f. for m groups is Σ(nj − 1) = N − m.

The degrees of freedom are thus partitioned, like the group variances, into an expression that indicates their sum as

N − 1 = (m − 1) + (N − m)
The mean variances, however, no longer form an equal partition. Their symbols, and the associated
values in the example here, are as follows:
Mean Group Variance = Syy/(N − 1) = 29304.61/59 = 496.69

Mean Model Variance (between groups) = SM/(m − 1) = 3593.38/3 = 1197.79

Mean Residual Variance (within groups) = SR/(N − m) = 25711.24/56 = 459.13
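These three mean variances follow directly from the sums of squares in Table 29.2; a few lines of Python reproduce them:

```python
# Mean variances from the sums of squares in Table 29.2
syy, sm, sr = 29304.61, 3593.38, 25711.24   # basic, model, residual SS
N, m = 60, 4                                 # 60 patients, 4 histologic groups

mean_group = syy / (N - 1)    # basic:    about 496.69
mean_model = sm / (m - 1)     # between groups: about 1197.79
mean_resid = sr / (N - m)     # within groups:  about 459.13
```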
29.2 Fisher's F Ratio
Under the null hypothesis of no real difference between the groups (i.e., the assumption that they have the same parametric mean), each of the foregoing three mean variances can be regarded as a separate
estimate of the true parametric variance. Within the limits of stochastic variation in random sampling,
the three mean variances should equal one another.
To test stochastic significance, R. A. Fisher constructed a variance ratio, later designated as F, that is expressed as

F = (Mean variance between groups)/(Mean variance within groups)

It can be cited symbolically as

F = [SM/(m − 1)]/[SR/(N − m)]   [29.1]
If only two groups are being compared, some simple algebra will show that Formula [29.1] becomes
the square of the earlier Formula [13.7] for the calculation of t (or Z). This distinction is the reason why
the F ratio is sometimes used, instead of t (or Z), for contrasting two groups, as noted earlier in Section 13.3.6.
The Fisher ratio has a sampling distribution in which the associated 2P value is found for the values of F at the two sets of degrees of freedom, m − 1 and N − m. The three components make the distribution difficult to tabulate completely; and it is usually cited according to values of F for each pair of degrees of freedom simultaneously at fixed values of 2P such as .1, .05, and .01.
In the example under discussion here, the F ratio is 1197.79/459.13 = 2.61. In the Geigy tables,4 available for the combination of 3 and 56 degrees of freedom, the required F values are 2.184 for 2P = .1, 2.769 for 2P = .05, and 3.359 for 2P = .025. If only the Geigy values were available, the result would be written as .05 < 2P < .1. In an appropriate computer program, however, the actual 2P value is usually calculated and displayed directly. In this instance, it was .0605.
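The bracketing of F between the tabulated boundaries can be sketched in Python, using the mean variances computed earlier and the Geigy critical values quoted in the text:

```python
# F ratio for the histology example, bracketed by tabulated critical values
mean_model = 3593.38 / 3        # between-groups mean square
mean_resid = 25711.24 / 56      # within-groups mean square
F = mean_model / mean_resid     # about 2.61

# Geigy critical values for 3 and 56 degrees of freedom (from the text)
f_at_2p_10 = 2.184   # 2P = .1
f_at_2p_05 = 2.769   # 2P = .05
# F lies between them, so the tabular conclusion is .05 < 2P < .1
```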
If 2P is small enough to lead to rejection of the null hypothesis, the stochastic conclusion is that at least one of the groups has a mean significantly different from the others. Because the counter-hypothesis for the F test is always that the mean variance is larger between groups than within them, the null hypothesis can promptly be conceded if the F ratio is < 1. In this instance, because the null hypothesis cannot be rejected at α = .05, we cannot conclude that a significant difference in survival has been
stochastically confirmed for the histologic categories. The observed quantitative distinctions seem
impressive, however, and would probably attain stochastic significance if the group sizes were larger.
29.3 Analysis-of-Variance Table
The results of an analysis of variance are commonly presented, in both published literature and computer
printouts, with a tabular arrangement that warrants special attention because it is used not only for
ANOVA but also for multivariable regression procedures that involve partitioning the sums of squared
deviations (SS) that form group variances.
In each situation, the results show the partition for the sums of squares of three entities: (1) the total
SS before imposition of an explicit model, (2) the SS between the explicit model and the original implicit
grand mean, and (3) the residual SS for the explicit model. The last of these entities is often called the
unexplained or error variance. Both of these terms are unfortunate, because the mathematical explanation is a statistical phenomenon that may have nothing to do with biologic mechanisms of explanation, and the error represents deviations between observed and estimated values, not mistakes or inaccuracies in the basic data. In certain special arrangements, to be discussed shortly, the deviations receive an
additionally improved explanation when the model is enhanced with subdivisions of the main variable
or with the incorporation of additional variables.
Figure 29.2 shows the conventional headings for the ANOVA table of the histology example in
Figure 29.1. For this one-way analysis, the total results are divided into two rows of components. The
number of rows is appropriately expanded when more subgroups are formed (as discussed later) via
such mechanisms as subdivisions or inclusion of additional variables.
29.4 Problems in Performance
The mathematical reasoning used in many ANOVA arrangements was developed for an ideal experi-
mental world in which all the compared groups or subgroups had the same size. If four groups were
being compared, each group had the same number of members, so that n1 = n2 = n3 = n4. If the groups were further divided into subgroups (such as men and women, or young, middle-aged, and old), the
subgroups had the same sizes within each group.
These equi-sized arrangements were easily attained for experiments in the world of agriculture, where
R. A. Fisher worked and developed his ideas about ANOVA. Equally sized groups and subgroups are
seldom achieved, however, in the realities of clinical and epidemiologic research. The absence of equal
sizes may then create a major problem in the operation of computer programs that rely on equal sizes, and that may be unable to manage data for other circumstances. For the latter situations, the computer
programs may divert ANOVA into the format of a general linear model, which is essentially a method
of multiple regression. One main reason, therefore, why regression methods are replacing ANOVA
Dependent Variable: SURVIVE
Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 3593.3800000 1197.7933333 2.61 0.0605
Error 56 25711.2333333 459.1291667
Corrected Total 59 29304.6133333
R-Square C.V. Root MSE SURVIVE Mean
0.122622 145.1059 21.427300 14.766667
FIGURE 29.2 Printout of analysis-of-variance table for survival time in the four histologic groups of Figure 29.1.
methods today is that the automated regression methods can more easily process data for unequal-sized
groups and subgroups.
29.5 Problems of Interpretation
The results of an analysis of variance are often difficult to interpret for both quantitative and stochastic
reasons, as well as for substantive decisions.
29.5.1 Quantitative Distinctions
The results of ANOVA are almost always cited with F ratios and P values that indicate stochastic
accomplishments but not quantitative descriptive distinctions. The reader is thus left without a mechanism
to decide what has been accomplished quantitatively, while worrying that significant P values may
arise mainly from large group sizes.
Although not commonly used, a simple statistical index can provide a quantitative description of the results. The index, called eta squared, was previously discussed in Section 27.2.2 as a counterpart of r² for proportionate reduction of group variance in linear regression. Labeled R-square in the printout of Figure 29.2, the expression is

η² = SM/Syy

For the histologic data in Figure 29.2, this index is 3593.38/29304.61 = 0.12, representing a modest achievement, which barely exceeds the 10% noted earlier (see Section 19.3.3) as a minimum level for quantitative significance in variance reduction.
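A one-line Python computation, using the sums of squares from the ANOVA table in Figure 29.2, reproduces the printout's R-Square value:

```python
# Eta squared: proportionate reduction of group variance achieved by the model
sm = 3593.38             # model (between-groups) sum of squares
syy = 29304.61           # basic (total) sum of squares
eta_squared = sm / syy   # about 0.12, the "R-Square" of Figure 29.2
```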
29.5.2 Stochastic Nonsignificance
Another important issue is what to do when a result is not stochastically significant, i.e., P > α. In previous analytic methods, a confidence interval could be calculated around the nonsignificant increment, ratio, or coefficient that described the observed distinction in the results. If the upper end of this confidence interval excluded a quantitatively significant value (such as δ), the result could be called stochastically nonsignificant. If the confidence interval included δ, the investigator might be reluctant to concede the null hypothesis of no difference.
This type of reasoning would be equally pertinent for ANOVA, but is rarely used because the results
seldom receive a descriptive citation. Confidence intervals, although sometimes calculated for the mean
of each group, are almost never determined to give the value of eta the same type of upper and lower confidence boundaries that can be calculated around a correlation coefficient in simple linear regression.
In the absence of a confidence interval for eta, the main available descriptive approach is to examine
results in individual groups or in paired comparisons. If any of the results seem quantitatively significant,
the investigator, although still conceding the null hypothesis (because P > α), can remain suspicious that a significant difference exists, but has not been confirmed stochastically. For example, in Figure 29.2,
the P value of 0.06 would not allow rejection of the null hypothesis that all group means are equal.
Nevertheless, the modestly impressive value of 0.12 for eta squared and the large increment noted earlier
between the WELL and SMALL group means suggest that the group sizes were too small for stochastic
confirmation of what is probably a quantitatively significant distinction.
29.5.3 Stochastic Significance
If P < α, the null hypothesis is rejected, and the next problem is to identify which group (or groups) produced the stochastically significant result. A set of data
containing m groups will allow m(m − 1)/2 paired comparisons when each group's mean is contrasted against the mean of every other group. With m additional paired comparisons between each group and the total of the others, the total number of paired comparisons will be m(m + 1)/2. For example, the small-cell histologic group in Table 29.1 could be compared against each of the three other groups and
also against their total. A particularly ingenious (or desperate) investigator might compare a single
group or paired groups against pairs (or yet other combinations) of the others.
This plethora of activities produces the multiple comparison problem discussed in Chapter 25, as well
as the multiple eponymous and striking titles (such as Tukey's honestly significant difference5) that have
been given to the procedures proposed for examining and solving the problem.
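The counts just described, and the Bonferroni adjustment mentioned earlier, are simple arithmetic; a Python sketch for the four histologic groups:

```python
# Number of paired comparisons for m groups, and the Bonferroni-adjusted alpha
m = 4                              # histologic groups in Table 29.1
pairwise = m * (m - 1) // 2        # each group vs. every other group: 6
with_totals = m * (m + 1) // 2     # plus each group vs. total of the others: 10
alpha_per_test = 0.05 / pairwise   # Bonferroni working level, about .008
```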
29.5.4 Substantive Decisions
Because the foregoing solutions all depend on arbitrary mathematical mechanisms, investigators who
are familiar with the substantive content of the data usually prefer to avoid the polytomous structure of
the analysis of variance. For example, a knowledgeable investigator might want to compare only the
SMALL vs. WELL groups with a direct 2-group contrast (such as a t test) in the histologic data, avoiding the entire ANOVA process. An even more knowledgeable investigator, recognizing that survival can be
affected by many factors (such as TNM stage and age) other than histologic category, might not want
to do any type of histologic appraisal unless the other cogent variables have been suitably accounted for.
For all these reasons, ANOVA is a magnificent method of analyzing data if you are unfamiliar with
what the data really mean or represent. If you know the substantive content of the research, however,
and if you have specific ideas to be examined, you may want to use a simpler and more direct way of
examining them.
29.6 Additional Applications of ANOVA
From a series of mathematical models and diverse arrangements, the analysis of variance has a versatility,
analogous to that discussed earlier for chi square, that for many years made ANOVA the most commonly
used statistical procedure for analyzing complex data. In recent years, however, the ubiquitous availability
of computers has led to the frequent replacement of ANOVA by multiple regression procedures, whose
results are often easier to understand. Besides, ANOVA can mathematically be regarded as a subdivision
of the general-linear-model strategies used in multivariable regression analysis.
Accordingly, four of the many other applications of ANOVA are outlined here only briefly, mainly
so that you will have heard of them in case you meet them (particularly in older literature). Details
can be found in many statistical textbooks. The four procedures to be discussed are multi-factor arrangements, nested analyses, the analysis of covariance (ANCOVA), and repeated-measures arrangements (including the intraclass correlation coefficient).
29.6.1 Multi-Factor Arrangements
The procedures discussed so far are called one-way analyses of variance, because only a single independent variable (i.e., histologic category) was examined in relation to survival time. In many circumstances, however, two or more independent variables can be regarded as factors affecting the dependent
variable. When these additional factors are included, the analysis is called two-way (or two-factor), three-
way (or three-factor), etc.
For example, if the two factors of histologic category and TNM stage are considered simultaneously, the data for the 60 patients in Figure 29.1 would be arranged as shown in Table 29.3. The identification
of individual survival times would require triple subscripts: i for the person, j for the row, and k for
the column.
29.6.1.1 Main Effects In the mathematical model of the two-way arrangement, the categorical mean for each factor (Histology and TNM Stage) makes a separate contribution, called the main effect, beyond the grand mean. The remainder (or unexplained) deviation for each person is called the residual error. Thus, a two-factor model for the two independent variables would express the observed results as

Yijk = Ḡ + (Ȳj − Ḡ) + (Ȳk − Ḡ) + (Yijk − Ȳj − Ȳk + Ḡ)   [29.2]

The term Ḡ here represents the grand mean. The next two terms represent the respective deviations of each row mean (Ȳj) and each column mean (Ȳk) from the grand mean. The four components in the last term for the residual deviation of each person are constructed as residuals that maintain the algebraic identity. The total sum of squares in the system will be Σ(Yijk − Ḡ)², with N − 1 degrees of freedom. There will be two sums of squares for the model, cited as Σnj(Ȳj − Ḡ)² for the row factor, and as Σnk(Ȳk − Ḡ)² for the column factor. The residual sum of squares will be the sum of all the values of (Yijk − Ȳj − Ȳk + Ḡ)².
Figure 29.3 shows the printout of pertinent calculations for the data in Table 29.3. In the lower half of Figure 29.3, the 4-category histologic variable has 3 degrees of freedom, and its Type I SS (sum of squares) and mean square, respectively, are the same 3593.38 and 1197.79 shown earlier. The 5-category TNM-stage variable has 4 degrees of freedom and corresponding values of 3116.39 and 779.10. The residual error group variance in the upper part of the table is now calculated differently, as the corrected
TABLE 29.3
Two-Way Arrangement of Individual Data for Survival Time (in Months)
of Patients with Lung Cancer

Histologic                            TNM Stage                                   Mean for Total
Category    I             II            IIIA         IIIB          IV             Row Category

Well        82.3, 20.3,   5.3, 4.0,     29.6, 13.3   1.6, 14.1,    1.0            24.43
            54.9, 28.0,   1.6, 55.9,                 4.5, 62.0,
            12.2, 39.9,   23.9, 2.6                  0.2, 0.6
            79.7

Small       10.3          6.8           0.2                        4.4, 5.5,       4.45
                                                                   0.3, 0.6,
                                                                   11.2, 3.7,
                                                                   3.4, 2.5

Anap        0.1, 10.9,    19.3, 27.9,   7.6, 1.3     1.4, 6.5,     1.8, 6.0,      10.87
            0.2           99.9                       2.9, 0.8      1.6, 0.9,
                                                                   4.7, 1.9

Cytol       12.8, 8.6                                8.1, 8.8,     1.0, 6.2,      11.54
                                                     1.8           10.6, 46.0

Mean for Total
Column Category
            27.7          24.72         10.40        8.72          5.96           14.77
total sum of squares minus the sum of Type I squares, which is a total of 6709.77 for the two factors in the model. Since those two factors have 7 (= 3 + 4) degrees of freedom, the mean square for the model is 6709.77/7 = 958.54, and the d.f. in the error variance is 59 − 7 = 52. The mean square for the error variance becomes 22594.84/52 = 434.52. When calculated for this two-factor model, the F ratio of mean squares is 2.21, which now achieves a P value (marked Pr > F) just below .05. If the α level is set at .05, this result is significant, whereas it was not so in the previous analysis for histology alone.
The label Type I SS is used because ANOVA calculations can also produce three other types of
sums of squares (marked II, III, and IV when presented) that vary with the order in which factors are
entered or removed in a model, and with consideration of the interactions discussed in the next section.
As shown in the lower section of Figure 29.3, an F-ratio value can be calculated for each factor when its mean square is divided by the error mean square. For histology, this ratio is 1197.79/434.52 = 2.76. For TNM stage, the corresponding value in the printout is 1.79. The corresponding 2P values are just above .05 for histology and .14 for TNM stage.
29.6.1.2 Interactions In linear models, each factor is assumed to have its own separate additive effect. In biologic reality, however, the conjunction of two factors may have an antagonistic or
synergistic effect beyond their individual actions, so that the whole differs from the sum of the parts.
For example, increasing weight and increasing blood pressure may each lead to increasing mortality,
but their combined effect may be particularly pronounced in persons who are at the extremes of obesity
and hypertension. Statisticians use the term interactions for these conjunctive effects; and the potential
for interactions is often considered whenever an analysis contains two or more factors.
To examine these effects in a two-factor analysis, the model for Yijk is expanded to contain an interaction term. It is calculated, for the mean of each cell of the conjoined categories, as the deviation of the observed cell mean from the additive combination of the mean values of the pertinent row and column variables. In the expression of the equation for Yijk, the first three terms of Equation [29.2] are the same: G for the grand mean; (Ȳj − G) for each row; and (Ȳk − G) for each column. Because the observed mean in each cell will be Ȳjk, the interaction effect will be the deviation estimated as Ȳjk − Ȳj − Ȳk + G. The remaining residual effect, used for calculating the residual sum of squares, is Yijk − Ȳjk. For each sum of squares, the degrees of freedom are determined appropriately for the calculations of mean squares and F ratios.
The calculation of interaction effects can be illustrated with an example from the data of Table 29.3 for the 7-member cell in the first row, first column. The grand mean is 14.77; the entire WELL histologic category has a mean of 24.43; and TNM stage I has a mean of 27.71. The mean of the seven values in the cited cell is (82.3 + 20.3 + ... + 79.7)/7 = 45.33. According to the algebraic equation, G = 14.77; in the first row, (Ȳj − G) = 24.43 − 14.77 = 9.66; and in the first column, (Ȳk − G) = 27.71 − 14.77 = 12.94. The interaction effect in the cited cell will be estimated as 45.33 − 24.43 − 27.71 + 14.77 = 7.96. The estimated value of the residual for each of the seven Yijk values in the cited cell will be Yijk − Ȳj − Ȳk + G − 7.96, i.e., Yijk − 45.33, the deviation from the cell mean.
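The arithmetic of the worked example can be confirmed directly, using only the values quoted in the text:

```python
# Re-checking the interaction example: the 7-member (WELL, stage I) cell,
# with the quoted grand mean (14.77), row mean (24.43), and column mean (27.71).
cell = [82.3, 20.3, 54.9, 28.0, 12.2, 39.9, 79.7]
cell_mean = sum(cell) / len(cell)
G, row_mean, col_mean = 14.77, 24.43, 27.71

row_effect = round(row_mean - G, 2)                       # (Ybar_j - G)
col_effect = round(col_mean - G, 2)                       # (Ybar_k - G)
interaction = round(cell_mean - row_mean - col_mean + G, 2)

assert round(cell_mean, 2) == 45.33
assert row_effect == 9.66 and col_effect == 12.94
assert interaction == 7.96
```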
Dependent Variable: SURVIVE

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              7     6709.7729638     958.5389948      2.21     0.0486
Error             52    22594.8403695     434.5161610
Corrected Total   59    29304.6133333

R-Square     C.V.        Root MSE     SURVIVE Mean
0.228966     141.1629    20.845051    14.766667

Source       DF    Type I SS       Mean Square      F Value    Pr > F
HISTOL        3    3593.3800000    1197.7933333       2.76     0.0515
TNMSTAGE      4    3116.3929638     779.0982410       1.79     0.1443

FIGURE 29.3 Printout for 2-way ANOVA of data in Figure 29.1 and Table 29.3.
Figure 29.4 shows the printout of the ANOVA table when an interaction model is used for the two-factor data in Table 29.3. In Figure 29.4, the sums of squares (marked Type I SS) and mean squares for histology and TNM stage are the same as in Figure 29.3, and they also have the same degrees of freedom.

The degrees of freedom for the interaction are tricky to calculate, however. In this instance, because some of the cells of Table 29.3 are empty or have only 1 member, we first calculate degrees of freedom for the residual sum of squares, Σ(Yijk − Ȳjk)². In each pertinent cell, located at (j, k) coordinates in the table, the degrees of freedom will be njk − 1. Working across and then downward through the cells in Table 29.3, the sum of the njk − 1 values will be 6 + 5 + 1 + 5 + 7 + 2 + 2 + 1 + 3 + 5 + 1 + 2 + 3 = 43. (The values are 0 for the four cells with one member each and also for the 3 cells with no members.) This calculation shows that the model accounts for 59 − 43 = 16 d.f.; and as the two main factors have a total of 7 d.f., the interaction factor contributes 9 d.f. to the model, as shown in the last row of Figure 29.4.
Calculated with the new mean square error term in Figure 29.4, the F values produce 2P values below .05 for histology (.0282) and for the histology-by-TNM-stage interaction (.0408), but not for TNM stage (.0890).
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model             16    13835.482381      864.717649       2.40     0.0114
Error             43    15469.130952      359.747231
Corrected Total   59    29304.613333

R-Square     C.V.        Root MSE     SURVIVE Mean
0.472126     128.4447    18.967004    14.766667

Source             DF    Type I SS       Mean Square      F Value    Pr > F
HISTOL              3    3593.3800000    1197.7933333       3.33     0.0282
TNMSTAGE            4    3116.3929638     779.0982410       2.17     0.0890
HISTOL*TNMSTAGE     9    7125.7094171     791.7454908       2.20     0.0408

FIGURE 29.4 Two-way ANOVA, with interaction component, for results in Table 29.3 and Figure 29.3. [Printout from SAS PROC GLM computer program.]
allow maintenance of the ranks, TNM stage could be declared a covariate, which would then be analyzed as though it had a dimensional scale.

The results of the covariance analysis are shown in Figure 29.5. Note that TNM stage now has only 1 degree of freedom, thus giving the model a total of 4 d.f., an F value of 3.61, and a P value of 0.0111, despite a decline of R-square from .229 in Figure 29.3 to .208 in Figure 29.5. The histology variable, which had P = .052 in Figure 29.3, now has P = .428; and TNM stage, with P = .144 in Figure 29.3, has now become highly significant at P = .0012. These dramatic changes indicate what can happen when the rank sequence is either ignored or appropriately analyzed for polytomous variables.
In past years, the effect of confounding or ranked covariates was often formally adjusted in an analysis
of covariance, using a complex set of computations and symbols. Today, however, the same adjustment
is almost always done with a multiple regression procedure. The adjustment process in ANCOVA is
actually a form of regression analysis in which the related effects of the covariate are determined by
regression and then removed from the error variance. The group means of the main factor are also
adjusted to correspond to a common value of the covariate. The subsequent analysis is presumably more powerful in detecting the effects of the main factor, because the confounding effects have been removed. The process and results are usually much easier to understand, however, when done with multiple linear regression.2
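The regression form of the adjustment can be sketched as follows. The data, group sizes, and effect sizes here are hypothetical, and the dummy-coding scheme is only one of several possibilities; the group effect is tested by the extra sum of squares removed when the group terms are added to a model already containing the covariate:

```python
import numpy as np

# Hypothetical 3-group main factor plus one dimensional covariate
rng = np.random.default_rng(1)
n = 30
group = np.repeat([0, 1, 2], n // 3)
covariate = rng.normal(50.0, 10.0, n)
y = 2.0 + 0.4 * covariate + np.where(group == 2, 5.0, 0.0) + rng.normal(0.0, 2.0, n)

# Full model: intercept, covariate, and two dummy variables for the 3 groups
X = np.column_stack([np.ones(n), covariate,
                     (group == 1).astype(float),
                     (group == 2).astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_full = y - X @ beta

# Reduced model without the group dummies; the difference in residual
# sums of squares is the group effect after covariate adjustment
Xr = X[:, :2]
beta_r, *_ = np.linalg.lstsq(Xr, y, rcond=None)
resid_red = y - Xr @ beta_r

ss_groups = (resid_red ** 2).sum() - (resid_full ** 2).sum()
F = (ss_groups / 2) / ((resid_full ** 2).sum() / (n - 4))
```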
29.6.4 Repeated-Measures Arrangements
Repeated measures is the name given to analyses in which the same entity has been observed repeatedly. The repetitions can occur with changes over time, perhaps after interventions such as treatment, or with examinations of the same (unchanged) entity by different observers or systems of measurement.
29.6.4.1 Temporal Changes The most common repeated-measures situation is an ordinarycrossover study, where the same patients receive treatments A and B. The effects of treatment A vs.
treatment B in each person can be subtracted and thereby reduced to a single group of increments, which
can be analyzed with a paired t test, as discussed in Section 7.8.2.2. The same analysis of increments
can be used for the before-and-after measurements of the effect in patients receiving a particular
treatment, such as the results shown earlier for blood glucose in Table 7.4.
Because the situations just described can easily be managed with paired t tests, the repeated-measures form of ANOVA is usually reserved for situations in which the same entity has been measured at three or more time points. The variables that become the main factors in the analysis are the times and the groups (such as treatment). Interaction terms can be added for the effects of groups × times.
Source            DF    Sum of Squares    Mean Square     F Value    Pr > F
Model              4     6087.9081999    1521.9770500       3.61     0.0111
Error             55    23216.7051334     422.1219115
Corrected Total   59    29304.6133333

R-Square     C.V.        Root MSE     SURVIVE Mean
0.207746     139.1350    20.545606    14.766667

Source       DF    Type I SS       Mean Square      F Value    Pr > F
TNMSTAGE      1    4897.3897453    4897.3897453      11.60     0.0012
HISTOL        3    1190.5184546     396.8394849       0.94     0.4276

FIGURE 29.5 Printout of Analysis of Covariance for data in Figure 29.3, with TNM stage used as ranked variable.
Four major problems, for which consensus solutions do not yet exist, arise when the same entity is
measured repeatedly over time:
1. Independence. The first problem is violation of the assumption that the measurements are independent. The paired t test manages this problem by reducing the pair of measurements to their increment, which becomes a simple new variable. This tactic may not always be suitably employed with more than two sets of repeated measurements.
2. Incremental components. A second problem is the choice of components for calculating incremental changes for each person. Suppose t0 is an individual baseline value, and the subsequent values are t1, t2, and t3. Do we always measure increments from the baseline value, i.e., t1 − t0, t2 − t0, and t3 − t0, or should the increments be listed successively as t1 − t0, t2 − t1, t3 − t2?
3. Summary index of response. If a treatment is imposed after the baseline value at t0, what is the best single index for summarizing the post-therapeutic response? Should it be the mean of the post-treatment values, the increment between t0 and the last measurement, or a regression line for the set of values?
4. Neglect of trend. This problem is discussed further in Section 29.8. As noted earlier, an
ordinary analysis of variance does not distinguish between unranked nominal and ranked
ordinal categories in the independent polytomous variable. If the variable represents serial
points in time, their ranking may produce a trend, but it will be neglected unless special
arrangements are used in the calculations.
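Problem 2, the choice of incremental components, can be made concrete with a small sketch for one hypothetical subject measured at baseline and three later points:

```python
# Hypothetical subject: baseline t0 and three subsequent measurements
t = [100.0, 92.0, 88.0, 90.0]                    # t0, t1, t2, t3

from_baseline = [x - t[0] for x in t[1:]]        # t1-t0, t2-t0, t3-t0
successive = [b - a for a, b in zip(t, t[1:])]   # t1-t0, t2-t1, t3-t2

assert from_baseline == [-8.0, -12.0, -10.0]
assert successive == [-8.0, -4.0, 2.0]
# The successive increments always sum to the last from-baseline increment,
# but the two listings tell different stories about the pattern of change.
assert sum(successive) == from_baseline[-1]
```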
29.6.4.2 Intraclass Correlations Studies of observer or instrument variability can also be regarded as a type of repeated measures, for which the results are commonly cited with an intraclass correlation coefficient (ICC).

As noted in Section 20.7.3, the basic concept was developed as a way of assessing agreement for measurements of a dimensional variable, such as height or weight, between members of the same class, such as brothers in a family. To avoid the inadequacy of a correlation coefficient, the data were appraised with a repeated-measures analysis-of-variance. To avoid decisions about which member of a pair should be listed as the first or second measurement, all possible pairs were listed twice, with each member as the first measurement and then as the second. The total sum of squares could be partitioned into one sum for variability between the individuals being rated, i.e., the subjects (SSS), and another sum of squares due to residual error (SSE). The intraclass correlation was then calculated as

RI = (SSS − SSE)/(SSS + SSE)

The approach was later adapted for psychometric definitions of reliability. The appropriate means for the sums of squares were symbolized as sc² for variance in the subjects and se² for the corresponding residual errors. Reliability was then defined as

RI = sc²/(sc² + se²)

Using the foregoing symbols, when each of a set of n persons is measured by each of a set of r raters, the variance of a single observation, s², can be partitioned as

s² = sc² + sr² + se²

where sr² is the mean of the appropriate sums of squares for the raters.
These variances can be arranged into several formulas for calculating RI. The different arrangements
depend on the models used for the sampling and the interpretation.6 In a worked example cited by
Everitt,7 vital capacity was measured by four raters for each of 20 patients. The total sum of squares for
the 80 observations, with d.f. = 79, was divided into three sets of sums of squares: (1) for the four
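A minimal numeric sketch of the double-entry calculation described above, using the formula RI = (SSS − SSE)/(SSS + SSE) on hypothetical paired measurements:

```python
import numpy as np

# Hypothetical paired measurements (e.g., heights of pairs of brothers)
pairs = np.array([[170.0, 172.0], [165.0, 166.0], [180.0, 177.0],
                  [158.0, 160.0], [175.0, 176.0]])

# Double entry: every pair is listed twice, once in each order, so that
# neither member is arbitrarily designated "first" or "second"
doubled = np.vstack([pairs, pairs[:, ::-1]])
grand = doubled.mean()
subject_means = doubled.mean(axis=1)              # one mean per listed pair

# Between-subject sum of squares (2 observations per listed pair)
ss_s = 2 * ((subject_means - grand) ** 2).sum()
# Residual (within-pair) sum of squares
ss_e = ((doubled - subject_means[:, None]) ** 2).sum()

icc = (ss_s - ss_e) / (ss_s + ss_e)
assert -1.0 <= icc <= 1.0   # for these closely agreeing pairs, icc is near 1
```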
observers with d.f. = 3; (2) for the 20 patients with d.f. = 19; and (3) for the residual error with d.f. = 3 × 19 = 57. The formula used by Everitt for calculating the intraclass correlation coefficient was

RI = n(sc² − se²)/[n·sc² + r·sr² + (nr − n − r)·se²]

A counterpart formula, using SSR to represent the sum of squares for raters, is

RI = (SSS − SSE)/(SSS + SSE + 2·SSR)
The intraclass correlation coefficient (ICC) can be used when laboratory measurements of instrument variability are expressed in dimensional data. Nevertheless, as discussed in Chapter 20, most laboratories prefer to use simpler pair-wise and other straightforward statistical approaches that are easier to understand and interpret than the ICC.
The simpler approaches may also have mathematical advantages that have been cited by Bland and Altman,8 who contend that the ICC, although appropriate for repetitions of the same measurement, is unsatisfactory when dealing with measurements by two different methods, where there is no ordering of the repeated measures and hence no obvious choice of X or Y. Other disadvantages ascribed to the ICC are that it depends on the range of measurement and is not related to the actual scale of measurement or to the size of error that might be clinically allowable. Instead, Bland and Altman recommend their limits of agreement method, which was discussed throughout Section 20.7.1. The method relies on examining the increments in measurement for each subject. The mean difference then indicates bias, and the standard deviation is used to calculate a 95% descriptive zone for the limits of agreement. A plot of the differences against the mean value of each pair will indicate whether the discrepancies in measurement diverge as the measured values increase.
For categorical data, concordance is usually expressed (see Chapter 20) with other indexes of variability, such as kappa, which yields the same results as the intraclass coefficient in pertinent situations.
29.7 Non-Parametric Methods of Analysis
The mathematical models of ANOVA require diverse assumptions about Gaussian distributions and homoscedastic (i.e., similar) variances. These assumptions can be avoided by converting the dimensional data to ranks and analyzing the values of the ranks. The Kruskal-Wallis procedure, which is the eponym for a one-way ANOVA using ranked data, corresponds to a Wilcoxon-Mann-Whitney U test for 3 or more groups. The Friedman procedure, which refers to a two-way analysis of ranked data, was proposed almost 60 years ago by Milton Friedman, who later became more famous in economics than in statistics.
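The Kruskal-Wallis statistic can be computed directly from the pooled ranks. The sketch below, on hypothetical three-group data, uses the common formula H = [12/(N(N + 1))] Σ Ri²/ni − 3(N + 1), without a correction for ties:

```python
import numpy as np

def avg_ranks(values):
    """1-based ranks, with ties receiving the average (mid) rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_v = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic for a one-way ANOVA on ranks (no tie correction)."""
    pooled = np.concatenate(groups)
    N = len(pooled)
    ranks = avg_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]           # this group's ranks
        h += r.sum() ** 2 / len(g)                # R_i^2 / n_i
        start += len(g)
    return 12.0 / (N * (N + 1)) * h - 3.0 * (N + 1)

# Hypothetical three-group comparison
groups = [[6.4, 6.8, 7.2], [8.3, 8.7, 9.1], [5.1, 5.5, 5.9]]
H = kruskal_wallis_h(groups)
```

Under the null hypothesis, H is referred to a chi-square distribution with (number of groups − 1) degrees of freedom.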
29.8 Problems in Analysis of Trends
If a variable has ordinal grades, the customary ANOVA procedure will regard the ranked categories merely as nominal, and will not make provision for the possible or anticipated trend associated with different ranks. The problem occurs with an ordinal variable, such as TNM stage in Figure 29.1, because the effect of an increasing stage is ignored. The neglect of a ranked effect can be particularly important when the independent variable (or factor) is time, for which the effects might be expected to occur in a distinct temporal sequence. This problem in repeated-measures ANOVA evoked a denunciation by Sheiner,9 who contended that the customary ANOVA methods were wholly inappropriate for many studies of the time effects of pharmacologic agents.
The appropriate form of analysis can be carried out, somewhat in the manner of the chi-square test for linear trend in an array of proportions (see Chapter 27), by assigning arbitrary coding values (such as 1, 2, 3, 4) to the ordinal categories. The process is usually done more easily and simply, however, as a linear regression analysis.
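A minimal sketch of the regression approach to trend, with hypothetical ordinal grades coded 1-4 and a hypothetical declining outcome:

```python
import numpy as np

# Ordinal grades coded 1-4 (arbitrary, equally spaced codes), three
# hypothetical outcome values per grade
codes = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=float)
y = np.array([30.0, 28.0, 32.0, 24.0, 25.0, 23.0,
              18.0, 17.0, 19.0, 11.0, 12.0, 10.0])

# Fit a straight line of outcome on the coded grades
slope, intercept = np.polyfit(codes, y, 1)

# Sum of squares attributable to the linear trend (1 d.f.)
ss_trend = slope ** 2 * ((codes - codes.mean()) ** 2).sum()
assert slope < 0   # the ranked grades show a declining trend
```

The trend component uses only 1 degree of freedom, in contrast to the 3 d.f. that a customary ANOVA would assign to a 4-category nominal factor.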
29.9 Use of ANOVA in Published Literature
To find examples of ANOVA in published medical literature, the automated Colleague Medical Database
was searched for papers, in English, of human-subject research that appeared in medical journals during
1991-95, and in which analysis of variance was mentioned in the abstract-summary. From the list of
possibilities, 15 were selected to cover a wide array of journals and topics. The discussion that follows
is a summary of results in those 15 articles.
A one-way analysis of variance was used to check the rate of disappearance of ethanol from venous
blood in 12 subjects who drank the same dose of alcohol in orange juice on four occasions.10 The
authors concluded that the variation between subjects exceeded the variations within subjects. Another classical one-way ANOVA was done to examine values of intestinal calcium absorption and serum
parathyroid hormone levels in three groups of people: normal controls and asthmatic patients receiving
either oral or inhaled steroid therapy.11 A one-way ANOVA compared diverse aspects of functional
status in two groups of patients receiving either fluorouracil or saline infusions for head and neck
cancer.12 In a complex but essentially one-way ANOVA, several dependent variables (intervention points,
days of monitoring, final cardiovascular function) were related to subgroups defined by APACHE II
severity scores in a surgical intensive care unit.13 (The results were also examined in a regression
analysis.) In another one-way analysis of variance, preference ratings for six different modes of teaching
and learning were evaluated14 among three groups, comprising first-year, second-year, and fourth-year
medical students in the United Arab Emirates. The results were also examined for the preferences of
male vs. female students.

In a two-way ANOVA, neurologic dysfunction at age four years was related15 to two main factors:
birth weight and location of birth in newborn intensive care units of either Copenhagen or Dublin.
Multifactor ANOVAs were applied,16 in 20 patients with conjunctival malignant melanoma, to the
relationship between 5-year survival and the counts of cells positive for proliferating cell nuclear antigen,
predominant cell type, maximum tumor depth, and site of tumor. The result, showing that patients with
low counts had better prognoses, was then confirmed with a Cox proportional hazards regression
analysis. (The latter approach would probably have been best used directly.)
Repeated measures ANOVA was used in the following studies: to check the effect of oat bran consumption
on serum cholesterol levels at four time points;17 to compare various effects (including blood pressure levels
and markers of alcohol consumption) in hypertensive men randomized to either a control group or to receive
special advice about methods of reducing alcohol consumption;18 to assess the time trend of blood pressure during a 24-hour monitoring period in patients receiving placebo or an active antihypertensive agent;19 and to monitor changes at three time points over 6 months in four indexes (body weight, serum osmolality, serum sodium, and blood urea nitrogen/creatinine ratios) for residents of a nursing home.20
The intraclass correlation coefficient was used in three other studies concerned with reliability (or
reproducibility) of the measurements performed in neuropathic tests,21 a brief psychiatric rating scale,22
and a method of grading photoageing in skin casts.23
References
1. Feinstein, 1990d; 2. Feinstein, 1996; 3. Oxford English Dictionary, 1971; 4. Lentner, 1982; 5. Tukey, 1968;
6. Shrout, 1979; 7. Everitt, 1989; 8. Bland, 1990; 9. Sheiner, 1992; 10. Jones, 1994; 11. Luengo, 1991;
12. Browman, 1993; 13. Civetta, 1992; 14. Paul, 1994; 15. Ellison, 1992; 16. Seregard, 1993; 17. Saudia, 1992;
18. Maheswaran, 1992; 19. Tomei, 1992; 20. Weinberg, 1994; 21. Dyck, 1991; 22. Hafkenscheid, 1993;
23. Fritschi, 1995.