C2166_ch29_Principle of Medical Statistics


    29

    Analysis of Variance

    CONTENTS

    29.1 Conceptual Background

    29.1.1 Clinical Illustration

    29.1.2 Analytic Principles

29.2 Fisher's F Ratio

    29.3 Analysis-of-Variance Table

29.4 Problems in Performance

29.5 Problems of Interpretation

    29.5.1 Quantitative Distinctions

    29.5.2 Stochastic Nonsignificance

    29.5.3 Stochastic Significance

    29.5.4 Substantive Decisions

    29.6 Additional Applications of ANOVA

    29.6.1 Multi-Factor Arrangements

    29.6.2 Nested Analyses

    29.6.3 Analysis of Covariance

    29.6.4 Repeated-Measures Arrangements

29.7 Non-Parametric Methods of Analysis

29.8 Problems in Analysis of Trends

    29.9 Use of ANOVA in Published Literature

    References

    The targeted analytic method called analysis of variance, sometimes cited acronymically as ANOVA,

    was devised (like so many other procedures in statistics) by Sir Ronald A. Fisher. Although often marking

    the conceptual boundary between elementary and advanced statistics, or between amateur fan and

    professional connoisseur, ANOVA is sometimes regarded and taught as elementary enough to be used

    for deriving subsequent simple procedures, such as the t test. Nevertheless, ANOVA is used much less

    often today than formerly, for reasons to be noted in the discussions that follow.

    29.1 Conceptual Background

The main distinguishing feature of ANOVA is that the independent variable contains polytomous categories, which are analyzed simultaneously in relation to a dimensional or ordinal dependent (outcome) variable.

    Suppose treatments A, B, and C are tested for effects on blood pressure in a randomized trial. When

    the results are examined, we want to determine whether one of the treatments differs significantly from

    the others. With the statistical methods available thus far, the only way to answer this question would

be to do multiple comparisons for pairs of groups, contrasting results in group A vs. B, A vs. C, and B vs. C. If more ambitious, we could compare A vs. the combined results of B and C, or group B vs. the

    combined results of A and C, and so on. We could work out various other arrangements, but in each


    instance, the comparison would rely on contrasting two collected groups, because we currently know

    no other strategy.

    The analysis of variance allows a single simultaneous comparison for three or more groups. The result

    becomes a type of screening test that indicates whether at least one group differs significantly from the

    others, but further examination is needed to find the distinctive group(s). Despite this disadvantage, ANOVA

    has been a widely used procedure, particularly by professional statisticians, who often like to apply it even

    when simpler tactics are available. For example, when data are compared for only two groups, a t test or

    Z test is simpler, and, as noted later, produces exactly the same results as ANOVA. Nevertheless, many

    persons will do the two-group comparison (and report the results) with an analysis of variance.

    29.1.1 Clinical Illustration

    Although applicable in experimental trials, ANOVA has been most often used for observational studies.

A real-world example, shown in Figure 29.1, contains data for the survival times, in months, of a random sample of 60 patients with lung cancer,1,2 having one of the four histologic categories of WELL (well-differentiated), SMALL (small cell), ANAP (anaplastic), and CYTOL (cytology only). The other variable (the five categories of TNM stage) listed in Figure 29.1 will be considered later. The main analytic

    question now is whether histology in any of these groups has significantly different effects on survival.

29.1.1.1 Direct Examination The best thing to do with these data, before any formal statistical analyses begin, is to examine the results directly. In this instance, we can readily determine the group sizes, means, and standard deviations for each of the four histologic categories and for the total. The results, shown in Table 29.1, immediately suggest that the data do not have Gaussian distributions, because the standard deviations are almost all larger than the means. Nevertheless, to allow the illustration to proceed, the results can be further appraised. They show that the well-differentiated and small-cell groups, as expected clinically, have the highest and lowest mean survival times, respectively. Because of relatively small group sizes and non-Gaussian distributions, however, the distinctions may not be stochastically significant.

    Again before applying any advanced statistics, we can check these results stochastically by using simple

    t tests. For the most obvious comparison of WELL vs. SMALL, we can use the components of Formula

[13.7] to calculate sp = √{[21(26.56)² + 10(3.77)²]/(21 + 10)} = 21.96; √[(1/nA) + (1/nB)] = √[(1/22) + (1/11)] = .369; and X̄A − X̄B = 24.43 − 4.45 = 19.98. These data could then be entered into Formula [13.7] to produce t = 19.98/[(21.96)(.369)] = 2.47. At 31 d.f., the associated 2P value is about .02. From this distinction, we might also expect that all the other paired comparisons will not be stochastically significant. (If you check the calculations, you will find that the appropriate 2P values are all >.05.)
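The pooled-variance arithmetic can be verified with a minimal Python sketch, using only the summary statistics from Table 29.1 (the variable names are mine, not from the text; with full precision the result is t ≈ 2.46 rather than 2.47, because the chapter rounds the intermediate values):

```python
import math

# Summary statistics for the WELL and SMALL groups (Table 29.1)
n_a, mean_a, sd_a = 22, 24.43, 26.56   # WELL
n_b, mean_b, sd_b = 11, 4.45, 3.77     # SMALL

# Pooled standard deviation: sp = sqrt([(nA-1)sA^2 + (nB-1)sB^2] / (nA+nB-2))
sp = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))

# t = (difference in means) / (sp * sqrt(1/nA + 1/nB)), with nA+nB-2 = 31 d.f.
t = (mean_a - mean_b) / (sp * math.sqrt(1 / n_a + 1 / n_b))

print(round(sp, 2), round(t, 2))   # prints: 21.96 2.46
```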

29.1.1.2 Holistic and Multiple-Comparison Problems The foregoing comparison indicates a significant difference in mean survival between the WELL and SMALL groups, but does

    not answer the holistically phrased analytic question, which asked whether histology has significant

    effects in any of the four groups in the entire collection. Besides, an argument could be made, using

TABLE 29.1
Summary of Survival Times in Four Histologic Groups of Patients with Lung Cancer in Figure 29.1

Histologic Category    Group Size    Mean Survival    Standard Deviation
WELL                       22            24.43              26.56
SMALL                      11             4.45               3.77
ANAP                       18            10.87              23.39
CYTOL                       9            11.54              13.47
Total                      60            14.77              22.29



distinctions discussed in Section 25.2.1.1, that the contrast of WELL vs. SMALL was only one of the six (4 × 3/2) possible paired comparisons for the four histologic categories. With the Bonferroni correction, the working level of α for each of the six comparisons would be .05/6 = .008. With the latter criterion, the 2P value of about .02 for WELL vs. SMALL would no longer be stochastically significant.
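The counting behind the correction can be sketched in plain Python (a minimal illustration, not part of the original text):

```python
m = 4                       # number of histologic groups
k = m * (m - 1) // 2        # possible paired comparisons: 4*3/2 = 6
alpha = 0.05
alpha_per_test = alpha / k  # Bonferroni working level for each comparison

print(k, round(alpha_per_test, 3))   # prints: 6 0.008
# The observed 2P of about .02 for WELL vs. SMALL exceeds .008,
# so it would no longer be declared stochastically significant.
```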

    We therefore need a new method to answer the original question. Instead of examining six pairs of

    contrasted means, we can use a holistic approach by finding the grand mean of the data, determining

    the deviations of each group of data from that mean, and analyzing those deviations appropriately.

OBS    ID   HISTOL  TNMSTAGE  SURVIVE
  1    62   WELL    I         82.3
  2   107   WELL    II         5.3
  3   110   WELL    IIIA      29.6
  4   157   WELL    I         20.3
  5   163   WELL    I         54.9
  6   246   SMALL   I         10.3
  7   271   WELL    IIIB       1.6
  8   282   ANAP    IIIA       7.6
  9   302   WELL    I         28.0
 10   337   CYTOL   I         12.8
 11   344   WELL    II         4.0
 12   352   ANAP    IIIA       1.3
 13   371   WELL    IIIB      14.1
 14   387   SMALL   IIIA       0.2
 15   428   SMALL   II         6.8
 16   466   ANAP    IIIB       1.4
 17   513   ANAP    I          0.1
 18   548   ANAP    IV         1.8
 19   581   ANAP    IV         6.0
 20   605   CYTOL   IV         1.0
 21   609   CYTOL   IV         6.2
 22   628   SMALL   IV         4.4
 23   671   SMALL   IV         5.5
 24   764   SMALL   IV         0.3
 25   784   ANAP    IV         1.6
 26   804   WELL    I         12.2
 27   806   ANAP    IIIB       6.5
 28   815   WELL    I         39.9
 29   852   WELL    IIIB       4.5
 30   855   WELL    II         1.6
 31   891   CYTOL   IIIB       8.1
 32   892   WELL    IIIB      62.0
 33   931   CYTOL   IIIB       8.8
 34   998   WELL    IIIB       0.2
 35  1039   SMALL   IV         0.6
 36  1044   ANAP    II        19.3
 37  1054   WELL    IIIB       0.6
 38  1057   ANAP    I         10.9
 39  1155   ANAP    I          0.2
 40  1192   SMALL   IV        11.2
 41  1223   ANAP    IV         0.9
 42  1228   ANAP    II        27.9
 43  1303   ANAP    IIIB       2.9
 44  1309   ANAP    II        99.9
 45  1317   ANAP    IV         4.7
 46  1355   CYTOL   IIIB       1.8
 47  1361   WELL    IV         1.0
 48  1380   CYTOL   IV        10.6
 49  1405   SMALL   IV         3.7
 50  1444   WELL    II        55.9
 51  1509   SMALL   IV         3.4
 52  1515   WELL    I         79.7
 53  1521   ANAP    IV         1.9
 54  1556   ANAP    IIIB       0.8
 55  1567   SMALL   IV         2.5
 56  1608   CYTOL   I          8.6
 57  1612   WELL    IIIA      13.3
 58  1666   CYTOL   IV        46.0
 59  1702   WELL    II        23.9
 60  1738   WELL    II         2.6

FIGURE 29.1 Printout of data on histologic type, TNM Stage, and months of survival in a random sample of 60 patients with primary cancer of the lung. [OBS = observation number in sample; ID = original identification number; HISTOL = histology type; TNMSTAGE = one of five ordinal anatomic TNM stages for lung cancer; SURVIVE = survival time (mos.); WELL = well-differentiated; SMALL = small cell; ANAP = anaplastic; CYTOL = cytology only.]


    Many different symbols have been used to indicate the entities that are involved. In the illustration

here, Yij will represent the target variable (survival time) for person i in group j. For example, if WELL is the first group in Figure 29.1, the eighth person in the group has Y8,1 = 4.0. The mean of the values in group j will be Ȳj = ΣYij/nj, where nj is the number of members in the group. Thus, for the last group (cytology) in Table 29.1, n4 = 9, ΣYi,4 = 103.9, and Ȳ4 = 103.9/9 = 11.54. The grand mean, Ḡ, will be Σ(njȲj)/N, where N = Σnj = size of the total group under analysis. From the data in Table 29.1, Ḡ = [(22 × 24.43) + (11 × 4.45) + (18 × 10.87) + (9 × 11.54)]/60 = 885.93/60 = 14.77.

We can now determine the distance, Ȳj − Ḡ, between each group's mean and the grand mean. For the ANAP group, the distance is 10.87 − 14.77 = −3.90. For the other three groups, the distances are −3.23 for CYTOL, −10.32 for SMALL, and +9.66 for WELL. This inspection confirms that the means of the SMALL and WELL groups are most different from the grand mean, but the results contain no attention to stochastic variation in the data.
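The weighted grand mean and the group deviations can be checked with a short Python sketch (group sizes and means taken from Table 29.1; variable names are mine):

```python
# (name, group size nj, group mean) from Table 29.1
groups = [("WELL", 22, 24.43), ("SMALL", 11, 4.45),
          ("ANAP", 18, 10.87), ("CYTOL", 9, 11.54)]

N = sum(n for _, n, _ in groups)            # 60 patients in all
G = sum(n * m for _, n, m in groups) / N    # grand mean = 885.93/60

# Distance of each group mean from the grand mean
deviations = {name: round(m - G, 2) for name, n, m in groups}
print(round(G, 2), deviations)   # grand mean ≈ 14.77
```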

    29.1.2 Analytic Principles

To solve the stochastic challenge, we can use ANOVA, which, like many other classical statistical strategies, expresses real-world phenomena with mathematical models. We have already used such models both implicitly and explicitly. In univariate statistics, the mean, Ȳ, was an implicit model for fitting a group of data from only the values in the single set of data. The measured deviations from that model, Yi − Ȳ, were then converted to the group's basic variance, Σ(Yi − Ȳ)².

In bivariate statistics for the associations in Chapters 18 and 19, we used an explicit model based on an additional variable, expressed algebraically as Ŷi = a + bXi. We then compared variances for three sets of deviations: Yi − Ŷi, between the items of data and the explicit model; Yi − Ȳ, between the items of data and the implicit model; and Ŷi − Ȳ, between the explicit and implicit models. The group variances or sums of squares associated with these deviations were called residual (or error) for Σ(Yi − Ŷi)², basic for Σ(Yi − Ȳ)², and model for Σ(Ŷi − Ȳ)².

29.1.2.1 Distinctions in Nomenclature The foregoing symbols and nomenclature have been simplified for the sake of clarity. In strict statistical reasoning, any set of observed data is regarded as a sample from an unobserved population whose parameters are being estimated from the data. If modeled with a straight line, the parametric population would be cited as Y = α + βX. When the results for the observed data are expressed as Ŷi = a + bXi, the coefficients a and b are estimates of the corresponding α and β parameters.

Also in strict reasoning, variance is an attribute of the parametric population. Terms such as Σ(Yi − Ȳ)² or Σ(Yi − Ŷi)², which are used to estimate the parametric variances, should be called sums of squares, not group variances. The linguistic propriety has been violated here for two reasons: (1) the distinctions are more easily understood when called variance, and (2) the violations constantly appear in both published literature and computer print-outs. The usage here, although a departure from strict formalism, is probably better than in many discussions elsewhere, where the sums of squares are called variances instead of group variances.

    Another issue in nomenclature is syntactical rather than mathematical. In most English prose,

    between is used for a distinction of two objects, and among for more than two. Nevertheless, in the

    original description of the analysis of variance, R. A. Fisher used the preposition between rather than

    among when more than two groups or classes were involved. The term between groups has been

    perpetuated by subsequent writers, much to the delight of English-prose pedants who may denounce

    the absence of literacy in mathematical technocracy. Nevertheless, Fisher and his successors have been

    quite correct in maintaining between. Its use for the cited purpose is approved by diverse high-echelon

    authorities, including the Oxford English Dictionary, which states that between has been, from its

earliest appearance, extended to more than two.3 [As one of the potential pedants, I was ready to use among in this text until I checked the dictionary and became enlightened.]

29.1.2.2 Partition of Group Variance The same type of partitioning that was used for group variance in linear regression is also applied in ANOVA. Conceptually, however, the models are expressed differently. Symbolically, each observation can be labelled Yij, with j representing the group and i, the person (or other observed entity) within the group. The grand mean, Ḡ, is used for the implicit model when the basic group or system variance, Σ(Yi − Ḡ)², is summed for the individual values of Yi in all of the groups. The individual group means, Ȳj, become the explicit models when the total system is partitioned into groups. The residual group variance is the sum of the values of (Yi − Ȳj)² within each of the groups. [In more accurate symbolism, the two cited group variances would be written with double subscripts and summations as ΣΣ(Yij − Ḡ)² and ΣΣ(Yij − Ȳj)².] The model group variance, summed for each group of nj members with group mean Ȳj, is Σnj(Ȳj − Ḡ)². These results for data in the four groups of Figure 29.1 and Table 29.1 are shown in Table 29.2.

Except for minor differences due to rounding, the components of Table 29.2 have the same structure noted earlier for simple linear regression in Section 19.2.2. The structure is as follows:

{Basic Group Variance} = {Model Variance between Groups} + {Residual Variance within Groups}

or Syy = SM + SR.

The structure is similar to that of the deviations

Total Deviation = Model Deviation + Residual Deviation

which arises when each individual deviation is expressed in the algebraic identity

Yij − Ḡ = (Ȳj − Ḡ) + (Yij − Ȳj)

If Ḡ is moved to the first part of the right side, the equation becomes

Yij = Ḡ + (Ȳj − Ḡ) + (Yij − Ȳj)

and is consistent with a parametric algebraic model that has the form

Yij = μ + τj + εij

In this model, each person's value of Yij consists of three contributions: (1) from the grand parametric mean, μ (which is estimated by Ḡ); (2) from the parametric increment, τj (estimated by Ȳj − Ḡ), between the grand mean and group mean; and (3) from an error term, εij (estimated by Yij − Ȳj), for the increment between the observed value of Yij and the group mean.

For stochastic appraisal of results, the null hypothesis assumption is that the m groups have the same parametric mean, i.e., μ1 = μ2 = … = μj = … = μm.

TABLE 29.2
Group-Variance Partitions of Sums of Squares for the Four Histologic Groups in Figure 29.1 and Table 29.1

Group    Basic (Total System)    Model (Between Groups)          Residual (Within Groups)
WELL          16866.67           22(24.43 − 14.77)² = 2052.94          14813.73
SMALL          1313.52           11(4.45 − 14.77)²  = 1171.53            141.99
ANAP           9576.88           18(10.87 − 14.77)² =  273.78           9303.10
CYTOL          1546.32            9(11.54 − 14.77)² =   93.90           1452.42
Total         29304.61*                               3593.38*         25711.24

* These are the correct totals. They differ slightly from the sum of the collection of individual values, calculated with rounding, in each column.


29.1.2.3 Mean Variances and Degrees of Freedom When divided by the associated degrees of freedom, each of the foregoing group variances is converted to a mean value. For the basic group variance, the total system contains N = Σnj members, and d.f. = N − 1. For the model variance, the m groups have m − 1 degrees of freedom. For the residual variance, each group has nj − 1 degrees of freedom, and the total d.f. for m groups is Σ(nj − 1) = N − m.

The degrees of freedom are thus partitioned, like the group variances, into an expression that indicates their sum as

N − 1 = (m − 1) + (N − m)

The mean variances, however, no longer form an equal partition. Their symbols, and the associated values in the example here, are as follows:

Mean Group Variance = Syy/(N − 1) = 29304.61/59 = 496.69

Mean Model Variance (between groups) = SM/(m − 1) = 3593.38/3 = 1197.79

Mean Residual Variance (within groups) = SR/(N − m) = 25711.24/56 = 459.13
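The whole partition can be reproduced from the summary statistics alone, since SM = Σnj(Ȳj − Ḡ)² and SR = Σ(nj − 1)sj². A minimal Python sketch (values from Table 29.1; small rounding differences from the printout's sums of squares are expected because the table's means and standard deviations are themselves rounded):

```python
# (nj, mean, sd) for WELL, SMALL, ANAP, CYTOL from Table 29.1
groups = [(22, 24.43, 26.56), (11, 4.45, 3.77),
          (18, 10.87, 23.39), (9, 11.54, 13.47)]

N = sum(n for n, _, _ in groups)             # 60
m = len(groups)                              # 4 groups
G = sum(n * mu for n, mu, _ in groups) / N   # grand mean

SM = sum(n * (mu - G) ** 2 for n, mu, _ in groups)   # model SS (between groups)
SR = sum((n - 1) * s ** 2 for n, _, s in groups)     # residual SS (within groups)

F = (SM / (m - 1)) / (SR / (N - m))
print(round(SM), round(SR), round(F, 2))   # prints: 3592 25708 2.61
```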

29.2 Fisher's F Ratio

Under the null hypothesis of no real difference between the groups (i.e., the assumption that they have the same parametric mean), each of the foregoing three mean variances can be regarded as a separate estimate of the true parametric variance. Within the limits of stochastic variation in random sampling, the three mean variances should equal one another.

To test stochastic significance, R. A. Fisher constructed a variance ratio, later designated as F, that is expressed as

F = (Mean variance between groups)/(Mean variance within groups)

It can be cited symbolically as

F = [SM/(m − 1)]/[SR/(N − m)]    [29.1]

If only two groups are being compared, some simple algebra will show that Formula [29.1] becomes the square of the earlier Formula [13.7] for the calculation of t (or Z). This distinction is the reason why the F ratio is sometimes used, instead of t (or Z), for contrasting two groups, as noted earlier in Section 13.3.6.
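The two-group equivalence is easy to confirm numerically. A minimal Python sketch with toy data (my own invented values, purely illustrative):

```python
import math

a = [1.0, 2.0, 3.0, 4.0]   # toy group A
b = [2.0, 4.0, 6.0, 8.0]   # toy group B
na, nb = len(a), len(b)
ma, mb = sum(a) / na, sum(b) / nb

# Within-group sums of squares
ssa = sum((x - ma) ** 2 for x in a)
ssb = sum((x - mb) ** 2 for x in b)

# Pooled t (Formula [13.7] style), na+nb-2 d.f.
sp = math.sqrt((ssa + ssb) / (na + nb - 2))
t = (ma - mb) / (sp * math.sqrt(1 / na + 1 / nb))

# One-way ANOVA F for the same two groups
G = (sum(a) + sum(b)) / (na + nb)
SM = na * (ma - G) ** 2 + nb * (mb - G) ** 2   # between groups, 1 d.f.
SR = ssa + ssb                                 # within groups, na+nb-2 d.f.
F = (SM / 1) / (SR / (na + nb - 2))

print(round(F, 6), round(t ** 2, 6))   # both ≈ 3.0: F equals t squared
```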

The Fisher ratio has a sampling distribution in which the associated 2P value is found for the values of F at the two sets of degrees of freedom, m − 1 and N − m. The three components make the distribution difficult to tabulate completely; and it is usually cited according to values of F, for each pair of degrees of freedom, at fixed values of 2P such as .1, .05, .01.

In the example under discussion here, the F ratio is 1197.79/459.13 = 2.61. In the Geigy tables,4 available for the combination of 3 and 56 degrees of freedom, the required F values are 2.184 for 2P = .1, 2.769 for 2P = .05, and 3.359 for 2P = .025. If only the Geigy values were available, the result would be written as .05 < 2P < .1. In an appropriate computer program, however, the actual 2P value is usually calculated and displayed directly. In this instance, it was .0605.

If 2P is small enough to lead to rejection of the null hypothesis, the stochastic conclusion is that at least one of the groups has a mean significantly different from the others. Because the counter-hypothesis for the F test is always that the mean variance is larger between groups than within them, the null hypothesis can promptly be conceded if the F ratio is < 1. In this instance, because the null hypothesis cannot be rejected at α = .05, we cannot conclude that a significant difference in survival has been


    stochastically confirmed for the histologic categories. The observed quantitative distinctions seem

    impressive, however, and would probably attain stochastic significance if the group sizes were larger.

    29.3 Analysis-of-Variance Table

    The results of an analysis of variance are commonly presented, in both published literature and computer

    printouts, with a tabular arrangement that warrants special attention because it is used not only for

    ANOVA but also for multivariable regression procedures that involve partitioning the sums of squared

    deviations (SS) that form group variances.

    In each situation, the results show the partition for the sums of squares of three entities: (1) the total

    SS before imposition of an explicit model, (2) the SS between the explicit model and the original implicit

grand mean, and (3) the residual SS for the explicit model. The last of these entities is often called the "unexplained" or "error" variance. Both of these terms are unfortunate, because the mathematical explanation is a statistical phenomenon that may have nothing to do with biologic mechanisms of explanation, and the error represents deviations between observed and estimated values, not mistakes or inaccuracies in the basic data. In certain special arrangements, to be discussed shortly, the deviations receive an additionally improved explanation when the model is enhanced with subdivisions of the main variable or with the incorporation of additional variables.

    Figure 29.2 shows the conventional headings for the ANOVA table of the histology example in

    Figure 29.1. For this one-way analysis, the total results are divided into two rows of components. The

    number of rows is appropriately expanded when more subgroups are formed (as discussed later) via

    such mechanisms as subdivisions or inclusion of additional variables.

    29.4 Problems in Performance

    The mathematical reasoning used in many ANOVA arrangements was developed for an ideal experi-

    mental world in which all the compared groups or subgroups had the same size. If four groups were

being compared, each group had the same number of members, so that n1 = n2 = n3 = n4. If the groups were further divided into subgroups (such as men and women, or young, middle-aged, and old), the

    subgroups had the same sizes within each group.

    These equi-sized arrangements were easily attained for experiments in the world of agriculture, where

    R. A. Fisher worked and developed his ideas about ANOVA. Equally sized groups and subgroups are

    seldom achieved, however, in the realities of clinical and epidemiologic research. The absence of equal

sizes may then create a major problem in the operation of computer programs that rely on equal sizes, and that may be unable to manage data for other circumstances. For the latter situations, the computer

    programs may divert ANOVA into the format of a general linear model, which is essentially a method

    of multiple regression. One main reason, therefore, why regression methods are replacing ANOVA

    Dependent Variable: SURVIVE

    Source DF Sum of Squares Mean Square F Value Pr > F

    Model 3 3593.3800000 1197.7933333 2.61 0.0605

    Error 56 25711.2333333 459.1291667

    Corrected Total 59 29304.6133333

    R-Square C.V. Root MSE SURVIVE Mean

    0.122622 145.1059 21.427300 14.766667

FIGURE 29.2 Printout of analysis-of-variance table for survival time in the four histologic groups of Figure 29.1.


    methods today is that the automated regression methods can more easily process data for unequal-sized

    groups and subgroups.

    29.5 Problems of Interpretation

    The results of an analysis of variance are often difficult to interpret for both quantitative and stochastic

    reasons, as well as for substantive decisions.

    29.5.1 Quantitative Distinctions

    The results of ANOVA are almost always cited with F ratios and P values that indicate stochastic

    accomplishments but not quantitative descriptive distinctions. The reader is thus left without a mechanism

    to decide what has been accomplished quantitatively, while worrying that significant P values may

    arise mainly from large group sizes.

Although not commonly used, a simple statistical index can provide a quantitative description of the results. The index, called eta squared, was previously discussed in Section 27.2.2 as a counterpart of r² for proportionate reduction of group variance in linear regression. Labeled R-Square in the printout of Figure 29.2, the expression is

η² = SM/Syy

For the histologic data in Figure 29.2, this index is 3593.38/29304.61 = 0.12, representing a modest achievement, which barely exceeds the 10% noted earlier (see Section 19.3.3) as a minimum level for quantitative significance in variance reduction.
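Using the sums of squares from the ANOVA table in Figure 29.2, the index is a one-line calculation (Python sketch):

```python
SM  = 3593.38    # model sum of squares (between groups), Figure 29.2
Syy = 29304.61   # corrected total sum of squares, Figure 29.2

eta_squared = SM / Syy   # proportionate reduction of group variance
print(round(eta_squared, 2))   # prints: 0.12
```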

    29.5.2 Stochastic Nonsignificance

Another important issue is what to do when a result is not stochastically significant, i.e., P > α. In previous analytic methods, a confidence interval could be calculated around the nonsignificant increment, ratio, or coefficient that described the observed distinction in the results. If the upper end of this confidence interval excluded a quantitatively significant value (such as δ), the result could be called stochastically nonsignificant. If the confidence interval included δ, the investigator might be reluctant to concede the null hypothesis of no difference.

    This type of reasoning would be equally pertinent for ANOVA, but is rarely used because the results

    seldom receive a descriptive citation. Confidence intervals, although sometimes calculated for the mean

of each group, are almost never determined to give the value of eta the same type of upper and lower confidence boundaries that can be calculated around a correlation coefficient in simple linear regression.

    In the absence of a confidence interval for eta, the main available descriptive approach is to examine

    results in individual groups or in paired comparisons. If any of the results seem quantitatively significant,

the investigator, although still conceding the null hypothesis (because P > α), can remain suspicious that a significant difference exists, but has not been confirmed stochastically. For example, in Figure 29.2,

    the P value of 0.06 would not allow rejection of the null hypothesis that all group means are equal.

    Nevertheless, the modestly impressive value of 0.12 for eta squared and the large increment noted earlier

    between the WELL and SMALL group means suggest that the group sizes were too small for stochastic

    confirmation of what is probably a quantitatively significant distinction.

    29.5.3 Stochastic Significance

If P is small enough to reject the null hypothesis, however, the stochastic conclusion is only that at least one group differs from the others; the next problem is to identify the distinctive group(s). A set of data containing m groups will allow m(m − 1)/2 paired comparisons when each group's mean is contrasted against the mean of every other group. With m additional paired comparisons between each group and the total of the others, the total number of paired comparisons will be m(m + 1)/2. For example, the small-cell histologic group in Table 29.1 could be compared against each of the three other groups and also against their total. A particularly ingenious (or desperate) investigator might compare a single group or paired groups against pairs (or yet other combinations) of the others.

    also against their total. A particularly ingenious (or desperate) investigator might compare a single

    group or paired groups against pairs (or yet other combinations) of the others.

    This plethora of activities produces the multiple comparison problem discussed in Chapter 25, as well

as the multiple eponymous and striking titles (such as Tukey's honestly significant difference5) that have

    been given to the procedures proposed for examining and solving the problem.

    29.5.4 Substantive Decisions

    Because the foregoing solutions all depend on arbitrary mathematical mechanisms, investigators who

    are familiar with the substantive content of the data usually prefer to avoid the polytomous structure of

    the analysis of variance. For example, a knowledgeable investigator might want to compare only the

SMALL vs. WELL groups with a direct 2-group contrast (such as a t test) in the histologic data, avoiding the entire ANOVA process. An even more knowledgeable investigator, recognizing that survival can be

    affected by many factors (such as TNM stage and age) other than histologic category, might not want

    to do any type of histologic appraisal unless the other cogent variables have been suitably accounted for.

    For all these reasons, ANOVA is a magnificent method of analyzing data if you are unfamiliar with

    what the data really mean or represent. If you know the substantive content of the research, however,

    and if you have specific ideas to be examined, you may want to use a simpler and more direct way of

    examining them.

    29.6 Additional Applications of ANOVA

    From a series of mathematical models and diverse arrangements, the analysis of variance has a versatility,

    analogous to that discussed earlier for chi square, that for many years made ANOVA the most commonly

    used statistical procedure for analyzing complex data. In recent years, however, the ubiquitous availability

    of computers has led to the frequent replacement of ANOVA by multiple regression procedures, whose

    results are often easier to understand. Besides, ANOVA can mathematically be regarded as a subdivision

    of the general-linear-model strategies used in multivariable regression analysis.

Accordingly, four of the many other applications of ANOVA are outlined here only briefly, mainly so that you will have heard of them in case you meet them (particularly in older literature). Details can be found in many statistical textbooks. The four procedures to be discussed are multi-factor arrangements, nested analyses, the analysis of covariance (ANCOVA), and repeated-measures arrangements (including the intraclass correlation coefficient).

    29.6.1 Multi-Factor Arrangements

    The procedures discussed so far are called one-way analyses of variance, because only a single inde-

    pendent variable (i.e., histologic category) was examined in relation to survival time. In many circum-

    stances, however, two or more independent variables can be regarded as factors affecting the dependent

variable. When these additional factors are included, the analysis is called two-way (or two-factor), three-way (or three-factor), etc.

For example, if the two factors of histologic category and TNM stage are considered simultaneously, the data for the 60 patients in Figure 29.1 would be arranged as shown in Table 29.3. The identification

    of individual survival times would require triple subscripts: i for the person, j for the row, and k for

    the column.


29.6.1.1 Main Effects In the mathematical model of the two-way arrangement, the categorical mean for each factor (Histology and TNM Stage) makes a separate contribution, called the main effect, beyond the grand mean. The remainder (or unexplained) deviation for each person is called the residual error. Thus, a two-factor model for the two independent variables would express the observed results as

Yijk = Ḡ + (Ȳj − Ḡ) + (Ȳk − Ḡ) + (Yijk − Ȳj − Ȳk + Ḡ)    [29.2]

The term Ḡ here represents the grand mean. The next two terms represent the respective deviations of each row mean (Ȳj) and each column mean (Ȳk) from the grand mean. The four components in the last term for the residual deviation of each person are constructed as residuals that maintain the algebraic identity. The total sum of squares in the system will be Σ(Yijk − Ḡ)², with N − 1 degrees of freedom. There will be two sums of squares for the model, cited as Σnj(Ȳj − Ḡ)² for the row factor, and as Σnk(Ȳk − Ḡ)² for the column factor. The residual sum of squares will be the sum of all the values of (Yijk − Ȳj − Ȳk + Ḡ)².
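The identity in Formula [29.2] can be checked for any single patient. A Python sketch for one WELL, Stage I patient, using the row and column means from Table 29.3 (the residual term is built so that the four components sum back exactly to the observed value):

```python
G = 14.77         # grand mean
row_mean = 24.43  # WELL row mean (histology factor)
col_mean = 27.7   # Stage I column mean (TNM factor)
y = 82.3          # observed survival for one WELL / Stage I patient

main_row = row_mean - G                 # histology main effect
main_col = col_mean - G                 # TNM-stage main effect
residual = y - row_mean - col_mean + G  # residual error term

# Yijk = G + (row mean - G) + (column mean - G) + residual
reconstructed = G + main_row + main_col + residual
print(round(reconstructed, 2))   # prints: 82.3
```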

Figure 29.3 shows the printout of pertinent calculations for the data in Table 29.3. In the lower half of Figure 29.3, the 4-category histologic variable has 3 degrees of freedom and its Type I SS (sum of squares) and mean square, respectively, are the same 3593.38 and 1197.79 shown earlier. The 5-category TNM-stage variable has 4 degrees of freedom and corresponding values of 3116.39 and 779.10. The residual error group variance in the upper part of the table is now calculated differently, as the corrected

TABLE 29.3
Two-Way Arrangement of Individual Data for Survival Time (in Months) of Patients with Lung Cancer

Histologic   Individual Survival Times                                  Mean for
Category     (distributed across TNM stages I, II, IIIA, IIIB, IV)      Row Category

Well         82.3, 20.3, 54.9, 28.0, 12.2, 39.9, 79.7, 5.3, 4.0,          24.43
             1.6, 55.9, 23.9, 2.6, 29.6, 13.3, 1.6, 14.1, 4.5,
             62.0, 0.2, 0.6, 1.0
Small        10.3, 6.8, 0.2, 4.4, 5.5, 0.3, 0.6, 11.2, 3.7, 3.4, 2.5       4.45
Anap         0.1, 10.9, 0.2, 19.3, 27.9, 99.9, 7.6, 1.3, 1.4, 6.5,        10.87
             2.9, 0.8, 1.8, 6.0, 1.6, 0.9, 4.7, 1.9
Cytol        12.8, 8.6, 8.1, 8.8, 1.8, 1.0, 6.2, 10.6, 46.0               11.54

Mean for Column Category: I = 27.71; II = 24.72; IIIA = 10.40; IIIB = 8.72; IV = 5.96. Grand mean = 14.77.

[The assignment of individual values to the five TNM-stage columns could not be preserved in this extraction; only the row listings and the marginal means are reproduced.]



total sum of squares minus the sum of Type I squares, which is a total of 6709.77 for the two factors in the model. Since those two factors have 7 (= 3 + 4) degrees of freedom, the mean square for the model is 6709.77/7 = 958.54, and the d.f. in the error variance is 59 − 7 = 52. The mean square for the error variance becomes 22594.84/52 = 434.52. When calculated for this two-factor model, the F ratio of mean squares is 2.21, which now achieves a P value (marked Pr > F) just below .05. If the α level is set at .05, this result is significant, whereas it was not so in the previous analysis for histology alone.

    The label Type I SS is used because ANOVA calculations can also produce three other types of

    sums of squares (marked II, III, and IV when presented) that vary with the order in which factors are

    entered or removed in a model, and with consideration of the interactions discussed in the next section.

As shown in the lower section of Figure 29.3, an F-ratio value can be calculated for each factor when its mean square is divided by the error mean square. For histology, this ratio is 1197.79/434.52 = 2.76. For TNM stage, the corresponding value in the printout is 1.79. The corresponding 2P values are just above .05 for histology and .14 for TNM stage.

29.6.1.2 Interactions In linear models, each factor is assumed to have its own separate additive effect. In biologic reality, however, the conjunction of two factors may have an antagonistic or

    synergistic effect beyond their individual actions, so that the whole differs from the sum of the parts.

    For example, increasing weight and increasing blood pressure may each lead to increasing mortality,

    but their combined effect may be particularly pronounced in persons who are at the extremes of obesity

    and hypertension. Statisticians use the term interactions for these conjunctive effects; and the potential

    for interactions is often considered whenever an analysis contains two or more factors.

To examine these effects in a two-factor analysis, the model for Yijk is expanded to contain an interaction term. It is calculated, for the mean of each cell of the conjoined categories, as the deviation of the observed cell mean from the additive combination of the pertinent row and column means. In the expression of the equation for Yijk, the first three terms of Equation [29.2] are the same: G, for the grand mean; (Ȳj − G) for each row; and (Ȳk − G) for each column. Because the observed mean in each cell will be Ȳjk, the interaction effect will be the deviation estimated as Ȳjk − Ȳj − Ȳk + G. The remaining residual effect, used for calculating the residual sum of squares, is Yijk − Ȳjk. For each sum of squares, the degrees of freedom are determined appropriately for the calculations of mean squares and F ratios.

The calculation of interaction effects can be illustrated with an example from the data of Table 29.3 for the 7-member cell in the first row, first column. The grand mean is 14.77; the entire WELL histologic category has a mean of 24.43; and TNM stage I has a mean of 27.71. The mean of the seven values in the cited cell is (82.3 + 20.3 + … + 79.7)/7 = 45.33. According to the algebraic equation, G = 14.77; in the first row, (Ȳj − G) = 24.43 − 14.77 = 9.66; and in the first column, (Ȳk − G) = 27.71 − 14.77 = 12.94. The interaction effect in the cited cell will be estimated as 45.33 − 24.43 − 27.71 + 14.77 = 7.96. The estimated value of the residual for each of the seven Yijk values in the cited cell will be Yijk − 45.33.

Dependent Variable: SURVIVE

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              7     6709.7729638     958.5389948       2.21    0.0486
Error             52    22594.8403695     434.5161610
Corrected Total   59    29304.6133333

R-Square      C.V.       Root MSE     SURVIVE Mean
0.228966    141.1629    20.845051       14.766667

Source       DF    Type I SS       Mean Square     F Value    Pr > F
HISTOL        3    3593.3800000    1197.7933333       2.76    0.0515
TNMSTAGE      4    3116.3929638     779.0982410       1.79    0.1443

FIGURE 29.3 Printout for 2-way ANOVA of data in Figure 29.1 and Table 29.3.



Figure 29.4 shows the printout of the ANOVA table when an interaction model is used for the two-factor data in Table 29.3. In Figure 29.4, the sums of squares (marked Type I SS) and mean squares for histology and TNM stage are the same as in Figure 29.3, and they also have the same degrees of freedom. The degrees of freedom for the interaction are tricky to calculate, however. In this instance, because some of the cells of Table 29.3 are empty or have only 1 member, we first calculate degrees of freedom for the residual sum of squares, Σ(Yijk − Ȳjk)². In each pertinent cell, located at (j, k) coordinates in the table, the degrees of freedom will be njk − 1. Working across and then downward through the cells in Table 29.3, the sum of the njk − 1 values will be 6 + 5 + 1 + 5 + 7 + 2 + 2 + 1 + 3 + 5 + 1 + 2 + 3 = 43. (The values are 0 for the four cells with one member each and also for the 3 cells with no members.) This calculation shows that the model accounts for 59 − 43 = 16 d.f.; and as the two main factors have a total of 7 d.f., the interaction factor contributes 9 d.f. to the model, as shown in the last row of Figure 29.4.

Calculated with the new mean square error term in Figure 29.4, the F values produce 2P values below .05 for histology and for the histology × TNM-stage interaction, although not for TNM stage.

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model             16    13835.482381      864.717649        2.40    0.0114
Error             43    15469.130952      359.747231
Corrected Total   59    29304.613333

R-Square      C.V.       Root MSE     SURVIVE Mean
0.472126    128.4447    18.967004       14.766667

Source             DF    Type I SS       Mean Square     F Value    Pr > F
HISTOL              3    3593.3800000    1197.7933333       3.33    0.0282
TNMSTAGE            4    3116.3929638     779.0982410       2.17    0.0890
HISTOL*TNMSTAGE     9    7125.7094171     791.7454908       2.20    0.0408

FIGURE 29.4 Two-way ANOVA, with interaction component, for results in Table 29.3 and Figure 29.3. [Printout from SAS PROC GLM computer program.]



    allow maintenance of the ranks, TNM stage could be declared a covariate, which would then be analyzed

    as though it had a dimensional scale.

The results of the covariance analysis are shown in Figure 29.5. Note that TNM stage now has only 1 degree of freedom, thus giving the model a total of 4 d.f., an F value of 3.61 and a P value of 0.0111, despite a decline of R-square from .229 in Figure 29.3 to .208 in Figure 29.5. The histology variable, which had P = .052 in Figure 29.3, now has P = .428; and TNM stage, with P = .144 in Figure 29.3, has now become highly significant at P = .0012. These dramatic changes indicate what can happen when the rank sequence is either ignored or appropriately analyzed for polytomous variables.

    In past years, the effect of confounding or ranked covariates was often formally adjusted in an analysis

    of covariance, using a complex set of computations and symbols. Today, however, the same adjustment

    is almost always done with a multiple regression procedure. The adjustment process in ANCOVA is

    actually a form of regression analysis in which the related effects of the covariate are determined by

    regression and then removed from the error variance. The group means of the main factor are also

    adjusted to correspond to a common value of the covariate. The subsequent analysis is presumably

    more powerful in detecting the effects of the main factor, because the confounding effects have

    presumably been removed. The process and results are usually much easier to understand, however,

    when done with multiple linear regression.2

    29.6.4 Repeated-Measures Arrangements

Repeated measures is the name given to analyses in which the same entity has been observed repeatedly. The repetitions can occur with changes over time, perhaps after interventions such as treatment, or with

    examinations of the same (unchanged) entity by different observers or systems of measurement.

29.6.4.1 Temporal Changes The most common repeated-measures situation is an ordinary crossover study, where the same patients receive treatments A and B. The effects of treatment A vs.

    treatment B in each person can be subtracted and thereby reduced to a single group of increments, which

    can be analyzed with a paired t test, as discussed in Section 7.8.2.2. The same analysis of increments

    can be used for the before-and-after measurements of the effect in patients receiving a particular

    treatment, such as the results shown earlier for blood glucose in Table 7.4.

    Because the situations just described can easily be managed with paired t tests, the repeated-measures

form of ANOVA is usually reserved for situations in which the same entity has been measured at three or more time points. The variables that become the main factors in the analysis are the times and the groups (such as treatment). Interaction terms can be added for the effects of groups × times.

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              4     6087.9081999    1521.9770500       3.61    0.0111
Error             55    23216.7051334     422.1219115
Corrected Total   59    29304.6133333

R-Square      C.V.       Root MSE     SURVIVE Mean
0.207746    139.1350    20.545606       14.766667

Source       DF    Type I SS       Mean Square     F Value    Pr > F
TNMSTAGE      1    4897.3897453    4897.3897453      11.60    0.0012
HISTOL        3    1190.5184546     396.8394849       0.94    0.4276

FIGURE 29.5 Printout of Analysis of Covariance for data in Figure 29.3, with TNM stage used as ranked variable.


    Four major problems, for which consensus solutions do not yet exist, arise when the same entity is

    measured repeatedly over time:

    1. Independence. The first problem is violation of the assumption that the measurements are inde-

    pendent. The paired t test manages this problem by reducing the pair of measurements to their

    increment, which becomes a simple new variable. This distinction may not always be suitably

    employed with more than two sets of repeated measurements.

    2. Incremental components. A second problem is the choice of components for calculating

    incremental changes for each person. Suppose t0 is an individual baseline value, and the

    subsequent values are t1, t2, and t3. Do we always measure increments from the baseline

value, i.e., t1 − t0, t2 − t0, and t3 − t0, or should the increments be listed successively as t1 − t0, t2 − t1, t3 − t2?

    3. Summary index of response. If a treatment is imposed after the baseline value at t0, what is

    the best single index for summarizing the post-therapeutic response? Should it be the mean

    of the post-treatment values, the increment between t0 and the last measurement, or a regres-

    sion line for the set of values?

    4. Neglect of trend. This problem is discussed further in Section 29.8. As noted earlier, an

    ordinary analysis of variance does not distinguish between unranked nominal and ranked

    ordinal categories in the independent polytomous variable. If the variable represents serial

    points in time, their ranking may produce a trend, but it will be neglected unless special

    arrangements are used in the calculations.
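The two increment schemes in problem 2 can be made concrete in a few lines (hypothetical helpers, shown only to illustrate the choice; the data are invented):

```python
def increments_from_baseline(series):
    """t1 - t0, t2 - t0, ...: every change measured against the baseline value."""
    return [t - series[0] for t in series[1:]]

def successive_increments(series):
    """t1 - t0, t2 - t1, ...: each change measured against the preceding value."""
    return [b - a for a, b in zip(series, series[1:])]

values = [10, 12, 9, 15]                 # t0, t1, t2, t3 for one person
print(increments_from_baseline(values))  # [2, -1, 5]
print(successive_increments(values))     # [2, -3, 6]
```

The two listings answer different questions (total change so far vs. change since the last visit), which is why the choice of components matters for the analysis.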

29.6.4.2 Intraclass Correlations Studies of observer or instrument variability can also be regarded as a type of repeated measures, for which the results are commonly cited with an intraclass

    correlation coefficient (ICC).

    As noted in Section 20.7.3, the basic concept was developed as a way of assessing agreement for

measurements of a dimensional variable, such as height or weight, between members of the same class, such as brothers in a family. To avoid the inadequacy of a correlation coefficient, the data were appraised

    with a repeated-measures analysis-of-variance. To avoid decisions about which member of a pair should

    be listed as the first or second measurements, all possible pairs were listed twice, with each member as

    the first measurement and then as the second. The total sums of squares could be partitioned into one

    sum for variability between the individuals being rated, i.e., the subjects (SSS), and another sum of

squares due to residual error (SSE). The intraclass correlation was then calculated as

    RI = (SSS − SSE)/(SSS + SSE)

The approach was later adapted for psychometric definitions of reliability. The appropriate means for the sums of squares were symbolized as sc² for variance in the subjects and se² for the corresponding residual errors. Reliability was then defined as

    RI = sc²/(sc² + se²)

Using the foregoing symbols, when each of a set of n persons is measured by each of a set of r raters, the variance of a single observation, s², can be partitioned as

    s² = sc² + sr² + se²

where sr² is the mean of the appropriate sums of squares for the raters.

    These variances can be arranged into several formulas for calculating RI. The different arrangements

    depend on the models used for the sampling and the interpretation.6 In a worked example cited by

    Everitt,7 vital capacity was measured by four raters for each of 20 patients. The total sum of squares for

    the 80 observations, with d.f. = 79, was divided into three sets of sums of squares: (1) for the four


observers, with d.f. = 3; (2) for the 20 patients, with d.f. = 19; and (3) for the residual error, with d.f. = 3 × 19 = 57. The formula used by Everitt for calculating the intraclass correlation coefficient was

    RI = n(sc² − se²)/[nsc² + rsr² + (nr − n − r)se²]

A counterpart formula, using SSR to represent the sums of squares for raters, is

    RI = (SSS − SSE)/[SSS + SSE + 2(SSR)]

    The intraclass correlation coefficient (ICC) can be used when laboratory measurements of instrument

    variability are expressed in dimensional data. Nevertheless, as discussed in Chapter 20, most laboratories

prefer to use simpler pair-wise and other straightforward statistical approaches that are easier to understand and interpret than the ICC.
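The two basic definitions above reduce to one-line computations; the function names below are hypothetical, and the numbers are invented for illustration:

```python
def icc_from_ss(sss, sse):
    """Intraclass correlation from the subject (SSS) and error (SSE) sums of squares."""
    return (sss - sse) / (sss + sse)

def reliability(s2_subjects, s2_error):
    """Psychometric reliability: subject variance as a fraction of total variance."""
    return s2_subjects / (s2_subjects + s2_error)

print(icc_from_ss(300.0, 100.0))  # 0.5
print(reliability(3.0, 1.0))      # 0.75
```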

    The simpler approaches may also have mathematical advantages that have been cited by Bland and

    Altman,8

who contend that the ICC, although appropriate for repetitions of the same measurement, is unsatisfactory when dealing with measurements by two different methods where there is no ordering

    of the repeated measures and hence no obvious choice of X or Y. Other disadvantages ascribed to the

    ICC are that it depends on the range of measurement and is not related to the actual scale of

    measurement or to the size of error which might be clinically allowable. Instead, Bland and Altman

    recommend their limits of agreement method, which was discussed throughout Section 20.7.1. The

    method relies on examining the increments in measurement for each subject. The mean difference then

    indicates bias, and the standard deviation is used to calculate a 95% descriptive zone for the limits of

    agreement. A plot of the differences against the mean value of each pair will indicate whether the

    discrepancies in measurement diverge as the measured values increase.
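A minimal sketch of the limits-of-agreement computation just described (hypothetical helper; the paired values are invented):

```python
def limits_of_agreement(method_a, method_b):
    """Mean difference (bias) and 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    bias = sum(diffs) / n
    # Sample standard deviation of the differences
    sd = (sum((d - bias) ** 2 for d in diffs) / (n - 1)) ** 0.5
    return bias - 1.96 * sd, bias, bias + 1.96 * sd

low, bias, high = limits_of_agreement([2.0, 4.0, 6.0], [1.0, 1.0, 1.0])
print(bias)        # 3.0
print(low, high)   # about -0.92 and 6.92
```

In practice the differences would also be plotted against the pair means, as the text notes, to see whether the discrepancies grow with the measured values.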

For categorical data, concordance is usually expressed (see Chapter 20) with other indexes of variability, such as kappa, which yields the same results as the intraclass coefficient in pertinent situations.

    29.7 Non-Parametric Methods of Analysis

    The mathematical models of ANOVA require diverse assumptions about Gaussian distributions and

homoscedastic (i.e., similar) variances. These assumptions can be avoided by converting the dimensional data to ranks and analyzing the values of the ranks. The Kruskal-Wallis procedure, which is

the eponym for a one-way ANOVA using ranked data, corresponds to a Wilcoxon–Mann–Whitney U test for 3 or more groups. The Friedman procedure, which refers to a two-way analysis of ranked data, was proposed almost 60 years ago by Milton Friedman, who later became more famous in economics than in statistics.
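To show what the rank conversion involves, here is a pure-Python sketch of the Kruskal-Wallis H statistic. The helper names are invented for this sketch, and the usual correction for ties is omitted:

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """H = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), computed on the pooled ranks."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])  # rank sum R_j for this group
        h += r_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

print(round(kruskal_wallis_h([1, 2], [3, 4], [5, 6]), 4))  # 4.5714
```

Under the null hypothesis, H is referred to a chi-square distribution with (number of groups − 1) degrees of freedom.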

    29.8 Problems in Analysis of Trends

    If a variable has ordinal grades, the customary ANOVA procedure will regard the ranked categories

    merely as nominal, and will not make provision for the possible or anticipated trend associated with

different ranks. The problem occurs with an ordinal variable, such as TNM stage in Figure 29.1, because the effect of an increasing stage is ignored. The neglect of a ranked effect can be particularly important when the independent variable (or factor) is time, for which the effects might be expected to occur in

    a distinct temporal sequence. This problem in repeated-measures ANOVA evoked a denunciation by

    Sheiner,9 who contended that the customary ANOVA methods were wholly inappropriate for many

    studies of the time effects of pharmacologic agents.



    The appropriate form of analysis can be carried out, somewhat in the manner of the chi-square test

    for linear trend in an array of proportions (see Chapter 27), by assigning arbitrary coding values (such

    as 1,2,3,4) to the ordinal categories. The process is usually done more easily and simply, however, as

    a linear regression analysis.
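For instance, with codes 1, 2, 3, 4 assigned to the ordinal categories, the trend is simply the least-squares slope of the outcome on the codes; `linear_trend` is a hypothetical helper, and the data are invented:

```python
def linear_trend(codes, values):
    """Ordinary least-squares slope of the outcome values on the ordinal codes."""
    n = len(codes)
    mean_x = sum(codes) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(codes, values))
    sxx = sum((x - mean_x) ** 2 for x in codes)
    return sxy / sxx

# Outcome rises by 2 units per ordinal grade in this invented example
print(linear_trend([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0]))  # 2.0
```

A non-zero slope (tested against its standard error) indicates the kind of monotone trend that the ordinary ANOVA partition would ignore.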

    29.9 Use of ANOVA in Published Literature

    To find examples of ANOVA in published medical literature, the automated Colleague Medical Database

was searched for papers, in English, of human-subject research that appeared in medical journals during 1991–95, and in which analysis of variance was mentioned in the abstract-summary. From the list of

    possibilities, 15 were selected to cover a wide array of journals and topics. The discussion that follows

    is a summary of results in those 15 articles.

    A one-way analysis of variance was used to check the rate of disappearance of ethanol from venous

    blood in 12 subjects who drank the same dose of alcohol in orange juice on four occasions.10 The

authors concluded that the variation between subjects exceeded the variations within subjects. Another classical one-way ANOVA was done to examine values of intestinal calcium absorption and serum

    parathyroid hormone levels in three groups of people: normal controls and asthmatic patients receiving

    either oral or inhaled steroid therapy.11 A one-way ANOVA compared diverse aspects of functional

    status in two groups of patients receiving either fluorouracil or saline infusions for head and neck

    cancer.12 In a complex but essentially one-way ANOVA, several dependent variables (intervention points,

    days of monitoring, final cardiovascular function) were related to subgroups defined by APACHE II

    severity scores in a surgical intensive care unit.13 (The results were also examined in a regression

    analysis.) In another one-way analysis of variance, preference ratings for six different modes of teaching

    and learning were evaluated14 among three groups, comprising first-year, second-year, and fourth-year

    medical students in the United Arab Emirates. The results were also examined for the preferences of

male vs. female students. In a two-way ANOVA, neurologic dysfunction at age four years was related15 to two main factors:

    birth weight and location of birth in newborn intensive care units of either Copenhagen or Dublin.

    Multifactor ANOVAs were applied,16 in 20 patients with conjunctival malignant melanoma, to the

    relationship between 5-year survival and the counts of cells positive for proliferating cell nuclear antigen,

    predominant cell type, maximum tumor depth, and site of tumor. The result, showing that patients with

    low counts had better prognoses, was then confirmed with a Cox proportional hazards regression

    analysis. (The latter approach would probably have been best used directly.)

    Repeated measures ANOVA was used in the following studies: to check the effect of oat bran consumption

    on serum cholesterol levels at four time points;17 to compare various effects (including blood pressure levels

    and markers of alcohol consumption) in hypertensive men randomized to either a control group or to receive

special advice about methods of reducing alcohol consumption;18 to assess the time trend of blood pressure during a 24-hour monitoring period in patients receiving placebo or an active antihypertensive agent;19 and to monitor changes at three time points over 6 months in four indexes (body weight, serum osmolality, serum sodium, and blood urea nitrogen/creatinine ratios) for residents of a nursing home.20

    The intraclass correlation coefficient was used in three other studies concerned with reliability (or

    reproducibility) of the measurements performed in neuropathic tests,21 a brief psychiatric rating scale,22

    and a method of grading photoageing in skin casts.23

    References

    1. Feinstein, 1990d; 2. Feinstein, 1996; 3. Oxford English Dictionary, 1971; 4. Lentner, 1982; 5. Tukey, 1968;

    6. Shrout, 1979; 7. Everitt, 1989; 8. Bland, 1990; 9. Sheiner, 1992; 10. Jones, 1994; 11. Luengo, 1991;

    12. Browman, 1993; 13. Civetta, 1992; 14. Paul, 1994; 15. Ellison, 1992; 16. Seregard, 1993; 17. Saudia, 1992;

    18. Maheswaran, 1992; 19. Tomei, 1992; 20. Weinberg, 1994; 21. Dyck, 1991; 22. Hafkenscheid, 1993;

    23. Fritschi, 1995.