© 2006 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. DOI: 10.1002/14356007.e08_e01.pub2


Design of Experiments

Sergio Soravia, Process Technology, Degussa AG, Hanau, Germany (Chap. 1, 2, 3 and 8)
Andreas Orth, University of Applied Sciences, Frankfurt am Main, Germany (Chap. 4, 5, 6, 7 and 8)

1. Introduction
1.1. General Remarks
1.2. Application in Industry
1.3. Historical Sidelights
1.4. Aim and Scope
2. Procedure for Conducting Experimental Investigations: Basic Principles
2.1. System Analysis and Clear Definition of Objectives
2.2. Response Variables and Experimental Factors
2.3. Replication, Blocking, and Randomization
2.4. Interactions
2.5. Different Experimental Strategies
2.6. Drawback of the One-Factor-at-a-Time Method
3. Factorial Designs
3.1. Basic Concepts
3.2. The 2^2 Factorial Design
3.3. The 2^3 Factorial Design
3.4. Fractional Factorial Designs
4. Response Surface Designs
4.1. The Idea of Using Basic Empirical Models
4.2. The Class of Models Used in DoE
4.3. Standard DoE Models and Corresponding Designs
4.4. Using Regression Analysis to Fit Models to Experimental Data
5. Methods for Assessing, Improving, and Visualizing Models
5.1. R^2 Regression Measure and Q^2 Prediction Measure
5.2. ANOVA (Analysis of Variance) and Lack-of-Fit Test
5.3. Analysis of Observations and Residuals
5.4. Heuristics for Improving Model Performance
5.5. Graphical Visualization of Response Surfaces
6. Optimization Methods
6.1. Basic EVOP Approach Using Factorial Designs
6.2. Model-Based Approach
6.3. Multi-Response Optimization with Desirability Functions
6.4. Validation of Predicted Optima
7. Designs for Special Purposes
7.1. Mixture Designs
7.2. Designs for Categorical Factors
7.3. Optimal Designs
7.4. Robust Design as a Tool for Quality Engineering
8. Software
9. References

1. Introduction

1.1. General Remarks

Research and development in the academic or industrial context makes extensive use of experimentation to gain a better understanding of a process or system under study. The methodology of Design of Experiments (DoE) provides proven strategies and methods of experimental design for performing and analyzing test series in a systematic and efficient way. All experimental parameters are varied in an intelligent and balanced fashion so that a maximum of information is gained from the analysis of the experimental results. In most cases, the time and money spent on the experimental investigation will be greatly reduced. In all cases, an optimal ratio between the number of experimental trials and the information content of the results will be achieved.

DoE is a powerful target-oriented tool. If it is properly employed, creative minds with a scientific and technical background will best deploy their resources to reach a well-defined goal of their studies. In contrast to what researchers sometimes fear, experimenters will not be hampered in their creativity, but will be empowered to structure their innovative ideas. Of course, adopting DoE requires discipline from the user, and it has proved very helpful to take the initial steps together with an expert with experience in the field. The rewards of this systematic approach are useful, reliable, and well-documented results in a clear time and cost frame. A comprehensible presentation and documentation of experimental investigations is gratefully acknowledged by colleagues or successors in research and development teams. The application of DoE is particularly essential and indispensable when processes involving many factors or parameters are the subject of empirical investigations of cause-effect relationships.

DoE is a scientific approach to experimentation which incorporates statistical principles. This ensures an objective investigation, so that valid and convincing conclusions can be drawn from an experimental study. In particular, an honest approach to dealing with process and measurement errors is encouraged, since experiments that are repeated under identical conditions will seldom lead to the same results. This may be caused by the measuring equipment, the experimenter, changes in ambient conditions, or the natural variability of the object under study. Note that this inherent experimental error, in general, comprises more than the bare repeatability and reproducibility of a measurement system. DoE provides basic principles to distinguish between experimental error and a real effect caused by consciously changing experimental conditions. This prevents experimenters from drawing erroneous conclusions and, as a consequence, from making wrong decisions.

1.2. Application in Industry

In industry, increasingly harsh market conditions force companies to make every effort to reach and maintain their competitive edge. This applies, in particular, in view of the following goals:

– Quality of products and services in conformance to market requirements
– Low costs to ensure adequate profits
– Short development periods for new or improved products and production processes (time to market)

Quality engineering techniques are powerful elements of modern quality management systems and make it possible to reach these goals. One important challenge in this context is not to ensure quality downstream at the end of the production line, but to ensure product quality by a stable and capable production process which is under control. By this means, ongoing tests and checks to prove that the product conforms to specification requirements are avoided. This can be realized by knowing the important and critical parameters or factors governing the system and through the implementation of intelligent process management strategies.

A methodical approach, sometimes referred to as off-line quality engineering [1, 2], focuses even further upstream. By considering quality-relevant aspects in the early stages of product and process development, quality is ensured preventively in terms of fault prevention [3, 4]. Naturally, there is considerable cost-saving potential in the early stages of product and process design, where manufacturing costs are fixed to a large extent. The losses incurred for each design modification of a product or process gradually increase with time. In addition, design errors with the most serious consequences are known to be committed in these early stages.

As an outstanding quality engineering tool, DoE occupies a key position in this context. The emphasis is on engineering quality into the products and processes. At the same time, DoE opens up great economic potential during the entire development and improvement period. It is well known that the implementation and use of corresponding methods increases competitiveness [5–9]. DoE is applied successfully in all high-technology branches of industry. In the process industry, it makes essential contributions to optimizing a large variety of procedures and covers the entire lifecycle of a product, starting from product design in chemical research (e.g., screening of raw materials, finding the best mixture or formulation, optimizing a chemical reaction), via process development in process engineering (e.g., testing novel technological solutions, determining the best operating conditions, optimizing the performance of processes), up to production (e.g., starting up a plant smoothly, finding an operating window which meets customer requirements at low cost, high capacity, and under stable conditions) and application technology (giving competent advice concerning the application of the product, customizing the properties of products to specific customer needs). In particular, technologies like high-throughput screening or combinatorial synthesis with automated workstations require DoE to employ resources reasonably.

1.3. Historical Sidelights

The foundations of modern statistical experimental design methods were laid in the 1920s by R. A. Fisher [10] in Great Britain and were first used in areas such as agricultural science, biology, and medicine. By the 1950s and 1960s, some of these methods had already spread into chemical research and process development, where they were successfully employed [11–13]. During this period and later, G. E. P. Box et al. made essential contributions to the advancement and application of this methodology [5, 14–18].

In 1960, J. Kiefer and J. Wolfowitz initiated profound research into the mathematical theory behind optimal designs, and in the early 1970s the first efficient algorithms for so-called D-optimal designs were developed.

Around the same time, G. Taguchi integrated DoE methods into the product and process development of numerous Japanese companies [19]. One of his key ideas is the concept of robust design [1, 2, 4]. It involves designing products and processes with suitable parameters and parameter settings so that their functionality remains as insensitive as possible to unavoidable disturbing influences. Taguchi's ideas were discussed fruitfully in the United States during the 1980s [1, 5, 20, 21] and caused an increasing interest in this subject in Western industries. At the same time, another set of tools, which in general is not suitable for chemical or chemical engineering applications, became popular under the name of D. Shainin [22, 23].

The 1980s also saw the advent and spread of software for DoE. Various powerful software tools have been commercially available for several years now. They essentially support the generation, analysis, and documentation of experimental designs. Experience has shown that such software can be used by experimenters once the basic principles of the methods applied have been learned. A list of software tools is given in Chapter 8.

Despite these stimulating developments, the majority of scientists and engineers in industry and academic research still have not used DoE.

1.4. Aim and Scope

The target group of this article consists of scientists and engineers in industry or academia. The intention is

– To give an insight into the basic principles and strategies of experimental design
– To convey an understanding of the important design and analysis methods
– To give an overview of optimization techniques and some advanced methods
– To demonstrate the power of proven methods of DoE and to encourage their use in practice

References for further reading and detailed study are given on all subjects. In particular, [16, 24], and [25] may be regarded as standard monographs on DoE in their respective languages.

2. Procedure for Conducting Experimental Investigations: Basic Principles

DoE should become an integral part of the regular working tools of all experimenters. The methods provided by this approach may be used in experimental investigations on a smaller scale as well as in large projects which, depending on their importance and scope, may involve putting together an interdisciplinary team of appropriate size and composition. It has proved advantageous to also include staff who take care of the equipment or facility on-site. They often bring in aspects and experiences with which the decision-makers involved are largely unfamiliar. Moreover, involving, for instance, laboratory staff in the planning phase of experiments has a positive effect on their motivation. A DoE


The results of experimental runs, i.e., the values of response variables, are affected by various factors. In practically all applications there are disturbing environmental factors, which may cause undesired variations in the response variables. Some of them are hard to control or uncontrollable, others simply may not even be known. Examples of such variables are different batches of a starting material, various items of equipment with the same functionality, a change of experimenters, atmospheric conditions, and, last but not least, time-related trends such as warming-up of a machine, fouling of a heat exchanger, clogging of a filter, or drifting of a measuring instrument.

However, besides collecting and discussing the uncontrollable and disturbing variables, it is essential to collect and weigh up those factors or parameters that can be controlled or adjusted (independent variables), such as temperature, pressure, or the amount of catalyst used. The decision as to which of these experimental factors are to be kept constant during a test series and which are to be purposefully and systematically varied must also be carefully weighed up. The entire scientific and technical system know-how available by then from literature and experience, as well as intuition, must decisively influence not only the choice of the factors to be varied (one should focus on the important ones here according to the latest knowledge) but also the determination of the experimental region, i.e., of the specific range over which each factor will be varied (e.g., temperature between 120 and 180 °C and pressure between 1200 and 1800 mbar). A good experimental design will efficiently cover this experimental domain such that the questions related to the objectives may be answered when the experimental results are analyzed.

2.3. Replication, Blocking, and Randomization

To take the effects of disturbing environmental variables into account and decrease their impact to a large extent, the principles of replication, blocking, and randomization are employed. Moreover, by considering these principles, the risk of misleading interpretation of the results is minimized.

Replicates serve to make results more reliable and to obtain information about their variability. A genuine replicate should consider all possible environmental influences leading to variations within the response variables, i.e., the whole experiment with its corresponding factor-level combination should be repeated from the beginning and with some time delay in between. The results of data analyses must always be seen in the light of this inherent experimental error. Hence, replication does not mean, for instance, analyzing a sample of a product several times. This variability of a response variable is solely a measure of the precision of the analytical test procedure (laboratory assistant, measuring instrument).

It is of crucial importance that environmental factors and experimental factors of interest do not vary together; otherwise, changes in a response variable cannot be unambiguously attributed to the factors varied. For example, if two catalyst types were to be compared at various reaction temperatures and if two differently sized reactors were available for this purpose, it would be unwise to conduct all experiments with one catalyst in the smaller reactor and all experiments with the other catalyst in the larger reactor. If, in this case, the results differed from each other, it would be impossible to decide whether the catalyst type or the reactor type or both caused the deviations. The objective of blocking is to predetermine relatively similar blocks (in this case, the two reactors) in which test conditions are more homogeneous and which allow a more detailed study of the experimental factors of interest. Regarding the selection of the catalyst type and reaction temperature, the experiments to be conducted in each of the two reactors must be similar in the sense that variations in the values of a response variable can be interpreted correctly.

It is not always possible, however, to clearly identify unwanted influences and to take them into account, as is the case when blocking is used. Yet these side effects can be counterbalanced by a general use of randomization. Here, in contrast to systematically determining the order in which experiments are to be conducted, the order of the experiments is randomized. In particular, false assignments of time-related trends are avoided. Let us consider a rectifying column, for instance, in which the effects of operating pressure, reflux ratio, and reboiler duty on the purity of the top product are to be examined. Let us assume that the unit is started up in the morning and that the whole test series could be realized within one day. Now, if one conducted all experiments involving a low reflux ratio before those involving a high reflux ratio, the effect of the reflux ratio could be falsified more or less by the unit's warming up, depending on how strong this influence is (poor design in Fig. 3). Such an uncontrolled mixing of effects is prevented by choosing the order of the experimental runs at random (good design in Fig. 3).

Figure 3. Time-related effects caused, e.g., by instrument drifts may falsify the analysis of the results when factor settings are not changed randomly (poor design). If environmental conditions vary during the course of an experiment, their effect will be damped or averaged out by randomizing the sequence of the experiments (good design)

The decisive reason for employing the principles of replication, blocking, and randomization is therefore to prevent the analysis of systematically varied experimental factors from being unnecessarily contaminated by the influences of unwanted and often hidden factors. While blocking and randomization basically do not involve additional experimental runs, each replicate is a completely new realization of a combination of factor settings. An appropriate relation between the number of experimental runs and the reliability of the results must be established here on an individual basis.
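The randomization principle is straightforward to put into practice. As a minimal sketch (not taken from the original article; the factor names and levels for the rectifying-column example are invented here), the planned factor-level combinations can simply be shuffled before execution:

```python
import random

# Hypothetical factor-level combinations for the rectifying-column example:
# operating pressure (mbar), reflux ratio, and reboiler duty (kW).
runs = [
    {"pressure": p, "reflux_ratio": r, "reboiler_duty": d}
    for p in (1200, 1800)
    for r in (2.0, 4.0)
    for d in (10, 15)
]

random.seed(42)       # fixed seed only to make the run sheet reproducible
random.shuffle(runs)  # randomized run order guards against time-related trends

for i, run in enumerate(runs, start=1):
    print(f"Run {i}: {run}")
```

Executing runs in this shuffled order spreads any warming-up or drift effect evenly over all factor settings instead of letting it coincide with one of them.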

2.4. Interactions

To avoid an overly limited view of the behavior of systems, it is of great importance to know about the joint effects of experimental factors, that is, their interactions. Two variables are said to interact if the extent of the impact that changing one of them has on a third variable, namely a response variable, depends on the setting of the other. In other words, the interaction between two experimental factors measures how much the effect of a factor variation on a response variable depends on the level of the other factor. Interactions are often not heeded in practice, or they are studied at the price of spending large amounts of time and money on the associated experimental investigation. In addition, what interaction actually means is often not clearly understood. In particular, interaction is not to be confused with correlation. Two variables are said to be correlated if an increase of one variable tends to be associated with an increase or decrease of the other. Factorial experimental designs in particular (see Chap. 3) allow, among other things, a quantitative determination of interactions between varied experimental factors.
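A small numerical sketch may make the definition concrete (the four yield values below are invented for illustration and are not from the article): two factors A and B interact when the effect of changing A is different at the two levels of B.

```python
# Hypothetical yields (%) of four runs; the numbers are invented.
yields = {
    ("A-", "B-"): 60.0,
    ("A+", "B-"): 70.0,
    ("A-", "B+"): 65.0,
    ("A+", "B+"): 95.0,
}

# Effect of raising A, evaluated separately at each level of B.
effect_A_at_low_B = yields[("A+", "B-")] - yields[("A-", "B-")]
effect_A_at_high_B = yields[("A+", "B+")] - yields[("A-", "B+")]
print(effect_A_at_low_B, effect_A_at_high_B)   # 10.0 30.0

# If A and B acted purely additively, the two numbers would be equal.
```

Here raising A gains 10 units at the low level of B but 30 units at the high level, so the two factors interact.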

2.5. Different Experimental Strategies

When processes are to be improved or novel technical solutions are to be tested, but also when plants are started up, several factors are often varied, in the hope of meeting with short-term success, by using an unsystematic iterative trial-and-error approach until satisfactory results are eventually produced. The expenditure involved quickly takes on unforeseeable proportions without affording important insights into the cause-and-effect relationships of the system. The result of this procedure is that, in the end, a comprehensible documentation is not available, and objective, reliable reasons for process operations or factor settings are missing. Furthermore, very little is known in most cases about the impact of factor variations. Experimenting in this way might be acceptable for orientation purposes in a kind of pre-experimental phase. However, one should switch to a judicious program as soon as possible.

To study causal relationships systematically, the experimental factors or parameters are usually varied separately and successively, and the values of a response variable (product, process, or quality characteristic), such as the yield of a chemical product, are shown in a diagram (see Fig. 4). This one-factor-at-a-time method (see Section 2.6), however, provides only few insights into the subject under study because the effect of a particular factor is only known at a single factor-level combination of the other factors. The response variable may have quite another shape if the levels of the remaining factors are set differently. If the experimental factors do not act additively on a response variable according to the superposition principle, i.e., if the factors influence each other in their effect on the response variable through existing interactions, a misinterpretation of the results is easily possible, particularly when optimum conditions are to be attained.

When statistical experimental design methods are used, all considered factors are varied in a systematic and balanced way so that a maximum of information is gained from the analysis of the corresponding experiments. This may comprise the statistically sound quantitative determination of the effects of factor variations on one or several response variables (see Chap. 3) or a systematic optimization of factor settings (see Chap. 6). Depending on the experimenter's intention, the following questions can be answered:

– What are the most important factors of the system under investigation?
– To what extent and in which direction does a response variable of interest change when an experimental factor is varied?
– To what extent are the size and direction of the effect of a factor variation dependent on the settings of other experimental factors (interactions)?
– With which factor settings does one obtain a desired state of a response variable (maximum, minimum, nominal value)?
– How can this state be made insensitive to disturbing environmental factors, or how can an undesired variability of a response variable be reduced (robust design)?

The question of which experimental strategy should be chosen from a comprehensive range of methods will be governed by the objectives to be achieved in each individual case, taking, e.g., system-inherent, financial, and time-related boundary conditions into account. Every project has its peculiarities. Carefully planned experiments cover the experimental region to be investigated as evenly as possible, while ensuring to the largest possible extent that changing values in a response variable can be attributed unambiguously to the right causes. Information of crucial importance is frequently obtained by a simple graphical analysis of the data without having to employ sophisticated statistical analysis methods, such as variance analysis or regression analysis. On the other hand, the best statistical analysis is not capable of retrieving useful information from a badly designed series of experiments. It is therefore decisive to consider basic DoE principles right from the beginning, above all before conducting any experiments.

2.6. Drawback of the One-Factor-at-a-Time Method

A crystallization process is used in the following to illustrate the deficiency of the frequently used one-factor-at-a-time method. Factors influencing this system are, for instance, crystallization conditions such as the geometry of the crystallizer, type and speed of the agitator, temperature, residence time, and concentrations of additives like crystallization and filter aids, as well as of two presumed additives A and B. Possible response variables may be bulk density, abrasion, hardness, and pourability of the crystallization product. Let us assume the simple case that the effects of the two experimental factors, additive A and additive B, on the material's bulk density are to be systematically examined with the aim of obtaining a maximum bulk density. As mentioned before, the experimental factors are usually examined and/or optimized separately and successively. In the example considered here, one would therefore begin by keeping factor B constant and varying A over a certain range until A has been optimally adjusted in terms of a maximum bulk density, and enter the result in a diagram (see Fig. 4). The optimal value for A would then be selected and kept constant. The same procedure would then be employed for B. The result of this is a presumably optimal setting for A and B, and hasty experimenters would jump to the conclusion that, in this case, after varying A between 15 and 40 g/L and B between 5 and 17.5 g/L, the highest bulk density is obtained by setting A to 33 g/L and B to 8.5 g/L and that its value is approximately 825 g/L.

Figure 4. The one-factor-at-a-time method can lead to misinterpretations in systems that are subject to interactions

However, the response surface in Figure 4, which shows the complete relationship between both experimental factors and the response variable, reveals how misleading such a conclusion can be. This drastic misinterpretation is based on the assumption, false in this case, that the effect of varying one factor is independent of the settings of the other factor. For instance, by using the response surface, one can show that the bulk density values take on a decidedly different shape compared to the first diagram when varying A for B = 15 g/L. The following should be noted: if there are interactions between experimental factors, the one-factor-at-a-time method is an unsuitable tool for a systematic analysis of these factors, which holds true in particular when factor settings are to be optimized. If such interactions can definitely be ruled out, it might well be used. In chemistry, however, interactions are the rule rather than the exception.
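The pitfall can be reproduced numerically. The following Python sketch uses an invented quadratic response with an interaction term (it only mimics the kind of surface shown in Figure 4, not the article's actual data; the coefficients were chosen so that the one-factor-at-a-time search lands at the article's 33 g/L, 8.5 g/L, ca. 825 g/L result) and compares that search with a full grid search over both factors:

```python
# Invented bulk-density model (g/L) with an interaction term (0.5 * a * b).
def bulk_density(a, b):
    return 726.35 + 4.1 * a + 0.5 * b - 0.1 * a**2 - b**2 + 0.5 * a * b

a_grid = [15 + 0.5 * i for i in range(51)]    # additive A: 15 ... 40 g/L
b_grid = [5 + 0.25 * i for i in range(51)]    # additive B: 5 ... 17.5 g/L

# One-factor-at-a-time: optimize A with B fixed at 5 g/L, then B with A fixed.
a_ofat = max(a_grid, key=lambda a: bulk_density(a, 5.0))
b_ofat = max(b_grid, key=lambda b: bulk_density(a_ofat, b))

# Varying both factors together finds a clearly better setting.
a_best, b_best = max(((a, b) for a in a_grid for b in b_grid),
                     key=lambda ab: bulk_density(*ab))

print(a_ofat, b_ofat, round(bulk_density(a_ofat, b_ofat), 1))  # 33.0 8.5 825.0
print(a_best, b_best, round(bulk_density(a_best, b_best), 1))  # 40.0 10.25 835.4
```

Because of the interaction term, the best value of A depends on the level of B; the one-factor-at-a-time search therefore stops at a suboptimal combination.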

3. Factorial Designs

3.1. Basic Concepts

The statistical experimental designs most frequently used in practice are the two-level factorial designs. These designs are called two-level because there are only two levels of settings, a lower (−) and an upper level (+), for each of the experimental factors. A full two-level factorial design specifies all combinations of the lower and upper levels of the factors as settings of the experimental runs (2^n design, where n denotes the number of factors). Their principle is illustrated by a simple example of a chemical reaction for which the influence of 2 (2^2 design) and 3 (2^3 design) experimental factors on the product yield is to be examined (Sections 3.2 and 3.3). For a growing number of factors, the number of runs of a full factorial design increases exponentially, and it provides much more information than is generally needed. Particularly for n > 4, the number of experimental settings can be reduced by selecting a well-defined subgroup of all 2^n possible settings of the factors without losing important information. This leads to the fractional factorial designs (Section 3.4).

The restriction of initially using just two levels for each experimental factor often causes some uneasiness for experimenters using this method for the first time. But by using two-level factorial designs, a balanced coverage of the experimental region of interest is achieved very economically. Moreover, owing to the special combination of factor levels, it is also possible to gain deeper insights from the associated individual values of the response variables. A decisive advantage of the two-level factorial designs is that they allow the effects of factor variations to be systematically and reliably analyzed and quantified, and that they provide information on how these effects depend on the settings of the other experimental factors. These insights are gained by calculating so-called main effects and interaction effects. In the calculation of these effects, all experimental results can be used and are included to form well-defined differences of corresponding averages (see Figs. 6 and 9), thereby increasing the degree of reliability. The essential results of this effect analysis can be visualized by simple diagrams (see, e.g., Fig. 7).

In a factorial design, not only continuous experimental factors, such as temperature, pressure, and concentration, which can be set to any intermediate value, but also discrete or categorical factors, such as equipment or solvent type, may be involved. If at least one categorical variable with more than two levels is involved, or if curvatures in the response variables are expected and to be explored, factorial designs with more than two levels may be used, e.g., 3ⁿ designs, in which all factors are studied at three levels each, or hybrid factorial designs with mixed factor levels like the 2 × 3² design, in which one factor is varied at two levels and two factors at three levels [24]. However, especially in the case of continuous factors, other so-called response surface designs are more efficient (see Chap. 4). In the following, the expression "factorial designs" always refers to two-level factorial designs.

For the sake of simplicity, replicates are neglected in the following examples, and variability in the process and in measurement is assumed to be very small. Note, however, that being aware of the impact of experimental error on the reliability or significance of calculated effects is an essential principle of DoE and crucial to drawing valid conclusions. Variability within individual runs having the same settings of the experimental factors will propagate and cause variability in each calculated variable, e.g., main effect or interaction effect, deduced from these single results. Experimental designs, particularly the factorial designs, minimize error propagation.
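This error-propagation advantage can be illustrated with a small Monte Carlo simulation. The sketch below is not part of the original article; the noise level σ = 1 is an arbitrary assumption. It compares the scatter of a main effect computed from all four runs of a 2² design with the scatter of the corresponding one-factor-at-a-time estimate, which uses only two runs:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0        # assumed standard deviation of a single experimental run
n_rep = 100_000    # number of simulated repetitions of the whole design

# True corner values of a 2^2 design: A-B-, A+B-, A-B+, A+B+
true_y = np.array([60.0, 62.0, 74.0, 92.0])
sign_A = np.array([-1.0, 1.0, -1.0, 1.0])   # column of signs for factor A

noisy = true_y + rng.normal(0.0, sigma, size=(n_rep, 4))

# Main effect of A: difference of averages, so all four runs contribute
me_A = noisy @ sign_A / 2.0

# One-factor-at-a-time estimate of the same effect: only two runs contribute
ofat = noisy[:, 1] - noisy[:, 0]

print(me_A.std())   # close to sigma
print(ofat.std())   # close to sigma * sqrt(2)
```

Because the main effect averages over all runs, its standard deviation stays near σ, whereas the difference of two single runs scatters with roughly σ·√2.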

Factorial designs are treated in most textbooks on DoE, e.g., [16, 24, 25, 29, 30].

3.2. The 2² Factorial Design

Let us suppose the influence of two factors, catalyst quantity A and temperature B, on the yield y of a product in a stirred tank reactor is to be examined. Figure 5 shows the two levels of both factors involved, as well as a tabular and a graphical representation of the associated 2² factorial design. The columns which contain the settings of the experimental factors (two in this case) form the so-called design matrix. The resultant values of the response variable y obtained for the settings A−B−, A+B−, A−B+, and A+B+ are referred to as y_A−B−, y_A+B−, y_A−B+, and y_A+B+, respectively. They are entered in the column of the response variable and in the corresponding positions of the graph.

Due to the special constellation of the experimental runs, it is possible to see how y changes when factor A is varied at the two levels of B, and what happens when factor B is varied at the two levels of A. Figure 5 reveals that, at the lower temperature of 70 °C (B−), an increase of the catalyst quantity from 100 g (A−) to 150 g (A+) increases the response variable yield by 2 %, while at the higher temperature of 90 °C (B+), increasing the catalyst quantity enlarges the value of the response variable by 18 %, i.e.,

Effect of A for B− = E_B−(A) = y_A+B− − y_A−B− = +2 %
Effect of A for B+ = E_B+(A) = y_A+B+ − y_A−B+ = +18 %


Increasing both factors simultaneously thus has a clearly higher impact on the response variable than the additive superposition of the individual effects, E_B−(A) and E_A−(B), would lead one to expect. If the value of y_A+B+ were not 92 but 76 %, there would be no interaction between A and B. The effect of a variation of one of the two factors on the response variable would be independent of the adjustment of the other factor. In this case, the two individual effects would be identical for each factor, i.e., E_B−(A) = E_B+(A) = +2 % and E_A−(B) = E_A+(B) = +14 %. The corresponding lines in the interaction diagrams would then be parallel. It seems reasonable now to determine a quantitative measure for the interaction of two factors as the difference of these individual effects, i.e.,

Figure 6. Calculation of the two main effects and the interaction effect in a 2² factorial design

Interaction effect between A and B
= IE(AB) = ½ [E_B+(A) − E_B−(A)]
= [(y_A+B+ − y_A−B+) − (y_A+B− − y_A−B−)] / 2
= (y_A+B+ + y_A−B−)/2 − (y_A−B+ + y_A+B−)/2
= [(y_A+B+ − y_A+B−) − (y_A−B+ − y_A−B−)] / 2
= ½ [E_A+(B) − E_A−(B)]
= IE(BA) = +8 %

From these sequences of expressions it can be seen that the interaction between A and B could equally be defined through the difference of the individual effects of A or the individual effects of B, giving the same numerical result each time. Moreover, the expression in the middle shows that the interaction between A and B, as is the case for the main effects, is nothing but the difference of two averages (this is basically the reason why a factor of ½ is introduced in the definition). This corresponds to the difference of the averages of the results located on the diagonals in Figure 6.
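These difference-of-averages calculations are easy to reproduce numerically. The following sketch in Python (the article itself contains no code) computes both main effects and the interaction effect of the 2² example from the columns of signs, using the four corner yields 60, 62, 74, and 92 % of Figure 5:

```python
import numpy as np

# Yields at the corners of the 2^2 design (Fig. 5): A-B-, A+B-, A-B+, A+B+
y = np.array([60.0, 62.0, 74.0, 92.0])

A = np.array([-1, 1, -1, 1])    # column of signs for factor A
B = np.array([-1, -1, 1, 1])    # column of signs for factor B
AB = A * B                      # row-wise product gives the AB column

half = len(y) / 2               # half the number of experiments
ME_A = y @ A / half             # main effect of A: +10 %
ME_B = y @ B / half             # main effect of B: +22 %
IE_AB = y @ AB / half           # interaction effect: +8 %, as in the text
```

ME(A) = 10 % is exactly the average of the two individual effects +2 % and +18 % stated above, and IE(AB) reproduces the +8 % of the derivation.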

The calculation of the two main effects and of the interaction effect is also represented geometrically in Figure 6. The corresponding analysis table contains columns of signs, which allow calculation of these effects. Each effect may be computed as the sum of signed response values divided by half the number of experiments, where the signs are taken from the column of the desired effect. Note that the signs of the interaction column AB can be generated by a row-wise multiplication of the main-effect columns A and B. Today, however, the effects do not need to be computed like this anymore. Specific DoE software tools (see Chap. 8) use methods such as those described in Section 4.4 and yield numerical and graphical results more easily.

A final interpretation of the experiments could read as follows: The catalyst shows a not yet satisfactory activity at the lower temperature (70 °C). An increase of the catalyst quantity from 100 to 150 g gives a slight improvement but does not yet yield satisfactory values. The situation is different at the higher temperature (90 °C), where the catalyst clearly performs better. In addition, an increase in the catalyst quantity at this temperature has a strong impact on yield.

Figure 7. Diagrams of main effects, interactions, and response surfaces illustrate conspicuously important relations governing the system

If all experimental factors are continuous, it will be possible and useful to perform experimental runs at the center point. It is obtained by setting each factor to the midpoint of its factor range, in our example, to 125 g of catalyst and 80 °C. This isolated additional experimental point gives a rough impression of the behavior of response variables inside the experimental region of interest. If the result at the center point does not correspond to the mean of the results obtained at the corner points of the factorial design, then the response surface of the response variable will have a more or less pronounced curvature that depends on the magnitude of this deviation. The graph in Figure 5 shows a result of 70 % at the center point, which is slightly below the 72 % obtained by averaging the four results of the factorial design. Thus, the response surface must be imagined as slightly sagging in its middle. Of course, with this single additional experiment, it is impossible to determine which of the experimental factors is (are) ultimately responsible for the curvature. This question can only be settled by a response surface design (see Chap. 4).

The analysis shown for one response variable is performed for each response variable, so that the effects of factor variations on every response variable are finally known.

3.3. The 2³ Factorial Design

The concepts and notions introduced in Section 3.2 can be generalized to three or more factors. Let us suppose that, in addition to the example in Section 3.2, not only the effects of temperature and catalyst quantity but also the impact of changing the agitator type on the yield are to be examined. In analogy to Figure 5, the experimental factors with their respective two settings, as well as the 2³ factorial design, which is obtained by realizing all possible combinations of factor settings, are represented in graphical and tabular form in Figure 8.


Figure 8. Example of a 2³ factorial design, including a graphical and a tabular representation of the experimental settings and results. The experimental settings for agitator type "current" correspond to the 2² factorial design in Figure 5

By comparing the values at the ends of the various edges of the cube in Figure 8, it is possible to perform a very elementary analysis. For each factor, the effect of its variation can be studied for the four different constellations of the other two factors. For example, the change from the current to the new agitator type does not lead to the presumed improvement in yield. This may be verified by looking at the four edges going from the front face to the back face of the cube. The deterioration is particularly severe at high temperature, where the yield decreases from 74 to 67 % (for the lower level of catalyst) and from 92 to 82 % (for the upper level of catalyst).

More detailed information about the specific and joint effects of the factors is obtained by calculation of the main effects and interaction effects introduced in Section 3.2.

The main effect thereby indicates how much a response variable is affected on average by a variation of a factor and is measured as the difference in the average response for the two factor levels. Figure 9 illustrates this for the factors A (catalyst) and B (temperature). The main effect for C (agitator type) is obtained analogously by calculating the difference of the two averages at the back face and the front face of the cube.

The interaction effect of two factors, more precisely, the two-factor interaction, was introduced in Section 3.2. Generally, there will be n(n − 1)/2 two-factor interactions, where n denotes the number of factors. For three factors there are 3 · 2/2 = 3 two-factor interactions, namely, AB, AC, and BC. The calculation of a two-factor interaction in a 2³ factorial design will be demonstrated for AB. This interaction is obtained by calculating the two-factor interaction IE_C−(AB) at the lower level of factor C and the two-factor interaction IE_C+(AB) at the upper level of factor C, and then by averaging these two values:

Interaction effect between A and B
= IE(AB) = [IE_C−(AB) + IE_C+(AB)] / 2
= ½ {[(60 + 92)/2 − (62 + 74)/2] + [(57 + 82)/2 − (59 + 67)/2]}
= (60 + 92 + 57 + 82)/4 − (62 + 74 + 59 + 67)/4
= +7.25 %

A two-factor interaction in a factorial design with more than two factors is obtained by taking the average of all individual two-factor interactions at the different constellations of the other factors. The calculation of IE(AB) is also illustrated in Figure 9, where it is seen to be the difference of averages between results on two diagonal planes. Due to the inherent symmetry of the design, the calculation of the other two-factor interactions, IE(AC) and IE(BC), can be performed in a similar way. This leads to the respective columns of signs within the analysis table in Figure 9 and to the corresponding diagonal planes within the cube.

Figure 9. Calculation of main effects and interaction effects in a 2³ factorial design shown for the main effects of A and B and their two-factor interaction AB

Now, if the interaction between A and B at the lower level of C differs from their interaction at the upper level of C, there will be a three-factor interaction, which is defined via the difference of these two individual two-factor interactions:


IE(ABC) = ½ [IE_C+(AB) − IE_C−(AB)]
= ½ {[(57 + 82)/2 − (59 + 67)/2] − [(60 + 92)/2 − (62 + 74)/2]}
= (62 + 74 + 57 + 82)/4 − (60 + 92 + 59 + 67)/4
= −0.75 %

Note that this result can also be obtained by using the column of signs corresponding to the three-factor interaction ABC in Figure 9. As in the case of two-factor interactions, the signs within this column are obtained by a row-wise multiplication of the signs of the main-effect columns A, B, and C. Obviously, the three-factor interaction is also the difference of two averages (as in the case of the two-factor interaction presented in Section 3.2, this is the reason why the factor of ½ is introduced in the definition). These averages are obtained by averaging the results located at the vertices of the respective two tetrahedra which make up the cube.

The numerical results of all effects obtainable from the 2³ factorial design are summarized in the following:

ME(A) = 9.25 %
ME(B) = 19.25 %
ME(C) = −5.75 %
IE(AB) = 7.25 %
IE(AC) = −0.75 %
IE(BC) = −2.75 %
IE(ABC) = −0.75 %

Note again that numerical results like these and graphical results like those of Figure 7 are easily obtained by using specific DoE software (see Chap. 8). Diagrams of main effects and interaction effects may also be generated for factorial designs with more than two factors.
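The full set of effects can also be reproduced with the sign-column scheme. In the following Python sketch (not from the article itself), the eight yields of Figure 8 are assigned to the cube corners as quoted in the text; all seven effects come out as in the summary above:

```python
import numpy as np

# Yields of the 2^3 design (Fig. 8) in standard order:
# A-B-C-, A+B-C-, A-B+C-, A+B+C-, A-B-C+, A+B-C+, A-B+C+, A+B+C+
y = np.array([60.0, 62.0, 74.0, 92.0, 57.0, 59.0, 67.0, 82.0])

A = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
C = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

half = len(y) / 2            # half the number of experiments
effects = {
    "ME(A)":   y @ A / half,              #  +9.25 %
    "ME(B)":   y @ B / half,              # +19.25 %
    "ME(C)":   y @ C / half,              #  -5.75 %
    "IE(AB)":  y @ (A * B) / half,        #  +7.25 %
    "IE(AC)":  y @ (A * C) / half,        #  -0.75 %
    "IE(BC)":  y @ (B * C) / half,        #  -2.75 %
    "IE(ABC)": y @ (A * B * C) / half,    #  -0.75 %
}
```

Every interaction column is simply the row-wise product of the corresponding main-effect columns, exactly as described for the analysis table of Figure 9.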

Even though three-factor interactions may really exist in some cases, they seldom play an essential role. So, if their absolute value is clearly higher than most of the main effects and two-factor interactions, it seems reasonable to conclude that the experimental error is of a magnitude that does not allow the reliable estimation of most of the effects (perhaps because they are very small) and/or that at least one response value has been corrupted by a gross systematic error.

If, on the whole, systematic errors can be excluded, it is a legitimate practice to neglect higher-order interactions, such as three- and four-factor interactions, because main effects tend to be larger than two-factor interactions, which in turn tend to be larger than three-factor interactions, and so on. Moreover, if no information about the magnitude of the experimental error is available, it will be possible to obtain a rough estimate of this error by using higher-order interactions, like five-, four-, and even three-factor interactions.

3.4. Fractional Factorial Designs

For a growing number n of experimental factors, the number of experimental settings 2ⁿ increases exponentially in a full factorial design. Simultaneously, the proportion of higher-order interactions increases rapidly. For instance, if n = 5, there are 5 main effects, 10 two-factor interactions, and 16 interactions of higher order.

Obviously, if n is not small, there is some redundancy in a full factorial design, since higher-order interactions are not likely to have appreciable magnitudes. At this point, the following questions arise:

– Is it possible to reduce the amount of experimental effort in a sophisticated way so that the most important information can still be obtained by analysis of the data?
– Is it possible to study more experimental factors instead of higher-order interactions with the same number of experimental settings?

The answer in both cases is "yes". It leads to the 2ⁿ⁻ᵏ fractional factorial designs, where 2ⁿ⁻ᵏ denotes the number of experimental settings, n the number of experimental factors, and k the number of times by which the number of settings has been halved compared to the corresponding complete 2ⁿ design with the same number of experimental factors (1/2ᵏ · 2ⁿ = 2ⁿ⁻ᵏ). The application of fractional factorial designs yields a reduction in experimental effort which is adapted to the complexity of the system under investigation and to the information required.

Once the experimental runs have been performed, fractional factorial designs are analyzed like full factorial designs, except that fewer values are available. As introduced in Sections 3.1, 3.2, and 3.3, effects are obtained by calculating corresponding differences of averages. However, in doing so, one will discover that effects are confounded. Confounding is defined as a situation where an effect cannot unambiguously be attributed to a single main effect or interaction.


Figure 10. Example of a 2³⁻¹ fractional factorial design. In the top section the experimental settings and results are shown in geometrical and tabular form. In the bottom section the problem of confounding is illustrated: Calculating ME(C) and IE(AB) leads to the same expression, which is actually the sum of both. It is not possible to determine whether the calculated value is caused by the main effect of C, by the interaction effect of A and B, or by both

Let us consider the 2³⁻¹ design, in which the effects of three factors are studied with four experimental settings (see Figure 10). This is an example of a half-fraction factorial design (see Figure 11). When calculating the main effect of C [ME(C) on the left-hand side of Fig. 10], the outcome is not the difference between averages of four results (as in the case of the 2³ design) but of only two. A similar situation occurs when calculating the interaction between A and B [IE(AB) on the right-hand side of Fig. 10]. Moreover, calculating these effects leads to the same expression. It is actually the sum ME(C) + IE(AB) of both effects. It can also be verified that ME(A) is confounded with IE(BC), and ME(B) with IE(AC). This is an example of a so-called resolution III design.
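The confounding pattern can be checked directly from the columns of signs. The following Python sketch (an illustration, not from the article) assumes the half fraction of Figure 10 is generated by setting the C column equal to the AB interaction column:

```python
import numpy as np

# Half fraction 2^(3-1): a full 2^2 design in A and B, with the
# C column set equal to the AB interaction column (generator C = AB)
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
C = A * B

# Identical sign columns always yield identical numbers when applied to
# the responses, so the corresponding effects cannot be separated:
print(np.array_equal(C, A * B))   # ME(C) confounded with IE(AB)
print(np.array_equal(A, B * C))   # ME(A) confounded with IE(BC)
print(np.array_equal(B, A * C))   # ME(B) confounded with IE(AC)
```

All three checks print True, reproducing the three confounding pairs of this resolution III design.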

A slightly different situation occurs when considering the 2⁴⁻¹ design (see Fig. 11). Here the main effects are confounded with three-factor interactions, e.g., ME(A) with IE(BCD), and the two-factor interactions are confounded with each other, e.g., IE(AC) with IE(BD). This is an example of a resolution IV design. The 2⁵⁻¹ design, also shown in Figure 11, is an example of a very efficient resolution V design.

The resolution of a fractional factorial design largely characterizes the information content obtainable by analyzing its results:

– A fractional factorial design of resolution III does not confound main effects with each other but does confound main effects with two-factor interactions.
– A fractional factorial design of resolution IV does not confound main effects with two-factor interactions but does confound two-factor interactions with other two-factor interactions.
– A fractional factorial design of resolution V does not confound main effects and two-factor interactions with each other but does confound two-factor interactions with three-factor interactions.

Figure 11. Geometric representation of the most important half-fraction factorial designs and their respective tabular representation, i.e., their design matrices

The resolution of the most important fractional factorial designs can be seen in Table 1.

Table 1. The 2ⁿ⁻ᵏ (fractional) factorial designs with a maximum of 32 experimental settings and their resolution. 2ⁿ⁻ᵏ denotes the number of experimental settings, n the number of experimental factors, and 1/2ᵏ the factor by which the number of settings has been reduced compared to the corresponding full factorial design with the same number of experimental factors.

Resolution   4 settings   8 settings       16 settings        32 settings
III          2³⁻¹         2⁵⁻² to 2⁷⁻⁴     2⁹⁻⁵ to 2¹⁵⁻¹¹     2¹⁷⁻¹² to 2³¹⁻²⁶
IV           –            2⁴⁻¹             2⁶⁻² to 2⁸⁻⁴       2⁷⁻² to 2¹⁶⁻¹¹
V            –            –                2⁵⁻¹               –
VI           –            –                –                  2⁶⁻¹
Full         2²           2³               2⁴                 2⁵

Designs of resolution III are used in the early stages of experimental investigations to gain a first insight into the possibilities and behaviors of systems. Particularly beneficial are saturated designs for 2ⁿ − 1 factors, in which all degrees of freedom of a 2ⁿ design are exploited. The above-mentioned 2³⁻¹ design is an example of such a design, in which three factors are studied with four experimental settings. It is "generated" by replacing all columns of signs in the analysis table of Figure 6 by experimental factors, i.e., the settings of factor C at the top of Figure 11 are determined by the column of the two-factor interaction AB in the analysis table of Figure 6. A further important example of a saturated design is the 2⁷⁻⁴ design, which allows seven factors to be studied with eight experimental settings. It is generated by replacing AB by D, AC by E, BC by F, and ABC by G in the headings of the columns of signs used for the calculation of effects in Figure 9. With resolution III designs, special care must be taken when interpreting the analysis results: calculated main effects may also be two-factor interactions of other factors.
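This generation rule can be written out explicitly. The Python sketch below (an illustration, not part of the article) builds the saturated 2⁷⁻⁴ design by assigning the four interaction columns of a full 2³ design to the additional factors D, E, F, and G; all seven columns are mutually orthogonal:

```python
import numpy as np

# Full 2^3 factorial in A, B, C (eight runs)
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
C = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# Saturated 2^(7-4) design: the four interaction columns of the 2^3
# design become the sign columns of four additional factors D, E, F, G
D, E, F, G = A * B, A * C, B * C, A * B * C

design = np.column_stack([A, B, C, D, E, F, G])   # 8 runs, 7 factors
print(design.shape)                                # (8, 7)
```

The orthogonality of the columns (every pair of columns has a zero dot product) is what keeps the calculated effects well defined despite the drastic reduction in runs.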

Resolution IV designs are employed to gain unambiguous information about the individual impact of the experimental factors, while unambiguous information about their two-factor interactions is not yet required. Designs of resolution III and IV are mostly used to find out which factors play an important role. This technique of isolating the important factors is sometimes referred to as screening.

Using designs of resolution V or higher does not lead to any loss of decisive information, since the main effects and two-factor interactions are only confounded with higher-order interactions, which in most cases can be neglected. Particularly when n > 4, the number of experimental settings can be halved to achieve designs of at least resolution V.

Fractional factorial designs support the iterative nature of experimentation. The experimental runs of two or more fractional factorial designs conducted sequentially may be combined to form a larger design with higher resolution. In this way it is possible to resolve ambiguities by the addition of further experimental runs.

Half-fraction designs possess an interesting and useful projection property: the omission of one arbitrary column in the design always leads to a full factorial design with respect to the remaining columns or factors. So, if a factor proves to have no significant effect on a response variable, the remaining factors can be analyzed as in the full factorial design. For example, the omission of factor B in the 2⁴⁻¹ design of Figure 11 leads to a 2³ design for the factors A, C, and D. This may be verified by examining the three remaining columns in the table, but also by pushing the upper faces of the two cubes into the lower faces.
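The projection property can also be verified computationally. The sketch below (Python, an illustration rather than part of the article) builds the 2⁴⁻¹ design with the generator D = ABC, omits factor B, and checks that the remaining columns form a full 2³ design:

```python
import numpy as np
from itertools import product

# 2^(4-1) half fraction: full 2^3 in A, B, C plus the generator D = ABC
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
C = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
D = A * B * C

# Omit factor B: the eight runs, projected onto (A, C, D), cover every
# one of the 2^3 = 8 possible sign combinations exactly once
projected = {(int(a), int(c), int(d)) for a, c, d in zip(A, C, D)}
print(projected == set(product([-1, 1], repeat=3)))   # True
```

The same check succeeds for the omission of any other single factor, which is exactly the projection property described above.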

4. Response Surface Designs

In some situations, factorial designs, in which all factors are varied using only two settings, are not adequate for describing the behavior of an experimental system, because more detailed insight is needed to predict its responses or to find optimal factor settings. In this case, it is often necessary to extend factorial designs and to do additional experiments at other points in the experimental domain. To decide which points to use, response surface designs are used [14, 31]. These designs are based on mathematical models that describe how responses depend on experimental factors.


4.1. The Idea of Using Basic Empirical Models

A model is a way to describe a part of reality; ideally it is much simpler and more manageable than that which it describes, but it should nevertheless adequately fulfill a predefined modeling purpose. Since a model can only be an approximate description of the original, it is important to be aware of this purpose when constructing models (see Chap. 2 for typical goals in conjunction with DoE). Models used in DoE are polynomials in several variables, in which the y variable is a response and the x variables are experimental factors. A set of coefficients is used to describe how the y variables depend on the x variables. Often a process or experimental system can only be adequately described by more than one response, in which case there will be one model for each response, and each model will have its own set of coefficients.

Coefficients are estimated from experimental data, which are collected in a corresponding experimental design. Estimating model coefficients from experimental data is called model fitting and represents the principal task of statistical analysis to be performed as soon as response values from experiments are available. An equally important task consists of assessing the quality of a fitted model, as an important step towards qualifying it for use in prediction, optimization, or whatever other purpose. In fact, it is often possible to improve the performance of a model by taking small corrective measures such as those described in Section 5.4.

The important ideas behind empirical modeling are:

– The model must describe the entire behavior of the process or experimental system that is relevant to answering the questions of the experimenter.
– The experimental design is based on the model and determines which information can be extracted from the experimental results. Statistical analysis is only the tool for extracting this information.
– Neglecting experimental design essentially means missing out on finding all the relevant answers. Picking the wrong design also means missing out on relevant answers.
– Enlarging the scope of the questions always means extending the model and adding experiments to the design.

Using models and setting up designs in this fashion requires that one proceed in the systematic way that has been described in Chapter 2. There are several additional aspects that play an important role in modeling:

– Definition of the experimental domain in which the model should be useful for prediction
– Selection of the correct model type
– Choice of the experimental design that corresponds optimally to the experimental domain and to the model type chosen
– Estimation of coefficients by regression analysis
– Qualification and refinement of the model by continued statistical analysis
– Validation of model predictions by confirmatory experiments
– Use of the model for the purpose of finding optimal factor settings

4.2. The Class of Models Used in DoE

The class of models that is normally used in DoE contains only models which are linear with regard to the unknown coefficients. This is why, in order to estimate coefficients, linear regression methods can efficiently be used [32]. Nonlinear or first-principle models, such as mechanical models, reaction-kinetic models, and more general dynamic models, are only rarely used directly when designing experiments; they are commonly approximated by simple polynomial models at the cost of restricting the domain of validity of the model. A direct generation of optimal designs for nonlinear models, i.e., models that are nonlinear in the parameters that are to be estimated, is sometimes possible. However, it is particularly important that such models are very accurate, that experimental errors are small, and that initial estimates of the parameters are already available. It is this last point that often makes setting up the correct design very difficult [17].

Polynomial models used in DoE are built up as a sum of so-called model terms:


y = b₀ + b_A·x_A + b_B·x_B + b_AB·x_A·x_B + b_AA·x_A² + b_BB·x_B² + ε

This is an example of a quadratic model for two factors A and B, containing a constant term b₀, linear terms b_A·x_A and b_B·x_B, an interaction term b_AB·x_A·x_B, quadratic terms b_AA·x_A² and b_BB·x_B², and an error term ε.

The x represent the settings of the factors in the experimental domain. In factorial designs, they are coded as −1 and +1. When factors have continuous scales, like temperature or pressure, this coding can be understood as a simple linear transformation of the factor range onto the interval [−1, +1]. This transformation is called scaling and centering; the corresponding equation is:

x_centered&scaled = 2 (x − x_center) / (x_max − x_min)

Since x_center = (x_max + x_min)/2, x_max transforms into +1 and x_min transforms into −1 (or simply + and −, respectively, when using factorial designs). Centering and scaling allows the influences of different factors with different scales to be compared. In the following discussion it is assumed that all factors are either coded or scaled and centered in this fashion.
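The transformation can be expressed as a small helper function; this Python sketch is an illustration (the function name is arbitrary, not from the article):

```python
def code(x, x_min, x_max):
    """Scale and center a factor setting onto the coded interval [-1, +1]."""
    x_center = (x_min + x_max) / 2
    return 2 * (x - x_center) / (x_max - x_min)

# Catalyst quantity from the example in Section 3.2:
# 100 g -> -1 (lower level), 125 g -> 0 (center), 150 g -> +1 (upper level)
levels = [code(x, 100, 150) for x in (100, 125, 150)]   # [-1.0, 0.0, 1.0]
```

Applied to the temperature factor (70 to 90 °C), the same function maps 80 °C, the center point, onto 0.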

The b are the coefficients that are estimated by regression after the experiments have been completed. A question that arises is: How do calculated effects in a factorial design as described in Chapter 3 compare to estimated coefficients of linear or interaction terms in a fitted model?

The answer to this question is quite interesting. Estimating coefficients by multiple linear regression and calculating effects for factorial designs as described in Chapter 3 are mathematically equivalent. In fact, coefficients are simply half of the corresponding effects: calculating a main effect of a factor, ME(A), means estimating the difference Δy in the response that has been provoked by changing the factor A from its lower to its upper level. In contrast, the corresponding coefficient, b_A, is the geometrical slope of the line describing the dependency of y upon x, i.e., Δy/Δx. Since DoE is based on scaled and centered variables, Δx is exactly 2, so b_A can be estimated by ME(A)/2. This is also true for interaction effects: b_AB can be estimated by IE(AB)/2. Estimators are often denoted by a hat, so b̂_AB = IE(AB)/2. The constant b₀ can be estimated by calculating the mean of all response values.

For the example that was discussed in Section 3.2, the model equation is y = b₀ + b_A·x_A + b_B·x_B + b_AB·x_A·x_B, and the estimated coefficients are:

b̂₀ = (74 + 92 + 70 + 60 + 62)/5 = 71.6 %
b̂_A = ME(A)/2 = 5 %
b̂_B = ME(B)/2 = 11 %
b̂_AB = IE(AB)/2 = 4 %
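This equivalence can be checked by actually fitting the model. The Python/NumPy sketch below (an illustration of what DoE software does internally, not taken from the article) fits the interaction model to the four corner runs plus the center point and recovers exactly these coefficient estimates:

```python
import numpy as np

# Coded settings of the 2^2 design of Section 3.2 plus one center point
xA = np.array([-1.0, 1.0, -1.0, 1.0, 0.0])
xB = np.array([-1.0, -1.0, 1.0, 1.0, 0.0])
y = np.array([60.0, 62.0, 74.0, 92.0, 70.0])   # yields in %

# Model matrix for y = b0 + bA*xA + bB*xB + bAB*xA*xB
X = np.column_stack([np.ones_like(y), xA, xB, xA * xB])

# Least-squares estimation of the coefficients
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # [71.6, 5.0, 11.0, 4.0]: the mean, and half of each effect
```

Because the columns of the model matrix are mutually orthogonal here, the regression reproduces the simple difference-of-averages formulas of Chapter 3 exactly.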

The benefit of using coefficients lies in the greater generality of the response surface models. These allow:

– Prediction of response variables within the experimental region
– Use of quadratic models for modeling maxima and minima
– Use of mixture models for modeling formulations (see Section 7.1)
– Use of dummy variables for modeling categorical factors (see Section 7.2)
– Correcting factor settings when prescribed settings cannot be exactly met in the experiment
– Nonstandard domains for the model (and the design) that are subject to additional constraints

4.3. Standard DoE Models and Corresponding Designs

For standard DoE models the designs can be chosen off the peg. This means that the design structure is predefined and available in the form of a design table (see, e.g., Fig. 11 for half-fraction factorial designs). For nonstandard models an optimal design has to be generated by using a mathematical algorithm (see Section 7.3).

Standard DoE models are

– Linear models (i.e., linear with regard to the factor variables), containing the constant term and linear terms for all factors involved

– Interaction models, which additionally contain interaction terms of the factors involved

– Quadratic models, which, in addition to all interaction terms, contain quadratic terms

Standard models for two and three factors are shown in Table 2; examples of response surfaces are shown in Figure 12.


Design of Experiments 21

Figure 12. Examples of response surfaces of standard models for two factors: linear (A), interaction (B), and quadratic (C) models

Table 2. Standard models for two and three factors

                Linear model                        Interaction model                           Quadratic model
Two factors     y = b0 + bA xA + bB xB              ... + bAB xA xB                             ... + bAA xA² + bBB xB²
Three factors   y = b0 + bA xA + bB xB + bC xC      ... + bAB xA xB + bAC xA xC + bBC xB xC     ... + bAA xA² + bBB xB² + bCC xC²
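The terms of Table 2 can be generated mechanically from the factor settings. The sketch below (our own illustrative helper, not part of the article) builds the model-matrix row for each design point of a two-factor design:

```python
# Building the model-matrix columns of the standard two-factor models
# from scaled/centered factor settings. "model_matrix" is our own helper.

def model_matrix(points, model="quadratic"):
    """One row per design point: [1, xA, xB] plus interaction/quadratic terms."""
    rows = []
    for xa, xb in points:
        row = [1.0, xa, xb]            # constant and linear terms (linear model)
        if model in ("interaction", "quadratic"):
            row.append(xa * xb)        # interaction term (column for bAB)
        if model == "quadratic":
            row += [xa * xa, xb * xb]  # quadratic terms (columns for bAA, bBB)
        rows.append(row)
    return rows

# 2^2 factorial plus center point in -1/+1 coding:
pts = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0)]
lin = model_matrix(pts, "linear")       # 3 columns: 1, xA, xB
qua = model_matrix(pts, "quadratic")    # 6 columns: ... + xA*xB, xA^2, xB^2
```

Each additional model class simply appends further columns to the same matrix; the coefficients are then estimated by regression on these columns.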

Adding isolated interaction terms to linear models, taking away interaction terms from interaction models, taking away square terms from quadratic models, or even adding cubic terms like bAAB xA² xB, bABC xA xB xC, or bAAA xA³ to quadratic models gives nonstandard models, which can also be used for DoE. Nonstandard models are also obtained when mixture components are investigated together with normal factors, or when so-called dummy variables (indicator variables) are used to code categorical factors that have three or more settings.

Optimal designs for standard and nonstandard models are:

– Linear models: resolution III factorial designs or so-called Plackett–Burman designs (which are very similar to factorial designs) if no interactions are present

– Interaction models with only some interaction terms: resolution IV factorial designs or so-called D-optimal designs (see Section 7.3); resolution IV designs are also used for linear models when interactions may be present but are assumed to be unimportant

– Interaction models with all interaction terms: resolution V (or higher) factorial designs or D-optimal designs

– Quadratic models: central composite designs (CCD; see below) or so-called Box–Behnken designs

– All nonstandard models: D-optimal designs (see Section 7.3)

Table 3. Design matrix of a CCD for two factors. Run Nos. 1 to 4 form the factorial design, Nos. 5 to 8 the star points, and No. 9 the center point.

No.     A       B       y
1      −1      −1
2      +1      −1
3      −1      +1
4      +1      +1
5      −1.41    0
6      +1.41    0
7       0      −1.41
8       0      +1.41
9       0       0

Factorial designs are discussed in detail in Chapter 3. Central composite designs (CCDs) are extensions of factorial designs in which so-called star points and additional replicates at the center point are added to allow estimation of quadratic coefficients (see Fig. 13 and Table 3). The number of star points is simply twice the number of factors, and, ideally, the number of replicates at the center is roughly equal to the number of factors. The distance α from the center point to the star points should be greater than one, i.e., the star points should lie outside the domain defined by the factorial design. A star distance α = 1 is sometimes used; the star points then lie in the faces of the factorial design. A good alternative to a CCD is the Box–Behnken design, in which no design points leave the factorial domain [14]. Box–Behnken designs do not contain a classical factorial design as a basis.

Figure 13. Central composite design for two factors (left) and three factors (right)

CCDs are normally used with full factorial designs, and sometimes with resolution V designs. Another class of designs, called Hartley designs [33], are similar to CCDs but are based on resolution III factorial designs. When, as is sometimes done in industrial practice, only some quadratic terms are added to an existing interaction model, it is useful to include star points only for those factors for which quadratic terms have been added.
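The CCD construction described above (2^k factorial part, 2k star points, center replicates) can be sketched in a few lines; the function name and defaults are our own, not a standard API:

```python
# Generating the design matrix of a central composite design (CCD):
# a 2^k factorial, 2k star points at distance alpha, and center replicates.
from itertools import product

def ccd(k, alpha, n_center=1):
    factorial = [list(p) for p in product([-1.0, 1.0], repeat=k)]  # 2^k runs
    stars = []
    for i in range(k):                     # 2 star points per factor axis
        for sign in (-1.0, 1.0):
            pt = [0.0] * k
            pt[i] = sign * alpha
            stars.append(pt)
    centers = [[0.0] * k for _ in range(n_center)]
    return factorial + stars + centers

design = ccd(2, 1.41)   # reproduces the 9 runs of Table 3 (4 + 4 + 1)
```

With three factors and, say, three center replicates, the same construction yields 8 + 6 + 3 = 17 runs.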

4.4. Using Regression Analysis to Fit Models to Experimental Data

Factorial designs are orthogonal for linear and interaction models, which means that the coefficients can be estimated independently of each other. This is why calculating effects as described in Chapter 3 is so easy. For more complex models and designs, the analysis is based on multiple linear regression (MLR) and its variants, such as stepwise regression, variable subset selection (VSS), ridge regression (RR), and partial least squares (PLS) [34]. All of these methods represent ways of fitting the model to the data, in the sense of minimizing the sum of squared distances from the measured response values to the model values (Fig. 14). They differ in that the minimization procedure is subject to different constraints, and their performance differs only in the case of badly conditioned designs, i.e., when the design is not really adequate for estimating all model coefficients.

If yi stands for the observed response value at experiment i and ŷi represents the predicted value at that point, then the least-squares estimates for the coefficients (b0, bA, bB, bAB, . . .) are those for which Σ(yi − ŷi)² is minimized (remember: ŷi depends on the b values).
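A minimal illustration of this least-squares criterion, using the Section 3.2 example: the solver below implements the standard normal equations (Xᵀ X) b = Xᵀ y with a small Gaussian elimination; it is our own sketch, not the API of any DoE package.

```python
# Least squares: choose b so that sum_i (y_i - yhat_i)^2 is minimal,
# solved via the normal equations (X^T X) b = X^T y (pure Python).

def lstsq(X, y):
    n, p = len(X), len(X[0])
    # normal equations: A = X^T X, c = X^T y
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv], c[j], c[piv] = A[piv], A[j], c[piv], c[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            A[r] = [arv - f * ajv for arv, ajv in zip(A[r], A[j])]
            c[r] -= f * c[j]
    # back substitution
    b = [0.0] * p
    for j in range(p - 1, -1, -1):
        b[j] = (c[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return b

# Interaction model for the Section 3.2 example: columns 1, xA, xB, xA*xB
X = [[1, -1, -1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, 1, 1, 1], [1, 0, 0, 0]]
y = [60, 62, 74, 92, 70]
b0, bA, bB, bAB = lstsq(X, y)   # 71.6, 5.0, 11.0, 4.0
```

For this orthogonal design the result coincides with the half-effects of Chapter 3; for non-orthogonal designs the same solver still applies, which is the point of the regression formulation.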

5. Methods for Assessing, Improving, and Visualizing Models

There are many statistical tools that allow a basic judgement of whether a fitted model is sound. A useful selection of these is:

– The regression measure R², to check the quality of fit

– The prediction measure Q², to check the potential for prediction and to prevent so-called over-fit, which means that the model is so close to the data that it models experimental errors

– Analysis of variance (ANOVA), to compare the variance explained by the model with the variance attributed to experimental errors and to check for significance

– Lack-of-fit test (LoF), to assess the adequacy of the model

– Analysis of the residuals, to find structural weaknesses in the model and outliers in the collected data

These methods are used both to qualify models for prediction and optimization purposes and also to find indications of how to improve the models in the sense of increasing their reliability. The different statistical methods are explained, and ways to interpret and use them toward model improvement are discussed.


Figure 14. How least-squares regression works: squared distances from the model to the observed values are minimized

Figure 15. From the example in Section 3.2: observed values yi and predicted values ŷi from a model with coefficients b0 = ȳ = 71.6, bA = 5, bB = 11, bAB = 4 (see Section 4.2), so that ŷi, yi − ŷi, and yi − ȳ can be calculated

5.1. R² Regression Measure and Q² Prediction Measure

The regression measure R² is the quotient of the sum of squared deviations due to the model, SS_reg = Σ(ŷi − ȳ)², and the total sum of squared deviations of the measured data about the mean value ȳ, SS_tot = Σ(yi − ȳ)², i.e., R² = SS_reg/SS_tot.

If ordinary least squares is used for fitting the model, R² = 1 − SS_res/SS_tot, where SS_res = Σ(yi − ŷi)² is the sum of squared residuals, because SS_res = SS_tot − SS_reg. An example of the calculation of R² is given in Figure 15 and Table 4. R² always lies between 0 and 1 and should be as close as possible to 1; how close it should be depends upon the context of the application. When a measuring device is calibrated, R² should be above 0.99, whereas when the output of a chemical reaction with several influencing factors is examined, R² may well be around 0.7 and still belong to a very useful model.

R² is a deceptive measure because it is prone to manipulation: by adding terms to a model, it is always possible to obtain a value of R² that is almost one, without really improving the quality of the model. On the contrary, many models with a very high R² tend to "over-fit" the data, i.e., they model the experimental errors. This is a common and unwanted phenomenon, because it decreases the prediction strength of a model. Especially when a design is not orthogonal, one should be wary of over-fit.
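For the running example, the sums of squares behind R² can be verified in a few lines (observed values and model predictions as in Fig. 15 and Table 4):

```python
# Verifying the sums of squares behind R^2 for the running example.
y    = [60, 62, 74, 92, 70]             # observed responses
yhat = [59.6, 61.6, 73.6, 91.6, 71.6]   # predictions of the interaction model

ybar = sum(y) / len(y)                                    # 71.6
ss_tot = sum((yi - ybar) ** 2 for yi in y)                # 651.2
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # 3.2
ss_reg = sum((yh - ybar) ** 2 for yh in yhat)             # 648.0
r2 = 1 - ss_res / ss_tot                                  # ~0.995
```

Note that SS_reg + SS_res = SS_tot holds here, as it must for an ordinary least-squares fit.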

To counteract over-fit and to improve the reliability of model predictions it is useful to consider the prediction measure Q². The calculation of Q² is similar to that of R², except that in the second equation the prediction error sum of squares (PRESS) is used instead of SS_res:

Q² = 1 − PRESS/SS_tot

Here, PRESS = Σ(yi − ŷ(i))², where ŷ(i) is the prediction for the i-th experiment from a model that has been fitted by using all experiments except this i-th one. In this sense, yi − ŷ(i) is a fair measure of prediction errors. PRESS is always greater than SS_res, and therefore Q² is always smaller than R². The relationship between Q² and R² and an example of the calculation of Q² are given in Figure 16 and Table 5.

In good models, Q² and R² lie close together, and Q² should at least be greater than 0.5. This may not be the case if the design is saturated or almost saturated, which means that the number of experiments that have been carried out equals or only slightly exceeds the number of terms in the model. In this case, Q² may underestimate the quality of the model, because leaving out single measurements may destroy the structure of the design.

It is more dangerous, however, to overestimate the quality of a model, which may happen if many experiments are replicated. In these experiments, ŷ(i) will be very close to yi, because only one of the measurements is left out in the calculation of ŷ(i). This means that PRESS may be unduly close to SS_res, and Q² unduly close to R². Nevertheless, Q² is normally quite a useful measure to prevent over-fit.

Table 4. Calculation of R² for the example above: SS_res = 3.2, SS_reg = 648, SS_tot = 651.2, hence R² = 1 − SS_res/SS_tot = 0.995. This is a very good value for the regression measure R².

No.   A   B    yi     ŷi      yi − ŷi    yi − ȳ
1     −   −    60     59.6     0.4       −11.6
2     +   −    62     61.6     0.4        −9.6
3     −   +    74     73.6     0.4         2.4
4     +   +    92     91.6     0.4        20.4
5     0   0    70     71.6    −1.6        −1.6
Squared sum                 SS_res = 3.2   SS_tot = 651.2

Figure 16. How Q² relates to R²: PRESS ≥ SS_res and Q² ≤ R²; ŷi usually lies between yi and ŷ(i)

Table 5. Calculation of Q² for the example above: PRESS = 260 and Q² = 1 − 260/651.2 = 0.601. This is quite a reasonable Q² value. Hence, there is no indication of over-fit.

No.   A   B    yi     ŷ(i)    yi − ŷ(i)   yi − ȳ
1     −   −    60     52       8          −11.6
2     +   −    62     54       8           −9.6
3     −   +    74     66       8            2.4
4     +   +    92     84       8           20.4
5     0   0    70     72      −2           −1.6
Squared sum                 PRESS = 260    SS_tot = 651.2
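The leave-one-out definition of PRESS can likewise be verified for the running example. The sketch below refits the interaction model five times, each time leaving out one run; the small solver is our own pure-Python implementation of the normal equations, not a package API.

```python
# Leave-one-out computation of PRESS and Q^2 for the running example.

def lstsq(X, y):
    """Solve (X^T X) b = X^T y by Gaussian elimination with partial pivoting."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv], c[j], c[piv] = A[piv], A[j], c[piv], c[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            A[r] = [arv - f * ajv for arv, ajv in zip(A[r], A[j])]
            c[r] -= f * c[j]
    b = [0.0] * p
    for j in range(p - 1, -1, -1):
        b[j] = (c[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return b

# Interaction model columns (1, xA, xB, xA*xB) and observed yields:
X = [[1, -1, -1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, 1, 1, 1], [1, 0, 0, 0]]
yobs = [60, 62, 74, 92, 70]

press = 0.0
for i in range(len(yobs)):
    Xi = [row for j, row in enumerate(X) if j != i]   # drop the i-th run
    yi = [v for j, v in enumerate(yobs) if j != i]
    b = lstsq(Xi, yi)                                 # refit without run i
    pred = sum(bk * xk for bk, xk in zip(b, X[i]))    # predict the left-out run
    press += (yobs[i] - pred) ** 2                    # sums to 260

ybar = sum(yobs) / len(yobs)
ss_tot = sum((v - ybar) ** 2 for v in yobs)           # 651.2
q2 = 1 - press / ss_tot                               # ~0.601, as in Table 5
```

The individual leave-one-out predictions (52, 54, 66, 84, 72) match the ŷ(i) column of Table 5.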

5.2. ANOVA (Analysis of Variance) and Lack-of-Fit Test

Analysis of variance, usually abbreviated as ANOVA, is a general statistical tool which is used to analyze and compare variability in different data sets. It becomes a powerful tool that can be used for significance testing when assumptions about the error structure underlying the data can be made. In the context of DoE, ANOVA can be used to complement regression analysis and to compare the variability caused by the factors with the variability due to experimental error.

Strictly speaking, "analysis of variance" should be referred to as "analysis of sums of squares" or, even more correctly, "analysis of the sum of squared deviations", because it is actually this sum of squares that is decomposed (Fig. 17).

Figure 17. Decomposition of SS_tot into SS_reg and SS_res

However, the aim of ANOVA is to see to what extent the variability in the measured data is explainable by the model and to judge whether the model is statistically significant. To make this comparison, SS_reg and SS_res must first be made comparable by considering the degrees of freedom.

Already when R² is calculated, it is arguable that the number of terms in the model, p, and the number of runs in the design, N, should be considered in the calculation, and that R²_adjusted = 1 − [(N − 1) SS_res]/[(N − p) SS_tot] should be used instead of R² = 1 − SS_res/SS_tot. But since R²_adjusted normally lies somewhere between R² and Q², these two are quite sufficient for a first model assessment. In any case, N − 1 is the total number of degrees of freedom (of variation with respect to the mean value), p − 1 is the number of degrees of freedom of the model (not counting the constant), and N − p is the so-called residual number of degrees of freedom.

ANOVA compares the model to the residuals and tells us whether the model is statistically significant under the assumption that the residuals can be used to estimate the size of the random experimental error. MS_reg = SS_reg/(p − 1) is compared to MS_res = SS_res/(N − p), also called the mean square error (MSE), by subjecting them to a so-called F-test for significance. If the quotient F_emp = MS_reg/MS_res is greater than 1, then there is reason to suspect that the model is needed in order to explain the variability in the experimental data. If this has to be proven "beyond a reasonable doubt" (i.e., for statistical significance), F_emp must be greater than the theoretical value F_crit(p − 1, N − p, γ), which is always greater than 1, where γ is the desired level of confidence.

Taking the square root of MS_res furnishes a reasonable estimate of the standard deviation of a single measurement, provided N − p is not too small and MS_res does not change dramatically when terms with small coefficients are added to or removed from the model. This number is called the residual standard deviation (RSD or SD_res).

In the example that has been used, SS_reg = 648 and p − 1 = 3, hence MS_reg = 216. SS_res = 3.2 and N − p = 1, hence MS_res = 3.2. F_emp can be calculated as F_emp = MS_reg/MS_res = 67.5, but a comparison to F_crit(3, 1, 95 %) is not meaningful, because there is just one residual degree of freedom. Also the estimate RSD = √MS_res = 1.789 should not be taken too seriously in this example.

The lack-of-fit test addresses the question of whether the model may have missed out on some of the systematic variability in the data. This test can only be performed if some of the experimental runs have been replicated: When replicates are present, SS_res can be further decomposed into a pure error part SS_p.e. and a remaining lack-of-fit part SS_lof (Fig. 18).

Figure 18. Decomposition of SS_res into SS_lof and SS_p.e.

The corresponding significance test compares MS_lof = SS_lof/(N − p − r) to MS_p.e. = SS_p.e./r, where r is the total number of replicates of experimental runs in the design. (A replicate count does not include the original run: if there are five runs at the center point, then one of these runs is counted as the original and the four others as replicates; hence, r = 4. If a design consisting of eight runs is completely replicated, there are eight runs that have each been replicated once; hence, r = 8.)

As in ANOVA, a quotient MS_lof/MS_p.e. greater than 1 means that there may be a lack of fit, although the evidence is still weak, whereas a quotient greater than F_crit(N − p − r, r, 1 − α) > 1 means a significant lack of fit at the error probability level α.


The lack-of-fit test method can be regarded as a good complement to the Q² prediction measure, because the former works well when many replicates are involved, while the latter makes sense when there are only few replicates. Of course, the latter situation is more common in DoE.

5.3. Analysis of Observations and Residuals

Ideally, when models are fitted to data, the model describes all of the deterministic part of the measured values, and the part due to experimental error is reflected by the residuals. It is a presumption in linear modeling, i.e., when using least-squares fitting as a criterion, that the experimental error is

– Identically (i.e., evenly) distributed over all measurements

– Statistically independent of the measured value, the order of the experiments, the preceding measurement, the settings of the factor variables, etc.

To verify this assumption and to detect outliers, a rudimentary examination of the residuals should always be performed as a step toward qualifying the model. The most important tests for residual structure are:

– Test for influence of single observations on the model

– Test for normal distribution and test for outliers

– Test for uniform variance

– Test for independence of measurement errors

These tests can be performed formally, as described above for ANOVA and lack-of-fit testing. However, formal testing of hypotheses of the type necessary here usually requires a very high number of residual degrees of freedom N − p. Since this number is actively and consciously kept low in DoE, these tests do not often yield useful results. This is why it is common practice to plot observations and residuals in different types of graphs in order to detect structural weaknesses and to find hints on what the problem might be and on how to avoid it.

Common plots are:

– Observed values versus predicted values, to see whether there are observations that had undue influence on the model

– Ordered residuals in a normal probability plot, to detect outliers and indications of a possibly nonnormal distribution of residuals

– Residuals versus predicted values, in order to spot inhomogeneities in variance

– Residuals versus different factor variables, to detect weaknesses in the model

– Residuals versus run order, to check latent influences of time and autocorrelation

There is a further systematic approach to coping with possible inhomogeneities in variance, proposed by G. E. P. Box and D. R. Cox in 1964, which can be summarized in the so-called Box–Cox plot: In addition to fitting a model to the response y, they suggest fitting models to the transformed response (y^λ − 1)/λ for different λ and then choosing λ_max such that the residuals are closest to normally distributed. The Box–Cox plot displays the performance of the transformation against λ. It is a practical way of finding indications of what type of transformation of the response data may be useful.

5.4. Heuristics for Improving Model Performance

An interesting although somewhat dangerous aspect of DoE and statistical analysis is that of "pruning" models. What is meant by this is the following: When a model displays some weaknesses in the analysis phase explained above, there is often the possibility to improve the performance of the model by simple measures that do not require additional experiments:

– Excluding model terms with insignificant coefficients

– Introducing a transformation

– Excluding observations that seem to unduly dominate the model or that lie far away from the model

– Including one or two terms in the model without overstressing the design, i.e., without having to perform further experiments

Of course, there are always situations in which a model remains inadequate and further experiments must be performed to reach the objectives. The following is a set of heuristics which may help to improve a model in one of several possible situations that may arise during the analysis phase:


– R² and Q² are both small (i.e., there is just a bad fit): check for outliers using a normal probability plot; check that the response values correspond to the factor variables; check the pure error if experiments have been repeated; check that no important factors and interactions are missing in the model.

– R² is high but Q² is very low (below 0.4, i.e., a tendency for over-fit): remove very small and insignificant terms from the model (they may reduce the predictive power of the model); check for dominating outliers (by comparing observed and predicted values) and try fitting the model without them (remember, however, that for screening designs with few residual degrees of freedom, Q² may be low although the model is good).

– There are clear outliers in the normal probability plot: usually these outliers have not had much influence on the model (otherwise they would be seen when comparing observed and predicted values); however, they often lead to low R² values; check what happens when they are removed from the model; check the records; repeat the experiment; mistrust predictions of the model in the vicinity of such outliers; consider that the outlier may contain important information (maybe this is a new and better product).

– There is some structure in a plot showing residuals versus a factor variable (Fig. 19): this is a sign that the model is too weak and should be expanded; this can usually not be done without enlarging the design.

Figure 19. Standardized residuals are residuals divided by RSD = SD_res (see Section 5.2). When plotted against a factor variable they may give an indication of how a model can be improved

– In rare cases of analysis it may be observed in a plot displaying residuals versus predicted values that the residuals are not homogeneous; this may be the case when a response varies over several orders of magnitude and the size of the errors is either proportional to the response values or satisfies some other relation. Indications of this can also be seen in the Box–Cox plot if the optimal exponent λ_max deviates significantly from unity. A transformation of the response, e.g., taking logarithms or square roots, may be the correct measure.

Sometimes, the measures suggested above do not lead to a stable model; outliers seem to be present in spite of all attempts to improve the situation, there are bends and curves in the normal probability plot, coefficients are not significant but also not negligible, and so on. This is usually an indication that not enough experiments have been done. It is the authors' recommendation in this case to either

– Reduce all "manipulation" to a minimum (eliminate only obvious outliers from the data and very small coefficients from the model), and use the model knowing that it is weak but may still be useful, or to

– Strip the model down to linear, find the optimal conditions, and repeat a larger design there.

In any case, the heuristics above cannot repair a model that is simply incorrect, and care must be taken not to succumb to the temptation of systematically perfecting such an incorrect model. This is usually not the purpose of modeling.

5.5. Graphical Visualization of Response Surfaces

When statistical tests such as those described above detect no serious flaws in the model, it is useful to plot the model by reducing it to two or three dimensions and by representing it as a contour diagram (see Fig. 20) or a response surface plot (see Fig. 12).

When the model consists of more than two factors, the surplus factors are set constant, usually to their optimum level, i.e., where the responses are optimal. Setting a factor constant corresponds to slicing through the experimental domain and reducing its dimension by one. An interesting possibility is setting a third factor constant at three levels and placing the three contour plots side by side.


Figure 20. Contour plot of yield, showing the dependency on catalyst and temperature (for the data from Fig. 6 and Fig. 15). It can be seen that there is an interaction between the two factors

Visualizing models is particularly interesting when the modeling involves several responses, because contour diagrams can be used to find specification domains for the factors when specification ranges for the response variables have been imposed (e.g., by the customer of the chemical product to be produced; see Fig. 21).

Figure 21. Superimposed contour plot for two responses, used for finding specification domains for the two factors xA and xB, given the specifications for the responses Y(1) and Y(2)

6. Optimization Methods

A major purpose of using response surface modeling techniques is optimization. There are several approaches to optimization, depending on the complexity of the situation. Single-response optimizations are easier to handle and are treated in Sections 6.1 and 6.2. Multi-response optimization requires weighting the different responses according to their importance. This can be quite difficult, and the relevant techniques are treated in Section 6.3.

6.1. Basic EVOP Approach Using Factorial Designs

A very basic approach to optimization was proposed in the 1960s by G. E. P. Box and N. R. Draper [15]. It is known as Evolutionary Operation, or EVOP (and has nothing to do with evolution strategies or genetic optimization algorithms). When performing optimization experiments in a running production unit, it is not possible to vary factors over very large domains. Hence, effects will be small and very hard to detect against the underlying natural variability of the response. The idea is to simply repeat a small factorial design, for instance for two factors, as often as is necessary so that the random noise can be averaged out and the effects become apparent.

The direction of maximal improvement of a response can be deduced from these effects and can subsequently be used to find a new position for a second factorial design. At this new position the second factorial design is repeated until significant effects are obtained or a satisfactory result is reached. This procedure can be repeated until a stable optimum is found (Fig. 22).

Figure 22. Example of a moving EVOP design (2² factorial design with a center point) that has found a stable optimum on the right


6.2. Model-Based Approach

When it is less difficult to establish the significance of a model, as is usually the case on a laboratory scale or in a pilot plant, the model can be used directly for optimization. Gradients of the polynomial models are calculated from the model coefficients, because they point in the directions of maximum change of the responses. These directions are then used in an iterative search for a maximum or minimum, as was proposed by Box and Wilson [18].

For quadratic and interaction models, it makes sense to use conjugate gradients, which correct the direction of maximum change by the curvature of the model function (Fig. 23). In this way the search for an optimum can often be accelerated. When a predicted optimum lies within the experimental domain, it is usually quite a good estimate of the real optimum, particularly when a response surface model has been used. When it is outside of the domain, where, by construction, the model is most probably not adequate in describing the process or system under study, it should be validated in the fashion described in Section 6.4.

Figure 23. How the conjugate gradient compares to the gradient

6.3. Multi-Response Optimization with Desirability Functions

In an industrial environment, it is usually not sufficient to maximize or minimize only one response. A typical goal is to maximize yields, minimize costs, and reach specification intervals or target values for quality characteristics. But how should one proceed when these goals are contradictory? This question arises quite frequently in practice.

For example, yield, cost, and the quality characteristics are used as responses in an experimental design. The design should be chosen to allow adequate models to be fitted to all responses. For each response, a model will be fitted to its corresponding measured data. Now optimization can be started. Before any mathematical optimization algorithm can be applied, the goal of the optimization has to be translated from a multidimensional target into a simple maximization or minimization problem. This can effectively be done by using desirability functions. A desirability function d_j(y^(j)) for one response y^(j) measures how desirable each value of this response is. A high value of d_j(y^(j)) indicates a high desirability of y^(j). Many types of desirability functions have been proposed in the literature (e.g., [35]). Two-sided desirability functions are used if a target value or a value within specification limits is to be reached (Fig. 24, left), whereas one-sided desirability functions are used if the corresponding response is to be maximized (Fig. 24, right) or minimized.

An example of a two-sided desirability function is:

d_j(y^(j)) = 0   for y^(j) > y^(j)_max or y^(j) < y^(j)_min

d_j(y^(j)) = [(y^(j) − y^(j)_min)/(y^(j)_target − y^(j)_min)]^s   for y^(j)_min < y^(j) < y^(j)_target

d_j(y^(j)) = [(y^(j)_max − y^(j))/(y^(j)_max − y^(j)_target)]^t   for y^(j)_target < y^(j) < y^(j)_max

This function depends on a target value y^(j)_target, a minimum acceptable value y^(j)_min, a maximum acceptable value y^(j)_max, and two exponents s, t > 0 (Fig. 24, left). High values of s and t emphasize the importance of reaching the target; smaller values leave more room to move between y^(j)_min and y^(j)_max.

To build one desirability function for all responses, the geometric mean is usually taken:

D(y) = [d1(y^(1)) · d2(y^(2)) · . . . · dk(y^(k))]^(1/k)

By substituting the y by the predicted values of the model functions, the desirability D becomes a function of the factor variables, D = D(x). To find the best possible factor settings, this function D should be maximized by using any efficient mathematical optimization procedure.
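The two-sided desirability function given above and the geometric-mean combination can be sketched as follows; the parameter values are purely illustrative, with s = t = 1:

```python
# Two-sided desirability for one response and geometric-mean overall desirability.

def two_sided_d(y, ymin, ytarget, ymax, s=1.0, t=1.0):
    """0 outside [ymin, ymax], rising to 1 at ytarget, falling back to 0 at ymax."""
    if y < ymin or y > ymax:
        return 0.0
    if y <= ytarget:
        return ((y - ymin) / (ytarget - ymin)) ** s
    return ((ymax - y) / (ymax - ytarget)) ** t

def overall_D(ds):
    """Geometric mean of the individual desirabilities d_1, ..., d_k."""
    prod = 1.0
    for d in ds:
        prod *= d
    return prod ** (1.0 / len(ds))

# Illustrative values: both responses specified as ymin=60, ytarget=80, ymax=90.
d1 = two_sided_d(75, ymin=60, ytarget=80, ymax=90)   # 0.75
d2 = two_sided_d(85, ymin=60, ytarget=80, ymax=90)   # 0.5
D = overall_D([d1, d2])                              # sqrt(0.375), about 0.61
```

Because of the geometric mean, a single response at zero desirability drives the overall D to zero, which is exactly why slightly-nonzero variants of d_j are sometimes preferred, as discussed below.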


Figure 24. Two responses with corresponding desirability functions: a two-sided desirability function for y^(1) (target value y^(1)_target desired) and two different one-sided desirability functions for y^(2) (maximization of y^(2) desired)

In many typical applications, production cost and yield tend to increase simultaneously, and the optimizer, in trying to raise yield and lower cost, reaches an optimum somewhere within the experimental domain. Then it is essential to have good information about the quality of predictions in the interior of the domain. It may be necessary to fit a quadratic model to obtain such good predictions.

In a similar fashion, quality characteristics are often contradictory, e.g., more textile strength means less skin and body comfort; again, the optimizer will find an optimal compromise, which means suboptimal settings for the individual responses within the experimental domain.

Often, the predicted optimum is at the border of the experimental region. It is in fact quite typical that, in cases where linear or interaction models are used and no compromises as described above are necessary, two or more corners of the experimental domain are found to be locally optimal. From a mathematical point of view this seems to present a problem, but in fact it is a very promising situation for further improvement of the product or the process under study. Although the models were made to work inside the experimental region, they often also give very useful information just outside the experimental domain.

To be able to make a compromise between different y, it is sometimes more practical to use desirability functions d_j that are not exactly 0 for y^(j) > y^(j)_max or y^(j) < y^(j)_min, but only close to 0. This allows the comparison of factor settings for which the desirability would otherwise be zero, because some of the y are outside of specification. In any case, constructing desirability functions is quite a delicate task, and it is useful to play around with the desirability function when looking for an optimum.

When extrapolating outside the domain of x, remember that predictions using the model deteriorate when the experimental region is left. This means that predicted optima outside the region must always be validated by further experiments.

6.4. Validation of Predicted Optima

There are several reasons why predicted optima from empirical models should always be validated by further experiments:

– Errors in observed responses propagate into predictions of the model and hence into the position of the predicted optima

– The model may be insufficient to describe the relevant behavior of the experimental system under study and lead to bad predictions of optima

– The desirability function used in multi-response optimization may not correctly depict all aspects of the real goal

– Predicted optima may lie outside the initial experimental domain, where the quality of the model is doubtful

– Predicted optima may be unsatisfying in that not all responses lie near the desired values or within specified ranges

It is possible within the scope of linear modeling to calculate confidence intervals for all responses, and hence to get a good idea of the quality of prediction. This calculation of confidence intervals is only correct for qualified models, as explained in Chapter 5.

Confidence intervals of this type must be mistrusted if there is a significant lack of fit or if Q2 is very low. They should also be mistrusted if the residuals of observed responses in the vicinity of the predicted optima are very large.

In this case, and when predicted optima are outside the experimental domain, further experiments are inevitable. A good strategy for this is to find a sequence of experimental settings for which the predictions gradually improve (according to the model). This sequence will start within the experimental domain and will typically leave it at some point (see Fig. 25). Experiments in this sequence should be carefully performed and checked against predicted results. This procedure will usually lead to very good results. It should be emphasized, however, that these experiments are purely confirmatory in nature; they may lead to better products and better processes, but they will not lead to better (fitted) models.
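The strategy of predicting a sequence of optima on gradually increasing domains can be sketched numerically. The model coefficients and domain radii below are purely hypothetical; for a linear/interaction model the predicted optimum sits at a corner of the current domain and moves outward as the domain grows:

```python
import numpy as np
from itertools import product

# Hypothetical fitted first-order model with interaction in coded
# factors x1, x2 (coefficients are illustrative, not from the article):
def y_pred(x):
    x1, x2 = x
    return 50.0 + 4.0 * x1 + 2.5 * x2 - 1.0 * x1 * x2

def predicted_optimum(radius):
    """Maximize the model over the box [-radius, radius]^2 by
    evaluating a grid; for linear/interaction models the optimum
    lies at a corner of the domain."""
    grid = np.linspace(-radius, radius, 21)
    best = max(product(grid, grid), key=y_pred)
    return best, y_pred(best)

# Slowly grow the domain beyond the original coded region [-1, 1]^2;
# each predicted optimum is a candidate for a validation experiment.
for r in (1.0, 1.25, 1.5):
    x_opt, y_opt = predicted_optimum(r)
    print(r, x_opt, y_opt)
```

Each predicted point would then be run as a confirmatory experiment and checked against the prediction before the domain is enlarged further.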

7. Designs for Special Purposes

The design methods that have been discussed until now cover many situations in which the cause-and-effect relationship between several factors and one or more responses of an experimental system is investigated. However, there are cases where further methods are necessary to solve modeling problems. Four characteristic situations are treated in this chapter:

– Designs for modeling and optimizing mixtures are used in product optimization
– Designs for categorical factors may be used for raw product screening and for implementing blocking (as described in Section 2.3)
– So-called optimal or D-optimal designs are implemented for advanced modeling when nonstandard models or irregular experimental domains are to be investigated
– Robust design techniques have the aim of optimizing not only response values but also response variability.

Further interesting topics, such as

– Nested designs, where the levels of factor B depend on the setting of factor A [16, 24]
– The field of QSAR (quantitative structure–activity relationships), which is becoming more and more interesting in conjunction with the screening of active ingredients in medicines
– The whole field of DoE for nonlinear dynamic models involving differential algebraic equations

and others, cannot be treated here. The advanced reader is invited to research on his own, particularly on the latter two themes, which are still in constant motion (both of them incidentally make extensive use of the D-optimal designs discussed shortly in the following).

7.1. Mixture Designs

All designs described above assume that all factor variables can be set and varied independently of each other in an arbitrary manner. This is the typical situation in process optimization, where factors are technical parameters such as temperatures, pressures, flow rates, and concentrations. However, when the goal is to model the properties of a mixture in order to find optimal ratios for its components, then this factor independence may no longer be presumed. There are many branches of the chemical and rubber industries where mixtures are investigated, and there is a whole range of characteristic problems:

– Optimizing the melting temperature of a metal alloy
– Reducing cost in producing paints while keeping quality at least constant
– Optimizing the taste of a fruit punch or the consistency of a yogurt
– Increasing the adhesion of an adhesive
– Finding the right consistency of a rubber mixture for car tires

These are cases where experimental design methods must be modified to cope with the fact that all components of the mixture must add up to unity, i.e., for three components, xA + xB + xC = 1. Nevertheless, there are useful models and designs for mixtures. They are based on the so-called mixture triangle, or the mixture simplex when four or more components are involved (Fig. 26).

All simplices have the following properties, which facilitate analysis and the visualization of results:

– Response plots can be generated based on the mixture triangle (as contour plots).


Figure 25. Predicting optima based on the model for increasing domains: a strategy in optimization consists of predicting a series of optima while slowly increasing the domain. Experiments should be performed at these predicted optima for validation purposes. In this way either a satisfactory optimum is reached or a further design is employed

Figure 26. For three-component mixtures the cube is no longer the appropriate geometrical object for modeling the experimental domain. The equation xA + xB + xC = 1 defines a triangle. Unfolding this triangle leads to the mixture triangle for three factors (left). Mixtures with four components can be modeled by a simplex; setting one of the factors constant again leads to a triangle for the others, e.g., xD = 0.5, i.e., xA + xB + xC = 0.5 (right)

– A simplex is very similar to the cube in that its boundaries and also its sections parallel to its boundaries are again simplices in a lower dimension. This allows contour plots to be used in the mixture triangle even in situations where more than four components are involved; just set the surplus factors constant and put the three important ones into the mixture triangle.

– Lower and upper levels of the mixture factors can be visualized geometrically as cutting off parts of the simplex. An active upper level means cutting off a corner; an active lower level means cutting off the base (or a side) of the simplex. When upper levels are active, the experimental domain is no longer a regular simplex, but only a subset thereof, a so-called irregular mixture region.

The best experimental designs for regular mixture regions are so-called simplex lattice designs. Simplex lattice designs place experiments at the corners and along the edges and axes of a simplex. Depending on the complexity of the model that is to be fitted, different strategies for picking out the points are used. The most typical design for a linear model is the axial simplex design, which consists of experiments at the corners, at the centroid (equal amounts of all components), and at the midpoints of the axes from the corners to the centroid (Fig. 27, Table 6). Details can be found in [36].


Table 6. Design matrix for an axial simplex design for three mixture components

Run no.   Comp. A   Comp. B   Comp. C
1         1         0         0
2         0         1         0
3         0         0         1
4         0.6666    0.1667    0.1667
5         0.1667    0.6666    0.1667
6         0.1667    0.1667    0.6666
7         0.3333    0.3333    0.3334

Figure 27. Geometrical view of an axial simplex design for three-component mixtures
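The construction of Table 6 can be reproduced in a few lines. The helper name axial_simplex_design is ours, and this is a sketch of the axial design only, not a general simplex lattice generator:

```python
import numpy as np

def axial_simplex_design(k):
    """Axial simplex design for k mixture components, as in
    Table 6: the k pure-component corners, the midpoints of the
    axes from each corner to the centroid, and the overall
    centroid (equal amounts of all components).  Every row is a
    mixture, so it sums to 1."""
    corners = np.eye(k)
    centroid = np.full((1, k), 1.0 / k)
    axial = (corners + centroid) / 2.0   # midpoints corner-to-centroid
    return np.vstack([corners, axial, centroid])

design = axial_simplex_design(3)   # 7 runs, matching Table 6
```

For k = 3 the axial rows come out as (2/3, 1/6, 1/6) and permutations, which Table 6 lists rounded to four decimals so that each row still sums to 1.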

When investigating mixtures, it is important to keep in mind that, although it is possible to correlate changes in the responses with changes in the factor settings, it is principally impossible to tell which factors have caused these changes in the responses, because whenever one factor has changed, at least one other factor has also changed. So which one should be made responsible? Example 1: A fruit punch containing pineapple, grapefruit, and orange juice seems to become tastier when pineapple juice is added. Example 2: A long drink containing pineapple juice, orange juice, and vodka seems to become less bitter when pineapple juice is added. Is it possible, in the two examples, to attribute the change in taste to the change in the amount of pineapple?

When mixtures are modeled and optimized, it is not always necessary to use simplex designs and corresponding models. There are principally three ways to proceed, only one of which involves using simplex designs:

– Define component ratios as factors and use a classical factorial design or a CCD (instead of ratios, any other transformation leading to independent pseudocomponents can be used).
– Use a simplex design, as described above, or a D-optimal design, as described in Section 7.3.
– Identify a filler component, for example, xC, which does not have an effect on any of the responses; use a factorial design, a CCD, or a D-optimal design for the remaining components (together with the constraint xA + xB ≤ 1, if necessary), and regression analysis for a model in which the filler variable does not appear.
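The filler-component option can be sketched as follows. The factor ranges and the center point are hypothetical, chosen only so that xA + xB ≤ 1 holds everywhere; the filler xC simply takes up the rest of the mixture and does not appear in the fitted model:

```python
import numpy as np
from itertools import product

# Filler-component approach (third option above): xA and xB are
# varied in a 2^2 factorial with a center point, and the filler
# xC = 1 - xA - xB absorbs the remainder of the mixture.
levels_A = (0.1, 0.3)      # hypothetical range for component A
levels_B = (0.2, 0.4)      # hypothetical range for component B

rows = [(a, b, 1.0 - a - b) for a, b in product(levels_A, levels_B)]
rows.append((0.2, 0.3, 0.5))          # center point
design = np.array(rows)

# Every run is a valid mixture: nonnegative fractions summing to 1.
assert np.all(design >= 0) and np.allclose(design.sum(axis=1), 1.0)
```

Because xA and xB vary independently within these ranges, the design can be analyzed with ordinary factorial regression, sidestepping the mixture constraint entirely.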

Analyzing mixture designs requires some care because, due to the dependencies amongst the mixture factors, the models must be adapted to the situation. H. Scheffé and D. R. Cox were the pioneers who developed ways to use models correctly for the mixture problem. Details cannot be included here; the reader is referred to the literature [25, 36 – 38].

7.2. Designs for Categorical Factors

Not all factors that influence a product or a process can be quantified. Examples of nonquantifiable, so-called categorical (or discrete), factors are:

– The supplier of a raw material, who may influence some of the quality characteristics of an end product
– The date or the time of the year, which may influence how well a process will perform
– Different persons doing the same experiments
– The type of catalyst or solvent
– Mutants of a strain of bacteria in a fermentation process

All these are possible influencing factors that cannot be quantified in a satisfactory manner. (In the case of investigating solvents it would be a very good idea to consider polarity as a quantifiable factor instead of just the type of solvent.) Hence, particularly when more than two instances of a categorical variable must be considered, new design techniques are necessary.

Typical questions that arise in conjunction with categorical or discrete factors are:

– How large is the impact of a qualitative factor? Is it significant?
– If so, which instance yields the best results?
– Does the categorical factor interact with other factors? Are there interactions between different categorical factors?
– Is a possible optimum for the other (continuous) factors robust with respect to varying the categorical factor? If so, where does it lie?

If there are just two instances or categories of a qualitative factor, it is easy to encode them as “−” and “+” and to use factorial designs. The categorical or qualitative factor can then be used in calculations like a continuous quantitative factor. Effects and coefficients can be calculated in the usual way, and they measure how large the influence of changing categories is. The same is true for interactions with other factors; they measure how the influence of these factors on the responses changes when changing categories.

Treating categorical factors with three or more instances is much more delicate. Within generalized linear modeling, it is possible to treat factors with three or more instances by increasing the number of dimensions of the problem. Dummy variables are used to differentiate between instances: let p1, p2 be two dummy variables and encode instances i1, i2, i3 in the fashion shown in Table 7.

Table 7. How to encode three instances of a categorical factor by using two dummy variables

Coding for    p1    p2
Instance i1    1     0
Instance i2    0     1
Instance i3   −1    −1

Geometrically, the three instances become three points in the two-dimensional coordinate system of the dummy variables. The coded design with example values for a response y, as well as the corresponding geometrical visualization, is shown in Figure 28.

Figure 28. Geometrical and tabular representation of the design and the response values

Now the advantage of using dummy variables becomes apparent: analysis of data can be done using the same regression methods as described in Section 4.4, and the interpretation of results, although somewhat different, is still relatively straightforward:

The model has the form

y = c0 + c1 p1 + c2 p2

where c1 and c2 are coefficients pertaining to the dummy variables p1 and p2.

Model predictions for the instances i1, i2, i3 become:

y(i1) = c0 + c1

y(i2) = c0 + c2

y(i3) = c0 − c1 − c2

So the coefficients c1, c2 are actually a measure of the extent to which instances i1 and i2 differ from the mean c0. By additionally setting c3 = −c1 − c2, the same measure is found for instance i3. For the example shown in Figure 28 the coefficients are c0 = 9, c1 = −3, c2 = −1, c3 = 4.
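The dummy-variable coding of Table 7 combines directly with ordinary least squares. In the sketch below, the response values y = 6, 8, 13 are hypothetical, chosen only to be consistent with the coefficients c0 = 9, c1 = −3, c2 = −1 quoted for the Figure 28 example:

```python
import numpy as np

# Extended design matrix from Table 7: columns are (1, p1, p2).
X = np.array([[1.0,  1.0,  0.0],    # instance i1
              [1.0,  0.0,  1.0],    # instance i2
              [1.0, -1.0, -1.0]])   # instance i3

# Hypothetical responses consistent with the coefficients quoted above.
y = np.array([6.0, 8.0, 13.0])

c0, c1, c2 = np.linalg.lstsq(X, y, rcond=None)[0]
c3 = -c1 - c2              # the same measure, for instance i3

print(c0, c1, c2, c3)      # close to 9, -3, -1, 4
```

The intercept c0 is the mean over the three instances, and each c_j measures how far its instance deviates from that mean, exactly as in the prediction formulas above.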

Designs for categorical variables will normally be adapted to the problem at hand and generated by a computer algorithm. In fact, optimal designs, as described in Section 7.3, should be used.

It is interesting to note that dummy variables can also be used to model interactions: just use these variables in interaction terms of a model. The coefficients then measure by how much the effect of another factor varies from a given instance to its mean effect taken over all instances (i.e., its main effect). It would go beyond the scope of this article to go into details, in particular about the calculation of degrees of freedom. A short digression on dummy variables and their use for modeling blocking can be found in [32].

Figure 29. An optimal design with constraints and inclusion of some old experiments

7.3. Optimal Designs

The usual procedure in planning an experimental design is to identify factors and responses, to determine ranges for the factors, and then to choose a standard design amongst those that have been discussed. This design may be factorial, Plackett – Burman, CCD, Box – Behnken, simplex, or the like. However, in some situations none of the standard designs are suitable. Such situations typically are:

– Use of nonstandard models
– Use of nonstandard experimental domains (i.e., constraints involving more than one factor)
– Special restrictions on the number of runs or tests that can be performed
– Use of mixture designs when either upper factor levels are active (this means that the regular simplex structure is corrupted) or when normal continuous factors are to be investigated in the same design
– One or more categorical factors with three or more discrete settings are present.

In these cases it is common practice to generate a design by using a mathematical algorithm that maximizes the information that the design will contain. Such designs depend on the model to be used and on the experimental domain to be covered.

Criteria involved in optimizing designs are based on the extended design matrix, usually denoted by X, which is built up of the design matrix and extended by a column for each term in the model. The basic idea behind optimal designs is to ensure that there are no correlations between the columns of this X matrix. For if there are correlations here, then the influence of the terms involved cannot be resolved (this situation is similar to that described in Section 7.1 in conjunction with mixture components, which are always correlated).

Working with optimal designs (Fig. 29) involves:

– Defining the experimental domain (including possible constraints)
– Choosing the appropriate model
– Specifying experiments that the experimenter explicitly wants to perform or has already performed
– Selecting the optimization criteria for the design
– Specifying approximately how many experimental runs can be performed.

Typically, not just one design is generated, but a whole number of designs of differing size (test or run number). This makes it possible to evaluate designs by comparing the optimality criteria for different designs. Typical criteria are D-optimality, G-optimality, A-optimality, and E-optimality. They are essentially variance-minimizing design criteria in the sense that the variances of model predictions or model coefficients are minimized. The most commonly used criterion is D-optimality, which leads to a maximized determinant of the squared X matrix. For more details, in particular for further optimality criteria, the reader is referred to the special literature on optimal designs [39, 40].
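The D-criterion can be illustrated by brute force: for a hypothetical first-order model in two factors, compare all n-run subsets of a candidate set by det(X'X) and keep the best. Real DoE software uses exchange algorithms instead of full enumeration, so this is a didactic sketch only:

```python
import numpy as np
from itertools import combinations

def model_matrix(points):
    """Extended design matrix X for the hypothetical linear model
    y = c0 + c1*x1 + c2*x2: an intercept column plus the factors."""
    pts = np.asarray(points, dtype=float)
    return np.column_stack([np.ones(len(pts)), pts])

def d_optimal(candidates, n):
    """Pick the n candidate points maximizing det(X'X),
    the D-optimality criterion."""
    def d_crit(subset):
        X = model_matrix(subset)
        return np.linalg.det(X.T @ X)
    return list(max(combinations(candidates, n), key=d_crit))

# Candidate set: a 3x3 grid on the coded domain [-1, 1]^2.
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
design = d_optimal(grid, 4)   # selects the four corner points
```

That the algorithm recovers the 2^2 full factorial (the four corners) for a first-order model is reassuring: for standard models on regular domains, D-optimal designs reproduce the classical designs.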

Optimal designs will have a high quality if a sufficient number of experimental runs or tests has been allowed for. They are analyzed by regression analysis like other designs, and the corresponding models can be used for prediction and optimization purposes.

7.4. Robust Design as a Tool for Quality Engineering

Sometimes the goal of investigation goes beyond estimating the effects of influencing factors and predicting process behavior or quantifiable product characteristics. A typical application of DoE techniques is toward finding conditions, i.e., factor settings, for which not only quantifiable responses are optimal (usually in a multivariate way, as described in Section 6.3) but also the variability of response values is minimal with respect to disturbing or environmental factors that act during production or field use of the product. Typically, these influences cannot all be controlled, or are too expensive to control, even during the experimentation phase.

Doing DoE with the goal of reducing variability is known as robust design. In a robust design, control factors or design factors are varied in a design as described above, typically in a screening design, and experiments are repeated at each trial run so as to estimate the standard deviation (or variance) at each trial. This dispersion measure is employed as a new response value. Statistical analysis and optimization tools such as those described above are then used to quantify and finally to minimize variation, while at the same time improving product characteristics.

Taguchi [1, 19, 20] introduced an additional idea into robust design, namely, the concept of so-called noise factors or environmental factors [1, 11]. These are introduced as perturbing factors in a designed experiment with the idea that they simulate possible external influences which may affect the quality of products after they have left the plant. These factors, which will in fact increase the standard deviation, are varied in a second experimental design which is performed for each of the trial runs in the design for the design factors. To distinguish between the two designs, the design for the environmental factors is sometimes called the outer array, and that for the design factors the inner array. The use of outer arrays is recommended when noise factors can rather cheaply be varied independently of the controlled factors.

Example: During the production of CDs a transparent coating is applied to protect the optical layer. CDs are placed onto a turntable, a droplet of lacquer is applied, and the CD is spun to spread the lacquer. Parameters to be varied are the size of the droplet, the speed and the acceleration of the turntable, and the temperature in the room. CDs are to be subjected to different extreme climatic conditions in order to simulate their performance in the field. This is to be done in a climate chamber. Factors to be varied in the corresponding outer array are humidity and temperature in the climate chamber. For the inner array a 2^(4−1) fractional factorial design with three realizations at the center point is chosen; for the outer array, a simple 2^2 full factorial (Table 8).

Table 8. A design for a quality engineering problem, using a 2^(4−1) factorial with center point as inner array and a 2^2 factorial as an outer array

Run   Design factors: A, B, C, D   Environmental factors: H, T                  Mean     Variance
      A    B    C    D             H=−,T=−   H=−,T=+   H=+,T=−   H=+,T=+
                                   y1        y2        y3        y4             y-mean   y-var
1     −    −    −    −
2     +    −    −    +
3     −    +    −    +
4     +    +    −    −
5     −    −    +    +
6     +    −    +    −
7     −    +    +    −
8     +    +    +    +
9     0    0    0    0
10    0    0    0    0
11    0    0    0    0

Results from the four experiments in the outer array are taken together, and the mean and the variance are calculated and subjected to effect calculation or regression analysis.
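The reduction of each inner-array run to a mean and a variance can be sketched as follows. The response values are purely hypothetical (the article leaves the result cells of Table 8 empty); only two of the eleven inner runs are shown:

```python
import numpy as np

# Each inner-array run (design factors A-D) is exposed to the four
# outer-array climate conditions (H, T), giving results y1..y4.
# Values below are hypothetical illustrations, not measured data.
outer_results = np.array([
    [93.0, 90.0, 88.0, 85.0],   # inner run 1: y1..y4 at (H,T) = --, -+, +-, ++
    [95.0, 94.5, 94.0, 93.5],   # inner run 2
])

y_mean = outer_results.mean(axis=1)
y_var = outer_results.var(axis=1, ddof=1)   # sample variance per inner run

# y_mean and y_var are now treated as two ordinary responses and
# analyzed against A, B, C, D by effect calculation or regression;
# a robust setting combines a good mean with a small variance.
print(y_mean, y_var)
```

In this illustration the second run has the smaller variance, i.e., it is less sensitive to the climate conditions, which is exactly the behavior robust design tries to find and exploit.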

8. Software

Selected DoE software and suppliers are listed in Table 9.

Table 9. A selection of DoE software tools and providers (as of Nov. 2005).

Tool                  Company                   Web link
Design-Expert         Stat-Ease Inc.            http://www.statease.com
D.o.E. FUSION         S-Matrix Corp.            http://www.s-matrix-corp.com
ECHIP                 ECHIP Inc.                http://www.echip.com
JMP, SAS/QC           SAS Institute Inc.        http://www.sas.com
MINITAB               Minitab Inc.              http://www.minitab.com
MODDE                 Umetrics                  http://www.umetrics.com
Starfire, RS/Series   Brooks Automation Inc.    http://www.brooks.com
STATGRAPHICS Plus     Manugistics Inc.          http://www.statgraphics.com
STATISTICA            StatSoft Inc.             http://www.statsoft.com
STAVEX                AICOS Technologies AG     http://www.aicos.com

9. References

1. R. N. Kacker: “Off-Line Quality Control, Parameter Design, and the Taguchi Method (with Discussion)”, J. Quality Technol. 17 (October 1985) no. 4, 176 – 209.

2. M. S. Phadke: Robuste Prozesse durch Quality Engineering (Quality Engineering Using Robust Design), gfmt, München 1990 (Prentice Hall, London 1989).

3. T. Pfeifer: Qualitätsmanagement: Strategien, Methoden, Techniken, Hanser, München 1993.

4. J. Wallacher: Einsatz von Methoden der statistischen Versuchsplanung zur Bestimmung von robusten Faktorkombinationen in der präventiven Qualitätssicherung, Fortschr.-Ber. VDI Reihe 16 Nr. 70, VDI-Verlag, Düsseldorf 1994.

5. G. E. P. Box, S. Bisgaard: “The Scientific Context of Quality Improvement”, Quality Progr. (June 1987) 54 – 61.

6. ISO 3534-3: 1999 (E/F): Statistics — Vocabulary and symbols — Part 3: Design of experiments.

7. A. Orth, M. Schottler, O. Wabersky: Statistische Versuchsplanung, Serie: Qualität bei Hoechst, Hoechst AG 1993.

8. S. Soravia: “Quality Engineering mit statistischer Versuchsmethodik”, Chem.-Ing.-Tech. 68 (1996) no. 1 + 2, 71 – 82.

9. DuPont Quality Management & Technology: Design of Experiments — A Competitive Advantage, E. I. du Pont de Nemours and Company 1993.

10. R. A. Fisher: The Design of Experiments, 8th ed., Oliver & Boyd, London 1966.

11. S. Bisgaard: “Industrial Use of Statistically Designed Experiments: Case Study References and Some Historical Anecdotes”, Quality Eng. 4 (1992) no. 4, 547 – 562.

12. O. L. Davies (ed.): The Design and Analysis of Industrial Experiments, 2nd ed., Oliver & Boyd, London 1956.

13. K. H. Simmrock: “Beispiele für das Auswerten und Planen von Versuchen”, Chem.-Ing.-Tech. 40 (1968) no. 18, 875 – 883.

14. G. E. P. Box, N. R. Draper: Empirical Model-Building and Response Surfaces, John Wiley & Sons, New York 1987.

15. G. E. P. Box, N. R. Draper: Evolutionary Operation — A Statistical Method for Process Improvement, John Wiley & Sons, New York 1998.

16. G. E. P. Box, W. G. Hunter, J. S. Hunter: Statistics for Experimenters: Design, Innovation and Discovery, 2nd ed., John Wiley & Sons, New York 2005.

17. G. E. P. Box, H. L. Lucas: “Design of Experiments in Nonlinear Situations”, Biometrika 46 (1959) 77 – 90.

18. G. E. P. Box, K. B. Wilson: “On the Experimental Attainment of Optimum Conditions”, J. Roy. Statist. Soc. B 13 (1951) 1 – 45.

19. G. Taguchi: System of Experimental Design, vols. I and II, Kraus International Publications, New York 1987.

20. B. Gunter: “A Perspective on the Taguchi Methods”, Quality Progr. (June 1987) 44 – 52.

21. J. S. Hunter: “Statistical Design Applied to Product Design”, J. Quality Technol. 17 (October 1985) no. 4, 210 – 221.

22. K. R. Bhote: Qualität — Der Weg zur Weltspitze (World Class Quality), IQM, Großbottwar 1990 (American Management Association, New York 1988).

23. B. Mittmann: “Qualitätsplanung mit den Methoden von Shainin”, Qualität und Zuverlässigkeit (QZ) 35 (1990) no. 4, 209 – 212.

24. D. C. Montgomery: Design and Analysis of Experiments, 6th ed., John Wiley & Sons, New York 2005.

25. E. Scheffler: Statistische Versuchsplanung und -auswertung — Eine Einführung für Praktiker, 3., neu bearbeitete und erweiterte Auflage von “Einführung in die Praxis der statistischen Versuchsplanung”, Deutscher Verlag für Grundstoffindustrie, Stuttgart 1997.

26. D. E. Coleman, D. C. Montgomery: “A Systematic Approach to Planning for a Designed Industrial Experiment (with Discussion)”, Technometrics 35 (1993) no. 1, 1 – 27.

27. G. J. Hahn: “Some Things Engineers Should Know About Experimental Design”, J. Quality Technol. 9 (January 1977) no. 1, 13 – 20.

28. C. D. Hendrix: “What Every Technologist Should Know about Experimental Design”, CHEMTECH (March 1979) 167 – 174.

29. R. L. Mason, R. F. Gunst, J. L. Hess: Statistical Design and Analysis of Experiments with Applications to Engineering and Science, John Wiley & Sons, New York 2003.

30. E. Spenhoff: Prozeßsicherheit durch statistische Versuchsplanung in Forschung, Entwicklung und Produktion, gfmt, München 1991.

31. R. H. Myers, D. C. Montgomery: Response Surface Methodology: Process and Product Optimization Using Designed Experiments, John Wiley & Sons, New York 2002.

32. N. R. Draper, H. Smith: Applied Regression Analysis, 3rd ed., John Wiley & Sons, New York 1998.

33. H. O. Hartley: “Smallest Composite Designs for Quadratic Response Surfaces”, Biometrics 15 (1959) 611 – 624.

34. I. E. Frank, J. H. Friedman: “A Statistical View of Some Chemometrics Regression Tools”, Technometrics 35 (May 1993) no. 2, 109 – 148.

35. G. Derringer, R. Suich: “Simultaneous Optimization of Several Response Variables”, J. Quality Technol. 12 (October 1980) no. 4, 214 – 219.

36. J. A. Cornell: Experiments with Mixtures — Designs, Models, and the Analysis of Mixture Data, 3rd ed., John Wiley & Sons, New York 2002.

37. J. Bracht, E. Spenhoff: “Mischungsexperimente in Theorie und Praxis (Teil 1 und 2)”, Qualität und Zuverlässigkeit (QZ) 39 (1994) no. 12, 1352 – 1360, and 40 (1995) no. 1, 86 – 90.

38. R. D. Snee: “Experimenting with Mixtures”, CHEMTECH (November 1979) 702 – 710.

39. A. C. Atkinson, A. N. Donev: Optimum Experimental Designs, Oxford University Press, Oxford 1992.

40. F. Pukelsheim: Optimal Design of Experiments, John Wiley & Sons, New York 1993.