CHAPTER 3

DESIGN OF EXPERIMENTS

3.1 INTRODUCTION

Design of Experiments (DoE) is a useful method for identifying the significant parameters and for studying the possible effects of the variables during the machining trials. The method covers factors ranging from uncontrollable ones, which are introduced randomly, to carefully controlled parameters. Each factor is either quantitative or qualitative. For quantitative factors, the range of values must be decided, along with how they will be measured and the levels at which they will be controlled during the trials. Qualitative factors, meanwhile, are parameters that take discrete values.

The advantages of design of experiments are as follows:

- The number of trials is significantly reduced.
- Important decision variables which control and improve the performance of the product or the process can be identified.
- Optimal settings of the parameters can be found.
- Qualitative estimation of parameters can be made.
- Experimental error can be estimated.
- Inferences regarding the effect of parameters on the characteristics of the process can be made.

Experiments are performed by investigators in virtually all fields of inquiry, usually to discover something about a particular process or system. In general, an experiment is a test or series of tests in which purposeful changes are made to the input variables of a process or system so that the reasons for changes observed in the output response can be identified, and the resulting data can be analyzed to reach valid and objective conclusions.


The objective of the experiments may include the following:

I. Determine which variables are most influential on the performance measures or

response.

II. Determine where to set the influential controllable process variables so that response

is almost always near the desired nominal value.

III. Determine where to set the controllable process variables so that variability in

response is small.

IV. Determine where to set the influential process variables so that the effects of the

uncontrollable variables are minimized.

Statistical design of experiments refers to the process of planning the experiment so that

appropriate data that can be analyzed by statistical methods will be collected, resulting in

valid and objective conclusions.

The statistical approach to experimental design is necessary if we wish to draw a

meaningful conclusion from the data [62].

Figure 3.1: General model of a process or system [63]


3.2 APPROACHES TO DESIGN OF EXPERIMENTS

I. Multiple Regression Analysis

II. Mathematical Modeling

III. Orthogonal Array

IV. ANOVA

3.2.1 MULTIPLE REGRESSION ANALYSIS

The main purpose of multiple regression analysis (first used by Pearson in 1908) is to learn about the relationship between several independent (predictor) variables and a dependent variable [64].

Multiple regression is a statistical technique that allows one to predict a score on one variable on the basis of scores on several other variables. Multiple linear regression examines the linear relationships between one continuous response and two or more predictors. The independent variables are used to predict values of the dependent, or response, variable in a regression analysis. If the number of predictors is large, then before fitting a regression model with all the predictors, stepwise techniques can be used to screen out predictors not associated with the response [65].

A current trend in statistics is to emphasize the similarity between multiple regression and

ANOVA, and between correlation and the t-test. All of these statistical techniques are

basically seeking to do the same thing – explain the variance in the level of one variable on

the basis of the level of one or more other variables. These other variables might be

manipulated directly in the case of controlled experiments, or be observed in the case of

surveys or observational studies, but the underlying principle is the same. Thus, although we

have given separate chapters to each of these procedures they are fundamentally all the

same procedure. This underlying single approach is called the General Linear Model [64].

Multiple linear regression attempts to model the relationship between two or more

explanatory variables and a response variable by fitting a linear equation to observed data

[66].

Every value of the independent variable x is associated with a value of the dependent

variable y.


The regression line for p explanatory variables x1, x2, ..., xp is defined to be

μy = β0 + β1x1 + β2x2 + ... + βp xp

The mean response μy described by the regression line changes with the explanatory variables. The observed values of y vary about their means μy and are assumed to have the same standard deviation σ. The fitted values b0, b1, ..., bp estimate the parameters β0, β1, ..., βp of the regression line. Since the observed values of y vary about their means μy, the multiple regression model includes a term for this variation.

The regression model is expressed as Data = Fit + Residual [66], where the "Fit" term represents the expression β0 + β1x1 + β2x2 + ... + βp xp, and the "Residual" term represents the deviations of the observed values y from their means μy; these deviations are assumed to be normally distributed with mean 0 and variance σ². The notation for the model deviations is ε.

Formally, the model for multiple linear regression, given n observations, is

yi = β0 + β1xi1 + β2xi2 + ... + βp xip + εi,   for i = 1, 2, ..., n

In the least-squares method, the best-fitting line for the observed data is calculated by minimizing the sum of the squares of the vertical deviations from each data point to the line. Since the deviations are first squared and then summed, there are no cancellations between positive and negative values. The least-squares estimates b0, b1, ..., bp are usually computed by statistical software.

The values fitted by the equation b0 + b1xi1 + ... + bp xip are denoted ŷi, and the residuals ei are equal to yi − ŷi, the difference between the observed and fitted values. The sum of the residuals is equal to zero.


The variance σ² may be estimated by

s² = Σ ei² / (n − p − 1)

also known as the mean squared error (MSE). The estimate of the standard error s is the square root of the MSE [66].

3.2.2 MATHEMATICAL MODELING

Once the experimental design is finalized, the next step is to fit the given data into a mathematical model using regression analysis.

A mathematical model is a description of a system using mathematical concepts and

language. The process of developing a mathematical model is termed mathematical

modeling. Engineers, statisticians, research analysts and economists use mathematical models extensively. In general, mathematical

models may include logical models, as far as logic is taken as a part of mathematics. In many

cases, the quality of a scientific field depends on how well the mathematical models

developed on the theoretical side agree with results of repeatable experiments. Lack of

agreement between theoretical mathematical models and experimental measurements

often leads to important advances as better theories are developed [67].

3.2.3 ORTHOGONAL ARRAY

An orthogonal array (OA) represents a simplified method of putting together an experiment. Taguchi's orthogonal arrays are selected on the condition that the total degrees of freedom of the selected OA must be greater than or equal to the total degrees of freedom required for the experiment [68].

An orthogonal array provides a set of well-balanced experiments (with a minimum number of experimental runs) and is used to design experiments and describe the trial conditions. Experiments designed using orthogonal arrays yield results that are more reproducible.

The standard notation for orthogonal arrays is [69]:

LN(X^Y)

where N = number of experiments, X = number of levels, Y = number of factors.


For example:

2-level arrays: L4(2^7), L12(2^11), L16(2^15)

3-level arrays: L9(3^4), L18(2^1 3^7), L27(3^13)

4-level arrays: L16(4^5), L32(2^1 4^8)

Example: L9(3^4)

9 = number of experiments, 3 = number of levels, 4 = number of factors

Taguchi's orthogonal arrays are experimental designs that usually require only a fraction of the full factorial combinations. The columns of the arrays are balanced and orthogonal, i.e., in each pair of columns, all factor combinations occur the same number of times. Orthogonal designs allow the effect of each factor on the response to be estimated independently of all other factors.

There are 18 basic types of standard orthogonal array (OA) in the Taguchi parameter design []. Since four factors were studied in the present work, three levels of each were considered. Therefore an L9(3^4) orthogonal array has been selected in the present study for multi-performance optimisation, as shown in Table 3.1.


Table 3.1 – L9(3^4) orthogonal array

Run   A   B   C   D
 1    1   1   1   1
 2    1   2   2   2
 3    1   3   3   3
 4    2   1   2   3
 5    2   2   3   1
 6    2   3   1   2
 7    3   1   3   2
 8    3   2   1   3
 9    3   3   2   1
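The balance property described above can be verified directly on Table 3.1. The short sketch below (plain Python, purely illustrative) checks that in every pair of columns each of the nine level combinations occurs exactly once, which is what makes the array orthogonal.

```python
from itertools import combinations, product

# The L9(3^4) array of Table 3.1: 9 runs, 4 factors (A-D), 3 levels each.
L9 = [
    [1, 1, 1, 1],
    [1, 2, 2, 2],
    [1, 3, 3, 3],
    [2, 1, 2, 3],
    [2, 2, 3, 1],
    [2, 3, 1, 2],
    [3, 1, 3, 2],
    [3, 2, 1, 3],
    [3, 3, 2, 1],
]

def is_orthogonal(array, levels=(1, 2, 3)):
    """True if, in every pair of columns, each level pair occurs equally often."""
    n_cols = len(array[0])
    for c1, c2 in combinations(range(n_cols), 2):
        pairs = [(row[c1], row[c2]) for row in array]
        counts = [pairs.count(p) for p in product(levels, repeat=2)]
        if len(set(counts)) != 1:   # some combination over- or under-represented
            return False
    return True
```

For L9(3^4) each of the 3 × 3 = 9 level pairs appears exactly once per column pair, so `is_orthogonal(L9)` holds; a full factorial for four three-level factors would need 81 runs instead of 9.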

3.2.4 ANOVA

The purpose of the ANOVA is to investigate which wire EDM process parameters

significantly affect the quality characteristics. This is accomplished by separating the total

variability of the S/N ratios, which is measured by the sum of the squared deviations from

the total mean of S/N ratio, into contributions by each Wire EDM process parameter and

error. The percentage contribution by each of the process parameters in the total sum of squared deviations can be used to evaluate the importance of the process parameter change on the quality characteristic. In addition, the F-test method can also be used to

determine which Wire EDM process parameter has a significant effect on the quality

characteristic when the F value is large. The fundamental technique is a partitioning of the

total sum of squares S into components related to the effects used in the model. For


example, we show the model for a simplified ANOVA with one type of treatment at different

levels.

S_Total = S_Error + S_Treatments

So, the number of degrees of freedom f can be partitioned in a similar way and specifies the chi-squared distribution which describes the associated sums of squares

f_Total = f_Error + f_Treatments

The F-test is used for comparisons of the components of the total deviation. For example, in

one-way or single-factor ANOVA, statistical significance is tested for by comparing the F test statistic to the critical value of the F-distribution [70].

ANOVA for Multiple Linear Regression:

Multiple linear regression tries to fit a regression line for a response variable by using more

than one explanatory variable. ANOVA calculations for multiple regression are nearly

identical to the calculations for simple linear regression, except that the degrees of freedom

are adjusted to reflect the number of explanatory variables included in the model [71].

3.3 TEST FOR SIGNIFICANCE OF REGRESSION [72]

The test for significance of regression determines whether there is a linear relationship between the response variable y and a subset of the regressor variables x1, x2, ..., xk. Once the coefficients have been estimated and tested for their significance, the estimated regression equation is tested for adequacy of fit.

The appropriate hypotheses are

H0: β1 = β2 = ... = βk = 0
H1: βj ≠ 0 for at least one j

Rejection of H0 indicates that at least one of the regressor variables x1, x2, ..., xk contributes significantly to the model. The test procedure involves an analysis-of-variance partitioning of the total sum of squares into a sum of squares due to the model (or regression) and a sum of squares due to residual (or error).


SST = SSR + SSE

Now, if the null hypothesis H0: β1 = β2 = ... = βk = 0 is true, then SSR/σ² is distributed as χ² with k degrees of freedom, where the number of degrees of freedom for χ² is equal to the number of regressor variables in the model.

The computational formula for the error sum of squares SSE is

SSE = Σ(yi − ŷi)² = y′y − b′X′y    (sum over i = 1, ..., n)

The regression sum of squares is

SSR = b′X′y − (Σyi)²/n

and the total sum of squares is

SST = y′y − (Σyi)²/n
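The partition SST = SSR + SSE can be checked numerically on a small, invented data set. The sketch below uses a single regressor (k = 1) for brevity, so the least-squares fit reduces to the familiar slope/intercept formulas, but the identity holds for any number of regressors.

```python
# Illustrative data; values chosen only so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept for the straight-line fit.
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

SST = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
SSR = sum((fi - ybar) ** 2 for fi in yhat)             # regression sum of squares
SSE = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))   # error sum of squares
```

For these numbers the fit is ŷ = 1.3 + 0.9x, giving SST = 10, SSR = 8.1, SSE = 1.9, and the partition 10 = 8.1 + 1.9 holds exactly.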

Table 3.2 – Analysis of Variance (ANOVA) for significance of regression in multiple regression

Source of variation        df           SS     MS                      F0
Due to regression          k            SSR    MSR = SSR/k             MSR/MSE
Due to residual (error)    n − k − 1    SSE    MSE = SSE/(n − k − 1)
Total                      n − 1        SST


Apply the F-test to test the adequacy of fit as follows:

F = (mean square of regression) / (mean square of residual) = MSR / MSE

The estimated regression equation fits the data adequately if P < 0.05 at the 95% confidence level, or if P < 0.01 at the 99% confidence level.
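As a numerical illustration, the F ratio follows directly from the sums of squares in Table 3.2. The values below are assumed from a hypothetical single-regressor fit (k = 1, n = 5), not from the study's own data.

```python
# Illustrative sums of squares from a hypothetical fit (k = 1 regressor,
# n = 5 observations).
SSR, SSE = 8.1, 1.9
k, n = 1, 5

MSR = SSR / k            # mean square due to regression
MSE = SSE / (n - k - 1)  # mean square due to residual (error)
F = MSR / MSE            # compare against the F(k, n - k - 1) distribution
```

Here F = 8.1 / (1.9/3) ≈ 12.79; whether that is significant is then judged from the P-value of the F(k, n − k − 1) distribution at the chosen confidence level.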

The coefficient of multiple determination R²

R-squared: a measure of the amount of reduction in the variability of y obtained by using the regressor variables x1, x2, ..., xk in the model.

R² = SSR/SST = 1 − SSE/SST

Adjusted R-squared: a measure of the amount of variation around the mean explained by the model, adjusted for the number of terms in the model. The adjusted R-squared decreases as the number of terms in the model increases, if those additional terms do not add value to the model.

R²_adj = 1 − (SSE/(n − p)) / (SST/(n − 1)) = 1 − ((n − 1)/(n − p)) (1 − R²)
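The two forms of the adjusted R-squared can be checked against each other numerically. The sums of squares below are illustrative (a hypothetical fit with n = 5 observations and p = 2 estimated parameters, intercept plus one slope), not results from the study.

```python
# Illustrative values from a hypothetical least-squares fit.
SSR, SSE, SST = 8.1, 1.9, 10.0
n, p = 5, 2                  # p counts the estimated parameters

R2 = SSR / SST               # equivalently 1 - SSE/SST

# Adjusted R-squared, first from the sums of squares ...
R2_adj = 1 - (SSE / (n - p)) / (SST / (n - 1))
# ... and then from R2 directly; the two expressions are algebraically equal.
R2_adj_alt = 1 - ((n - 1) / (n - p)) * (1 - R2)
```

For these numbers R² = 0.81 while the adjusted R² ≈ 0.747: the adjustment penalizes the model for the extra estimated parameter.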

PRESS: the prediction sum of squares (PRESS) provides a useful residual scaling.

PRESS = Σ (ei / (1 − hii))²    (sum over i = 1, ..., n)

Pred R-squared: a measure of the amount of variation in new data explained by the model.

R²_prediction = 1 − PRESS / Syy
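PRESS and the predicted R² can be computed from the residuals and leverages hii. The sketch below is illustrative: it uses a straight-line fit, for which the leverage has the closed form hii = 1/n + (xi − x̄)²/Sxx; in general multiple regression the hii would come from the diagonal of the hat matrix X(X′X)⁻¹X′.

```python
# Illustrative data; straight-line least-squares fit.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Leverages for simple linear regression: h_ii = 1/n + (x_i - xbar)^2 / Sxx.
h = [1 / n + (xi - xbar) ** 2 / Sxx for xi in x]

PRESS = sum((ei / (1 - hi)) ** 2 for ei, hi in zip(resid, h))
Syy = sum((yi - ybar) ** 2 for yi in y)   # total corrected sum of squares
R2_pred = 1 - PRESS / Syy
```

PRESS is always at least as large as SSE, since each residual is inflated by 1/(1 − hii), so the predicted R² is always below the ordinary R²: a built-in guard against judging the model only on the data it was fitted to.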


The predicted R² and the adjusted R² should be within 0.20 of each other; otherwise there may be a problem with either the data or the model. In addition to the adequacy tests mentioned above, the validity of the developed models is checked by drawing a scatter diagram showing the relationship between the observed and predicted values of the weld bead dimensions.

3.4 TESTING FOR LACK OF FIT [73]

A procedure for checking the adequacy of the model is called the lack-of-fit test of the fitted model. In general, to say that the fitted model is inadequate, or that it is lacking in fit, is to imply that the proposed model does not contain a sufficient number of terms. The inadequacy of the model can be due to:

1. Factors (other than those in the proposed model) that are omitted from the proposed model but which affect the response.

2. The omission of higher-order terms involving the factors in the proposed model which are needed to adequately explain the behaviour of the response.

Suppose that we have n observations such that

y11, y12, y13, ..., y1n1   are repeated observations at x1
...
ym1, ym2, ym3, ..., ymnm   are repeated observations at xm

We can see that there are m distinct levels of x.

The pure error sum of squares is

SSPE = Σi=1..m Σj=1..ni (yij − ȳi)²

where ȳi is the mean of the ni repeated observations at the i-th level of x. The degrees of freedom associated with SSPE are n − m.

The sum of squares for lack of fit is

SSLOF = SSE − SSPE

which can also be written as

SSLOF = Σi=1..m ni (ȳi − ŷi)²


The degrees of freedom associated with SSLOF are m − p, because there are m levels of x and p degrees of freedom are lost since p parameters must be estimated for the model.

The test statistic for lack of fit is

F0 = (SSLOF/(m − p)) / (SSPE/(n − m)) = MSLOF / MSPE
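The lack-of-fit computation can be sketched on invented replicated data: three distinct x levels with two observations each (n = 6, m = 3), fitted with a straight line (p = 2 parameters). The numbers are illustrative only.

```python
# Illustrative replicated data: observations grouped by distinct x level.
data = {1: [1, 3], 2: [4, 6], 3: [5, 7]}      # {x level: replicate observations}
xs = [x for x, obs in data.items() for _ in obs]
ys = [yi for obs in data.values() for yi in obs]
n, m, p = len(ys), len(data), 2               # p parameters: intercept and slope

# Straight-line least-squares fit to all n points.
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
      / sum((xi - xbar) ** 2 for xi in xs))
b0 = ybar - b1 * xbar

SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))

# Pure error: spread of the replicates about their own level means.
SSPE = sum((yi - sum(obs) / len(obs)) ** 2
           for obs in data.values() for yi in obs)

SSLOF = SSE - SSPE
F0 = (SSLOF / (m - p)) / (SSPE / (n - m))     # compare with F(m - p, n - m)
```

For these numbers SSPE = 6, SSLOF ≈ 1.33 and F0 ≈ 0.67, well below any usual F critical value, so there would be no evidence of lack of fit; a large F0 would instead suggest the straight-line model is missing terms.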

The present study has utilized multiple linear regression analysis to develop the prediction models and to find the optimal parameter settings.