Chapter 8 Indicator Variable

Ray-Bing Chen

Institute of Statistics

National University of Kaohsiung

8.1 The General Concept of Indicator Variables

• The variables in regression analysis:

– Quantitative variables: well-defined scale of measurement. For example: temperature, distance, income, …

– Qualitative variables (categorical variables): usually no natural scale of measurement. For example: operators, employment status (employed or unemployed), shifts (day, evening or night), and sex (male or female).

• Assign a set of levels to a qualitative variable to account for the effect that the variable may have on the response. Such a variable is called an indicator variable or dummy variable.

• For example: the effective life of a cutting tool (y) v.s. the lathe speed (x1) and the type of cutting tool (x2).
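A minimal sketch of how such an indicator can enter a regression, assuming x2 is coded 0 for one tool type and 1 for the other and that both types share a common slope. The data are synthetic and the numeric values (slope, shift, noise level) are made up for illustration; they are not the tool life data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # hypothetical: 10 observations per tool type

# x1: lathe speed (synthetic); x2: indicator, 0 = first tool type, 1 = second
x1 = rng.uniform(500.0, 1000.0, size=2 * n)
x2 = np.repeat([0.0, 1.0], n)

# Synthetic response: common slope, intercept shifted by 15 for the second type
y = 40.0 - 0.03 * x1 + 15.0 * x2 + rng.normal(0.0, 1.0, size=2 * n)

# Least-squares fit of y = b0 + b1*x1 + b2*x2
X = np.column_stack([np.ones(2 * n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta

# Two parallel lines: the indicator coefficient is the shift in intercept
intercept_type0 = b0
intercept_type1 = b0 + b2
print(f"slope={b1:.4f}, intercepts={intercept_type0:.2f}, {intercept_type1:.2f}")
```

Under this coding the coefficient of x2 measures the change in mean response when moving from the first tool type to the second at a fixed speed.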

Example 8.1 Tool Life Data

• The scatter diagram is in Figure 8.2.

• Two different regression lines.

• Two separate straight-line models v.s. a single model with an indicator variable:

– The single-model approach is preferred (a simpler practical result).

– Since both lines are assumed to have the same slope, it makes sense to combine the data from both tool types to produce a single estimate of this common parameter.

– The single model gives one estimate of the common error variance $\sigma^2$ and more residual degrees of freedom.
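A quick degrees-of-freedom count illustrates the last point. The sample sizes are hypothetical (10 observations per tool type, chosen only for the arithmetic), and the single model is assumed to have three parameters: intercept, common slope, and indicator coefficient.

```latex
\begin{aligned}
\text{separate fits:} \quad & n_j - 2 = 10 - 2 = 8 \ \text{residual df for each line},\\
\text{single model:} \quad & (n_1 + n_2) - 3 = 20 - 3 = 17 \ \text{residual df}.
\end{aligned}
```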

• Models that differ in both intercept and slope:
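One common way to write such a model adds a cross-product term between the speed and the indicator. The coding of $x_2$ as 0 for one tool type and 1 for the other, and the symbols $\beta_0, \dots, \beta_3$, are assumptions made here for illustration.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon,
\qquad
\begin{cases}
x_2 = 0: & y = \beta_0 + \beta_1 x_1 + \varepsilon,\\
x_2 = 1: & y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x_1 + \varepsilon.
\end{cases}
```

In this form $\beta_2$ is the change in intercept and $\beta_3$ is the change in slope between the two tool types.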

Example 8.2 The Tool Life Data:

Example 8.3 An Indicator Variable with More Than Two Levels

• Total electricity consumption (y) v.s. the size of the house (x1) and the four types of air conditioning systems.

• Four types of air conditioning systems:

• $\beta_3 - \beta_4$: relative efficiency of a heat pump compared to central air conditioning.

• Assume the variance doesn't depend on the type of system.
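A factor with four levels needs three indicator variables. The sketch below shows one possible coding, assuming the first level is the baseline and the remaining columns play the roles of x2, x3 and x4 (so the heat pump and central air conditioning columns carry the coefficients written above as β3 and β4). Only "heat pump" and "central" come from the slides; the labels "none" and "window" and the helper name `indicator_columns` are hypothetical.

```python
import numpy as np

def indicator_columns(labels, levels):
    """Code a qualitative factor with a levels as a - 1 indicator (0/1) columns.

    levels[0] is treated as the baseline: its rows are all zeros.
    """
    labels = np.asarray(labels)
    return np.column_stack([(labels == lev).astype(int) for lev in levels[1:]])

# Hypothetical level names; the first one is the baseline
levels = ["none", "window", "heat pump", "central"]

# A few made-up houses, each with one type of system
systems = ["central", "none", "heat pump", "window", "central"]

D = indicator_columns(systems, levels)   # columns: x2 (window), x3 (heat pump), x4 (central)
print(D)
```

Each row has at most a single 1, and the baseline level is represented by a row of zeros, so no level needs its own fourth indicator.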

Example 8.4 More Than One Indicator Variable

• Add the type of cutting oil used as a second indicator variable in Example 8.1.

8.2 Comments on the Use of Indicator Variables

8.2.1 Indicator Variables versus Regression on Allocated Codes

• Another approach to measuring the levels of the variables is by an allocated code.

• In Example 8.3:

• The allocated codes impose a particular metric on the levels of the qualitative factor.

• Indicator variables are more informative because they do not force any particular metric on the levels of the qualitative factor.

• Searle and Udell (1970): regression using indicator variables always leads to a larger $R^2$ than does regression on allocated codes.
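The comparison can be tried numerically. The sketch below is an illustration on synthetic data, not a reproduction of any result from the slides: it assumes a four-level factor with allocated codes 1, 2, 3, 4 and level effects that are deliberately not evenly spaced. Because the allocated-code column is a linear combination of the intercept and the indicator columns, the indicator fit cannot have the smaller $R^2$.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of the least-squares fit of y on X (X already contains the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
levels = np.repeat(np.arange(4), 10)             # factor levels 0..3, 10 observations each
x1 = rng.uniform(1.0, 3.0, size=levels.size)     # a quantitative regressor (synthetic)
level_effect = np.array([0.0, 2.0, 1.0, 5.0])    # made-up, unevenly spaced effects
y = 1.0 + 0.8 * x1 + level_effect[levels] + rng.normal(0.0, 0.3, size=levels.size)

ones = np.ones_like(x1)

# Allocated codes: a single column holding 1, 2, 3, 4
X_code = np.column_stack([ones, x1, levels + 1])

# Indicator variables: three 0/1 columns, level 0 as baseline
dummies = (levels[:, None] == np.arange(1, 4)).astype(float)
X_ind = np.column_stack([ones, x1, dummies])

r2_code = r_squared(X_code, y)
r2_ind = r_squared(X_ind, y)
print(f"allocated codes R^2 = {r2_code:.3f}, indicators R^2 = {r2_ind:.3f}")
```

The allocated codes force equal spacing between successive levels, which is exactly the metric the indicator coding avoids.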

8.2.2 Indicator Variables as a Substitute for a Quantitative Regressor

• A quantitative regressor can also be represented by indicator variables.

• In Example 8.3, for the income factor:

• Use four indicator variables to represent the factor “income”.

• Disadvantage:

– More parameters are required to represent the information content of the quantitative factor (a-1 v.s. 1), so the complexity of the model increases.

– The degrees of freedom for error are reduced.

• Advantage: It does not require the analyst to make any prior assumptions about the functional form of the relationship between the response and the regressor variable.
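As a rough sketch of the idea, a quantitative regressor can be cut into a classes and coded with a-1 indicator columns. The cut points and income values below are hypothetical (the class boundaries from the slides did not survive), and `class_indicators` is an illustrative helper name; five classes are used so that four indicators result.

```python
import numpy as np

def class_indicators(x, edges):
    """Return a - 1 indicator columns for a quantitative x cut into a classes.

    edges holds the a - 1 interior cut points in ascending order.
    The lowest class is the baseline (all indicators zero).
    """
    x = np.asarray(x, dtype=float)
    cls = np.digitize(x, edges)          # class index 0 .. len(edges)
    a = len(edges) + 1
    return (cls[:, None] == np.arange(1, a)).astype(int)

# Hypothetical incomes and cut points (a = 5 classes, so 4 indicators)
income = np.array([12_000, 27_000, 41_000, 58_000, 90_000])
edges = [20_000, 35_000, 50_000, 75_000]

D = class_indicators(income, edges)
print(D)
```

The single income column is replaced by four columns, which is the a-1 v.s. 1 trade-off noted above: more parameters, but no assumed functional form.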

8.3 Regression Approach to Analysis of Variance

• The analysis of variance (ANOVA) is a technique frequently used to analyze data from planned or designed experiments.

• Any ANOVA problem can be treated as a linear regression problem.

• Ordinarily we do not recommend that regression methods be used for ANOVA because the specialized computing techniques are usually quite efficient.

• However, there are some ANOVA situations, particularly those involving unbalanced designs, where the regression approach is helpful.

• Essentially, any ANOVA problem can be treated as a regression problem in which all of the regressors are indicator variables.

• Define the treatment effects in the balanced case (an equal number of observations per treatment) so that $\tau_1 + \tau_2 + \dots + \tau_k = 0$.

• $\mu_i = \mu + \tau_i$ is the mean of the $i$th treatment.

• Test $H_0: \tau_1 = \tau_2 = \dots = \tau_k = 0$ v.s. $H_1: \tau_i \neq 0$ for at least one $i$.

Example: 3 treatments

• Model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, $i = 1, 2, 3$, $j = 1, 2, \dots, n$
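The three-treatment model can be fitted as a regression on two indicator variables. The sketch below is one possible coding, assuming the third treatment is the baseline (so the intercept estimates its mean and each indicator coefficient estimates a difference from it). The data are synthetic, with made-up treatment means and n = 5. It checks numerically that the regression F statistic agrees with the usual one-way ANOVA F statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 5                                   # 3 treatments, n observations each (balanced)
treatment = np.repeat(np.arange(k), n)        # 0, 1, 2 stand for treatments 1, 2, 3
y = np.array([10.0, 12.0, 9.0])[treatment] + rng.normal(0.0, 1.0, size=k * n)

# Regression form: indicators for treatments 1 and 2, treatment 3 as baseline
x1 = (treatment == 0).astype(float)
x2 = (treatment == 1).astype(float)
X = np.column_stack([np.ones(k * n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# F statistic for H0: both indicator coefficients are zero
fitted = X @ beta
sse = float(((y - fitted) ** 2).sum())
sst = float(((y - y.mean()) ** 2).sum())
f_regression = ((sst - sse) / (k - 1)) / (sse / (k * n - k))

# Classical one-way ANOVA F statistic from treatment means
group_means = np.array([y[treatment == i].mean() for i in range(k)])
ss_treat = float(n * ((group_means - y.mean()) ** 2).sum())
ss_error = float(((y - group_means[treatment]) ** 2).sum())
f_anova = (ss_treat / (k - 1)) / (ss_error / (k * n - k))

print(f"F (regression) = {f_regression:.4f}, F (ANOVA) = {f_anova:.4f}")
```

With this coding the fitted values are the treatment means, which is why the two F statistics coincide.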
