STATISTICS Linear Statistical Models

Professor Ke-Sheng Cheng

Department of Bioenvironmental Systems Engineering

National Taiwan University

The Method of Least Squares

• Consider the data shown in the following table and figure. We are interested in fitting a straight line to the points in order to obtain a simple mathematical relationship between rainfall and runoff.

112/04/10 Lab for Remote Sensing Hydrology and Spatial Modeling Dept of Bioenvironmental Systems Engineering, NTU

• Intuitively, we want, for each observed value of rainfall, the corresponding fitted value of runoff to be as close as possible to the observed value. Equivalently, we want the vertical deviations to be as small as possible.

• One method of constructing such a straight line to fit the observed data is called the method of least squares. It requires the sum of the squares of the vertical deviations of all the points from the fitted line to be a minimum.

• Let the rainfall and runoff data in the above figure be respectively represented by x and y. The fitted line is expressed by

$\hat{y} = \beta_0 + \beta_1 x$
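The least squares criterion stated above can be written out explicitly. The following is a sketch of the standard derivation; the symbol S for the sum of squared vertical deviations is notation introduced here, not taken from the slides.

```latex
S(\beta_0,\beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right) = 0
\qquad
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - \beta_0 - \beta_1 x_i \right) = 0

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

Setting both partial derivatives to zero gives two linear equations in the two unknowns, and solving them yields the slope and intercept estimates in the last line.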

Remarks

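$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})\,y_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \dfrac{\sum_{i=1}^{n}x_i\,(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$

$\sum_{i=1}^{n}(y_i-\hat{y}_i) = 0, \qquad \sum_{i=1}^{n}x_i\,(y_i-\hat{y}_i) = 0$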
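These remarks can be checked numerically. The sketch below uses a small made-up data set (the numbers are illustrative only and are not the rainfall and runoff table from the slides) and assumes the usual intercept estimate, the mean of y minus the slope times the mean of x.

```r
# Illustrative data only; not the table from the slides
x <- c(10, 15, 20, 25, 30, 35, 40)             # rainfall
y <- c(4.1, 6.8, 8.9, 12.2, 13.8, 17.1, 19.0)  # runoff

sxx <- sum((x - mean(x))^2)

# Three equivalent forms of the least squares slope
b1_a <- sum((x - mean(x)) * (y - mean(y))) / sxx
b1_b <- sum((x - mean(x)) * y) / sxx
b1_c <- sum(x * (y - mean(y))) / sxx

# Intercept and fitted values
b0    <- mean(y) - b1_a * mean(x)
y_hat <- b0 + b1_a * x

# Both sums should be zero up to rounding error
resid_sum   <- sum(y - y_hat)
resid_x_sum <- sum(x * (y - y_hat))
print(c(b1_a, b1_b, b1_c, resid_sum, resid_x_sum))
```

The three slope values agree because the deviations of x (and of y) from their own mean sum to zero, which is also why the two residual sums vanish.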

Given a value of x, what does the predicted value of y really represent?

– It is unlikely that the predicted value will be the same as the observed value at all times.

– It may even be that the predicted value is the same as the observed value only in very few cases.

– In some cases, the predicted values are far different from the observed values.

• We can be sure that the linear model will overpredict or underpredict some of the observed values.

Linear statistical model

Given $x_i$: $(y_i \mid x_i) = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim \text{iid } N(0, \sigma^2).$

The term $\varepsilon_i$ is the random component.

Because of the random component, we are not able to predict y without error. If a phenomenon is stochastic in nature, it cannot be predicted without error.

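(Postulated model)

$(Y \mid x_i) \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2)$

$\mu_{Y|x_i} = \beta_0 + \beta_1 x_i$

$\mathrm{Var}(Y \mid x_i) = \sigma^2$

Given $x_i$: $(y_i \mid x_i) = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim \text{iid } N(0, \sigma^2).$

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

$y_i = \hat{y}_i + e_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i$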

Coefficient of determination

• How well does the least squares line explain the variation in the data?

• The coefficient of determination represents the proportion of data variation that can be explained by the linear regression model.
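As a rough illustration of this proportion, the sketch below computes it by hand for a made-up data set (illustrative numbers, not the data from the slides) and compares it with the value reported by R. The variable names are chosen here for the example.

```r
# Illustrative data only; not the table from the slides
x <- c(10, 15, 20, 25, 30, 35, 40)
y <- c(4.1, 6.8, 8.9, 12.2, 13.8, 17.1, 19.0)

fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)            # total variation of y about its mean
ssr <- sum((fitted(fit) - mean(y))^2)  # variation explained by the fitted line
sse <- sum(residuals(fit)^2)           # variation left in the residuals

r2 <- ssr / sst                        # proportion of variation explained
print(c(by_hand = r2, from_summary = summary(fit)$r.squared))
```

For a straight-line fit with one predictor, this value also equals the squared sample correlation between x and y.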

Estimating the variance of Y|x

$(Y \mid x_i) \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2)$

$\mu_{Y|x_i} = \beta_0 + \beta_1 x_i$

$\mathrm{Var}(Y \mid x_i) = \sigma^2$

Note: The variance of Y|x is NOT the same as the variance of Y.

RSS (Residual sum of squares) = SSE (sum of squared errors)

$RSS = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \sum_{i=1}^{n}\left[y_i-(\hat{\beta}_0+\hat{\beta}_1 x_i)\right]^2$
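A minimal sketch of how the RSS leads to an estimate of the variance of Y|x, assuming the usual estimator s² = RSS/(n − 2) for a straight-line fit. The data are made up for illustration and are not the table from the slides.

```r
# Illustrative data only; not the table from the slides
x <- c(10, 15, 20, 25, 30, 35, 40)
y <- c(4.1, 6.8, 8.9, 12.2, 13.8, 17.1, 19.0)

fit <- lm(y ~ x)
n   <- length(y)

rss <- sum((y - fitted(fit))^2)  # residual sum of squares
s2  <- rss / (n - 2)             # estimate of Var(Y | x); two coefficients were estimated

var_y <- var(y)                  # sample variance of Y, ignoring x
print(c(s2 = s2, var_y = var_y))
```

With these numbers the estimate of Var(Y|x) is much smaller than the sample variance of Y, which illustrates the note above: most of the spread in y is accounted for by its dependence on x.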

Unbiasedness of the least squares estimators

Confidence intervals of the regression coefficients

• Pivotal quantities

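for $\beta_0$, $\beta_1$ and $\sigma^2$:

$Q_0 = \dfrac{\hat{\beta}_0-\beta_0}{s\sqrt{\dfrac{1}{n}+\dfrac{\bar{x}^2}{s_{xx}}}} \sim t_{n-2}$

$Q_1 = \dfrac{\hat{\beta}_1-\beta_1}{s/\sqrt{s_{xx}}} \sim t_{n-2}$

$Q_3 = \dfrac{(n-2)\,s^2}{\sigma^2} \sim \chi^2_{n-2}$

where $s_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2$ and $s^2 = RSS/(n-2)$.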
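The pivotal quantities for the two coefficients can be inverted into confidence intervals. The sketch below does this by hand at the 95% level for a made-up data set (illustrative numbers only) and is meant to be compared with R's built-in confint(); the variable names are chosen here for the example.

```r
# Illustrative data only; not the table from the slides
x <- c(10, 15, 20, 25, 30, 35, 40)
y <- c(4.1, 6.8, 8.9, 12.2, 13.8, 17.1, 19.0)

fit <- lm(y ~ x)
n   <- length(y)
sxx <- sum((x - mean(x))^2)
s   <- sqrt(sum(residuals(fit)^2) / (n - 2))  # residual standard error
b   <- unname(coef(fit))                      # b[1] intercept, b[2] slope

alpha <- 0.05
tcrit <- qt(1 - alpha / 2, df = n - 2)        # upper alpha/2 point of t with n-2 df

# Intervals obtained from Q0 and Q1
ci_b0 <- b[1] + c(-1, 1) * tcrit * s * sqrt(1 / n + mean(x)^2 / sxx)
ci_b1 <- b[2] + c(-1, 1) * tcrit * s / sqrt(sxx)

print(rbind(ci_b0, ci_b1))
print(confint(fit, level = 1 - alpha))        # built-in intervals for comparison
```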

Hypothesis tests for regression coefficients

Simple linear regression using R

• Useful material

– Chapter 11 of Introduction to Probability and Statistics Using R (G. J. Kerns) is highly recommended.

– http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20092010/Class8/Using%20R%20for%20linear%20regression.pdf

• Defining linear regression models

• Conducting regression: lm(y ~ model)
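A minimal sketch of a call to lm() for a straight-line model. The data frame `dat` and its values are made up for illustration; they are not the data used on the original slides.

```r
# Illustrative data only; not the table from the slides
dat <- data.frame(
  x = c(10, 15, 20, 25, 30, 35, 40),
  y = c(4.1, 6.8, 8.9, 12.2, 13.8, 17.1, 19.0)
)

# Fit y = b0 + b1 * x by least squares; the intercept is included by default
fit <- lm(y ~ x, data = dat)

print(coef(fit))     # estimated intercept and slope
print(summary(fit))  # standard errors, t tests, residual standard error, R-squared
```

The object returned by lm() can then be passed to functions such as coef(), residuals(), fitted(), confint() and predict().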

• Other useful commands

– For prediction (x values not observed)
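A short sketch of prediction at x values that were not observed, using predict() with a `newdata` data frame. The data and the new x values (22 and 28) are made up for illustration.

```r
# Illustrative data only; not the table from the slides
dat <- data.frame(
  x = c(10, 15, 20, 25, 30, 35, 40),
  y = c(4.1, 6.8, 8.9, 12.2, 13.8, 17.1, 19.0)
)
fit <- lm(y ~ x, data = dat)

# The column name in newdata must match the predictor name used in the formula
new_x <- data.frame(x = c(22, 28))
y_new <- predict(fit, newdata = new_x)
print(y_new)
```

The predicted values are simply the fitted line evaluated at the new x values.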

Graphing the Confidence and Prediction Bands

You may want to change it. For example, data.frame(x=seq(20,30,by=0.5))
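One possible way to compute and draw both bands over such a grid of x values is sketched below. The data are made up for illustration, and the plotting choices (line types and colours) are arbitrary.

```r
# Illustrative data only; not the table from the slides
dat <- data.frame(
  x = c(10, 15, 20, 25, 30, 35, 40),
  y = c(4.1, 6.8, 8.9, 12.2, 13.8, 17.1, 19.0)
)
fit <- lm(y ~ x, data = dat)

# Grid of x values at which the bands are evaluated
new_x <- data.frame(x = seq(20, 30, by = 0.5))

# Each call returns a matrix with columns "fit", "lwr" and "upr"
conf_band <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
pred_band <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

plot(dat$x, dat$y, xlab = "x", ylab = "y")
lines(new_x$x, conf_band[, "fit"])                                       # fitted line
matlines(new_x$x, conf_band[, c("lwr", "upr")], lty = 2, col = "blue")   # confidence band
matlines(new_x$x, pred_band[, c("lwr", "upr")], lty = 3, col = "red")    # prediction band
```

The prediction band is wider than the confidence band at every x, because it accounts for the random component of a new observation in addition to the uncertainty of the fitted line.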

Confidence and prediction intervals

Line of prediction. It represents the estimated conditional expectation of y given x.

• Multiple regression

– The following slides are provided for your reference only. Due to the time constraint, they will not be covered in this class.

• Now let’s consider fitting a linear function of several variables. Suppose that we have the following data set:

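$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad B = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$

$Y = XB$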

$X^T X B = X^T Y \quad \Longrightarrow \quad B = (X^T X)^{-1} X^T Y$
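The matrix solution can be sketched directly in R. The two predictors `x1`, `x2` and the response `y` below are made-up illustrative values, and the result is compared with what lm() returns for the same model.

```r
# Illustrative data only; two made-up predictors and a response
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 12.0, 15.2, 15.9)

# Design matrix with a leading column of ones for the intercept
X <- cbind(1, x1, x2)

# Solve the normal equations (X'X) B = X'Y without forming the inverse explicitly
B <- solve(t(X) %*% X, t(X) %*% y)

B_lm <- coef(lm(y ~ x1 + x2))
print(cbind(normal_equations = as.vector(B), lm = unname(B_lm)))
```

Solving the linear system with solve(A, b) is generally preferable to computing the inverse and multiplying. lm() itself relies on a QR decomposition, which is numerically more stable when the columns of X are nearly collinear.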

The Linear Regression Model

Covariance and Correlation Coefficient

• Suppose we have observed the following data. We wish to measure both the direction and the strength of the relationship between Y and X.

The Analysis of Variance (ANOVA)

• Given X, the Y’s are independent normal random variables, i.e.,

$Y \sim N(XB,\ \sigma^2 I_n)$

• The residual sum of squares (or sum of squared errors, SSE) is expressed by

$SSE = (Y - X\hat{B})^T (Y - X\hat{B})$

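$SSE = (Y - X\hat{B})^T (Y - X\hat{B})$

$\quad = Y^T (Y - X\hat{B}) - \hat{B}^T X^T (Y - X\hat{B})$

$\quad = Y^T (Y - X\hat{B})$

since $X^T (Y - X\hat{B}) = 0$.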
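The simplification of the SSE can be verified numerically. The sketch below uses made-up data with two predictors (illustrative values only) and checks both that the residual vector is orthogonal to the columns of X and that the two expressions for the SSE agree.

```r
# Illustrative data only; two made-up predictors and a response
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 12.0, 15.2, 15.9)

X     <- cbind(1, x1, x2)
B_hat <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'Y
e     <- y - X %*% B_hat                       # residual vector Y - X B_hat

normal_eq <- t(X) %*% e          # X'(Y - X B_hat), zero up to rounding error
sse_full  <- drop(t(e) %*% e)    # (Y - X B_hat)'(Y - X B_hat)
sse_short <- drop(t(y) %*% e)    # Y'(Y - X B_hat)

print(normal_eq)
print(c(sse_full = sse_full, sse_short = sse_short))
```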

• The total sum of squares corrected for the mean is referred to as the total variation. This total variation is split up in two parts:

– the regression part (SSRm) “explained by the model”, and

– the residual part (SSE).

• The ratio $R^2 = SSR_m / SST_m$ is known as the coefficient of determination.

• If the coefficient of determination is large then the model provides a good fit to the data. It also represents the part of the total variation which is explained by the model.
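The split of the total variation and the resulting ratio can be read off R's ANOVA table. The sketch below uses made-up data with two predictors (illustrative values only); the names `sst_m` and `ssr_m` mirror the SSTm and SSRm of the slides.

```r
# Illustrative data only; two made-up predictors and a response
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 12.0, 15.2, 15.9)

fit <- lm(y ~ x1 + x2)

sst_m <- sum((y - mean(y))^2)            # total variation, corrected for the mean
ssr_m <- sum((fitted(fit) - mean(y))^2)  # regression part
sse   <- sum(residuals(fit)^2)           # residual part

r2 <- ssr_m / sst_m

# anova() lists sequential sums of squares: rows for x1, x2, then Residuals
tab <- anova(fit)
print(tab)
print(c(sst_m = sst_m, ssr_m_plus_sse = ssr_m + sse, r2 = r2))
```

In the table printed by anova(), the sums of squares for the predictor rows add up to the regression part, and the Residuals row gives the SSE.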

Properties of the Estimators

Confidence Intervals

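• The 100(1 – α)% confidence interval of $\sigma^2$ is

$\left[ \dfrac{(n-p)\,s^2}{\chi^2_{\alpha/2,\,n-p}},\ \dfrac{(n-p)\,s^2}{\chi^2_{1-\alpha/2,\,n-p}} \right]$

where $\chi^2_{a,\,n-p}$ denotes the value exceeded with probability $a$ by a chi-square random variable with (n–p) degrees of freedom.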
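A minimal sketch of this interval in R, assuming that p is the number of estimated regression coefficients (including the intercept) and that the chi-square points are upper-tail points, so that the point with subscript α/2 is qchisq(1 - alpha/2, ...). The data are made up for illustration.

```r
# Illustrative data only; two made-up predictors and a response
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 12.0, 15.2, 15.9)

fit <- lm(y ~ x1 + x2)
n   <- length(y)
p   <- length(coef(fit))               # assumption: p counts all coefficients, here 3

s2    <- sum(residuals(fit)^2) / (n - p)
alpha <- 0.05

# qchisq() takes lower-tail probabilities, so the upper alpha/2 point is qchisq(1 - alpha/2)
lower <- (n - p) * s2 / qchisq(1 - alpha / 2, df = n - p)
upper <- (n - p) * s2 / qchisq(alpha / 2, df = n - p)

print(c(lower = lower, s2 = s2, upper = upper))
```

The interval is not symmetric about s², because the chi-square distribution is skewed.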

• However, because the true value of $\sigma$ is unknown, the above equation cannot be used to establish the confidence interval of $\beta_i$.

• We then substitute s for $\sigma$, and it is known that

$\dfrac{\hat{\beta}_i - \beta_i}{s\sqrt{v_{ii}}}$

has a t-distribution with (n–p) degrees of freedom.

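$P\left( -t_{\alpha/2,\,n-p} \le \dfrac{\hat{\beta}_i - \beta_i}{s\sqrt{v_{ii}}} \le t_{\alpha/2,\,n-p} \right) = 1-\alpha$

$P\left( \hat{\beta}_i - t_{\alpha/2,\,n-p}\, s\sqrt{v_{ii}} \le \beta_i \le \hat{\beta}_i + t_{\alpha/2,\,n-p}\, s\sqrt{v_{ii}} \right) = 1-\alpha$

• The 100(1 – α)% confidence interval of $\beta_i$ is

$\left[ \hat{\beta}_i - t_{\alpha/2,\,n-p}\, s\sqrt{v_{ii}},\ \ \hat{\beta}_i + t_{\alpha/2,\,n-p}\, s\sqrt{v_{ii}} \right]$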
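A sketch of this interval in R. It assumes that $v_{ii}$ is the i-th diagonal element of $(X^T X)^{-1}$, which is the usual meaning but is defined on a slide that did not survive extraction, and that p counts all coefficients including the intercept. The data are made up for illustration.

```r
# Illustrative data only; two made-up predictors and a response
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 12.0, 15.2, 15.9)

X   <- cbind(1, x1, x2)
fit <- lm(y ~ x1 + x2)
n   <- length(y)
p   <- ncol(X)

s <- sqrt(sum(residuals(fit)^2) / (n - p))  # residual standard error
v <- diag(solve(t(X) %*% X))                # assumed v_ii: diagonal of (X'X)^{-1}
b <- unname(coef(fit))

alpha <- 0.05
tcrit <- qt(1 - alpha / 2, df = n - p)      # upper alpha/2 point of t with n-p df

ci <- cbind(lower = b - tcrit * s * sqrt(v),
            upper = b + tcrit * s * sqrt(v))
print(ci)
print(confint(fit, level = 1 - alpha))      # built-in intervals for comparison
```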

Example 1

• A scientist carries out an experiment on the relationship between the yield Y of a crop and the amount of irrigation water X. It is believed that the relationship between expected yield and amount of irrigation water (ignore the units) can be described adequately as

xxxYE 210)|(

• The data shown in the following table were collected in the field.

Example 2

• Data in the following table are rainfall (x) and runoff (y) measured during the rainy season in a study area.

• A regression model is postulated for the above data:

$\mu_{Y|X_i} = \beta_0 + \beta_1 X_i$

Test of Hypotheses
