Lecture (14,15): More than one Variable, Curve Fitting, and Method of Least Squares



Page 1: Lecture (14,15)

Lecture (14,15)

More than one Variable, Curve Fitting, and Method of Least Squares

Page 2: Lecture (14,15)

Two Variables

Often two variables are in some way connected.

Observation of the pairs:

  x      y
  x1     y1
  x2     y2
  ...    ...
  xn     yn

Page 3: Lecture (14,15)

Covariance

The covariance gives some information about the extent to which two random variables influence each other.

$$\mathrm{Cov}(x,y) = E\{[x - E\{x\}]\,[y - E\{y\}]\}$$

$$\mathrm{Cov}(x,y) = E\{x\,y\} - E\{x\}\,E\{y\}$$

It is computed from the sample as

$$\mathrm{Cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

If x = y:

$$\mathrm{Cov}(x,x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x}) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sigma_x^2$$

Page 4: Lecture (14,15)

Example: Covariance

[Scatter plot of the five observations; both axes run from 0 to 7.]

  x_i   y_i   x_i − x̄   y_i − ȳ   (x_i − x̄)(y_i − ȳ)
   0     3      −3         0              0
   2     2      −1        −1              1
   3     4       0         1              0
   4     0       1        −3             −3
   6     6       3         3              9

x̄ = 3,  ȳ = 3,  Σ(x_i − x̄)(y_i − ȳ) = 7

$$\mathrm{cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \frac{7}{5} = 1.4$$

What does this number tell us?
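As a sanity check, here is a minimal Python sketch of the sample covariance formula above, applied to the five points of this example (the helper name `sample_cov` is ours, not the lecture's):

```python
def sample_cov(x, y):
    """Sample covariance: (1/n) * sum of (x_i - x_bar) * (y_i - y_bar)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
print(sample_cov(x, y))  # 7/5 = 1.4, as in the table above
```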

Page 5: Lecture (14,15)

Pearson’s R

• Covariance by itself does not tell us much, because its size depends on the units of x and y.
  – Solution: standardise this measure.
• Pearson's R: standardise by dividing the covariance by the standard deviations:

$$\rho_{xy} = \frac{\mathrm{cov}(x,y)}{\sigma_x\,\sigma_y}$$

Page 6: Lecture (14,15)

Correlation Coefficient

$$\rho(x,y) = \frac{\mathrm{Cov}(x,y)}{\sigma_x\,\sigma_y} = \frac{E\{[x - E\{x\}]\,[y - E\{y\}]\}}{\sigma_x\,\sigma_y}$$

It is computed from the sample as

$$\rho(x,y) = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

$$-1 \le \rho(x,y) \le 1$$

If x = y, then ρ(x,y) = 1.
ρ(x,y) = 0: there is no linear relation between x and y.
ρ(x,y) = −1: there is a perfect inverse relation between x and y.
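A small Python sketch of the sample correlation coefficient, following the formula above (the function name is ours):

```python
from math import sqrt

def corr_coef(x, y):
    """rho(x, y) = cov(x, y) / (sigma_x * sigma_y); always lies in [-1, 1]."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
    s_x = sqrt(sum((a - x_bar) ** 2 for a in x) / n)
    s_y = sqrt(sum((b - y_bar) ** 2 for b in y) / n)
    return cov / (s_x * s_y)

# The covariance example from Page 4: cov = 1.4 and sigma_x = sigma_y = 2,
# so rho = 1.4 / 4 = 0.35, a weak positive relation.
print(corr_coef([0, 2, 3, 4, 6], [3, 2, 4, 0, 6]))
```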

Page 7: Lecture (14,15)

Correlation Coefficient (Cont.)

[Four scatter plots illustrating different correlation values, from ρ(x,y) ≈ 0 (a shapeless cloud) to ρ(x,y) = 1 (points exactly on a line), with positively and negatively sloped clouds in between.]

Page 8: Lecture (14,15)

Procedure of Best Fitting (Step 1)

How do we find the relation between the two variables?

1. Make observations of the pairs:

  x      y
  x1     y1
  x2     y2
  ...    ...
  xn     yn

Page 9: Lecture (14,15)

Procedure of Best Fitting (Step 2)

2. Make a plot of the observations.

It is always difficult to decide whether a curved line fits nicely to a set of data.

Straight lines are preferable.

We change the scale to obtain straight lines.

[Scatter plot of the observations; X from −40 to 40, Y from −40 to 80.]

Page 10: Lecture (14,15)

Method of Least Squares (Step 3)

3. Specify a straight-line relation: Y = a + bX.

We need to find the a and b that minimise the sum of the squared differences between the line and the observed data.

[The same scatter plot with the fitted straight line Y = a + bX drawn through the cloud of points.]

Page 11: Lecture (14,15)

Step 3 (cont.)

Find the best fit of a line through a cloud of observations: the principle of least squares.

$$y = a + bx, \qquad \varepsilon = \text{residual error}$$

$y_i$ = true value, $\hat{y}_i$ = predicted value

$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \;\rightarrow\; \min$$

Page 12: Lecture (14,15)

Method of Least Squares (Step 4)

The sum of the squared deviations is equal to

$$S(a,b) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - a - b x_i\right)^2$$

We seek the values of $a$ and $b$ for which $S$ is minimum:

$$\frac{\partial S(a,b)}{\partial a} = 0 \quad\text{and}\quad \frac{\partial S(a,b)}{\partial b} = 0$$

Differentiating with respect to $a$:

$$\frac{\partial S}{\partial a} = \sum_{i=1}^{n} 2\,(y_i - a - b x_i)\,\frac{\partial}{\partial a}(y_i - a - b x_i) = -2\sum_{i=1}^{n}(y_i - a - b x_i) = 0$$

$$\sum_{i=1}^{n}(y_i - a - b x_i) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} y_i = n\,a + b \sum_{i=1}^{n} x_i$$

Page 13: Lecture (14,15)

Method of Least Squares (Step 5)

Differentiating with respect to $b$:

$$\frac{\partial S(a,b)}{\partial b} = 0$$

$$\frac{\partial S}{\partial b} = \sum_{i=1}^{n} 2\,(y_i - a - b x_i)\,\frac{\partial}{\partial b}(y_i - a - b x_i) = -2\sum_{i=1}^{n} x_i\,(y_i - a - b x_i) = 0$$

Page 14: Lecture (14,15)

Method of Least Squares (Step 6)

Continuing from Step 5, divide by −2 and expand:

$$\sum_{i=1}^{n} x_i\,(y_i - a - b x_i) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} x_i y_i - a\sum_{i=1}^{n} x_i - b\sum_{i=1}^{n} x_i^2 = 0$$

$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$

Page 15: Lecture (14,15)

Method of Least Squares (Step 7)

Solving the two normal equations for $a$ and $b$:

$$a = \frac{\displaystyle\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 \;-\; \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{\displaystyle n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$b = \frac{\displaystyle n \sum_{i=1}^{n} x_i y_i \;-\; \sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{\displaystyle n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$\hat{y} = a + b\,x$$
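The Step 7 formulas translate directly into code. A minimal Python sketch (the helper name `fit_line` is ours, not the lecture's):

```python
def fit_line(x, y):
    """Least-squares line y = a + b*x via the closed-form solutions of Step 7."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx            # common denominator
    a = (sy * sxx - sx * sxy) / d    # intercept
    b = (n * sxy - sy * sx) / d      # slope
    return a, b

# Applied to the eight pairs of the example that follows, this returns
# a = 0.545 and b = 0.636 (i.e. 6/11 and 7/11).
print(fit_line([1, 3, 4, 6, 8, 9, 11, 14], [1, 2, 4, 4, 5, 7, 8, 9]))
```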

Page 16: Lecture (14,15)

Example

We have the following eight pairs of observations:

  x    y
  1    1
  3    2
  4    4
  6    4
  8    5
  9    7
  11   8
  14   9

Page 17: Lecture (14,15)

Example (Cont.)

Construct the least-squares line (n = 8):

  x_i   y_i   x_i²   x_i·y_i   y_i²
   1     1      1       1        1
   3     2      9       6        4
   4     4     16      16       16
   6     4     36      24       16
   8     5     64      40       25
   9     7     81      63       49
  11     8    121      88       64
  14     9    196     126       81

  Σ:      56    40    524     364      256
  (1/n)Σ:  7     5    65.5    45.5      32

Page 18: Lecture (14,15)

Example (Cont.)

Using the column sums from the table above:

$$a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{40 \times 524 - 56 \times 364}{8 \times 524 - 56 \times 56} = \frac{6}{11} = 0.545$$

$$b = \frac{n\sum x_i y_i - \sum y_i \sum x_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{8 \times 364 - 56 \times 40}{8 \times 524 - 56 \times 56} = \frac{7}{11} = 0.636$$

Page 19: Lecture (14,15)

Example (Cont.)

[Scatter plot of the eight observations with the fitted line; X from 0 to 16, Y from 0 to 10.]

Equation: Y = 0.545 + 0.636·X

Number of data points used = 8

Average X = 7

Average Y = 5

Page 20: Lecture (14,15)

Example (2)

  i     1      2      3      4      5
  x_i   2.10   6.22   7.17   10.5   13.7
  y_i   2.90   3.83   5.98   5.71   7.74

$$\sum_{i=1}^{5} x_i = 39.69, \qquad \sum_{i=1}^{5} x_i^2 = 392.3, \qquad \sum_{i=1}^{5} y_i = 26.16, \qquad \sum_{i=1}^{5} x_i\,y_i = 238.7$$

$$b = \frac{\sum x_i y_i - \frac{1}{5}\left(\sum x_i\right)\left(\sum y_i\right)}{\sum x_i^2 - \frac{1}{5}\left(\sum x_i\right)^2} = \frac{238.7 - \frac{1}{5}(39.69)(26.16)}{392.3 - \frac{1}{5}(39.69)^2} = 0.4023$$

$$a = \bar{y} - b\,\bar{x} = \frac{26.16}{5} - 0.4023 \times \frac{39.69}{5} = 2.038$$

$$\hat{y} = 2.038 + 0.4023\,x$$
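The same closed-form fit can be checked in a few lines of Python (a quick sketch; small differences from the slide's hand-rounded sums are expected):

```python
x = [2.10, 6.22, 7.17, 10.5, 13.7]
y = [2.90, 3.83, 5.98, 5.71, 7.74]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
b = (sxy - sx * sy / n) / (sxx - sx * sx / n)
a = sy / n - b * sx / n
print(a, b)  # approximately 2.04 and 0.40, matching y = 2.038 + 0.4023*x
```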

Page 21: Lecture (14,15)

Example (3)

Page 22: Lecture (14,15)

Excel Application

• See Excel

Page 23: Lecture (14,15)

Covariance and the Correlation Coefficient

• Use COVAR to calculate the covariance: Cell = COVAR(array1, array2)
  – Average of the products of deviations for each data point pair
  – Depends on the units of measurement
• Use CORREL to return the correlation coefficient: Cell = CORREL(array1, array2)
  – Returns a value between −1 and +1

• Also available in Analysis ToolPak

Page 24: Lecture (14,15)

Analysis ToolPak

• Descriptive Statistics
• Correlation
• Linear Regression
• t-Tests
• z-Tests
• ANOVA
• Covariance

Page 25: Lecture (14,15)

Descriptive Statistics

• Mean, Median, Mode
• Standard Error
• Standard Deviation
• Sample Variance
• Kurtosis
• Skewness
• Confidence Level for Mean
• Range
• Minimum
• Maximum
• Sum
• Count
• kth Largest
• kth Smallest

Page 26: Lecture (14,15)

Correlation and Regression

• Correlation is a measure of the strength of linear association between two variables
  – Values between −1 and +1
  – Values close to −1 indicate a strong negative relationship
  – Values close to +1 indicate a strong positive relationship
  – Values close to 0 indicate a weak relationship
• Linear Regression is the process of finding a line of best fit through a series of data points
  – Can also use the SLOPE, INTERCEPT, CORREL and RSQ functions

Page 27: Lecture (14,15)

Polynomial Regression

• Minimize the residual between the data points and the curve: least-squares regression.

Must find values of $a_0, a_1, a_2, \ldots, a_m$:

Linear:     $y_i = a_0 + a_1 x_i$
Quadratic:  $y_i = a_0 + a_1 x_i + a_2 x_i^2$
Cubic:      $y_i = a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3$
General:    $y_i = a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 + \cdots + a_m x_i^m$

Page 28: Lecture (14,15)

Polynomial Regression

• Residual:

$$e_i = y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 + \cdots + a_m x_i^m\right)$$

• Sum of squared residuals:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left[y_i - \left(a_0 + a_1 x_i + a_2 x_i^2 + a_3 x_i^3 + \cdots + a_m x_i^m\right)\right]^2$$

• Minimize by taking derivatives with respect to each coefficient.

Page 29: Lecture (14,15)

Polynomial Regression

• Normal equations (all sums run over $i = 1,\dots,n$):

$$\begin{bmatrix}
n & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^m \\
\sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{m+1} \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \cdots & \sum x_i^{m+2} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum x_i^m & \sum x_i^{m+1} & \sum x_i^{m+2} & \cdots & \sum x_i^{2m}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \\ \vdots \\ \sum x_i^m y_i \end{bmatrix}$$

Page 30: Lecture (14,15)

Example

x 0 1.0 1.5 2.3 2.5 4.0 5.1 6.0 6.5 7.0 8.1 9.0

y 0.2 0.8 2.5 2.5 3.5 4.3 3.0 5.0 3.5 2.4 1.3 2.0

x 9.3 11.0 11.3 12.1 13.1 14.0 15.5 16.0 17.5 17.8 19.0 20.0

y -0.3 -1.3 -3.0 -4.0 -4.9 -4.0 -5.2 -3.0 -3.5 -1.6 -1.4 -0.1

[Scatter plot of the 24 data points; x from 0 to 25, f(x) from −6 to 6.]

Page 31: Lecture (14,15)

Example

For a cubic fit ($m = 3$) the normal equations become

$$\begin{bmatrix}
n & \sum x_i & \sum x_i^2 & \sum x_i^3 \\
\sum x_i & \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \\
\sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \sum x_i^5 \\
\sum x_i^3 & \sum x_i^4 & \sum x_i^5 & \sum x_i^6
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \\ \sum x_i^3 y_i \end{bmatrix}$$

For the 24 data points above: $n = 24$, $\sum x_i = 229.6$, $\sum x_i^2 = 3060.2$, $\sum x_i^3 = 46342.8$, $\sum x_i^4 = 752835.2$.

Page 32: Lecture (14,15)

Example

Solving the system gives

$$a_0 = -0.3593, \quad a_1 = 2.3051, \quad a_2 = -0.3532, \quad a_3 = 0.0121$$

Regression equation:

$$y = -0.359 + 2.305\,x - 0.353\,x^2 + 0.012\,x^3$$

[Plot of the 24 data points with the fitted cubic curve; x from 0 to 25, f(x) from −6 to 6.]
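A quick check of this example with numpy.polyfit, which solves the same least-squares problem (it returns the highest power first, so we unpack in reverse):

```python
import numpy as np

x = [0, 1.0, 1.5, 2.3, 2.5, 4.0, 5.1, 6.0, 6.5, 7.0, 8.1, 9.0,
     9.3, 11.0, 11.3, 12.1, 13.1, 14.0, 15.5, 16.0, 17.5, 17.8, 19.0, 20.0]
y = [0.2, 0.8, 2.5, 2.5, 3.5, 4.3, 3.0, 5.0, 3.5, 2.4, 1.3, 2.0,
     -0.3, -1.3, -3.0, -4.0, -4.9, -4.0, -5.2, -3.0, -3.5, -1.6, -1.4, -0.1]

a3, a2, a1, a0 = np.polyfit(x, y, 3)
print(a0, a1, a2, a3)  # should come out close to -0.359, 2.305, -0.353, 0.012
```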

Page 33: Lecture (14,15)

Nonlinear Relationships

• If relationship is an exponential function

To make it linear, take logarithm of both sides

bx aey

(a) + bx (y) lnln

b axy To make linear, take logarithm of both sides

(x)(a) + b (y) lnlnln

Now it’s a linear relation between ln(y) and x

Now it’s a linear relation between ln(y) and ln(x)

• If relationship is a power function

Page 34: Lecture (14,15)

Examples

• Quadratic curve

– Flow rating curve:• q = measured discharge, • H = stage (height) of water behind outlet

• Power curve

– Sediment transport: • c = concentration of suspended sediment• q = river discharge

– Carbon adsorption: • q = mass of pollutant sorbed per unit mass of carbon, • C = concentration of pollutant in solution

b aqc

b axy

2210 x ax a ay

2210 H aH a aq

ncKq

Page 35: Lecture (14,15)

Example – Log-Log

  x     y      X = Log(x)   Y = Log(y)
  1.2   2.1    0.18         0.74
  2.8   11.5   1.03         2.44
  4.3   28.1   1.46         3.34
  5.4   41.9   1.69         3.74
  6.8   72.3   1.92         4.28
  7.9   91.4   2.07         4.52

(Here Log denotes the natural logarithm, as the tabulated values show.)

[Left: plot of x vs y on linear axes, showing a curved trend. Right: plot of X = Log(x) vs Y = Log(y), showing a straight line.]

Page 36: Lecture (14,15)

Example – Log-Log

The straight-line fit $Y = A + BX$ (with $A = \ln a$ and $B = b$) leads to the normal equations

$$\begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix} =
\begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$

with

$$\sum \ln(x_i) = 8.34, \qquad \sum [\ln(x_i)]^2 = 14.0, \qquad \sum \ln(y_i) = 19.1, \qquad \sum \ln(x_i)\ln(y_i) = 31.4$$

$$\begin{bmatrix} 6 & 8.34 \\ 8.34 & 14.0 \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix} =
\begin{bmatrix} 19.1 \\ 31.4 \end{bmatrix}$$

Note that we use the X's and Y's, not the original x's and y's.
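A sketch of the whole log-log procedure in Python: transform, fit the straight line, then recover the power-law parameters (variable names are ours):

```python
import math

x = [1.2, 2.8, 4.3, 5.4, 6.8, 7.9]
y = [2.1, 11.5, 28.1, 41.9, 72.3, 91.4]

# Work with X = ln(x) and Y = ln(y), not the original x's and y's.
X = [math.log(v) for v in x]
Y = [math.log(v) for v in y]
n = len(X)

B = (n * sum(a * b for a, b in zip(X, Y)) - sum(X) * sum(Y)) / \
    (n * sum(a * a for a in X) - sum(X) ** 2)
A = (sum(Y) - B * sum(X)) / n

# In the power model y = a * x^b: a = e^A and b = B.
print(math.exp(A), B)
```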

Page 37: Lecture (14,15)

Example – Carbon Adsorption

$$q = K\,c^n$$

q = pollutant mass sorbed per unit carbon mass
C = concentration of pollutant in solution
K = coefficient
n = measure of the energy of the reaction

$$\log_{10} q = \log_{10} K + n \log_{10} c$$

Page 38: Lecture (14,15)

Example – Carbon Adsorption

$$q = K\,c^n$$

Linear axes: K = 74.702, and n = 0.2289

[Plot of q vs C on linear axes; C from 0 to 600, q from 0 to 350.]

Page 39: Lecture (14,15)

Example – Carbon Adsorption

$$\log_{10} q = \log_{10} K + n \log_{10} c$$

[Plot of Y = Log(q) vs X = Log(c); both axes from 0 to 3, showing a straight line.]

Logarithmic axes: log K = 1.8733, so K = 10^1.8733 = 74.696, and n = 0.2289.

Page 40: Lecture (14,15)

Multiple Regression

• Regression model:

$$y_i = a\,x_i + b + \varepsilon_i$$

• Multiple regression model:

$$\begin{aligned}
y_1 &= x_{11}\beta_1 + x_{12}\beta_2 + \cdots + x_{1n}\beta_n + \varepsilon_1 \\
y_2 &= x_{21}\beta_1 + x_{22}\beta_2 + \cdots + x_{2n}\beta_n + \varepsilon_2 \\
&\;\;\vdots \\
y_m &= x_{m1}\beta_1 + x_{m2}\beta_2 + \cdots + x_{mn}\beta_n + \varepsilon_m
\end{aligned}$$

• In matrix notation:

$$Y = X\beta + \varepsilon$$

Page 41: Lecture (14,15)

Multiple Regression (cont.)

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}$$

Observed data = design matrix × parameters + residuals

$$Y = X\beta + \varepsilon$$
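A minimal NumPy sketch of the matrix form: the least-squares estimate solves the normal equations $(X^T X)\,\beta = X^T Y$. The design matrix and data below are hypothetical, for illustration only:

```python
import numpy as np

def multiple_regression(X, y):
    """Least-squares solution of Y = X*beta + eps via (X^T X) beta = X^T Y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical example: an intercept column plus two regressors.
X = np.array([[1.0, 0.5, 2.0],
              [1.0, 1.5, 1.0],
              [1.0, 2.0, 3.5],
              [1.0, 3.5, 2.5]])
y = np.array([3.1, 3.9, 7.2, 7.8])
print(multiple_regression(X, y))  # estimated beta (three coefficients)
```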