
Chem 302 - Math 252

Chapter 5 Regression

Linear & Nonlinear Regression

• Linear regression
  – Linear in the parameters
  – Does not have to be linear in the independent variable(s)
  – Can be solved through a system of linear equations

• Nonlinear regression
  – Nonlinear in the parameters
  – Usually requires linearization and iteration

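Linear in the parameters:

$$y = a_0 + a_1 x$$

$$y = a_0 + a_1 x + a_2 x^2$$

$$y = a_0 + a_1 e^{x}$$

Nonlinear in the parameters:

$$y = a_0 e^{a_1 x}$$

$$y = a_0 + a_1 x^{a_2}$$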

Linear Least-Squares Regression

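Data: $\{(x_i, y_i)\}_{i=1}^{n}$

Residual: $\varepsilon_i = y_{i,\mathrm{obs}} - y_{i,\mathrm{calc}}$

Sum of Square Residuals: $Z = \sum_{i=1}^{n} \varepsilon_i^2$

Want to minimize Z, where $y_{i,\mathrm{calc}} = f(x_i; \{a_m\})$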


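Data: $\{(x_i, y_i)\}_{i=1}^{n}$, with $y_{i,\mathrm{calc}} = a_0 + a_1 x_i$

$$\varepsilon_i = y_{i,\mathrm{obs}} - y_{i,\mathrm{calc}} = y_{i,\mathrm{obs}} - a_0 - a_1 x_{i,\mathrm{obs}}$$

$$Z = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_{i,\mathrm{obs}} - a_0 - a_1 x_{i,\mathrm{obs}}\right)^2$$

At the min

$$\frac{\partial Z}{\partial a_0} = 0, \qquad \frac{\partial Z}{\partial a_1} = 0$$

Derivative with respect to $a_0$:

$$\frac{\partial Z}{\partial a_0} = -2 \sum_{i=1}^{n} \left(y_{i,\mathrm{obs}} - a_0 - a_1 x_{i,\mathrm{obs}}\right) = 0$$

$$\sum_{i=1}^{n} y_{i,\mathrm{obs}} - a_0 n - a_1 \sum_{i=1}^{n} x_{i,\mathrm{obs}} = 0$$

$$s_y - a_0 n - a_1 s_x = 0$$

$$a_0 n + a_1 s_x = s_y$$

Derivative with respect to $a_1$:

$$\frac{\partial Z}{\partial a_1} = -2 \sum_{i=1}^{n} \left(y_{i,\mathrm{obs}} - a_0 - a_1 x_{i,\mathrm{obs}}\right) x_{i,\mathrm{obs}} = 0$$

$$a_0 s_x + a_1 s_{xx} = s_{xy}$$

where

$$s_x = \sum_{i=1}^{n} x_{i,\mathrm{obs}}, \quad s_y = \sum_{i=1}^{n} y_{i,\mathrm{obs}}, \quad s_{xx} = \sum_{i=1}^{n} x_{i,\mathrm{obs}}^2, \quad s_{xy} = \sum_{i=1}^{n} x_{i,\mathrm{obs}}\, y_{i,\mathrm{obs}}$$

In matrix form:

$$\begin{pmatrix} n & s_x \\ s_x & s_{xx} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \end{pmatrix} = \begin{pmatrix} s_y \\ s_{xy} \end{pmatrix}$$

Solution:

$$a_0 = \frac{s_y s_{xx} - s_{xy} s_x}{n s_{xx} - s_x^2}$$

$$a_1 = \frac{n s_{xy} - s_x s_y}{n s_{xx} - s_x^2}$$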

Linear Regression.mws

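Example

Data: (1.00, 3.0), (2.00, 6.0), (3.00, 7.0), (4.00, 10.0)

$$n = 4$$

$$s_x = 1.00 + 2.00 + 3.00 + 4.00 = 10.00$$

$$s_y = 3.0 + 6.0 + 7.0 + 10.0 = 26.0$$

$$s_{xx} = 1.00 + 4.00 + 9.00 + 16.00 = 30.00$$

$$s_{xy} = 3.0 + 12.0 + 21.0 + 40.0 = 76.0$$

$$a_0 = \frac{s_y s_{xx} - s_{xy} s_x}{n s_{xx} - s_x^2} = \frac{26.0 \times 30.00 - 76.0 \times 10.00}{4 \times 30.00 - 10.00^2} = 1.0$$

$$a_1 = \frac{n s_{xy} - s_x s_y}{n s_{xx} - s_x^2} = \frac{4 \times 76.0 - 10.00 \times 26.0}{4 \times 30.00 - 10.00^2} = 2.20$$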
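The closed-form solution for the straight line can be checked numerically. The following is a minimal Python sketch, not the course's Maple worksheet; the function name `fit_line` is an assumption made for illustration. It accumulates the four sums and evaluates the two expressions for a0 and a1 on the four example points.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for y = a0 + a1*x using the closed-form normal equations."""
    n = len(xs)
    s_x = sum(xs)
    s_y = sum(ys)
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * s_xx - s_x ** 2          # common denominator n*s_xx - s_x^2
    a0 = (s_y * s_xx - s_xy * s_x) / denom
    a1 = (n * s_xy - s_x * s_y) / denom
    return a0, a1


# The four data points from the example
xs = [1.00, 2.00, 3.00, 4.00]
ys = [3.0, 6.0, 7.0, 10.0]
a0, a1 = fit_line(xs, ys)
print(f"a0 = {a0:.2f}, a1 = {a1:.2f}")  # a0 = 1.00, a1 = 2.20
```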

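Linear Least-Squares Regression - Uncertainties in Parameters

$$\sigma_{a_0}^2 = \sum_{i=1}^{n} \left(\frac{\partial a_0}{\partial y_i}\right)^2 \sigma_{y_i}^2 = \sigma_y^2 \sum_{i=1}^{n} \left(\frac{s_{xx} - s_x x_i}{n s_{xx} - s_x^2}\right)^2$$

$$\sigma_{a_0}^2 = \frac{\sigma_y^2}{\left(n s_{xx} - s_x^2\right)^2} \sum_{i=1}^{n} \left(s_{xx}^2 - 2 s_{xx} s_x x_i + s_x^2 x_i^2\right)$$

$$\sigma_{a_0}^2 = \frac{\sigma_y^2}{\left(n s_{xx} - s_x^2\right)^2} \left(n s_{xx}^2 - 2 s_{xx} s_x^2 + s_x^2 s_{xx}\right) = \frac{\sigma_y^2\, s_{xx} \left(n s_{xx} - s_x^2\right)}{\left(n s_{xx} - s_x^2\right)^2} = \frac{\sigma_y^2\, s_{xx}}{n s_{xx} - s_x^2}$$

The variance of y is estimated from the sum of square residuals, with m the number of parameters (m = 2 for a straight line):

$$\sigma_y^2 = \frac{Z}{n - m}$$

$$\sigma_{a_0}^2 = \frac{Z\, s_{xx}}{(n - 2)\left(n s_{xx} - s_x^2\right)}$$

Similarly for the slope:

$$\sigma_{a_1}^2 = \sum_{i=1}^{n} \left(\frac{\partial a_1}{\partial y_i}\right)^2 \sigma_{y_i}^2 = \frac{\sigma_y^2\, n}{n s_{xx} - s_x^2} = \frac{Z\, n}{(n - 2)\left(n s_{xx} - s_x^2\right)}$$

Example

$$Z = 0.80$$

$$\sigma_{a_0}^2 = \frac{Z\, s_{xx}}{(n - 2)\left(n s_{xx} - s_x^2\right)} = \frac{0.80 \times 30.00}{(4 - 2)\left(4 \times 30.00 - 10.00^2\right)} = 0.6$$

$$\sigma_{a_0} = 0.8$$

$$\sigma_{a_1} = 0.28$$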
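The uncertainty formulas can be evaluated for the same four points. This is a sketch under the assumption that all points share one variance, estimated as Z/(n - 2); the function name `line_fit_with_uncertainties` is hypothetical.

```python
import math


def line_fit_with_uncertainties(xs, ys):
    """Straight-line fit returning (a0, a1, Z, sigma_a0, sigma_a1)."""
    n = len(xs)
    s_x, s_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * s_xx - s_x ** 2
    a0 = (s_y * s_xx - s_xy * s_x) / denom
    a1 = (n * s_xy - s_x * s_y) / denom
    # Sum of square residuals at the fitted parameters
    Z = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    var_y = Z / (n - 2)                      # sigma_y^2 = Z / (n - m), m = 2
    sigma_a0 = math.sqrt(var_y * s_xx / denom)
    sigma_a1 = math.sqrt(var_y * n / denom)
    return a0, a1, Z, sigma_a0, sigma_a1


xs = [1.00, 2.00, 3.00, 4.00]
ys = [3.0, 6.0, 7.0, 10.0]
a0, a1, Z, sigma_a0, sigma_a1 = line_fit_with_uncertainties(xs, ys)
print(f"Z = {Z:.2f}")                 # Z = 0.80
print(f"sigma_a0 = {sigma_a0:.2f}")   # sigma_a0 = 0.77
print(f"sigma_a1 = {sigma_a1:.2f}")   # sigma_a1 = 0.28
```

To one significant figure, sigma_a0 = 0.77 rounds to the 0.8 quoted in the example.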



Regression on “y”

Treat x as y and y as x

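Data: (1.00, 3.0), (2.00, 6.0), (3.00, 7.0), (4.00, 10.0)

With the roles of x and y interchanged:

$$n = 4$$

$$s_x = 3.0 + 6.0 + 7.0 + 10.0 = 26.0$$

$$s_y = 1.00 + 2.00 + 3.00 + 4.00 = 10.00$$

$$s_{xx} = 9.0 + 36.0 + 49.0 + 100.0 = 194.0$$

$$s_{xy} = 3.0 + 12.0 + 21.0 + 40.0 = 76.0$$

$$a_0 = \frac{s_y s_{xx} - s_{xy} s_x}{n s_{xx} - s_x^2} = -0.36$$

$$a_1 = \frac{n s_{xy} - s_x s_y}{n s_{xx} - s_x^2} = 0.44$$

$$x = 0.44\,y - 0.36$$

$$y = \frac{x + 0.36}{0.44} = 2.27\,x + 0.82$$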

Choose x as variable with smallest error

Can also be determined by equation
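Swapping the roles of the two variables is easy to try numerically. The sketch below (function name `fit_line` is an assumption) fits x as a function of y for the same four points and then inverts the result so it can be compared with the direct fit y = 1.0 + 2.20x.

```python
def fit_line(us, vs):
    """Return (c0, c1) for v = c0 + c1*u by closed-form least squares."""
    n = len(us)
    s_u, s_v = sum(us), sum(vs)
    s_uu = sum(u * u for u in us)
    s_uv = sum(u * v for u, v in zip(us, vs))
    denom = n * s_uu - s_u ** 2
    return (s_v * s_uu - s_uv * s_u) / denom, (n * s_uv - s_u * s_v) / denom


xs = [1.00, 2.00, 3.00, 4.00]
ys = [3.0, 6.0, 7.0, 10.0]

# Regression on y: treat y as the independent variable, x = b0 + b1*y
b0, b1 = fit_line(ys, xs)
# Invert to the form y = intercept + slope*x
slope = 1.0 / b1
intercept = -b0 / b1

print(f"x = {b0:.2f} + {b1:.2f} y")              # x = -0.36 + 0.44 y
print(f"y = {intercept:.2f} + {slope:.2f} x")    # y = 0.82 + 2.27 x
```

The two regressions give different lines (2.27x + 0.82 versus 2.20x + 1.0) because each minimizes residuals in a different variable.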


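Data: $\{(x_i, y_i)\}_{i=1}^{n}$, with a model that is a linear combination of m functions:

$$y_{i,\mathrm{calc}} = \sum_{k=1}^{m} a_k f_k(x_i)$$

$$Z = \sum_{i=1}^{n} \left(y_{i,\mathrm{obs}} - \sum_{k=1}^{m} a_k f_k(x_i)\right)^2$$

At the min

$$\frac{\partial Z}{\partial a_j} = 0$$

$$\frac{\partial Z}{\partial a_j} = -2 \sum_{i=1}^{n} \left(y_{i,\mathrm{obs}} - \sum_{k=1}^{m} a_k f_k(x_i)\right) f_j(x_i) = 0$$

$$\sum_{i=1}^{n} y_{i,\mathrm{obs}} f_j(x_i) - \sum_{i=1}^{n} \sum_{k=1}^{m} a_k f_k(x_i) f_j(x_i) = 0$$

$$\sum_{k=1}^{m} a_k \sum_{i=1}^{n} f_k(x_i) f_j(x_i) = \sum_{i=1}^{n} y_{i,\mathrm{obs}} f_j(x_i)$$

In matrix form

$$\mathbf{C}\mathbf{A} = \mathbf{D}$$

$$C_{jk} = C_{kj} = \sum_{i=1}^{n} f_k(x_i) f_j(x_i)$$

$$A_k = a_k$$

$$D_k = \sum_{i=1}^{n} y_{i,\mathrm{obs}} f_k(x_i)$$

$$\mathbf{A} = \mathbf{C}^{-1}\mathbf{D}$$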
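The matrix formulation can be sketched in a few lines of pure Python. This is an illustration under stated assumptions rather than the course's Maple implementation: the helper names `solve` and `linear_fit` are hypothetical, and the system CA = D is solved by Gaussian elimination instead of forming the inverse explicitly. The basis functions are passed as a list, so the same routine covers a straight line, a polynomial, or any other model that is linear in the parameters.

```python
def solve(C, D):
    """Solve C A = D by Gaussian elimination with partial pivoting."""
    m = len(D)
    M = [list(row) + [d] for row, d in zip(C, D)]   # augmented matrix [C | D]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    A = [0.0] * m
    for r in range(m - 1, -1, -1):                  # back substitution
        s = sum(M[r][c] * A[c] for c in range(r + 1, m))
        A[r] = (M[r][m] - s) / M[r][r]
    return A


def linear_fit(xs, ys, basis):
    """Build C_jk = sum_i f_j(x_i) f_k(x_i), D_k = sum_i y_i f_k(x_i); solve for a_k."""
    m = len(basis)
    F = [[f(x) for f in basis] for x in xs]         # F[i][k] = f_k(x_i)
    C = [[sum(row[j] * row[k] for row in F) for k in range(m)] for j in range(m)]
    D = [sum(y * row[k] for row, y in zip(F, ys)) for k in range(m)]
    return solve(C, D)


# Straight line as a special case: f_1(x) = 1, f_2(x) = x
xs = [1.00, 2.00, 3.00, 4.00]
ys = [3.0, 6.0, 7.0, 10.0]
a = linear_fit(xs, ys, [lambda x: 1.0, lambda x: x])
print([round(v, 2) for v in a])  # [1.0, 2.2]
```

A three-term model such as a1 + a2/T + a3 ln T would use the basis `[lambda T: 1.0, lambda T: 1.0 / T, math.log]` after `import math`.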

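Example – Vapour Pressure of Cadmium

$$\ln p = a_1 + \frac{a_2}{T} + a_3 \ln T$$

$$y = \ln p, \qquad f_1(T) = 1, \qquad f_2(T) = \frac{1}{T}, \qquad f_3(T) = \ln T$$

$$\mathbf{C} = \begin{pmatrix} 9.00 & 0.00156 & 60.04 \\ 0.00156 & 0.0000153 & 0.07679 \\ 60.04 & 0.07679 & 400.8 \end{pmatrix}$$

$$\mathbf{C}^{-1} = \begin{pmatrix} 45882.2 & -4598324.9 & -5992.4 \\ -4598324.9 & 462691895.1 & 600209.4 \\ -5992.4 & 600209.4 & 782.7 \end{pmatrix}$$

$$\mathbf{D} = \begin{pmatrix} 24.27 \\ 0.02605 \\ 165.8 \end{pmatrix}$$

$$\mathbf{A} = \mathbf{C}^{-1}\mathbf{D} = \begin{pmatrix} 28.74 \\ -13449 \\ -1.315 \end{pmatrix}$$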


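Linear Least-Squares Regression - Uncertainties in Parameters

$$\sigma_{a_l}^2 = \sum_{i=1}^{n} \left(\frac{\partial a_l}{\partial y_i}\right)^2 \sigma_{y_i}^2$$

Since $a_l = \sum_{k=1}^{m} C^{-1}_{lk} D_k$ and $D_k = \sum_{i=1}^{n} y_i f_k(x_i)$:

$$\frac{\partial a_l}{\partial y_i} = \sum_{k=1}^{m} C^{-1}_{lk} \frac{\partial D_k}{\partial y_i} = \sum_{k=1}^{m} C^{-1}_{lk} f_k(x_i)$$

$$\sigma_{a_l}^2 = \sigma_y^2 \sum_{i=1}^{n} \left(\sum_{k=1}^{m} C^{-1}_{lk} f_k(x_i)\right)^2 = \sigma_y^2 \sum_{i=1}^{n} \left(\sum_{j=1}^{m} C^{-1}_{lj} f_j(x_i)\right) \left(\sum_{k=1}^{m} C^{-1}_{lk} f_k(x_i)\right)$$

$$\sigma_{a_l}^2 = \sigma_y^2 \sum_{j=1}^{m} \sum_{k=1}^{m} C^{-1}_{lj} C^{-1}_{lk} \sum_{i=1}^{n} f_j(x_i) f_k(x_i) = \sigma_y^2 \sum_{j=1}^{m} \sum_{k=1}^{m} C^{-1}_{lj} C^{-1}_{lk} C_{jk}$$

$$\sigma_{a_l}^2 = \sigma_y^2 \sum_{k=1}^{m} C^{-1}_{lk} \sum_{j=1}^{m} C^{-1}_{lj} C_{jk} = \sigma_y^2 \sum_{k=1}^{m} C^{-1}_{lk} \delta_{lk} = \sigma_y^2\, C^{-1}_{ll}$$

$$\sigma_y^2 = \frac{Z}{n - m}$$

$$\sigma_{a_l}^2 = C^{-1}_{ll}\, \sigma_y^2 = C^{-1}_{ll}\, \frac{Z}{n - m}$$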
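The result that each parameter variance is a diagonal element of the inverse matrix times Z/(n - m) can be checked against the straight-line example. This is a minimal sketch; `invert` (Gauss-Jordan elimination) and `fit_with_uncertainties` are hypothetical helper names, and equal variance for all points is assumed.

```python
import math


def invert(C):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    m = len(C)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(m)]
         for i, row in enumerate(C)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(m):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[m:] for row in M]


def fit_with_uncertainties(xs, ys, basis):
    """Return (a, sigma, Z) with sigma_l^2 = Cinv_ll * Z / (n - m)."""
    n, m = len(xs), len(basis)
    F = [[f(x) for f in basis] for x in xs]
    C = [[sum(r[j] * r[k] for r in F) for k in range(m)] for j in range(m)]
    D = [sum(y * r[k] for r, y in zip(F, ys)) for k in range(m)]
    Cinv = invert(C)
    a = [sum(Cinv[l][k] * D[k] for k in range(m)) for l in range(m)]
    Z = sum((y - sum(a[k] * r[k] for k in range(m))) ** 2 for r, y in zip(F, ys))
    var_y = Z / (n - m)
    sigma = [math.sqrt(Cinv[l][l] * var_y) for l in range(m)]
    return a, sigma, Z


xs = [1.00, 2.00, 3.00, 4.00]
ys = [3.0, 6.0, 7.0, 10.0]
a, sigma, Z = fit_with_uncertainties(xs, ys, [lambda x: 1.0, lambda x: x])
print([round(v, 2) for v in a])      # [1.0, 2.2]
print([round(s, 2) for s in sigma])  # [0.77, 0.28]
```

For the straight line the general expression reduces to the closed-form variances derived earlier, which is why the same 0.77 and 0.28 appear.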

Nonlinear Least-Squares Regression

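Data: $\{(x_i, y_i)\}_{i=1}^{n}$

$$y_{i,\mathrm{calc}} = f(x_i; \beta_1, \beta_2, \ldots, \beta_m)$$

$$Z = \sum_{i=1}^{n} \left(y_{i,\mathrm{obs}} - y_{i,\mathrm{calc}}\right)^2$$

At the min

$$\frac{\partial Z}{\partial \beta_j} = 0$$

This results in a system of nonlinear equations

Linearize & solve iteratively

Need initial estimate of parameters

In matrix form, at iteration r:

$$\mathbf{C}\boldsymbol{\delta} = \mathbf{D}$$

$$C_{jk} = C_{kj} = \sum_{i=1}^{n} \left.\frac{\partial f}{\partial \beta_j}\right|_{x_i,\, \beta^{(r)}} \left.\frac{\partial f}{\partial \beta_k}\right|_{x_i,\, \beta^{(r)}}$$

$$D_k = \sum_{i=1}^{n} \left(y_{i,\mathrm{obs}} - y_{i,\mathrm{calc}}^{(r)}\right) \left.\frac{\partial f}{\partial \beta_k}\right|_{x_i,\, \beta^{(r)}}$$

$$\delta_k^{(r)} = \beta_k^{(r+1)} - \beta_k^{(r)}$$

$$\boldsymbol{\delta} = \mathbf{C}^{-1}\mathbf{D}$$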

Uncertainty in parameters

$$\sigma_{\beta_k}^2 = m\, F_{m,\, n-m}\; C^{-1}_{kk}\, \frac{Z}{n - m}$$


Nonlinear Least-Squares Regression - Example: Van der Waals parameters for nitrogen

$$p = \frac{RT}{V_m - b} - \frac{a}{V_m^2}$$

$$\frac{\partial p}{\partial a} = -\frac{1}{V_m^2}$$

$$\frac{\partial p}{\partial b} = \frac{RT}{\left(V_m - b\right)^2}$$

p/atm   T/K      Vm/(L mol^-1)     p/atm   T/K      Vm/(L mol^-1)
1       223.15   18.28340          5       373.15   6.13064
5       223.15   3.63436           20      373.15   1.53844
10      223.15   1.80389           50      373.15   0.621118
20      223.15   0.889748          5       473.15   7.77970
1       273.15   22.4046           10      473.15   3.89744
10      273.15   2.23174           20      473.15   1.95651
20      273.15   1.11189           50      473.15   0.792572
50      273.15   0.44191
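The iterative matrix method can be sketched for this two-parameter model. The code below is an illustration, not the course's Maple worksheet: the value of the gas constant, the starting guess (a = 1.3, b = 0.05), the stopping tolerance and the function names are all assumptions. Each pass builds the 2×2 matrix C and vector D from the two derivatives of p, solves for the correction, and updates a and b.

```python
R = 0.082057  # assumed gas constant in L atm mol^-1 K^-1

# (p/atm, T/K, Vm/(L mol^-1)) from the table
DATA = [
    (1, 223.15, 18.28340), (5, 223.15, 3.63436), (10, 223.15, 1.80389),
    (20, 223.15, 0.889748), (1, 273.15, 22.4046), (10, 273.15, 2.23174),
    (20, 273.15, 1.11189), (50, 273.15, 0.44191), (5, 373.15, 6.13064),
    (20, 373.15, 1.53844), (50, 373.15, 0.621118), (5, 473.15, 7.77970),
    (10, 473.15, 3.89744), (20, 473.15, 1.95651), (50, 473.15, 0.792572),
]


def p_calc(T, V, a, b):
    """Van der Waals pressure."""
    return R * T / (V - b) - a / V ** 2


def sum_sq(a, b):
    """Sum of square residuals Z in pressure."""
    return sum((p - p_calc(T, V, a, b)) ** 2 for p, T, V in DATA)


def gauss_newton(a, b, max_iter=50, tol=1e-10):
    """Iterate C*delta = D for the two parameters (a, b)."""
    for _ in range(max_iter):
        c11 = c12 = c22 = d1 = d2 = 0.0
        for p, T, V in DATA:
            fa = -1.0 / V ** 2               # dp/da
            fb = R * T / (V - b) ** 2        # dp/db
            res = p - p_calc(T, V, a, b)     # y_obs - y_calc at current parameters
            c11 += fa * fa
            c12 += fa * fb
            c22 += fb * fb
            d1 += res * fa
            d2 += res * fb
        det = c11 * c22 - c12 * c12
        da = (c22 * d1 - c12 * d2) / det     # 2x2 solve by Cramer's rule
        db = (c11 * d2 - c12 * d1) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


a_fit, b_fit = gauss_newton(1.3, 0.05)
print(f"a = {a_fit:.4f}, b = {b_fit:.5f}, Z = {sum_sq(a_fit, b_fit):.5f}")
```

The fitted values depend slightly on the value of R used, so they are printed rather than quoted here.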


Weighted Least-Squares Regression

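$$Z = \sum_{i=1}^{n} w_i \left(y_{i,\mathrm{obs}} - y_{i,\mathrm{calc}}\right)^2$$

May not always want to give equal weight to each point

Applies to linear and nonlinear cases

Nonlinear case

$$C_{jk} = C_{kj} = \sum_{i=1}^{n} w_i \left.\frac{\partial f}{\partial \beta_j}\right|_{x_i,\, \beta^{(r)}} \left.\frac{\partial f}{\partial \beta_k}\right|_{x_i,\, \beta^{(r)}}$$

$$D_k = \sum_{i=1}^{n} w_i \left(y_{i,\mathrm{obs}} - y_{i,\mathrm{calc}}^{(r)}\right) \left.\frac{\partial f}{\partial \beta_k}\right|_{x_i,\, \beta^{(r)}}$$

Linear case

$$C_{jk} = C_{kj} = \sum_{i=1}^{n} w_i f_k(x_i) f_j(x_i)$$

$$D_k = \sum_{i=1}^{n} w_i\, y_{i,\mathrm{obs}} f_k(x_i)$$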
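For a straight line the weighted linear case reduces to weighted sums. The sketch below is illustrative only: the function name `weighted_line_fit` is an assumption, and the unequal weights in the second call are made-up values (for instance, a point believed to be less reliable), not data from the course.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted fit of y = a0 + a1*x with f_1 = 1, f_2 = x and weights w_i."""
    s_w = sum(ws)                                          # C_11
    s_wx = sum(w * x for w, x in zip(ws, xs))              # C_12 = C_21
    s_wxx = sum(w * x * x for w, x in zip(ws, xs))         # C_22
    s_wy = sum(w * y for w, y in zip(ws, ys))              # D_1
    s_wxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))  # D_2
    det = s_w * s_wxx - s_wx ** 2
    a0 = (s_wy * s_wxx - s_wxy * s_wx) / det
    a1 = (s_w * s_wxy - s_wx * s_wy) / det
    return a0, a1


xs = [1.00, 2.00, 3.00, 4.00]
ys = [3.0, 6.0, 7.0, 10.0]

# Equal weights reproduce the unweighted result
a0_eq, a1_eq = weighted_line_fit(xs, ys, [1.0, 1.0, 1.0, 1.0])
print(f"{a0_eq:.2f} {a1_eq:.2f}")  # 1.00 2.20

# Hypothetical weights that down-weight the last point
a0_w, a1_w = weighted_line_fit(xs, ys, [1.0, 1.0, 1.0, 0.25])
print(a0_w, a1_w)
```

Only the relative sizes of the weights matter: multiplying every weight by the same constant leaves a0 and a1 unchanged.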

Drawbacks of Iterative Matrix Method

• Local minima can cause problems

• Can be sensitive to initial guess

• Derivatives must be evaluated for each iteration

Simplex Method

• Simplex has one more vertex than dimension of space
  – 2D – Triangle

• m parameters – m+1 vertices

• Simplex Method used to optimize a set of parameters
  – Find optimal set of β's such that Z is minimum

• More robust than previous iterative procedure
  – Often slower

Simplex Method

1. Evaluate Z at m+1 unique sets of parameters

2. Identify Z_B (best, smallest) and Z_W (worst, largest)

3. Calculate the centroid of all but the worst point (average of the different sets of parameters, ignoring the worst set)

$$C_k = \frac{1}{m} \sum_{i \neq W} \beta_{k,i}$$

4. Reflect the worst point through the centroid

$$R_{1,k} = 2 C_k - \beta_{k,W}$$

5. Replace the worst point:

a. If Z_R1 < Z_B (reflected point is better than previous best) calculate

$$R_{2,k} = 3 C_k - 2 \beta_{k,W}$$

i. If Z_R2 < Z_R1 replace W with R2

ii. Otherwise replace W with R1

b. If Z_B < Z_R1 < Z_W replace W with R1

c. If Z_R1 > Z_W a contracted point is calculated

$$R_{3,k} = 0.5\, C_k + 0.5\, \beta_{k,W}$$

i. If Z_R3 < Z_W replace W with R3

ii. Otherwise move all points closer to the best point

6. Repeat until converged or the maximum number of iterations has been performed
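The steps above can be sketched in Python and tried on the nitrogen data. Several details are assumptions not given in the slides: the value of R, the shrink rule (every vertex moved halfway towards the best one), the stopping criterion (spread of Z over the vertices), and all function names. The starting triangle is the one listed for Iteration 1 of the example.

```python
R = 0.082057  # assumed gas constant in L atm mol^-1 K^-1

DATA = [  # (p/atm, T/K, Vm/(L mol^-1))
    (1, 223.15, 18.28340), (5, 223.15, 3.63436), (10, 223.15, 1.80389),
    (20, 223.15, 0.889748), (1, 273.15, 22.4046), (10, 273.15, 2.23174),
    (20, 273.15, 1.11189), (50, 273.15, 0.44191), (5, 373.15, 6.13064),
    (20, 373.15, 1.53844), (50, 373.15, 0.621118), (5, 473.15, 7.77970),
    (10, 473.15, 3.89744), (20, 473.15, 1.95651), (50, 473.15, 0.792572),
]


def Z(beta):
    """Sum of square residuals in pressure for beta = [a, b]."""
    a, b = beta
    return sum((p - (R * T / (V - b) - a / V ** 2)) ** 2 for p, T, V in DATA)


def combine(c1, u, c2, v):
    """Return c1*u + c2*v component by component."""
    return [c1 * ui + c2 * vi for ui, vi in zip(u, v)]


def simplex_step(points, func):
    """Apply steps 2 to 5 once and return the new list of vertices."""
    pts = sorted(points, key=func)               # best first, worst last
    best, worst, others = pts[0], pts[-1], pts[:-1]
    m = len(best)
    centroid = [sum(p[k] for p in others) / m for k in range(m)]
    z_b, z_w = func(best), func(worst)
    r1 = combine(2.0, centroid, -1.0, worst)     # reflection
    z_r1 = func(r1)
    if z_r1 < z_b:
        r2 = combine(3.0, centroid, -2.0, worst)     # second reflected point
        new = r2 if func(r2) < z_r1 else r1
    elif z_r1 < z_w:
        new = r1
    else:
        r3 = combine(0.5, centroid, 0.5, worst)      # contracted point
        if func(r3) < z_w:
            new = r3
        else:
            # Assumption: shrink every vertex halfway towards the best one
            return [combine(0.5, best, 0.5, p) for p in pts]
    return others + [new]


def simplex_minimize(points, func, max_iter=500, tol=1e-12):
    """Repeat simplex steps until the spread in Z is below tol (assumed criterion)."""
    for _ in range(max_iter):
        zs = [func(p) for p in points]
        if max(zs) - min(zs) < tol:
            break
        points = simplex_step(points, func)
    return min(points, key=func)


start = [[1.300, 0.0500], [1.326, 0.0505], [1.313, 0.0510]]
first = simplex_step(start, Z)     # worst vertex replaced by the second reflected point
best = simplex_minimize(start, Z)
print(first)
print(best, Z(best))
```

No derivatives are needed, which is the practical advantage over the iterative matrix method, at the cost of more function evaluations.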

Simplex Regression - Example: Van der Waals parameters for nitrogen

$$p = \frac{RT}{V_m - b} - \frac{a}{V_m^2}$$

p/atm   T/K      Vm/(L mol^-1)     p/atm   T/K      Vm/(L mol^-1)
1       223.15   18.28340          5       373.15   6.13064
5       223.15   3.63436           20      373.15   1.53844
10      223.15   1.80389           50      373.15   0.621118
20      223.15   0.889748          5       473.15   7.77970
1       273.15   22.4046           10      473.15   3.89744
10      273.15   2.23174           20      473.15   1.95651
20      273.15   1.11189           50      473.15   0.792572
50      273.15   0.44191

Nonlinear Regression Nitrogen Gas Optimization.mws

Simplex program





Simplex - Example

Iteration 1: Response 0.344652
  beta                    Response
  1.300000   0.050000    0.425437
  1.326000   0.050500    0.344652   Best
  1.313000   0.051000    0.579697   Worst
  1.313000   0.050250               Centroid
  1.313000   0.049500    0.229741   First reflected point
  1.313000   0.048750    0.116962   Second reflected point

Iteration 2: Response 0.116962
  beta                    Response
  1.300000   0.050000    0.425437   Worst
  1.326000   0.050500    0.344652
  1.313000   0.048750    0.116962   Best
  1.319500   0.049625               Centroid
  1.339000   0.049250    0.076378   First reflected point
  1.358500   0.048875    0.011665   Second reflected point

Iteration 3: Response 0.0116649
  beta                    Response
  1.358500   0.048875    0.011665   Best
  1.326000   0.050500    0.344652   Worst
  1.313000   0.048750    0.116962
  1.335750   0.048812               Centroid
  1.345500   0.047125    0.041013   First reflected point

Iteration 4: Response 0.0116649
  beta                    Response
  1.358500   0.048875    0.011665   Best
  1.345500   0.047125    0.041013
  1.313000   0.048750    0.116962   Worst
  1.352000   0.048000               Centroid
  1.391000   0.047250    0.195042   First reflected point
  1.332500   0.048375    0.027212   Contracted point

Iteration 31: Response 0.00543252
  beta                    Response
  1.393487   0.049624    0.005433
  1.393340   0.049619    0.005433   Best
  1.393220   0.049616    0.005433   Worst
  1.393413   0.049621               Centroid
  1.393607   0.049627    0.005433   First reflected point
  1.393317   0.049619    0.005433   Contracted point

Iteration 32: Response 0.00543252
  beta                    Response
  1.393487   0.049624    0.005433   Worst
  1.393340   0.049619    0.005433
  1.393317   0.049619    0.005433   Best
  1.393328   0.049619               Centroid
  1.393170   0.049613    0.005433   First reflected point
  1.393408   0.049621    0.005433   Contracted point

Iterations converged. R^2 0.999999

Final Converged Parameters
  k   beta
  0   1.393321   0.0496186

Simplex – Example (Iteration 1)

[Figures: five plots of the simplex in the (a, b) plane, with a on the horizontal axis and b on the vertical axis. Marked points are the best vertex (B), the worst vertex (W), the centroid (C), the reflected points (R1, R2) and, where used, the contracted point. The first four plots span a = 1.29 to 1.39 and b = 0.045 to 0.052; the last is zoomed in to a = 1.3931 to 1.3936 and b = 0.049612 to 0.049626.]

Comparing Models

• Often have more than 1 equation that can be used to represent the data

• If two equations (models) have the same number of parameters the one with smaller Z is a better representation (fit)

• If two models have a different number of parameters, a direct comparison cannot be made
  – Need to use F distribution & confidence level
  – Model A – fewer parameters
  – Model B – more parameters

Comparing Models

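Model B is a better model if (and only if)

$$\frac{\left(Z_A - Z_B\right) / \left(m_B - m_A\right)}{Z_B / \left(n - m_B\right)} > F_{m_B - m_A,\; n - m_B}$$

with F taken at the chosen confidence level.

Usually look up F in a table and compare the ratios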

With Maple can calculate confidence level for which B is a better model than A
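The ratio that is compared with the tabulated F value is simple to compute. In the sketch below the function name `f_ratio` and the numbers in the example (Z_A = 2.0 with 2 parameters, Z_B = 1.0 with 3 parameters, n = 13 points) are made up for illustration; the critical value F(1, 10) ≈ 4.96 at the 95% level is a standard table entry.

```python
def f_ratio(z_a, m_a, z_b, m_b, n):
    """F statistic for comparing model A (m_a parameters) with model B (m_b > m_a)."""
    return ((z_a - z_b) / (m_b - m_a)) / (z_b / (n - m_b))


# Hypothetical fits to n = 13 points
F = f_ratio(z_a=2.0, m_a=2, z_b=1.0, m_b=3, n=13)
F_table = 4.96  # tabulated F for (1, 10) degrees of freedom at the 95% level

print(F)             # 10.0
print(F > F_table)   # True: the extra parameter in model B is justified at 95%
```

If the ratio had been below the table value, the reduction in Z would not justify the extra parameter at that confidence level.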

Linear Regression Heat capacity of CO2.mws

Linear Regression Cd VP 2.mws
