
Page 1: Lecture Note #7 (Chap.11)

7-1

Lecture Note #7 (Chap.11)

CBE 702, Korea University

Prof. Dae Ryook Yang

System Modeling and Identification

Page 2: Lecture Note #7 (Chap.11)

7-2

Chap.11 Real-time Identification

• Real-time identification
  – Supervision and tracking of time-varying parameters for
    • Adaptive control, filtering, prediction
    • Signal processing
    • Detection, diagnosis, artificial neural networks, etc.
  – Identification methods based on a fixed set of measurements are not suitable
  – Only a few data points need to be stored
  – Drawbacks
    • Requires a priori knowledge of the model structure
    • Iterative solutions based on larger data sets may be difficult to organize

Page 3: Lecture Note #7 (Chap.11)

7-3

• Recursive estimation of a constant
  – Consider the following noisy observation of a constant parameter:
    $y_k = \varphi\theta + v_k$,  $E\{v_k\} = 0$,  $E\{v_i v_j\} = \sigma^2\delta_{ij}$  ($\varphi_k = 1,\ \forall k$)
  – The least-squares estimate is found as the sample average:
    $\hat\theta_k = \frac{1}{k}\sum_{i=1}^{k} y_i$
  – Recursive form (using $\hat\theta_{k-1} = \frac{1}{k-1}\sum_{i=1}^{k-1} y_i$):
    $\hat\theta_k = \hat\theta_{k-1} + \frac{1}{k}\big(y_k - \hat\theta_{k-1}\big)$
  – Variance estimate of the least-squares estimate:
    $p_k = \sigma^2\Big(\sum_{i=1}^{k}\varphi_i^T\varphi_i\Big)^{-1} \approx E\{(\hat\theta_k-\theta)(\hat\theta_k-\theta)^T\}$
    $p_k^{-1} = p_{k-1}^{-1} + \dfrac{1}{\sigma^2} \;\Rightarrow\; p_k = \dfrac{p_{k-1}}{1 + p_{k-1}/\sigma^2}$
  – Note that $p_k \to 0$ as $k \to \infty$.
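A minimal numerical sketch of this recursion (illustrative, not from the lecture; the noise level, seed, and function name are assumptions): it averages noisy observations of a constant recursively and tracks the shrinking variance estimate $p_k$.

```python
import numpy as np

def recursive_constant_estimate(y, sigma2, theta0=0.0):
    """Recursively estimate a constant from noisy observations y_k = theta + v_k."""
    theta, p, history = theta0, None, []
    for k, yk in enumerate(y, start=1):
        theta = theta + (yk - theta) / k   # theta_k = theta_{k-1} + (1/k)(y_k - theta_{k-1})
        p = sigma2 / k                     # equivalently p_k^{-1} = p_{k-1}^{-1} + 1/sigma^2
        history.append((theta, p))
    return theta, p, history

rng = np.random.default_rng(0)
theta_true, sigma2 = 2.5, 0.25
y = theta_true + np.sqrt(sigma2) * rng.standard_normal(500)
theta_hat, p_final, _ = recursive_constant_estimate(y, sigma2)
print(theta_hat, p_final)   # estimate near 2.5, and p_k -> 0 as k grows
```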

Page 4: Lecture Note #7 (Chap.11)

7-4

• Derivation of recursive least-squares identification
  – Consider as usual the regressor $\varphi_i$ and the observation $y_i$:
    $\Phi_k = [\varphi_1 \;\cdots\; \varphi_k]^T$,   $Y_k = [y_1 \;\cdots\; y_k]^T$
  – The least-squares criterion based on k samples is
    $V_k(\theta) = \frac{1}{2}(Y_k - \Phi_k\theta)^T(Y_k - \Phi_k\theta) = \frac{1}{2}\varepsilon_k(\theta)^T\varepsilon_k(\theta)$
  – The ordinary least-squares estimate:
    $\hat\theta_k = (\Phi_k^T\Phi_k)^{-1}\Phi_k^T Y_k = \Big(\sum_{i=1}^{k}\varphi_i\varphi_i^T\Big)^{-1}\sum_{i=1}^{k}\varphi_i y_i$
  – Introduce the matrix
    $P_k = \Big(\sum_{i=1}^{k}\varphi_i\varphi_i^T\Big)^{-1} = (\Phi_k^T\Phi_k)^{-1}$,  so that  $P_k^{-1} = P_{k-1}^{-1} + \varphi_k\varphi_k^T$
    Then
    $\hat\theta_k = P_k\Big(\sum_{i=1}^{k-1}\varphi_i y_i + \varphi_k y_k\Big) = P_k\big(P_{k-1}^{-1}\hat\theta_{k-1} + \varphi_k y_k\big) = \hat\theta_{k-1} + P_k\varphi_k\big(y_k - \varphi_k^T\hat\theta_{k-1}\big)$
  – Alternative form (avoiding inversion of matrices):
    $P_k = (\Phi_k^T\Phi_k)^{-1} = \big(\Phi_{k-1}^T\Phi_{k-1} + \varphi_k\varphi_k^T\big)^{-1} = \big(P_{k-1}^{-1} + \varphi_k\varphi_k^T\big)^{-1} = P_{k-1} - P_{k-1}\varphi_k\big(I + \varphi_k^T P_{k-1}\varphi_k\big)^{-1}\varphi_k^T P_{k-1}$
    cf) Matrix inversion lemma:  $(A + BC)^{-1} = A^{-1} - A^{-1}B\,(I + CA^{-1}B)^{-1}CA^{-1}$
  – Initialization: $P_0 = \alpha I$ with $\alpha$ large, so that $P_0^{-1} \approx 0$.

Page 5: Lecture Note #7 (Chap.11)

7-5

Recursive Least-Squares (RLS) Identification

• The recursive least-squares (RLS) identification algorithm:
  $\hat\theta_k = \hat\theta_{k-1} + P_k\varphi_k\varepsilon_k$
  $\varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}$
  $P_k = P_{k-1} - \dfrac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{1 + \varphi_k^T P_{k-1}\varphi_k}$,   $P_0$ given
  where $\hat\theta_k$ is the parameter estimate, $\varepsilon_k$ is the prediction error, and $P_k$ is the parameter covariance estimate except for the factor $\sigma^2$.

• Some properties of RLS estimation
  – Parameter accuracy and convergence: with $\tilde\theta_k = \hat\theta_k - \theta$, define
    $Q_k(\theta) = \frac{1}{2}(\hat\theta_k - \theta)^T P_k^{-1}(\hat\theta_k - \theta) = \frac{1}{2}\tilde\theta_k^T P_k^{-1}\tilde\theta_k$
  – Then
    $2\big(Q_k(\theta) - Q_{k-1}(\theta)\big) = \tilde\theta_k^T P_k^{-1}\tilde\theta_k - \tilde\theta_{k-1}^T P_{k-1}^{-1}\tilde\theta_{k-1}$
    $\qquad = (\varphi_k^T\tilde\theta_{k-1})^2 + 2\,\varphi_k^T\tilde\theta_{k-1}\,\varepsilon_k + \varepsilon_k^2\,\varphi_k^T P_k\varphi_k$
    $\qquad = \big(\varphi_k^T\tilde\theta_{k-1} + \varepsilon_k\big)^2 - \dfrac{\varepsilon_k^2}{1 + \varphi_k^T P_{k-1}\varphi_k}$
    using $\tilde\theta_k = \tilde\theta_{k-1} + P_k\varphi_k\varepsilon_k$,  $P_k^{-1} - P_{k-1}^{-1} = \varphi_k\varphi_k^T$,  adding and subtracting $\varepsilon_k^2$, and
    $\varphi_k^T P_k\varphi_k = \varphi_k^T\Big(P_{k-1} - \dfrac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{1 + \varphi_k^T P_{k-1}\varphi_k}\Big)\varphi_k = 1 - \dfrac{1}{1 + \varphi_k^T P_{k-1}\varphi_k}$
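A compact sketch of the RLS recursion above (illustrative, not from the lecture; the first-order ARX test model, noise level, and names such as rls_step are assumptions). Each step forms the prediction error $\varepsilon_k$ and updates $\hat\theta$ and $P$ with the rank-one formula, so no matrix inversion is needed.

```python
import numpy as np

def rls_step(theta, P, phi, y):
    """One RLS update: returns (theta_new, P_new, eps) for a scalar output y."""
    eps = y - phi @ theta                    # prediction error
    denom = 1.0 + phi @ P @ phi              # 1 + phi^T P phi
    K = P @ phi / denom                      # gain = P_k phi_k
    theta_new = theta + K * eps
    P_new = P - np.outer(P @ phi, P @ phi) / denom
    return theta_new, P_new, eps

# Identify y_k = -a*y_{k-1} + b*u_{k-1} + noise (an assumed ARX(1,1) test case)
rng = np.random.default_rng(1)
a, b = -0.8, 0.5
u = rng.standard_normal(400)
y = np.zeros(400)
for k in range(1, 400):
    y[k] = -a * y[k - 1] + b * u[k - 1] + 0.05 * rng.standard_normal()

theta = np.zeros(2)
P = 1000.0 * np.eye(2)                       # P0 = alpha*I with alpha large
for k in range(1, 400):
    phi = np.array([-y[k - 1], u[k - 1]])
    theta, P, _ = rls_step(theta, P, phi, y[k])
print(theta)                                  # approaches [a, b] = [-0.8, 0.5]
```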

Page 6: Lecture Note #7 (Chap.11)

7-6

– Under the linear model assumption, $y_k = \varphi_k^T\theta + v_k$, so that $\varepsilon_k = -\tilde\theta_{k-1}^T\varphi_k + v_k$ and
  $Q_k(\theta) - Q_{k-1}(\theta) = \frac{1}{2}v_k^2 - \frac{1}{2}\,\dfrac{\varepsilon_k^2}{1 + \varphi_k^T P_{k-1}\varphi_k}$
  • If $v_k = 0$ for all k, Q decreases in each recursion step.
  • If Q tends to zero, it implies that $\tilde\theta_k$ tends to zero, as the sequence of weighting matrices $P_k^{-1}$ is an increasing sequence of positive definite matrices with $P_k^{-1} \ge P_{k-1}^{-1}$ for all k > 0.

• Theorem 11.1
  – The errors of the estimated parameters and the prediction error for least-squares estimation have a bound determined by the noise magnitude according to
    $V_k(\hat\theta_k) + Q_k(\theta) = \frac{1}{2}\,\varepsilon_k^T(\hat\theta_k)\,\varepsilon_k(\hat\theta_k) + \frac{1}{2}\,\tilde\theta_k^T P_k^{-1}\tilde\theta_k = \frac{1}{2}\, v^T v$
  – It implies
    $0 \;\le\; \frac{1}{2}\,\tilde\theta_k^T\,\Phi_k^T\Phi_k\,\tilde\theta_k \;=\; \frac{1}{2}\, v^T v - \frac{1}{2}\,\varepsilon_k^T(\hat\theta_k)\,\varepsilon_k(\hat\theta_k) \;\le\; \frac{1}{2}\, v^T v$
  – The parameter convergence can be obtained for a stationary stochastic process $\{v_k\}$ if $\Phi_k^T\Phi_k > c\,k\,I_{p\times p}$ (c is a constant).
  – Thus, poor convergence is obtained in cases of large disturbances and a rank-deficient matrix $\Phi_k^T\Phi_k$.

Page 7: Lecture Note #7 (Chap.11)

7-7

• Properties of the Pk matrix
  – Pk is a positive definite and symmetric matrix (Pk = Pk^T > 0).
  – Pk → 0 as k → ∞.
  – The matrix Pk is asymptotically proportional to the parameter estimate covariance, provided that a correct model structure has been used.
  – It is often called the "covariance matrix."

• Comparison between RLS and offline LS identification
  – If the initial values P0 and θ̂0 can be chosen to be compatible with the results of the ordinary least-squares method, the result obtained from RLS is the same as that of offline least-squares identification.
  – Thus, calculate the initial values for RLS from the ordinary LS method using some block of initial data.

Page 8: Lecture Note #7 (Chap.11)

7-8

• Modification for time-varying parameters
  – The RLS gives equal weighting to old data and new data.
  – If the parameters are time-varying, pay less attention to old data.
  – Forgetting factor (λ):
    $J_k(\theta) = \frac{1}{2}\sum_{i=1}^{k}\lambda^{k-i}\big(y_i - \varphi_i^T\theta\big)^2 \qquad (0 < \lambda \le 1)$
  – Modified RLS (see the sketch after this list):
    $\hat\theta_k = \hat\theta_{k-1} + P_k\varphi_k\varepsilon_k$
    $\varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}$
    $P_k = \dfrac{1}{\lambda}\Big(P_{k-1} - \dfrac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{\lambda + \varphi_k^T P_{k-1}\varphi_k}\Big)$,   $P_0$ given
  – Disadvantages
    • The noise sensitivity becomes more prominent as λ decreases.
    • The Pk matrix may increase as k grows if the input is such that the magnitude of $P_{k-1}\varphi_k$ is small (P-matrix explosion or covariance matrix explosion).
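A sketch of how the forgetting factor changes the P update relative to the plain RLS step shown earlier (illustrative; the function name and default λ are assumptions). With λ = 1 this reduces to ordinary RLS; small λ inflates P and hence the noise sensitivity.

```python
import numpy as np

def rls_ff_step(theta, P, phi, y, lam=0.98):
    """RLS update with forgetting factor lam (0 < lam <= 1)."""
    eps = y - phi @ theta
    denom = lam + phi @ P @ phi                               # lam + phi^T P phi
    K = P @ phi / denom
    theta_new = theta + K * eps
    P_new = (P - np.outer(P @ phi, P @ phi) / denom) / lam    # P_k = (1/lam)(...)
    return theta_new, P_new, eps
```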

Page 9: Lecture Note #7 (Chap.11)

7-9

• Choice of forgetting factor
  – Trade-off between the required ability to track a time-varying parameter (i.e., a small value of λ) and the noise sensitivity allowed
  – A value of λ close to 1: less sensitive to disturbances, but slow tracking of rapid variations in the parameters
  – Default choice: 0.97 ≤ λ ≤ 0.995
  – Rough estimate of the number of data points in memory (time constant): 1/(1−λ)

• Example 11.3: Choice of forgetting factor
  – Noisy observation of a constant parameter:
    $y_k = \varphi\theta + v_k$,  $E\{v_k\} = 0$,  $E\{v_i v_j\} = \sigma^2\delta_{ij}$  ($\varphi_k = 1,\ \forall k$)
  – RLS with forgetting factor, with $\varphi_k = 1$:
    $\hat\theta_k = \hat\theta_{k-1} + P_k\varphi_k\varepsilon_k$,  $\varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}$,  $P_k = \dfrac{1}{\lambda}\Big(P_{k-1} - \dfrac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{\lambda + \varphi_k^T P_{k-1}\varphi_k}\Big)$
  – (Simulation figure: the parameter steps from $\theta = 1$ to $2$; the curves for $\lambda = 0.99,\ 0.98,\ 0.95$ show that a smaller λ gives faster tracking.)

Page 10: Lecture Note #7 (Chap.11)

7-10

• Delta model
  – The disadvantage of the z-transform model is that its parameters do not converge to the continuous-time (Laplace-transform) parameters, from which they were derived, as the sampling period decreases.
  – Very small sampling periods yield very small numbers in the transfer function numerator.
  – The poles of the transfer function approach the unstable domain as the sampling period decreases.
  – These disadvantages can be avoided by introducing a more suitable discrete model.
  – δ-operator:  $\delta \equiv (z - 1)/h$
    $x_{k+1} = \Phi x_k + \Gamma u_k,\;\; y_k = Cx_k \;\;\Rightarrow\;\; \delta x_k = \Phi' x_k + \Gamma' u_k = \tfrac{1}{h}(\Phi - I)x_k + \tfrac{1}{h}\Gamma u_k,\;\; y_k = Cx_k$
  – This formulation makes the state-space realization and the corresponding system identification less error-prone, due to the favorable numerical scaling properties of the Φ′ and Γ′ matrices as compared to the ordinary z-transform based algebra.
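A small sketch (illustrative; the sampled-data matrices and sampling period are assumptions) of forming the δ-model matrices Φ′ = (Φ − I)/h and Γ′ = Γ/h, and checking that the δ-form update reproduces the ordinary shift-form update:

```python
import numpy as np

h = 0.01                                             # assumed sampling period
Phi = np.array([[1.0, h], [0.0, 1.0 - 0.1 * h]])     # assumed discrete-time matrices
Gam = np.array([[0.0], [h]])

Phi_d = (Phi - np.eye(2)) / h                        # delta-model matrices
Gam_d = Gam / h

x = np.array([[1.0], [0.0]])
u = np.array([[1.0]])

x_next_shift = Phi @ x + Gam @ u                     # x_{k+1} = Phi x_k + Gam u_k
x_next_delta = x + h * (Phi_d @ x + Gam_d @ u)       # x_{k+1} = x_k + h * (delta x_k)
print(np.allclose(x_next_shift, x_next_delta))       # True: the two forms agree
```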

Page 11: Lecture Note #7 (Chap.11)

7-11

• Kalman filter interpretation
  – Assume that the time-varying system parameter θ may be described by the state-space equation
    $\theta_{k+1} = \theta_k + v_k$,   $E\{v_i\} = 0$,  $E\{v_i v_j^T\} = R_1\delta_{ij}$,  $\forall i, j$
    $y_k = \varphi_k^T\theta_k + e_k$,   $E\{e_i\} = 0$,  $E\{e_i e_j^T\} = R_2\delta_{ij}$,  $\forall i, j$
  – Kalman filter for estimation of $\theta_k$:
    $\hat\theta_k = \hat\theta_{k-1} + K_k\varepsilon_k$
    $K_k = P_{k-1}\varphi_k/\big(R_2 + \varphi_k^T P_{k-1}\varphi_k\big)$
    $\varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}$
    $P_k = P_{k-1} - \dfrac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{R_2 + \varphi_k^T P_{k-1}\varphi_k} + R_1$
    (cf. RLS with forgetting factor:  $\hat\theta_k = \hat\theta_{k-1} + P_k\varphi_k\varepsilon_k$,  $\varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}$,  $P_k = \frac{1}{\lambda}\big(P_{k-1} - \frac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{\lambda + \varphi_k^T P_{k-1}\varphi_k}\big)$,  $P_0$ given)
  – Differences from RLS
    • The dynamics of $P_k$ change from exponential growth to a linear growth rate for $\varphi_k = 0$, due to $R_1$.
    • $P_k$ of the Kalman filter does not approach zero as $k \to \infty$ for a nonzero sequence $\{\varphi_k\}$.
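A sketch of the Kalman-filter form of the parameter update above (illustrative; the random-walk drift, noise levels, and names are assumptions). Unlike RLS, $P_k$ does not shrink to zero because $R_1$ is added at every step, which is what allows continued tracking.

```python
import numpy as np

def kf_param_step(theta, P, phi, y, R1, R2):
    """Kalman-filter update for theta_{k+1} = theta_k + v_k, y_k = phi_k^T theta_k + e_k."""
    eps = y - phi @ theta
    denom = R2 + phi @ P @ phi
    K = P @ phi / denom
    theta_new = theta + K * eps
    P_new = P - np.outer(P @ phi, P @ phi) / denom + R1   # R1 keeps P from shrinking to zero
    return theta_new, P_new, eps

# Track a slowly drifting gain theta_k in y_k = theta_k * u_k + e_k (assumed example)
rng = np.random.default_rng(2)
N = 500
theta_true = 1.0 + np.cumsum(0.01 * rng.standard_normal(N))   # random-walk parameter
u = rng.standard_normal(N)
y = theta_true * u + 0.1 * rng.standard_normal(N)

theta, P = np.zeros(1), np.eye(1)
R1, R2 = 1e-4 * np.eye(1), 0.01
for k in range(N):
    phi = np.array([u[k]])
    theta, P, _ = kf_param_step(theta, P, phi, y[k], R1, R2)
print(theta, theta_true[-1])   # the estimate follows the drifting parameter
```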

Page 12: Lecture Note #7 (Chap.11)

7-12

• Other forms of the RLS algorithm
  – Basic version (forgetting factor $\lambda_k$):
    $\hat\theta_k = \hat\theta_{k-1} + K_k\big[y_k - \varphi_k^T\hat\theta_{k-1}\big]$
    $K_k = P_k\varphi_k = \dfrac{P_{k-1}\varphi_k}{\lambda_k + \varphi_k^T P_{k-1}\varphi_k}$
    $P_k = \dfrac{1}{\lambda_k}\Big(P_{k-1} - \dfrac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{\lambda_k + \varphi_k^T P_{k-1}\varphi_k}\Big)$
  – Normalized gain version (with $P_k = \gamma_k R_k^{-1}$):
    $\hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\varphi_k\big[y_k - \varphi_k^T\hat\theta_{k-1}\big]$
    $R_k = R_{k-1} + \gamma_k\big(\varphi_k\varphi_k^T - R_{k-1}\big)$
    $\gamma_k = \dfrac{\gamma_{k-1}}{\lambda_k + \gamma_{k-1}}$
  – Multivariable case:
    $\hat\theta_k = \arg\min_{\theta}\ \dfrac{1}{2}\sum_{i=1}^{k}\Big(\prod_{j=i+1}^{k}\lambda_j\Big)\big[y_i - \varphi_i^T\theta\big]^T\Lambda^{-1}\big[y_i - \varphi_i^T\theta\big]$
    (if $\lambda_j = \lambda$ for all j, the weight reduces to $\lambda^{k-i}$)
    $\hat\theta_k = \hat\theta_{k-1} + K_k\big[y_k - \varphi_k^T\hat\theta_{k-1}\big]$
    $K_k = P_{k-1}\varphi_k\big(\lambda_k\Lambda_{k-1} + \varphi_k^T P_{k-1}\varphi_k\big)^{-1}$   (prediction gain update)
    $P_k = \dfrac{1}{\lambda_k}\Big(P_{k-1} - P_{k-1}\varphi_k\big(\lambda_k\Lambda_{k-1} + \varphi_k^T P_{k-1}\varphi_k\big)^{-1}\varphi_k^T P_{k-1}\Big)$   (parameter error covariance update)
    $\Lambda_k = \Lambda_{k-1} + \gamma_k\big(\varepsilon_k\varepsilon_k^T - \Lambda_{k-1}\big)$   ($\Lambda_k$: output error covariance update)

Page 13: Lecture Note #7 (Chap.11)

7-13

Recursive Instrumental Variable (RIV) Method

• The ordinary IV solution
  $\hat\theta_k^{IV} = \big(Z_k^T\Phi_k\big)^{-1}Z_k^T Y_k = \Big(\sum_{i=1}^{k} z_i\varphi_i^T\Big)^{-1}\sum_{i=1}^{k} z_i y_i$

• RIV
  $\hat\theta_k = \hat\theta_{k-1} + K_k\varepsilon_k$
  $K_k = P_{k-1}z_k/\big(1 + \varphi_k^T P_{k-1}z_k\big)$
  $\varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}$
  $P_k = P_{k-1} - \dfrac{P_{k-1}z_k\varphi_k^T P_{k-1}}{1 + \varphi_k^T P_{k-1}z_k}$
  – Standard choice of instrumental variable:
    $z_k^T = \big(-x_{k-1}\ \cdots\ -x_{k-n_A}\;\; u_{k-1}\ \cdots\ u_{k-n_B}\big)$
    • The variable $x_k$ may be, for instance, the estimated output.
  – RIV has some stability problems associated with the choice of IV and the updating of the $P_k$ matrix (see the sketch below).
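A sketch of a single RIV update (illustrative; the function name is an assumption, and in practice the instrument $z_k$ would be built from a simulated auxiliary-model output, as the slide notes):

```python
import numpy as np

def riv_step(theta, P, phi, z, y):
    """One recursive IV update; z is the instrument vector (same size as phi)."""
    eps = y - phi @ theta
    denom = 1.0 + phi @ P @ z                      # 1 + phi^T P z
    K = P @ z / denom
    theta_new = theta + K * eps
    P_new = P - np.outer(P @ z, phi @ P) / denom   # P_k = P_{k-1} - P_{k-1} z phi^T P_{k-1} / (...)
    return theta_new, P_new, eps
```

Note that, unlike RLS, the update is not symmetric in $z_k$ and $\varphi_k$, which is one source of the stability issues mentioned above.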

Page 14: Lecture Note #7 (Chap.11)

7-14

Recursive Prediction Error Methods (RPEM)

• RPEM
  – Consider a weighted quadratic prediction error criterion
    $J_k(\theta) = \dfrac{1}{2}\gamma_k\sum_{i=1}^{k}\lambda^{k-i}\big(y_i - \varphi_i^T\theta\big)^2$,   with $\gamma_k$ chosen so that $\gamma_k\sum_{i=1}^{k}\lambda^{k-i} = 1$
  – General RPEM search algorithm: the gradient of the criterion is
    $J_k'(\theta) = -\gamma_k\sum_{i=1}^{k}\lambda^{k-i}\psi_i\,\varepsilon_i(\theta)$,   $\psi_i \equiv -\,\partial\varepsilon_i(\theta)/\partial\theta$
    $J_k'(\theta) = (\lambda\gamma_k/\gamma_{k-1})\,J_{k-1}'(\theta) - \gamma_k\psi_k\varepsilon_k(\theta) = J_{k-1}'(\theta) + \gamma_k\big[-\psi_k\varepsilon_k(\theta) - J_{k-1}'(\theta)\big]$
    $J_k'(\hat\theta_{k-1}) = J_{k-1}'(\hat\theta_{k-1}) + \gamma_k\big[-\psi_k(\hat\theta_{k-1})\varepsilon_k(\hat\theta_{k-1}) - J_{k-1}'(\hat\theta_{k-1})\big] \approx -\gamma_k\psi_k\varepsilon_k$,  since $J_{k-1}'(\hat\theta_{k-1}) \approx 0$  ($\hat\theta_{k-1}$: optimal for $J_{k-1}$)
  – A Gauss-Newton-type search step then gives the recursive algorithm
    $\hat\theta_k = \hat\theta_{k-1} - R_k^{-1}J_k'(\hat\theta_{k-1}) = \hat\theta_{k-1} + \gamma_k R_k^{-1}\psi_k\varepsilon_k$
    $R_k = R_{k-1} + \gamma_k\big(\psi_k\psi_k^T - R_{k-1}\big)$

Page 15: Lecture Note #7 (Chap.11)

7-15

• Stochastic gradient methods
  – A family of RPEM
  – Also called stochastic approximation or least mean squares (LMS)
  – Uses a steepest-descent step to update the parameters; for a linear model,
    $\psi_k \equiv -\,\partial\varepsilon_k/\partial\theta = -\,\partial\big(y_k - \varphi_k^T\theta\big)/\partial\theta = \varphi_k$
  – The algorithm (time-varying, regressor-dependent gain version):
    $\hat\theta_k = \hat\theta_{k-1} + \gamma_k\varphi_k\varepsilon_k$
    $\varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}$
    $\gamma_k = Q/r_k \qquad (Q > 0)$
    $r_k = r_{k-1} + \varphi_k^T Q\,\varphi_k$
    • Rapid computation, as there is no Pk matrix to evaluate
    • Good detection of time-varying parameters
    • Slow convergence and noise sensitivity
  – Modification for time-varying parameters (see the sketch below):
    $r_k \equiv \lambda r_{k-1} + \varphi_k^T Q\,\varphi_k \qquad (0 \le \lambda \le 1)$
    • Keeping the factor $r_k$ at a lower magnitude
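A minimal LMS sketch matching the stochastic-gradient update above (illustrative, not from the lecture; Q is taken here as a scalar weight, and the ARX test signal and names are assumptions):

```python
import numpy as np

def lms_step(theta, r, phi, y, Q=1.0, lam=1.0):
    """Stochastic-gradient (LMS) update with normalizing factor r_k."""
    eps = y - phi @ theta
    r_new = lam * r + Q * (phi @ phi)          # r_k = lam*r_{k-1} + phi^T Q phi
    theta_new = theta + (Q / r_new) * phi * eps
    return theta_new, r_new, eps

rng = np.random.default_rng(3)
u = rng.standard_normal(2000)
y = np.zeros(2000)
for k in range(1, 2000):
    y[k] = 0.7 * y[k - 1] + 0.3 * u[k - 1] + 0.05 * rng.standard_normal()

theta, r = np.zeros(2), 1.0
for k in range(1, 2000):
    phi = np.array([y[k - 1], u[k - 1]])
    theta, r, _ = lms_step(theta, r, phi, y[k])
print(theta)   # slowly approaches [0.7, 0.3]; convergence is slower than RLS
```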

Page 16: Lecture Note #7 (Chap.11)

7-16

• RPEM for the multivariable case
  $\hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\psi_k\Lambda_{k-1}^{-1}\varepsilon_k$
  $R_k = R_{k-1} + \gamma_k\big(\psi_k\Lambda_{k-1}^{-1}\psi_k^T - R_{k-1}\big)$
  $\Lambda_k = \Lambda_{k-1} + \gamma_k\big(\varepsilon_k\varepsilon_k^T - \Lambda_{k-1}\big)$

• Projection of parameters into the parameter domain $D_M$
  $\hat\theta_k' = \hat\theta_{k-1} + K_k\big[y_k - \varphi_k^T\hat\theta_{k-1}\big]$
  $\hat\theta_k = \begin{cases}\hat\theta_k' & \text{if } \hat\theta_k' \in D_M\\ \hat\theta_{k-1} & \text{if } \hat\theta_k' \notin D_M\end{cases}$

Page 17: Lecture Note #7 (Chap.11)

7-17

Recursive Pseudolinear Regression (RPLR)

• Recursive pseudolinear regression (RPLR)
  – Also called recursive ML estimation or the extended LS method
  – The regression model:
    $y_k = \varphi_k^T\theta + v_k$,   $\theta^T = \big(a_1 \cdots a_{n_A}\;\; b_1 \cdots b_{n_B}\;\; c_1 \cdots c_{n_C}\big)$
  – The recursive algorithm (see the sketch below):
    $\hat\theta_k = \hat\theta_{k-1} + K_k\varepsilon_k$
    $K_k = P_{k-1}\varphi_k/\big(1 + \varphi_k^T P_{k-1}\varphi_k\big)$
    $\varepsilon_k = y_k - \varphi_k^T\hat\theta_{k-1}$
    $P_k = P_{k-1} - \dfrac{P_{k-1}\varphi_k\varphi_k^T P_{k-1}}{1 + \varphi_k^T P_{k-1}\varphi_k}$
  – The regression vector (with $\varepsilon_k$ serving as the estimate of $v_k$):
    $\varphi_k^T = \big(-y_{k-1} \cdots -y_{k-n_A}\;\; u_{k-1} \cdots u_{k-n_B}\;\; \varepsilon_{k-1} \cdots \varepsilon_{k-n_C}\big)$
  – The algorithm may be modified to iterate for the best possible ε.
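A sketch of RPLR (extended least squares) for a first-order ARMAX-type model (illustrative; model orders, noise level, and the function name are assumptions). Past prediction errors stand in for the unmeasured noise terms in the regressor.

```python
import numpy as np

def rplr_armax(y, u, alpha=1000.0):
    """Extended LS for y_k + a1*y_{k-1} = b1*u_{k-1} + e_k + c1*e_{k-1} (first-order ARMAX)."""
    theta, P = np.zeros(3), alpha * np.eye(3)
    eps_hist = np.zeros(len(y))                    # past prediction errors approximate e_k
    for k in range(1, len(y)):
        phi = np.array([-y[k - 1], u[k - 1], eps_hist[k - 1]])
        eps = y[k] - phi @ theta
        denom = 1.0 + phi @ P @ phi
        theta = theta + (P @ phi) * eps / denom
        P = P - np.outer(P @ phi, P @ phi) / denom
        eps_hist[k] = eps
    return theta

rng = np.random.default_rng(4)
N = 3000
e = 0.1 * rng.standard_normal(N)
u, y = rng.standard_normal(N), np.zeros(N)
for k in range(1, N):
    y[k] = 0.6 * y[k - 1] + 0.5 * u[k - 1] + e[k] + 0.4 * e[k - 1]
print(rplr_armax(y, u))   # roughly [-0.6, 0.5, 0.4] = [a1, b1, c1]
```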

Page 18: Lecture Note #7 (Chap.11)

7-18

Application to Models

• RPEM applied to the state-space innovation model
  – Predictor:
    $\hat x_{k+1}(\theta) = F(\theta)\hat x_k(\theta) + G(\theta)u_k + K(\theta)v_k$
    $\hat y_k(\theta) = H(\theta)\hat x_k(\theta)$,    innovation:  $v_k = y_k - \hat y_k(\theta)$
  – Algorithm:
    $\varepsilon_k = y_k - \hat y_k$
    $\Lambda_k = \Lambda_{k-1} + \gamma_k\big(\varepsilon_k\varepsilon_k^T - \Lambda_{k-1}\big)$
    $R_k = R_{k-1} + \gamma_k\big(\psi_k\Lambda_k^{-1}\psi_k^T - R_{k-1}\big)$
    $\hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\psi_k\Lambda_k^{-1}\varepsilon_k$
    $\hat x_{k+1} = F_k\hat x_k + G_k u_k + K_k\varepsilon_k$
    $\hat y_{k+1} = H_k\hat x_{k+1}$
    $W_{k+1} = \big(F_k - K_kH_k\big)W_k + M_k - K_kD_k$
    $\psi_{k+1}^T = H_kW_{k+1} + D_k$
    where  $F_k = F(\hat\theta_k)$,  $G_k = G(\hat\theta_k)$,  $H_k = H(\hat\theta_k)$,  $K_k = K(\hat\theta_k)$,
    $\psi_k^T(\theta) = \dfrac{d\hat y_k(\theta)}{d\theta}$,   $W_k(\theta) = \dfrac{d\hat x_k(\theta)}{d\theta}$,   $D_k(\theta) = \dfrac{\partial\big[H(\theta)\hat x_k\big]}{\partial\theta}$,   $M_k(\theta) = \dfrac{\partial\big[F(\theta)\hat x_k + G(\theta)u_k + K(\theta)\varepsilon_k\big]}{\partial\theta}$

Page 19: Lecture Note #7 (Chap.11)

7-19

• RPEM applied to general input-output models
  – System:
    $A(q^{-1})y_k = \dfrac{B(q^{-1})}{F(q^{-1})}u_k + \dfrac{C(q^{-1})}{D(q^{-1})}e_k$
    $A(q^{-1}) = 1 + a_1q^{-1} + \cdots + a_{n_a}q^{-n_a}$
    $B(q^{-1}) = b_1q^{-1} + \cdots + b_{n_b}q^{-n_b}$
    $F(q^{-1}) = 1 + f_1q^{-1} + \cdots + f_{n_f}q^{-n_f}$
    $C(q^{-1}) = 1 + c_1q^{-1} + \cdots + c_{n_c}q^{-n_c}$
    $D(q^{-1}) = 1 + d_1q^{-1} + \cdots + d_{n_d}q^{-n_d}$
  – Predictor:
    $\hat y_k(\theta) = \Big[1 - \dfrac{D(q^{-1})A(q^{-1})}{C(q^{-1})}\Big]y_k + \dfrac{D(q^{-1})B(q^{-1})}{C(q^{-1})F(q^{-1})}u_k$
  – Error definitions:
    $\varepsilon_k(\theta) = y_k - \hat y_k(\theta) = \dfrac{D(q^{-1})}{C(q^{-1})}\Big[A(q^{-1})y_k - \dfrac{B(q^{-1})}{F(q^{-1})}u_k\Big] = \dfrac{D(q^{-1})}{C(q^{-1})}v_k(\theta)$
    $w_k(\theta) = \dfrac{B(q^{-1})}{F(q^{-1})}u_k$,    $v_k(\theta) = A(q^{-1})y_k - w_k(\theta)$
  – Parameter vector:
    $\theta = \big(a_1 \cdots a_{n_a}\;\; b_1 \cdots b_{n_b}\;\; f_1 \cdots f_{n_f}\;\; c_1 \cdots c_{n_c}\;\; d_1 \cdots d_{n_d}\big)^T$
  – Regressor:
    $\varphi_k^T(\theta) = \big(-y_{k-1} \cdots -y_{k-n_a}\;\; u_{k-1} \cdots u_{k-n_b}\;\; -w_{k-1} \cdots -w_{k-n_f}\;\; \varepsilon_{k-1} \cdots \varepsilon_{k-n_c}\;\; -v_{k-1} \cdots -v_{k-n_d}\big)$

Page 20: Lecture Note #7 (Chap.11)

7-20

– Error calculations
  $w_k(\theta) = b_1u_{k-1} + \cdots + b_{n_b}u_{k-n_b} - f_1w_{k-1}(\theta) - \cdots - f_{n_f}w_{k-n_f}(\theta)$
  $v_k(\theta) = y_k + a_1y_{k-1} + \cdots + a_{n_a}y_{k-n_a} - w_k(\theta)$
  $\varepsilon_k(\theta) = v_k(\theta) + d_1v_{k-1}(\theta) + \cdots + d_{n_d}v_{k-n_d}(\theta) - c_1\varepsilon_{k-1}(\theta) - \cdots - c_{n_c}\varepsilon_{k-n_c}(\theta)$

– Expression for the prediction error
  $\varepsilon_k(\theta) = y_k + a_1y_{k-1} + \cdots + a_{n_a}y_{k-n_a} - b_1u_{k-1} - \cdots - b_{n_b}u_{k-n_b} + f_1w_{k-1} + \cdots + f_{n_f}w_{k-n_f} - c_1\varepsilon_{k-1} - \cdots - c_{n_c}\varepsilon_{k-n_c} + d_1v_{k-1} + \cdots + d_{n_d}v_{k-n_d}$
  $\qquad\;\; = y_k - \hat y_k(\theta)$,   where  $\hat y_k(\theta) = \varphi_k^T(\theta)\,\theta$
  Equivalently,  $C(q^{-1})F(q^{-1})\hat y_k(\theta) = F(q^{-1})\big[C(q^{-1}) - D(q^{-1})A(q^{-1})\big]y_k + D(q^{-1})B(q^{-1})u_k$

– Gradient expressions,  $\psi_k(\theta) = \dfrac{\partial\hat y_k(\theta)}{\partial\theta}$:
  $\dfrac{\partial\hat y_k(\theta)}{\partial a_i} = -\dfrac{\partial\varepsilon_k(\theta)}{\partial a_i} = -\dfrac{D(q^{-1})}{C(q^{-1})}\,y_{k-i}$
  $\dfrac{\partial\hat y_k(\theta)}{\partial b_i} = \dfrac{D(q^{-1})}{C(q^{-1})F(q^{-1})}\,u_{k-i}$
  $\dfrac{\partial\hat y_k(\theta)}{\partial f_i} = -\dfrac{D(q^{-1})}{C(q^{-1})F(q^{-1})}\,w_{k-i}(\theta)$
  $\dfrac{\partial\hat y_k(\theta)}{\partial c_i} = \dfrac{1}{C(q^{-1})}\,\varepsilon_{k-i}(\theta)$
  $\dfrac{\partial\hat y_k(\theta)}{\partial d_i} = -\dfrac{1}{C(q^{-1})}\,v_{k-i}(\theta)$

Page 21: Lecture Note #7 (Chap.11)

7-21

– Algorithm (bars denote the filtered signals used to build the gradient ψ; $g_i$ are the coefficients of $C(q^{-1})F(q^{-1})$):
  $w_k = b_1u_{k-1} + \cdots + b_{n_b}u_{k-n_b} - f_1w_{k-1} - \cdots - f_{n_f}w_{k-n_f}$
  $v_k = y_k + a_1y_{k-1} + \cdots + a_{n_a}y_{k-n_a} - w_k$
  $\varphi_k^T = \big(-y_{k-1} \cdots -y_{k-n_a}\;\; u_{k-1} \cdots u_{k-n_b}\;\; -w_{k-1} \cdots -w_{k-n_f}\;\; \varepsilon_{k-1} \cdots \varepsilon_{k-n_c}\;\; -v_{k-1} \cdots -v_{k-n_d}\big)$
  $\hat y_k = \varphi_k^T\hat\theta_{k-1}$
  $\varepsilon_k = y_k - \hat y_k = v_k + d_1v_{k-1} + \cdots + d_{n_d}v_{k-n_d} - c_1\varepsilon_{k-1} - \cdots - c_{n_c}\varepsilon_{k-n_c}$
  $\bar y_k = y_k + d_1y_{k-1} + \cdots + d_{n_d}y_{k-n_d} - c_1\bar y_{k-1} - \cdots - c_{n_c}\bar y_{k-n_c}$
  $\bar u_k = u_k + d_1u_{k-1} + \cdots + d_{n_d}u_{k-n_d} - g_1\bar u_{k-1} - \cdots - g_{n_g}\bar u_{k-n_g}$
  $\bar w_k = w_k + d_1w_{k-1} + \cdots + d_{n_d}w_{k-n_d} - g_1\bar w_{k-1} - \cdots - g_{n_g}\bar w_{k-n_g}$
  $\bar\varepsilon_k = \varepsilon_k - c_1\bar\varepsilon_{k-1} - \cdots - c_{n_c}\bar\varepsilon_{k-n_c}$
  $\bar v_k = v_k - c_1\bar v_{k-1} - \cdots - c_{n_c}\bar v_{k-n_c}$
  $\psi_k^T = \big(-\bar y_{k-1} \cdots -\bar y_{k-n_a}\;\; \bar u_{k-1} \cdots \bar u_{k-n_b}\;\; -\bar w_{k-1} \cdots -\bar w_{k-n_f}\;\; \bar\varepsilon_{k-1} \cdots \bar\varepsilon_{k-n_c}\;\; -\bar v_{k-1} \cdots -\bar v_{k-n_d}\big)$
  $R_k = R_{k-1} + \gamma_k\big(\psi_k\psi_k^T - R_{k-1}\big)$
  $\hat\theta_k = \hat\theta_{k-1} + \gamma_k R_k^{-1}\psi_k\varepsilon_k$

Page 22: Lecture Note #7 (Chap.11)

7-22

Extended Kalman Filter

• Kalman filter for a nonlinear state-space model
  – System:
    $x_{k+1} = F(x_k, \theta) + G(\theta)u_k + w_k$   $\big(E\{w_iw_j^T\} = R_1\delta_{ij}\big)$
    $y_k = H(x_k, \theta) + e_k$   $\big(E\{e_ie_j^T\} = R_2\delta_{ij},\ \ E\{w_ie_j^T\} = R_{12}\delta_{ij}\big)$
  – With the extended state vector:
    $X_k = \begin{pmatrix} x_k \\ \theta_k \end{pmatrix} \;\Rightarrow\; X_{k+1} = \bar F(X_k) + \bar G(\theta)u_k + \bar w_k,\qquad y_k = \bar H(X_k) + e_k$
    where  $\bar F(X_k) = \begin{pmatrix} F(x_k, \theta_k) \\ \theta_k \end{pmatrix}$,  $\bar G(\theta) = \begin{pmatrix} G(\theta) \\ 0 \end{pmatrix}$,  $\bar w_k = \begin{pmatrix} w_k \\ 0 \end{pmatrix}$,  $\bar H(X_k) = H(x_k, \theta_k)$
  – Linearization:
    $F_k = \dfrac{d\bar F(X, u_k)}{dX}\bigg|_{X = \hat X_k}$,    $H_k = \dfrac{d\bar H(X, u_k)}{dX}\bigg|_{X = \hat X_k}$

Page 23: Lecture Note #7 (Chap.11)

7-23

• Algorithm
  – Given $\hat\theta_0$, $\hat x_0$, $P_0$ (start from k = 0):
    $K_k = \big[F_kP_kH_k^T + R_{12}\big]\big[H_kP_kH_k^T + R_2\big]^{-1}$
    $\hat X_{k+1} = \bar F(\hat X_k) + \bar G(\hat\theta_k)u_k + K_k\big[y_k - \bar H(\hat X_k)\big]$
    $P_{k+1} = F_kP_kF_k^T + R_1 - K_k\big[H_kP_kH_k^T + R_2\big]K_k^T$
    with the linearizations re-evaluated at each step:
    $F_k = \dfrac{d\bar F(X, u_k)}{dX}\bigg|_{X = \hat X_k}$,    $H_k = \dfrac{d\bar H(X, u_k)}{dX}\bigg|_{X = \hat X_k}$
  – To avoid the calculation of large matrices, partition the matrices.
  – Use the latest available measurements to update the parameter estimates (measurement update followed by time update):
    $K_k = \big[F_kP_kH_k^T + R_{12}\big]\big[H_kP_kH_k^T + R_2\big]^{-1}$
    $\hat X_{k|k} = \hat X_{k|k-1} + K_k\big[y_k - \bar H(\hat X_{k|k-1})\big]$
    $\hat X_{k+1|k} = \bar F(\hat X_{k|k}) + \bar G(\hat\theta_k)u_k$
    $P_{k+1} = F_kP_kF_k^T + R_1 - K_k\big[H_kP_kH_k^T + R_2\big]K_k^T$

Page 24: Lecture Note #7 (Chap.11)

7-24

Subspace Methods for Estimating State-Space Models

– Estimation of the system matrices A, B, C, and D offline:
  $x_{k+1} = Ax_k + Bu_k + w_k$,  $x \in R^n$, $u \in R^m$
  $y_k = Cx_k + Du_k + v_k$,  $y \in R^p$
– Assuming a minimal realization:
  $y_k = C(qI - A)^{-1}Bu_k + Du_k + v_k$,  or  $y_k = C(qI - A)^{-1}x_0\delta_k + C(qI - A)^{-1}Bu_k + Du_k + v_k$

• If the estimates of A and C are known, estimates of B and D can be obtained using the linear least-squares method.
  • The estimates of B and D will converge to the true values if A and C are exactly known or at least consistent.
• If the (extended) observability matrix ($O_r$) is known, then A and C can be estimated (r > n):
  $O_r = \begin{pmatrix} C \\ CA \\ \vdots \\ CA^{r-1} \end{pmatrix}$

Page 25: Lecture Note #7 (Chap.11)

7-25

– For a linear state transformation, $\bar x_k = Tx_k \;\Rightarrow\; \bar O_r = O_rT^{-1}$.
– For a known system order (n* = n):
  $G = O_r$ is a $(pr \times n^*)$ matrix  $\Rightarrow$  $C = O_r(1\!:\!p,\ 1\!:\!n)$  and  $O_r(p\!+\!1\!:\!pr,\ 1\!:\!n) = O_r(1\!:\!p(r\!-\!1),\ 1\!:\!n)\,A$
– For an unknown system order (n* > n):
  $G$ is a $(pr \times n^*)$ matrix  $\Rightarrow$  SVD:  $G = USV^T$
  • Partition the matrices depending on the singular values and neglect the portion for the smaller singular values:
    $G = USV^T \;\Rightarrow\; \hat G = U_1S_1V_1^T = O_rT$
  • The estimate of the observability matrix can be
    $\hat O_r = U_1S_1$  or  $\hat O_r = U_1$, etc.  $\Rightarrow\; \hat O_r = U_1R$  (R is invertible)
  • Obtain estimates of A and C from the estimated observability matrix.
– Noisy estimate of the extended observability matrix:
  $G = USV^T = U_1S_1V_1^T + (\text{other terms}) = O_rT + E_N$
  • If $O_r$ explains the system well, then $E_N$ stems from noise.
  • If $E_N$ is small, the estimate is consistent.

Page 26: Lecture Note #7 (Chap.11)

7-26

• Using weighting matrices in the SVD
  – For flexibility, a pretreatment before the SVD can be applied:
    $\bar G = W_1GW_2 = USV^T \approx U_1S_1V_1^T$
  – Then the estimate of the extended observability matrix becomes
    $\hat O_r = W_1^{-1}U_1R$
  – When noise is present, W1 has an important influence on the space spanned by U1, and hence on the quality of the estimates of A and C.

• Estimating the extended observability matrix
  – The basic expression:
    $y_{k+i} = Cx_{k+i} + Du_{k+i} + v_{k+i}$
    $\quad\;\; = CAx_{k+i-1} + CBu_{k+i-1} + Cw_{k+i-1} + Du_{k+i} + v_{k+i}$
    $\quad\;\; = CA^ix_k + CA^{i-1}Bu_k + CA^{i-2}Bu_{k+1} + \cdots + CBu_{k+i-1} + Du_{k+i} + CA^{i-1}w_k + CA^{i-2}w_{k+1} + \cdots + Cw_{k+i-1} + v_{k+i}$

Page 27: Lecture Note #7 (Chap.11)

7-27

– Define the vectors
  $Y_k^r = \begin{pmatrix} y_k \\ y_{k+1} \\ \vdots \\ y_{k+r-1} \end{pmatrix}$,    $U_k^r = \begin{pmatrix} u_k \\ u_{k+1} \\ \vdots \\ u_{k+r-1} \end{pmatrix}$
– Then  $Y_k^r = O_rx_k + S_rU_k^r + V_k^r$
– Introduce
  $S_r = \begin{pmatrix} D & 0 & \cdots & 0 \\ CB & D & \cdots & 0 \\ \vdots & & \ddots & \\ CA^{r-2}B & CA^{r-3}B & \cdots & D \end{pmatrix}$,
  $\mathbf{Y} = [\,Y_1^r\;\; Y_2^r\; \cdots\; Y_N^r\,]$,  $\mathbf{X} = [\,x_1\;\; x_2\; \cdots\; x_N\,]$,  $\mathbf{U} = [\,U_1^r\; \cdots\; U_N^r\,]$,  $\mathbf{V} = [\,V_1^r\; \cdots\; V_N^r\,]$
– Then  $\mathbf{Y} = O_r\mathbf{X} + S_r\mathbf{U} + \mathbf{V}$
– To remove the U-term, use the projection orthogonal to $\mathbf{U}$:
  $\Pi_{\mathbf{U}^T}^{\perp} = I - \mathbf{U}^T(\mathbf{U}\mathbf{U}^T)^{-1}\mathbf{U}$
  since $\mathbf{U}\Pi_{\mathbf{U}^T}^{\perp} = \mathbf{U} - \mathbf{U}\mathbf{U}^T(\mathbf{U}\mathbf{U}^T)^{-1}\mathbf{U} = 0$:
  $\mathbf{Y}\Pi_{\mathbf{U}^T}^{\perp} = O_r\mathbf{X}\Pi_{\mathbf{U}^T}^{\perp} + \mathbf{V}\Pi_{\mathbf{U}^T}^{\perp}$
– Choose a matrix $\Phi/N$ so that the effect of the noise vanishes:
  $G = \dfrac{1}{N}\mathbf{Y}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T = O_r\,\dfrac{1}{N}\mathbf{X}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T + \dfrac{1}{N}\mathbf{V}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T$
  $\lim_{N\to\infty}\dfrac{1}{N}\mathbf{V}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T = 0$,    $\lim_{N\to\infty}\dfrac{1}{N}\mathbf{X}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T = T$   (T has full rank n)

Page 28: Lecture Note #7 (Chap.11)

7-28

• Finding a good instrument
  – Let  $\Phi = [\,\varphi_1^s\;\; \varphi_2^s\; \cdots\; \varphi_N^s\,]$
  – From the law of large numbers,
    $\dfrac{1}{N}\mathbf{V}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T = \dfrac{1}{N}\sum_{k=1}^{N}V_k^r(\varphi_k^s)^T - \Big(\dfrac{1}{N}\sum_{k=1}^{N}V_k^r(U_k^r)^T\Big)\Big(\dfrac{1}{N}\sum_{k=1}^{N}U_k^r(U_k^r)^T\Big)^{-1}\Big(\dfrac{1}{N}\sum_{k=1}^{N}U_k^r(\varphi_k^s)^T\Big)$
    $\lim_{N\to\infty}\dfrac{1}{N}\mathbf{V}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T = E\big\{V_k^r(\varphi_k^s)^T\big\} - E\big\{V_k^r(U_k^r)^T\big\}\,R_u^{-1}\,E\big\{U_k^r(\varphi_k^s)^T\big\} = 0$
    where $R_u = E\big\{U_k^r(U_k^r)^T\big\}$  (if V and U are independent)
  – Thus, choose the $\varphi_k^s$ so that they are uncorrelated with $V_k$.
  – A typical choice is
    $\varphi_k^s = \begin{pmatrix} y_{k-1} \\ \vdots \\ y_{k-s_1} \\ u_{k-1} \\ \vdots \\ u_{k-s_2} \end{pmatrix}$

Page 29: Lecture Note #7 (Chap.11)

7-29

• Finding the states and estimating the noise statistics
  – r-step ahead predictor:
    $Y_k^r = \Theta\varphi_k^s + \Gamma U_k^r + E_k \;\;\Rightarrow\;\; \mathbf{Y} = \Theta\mathbf{F} + \Gamma\mathbf{U} + \mathbf{E}$,   with  $\mathbf{F} = \Phi = [\,\varphi_1^s\; \cdots\; \varphi_N^s\,]$
  – Least-squares estimate of the parameters:
    $[\,\hat\Theta\;\; \hat\Gamma\,]\begin{pmatrix} \Phi\Phi^T & \Phi\mathbf{U}^T \\ \mathbf{U}\Phi^T & \mathbf{U}\mathbf{U}^T \end{pmatrix} = [\,\mathbf{Y}\Phi^T\;\; \mathbf{Y}\mathbf{U}^T\,] \;\;\Rightarrow\;\; \hat\Theta = \mathbf{Y}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T\big(\Phi\Pi_{\mathbf{U}^T}^{\perp}\Phi^T\big)^{-1}$
  – Predicted output:
    $\hat{\mathbf{Y}} = [\,\hat Y_1^r\; \cdots\; \hat Y_N^r\,] = \hat\Theta\Phi = \mathbf{Y}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T\big(\Phi\Pi_{\mathbf{U}^T}^{\perp}\Phi^T\big)^{-1}\Phi$
  – SVD and deleting the small singular values:
    $\hat{\mathbf{Y}} \approx U_1S_1V_1^T = \hat O_r\hat{\mathbf{X}}$,    $\hat{\mathbf{X}} = R^{-1}S_1V_1^T$   ($\hat O_r = U_1R$)
  – Alternatively,
    $\hat{\mathbf{X}} = [\,\hat x_1\; \cdots\; \hat x_N\,] = L\hat{\mathbf{Y}}$  where  $L = R^{-1}U_1^T$,  since  $R^{-1}U_1^T\hat{\mathbf{Y}} = R^{-1}U_1^TU_1S_1V_1^T = R^{-1}S_1V_1^T$
  – The noise characteristics can be calculated from
    $\hat w_k = \hat x_{k+1} - A\hat x_k - Bu_k$,    $\hat v_k = y_k - C\hat x_k - Du_k$

Page 30: Lecture Note #7 (Chap.11)

7-30

• Subspace identification algorithm
  1. From the input-output data, form
     $G = \dfrac{1}{N}\mathbf{Y}\Pi_{\mathbf{U}^T}^{\perp}\Phi^T$
  2. Select weighting matrices W1 and W2 and perform the SVD
     $\bar G = W_1GW_2 = USV^T \approx U_1S_1V_1^T$
     • MOESP:  $W_1 = I$,  $W_2 = \big((1/N)\,\Phi\Pi_{\mathbf{U}^T}^{\perp}\Phi^T\big)^{-1}\Phi\Pi_{\mathbf{U}^T}^{\perp}$
     • N4SID:  $W_1 = I$,  $W_2 = \big((1/N)\,\Phi\Pi_{\mathbf{U}^T}^{\perp}\Phi^T\big)^{-1}\Phi$
     • IVM:  $W_1 = \big((1/N)\,\mathbf{Y}\Pi_{\mathbf{U}^T}^{\perp}\mathbf{Y}^T\big)^{-1/2}$,  $W_2 = \big((1/N)\,\Phi\Phi^T\big)^{-1/2}$
     • CVA:  $W_1 = \big((1/N)\,\mathbf{Y}\Pi_{\mathbf{U}^T}^{\perp}\mathbf{Y}^T\big)^{-1/2}$,  $W_2 = \big((1/N)\,\Phi\Pi_{\mathbf{U}^T}^{\perp}\Phi^T\big)^{-1/2}$
  3. Select a full rank matrix R and define $\hat O_r = W_1^{-1}U_1R$. Then solve for C and A from
     $C = \hat O_r(1\!:\!p,\ 1\!:\!n)$,   $\hat O_r(p\!+\!1\!:\!pr,\ 1\!:\!n) = \hat O_r(1\!:\!p(r\!-\!1),\ 1\!:\!n)\,A$
     (Typical choices are R = I, R = S1, or R = S1^{1/2}.)
  4. Estimate B, D, and x0 from the linear regression problem
     $\min_{B, D, x_0}\ \dfrac{1}{N}\sum_{k=1}^{N}\big\|\,y_k - C(qI - A)^{-1}Bu_k - Du_k - C(qI - A)^{-1}x_0\delta_k\,\big\|^2$
  5. If a noise model is sought, calculate $\hat{\mathbf{X}}$ and estimate the noise contributions. (A simplified code sketch of steps 1-3 follows.)
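A simplified, deterministic sketch of steps 1-3 (illustrative, not from the lecture; it skips the instruments, weighting matrices, and noise handling, and assumes noise-free SISO data). Block rows of outputs are projected orthogonally to the inputs, the SVD gives the range space of the observability matrix, and A and C follow from its shift structure.

```python
import numpy as np

def block_hankel(sig, r, N):
    """Stack r block rows: column j holds sig[j], sig[j+1], ..., sig[j+r-1]."""
    return np.vstack([sig[i:i + N] for i in range(r)])

def subspace_id(y, u, n, r=10):
    """Estimate (A, C) of x_{k+1}=Ax_k+Bu_k, y_k=Cx_k+Du_k from SISO data (noise-free sketch)."""
    N = len(y) - r
    Y = block_hankel(y, r, N)                              # r x N output Hankel matrix
    U = block_hankel(u, r, N)                              # r x N input Hankel matrix
    Pi = np.eye(N) - U.T @ np.linalg.pinv(U @ U.T) @ U     # projection orthogonal to U
    G = Y @ Pi
    Uo, S, _ = np.linalg.svd(G, full_matrices=False)
    Or = Uo[:, :n]                                         # estimated observability matrix (R = I)
    C = Or[:1, :]                                          # first block row
    A = np.linalg.pinv(Or[:-1, :]) @ Or[1:, :]             # shift invariance: Or(2:r) = Or(1:r-1) A
    return A, C

# Assumed second-order test system, noise-free
A0 = np.array([[0.9, 0.2], [0.0, 0.7]]); B0 = np.array([[0.0], [1.0]])
C0 = np.array([[1.0, 0.0]]); D0 = np.array([[0.0]])
rng = np.random.default_rng(5)
u = rng.standard_normal(600); y = np.zeros(600); x = np.zeros((2, 1))
for k in range(600):
    y[k] = (C0 @ x + D0 * u[k]).item()
    x = A0 @ x + B0 * u[k]
A_hat, C_hat = subspace_id(y, u, n=2)
print(np.sort(np.linalg.eigvals(A_hat).real), np.sort(np.linalg.eigvals(A0).real))
# eigenvalues of A_hat match those of A0 (the realization agrees up to a similarity transform T)
```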

Page 31: Lecture Note #7 (Chap.11)

7-31

Nonlinear System Identification

• General nonlinear systems
  $\dot x(t) = f(x(t)) + g(x(t), u(t)) + v(t)$
  $y(t) = h(x(t), u(t))$

• Discrete-time nonlinear models
  $x_{k+1} = f(x_k, u_k) + v_k$
  $y_k = h(x_k, u_k) + w_k$
  – Hammerstein models (static input nonlinearity followed by linear dynamics):
    $y_k = \dfrac{B(z^{-1})}{A(z^{-1})}\,F(u_k)$
  – Wiener models (linear dynamics followed by a static output nonlinearity):
    $y_k = F\Big(\dfrac{B(z^{-1})}{A(z^{-1})}\,u_k\Big)$

Page 32: Lecture Note #7 (Chap.11)

7-32

Wiener Models

– Nonlinear aspects are approximated by Laguerre and Hermite series expansions.
– Laguerre operators (continuous-time and discrete-time forms):
  $L_i(s) = \dfrac{1}{1 + s\tau}\Big(\dfrac{1 - s\tau}{1 + s\tau}\Big)^i$
  $L_i(z) = \dfrac{\sqrt{1 - a^2}}{1 - az^{-1}}\Big(\dfrac{z^{-1} - a}{1 - az^{-1}}\Big)^i$
– Hermite polynomial:
  $H_i(x) = (-1)^i e^{x^2}\dfrac{d^i}{dx^i}\big(e^{-x^2}\big)$
– The dynamics are approximated by the Laguerre filters and the static nonlinearity is approximated by the Hermite polynomials:
  $y_k = \sum_{i_1=0}^{\infty}\sum_{i_2=0}^{\infty}\cdots\sum_{i_n=0}^{\infty} c_{i_1 i_2 \cdots i_n}\, H_{i_1}(x_k^{i_1})\, H_{i_2}(x_k^{i_2})\cdots H_{i_n}(x_k^{i_n})$,   $x_k^{i} = L_i(z)\,u_k$
  $c_{i_1 i_2 \cdots i_n} = \dfrac{1}{N}\sum_{k=1}^{N} y_k\, H_{i_1}(x_k^{i_1})\, H_{i_2}(x_k^{i_2})\cdots H_{i_n}(x_k^{i_n})$
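A small sketch of evaluating the Hermite polynomials used in the expansion via the standard three-term recurrence (equivalent to the Rodrigues-type formula above); the test values are purely illustrative:

```python
import numpy as np

def hermite(i, x):
    """Physicists' Hermite polynomial H_i(x): H_0=1, H_1=2x, H_{n+1}=2x*H_n - 2n*H_{n-1}."""
    x = np.asarray(x, dtype=float)
    h_prev, h = np.ones_like(x), 2.0 * x
    if i == 0:
        return h_prev
    for n in range(1, i):
        h_prev, h = h, 2.0 * x * h - 2.0 * n * h_prev
    return h

x = np.linspace(-1, 1, 5)
print(hermite(2, x))   # H_2(x) = 4x^2 - 2
```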

Page 33: Lecture Note #7 (Chap.11)

7-33

Volterra-Wiener Models

• Volterra series expansion
  $y_k = h^0 + \sum_{i_1=0}^{L} h^1_{i_1}u_{k-i_1} + \sum_{i_1=0}^{L}\sum_{i_2=0}^{L} h^2_{i_1 i_2}u_{k-i_1}u_{k-i_2} + \sum_{i_1=0}^{L}\sum_{i_2=0}^{L}\sum_{i_3=0}^{L} h^3_{i_1 i_2 i_3}u_{k-i_1}u_{k-i_2}u_{k-i_3} + \cdots$
  (L is the same as the model horizon in MPC)

• Volterra kernel
  – The n-dimensional weighting function $h^n_{i_1 \cdots i_n}$ (multidimensional impulse response coefficients)

• Limitations of Volterra-Wiener models
  – Difficult to extend to systems with feedback
  – Difficult to relate the estimated Volterra kernels to a priori information

Page 34: Lecture Note #7 (Chap.11)

7-34

Power Series Expansions

• Example: Time-domain identification of a nonlinear system
  – System:  $x_{k+1} = -a_1x_k - b_{11}x_ku_k + b_{01}u_k$
  – Unknown parameters:  $\theta = \big(a_1\;\; b_{11}\;\; b_{01}\big)^T$
  – Collection of data:
    $\begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix} = \begin{pmatrix} -x_0 & -x_0u_0 & u_0 \\ \vdots & \vdots & \vdots \\ -x_{N-1} & -x_{N-1}u_{N-1} & u_{N-1} \end{pmatrix}\begin{pmatrix} a_1 \\ b_{11} \\ b_{01} \end{pmatrix}$
  – Least-squares solution:  $\hat\theta = \big(\Phi^T\Phi\big)^{-1}\Phi^TY$   (a code sketch follows below)
  – Properties of this approach
    • Standard statistical validation tests are applicable without extensive modifications.
    • A frequency-domain approach is also possible.
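A sketch of this example in code (illustrative; the true parameter values, input signal, and seed are assumptions): the regressor matrix is built from $-x_k$, $-x_ku_k$, and $u_k$, and the ordinary least-squares formula recovers $(a_1, b_{11}, b_{01})$.

```python
import numpy as np

rng = np.random.default_rng(6)
a1, b11, b01 = 0.5, 0.2, 1.0                 # assumed true parameters
N = 300
u = rng.uniform(-1.0, 1.0, N)
x = np.zeros(N + 1)
for k in range(N):
    x[k + 1] = -a1 * x[k] - b11 * x[k] * u[k] + b01 * u[k]

# Phi rows: (-x_k, -x_k*u_k, u_k); target: x_{k+1}
Phi = np.column_stack([-x[:N], -x[:N] * u, u])
Y = x[1:N + 1]
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)   # (Phi^T Phi)^{-1} Phi^T Y
print(theta)   # recovers [a1, b11, b01] = [0.5, 0.2, 1.0]
```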

Page 35: Lecture Note #7 (Chap.11)

7-35

• Continuous-time version
  – System:  $\dot x = -a_1x - b_{11}xu + b_{01}u$
  – Unknown parameters:  $\theta = \big(a_1\;\; b_{11}\;\; b_{01}\big)^T$
  – Integrating the ODE over one sampling interval:
    $\int_{t_k}^{t_{k+1}}\dot x\,dt = x_{k+1} - x_k = -a_1\int_{t_k}^{t_{k+1}}x\,dt - b_{11}\int_{t_k}^{t_{k+1}}xu\,dt + b_{01}\int_{t_k}^{t_{k+1}}u\,dt$
  – Collection of data:
    $\begin{pmatrix} x_1 - x_0 \\ \vdots \\ x_N - x_{N-1} \end{pmatrix} = \begin{pmatrix} -\int_{t_0}^{t_1}x\,dt & -\int_{t_0}^{t_1}xu\,dt & \int_{t_0}^{t_1}u\,dt \\ \vdots & \vdots & \vdots \\ -\int_{t_{N-1}}^{t_N}x\,dt & -\int_{t_{N-1}}^{t_N}xu\,dt & \int_{t_{N-1}}^{t_N}u\,dt \end{pmatrix}\begin{pmatrix} a_1 \\ b_{11} \\ b_{01} \end{pmatrix}$
  – Least-squares solution:  $\hat\theta = \big(\Phi^T\Phi\big)^{-1}\Phi^TY$
  – Power-series expansion of a general nonlinear system:
    $\dot x(t) = -\sum_{i=1}^{k} a_i\,x^i(t) - \sum_{i=1}^{k}\sum_{j=1}^{m} b_{ij}\,x^i(t)\,u^j(t) + v(t)$
    • Similarly, the parameters can be obtained from the least-squares method.