Deep Energy-Based NARX Models

Johannes N. Hendriks¹, Fredrik K. Gustafsson², Antonio H. Ribeiro², Adrian G. Wills¹, Thomas B. Schön²

¹ The University of Newcastle, Australia
² Uppsala University, Sweden

Workshop on Nonlinear System Identification Benchmarks 2021

Motivation

Common performance criteria, such as maximum-likelihood or prediction-error criteria, usually involve assumptions about uncertainty, be they explicit or implicit.

Nonlinear SysId Benchmarks, 2021 1 / 13 Hendriks, Gustafsson, Ribeiro, Wills, Schon

Nonlinear ARX model (Gaussian noise)

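• Data model:

  $$y_t = f_\theta(y_{t-1}, u_{t-1}) + e_t, \qquad \text{where } e_t \sim \mathcal{N}(0, \sigma).$$

• $f_\theta$: the model structure.

• Maximum likelihood estimator:

  $$\hat{\theta} = \arg\min_\theta \sum_{t=1}^{T} \left\| y_t - f_\theta(y_{t-1}, u_{t-1}) \right\|^2.$$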
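Under Gaussian noise the maximum likelihood estimate is a least-squares fit. The sketch below illustrates this for a hypothetical linear-in-parameters structure $f_\theta(y_{t-1}, u_{t-1}) = a\, y_{t-1} + b\, u_{t-1}$. The structure, the parameter values, the white-noise input, and reading $\sigma$ as a standard deviation are all assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def simulate_arx(a, b, sigma, T, rng):
    """Simulate y_t = a*y_{t-1} + b*u_{t-1} + e_t with Gaussian e_t (std sigma)."""
    u = rng.standard_normal(T)          # assumed white-noise input
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a * y[t - 1] + b * u[t - 1] + sigma * rng.standard_normal()
    return y, u

def fit_least_squares(y, u):
    """Minimise sum_t (y_t - f_theta(y_{t-1}, u_{t-1}))^2 for the linear f_theta."""
    X = np.column_stack([y[:-1], u[:-1]])   # regressors (y_{t-1}, u_{t-1})
    theta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return theta

rng = np.random.default_rng(0)
y, u = simulate_arx(a=0.8, b=0.5, sigma=0.1, T=2000, rng=rng)
theta_hat = fit_least_squares(y, u)     # estimates of (a, b)
```

With a neural network as $f_\theta$ the same squared-error criterion would be minimised by gradient descent rather than in closed form.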

Energy-based NARX models

• Arbitrary distributions:

  $$y_t \mid (y_{t-1}, u_{t-1}) \sim p_\theta(y_t \mid y_{t-1}, u_{t-1}).$$

• Energy-based model:

  $$p_\theta(y_t \mid y_{t-1}, u_{t-1}) = \frac{e^{g_\theta(y_t, y_{t-1}, u_{t-1})}}{\int e^{g_\theta(\gamma, y_{t-1}, u_{t-1})}\, d\gamma}.$$

  Gustafsson, F.K., Danelljan, M., Bhat, G., and Schön, T.B. (2020). Energy-based models for deep probabilistic regression. In Proceedings of the European Conference on Computer Vision (ECCV).

• $g_\theta$: a highly flexible structure, in our case a neural network.
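For a scalar output, the normalising integral in the energy-based model can be approximated on a grid. The sketch below does this for a hypothetical two-mode energy function that stands in for the neural network $g_\theta$. The function `energy`, its constants, and the grid limits are illustrative assumptions.

```python
import numpy as np

def energy(y, y_prev, u_prev):
    """Hypothetical scalar energy g_theta with two modes (stand-in for a network)."""
    centre = 0.9 * y_prev + 0.5 * u_prev
    return np.logaddexp(-0.5 * ((y - centre - 1.0) / 0.3) ** 2,
                        -0.5 * ((y - centre + 1.0) / 0.3) ** 2)

def trapezoid(values, grid):
    """Trapezoidal rule for the integral of values over grid."""
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid))

def conditional_density(y_grid, y_prev, u_prev):
    """p_theta(y | y_prev, u_prev) = exp(g) / integral of exp(g) over gamma."""
    g = energy(y_grid, y_prev, u_prev)
    unnorm = np.exp(g - g.max())        # subtracting the max avoids overflow
    return unnorm / trapezoid(unnorm, y_grid)

y_grid = np.linspace(-5.0, 5.0, 2001)
p = conditional_density(y_grid, y_prev=0.5, u_prev=0.2)
```

Because only ratios of $e^{g_\theta}$ matter, shifting the energy by a constant leaves the density unchanged, which is why the maximum can be subtracted safely.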

Model training

• Maximum likelihood, writing $x_t = (y_{t-1}, u_{t-1})$:

  $$\hat{\theta} = \arg\max_\theta \sum_{t=1}^{T} \log p_\theta(y_t \mid y_{t-1}, u_{t-1}) = \arg\min_\theta \sum_{t=1}^{T} \left( -g_\theta(y_t, x_t) + \log \int e^{g_\theta(\gamma, x_t)}\, d\gamma \right).$$

• Noise contrastive estimation:
  Gutmann, M. and Hyvärinen, A. (2010). Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 297–304.
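Noise contrastive estimation avoids the integral by asking the model to pick out the true output among noise samples. The sketch below shows one NCE-style objective of the kind used for energy-based regression. The Gaussian noise distribution centred on $y_t$, the values of `noise_std` and `num_noise`, and the two toy energy functions are assumptions for illustration and are not taken from the slides.

```python
import numpy as np

def log_gaussian(z, mean, std):
    """Log-density of N(mean, std^2) evaluated at z."""
    return -0.5 * ((z - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def nce_loss(energy, y, x, noise_std, num_noise, rng):
    """NCE-style loss for a batch of scalar outputs y and regressors x, shape (N,).

    Column 0 of the candidates holds the true y_t; the other num_noise columns
    are drawn from the noise distribution q(. | y_t) = N(y_t, noise_std^2).
    """
    noise = y[:, None] + noise_std * rng.standard_normal((y.size, num_noise))
    cand = np.concatenate([y[:, None], noise], axis=1)          # (N, M + 1)
    logits = energy(cand, x[:, None]) - log_gaussian(cand, y[:, None], noise_std)
    # Log-softmax over the candidates, evaluated at the true sample (column 0).
    m = logits.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return -np.mean(logits[:, 0] - log_norm)

rng_data = np.random.default_rng(0)
x = rng_data.standard_normal(256)
y = 0.95 * x + 0.1 * rng_data.standard_normal(256)

good = lambda y_, x_: -0.5 * ((y_ - 0.95 * x_) / 0.1) ** 2   # matches the data
bad = lambda y_, x_: -0.5 * ((y_ + 0.95 * x_) / 0.1) ** 2    # wrong sign

loss_good = nce_loss(good, y, x, 0.2, 16, np.random.default_rng(1))
loss_bad = nce_loss(bad, y, x, 0.2, 16, np.random.default_rng(1))
```

In training, `energy` would be the network $g_\theta$ and the loss would be minimised with respect to $\theta$ by stochastic gradient descent. Here the energy that matches the data-generating process gives the lower loss.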

Example 1: AR model

$$y_t = 0.95\, y_{t-1} + e_t.$$

Figure: Gaussian error $e_t$

Figure: Gaussian mixture error $e_t$

Figure: Cauchy error $e_t$

Figure: Gaussian error $e_t$ with conditional variance
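Data for the four noise cases of Example 1 could be generated along the following lines. This is a minimal sketch: the noise scales, the mixture components, and the form of the conditional variance are assumptions, since the slides do not state them.

```python
import numpy as np

def simulate_ar(T, draw_noise, rng, a=0.95):
    """Simulate y_t = a*y_{t-1} + e_t, where e_t may depend on y_{t-1}."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a * y[t - 1] + draw_noise(y[t - 1], rng)
    return y

# All constants below are illustrative assumptions.
noise_models = {
    "gaussian": lambda y_prev, rng: 0.2 * rng.standard_normal(),
    "gaussian_mixture": lambda y_prev, rng: (rng.choice([-0.4, 0.4])
                                             + 0.1 * rng.standard_normal()),
    "cauchy": lambda y_prev, rng: 0.05 * rng.standard_cauchy(),
    "conditional_variance": lambda y_prev, rng: ((0.1 + 0.2 * abs(np.tanh(y_prev)))
                                                 * rng.standard_normal()),
}

rng = np.random.default_rng(0)
series = {name: simulate_ar(500, draw, rng) for name, draw in noise_models.items()}
```

A least-squares fit recovers the coefficient 0.95 reasonably well in the Gaussian case, but a point estimate alone says nothing about the shape of $p(y_t \mid y_{t-1})$ in the other three cases.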

Example 2: ARX model

$$y_t = 1.5\, y_{t-1} - 0.7\, y_{t-2} + u_{t-1} + 0.5\, u_{t-2} + e_t.$$

Figure: Estimates of $p_\theta(y_t \mid x_t)$ for validation data. (a) Sequence. (b) $t = 56$.


Example 3: nonlinear model

Model:

$$y^*_t = \left(0.8 - 0.5\, e^{-y^{*2}_{t-1}}\right) y^*_{t-1} - \left(0.3 + 0.9\, e^{-y^{*2}_{t-1}}\right) y^*_{t-2} + u_{t-1} + 0.2\, u_{t-2} + 0.1\, u_{t-1} u_{t-2} + v_t,$$

$$y_t = y^*_t + w_t.$$

Process and output error:

$$v_t \sim \mathcal{N}(0, \sigma_v^2), \qquad w_t \sim \mathcal{N}(0, \sigma_v^2).$$

Figure: System with process noise only. Input in blue and output in red.

Chen, S., Billings, S.A., and Grant, P.M. (1990). Non-Linear System Identification Using Neural Networks. International Journal of Control, 51(6), 1191–1214.
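The Example 3 system can be simulated directly from its difference equation. In the sketch below the white-noise input, the zero initial conditions, and the separate `sigma_w` argument are assumptions; the slide gives the same variance for both noise terms, which corresponds to passing equal values.

```python
import numpy as np

def simulate_example3(u, sigma_v, sigma_w, rng):
    """Simulate the Example 3 system; returns measured y and noise-free-output y_star."""
    T = len(u)
    y_star = np.zeros(T)                # assumed zero initial conditions
    for t in range(2, T):
        decay = np.exp(-y_star[t - 1] ** 2)
        y_star[t] = ((0.8 - 0.5 * decay) * y_star[t - 1]
                     - (0.3 + 0.9 * decay) * y_star[t - 2]
                     + u[t - 1] + 0.2 * u[t - 2] + 0.1 * u[t - 1] * u[t - 2]
                     + sigma_v * rng.standard_normal())   # process noise v_t
    y = y_star + sigma_w * rng.standard_normal(T)          # output noise w_t
    return y, y_star

rng = np.random.default_rng(0)
u = rng.standard_normal(200)            # assumed input signal
y, y_star = simulate_example3(u, sigma_v=0.1, sigma_w=0.1, rng=rng)
```

Setting `sigma_w=0.0` gives the process-noise-only setting mentioned in the figure caption.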

Table: MSE on the validation set of the simulated nonlinear example for the fully connected network (FCN) NARX model and the EB-NARX model

              N = 100             N = 250             N = 500
            FCN     EB-NARX     FCN     EB-NARX     FCN     EB-NARX
σ = 0.1     0.122   0.099       0.069   0.070       0.057   0.054
σ = 0.3     0.398   0.390       0.353   0.354       0.289   0.308
σ = 0.5     0.860   0.869       0.809   0.822       0.754   0.779

Figure: Estimates of $p_\theta(y_t \mid x_t)$ for validation data. (a) Sequence. (b) $t = 56$.


Example 4: Coupled Electric Drives

Figure: Illustration of the CE8 coupled electric drives system

Wigren, T. and Schoukens, M. (2017). Coupled electric drives data set and reference models. Technical Report, Uppsala Universitet.

Figure: $p_\theta(y_t \mid x_t)$ sequence

Figure: $t = 40$

Figure: $t = 57$

Figure: $t = 60$

Conclusion

• The energy-based NARX model learns the full conditional distribution rather than only a point estimate.

• The current paper considers only one-step-ahead predictions, not multi-step-ahead predictions.

• Open question for multi-step-ahead prediction: propagate MAP point estimates or propagate probabilities?

• Future work: studying energy-based models for nonlinear ARMAX, output error, and other model types that can handle more general noise.
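The difference between propagating MAP point estimates and propagating probabilities can be illustrated with a toy one-step model. The sketch below assumes a hypothetical AR(1) model with asymmetric two-mode noise; the constants and the particle-based propagation are illustrative choices, not the authors' method.

```python
import numpy as np

A = 0.95
MODES = np.array([0.4, -0.4])     # assumed noise modes
WEIGHTS = np.array([0.6, 0.4])    # assumed mode probabilities
MODE_STD = 0.05                   # assumed spread around each mode

def propagate_map(y0, horizon):
    """Feed back the most probable next output at every step."""
    y = y0
    for _ in range(horizon):
        y = A * y + MODES[np.argmax(WEIGHTS)]
    return y

def propagate_samples(y0, horizon, num_samples, rng):
    """Feed back samples, so the multi-step distribution is carried by particles."""
    y = np.full(num_samples, float(y0))
    for _ in range(horizon):
        mode = rng.choice(MODES, size=num_samples, p=WEIGHTS)
        y = A * y + mode + MODE_STD * rng.standard_normal(num_samples)
    return y

rng = np.random.default_rng(0)
y_map = propagate_map(0.0, horizon=20)
particles = propagate_samples(0.0, horizon=20, num_samples=5000, rng=rng)
```

In this toy case the MAP trajectory always follows the dominant mode and ends far from the mean of the sampled distribution, which is the kind of discrepancy the open question is about.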

Thank you!

To appear at the 19th IFAC Symposium on System Identification.

Paper: https://arxiv.org/abs/2012.04136

Code: https://github.com/jnh277/ebm_arx

Contact:
johannes.hendriks@newcastle.edu.au
fredrik.gustafsson@it.uu.se
antonio.horta.ribeiro@it.uu.se
adrian.wills@newcastle.edu.au
thomas.schon@it.uu.se
